Otis, On Fri, Mar 04, 2005 at 11:31:22AM -0500, Brian Cuttler wrote: > > Otis, > > > If by shtml you mean HTML with server-side includes, then note that you > > will not be able to do this with Lucene alone, as server-side includes > > are not static. > > My understanding is that our webmaster is hoping to have very simple > files created by the data-suppliers, ie, depatmental people that generate > web pages. Then use server-side-includes to include site or departement > comment headers/footers/backgrounds, so that ultimately it will be > the .shtml with the interesting data.
Still not sure I used the right vocabulary to express myself. I don't believe that there is much in the included files that will need to be indexed, that will be common header/footer data. It is the static information in the unprocessed shtml that I believe will be the data people wanting to search the sites will want to find. > Is this something that in your experience works well or is there some > other methology I can recommend to her ? > > thank you, > > Brian > > On Thu, Mar 03, 2005 at 02:40:41PM -0800, Otis Gospodnetic wrote: > > Brian, > > > > It sounds like you are using a little demo application that comes with > > Lucene. This is really just a demo that shows how you can use Lucene. > > Lucene in a toolkit for building search applications, so you would > > really want to develop something of your own around Lucene. Sure, you > > can use a demo, but that little demo is not perfect. Since v 1.2 there > > have been some changes in the demo area, so you could try grabbing the > > latest version of Lucene and trying its demo (same application, it's > > just that it may be a bit better). If that fails, you can try the file > > indexing framework from Lucene in Action (c.f. > > http://www.lucenebook.com ) - source code is freely available. > > > > If by shtml you mean HTML with server-side includes, then note that you > > will not be able to do this with Lucene alone, as server-side includes > > are not static. > > > > Otis > > > > > > --- Brian Cuttler <[EMAIL PROTECTED]> wrote: > > > > > I've sorry if this is the wrong forum, I was trying for lucene-user > > > and been unable to subscribe (but seem to see lucene items here). > > > > > > We have been runing apache on our internal sites for a while, with > > > tomcat and lucene. Plugged in the demo index build and search > > > features... and for a long time life has been good. > > > > > > Now we are looking to implement apache on our external site with > > > tomcat and lucene. > > > > > > System is Solaris 9 > > > Apache/1.3.29 (Unix) ApacheJserv/1.1.2 mod_perl/1.25 configured > > > Apache, with Tomcat included, from Solaris freeware site. > > > > > > I didn't see Lucene on the Sun FW site so just replicated the > > > installation > > > from the internal to the (future) external website. > > > > > > Lucene is currently v 1.2 (at least that is the version number of the > > > demo package). > > > > > > The index we are building (org.apache.lucene.demo.IndexHTML) seems to > > > capture the tags from the "ALT" text, where really we need it to pick > > > up not image texts but content and keyword fields, or perhaps even > > > plain > > > text that is outside of the ALT tags. > > > > > > We also suspect that we are not picking up all documents, ie, not > > > both > > > html, htm. We'd like to extend the range of documents we index, soon > > > to include shtml unless I'm mistaken. > > > > > > I strongly suspect that newer demos might already do this, or that > > > with > > > some basic instruction I could modify the document extentions if not > > > the > > > (I suspect rather complex) target strings. > > > > > > Unfortunately while the implement demo docs are great, I've so far > > > not > > > found (or simply not understood) the docs that might give the options > > > we are hoping to implement. > > > > > > Thank you in advance, > > > > > > Brian > > > > > > --- > > > Brian R Cuttler [EMAIL PROTECTED] > > > Computer Systems Support (v) 518 486-1697 > > > Wadsworth Center (f) 518 473-6384 > > > NYS Department of Health Help Desk 518 473-0773 > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > --- > Brian R Cuttler [EMAIL PROTECTED] > Computer Systems Support (v) 518 486-1697 > Wadsworth Center (f) 518 473-6384 > NYS Department of Health Help Desk 518 473-0773 > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --- Brian R Cuttler [EMAIL PROTECTED] Computer Systems Support (v) 518 486-1697 Wadsworth Center (f) 518 473-6384 NYS Department of Health Help Desk 518 473-0773 --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]