I am working on debugging an existing Lucene implementation.

Before I started, I built a demo to understand Lucene. In my demo I indexed
the entire content hierarchy all at once, then optimized the index and used
it for queries. It was time-consuming but very simple.
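The core of that demo was roughly the following (a minimal sketch against the
2.x-era Lucene API; listAllFiles stands in for my recursive directory walk,
and the real code parses each file type rather than reading it raw):

    import java.io.*;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.IndexWriter;

    // Index every file under contentRoot into one index, then optimize it.
    void indexAll(File contentRoot, String indexPath) throws IOException {
        // true = create a fresh index, overwriting anything already there
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), true);
        for (File f : listAllFiles(contentRoot)) {   // listAllFiles: hypothetical helper
            Document doc = new Document();
            // store the path so search hits can point back at the file
            doc.add(new Field("path", f.getPath(),
                              Field.Store.YES, Field.Index.UN_TOKENIZED));
            // tokenized, unstored body; plain-text read only in this sketch
            doc.add(new Field("contents", new FileReader(f)));
            writer.addDocument(doc);
        }
        writer.optimize();   // merge all segments into one; this was the slow step
        writer.close();
    }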

The code I am currently trying to fix indexes the content hierarchy by
folder, creating a separate index for each one, so it ends up with a bunch
of indexes. I still don't understand how this works (I am assuming they get
merged somewhere that I haven't tracked down yet), but I have noticed it
doesn't always index the right folder. This results in users reporting
"inconsistent" search behavior after they make a change to a document. To
keep things simple, I would like to remove all the logic that figures out
which folder to index and just index them all (usually fewer than 1000
files) so I end up with one index.
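From the Lucene docs, I assume the merge step I haven't found yet would look
something like this (a sketch only; the paths are made up, and I'm assuming
the classic addIndexes(Directory[]) call):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Merge the per-folder indexes into one target index.
    void mergeFolderIndexes(String[] folderIndexPaths, String mergedPath)
            throws IOException {
        IndexWriter writer = new IndexWriter(mergedPath, new StandardAnalyzer(), true);
        Directory[] dirs = new Directory[folderIndexPaths.length];
        for (int i = 0; i < folderIndexPaths.length; i++) {
            // false = open the existing per-folder index, don't create
            dirs[i] = FSDirectory.getDirectory(folderIndexPaths[i], false);
        }
        writer.addIndexes(dirs);  // copies and merges segments from each source index
        writer.close();
    }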

Would indexing time be the only thing I'd be losing out on, or is there more
to the approach of creating multiple indexes and merging them?

What is a good approach to indexing a content hierarchy composed primarily
of pdf, xsl, doc, and xml files, where any of these documents can change
several times a day?
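For concreteness, is something like this per-file update the right direction
(a sketch, assuming Lucene 2.1+ where IndexWriter.updateDocument exists;
extractText stands in for whatever pdf/doc/xml parsing we end up using)?

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // Re-index a single changed file in place: delete-by-term + add in one call.
    void reindexChangedFile(File f, String indexPath) throws Exception {
        // false = open the existing index rather than recreating it
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(new Field("path", f.getPath(),
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("contents", extractText(f),   // extractText: hypothetical parser
                          Field.Store.NO, Field.Index.TOKENIZED));
        // replaces any existing document whose "path" term matches this file
        writer.updateDocument(new Term("path", f.getPath()), doc);
        writer.close();
    }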

Thanks,

Luke


