Hello.  Apologies if this has come up before; I'm new to the list and
didn't see anything in the archives that exactly matched my situation.

I am considering using Lucene to index and search a large collection of
small documents in a specialized domain -- probably only a few thousand
unique terms spread across anywhere from one million to ten million
small source documents.  I hope to get ranked search results back in
under 400 msec.
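
For what it's worth, the indexing side of what I have in mind is nothing
exotic.  Here is a minimal sketch (1.x-style API; the path, field names,
and sample documents are just placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class BuildIndex {
    public static void main(String[] args) throws Exception {
        // Create a fresh index; with millions of tiny documents, most of
        // the cost should be in the sheer number of addDocument() calls.
        IndexWriter writer = new IndexWriter("/data/index",
                                             new StandardAnalyzer(), true);
        String[] docs = { "alpha beta gamma", "beta delta", "gamma delta" };
        for (int i = 0; i < docs.length; i++) {
            Document doc = new Document();
            // Untokenized id for retrieval; tokenized body for search.
            doc.add(Field.Keyword("id", String.valueOf(i)));
            doc.add(Field.Text("contents", docs[i]));
            writer.addDocument(doc);
        }
        writer.optimize();   // collapse segments before searching
        writer.close();
    }
}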

I suspect one issue I may face is index density: with so many documents
and such a small vocabulary, the posting list for almost every term will
be very long, and that, in turn, may be a drag on query processing.  I
am working on strategies to mitigate that somewhat, but it may be
difficult.
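
To put rough numbers on it (both figures below are just guesses on my
part): with, say, 5,000 unique terms and an average of 20 terms per
document, ten million documents would produce about 200 million
postings, or roughly 40,000 postings per term on average -- so even a
two- or three-term query would be scanning on the order of 100,000
postings.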

In the meantime, I'm looking for some gut reactions from the experts
before I take this to the next stage.  Can Lucene scale well to this
kind of situation?  Can I realistically hope to get anywhere near my
performance target?  Will I have to distribute pieces of the index
across several machines, parallelize my retrievals, and merge the
results?  If so, does Lucene already support that, or will I have to
develop that logic in house?  (I seem to recall a reference somewhere
that such a feature was coming soon, but I'm not sure when or how it
will be implemented.)
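
In case it clarifies the question, here is the sort of thing I was
picturing for the split-index case: a sketch using MultiSearcher over
local sub-indexes (the four-shard layout is invented for illustration).
Whether the same idea extends cleanly across machines is exactly the
part I'm unsure about.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

public class ShardedSearch {
    public static void main(String[] args) throws Exception {
        // Hypothetical layout: the collection split into four sub-indexes.
        Searchable[] shards = {
            new IndexSearcher("/data/index0"),
            new IndexSearcher("/data/index1"),
            new IndexSearcher("/data/index2"),
            new IndexSearcher("/data/index3"),
        };
        // MultiSearcher runs the query over every shard and merges the
        // ranked results into a single hit list.
        MultiSearcher searcher = new MultiSearcher(shards);
        Query query = QueryParser.parse("beta delta", "contents",
                                        new StandardAnalyzer());
        Hits hits = searcher.search(query);
        for (int i = 0; i < Math.min(10, hits.length()); i++) {
            System.out.println(hits.score(i) + "\t" + hits.doc(i).get("id"));
        }
        searcher.close();
    }
}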

Any help, tips, references, or advice would be welcome and appreciated.
Thank you!

Regards,

Greg 

