Hello. Apologies if this has come up before; I'm new to the list and didn't see anything in the archives that exactly matched my situation.
I am considering using Lucene to index and search a large collection of small documents in a specialized domain: probably only a few thousand unique terms spread across anywhere from one million to ten million small source documents. I hope to get ranked search results back in under 400 msec.

One issue I suspect I may face is index density, owing to the large number of documents and the relatively small vocabulary; that, in turn, may be a drag on query processing. I am working on strategies to ameliorate it somewhat, but it may be difficult. In the meantime, I'm looking for some gut reactions from the experts before I take this to the next stage.

Can Lucene scale well to this kind of situation? Can I realistically hope to get anywhere near my performance target? Will I have to distribute pieces of the index across several machines, parallelize my retrievals, and merge the results? If so, does Lucene already support that, or will I have to develop that logic in house? (I seem to recall seeing a reference somewhere that such a feature was coming soon, but I'm not sure when or how it will be implemented.)

Any help, tips, references, or advice would be welcome and appreciated. Thank you!

Regards,
Greg
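P.S. To make that last question a bit more concrete, here is a rough sketch of the kind of merged, multi-index search I had in mind. It assumes a MultiSearcher-style class that fans a query out over several IndexSearchers and merges the ranked hits; the shard paths, the "contents" and "id" field names, and the query string are all made up for illustration, and I don't know whether this matches the API that is actually shipping (or planned) in the version I would end up using.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

public class ShardedSearchSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical index shards -- in practice these might live on
    // different disks, or be exposed from other machines.
    Searchable[] shards = new Searchable[] {
        new IndexSearcher("/indexes/shard0"),
        new IndexSearcher("/indexes/shard1"),
        new IndexSearcher("/indexes/shard2"),
    };

    // MultiSearcher runs the query against every shard and merges
    // the hits into a single ranked result list.
    MultiSearcher searcher = new MultiSearcher(shards);

    Query query = new QueryParser("contents", new StandardAnalyzer())
        .parse("some domain terms");
    Hits hits = searcher.search(query);

    // Print the top ten merged hits (score plus a hypothetical id field).
    for (int i = 0; i < Math.min(10, hits.length()); i++) {
      System.out.println(hits.score(i) + "\t" + hits.doc(i).get("id"));
    }
    searcher.close();
  }
}

My assumption is that any parallel or remote-searching variants people have mentioned would slot into the same place as the plain IndexSearchers above, but I may be reading the roadmap wrong, so corrections are welcome.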