Currently I'm working with a single index where content is indexed by
it's original printed page. I have to show the total number of matching
documents, so I end up running through all the hits and taking an order
of magnitude hit on performance as I calculate the number of unique
documents. It's stupid for many many reasons.
To correct all this, I've decided to create two (maybe three) indexes
for the same set of documents: in the first index there is a one to one
relationship between the original document and the Lucene Document
object. The other index is a paragraph index, where each lucene
document represents a single paragraph. I may even throw in a third
index where each lucene document represents a logical section/chapter.
When I'm building the search results page I'll have to execute a fair
number of queries. The first query will execute on the Document-Index,
then for each of the 10 to 2o results I'm displaying at the time, I'll
execute another query to find the best paragraph and or section.
Is this a reasonable solution to the problem?
Thanks for the advice.
Dan
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]