SOME ONE wrote:
Hi,
Yes, my queries are like the first case. And as there
have been no other suggestions to do it in a single
search operation, will have to do it the way you
suggested. This technique will do the job particularly
because title's text is always in the body as well. So
finally I will have to run two search operations like
(body:a AND body:b AND body:c) AND
(title:a OR title:b OR title:c)
to get the first group of results for title match, and
(body:a AND body:b AND body:c) NOT
(title:a OR title:b OR title:c)
to get the second group of results with just body
match, and sort each group by date.
Here's another way. Search just the first part of the first query, i.e.
(body:a AND body:b AND body:c). Then use an IndexReader, TermDocs and related
objects to get a list of document numbers for each of title:a, title:b, and
title:c. Then merge / compare the lists to separate out the title matches and
the body-only matches. Doing it this way will save whatever resources are
consumed by score computations. Resources needed by the ORing will still
basically be consumed, because of the merge / compare step you must perform.
Of course, this approach wouldn't work if the query terms aren't simple terms,
i.e. you use ranges, or wildcards, or prefixes, etc. If you do use constructs
like those, then I don't see an alternative to the two-search algorithm.
Unless you have hundreds of millions of documents, each of the two searches
could be simpler than what you have above, though (see my earlier email for
details). The way they're set up above, 2 sets of reads through the indexes
are necessary, and 2 sets of boolean operations. The way I suggested earlier
would have only 1 read of the index and 1 merge / compare computation. if you
have a few tens of millions of documents, or less, the lists of document
numbers should easily fit in RAM.
--MDC
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]