Re: Paging results

Volodymyr Bychkoviak Wed, 29 Mar 2006 02:22:24 -0800

Hi

Marios Skounakis wrote:

 Hi all,
I have the following issue (I am giving a quantified example so we can talk more concretely)
My documents have a docId field, stored as a keyword field.
I am executing searches that return between 2000 and 10000 documents and sorting the results by relevance (or sometimes alphabetically).
In every query, I need to discard some of the results based on their docId. I have a list of the docIds that need to be discarded in an array. The size of the list is usually about 100.
I generally perform paging, so I don't need to display all the results, but only those of the current page (e.g. results 100 to 120)
Currently, after getting the Hits object from the Searcher, I loop over the documents, retrieve their docId, and see if it is in the discard list. If not, I put the document in a validResult collection. When I have read enough valid results to be able to show the current page, I stop. E.g., in the above example, I would read until I had 120 valid results.
The problem with this implementation is that I have to read the docId for the results that precede the current page, in order to determine if they are in the invalid list. So, when showing the pages near the last page, I have to read the entire result list. I don't like this because this means that the computation required to display a page is not constant but depends on the page's position. Anyway, what do the experts think? Is this implementation prohibitively expensive?
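
For reference, the loop described above can be sketched in plain Java roughly as follows. This is only an illustration under assumptions: the ranked list of docId strings stands in for the Hits object, and the names FilteredPager, Page and page are hypothetical, not part of Lucene. It also counts how many hits had to be examined, which is the cost that grows with the page position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the filter-then-page loop; not Lucene API. */
final class FilteredPager {

    /** One page of valid docIds plus the number of hits that were examined. */
    static final class Page {
        final List<String> docIds;
        final int hitsExamined;

        Page(List<String> docIds, int hitsExamined) {
            this.docIds = docIds;
            this.hitsExamined = hitsExamined;
        }
    }

    /**
     * Returns valid results [from, to), counted over non-discarded hits only.
     * rankedDocIds is assumed to be the docIds in result order.
     */
    static Page page(List<String> rankedDocIds, Set<String> discard, int from, int to) {
        if (from < 0 || to < from) {
            throw new IllegalArgumentException("need 0 <= from <= to");
        }
        List<String> pageIds = new ArrayList<>();
        int valid = 0;     // valid (non-discarded) results seen so far
        int examined = 0;  // hits whose docId had to be read
        for (String id : rankedDocIds) {
            if (valid >= to) {
                break;     // enough valid results to fill the page
            }
            examined++;
            if (discard.contains(id)) {
                continue;  // skip discarded documents
            }
            if (valid >= from) {
                pageIds.add(id);
            }
            valid++;
        }
        return new Page(pageIds, examined);
    }
}
```

Keeping the discard list in a HashSet rather than an array makes each membership check cheap, but the number of hits examined still depends on how far into the results the requested page lies.
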
(As a sidenote, when calling hits.doc(i), does Lucene retrieve the whole document, or just a pointer to it, and retrieve the data when doing hits.doc(i).getField...?)

Yes, Lucene does retrieve the whole document on this call.

An alternative would be to extend the query to exclude the ids in the discard list. How would adding 100 exclusion clauses to the query impact the query's performance? Are there any studies on search speed in relation to the number of query clauses?

It's the preferred way of doing such things. Adding 100 exclusion clauses to the query will be much less expensive than retrieving documents and checking their field.
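
As a rough illustration of the exclusion approach, the sketch below appends one prohibited clause per discarded id, using the "-field:term" form of the Lucene query parser syntax. The class and method names (ExclusionQuery, withExclusions) and the example field "title" are hypothetical. It assumes the ids are purely alphanumeric and that the analyzer used for parsing leaves them unchanged; an analyzer that lowercases or splits terms would not match a keyword field, in which case building the clauses programmatically (a BooleanQuery with TermQuery clauses added as BooleanClause.Occur.MUST_NOT) is probably the safer route.

```java
import java.util.Collection;

/** Hypothetical helper; builds a query string, it does not call Lucene itself. */
final class ExclusionQuery {

    /**
     * Wraps baseQuery as a required clause and adds one prohibited
     * clause per discarded id, e.g. "+(title:lucene) -docId:17 -docId:42".
     */
    static String withExclusions(String baseQuery, String field, Collection<String> discardIds) {
        StringBuilder sb = new StringBuilder();
        sb.append("+(").append(baseQuery).append(')');
        for (String id : discardIds) {
            // Assumption: ids are alphanumeric, so no query-syntax escaping is needed.
            if (!id.matches("[A-Za-z0-9]+")) {
                throw new IllegalArgumentException("unsupported docId: " + id);
            }
            sb.append(" -").append(field).append(':').append(id);
        }
        return sb.toString();
    }
}
```

With the discarded documents excluded by the query itself, the hits for a page such as results 100 to 120 can be read directly by position, without inspecting the documents that precede it.
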


Is there another way to handle this issue?

Many thanks in advance,
Marios Skounakis



--
regards,
Volodymyr Bychkoviak
