A Hits object is essentially a cache on query results.  It caches in 2 ways:

1. When a query returning Hits is requested, only the top 100 document IDs and 
scores are requested from the scoring system, and the ID/Score pairs are stored 
in a list in the Hits object.  Whenever a document ID, score, or Document 
object are requested that lie beyond the end of that list, the query is 
reexecuted in order to grow the list to at or beyond the request, typically 
100% beyond it.

2. Returning a Document object (rather than a score or document ID) requires 
reconstituting the Document from the stored fields in the index, which is an 
expensive operation.  The Hits object maintains a cache of the 200 most 
recently requested Document objects, so it is unlikely they will need to be 
reconstituted more than once.

This is all optimized around typical hitlist access patterns - navigate forward 
and backwards through the results pages a small number of documents at a time.  
For applications which cannot benefit from the Hits caching, for example which 
employ their own hit caching layer, one can effectively use the so-called "low 
level" IndexSearcher routines which return TopDocs rather than Hits.

- J.J.

At 8:26 PM +0100 10/5/05, Cyril Barlow wrote:
>Is it an actual array of full Documents or a list of reference points to
>Documents? And what's the typical size in memory of a Hits object with say
>1000 avg size docs?

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to