Hi All.

I am using Lucene as the backbone of a 'Smart Search'.

I have a layer over search that extensively analyzes results at runtime to
bucket them. I do trim the result set, but only after this processing, since
there are non-document weights that are combined with the result scores, and
the hits are then reordered/modified.

This needs to fetch essentially all docs (because there's some field-level
analysis), plus the score for each, upfront, and it seems to take forever
for a large number of hits. The documents themselves are tiny - less than
half a K usually.  Hits.doc() and .score() seem to be where the time goes -
much as cautioned in the javadocs.

Another peculiarity: the query is basically "keywords" hitting all fields,
and you can optionally make it more precise by constraining certain fields
as field:value. In the latter case, for a similar number of hits, the same
iteration above is much quicker than when the hits come from the keywords
hitting all fields. The query itself is NOT visibly slower - but the
iteration is. Something to do with how spread out across the index the hits
are?

Is there a possible workaround for the .doc()/.score() access? Can a
RAMDirectory be used just for searches over a "regular" FSDirectory index -
and is it usable when the index size is a multiple of available RAM (this is
on RH9 or Fedora Core)?
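What I have in mind is roughly this (assuming RAMDirectory's copy
constructor takes a Directory in the version I'm on; the path is just a
placeholder):

    // load the on-disk index fully into memory, then search the in-memory copy
    Directory disk = FSDirectory.getDirectory("/path/to/index", false);
    Directory ram = new RAMDirectory(disk);
    Searcher searcher = new IndexSearcher(ram);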

Thanks in advance,
Sameer

--
Sameer Shisodia  Bangalore
[EMAIL PROTECTED]
