Re: [Nutch-general] Plugin HitCollector

Andrzej Bialecki Mon, 23 Oct 2006 15:28:32 -0700

Dennis Kubes wrote:
> We are running into the same issue.  Remember that a hit gives you only
> the doc id; getting the hit details requires another read.  So looping
> through the hits to access every document does one read per document.
> With a small number of hits that is no big deal, but the more hits you
> access, the more time it takes.  In our situation limiting the query
> doesn't work: we need information about the hit itself (i.e. a certain
> field, so we can do a count based on that field).  We implemented it
> using HitCollector modifications in Lucene.  This works but is not ideal
> in terms of speed, so we are looking at modifying the IndexReader itself
> so that when it gets the Hits it also gets our field.  Understand,
> though, that doing something like this changes core Lucene
> functionality.  I am not necessarily recommending doing it this way; we
> just couldn't find another way.


Well, it all depends on what kind of details you need from each hit.
Have you tried using FieldCache instead? Or pre-populated BitSets, which
you would then intersect with the result BitSet to get counts of
matching docs?
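
To make the two suggestions concrete, here is a rough sketch in plain Java.
It does not call Lucene at all: the String[] indexed by doc id only stands
in for the kind of per-document array a field cache would hand you, and the
"hits" BitSet stands in for the set of doc ids your collector gathered. The
class and method names (FieldCounts, countByCachedField, buildFieldBitSets,
countIntersection) are made up for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class FieldCounts {

    // Approach 1: look up each hit's field value in an in-memory array
    // indexed by doc id (assumed to be loaded once, e.g. from a field cache),
    // so no stored document is read per hit.
    static Map<String, Integer> countByCachedField(String[] fieldByDoc, BitSet hits) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            String value = doc < fieldByDoc.length ? fieldByDoc[doc] : null;
            if (value == null) {
                continue; // document has no value for this field
            }
            Integer old = counts.get(value);
            counts.put(value, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Approach 2, step 1: pre-populate one BitSet per field value,
    // built once and reused across queries.
    static Map<String, BitSet> buildFieldBitSets(String[] fieldByDoc) {
        Map<String, BitSet> sets = new HashMap<String, BitSet>();
        for (int doc = 0; doc < fieldByDoc.length; doc++) {
            String value = fieldByDoc[doc];
            if (value == null) {
                continue;
            }
            BitSet bits = sets.get(value);
            if (bits == null) {
                bits = new BitSet(fieldByDoc.length);
                sets.put(value, bits);
            }
            bits.set(doc);
        }
        return sets;
    }

    // Approach 2, step 2: intersect a pre-populated BitSet with the result
    // BitSet; the cardinality is the number of matching docs for that value.
    static int countIntersection(BitSet fieldDocs, BitSet hits) {
        BitSet tmp = (BitSet) fieldDocs.clone(); // keep the cached set intact
        tmp.and(hits);
        return tmp.cardinality();
    }

    public static void main(String[] args) {
        // Hypothetical data: a "lang" value per doc id, and three matching docs.
        String[] langByDoc = {"en", "de", "en", "fr", "en"};
        BitSet hits = new BitSet();
        hits.set(0);
        hits.set(1);
        hits.set(4);

        System.out.println(countByCachedField(langByDoc, hits));

        Map<String, BitSet> langSets = buildFieldBitSets(langByDoc);
        for (Map.Entry<String, BitSet> e : langSets.entrySet()) {
            System.out.println(e.getKey() + ": " + countIntersection(e.getValue(), hits));
        }
    }
}
```

In both variants the per-hit work is an array lookup or a bitwise AND, with
no per-document read. The first is simpler when the field has many distinct
values; the second costs one BitSet per value, so it probably fits best when
the number of values is small.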

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



