On Fri, Apr 30, 2010 at 2:37 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Thanks Mike!
>
> A follow-up question:
>
>> DocsEnum.read() currently delegates to nextDoc() in the base class and there
>> is a note that subclasses may do this more efficiently.  Is there currently
>> a more efficient implementation in a subclass?
>>>Yes, the standard codec does so (StandardPostingsReaderImpl.java).
>
> I assume that the standard codec is the default.

Right. It only differs if the app uses its own codec in IndexWriter.
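Below is a minimal sketch of the pattern quoted above: a base-class read() that simply calls nextDoc() repeatedly, and a subclass that overrides it with something faster. SimpleDocsEnum, ArrayDocsEnum and the read(int[], int[]) signature are hypothetical simplifications for illustration. They are not Lucene's actual DocsEnum or StandardPostingsReaderImpl API.

```java
// Hypothetical, simplified model of a docs enum with a default bulk read.
// Names and signatures are illustrative only, not Lucene's real DocsEnum API.
abstract class SimpleDocsEnum {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Advances to the next document and returns its docID, or NO_MORE_DOCS. */
    abstract int nextDoc();

    /** Term frequency in the current document. */
    abstract int freq();

    /**
     * Default bulk read: fills docs/freqs (assumed to be the same length) by
     * calling nextDoc() once per document. Returns how many entries were
     * filled; 0 means the enum is exhausted.
     */
    int read(int[] docs, int[] freqs) {
        int count = 0;
        while (count < docs.length) {
            int doc = nextDoc();
            if (doc == NO_MORE_DOCS) {
                break;
            }
            docs[count] = doc;
            freqs[count] = freq();
            count++;
        }
        return count;
    }
}

/** Array-backed enum (demo only) that overrides read() with block copies. */
class ArrayDocsEnum extends SimpleDocsEnum {
    private final int[] allDocs;
    private final int[] allFreqs;
    private int pos = -1; // index of the current document

    ArrayDocsEnum(int[] allDocs, int[] allFreqs) {
        this.allDocs = allDocs;
        this.allFreqs = allFreqs;
    }

    @Override
    int nextDoc() {
        pos++;
        return pos < allDocs.length ? allDocs[pos] : NO_MORE_DOCS;
    }

    @Override
    int freq() {
        return allFreqs[pos];
    }

    @Override
    int read(int[] docs, int[] freqs) {
        // Copy a whole block at once instead of one nextDoc() call per doc.
        int start = pos + 1;
        int n = Math.max(0, Math.min(docs.length, allDocs.length - start));
        System.arraycopy(allDocs, start, docs, 0, n);
        System.arraycopy(allFreqs, start, freqs, 0, n);
        pos += n;
        return n;
    }
}
```

The default loop pays one virtual nextDoc() call per document; an override can exploit knowledge of the underlying storage, which is the kind of saving a codec-specific reader is in a position to make.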

> Will what I'm using in HighFreqTermsWithTF to instantiate an IndexReader
> (below) eventually end up instantiating the StandardPostingsReaderImpl, or do I
> need to do something explicitly that will cause it to be instantiated?
>
> dir = FSDirectory.open(new File(args[0]));
> reader = IndexReader.open(dir, true);

Yes, this will use standard codec (assuming your IndexWriter didn't
use a different codec).

Actually, for HighFreqTerms, what I said before ("you should go
segment by segment") is not a good idea, since you'd then have to
aggregate each term's counts across segments, i.e. whenever the same
term appears in multiple segments.
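To make that extra work concrete: going segment by segment yields a partial count per term per segment, and those partial counts must be summed before ranking. A minimal sketch, assuming each segment's statistics have already been collected into a map from term to total term frequency. SegmentTermAggregator and its methods are hypothetical helpers, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: per-segment term stats are modeled as plain maps
// (term -> total term frequency within that segment). Not a Lucene API.
class SegmentTermAggregator {

    /** Sums each term's per-segment counts into one index-wide count. */
    static Map<String, Long> aggregate(List<Map<String, Long>> perSegment) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> segment : perSegment) {
            for (Map.Entry<String, Long> e : segment.entrySet()) {
                // The same term may occur in several segments: add, don't overwrite.
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }

    /** Returns up to n terms with the highest aggregated counts, highest first. */
    static List<String> topTerms(Map<String, Long> totals, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Note that every unique term's running total has to be held in memory until all segments have been read. A merged, term-ordered enumeration such as MultiTermsEnum presents each term once, so a top-N pass over it would not need this map.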

I'd suggest just using MultiTermsEnum with the bulk API, and not
worrying for now that MultiTermsEnum doesn't override the bulk read
impl. Or your patch could fix this: it's just a matter of calling the
current segment's bulk read, then doing a second pass to add that
segment's doc base (the docID offset of the segment within the
top-level reader) to each doc.
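The fix described above, roughly sketched: delegate to the current segment's bulk read, then shift each returned docID by that segment's doc base. BulkDocsReader, MultiBulkDocsReader and ArraySegmentReader are hypothetical stand-ins for illustration; they do not mirror the real MultiTermsEnum/DocsEnum classes or their signatures.

```java
// Hypothetical minimal interface standing in for a per-segment bulk read.
// This is not Lucene's DocsEnum signature.
interface BulkDocsReader {
    /** Fills docs/freqs with up to docs.length entries; returns the count, 0 when exhausted. */
    int read(int[] docs, int[] freqs);
}

/** Chains per-segment readers, remapping segment-local docIDs to top-level docIDs. */
class MultiBulkDocsReader implements BulkDocsReader {
    private final BulkDocsReader[] subs;
    private final int[] docBases; // docBases[i] = first top-level docID of segment i
    private int current = 0;

    MultiBulkDocsReader(BulkDocsReader[] subs, int[] docBases) {
        if (subs.length != docBases.length) {
            throw new IllegalArgumentException("need one doc base per segment");
        }
        this.subs = subs;
        this.docBases = docBases;
    }

    @Override
    public int read(int[] docs, int[] freqs) {
        while (current < subs.length) {
            // Pass 1: let the current segment's (possibly optimized) bulk read fill the buffer.
            int n = subs[current].read(docs, freqs);
            if (n > 0) {
                // Pass 2: shift each segment-local docID by the segment's doc base.
                int base = docBases[current];
                for (int i = 0; i < n; i++) {
                    docs[i] += base;
                }
                return n;
            }
            // This segment is exhausted; move on to the next one.
            current++;
        }
        return 0;
    }
}

/** Array-backed segment reader, for demonstration only. */
class ArraySegmentReader implements BulkDocsReader {
    private final int[] docs;
    private final int[] freqs;
    private int upto = 0;

    ArraySegmentReader(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    @Override
    public int read(int[] outDocs, int[] outFreqs) {
        int n = Math.min(outDocs.length, docs.length - upto);
        System.arraycopy(docs, upto, outDocs, 0, n);
        System.arraycopy(freqs, upto, outFreqs, 0, n);
        upto += n;
        return n;
    }
}
```

One consequence of this design: a single read() call never spans a segment boundary, so it may return fewer entries than the buffer holds even though more docs remain. Callers should keep calling read() until it returns 0.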

Mike
