Re: questions about DocsEnum.read() in flex api

Michael McCandless Fri, 30 Apr 2010 10:25:46 -0700

On Fri, Apr 30, 2010 at 1:15 PM, Burton-West, Tom <[email protected]> wrote:
> I’m a bit confused about DocsEnum.read() in the flex API. I have three
> questions:
>
>
> DocsEnum.read() currently delegates to nextDoc() in the base class and there
> is a note that subclasses may do this more efficiently.  Is there currently
> a more efficient implementation in a subclass?  I didn’t see one in
> MultiDocsEnum or MappingMultiDocsEnum, but perhaps I’m not understanding the
> code.


Yes, the standard codec does so (StandardPostingsReaderImpl.java).

MultiDocsEnum doesn't... but you should not use that (if performance
is important).  Instead you should go segment by segment.
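
To illustrate what "segment by segment" means, here is a minimal sketch.
It is a toy model, not Lucene code: ToySegment, PerSegmentDemo and
collect() are hypothetical names, and each segment is assumed to hold
its own local doc IDs plus a doc base. The point is that you walk each
segment's postings directly and add the base yourself, rather than
going through a merging wrapper for every doc.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical segment (not a Lucene class): local doc IDs for one term plus a doc base. */
class ToySegment {
    final int docBase;      // offset of this segment's docs in the whole index
    final int[] localDocs;  // sorted, segment-local doc IDs for the term

    ToySegment(int docBase, int[] localDocs) {
        this.docBase = docBase;
        this.localDocs = localDocs;
    }
}

class PerSegmentDemo {
    /** Visits each segment's postings directly and maps local IDs to index-wide IDs. */
    static List<Integer> collect(List<ToySegment> segments) {
        List<Integer> out = new ArrayList<>();
        for (ToySegment seg : segments) {
            for (int local : seg.localDocs) {
                out.add(seg.docBase + local);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ToySegment> segments = new ArrayList<>();
        segments.add(new ToySegment(0, new int[] {0, 3}));
        segments.add(new ToySegment(10, new int[] {1, 4}));
        System.out.println(collect(segments));
    }
}
```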

> DocsEnum.read reads 64 docs/freqs at a time as set up in initBulkResult().
> Would it make sense to have this configurable as an argument somewhere?
> I’m looking at very large indexes where a common term might occur in 100,000
> or more docs.

We could do that... maybe .getBulkResult should take a "suggested
size"?  It'd just be a suggestion though, since e.g. block-based codecs
would presumably return to you a direct slice into their underlying
int[] buffers.
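
Roughly what a "suggested size" could look like, as a sketch only.
ToyDocsEnum, bulkDocs() and the constructor argument are hypothetical
and are not the flex API; the toy just shows a bulk read loop where the
caller hints at a buffer size and read() reports how many entries are
valid.

```java
/** Hypothetical stand-in for a postings enumerator; not Lucene's DocsEnum. */
class ToyDocsEnum {
    private final int[] postings; // sorted doc IDs for one term
    private final int[] docs;     // reusable bulk buffer
    private int pos = 0;          // index of the next unread posting

    ToyDocsEnum(int[] postings, int suggestedSize) {
        this.postings = postings;
        // The size is only a hint; this toy clamps it to at least 1.
        this.docs = new int[Math.max(1, suggestedSize)];
    }

    /** Buffer that read() fills; valid entries are [0, count). */
    int[] bulkDocs() {
        return docs;
    }

    /** Fills the buffer and returns the number of docs read; 0 means exhausted. */
    int read() {
        int count = Math.min(docs.length, postings.length - pos);
        System.arraycopy(postings, pos, docs, 0, count);
        pos += count;
        return count;
    }
}

class BulkReadDemo {
    /** Sums all doc IDs via bulk reads; calls[0] is incremented once per non-empty read(). */
    static long sumDocs(ToyDocsEnum e, int[] calls) {
        long sum = 0;
        int count;
        while ((count = e.read()) > 0) {
            calls[0]++;
            int[] buf = e.bulkDocs();
            for (int i = 0; i < count; i++) {
                sum += buf[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] postings = new int[100_000];
        for (int i = 0; i < postings.length; i++) {
            postings[i] = i * 2;
        }
        for (int size : new int[] {64, 4096}) {
            int[] calls = new int[1];
            long sum = sumDocs(new ToyDocsEnum(postings, size), calls);
            System.out.println("size=" + size + " reads=" + calls[0] + " sum=" + sum);
        }
    }
}
```

A larger hint means fewer read() calls for a term with 100,000 docs, but
a codec that hands back a slice of its own block buffer would be free to
ignore the hint.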

> At the very top of the JavaDoc there is a warning “you must first call
> nextDoc”. It seems that this applies to calling DocsEnum.docID() or
> DocsEnum.freq() but not to DocsEnum.read().  Is that correct?

That's right -- I just committed a small fix to the Javadoc to clarify this.
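
The contract can be pictured with a toy enumerator. This is a sketch
under assumptions, not the real class: ToyEnum is hypothetical, and
throwing from docID() before nextDoc() is just one way to model "not
positioned yet" (the real DocsEnum may behave differently there).
read() works on a fresh enum; docID() does not.

```java
/** Hypothetical enumerator; not Lucene's DocsEnum. */
class ToyEnum {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private int pos = -1; // -1 means "not positioned yet"

    ToyEnum(int[] docs) {
        this.docs = docs;
    }

    /** Advances to the next doc and returns it, or NO_MORE_DOCS when exhausted. */
    int nextDoc() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Only meaningful after nextDoc(); this toy throws if called earlier. */
    int docID() {
        if (pos < 0) {
            throw new IllegalStateException("call nextDoc() first");
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Bulk read: needs no prior nextDoc(); copies up to buf.length remaining docs. */
    int read(int[] buf) {
        int start = pos + 1;
        int count = Math.min(buf.length, docs.length - start);
        if (count <= 0) {
            return 0; // nothing left to read
        }
        System.arraycopy(docs, start, buf, 0, count);
        pos = start + count - 1;
        return count;
    }

    public static void main(String[] args) {
        int[] buf = new int[2];
        ToyEnum bulk = new ToyEnum(new int[] {5, 8, 13});
        System.out.println("read() on a fresh enum returned " + bulk.read(buf));

        ToyEnum single = new ToyEnum(new int[] {5, 8, 13});
        single.nextDoc();
        System.out.println("docID() after nextDoc() is " + single.docID());
    }
}
```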

Mike
