On Fri, Apr 30, 2010 at 2:37 PM, Burton-West, Tom <tburt...@umich.edu> wrote:
> Thanks Mike!
>
> A follow-up question:
>
>> DocsEnum.read() currently delegates to nextDoc() in the base class and there
>> is a note that subclasses may do this more efficiently. Is there currently
>> a more efficient implementation in a subclass?
>
>>>Yes, the standard codec does so (StandardPostingsReaderImpl.java).
>
> I assume that the standard codec is the default.
Right. Only if the app uses its own codec in IndexWriter will it be different...

> Will what I'm using in HighFreqTermsWithTF to instantiate an IndexReader
> (below) eventually end up instantiating the StandardPostingReaderImpl or do I
> need to do something explicitly that will cause it to be instantiated?
>
> dir = FSDirectory.open(new File(args[0]));
> reader = IndexReader.open(dir, true);

Yes, this will use the standard codec (assuming your IndexWriter didn't use a
different codec).

Actually, for HighFreqTerms, what I said before ("you should go segment by
segment") is not a good idea, since that would mean you'd have to aggregate
across terms yourself, i.e. whenever the same term appears in multiple
segments.

I would say you should just use MultiTermsEnum and the bulk API, and not
worry for now that MultiTermsEnum doesn't override the bulk read impl.

Or your patch could fix this -- it's just a matter of calling the current
segment's bulk read, and then doing a 2nd pass to add the offset to each doc.

Mike
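
A minimal sketch of the two-pass idea above, for illustration only. It does not use Lucene's classes: BulkDocs, SegmentDocs, MultiDocs, and the read(int[], int[]) signature are hypothetical stand-ins, and the per-segment doc bases are assumed to be known up front. The multi-segment reader calls the current segment's bulk read, then makes a second pass over the filled buffer to add that segment's offset to each doc ID.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkReadSketch {

    /** Hypothetical stand-in for a per-term docs enumerator with a bulk read. */
    interface BulkDocs {
        /**
         * Fills docs[] and freqs[] with up to docs.length entries and returns
         * how many were written; returns 0 once the postings are exhausted.
         */
        int read(int[] docs, int[] freqs);
    }

    /** Toy single-segment postings backed by arrays of segment-local doc IDs. */
    static final class SegmentDocs implements BulkDocs {
        private final int[] docs;
        private final int[] freqs;
        private int pos = 0;

        SegmentDocs(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        @Override
        public int read(int[] outDocs, int[] outFreqs) {
            int n = Math.min(outDocs.length, docs.length - pos);
            System.arraycopy(docs, pos, outDocs, 0, n);
            System.arraycopy(freqs, pos, outFreqs, 0, n);
            pos += n;
            return n;
        }
    }

    /** Multi-segment view: delegates to the current segment, then rebases doc IDs. */
    static final class MultiDocs implements BulkDocs {
        private final BulkDocs[] segments;
        private final int[] docBases; // docBases[i] = first global doc ID of segment i
        private int current = 0;

        MultiDocs(BulkDocs[] segments, int[] docBases) {
            this.segments = segments;
            this.docBases = docBases;
        }

        @Override
        public int read(int[] docs, int[] freqs) {
            while (current < segments.length) {
                // First pass: the current segment's own bulk read.
                int n = segments[current].read(docs, freqs);
                if (n > 0) {
                    // Second pass: add the segment's offset to each doc ID.
                    int base = docBases[current];
                    for (int i = 0; i < n; i++) {
                        docs[i] += base;
                    }
                    return n;
                }
                current++; // this segment is exhausted, move to the next one
            }
            return 0;
        }
    }

    /** Collects every global doc ID by reading in chunks of bufferSize (> 0). */
    static List<Integer> drainDocs(BulkDocs in, int bufferSize) {
        int[] docs = new int[bufferSize];
        int[] freqs = new int[bufferSize];
        List<Integer> all = new ArrayList<>();
        int n;
        while ((n = in.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) {
                all.add(docs[i]);
            }
        }
        return all;
    }

    /** Sums the term's frequency over all segments, as a high-freq-terms tool might. */
    static long totalFreq(BulkDocs in, int bufferSize) {
        int[] docs = new int[bufferSize];
        int[] freqs = new int[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(docs, freqs)) > 0) {
            for (int i = 0; i < n; i++) {
                total += freqs[i];
            }
        }
        return total;
    }

    /** Two toy segments; the second one starts at global doc ID 10. */
    static BulkDocs demo() {
        BulkDocs[] segments = {
            new SegmentDocs(new int[] {0, 2}, new int[] {3, 1}),
            new SegmentDocs(new int[] {1, 4}, new int[] {2, 5})
        };
        return new MultiDocs(segments, new int[] {0, 10});
    }

    public static void main(String[] args) {
        System.out.println(drainDocs(demo(), 8)); // prints [0, 2, 11, 14]
        System.out.println(totalFreq(demo(), 8)); // prints 11
    }
}
```

Each read() call here returns docs from a single segment only, so a caller may get a short buffer at a segment boundary; that keeps the rebasing loop trivial, at the cost of slightly smaller batches.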