MultiReaders can't quickly compute the exact term count. Would they be allowed to throw UOE? (Like IndexReader.getUniqueTermCount)
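For what it's worth, here is a rough sketch of why a multi-segment reader can't just add up per-segment counts: a term that appears in two segments gets counted twice, and the exact answer needs a merge over all terms. The classes below (Segment, MultiSegment) are made up for illustration and are not Lucene classes; only the UnsupportedOperationException behaviour mirrors what was suggested above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for illustration only; none of these are Lucene classes.
final class MultiSegmentCountSketch {

    /** One segment: its unique term count is cheap because the set is already built. */
    static final class Segment {
        final Set<String> terms;

        Segment(String... terms) {
            this.terms = new HashSet<>(Arrays.asList(terms));
        }

        long uniqueTermCount() {
            return terms.size();
        }
    }

    /** A reader over several segments. */
    static final class MultiSegment {
        final List<Segment> segments;

        MultiSegment(Segment... segments) {
            this.segments = Arrays.asList(segments);
        }

        /** Refuses rather than returning a wrong or expensive answer. */
        long uniqueTermCount() {
            throw new UnsupportedOperationException("exact count requires merging all segments");
        }

        /** Cheap, but counts a term once per segment that contains it. */
        long sumOfSegmentCounts() {
            long sum = 0;
            for (Segment s : segments) {
                sum += s.uniqueTermCount();
            }
            return sum;
        }

        /** Exact, but has to visit every term of every segment. */
        long countByMerging() {
            Set<String> merged = new HashSet<>();
            for (Segment s : segments) {
                merged.addAll(s.terms);
            }
            return merged.size();
        }
    }

    public static void main(String[] args) {
        MultiSegment multi = new MultiSegment(
            new Segment("apple", "banana"),
            new Segment("banana", "cherry"));
        System.out.println("sum of per-segment counts: " + multi.sumOfSegmentCounts());
        System.out.println("exact count after merging: " + multi.countByMerging());
    }
}
```

With the two sample segments sharing "banana", the summed count and the merged count disagree, which is the case where throwing UOE seems safer than returning the sum.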
TermsHashPerField.numPostings (not .numPostingsInt) tells you the number of unique terms currently in IndexWriter's RAM buffer, so I think we could save that out with FieldInfo. That seems reasonable?

We could also compute it at search time, because the SegmentTermEnum knows its position. I.e., you could seek to the first term of field X, then to the first term of the field after X, and subtract the positions. But the position is not exposed publicly now, and this would be more costly to do (though we could cache and reuse the result). It wouldn't involve changing the index format.

With LUCENE-1458 this becomes simple (it already keeps track of each field's terms separately, including the total number of terms for that field).

Mike

On Mon, Sep 21, 2009 at 9:14 AM, John Wang <john.w...@gmail.com> wrote:
> Hi guys:
>     Not sure if this would be a better fit on the users or the dev list.
> It would be very useful to be able to get term count given a field,
> e.g.
>     int IndexReader.termCount(String field)
> Wanted to get your opinion on what is the best way to approach this.
> After looking through the code, seems like we do have it stored
> in TermsHashPerField.numPostingInt. (hopefully I am reading it correctly)
> Is it possible to add to the FieldInfo class and write it out?
>
> Thanks
> -John
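The search-time idea from this thread (seek to the first term of field X, seek to the first term of the next field, subtract the two positions) could look roughly like the sketch below. It assumes a single segment whose terms are sorted by field and then by text, which is how Lucene orders terms. The sorted list stands in for the term dictionary, and Term, seekToField, seekPastField and termCount are hypothetical names, not Lucene API; SegmentTermEnum's real position is not publicly exposed, as noted in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for one segment's sorted term dictionary; not a Lucene class.
final class TermPositionSketch {

    /** A (field, text) pair; the list below keeps these sorted by field, then text. */
    static final class Term {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Position of the first term whose field is >= the given field. */
    static int seekToField(List<Term> sortedTerms, String field) {
        int lo = 0;
        int hi = sortedTerms.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms.get(mid).field.compareTo(field) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Position of the first term whose field sorts strictly after the given field. */
    static int seekPastField(List<Term> sortedTerms, String field) {
        int lo = 0;
        int hi = sortedTerms.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms.get(mid).field.compareTo(field) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Term count for a field = difference between the two seek positions. */
    static int termCount(List<Term> sortedTerms, String field) {
        return seekPastField(sortedTerms, field) - seekToField(sortedTerms, field);
    }

    /** Small sample dictionary, sorted the same way the seeks expect. */
    static List<Term> sampleTerms() {
        List<Term> terms = new ArrayList<>();
        terms.add(new Term("title", "zebra"));
        terms.add(new Term("body", "banana"));
        terms.add(new Term("author", "wang"));
        terms.add(new Term("body", "apple"));
        terms.add(new Term("title", "apple"));
        terms.add(new Term("body", "cherry"));
        Collections.sort(terms, (a, b) -> {
            int byField = a.field.compareTo(b.field);
            return byField != 0 ? byField : a.text.compareTo(b.text);
        });
        return terms;
    }

    public static void main(String[] args) {
        List<Term> terms = sampleTerms();
        System.out.println("body: " + termCount(terms, "body"));
        System.out.println("title: " + termCount(terms, "title"));
    }
}
```

Each lookup costs two seeks, so caching the result per field (as suggested in the thread) would avoid repeating them. A field with no terms yields zero because both seeks land on the same position.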