Re: TermCount per field

Michael McCandless Mon, 21 Sep 2009 11:17:56 -0700

A MultiReader can't quickly compute the exact term count, since the
same term may occur in more than one sub-reader.  Would it be allowed
to throw UnsupportedOperationException (like
IndexReader.getUniqueTermCount)?
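The following is a minimal sketch of that contract. It uses hypothetical types, not Lucene's real classes: FieldTermCounts, SegmentTermCounts and MultiTermCounts are illustrative names. A single segment answers from a stored per-field count. The multi-reader view throws UnsupportedOperationException rather than return a sum that would double-count terms shared across segments.

```java
import java.util.List;
import java.util.Map;

// Hypothetical per-field term-count API; not part of Lucene.
interface FieldTermCounts {
    // Number of unique terms in the given field (0 if the field is absent).
    int termCount(String field);
}

// Single segment: the count can be stored per field (e.g. alongside
// the field's metadata) and returned directly.
final class SegmentTermCounts implements FieldTermCounts {
    private final Map<String, Integer> countsByField;

    SegmentTermCounts(Map<String, Integer> countsByField) {
        this.countsByField = Map.copyOf(countsByField);
    }

    @Override
    public int termCount(String field) {
        return countsByField.getOrDefault(field, 0);
    }
}

// Multi-segment view: the same term may occur in several segments, so
// summing per-segment counts over-counts. An exact answer would need a
// merge of the sub-readers' term dictionaries, so this sketch refuses,
// in the spirit of IndexReader.getUniqueTermCount().
final class MultiTermCounts implements FieldTermCounts {
    private final List<FieldTermCounts> subReaders;

    MultiTermCounts(List<FieldTermCounts> subReaders) {
        this.subReaders = List.copyOf(subReaders);
    }

    @Override
    public int termCount(String field) {
        throw new UnsupportedOperationException(
            "exact term count is not cheaply available across "
                + subReaders.size() + " sub-readers");
    }
}
```

An exact alternative is to merge the sub-readers' term enumerations and count distinct terms. That trades the exception for a full pass over the field's terms.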

TermsHashPerField.numPostings (not .numPostingsInt) tells you the
number of unique terms currently in IndexWriter's RAM buffer, so I
think we could save that out with FieldInfo.  Does that seem
reasonable?
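Here is a sketch of what saving the count with FieldInfo could look like. It assumes a made-up on-disk layout of field name, field number, and term count, written with java.io's DataOutput. Lucene's actual FieldInfos file format is different, and FieldInfoWithTermCount is a hypothetical class. At flush, termCount would be filled from TermsHashPerField.numPostings for that field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-field metadata that also records the number of unique
// terms. This is NOT Lucene's FieldInfo class or its file format.
final class FieldInfoWithTermCount {
    final String name;
    final int number;
    final int termCount; // assumed source: TermsHashPerField.numPostings at flush

    FieldInfoWithTermCount(String name, int number, int termCount) {
        this.name = name;
        this.number = number;
        this.termCount = termCount;
    }

    // Made-up layout: field count, then (name, number, termCount) per field.
    static void writeAll(List<FieldInfoWithTermCount> infos, DataOutput out) throws IOException {
        out.writeInt(infos.size());
        for (FieldInfoWithTermCount fi : infos) {
            out.writeUTF(fi.name);
            out.writeInt(fi.number);
            out.writeInt(fi.termCount);
        }
    }

    static List<FieldInfoWithTermCount> readAll(DataInput in) throws IOException {
        int n = in.readInt();
        List<FieldInfoWithTermCount> infos = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Java evaluates arguments left to right, matching the write order.
            infos.add(new FieldInfoWithTermCount(in.readUTF(), in.readInt(), in.readInt()));
        }
        return infos;
    }

    // Round-trip helpers over an in-memory buffer.
    static byte[] encode(List<FieldInfoWithTermCount> infos) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeAll(infos, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<FieldInfoWithTermCount> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Merged segments would need their counts recomputed during the merge, for example by counting terms as the merged dictionary is written. Summing the input segments' counts would double-count shared terms.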

We could also compute it at search time, because SegmentTermEnum
knows its position: seek to the first term of field X, then to the
first term of the field after X, and subtract the two positions.
However, the position is not currently exposed publicly, and this
would be more costly (though we could cache and reuse the result).
On the plus side, it wouldn't require changing the index format.
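The position-subtraction idea can be sketched against a simplified in-memory term dictionary. TermEntry and PositionTermCounter are hypothetical, and a term's index in the sorted list stands in for SegmentTermEnum's (non-public) position. Terms are assumed sorted by field name, then by text, which is the order Term.compareTo defines. Two binary searches find the first position of field X and the first position past it. The difference is cached per field.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dictionary entry; the list index plays the role of
// SegmentTermEnum's position.
record TermEntry(String field, String text) {}

final class PositionTermCounter {
    private final List<TermEntry> sortedTerms; // sorted by field, then text
    private final Map<String, Integer> cache = new HashMap<>();

    PositionTermCounter(List<TermEntry> sortedTerms) {
        this.sortedTerms = List.copyOf(sortedTerms);
    }

    // Unique terms in `field` = position of the first term after the field
    // minus position of the field's first term. Cached per field.
    int termCount(String field) {
        return cache.computeIfAbsent(field,
            f -> firstPosition(f, true) - firstPosition(f, false));
    }

    // Binary search for the first position whose field is > f
    // (strictlyAfter) or >= f (otherwise).
    private int firstPosition(String f, boolean strictlyAfter) {
        int lo = 0, hi = sortedTerms.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = sortedTerms.get(mid).field().compareTo(f);
            boolean goRight = strictlyAfter ? cmp <= 0 : cmp < 0;
            if (goRight) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Against a real index, the seeks would go through a TermEnum, for example the one returned by IndexReader.terms(Term), positioned at the first term at or after the given term. The subtraction would still need access to the enum's position.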

With LUCENE-1458 this becomes simple: it already tracks each field's
terms separately, including the total number of terms for that
field.

Mike

On Mon, Sep 21, 2009 at 9:14 AM, John Wang <[email protected]> wrote:
> Hi guys:
>      Not sure if this would be a better fit on the users or the dev list.
>      It would be very useful to be able to get the term count for a
> given field, e.g.
>      int IndexReader.termCount(String field)
>      I wanted to get your opinion on the best way to approach this.
> After looking through the code, it seems we do have it stored
> in TermsHashPerField.numPostingInt (hopefully I am reading it correctly).
>     Is it possible to add it to the FieldInfo class and write it out?
>
> Thanks
> -John
>
>
