Also in 2.9.2 and 3.0.1:
http://lucene.apache.org/java/2_9_2/api/all/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/IndexReader.html#getUniqueTermCount()

Please note that this works only with SegmentReaders, so you first have to
call getSequentialSubReaders() on the top-level reader. You *may* then sum up
the counts of the sub-readers, but that sum is not the correct number for the
whole index, because segments can (and in most cases do) have lots of
overlapping terms, and each shared term is counted once per segment. For an
optimized index, getSequentialSubReaders() returns a single sub-reader, and
its unique term count is correct.
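
To make the overcounting concrete, here is a toy model in plain Java. It does
not use the Lucene API at all: each "segment" is just a set of term strings,
and the class and method names (UniqueTermCountDemo, sumOfSegmentCounts,
exactUniqueCount) are made up for this illustration. It only shows why the
sum of per-segment counts is an upper bound and not the true number.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueTermCountDemo {

    // Mimics summing a per-segment unique term count over all sub-readers:
    // a term that occurs in several segments is counted once per segment.
    static long sumOfSegmentCounts(List<Set<String>> segments) {
        long sum = 0;
        for (Set<String> segment : segments) {
            sum += segment.size();
        }
        return sum;
    }

    // The exact index-wide count: merge the term sets first, then count.
    // This is the toy equivalent of enumerating all terms of the index.
    static long exactUniqueCount(List<Set<String>> segments) {
        Set<String> merged = new HashSet<String>();
        for (Set<String> segment : segments) {
            merged.addAll(segment);
        }
        return merged.size();
    }

    public static void main(String[] args) {
        Set<String> segment1 = new HashSet<String>(Arrays.asList("apache", "lucene", "index"));
        Set<String> segment2 = new HashSet<String>(Arrays.asList("lucene", "index", "reader"));
        List<Set<String>> segments = Arrays.asList(segment1, segment2);

        // "lucene" and "index" occur in both segments, so the sum is too high.
        System.out.println("sum of segment counts: " + sumOfSegmentCounts(segments)); // 6
        System.out.println("exact unique count:    " + exactUniqueCount(segments));   // 4
    }
}
```

With a single segment (the optimized case) both methods return the same
value, which matches the remark above about optimized indexes.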

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Yonik
> Seeley
> Sent: Thursday, May 27, 2010 9:44 PM
> To: [email protected]
> Subject: Re: How to get the number of unique terms in the inverted index
> 
> On Thu, May 27, 2010 at 2:32 PM, kannan chandrasekaran
> <[email protected]> wrote:
> > I was wondering if there is a way to retrieve the number of unique terms
> > in Lucene (version 2.4.0) ... I am aware of the terms() and terms(Term)
> > methods that return an enumeration (TermEnum), but that involves iterating
> > through the terms and counting them. I am looking for something similar to
> > numDocs() in the IndexReader class.
> 
> No there is not.
> In 4.0-dev, with the new "flex" APIs, you can retrieve the number of unique
> terms in a single segment (Terms.getUniqueTermCount()), but not a whole
> index.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]



