Hi there,
I'm not sure if the performance is considerable for you but you could try:
TermDocs termDocs = this.reader.termDocs(term);
int count = 0;
while(termDocs.next()){
count += termDocs.freq();
}
simon
On Mon, Aug 24, 2009 at 6:14 PM, Ivan Vasilev<[email protected]> wrote:
> Hi All,
>
> We use faceting in our app but it is very slow for the indexes that use our
> clients.
> First I will say what I understand under faceting - this is for each term
> for certain field to obtain 1. number of docs that contain it, 2. the total
> number of occurrences of the term in the index.
> Now what we use to obtain the information:
>
> ...
> some code for obtained terms on which we will make faceting
> ...
>
> Term[] retTerms = new Term[terms.size()];
> int[] retFreqs = new int[retTerms.length];
> int[] retDocs = new int[retTerms.length];
> TermPositions tp = mSearcher.getIndexReader().termPositions();
> int i = 0;
> for(Iterator<Term> iter = terms.iterator(); iter.hasNext(); i++) {
> try {
> retTerms[i] = iter.next();
> tp.seek(retTerms[i]);
> while(tp.next()) {
> // tp.read(new int[]{}, new int[]{});
> // tp.doc();
> retFreqs[i] += tp.freq();
> retDocs[i]++;
> }
> } finally {
> if(tp != null) {
> tp.close();
> }
> }
> }
>
> Now what I discovered that is extremely faster for obtaining number of docs
> that contain each term.
>
> ...
> the same code for obtained terms on which we will make faceting
> ...
>
> Term[] retTerms = new Term[terms.size()];
> int[] retFreqs = new int[retTerms.length];
> int i = 0;
> long t1 = System.currentTimeMillis();
> for (Term currTerm : terms) {
> retTerms[i] = currTerm;
> retFreqs[i] = mSearcher.docFreq(currTerm);
> i++;
> }
>
> I tested two code versions for obtaining 1 237 390 term facets. The
> difference in time was 10 times (second version wins). I know that this is
> because Lucene index keeps for each term the number of docs that contain it.
>
> My question - is there some way to obtain the total number of occurrences of
> the term in the index in some similar fast way?
>
> Best Regards,
> Ivan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]