Using term vectors here means passing over the terms too many times: every document's vector is read again for each term that document contains, i.e. three nested loops:
- loop on terms
  - loop on docs of a term
    - loop on terms of a doc
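A rough way to see the cost difference is to model both walks over a toy in-memory index. This is plain Java, not Lucene: the maps `postings` (term to doc id to freq) and `vectors` (doc id to term to freq), and every method name below, are hypothetical stand-ins for what TermEnum/TermDocs and TermFreqVector expose. The sketch just counts how many (term, freq) entries each approach reads.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only (not Lucene): compares two ways of collecting
// per-document term frequencies.
final class TermFreqWalk {

    // Hypothetical postings: term -> (doc id -> freq), the data TermEnum + TermDocs walk.
    static Map<String, SortedMap<Integer, Integer>> buildPostings(String[][] docs) {
        Map<String, SortedMap<Integer, Integer>> postings = new TreeMap<>();
        for (int id = 0; id < docs.length; id++) {
            for (String w : docs[id]) {
                postings.computeIfAbsent(w, k -> new TreeMap<>()).merge(id, 1, Integer::sum);
            }
        }
        return postings;
    }

    // Hypothetical term vectors: doc id -> (term -> freq), one vector per document.
    static Map<Integer, Map<String, Integer>> buildVectors(String[][] docs) {
        Map<Integer, Map<String, Integer>> vectors = new TreeMap<>();
        for (int id = 0; id < docs.length; id++) {
            Map<String, Integer> vec = new TreeMap<>();
            for (String w : docs[id]) {
                vec.merge(w, 1, Integer::sum);
            }
            vectors.put(id, vec);
        }
        return vectors;
    }

    // Three nested loops: terms -> docs of a term -> whole vector of that doc.
    // Returns the number of (term, freq) entries read.
    static int vectorInsidePostings(Map<String, SortedMap<Integer, Integer>> postings,
                                    Map<Integer, Map<String, Integer>> vectors) {
        int reads = 0;
        for (SortedMap<Integer, Integer> docsOfTerm : postings.values()) {
            for (int doc : docsOfTerm.keySet()) {
                reads += vectors.get(doc).size();   // the entire vector is read each time
            }
        }
        return reads;
    }

    // Two loops: terms -> docs of a term; the freq comes straight from the posting.
    static int postingsOnly(Map<String, SortedMap<Integer, Integer>> postings) {
        int reads = 0;
        for (SortedMap<Integer, Integer> docsOfTerm : postings.values()) {
            reads += docsOfTerm.size();   // one (doc, freq) entry per posting
        }
        return reads;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"lucene", "index", "term"},
            {"term", "docs", "term"},
            {"index", "term", "enum"}
        };
        Map<String, SortedMap<Integer, Integer>> postings = buildPostings(docs);
        Map<Integer, Map<String, Integer>> vectors = buildVectors(docs);
        System.out.println("postings only:          " + postingsOnly(postings));
        System.out.println("vector inside postings: " + vectorInsidePostings(postings, vectors));
    }
}
```

On this three-document sample the postings-only walk reads 8 entries and the vector-inside-postings walk reads 22, because each document's vector is re-read once for every distinct term it contains.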
Would something like this be better:

    TermEnum tenum = reader.terms();
    // terms() returns an enum that is not positioned yet: call next() before term()
    while (tenum.next()) {
        Term term = tenum.term();
        System.out.println(term + " appears in " + tenum.docFreq() + " docs!");
        TermDocs td = reader.termDocs(term);
        // likewise, doc() and freq() are only valid after the first next()
        while (td.next()) {
            System.out.println("  In doc id: " + td.doc() + " it appears: " + td.freq() + " times");
        }
        td.close();
    }
    tenum.close();

Also, you can skip faster to a certain doc (id) or a certain term using the skipTo() methods.

Doron

Venkateshprasanna <[EMAIL PROTECTED]> wrote on 19/12/2006 19:20:52:

>
> > Take a look at TermDocs and TermEnum.
>
> I need to get the frequency of each word in each of the documents I have
> indexed.
>
> This is what I could do with TermEnums and TermDocs. For each Term from
> TermEnum, I have instantiated a TermDocs and for each doc, I am trying to
> get the frequency of the Term.
>
> IndexReader ir = IndexReader.open("index file");
> TermEnum terms = ir.terms();
> while(terms.next()) {
>     TermDocs docs = ir.termDocs(terms.term());
>
>     while(docs.next()) {
>         TermFreqVector tfv = ir.getTermFreqVector(docs.doc(),"contents");
>         String indexTerms[] = tfv.getTerms();
>         int indexFreqs[] = tfv.getTermFrequencies();
>
>         for(int i = 0; i<indexTerms.length; i++) {
>             System.out.println(indexTerms[i]+" "+indexFreqs[i]);
>         }
>     }
> }
>
> But there is no way of getting the frequency of only 'that' term in 'that'
> document. I have to get the entire vector. This puts the loop in jeopardy.
> How can I overcome this?
>
> --
> View this message in context: http://www.nabble.com/Extracting-data-from-Lucene-index-files-tf2813318.html#a7984092
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
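On the original question of getting the frequency of only one term in one document: with TermDocs that should not need a term vector at all, since after `reader.termDocs(term)` you can `skipTo(docId)`, check that `doc()` equals `docId`, and read `freq()`. The sketch below only models that idea in plain Java so it runs without Lucene. `PostingsCursor` and `freqInDoc` are hypothetical names, and `skipTo()` here is assumed to follow the behaviour described for TermDocs (advance past the current entry to the first doc id >= target).

```java
import java.util.Arrays;

// Hypothetical in-memory stand-in (not Lucene code) for a postings cursor,
// modelled on a next()/doc()/freq()/skipTo() contract like TermDocs'.
final class PostingsCursor {
    private final int[] docs;   // doc ids containing the term, strictly increasing
    private final int[] freqs;  // freqs[i] = occurrences of the term in docs[i]
    private int pos = -1;       // -1: not positioned yet, so call next() or skipTo() first

    PostingsCursor(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    boolean next() {
        if (pos < docs.length) pos++;
        return pos < docs.length;
    }

    int doc() { return docs[pos]; }

    int freq() { return freqs[pos]; }

    // Moves to the first entry after the current one whose doc id is >= target,
    // using binary search instead of stepping through entries one by one.
    boolean skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) {
            pos = docs.length;
            return false;
        }
        int i = Arrays.binarySearch(docs, from, docs.length, target);
        pos = (i >= 0) ? i : -i - 1;   // not found: insertion point = first doc id > target
        return pos < docs.length;
    }

    // Frequency of this cursor's term in a single document, or 0 if absent.
    // skipTo() only moves forward, so use a fresh cursor or ascending doc ids.
    static int freqInDoc(PostingsCursor c, int docId) {
        return (c.skipTo(docId) && c.doc() == docId) ? c.freq() : 0;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 7, 12};
        int[] freqs = {2, 1, 5, 3};
        System.out.println(freqInDoc(new PostingsCursor(docs, freqs), 7));  // 5
        System.out.println(freqInDoc(new PostingsCursor(docs, freqs), 5));  // 0
    }
}
```

The binary search is just one cheap way to skip ahead in this toy; a real index may keep its own skip data, which the sketch does not try to reproduce.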