Is "Text" the only field in the index? Note that the search only looks at field "Text", while the terms() iteration as appears in that code might bump into a term with same text but in another field. A better comparison would be to create a Term ("Text",<your-word>), and compare TermQuery(thatTerm) to termDocs(thatTerm). Btw, if iterating all terms is ever a must, note TermEnum.skipTo(Term).
Hope this helps, Doron dziadgba <[EMAIL PROTECTED]> wrote on 10/03/2007 05:33:03: > hye, > I want to extract documents which contain a specific term. > I tried to do it in two different ways: > > 1 Using the 'iterator' termdocs = reader.termDocs(term); > 2 Using search and examing Hits > > turns out that the result are sometimes equal, sometimes the first is a > subset of the > second and sometimes there is no connection between the two results. > > can somebody give me a hint? > > bye > > > public void addDocumentsToTerm(int debug, String myterm) throws > Exception > { TermEnum terms=reader.terms(); > > MyDocument doc; > Document docLucene; > int count=0; > boolean b=false; > if(debug==1) System.out.print("Checking (MyTerm) "+myterm+" ... "); > > while (terms.next()) > { Term t = terms.term(); > if(t.text().compareTo(myterm)==0) > { TermDocs termDocs = reader.termDocs(t); > while(termDocs.next()) > { > if(debug==1 && count==0) System.out.println("equal to (Term) > "+t.text()); > count++; > docLucene = reader.document(termDocs.doc()); > if(debug==1) System.out.println(" docLucene: > ["+termDocs.doc()+"- > "+docLucene.getField("Code").stringValue()+"] > "); > b=true; > } > System.out.println(); > > QueryParser pars = new QueryParser("Text",new > StandardAnalyzer()); > Query q= pars.parse(t.text()); > Hits hits = searcher.search(q); > > System.out.println("Found "+hits.length()+" matches for query > "+q); > for(int i=0;i<hits.length();i++) > { Document d = hits.doc(i); > System.out.println(" doc: > ["+d.+"-"+d.getField("Code").stringValue()+"]"); > > } > > System.out.println(); > if(b==false) > System.out.println("No Document found for term: > "+myterm.getTerm()); > if(debug==1)if(count < myterm.getDocFreq()&&count>0) > System.out.println(" Error term: "+myterm.getTerm()+", documents > found: " > +count+", > docFreq: "+myterm.getDocFreq()); > > return; > } > } > } > > OUPUT > > Checking (Myterm) zucca ... equal to (Term) zucca > docLucene: [9963-356 U.S. 256] > > Found 8 matches for query Text:zucca > doc: [0-356 U.S. 256] > doc: [1-365 U.S. 290] > doc: [2-351 U.S. 91] > doc: [3-356 U.S. 660] > doc: [4-365 U.S. 265] > doc: [5-377 U.S. 235] > doc: [6-435 U.S. 519] > doc: [7-441 U.S. 281] > > Error term: zucca, documents found: 1, docFreq: 8 > > Checking (Myterm) zimroth ... equal to (Term) zimroth > doc: [16478-476 U.S. 467] > doc: [17142-492 U.S. 257] > doc: [17208-488 U.S. 235] > doc: [17911-484 U.S. 1] > doc: [17920-487 U.S. 1] > doc: [18010-484 U.S. 301] > > Found 8 matches for query Text:zimroth > doc: [0-389 U.S. 143] > doc: [1-468 U.S. 981] > doc: [2-489 U.S. 688] > doc: [3-491 U.S. 781] > doc: [4-436 U.S. 412] > doc: [5-445 U.S. 573] > doc: [6-462 U.S. 213] > doc: [7-468 U.S. 897] > > Error term: zimroth, documents found: 6, docFreq: 8 > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]