hye,
I want to extract documents which contain a specific term.
I tried to do it in two different ways:

1 Using the 'iterator' termdocs = reader.termDocs(term);
2 Using search and examing Hits

turns out that the result are sometimes equal, sometimes the first is a subset of the
second and sometimes there is no connection between the two results.

can somebody give me a hint?

bye


public void addDocumentsToTerm(int debug, String myterm) throws Exception
    { TermEnum terms=reader.terms();

      MyDocument doc;
      Document docLucene;
      int count=0;
      boolean b=false;
      if(debug==1) System.out.print("Checking (MyTerm) "+myterm+" ... ");

      while (terms.next())
      { Term t = terms.term();
        if(t.text().compareTo(myterm)==0)
        { TermDocs termDocs = reader.termDocs(t);
          while(termDocs.next())
          {
if(debug==1 && count==0) System.out.println("equal to (Term) "+t.text());
            count++;
            docLucene = reader.document(termDocs.doc());
if(debug==1) System.out.println(" docLucene: ["+termDocs.doc()+"- "+docLucene.getField("Code").stringValue()+"] ");
            b=true;
          }
            System.out.println();

QueryParser pars = new QueryParser("Text",new StandardAnalyzer());
            Query q= pars.parse(t.text());
            Hits hits = searcher.search(q);

System.out.println("Found "+hits.length()+" matches for query "+q);
            for(int i=0;i<hits.length();i++)
            {  Document d = hits.doc(i);
System.out.println(" doc: ["+d.+"-"+d.getField("Code").stringValue()+"]");

            }

          System.out.println();
          if(b==false)
System.out.println("No Document found for term: "+myterm.getTerm()); if(debug==1)if(count < myterm.getDocFreq()&&count>0) System.out.println(" Error term: "+myterm.getTerm()+", documents found: " +count+", docFreq: "+myterm.getDocFreq());

          return;
        }
      }
    }

OUPUT

Checking (Myterm) zucca ... equal to (Term) zucca
  docLucene: [9963-356 U.S. 256]

Found 8 matches for query Text:zucca
        doc: [0-356 U.S. 256]
        doc: [1-365 U.S. 290]
        doc: [2-351 U.S. 91]
        doc: [3-356 U.S. 660]
        doc: [4-365 U.S. 265]
        doc: [5-377 U.S. 235]
        doc: [6-435 U.S. 519]
        doc: [7-441 U.S. 281]

        Error term: zucca, documents found: 1, docFreq: 8

Checking (Myterm) zimroth ... equal to (Term) zimroth
        doc: [16478-476 U.S. 467]
        doc: [17142-492 U.S. 257]
        doc: [17208-488 U.S. 235]
        doc: [17911-484 U.S. 1]
        doc: [17920-487 U.S. 1]
        doc: [18010-484 U.S. 301]

Found 8 matches for query Text:zimroth
        doc: [0-389 U.S. 143]
        doc: [1-468 U.S. 981]
        doc: [2-489 U.S. 688]
        doc: [3-491 U.S. 781]
        doc: [4-436 U.S. 412]
        doc: [5-445 U.S. 573]
        doc: [6-462 U.S. 213]
        doc: [7-468 U.S. 897]

        Error term: zimroth, documents found: 6, docFreq: 8




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to