Re: [RegexQuery] how to check what words were founded in particulary Documents ?

mhzmark Sat, 21 Jul 2007 10:39:31 -0700

To begin with, thanks for Yours quick response.

I am still trying write this code and I've already written this one:


import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.store.*;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.regex.*;

public class TmpMain {
        public static void main(String[] args) throws Exception {       
                RAMDirectory directory = new RAMDirectory(); 
                IndexWriter writer = new IndexWriter(directory, new 
StandardAnalyzer(), 
                                                      true); 
                Document doc = new Document(); 
                doc.add(new Field("field", "Cat Cot Cit Cet", Field.Store.YES, 
                                    Field.Index.TOKENIZED)); 
                writer.addDocument(doc); 
                writer.optimize();
                writer.close(); 
                IndexReader reader=IndexReader.open(directory);
                RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new 
Term("field", 
                                "C.t"), new JavaUtilRegexCapabilities());
        /*
        //Second Version:
                WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new 
                                          Term("field", "C?t"));
        */
                while(regexTermEnum.next()) {
                        System.out.println("Next:");
                        System.out.println(regexTermEnum.term().text());
                }
                System.out.println("End.");             
        }
}


But the output of this code is:
End.

As you can see above I have tried change line:
        RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field", 
                "C.t"), new JavaUtilRegexCapabilities());
on
        WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new 
Term("field", 
                "C?t"));

but the result is the same :/
What am I doing wrong ?




> Erick - you're not missing anything, except that the original poster  
> is after RegexQuery, not WildcardQuery.  Both work basically the same  
> way, except in the pattern matching capabilities.
> 
>       Erik
> 
> On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:
> > Erik:
> >
> > Well, you wrote the book <G>. But I thought something like this
> > would work
> >
> > TermDocs td = reader.termDocs();
> > WildcardTermEnum we = new WildcardTermEnum(reader, new term("field",
> > "c*t"));
> > while (we.next()) {
> >  td.seek(we);
> >  while (td.next()) {
> >     report document contains term;
> >  }
> > }
> >
> > Although I admit I haven't tried it, so I could be totally off  
> > base. What
> > am I missing?
> >
> > Erick
> >
> > On 7/20/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> >>
> >> Erick - I think you're mixing things up with WildcardQuery.
> >> RegexQuery does support all regex capabilities (depending on the
> >> underlying regex matcher used).
> >>
> >> A couple of techniques you could use to achieve the goal:
> >>
> >>         * Use RegexTermEnum, though that'll give you the terms  
> >> across the
> >> entire index, so maybe in your use case you could index a single
> >> document into a RAMDirectory and RegexTermEnum on it.
> >>
> >>         * Try out SpanRegexQuery and use getSpans() to get the exact
> >> matches.
> >>
> >> Erik
> >>
> >>
> >>
> >> On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:
> >>
> >> > First, the period (.) isn't part of the syntax, so make sure you  
> >> look
> >> > more carefully at the Lucene syntax...
> >> >
> >> > Then, you might be able to use WildcardTermEnum to find
> >> > the terms that match and TermDocs to find the documents
> >> > that contain those terms.
> >> >
> >> > There's nothing built into Lucene to do this out of the box, you
> >> > have to roll your own.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On 20 Jul 2007 21:27:40 +0200, [EMAIL PROTECTED]  
> >> <[EMAIL PROTECTED]>
> >> > wrote:
> >> >>
> >> >> Hello.
> >> >>
> >> >> Let assume that I have this code in my application:
> >> >>
> >> >>    (...)
> >> >>    Query query = new RegexQuery(new Term("field", "C.T"));;
> >> >>    // searching...
> >> >>    (...)
> >> >>
> >> >> And now, I would like to know if my application founded "cat" or
> >> >> "cot" or
> >> >> something else. How can I check what was founded by my  
> >> application ?
> >> >>
> >> >> I would like to write application like this:
> >> >>    INPUT -> regular expression
> >> >>    OUTPUT -> file  ---> word
> >> >>
> >> >> example: INPUT = "C.T"
> >> >>          OUTPUT =
> >> >>                   a.txt --> CAT
> >> >>                   a.txt --> COT
> >> >>                   b.txt --> CAT
> >> >>                   b.txt --> CAT
> >> >>                   b.txt --> COT
> >> >>                   (...)
> >> >>
> >> >> So, how to check what words were founded in particulary Documents
> >> >> after
> >> >> searching? I see that Hits class contains only founded  
> >> documents and
> >> >> nothing more (I am new in this technology so I can be wrong...)
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>  
> >> ---------------------------------------------------------------------
> >> >> -
> >> >> Dowiedz sie, co naprawde podnieca kobiety. Wiecej wiesz,  
> >> latwiej je
> >> >> oczarujesz
> >> >>
> >> >> >>>http://link.interia.pl/f1b17
> >> >>
> >> >>
> >> >>  
> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >> >>
> >> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
> >>
> >>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 


----------------------------------------------------------------------
Najbogatsza Polka
Zobacz >>> http://link.interia.pl/f1ae2


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [RegexQuery] how to check what words were founded in particulary Documents ?

Reply via email to