Re: Wildcard Searching

Otis Gospodnetic Sat, 16 Mar 2002 18:03:32 -0800

Hello,

This was a thread on lucene-user initially, but I'm copying lucene-dev
as well.  Sorry about duplicates.


--- Stefan Bergstrand <[EMAIL PROTECTED]> wrote:
> Doug Cutting <[EMAIL PROTECTED]> writes:
> 
> Just noticed this problem in my program.
> 
> It seems as if the analyzer passed to QueryParser.parse(), never is
> passed to PrefixQuery (which is what my test case is parsed to).
> 
> A quick look in QueryParser.jj confirms this: 
> 
>  q = new PrefixQuery(new Term(field, term.image.substring
>                                       (0, term.image.length()-1)));

I thought that queries such as 'rou?d' are considered wildcard queries
by QueryParser.jj, and not Prefix queries, no?
In the default definition of token in QueryParser.jj I see this:

| <PREFIXTERM:  <_TERM_START_CHAR> (<_TERM_CHAR>)* "*" >
| <WILDTERM:  <_TERM_START_CHAR> 
              (<_TERM_CHAR> | ( [ "*", "?" ] ))* >

Then further down in QueryParser.jj we have this:

       if (wildcard)
         q = new WildcardQuery(new Term(field, term.image));

So a WildcardQuery is being constructed, not a PrefixQuery, I think.
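
To make the difference between the two snippets concrete, here is a minimal
standalone sketch of what each one does with the token image. It does not
use Lucene at all; the class and method names are made up for illustration,
and only the substring logic is taken from the QueryParser.jj lines quoted
above.

```java
public class TermImageSketch {
    // Mirrors the PrefixQuery snippet: the trailing "*" is cut off the
    // token image before the Term is built.
    static String prefixTermText(String image) {
        return image.substring(0, image.length() - 1);
    }

    // Mirrors the WildcardQuery snippet: the token image is used unchanged.
    static String wildcardTermText(String image) {
        return image;
    }

    public static void main(String[] args) {
        System.out.println(prefixTermText("rou*"));
        System.out.println(wildcardTermText("rou?d"));
    }
}
```

In both snippets the text goes straight into a Term, so neither one appears
to pass it through the analyzer, which matches what Stefan describes.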

What I don't understand is why the definition of _TERM_START_CHAR looks
like this:

| <#_TERM_START_CHAR: ~[ " ", "\t", "+", "-", "!", "(", ")", ":", "^", 
                         "[", "]", "\"", "{", "}", "~", "*" ] >

Maybe the name is misleading, but it seems that _TERM_START_CHAR defines
the characters that a TERM can start with, because later in
QueryParser.jj we have TERM defined as:

| <TERM:      <_TERM_START_CHAR> (<_TERM_CHAR>)*  >

and _TERM_CHAR has this definition:

| <#_TERM_CHAR: <_TERM_START_CHAR> >

So how can we have a "*" in _TERM_START_CHAR when terms are not allowed
to start with a "*", and if we do have "*", how come we do not have "?"
as well?
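
If I read the JavaCC notation correctly, ~[ ... ] is a negated character
list, meaning "any character except the ones listed". A rough standalone
sketch of the token shapes under that reading follows. The class and method
names are hypothetical, it only mirrors the definitions quoted above, and it
is not the code JavaCC generates.

```java
public class TokenShapeSketch {
    // The characters listed inside ~[ ... ] in the quoted _TERM_START_CHAR.
    static final String EXCLUDED = " \t+-!():^[]\"{}~*";

    // _TERM_START_CHAR (and _TERM_CHAR, which is defined as the same thing):
    // any character that is NOT in the excluded list.
    static boolean isTermStartChar(char c) {
        return EXCLUDED.indexOf(c) < 0;
    }

    // TERM: <_TERM_START_CHAR> (<_TERM_CHAR>)*
    static boolean isTerm(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (char c : s.toCharArray()) {
            if (!isTermStartChar(c)) {
                return false;
            }
        }
        return true;
    }

    // PREFIXTERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* "*"
    static boolean isPrefixTerm(String s) {
        return s.endsWith("*") && isTerm(s.substring(0, s.length() - 1));
    }

    // WILDTERM: <_TERM_START_CHAR> (<_TERM_CHAR> | ["*", "?"])*
    static boolean isWildTerm(String s) {
        if (s.isEmpty() || !isTermStartChar(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(isTermStartChar(c) || c == '*' || c == '?')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("'*' can start a term: " + isTermStartChar('*'));
        System.out.println("'?' can start a term: " + isTermStartChar('?'));
        System.out.println("rou?d is a TERM:      " + isTerm("rou?d"));
        System.out.println("rou?d is a WILDTERM:  " + isWildTerm("rou?d"));
    }
}
```

Under this reading "*" is excluded from the start of a term while "?" is
not, and rou?d fits both the TERM and the WILDTERM shape. Which token the
generated parser actually picks in that case is not something this sketch
tries to answer.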

Can somebody correct me wherever I have made false statements,
assumptions, or conclusions?

Thanks,
Otis

> > > From: Howk, Michael [mailto:[EMAIL PROTECTED]]
> > > 
> > > Also, Lucene returns the parsed version of each of our 
> > > searches. When we
> > > search by rou*d, Lucene parses it as rou*d (which is what we 
> > > would expect).
> > > But when we search by rou?d, Lucene parses it as "rou d". It 
> > > seems to wrap
> > > the term in quotes and replace the question mark with a 
> > > space. Any ideas? Or
> > > can someone give us an idea of how to understand WildcardQuery or
> > > WildcardTermEnum?
> > 
> > It sounds like the problem is in the query parser.  Brian?
> > 
> > Doug
> 
> -- 
> ---------------------------
> Stefan Bergstrand
> Polopoly - Cultivating the information garden
> Ph:   +46 8 506 782 67
> Cell: +46 704 47 82 67
> Fax:  +46 8 506 782 51
> [EMAIL PROTECTED], http://www.polopoly.com
> 

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>