RE: [Lucene-dev] QueryParser bug ?

Tal Dayan Thu, 07 Jun 2001 16:42:00 -0700

Here are two suggestions:

1. (basic) Make the parser's syntax more tolerant of missing terms.
This is a partial solution that works as long as the analyzer always
either eliminates a token or replaces it with exactly one other token
(that is, it never produces multiple tokens for a single input token).

Example:

  +strong +will

analyzed into

  +strong +

and is reduced to

  +strong

(Required stop words can be handled either way: one option, as you said,
is to always fail the match; the other is to simply ignore the term.)
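
A rough sketch of the 'ignore' variant of this reduction, in plain Java
with no Lucene dependency. The class TolerantClauses, its analyze() method
and the tiny stop list are all made up for illustration; a real version
would run each term through the actual analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: drops clauses whose term the
// analyzer removed, instead of failing on the dangling "+".
class TolerantClauses {
    // Stand-in for the analyzer's stop list (assumed for illustration).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("will", "the", "a"));

    // Stand-in for analyzing one term: returns "" when the term is eliminated.
    static String analyze(String term) {
        String lower = term.toLowerCase();
        return STOP_WORDS.contains(lower) ? "" : lower;
    }

    // Splits the query on whitespace, analyzes each term, and keeps only
    // the clauses that still have a term afterwards.
    static String reduce(String query) {
        List<String> kept = new ArrayList<>();
        for (String clause : query.trim().split("\\s+")) {
            if (clause.isEmpty()) continue;
            String prefix = "";
            String term = clause;
            char first = clause.charAt(0);
            if (first == '+' || first == '-') {
                prefix = String.valueOf(first);
                term = clause.substring(1);
            }
            String analyzed = analyze(term);
            if (!analyzed.isEmpty()) kept.add(prefix + analyzed);
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(reduce("+strong +will")); // prints "+strong"
    }
}
```

The 'always fail' variant would instead report an error (or return no
matches) as soon as a required clause comes back empty.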

2. (extension) Have code to figure out the effect of the given analyzer
(which is actually a factory of analyzers, so you can create as many as
you need). That is, find a mapping from each input token to the output
token(s) it generates (0, 1, or possibly 2 or more tokens). Then
reconstruct the query using the structure of the original query string
and the term values from the token mapping.

Example:

   +dogs +will +rock

the analyzer modified the tokens as follows:

   dogs ->  dog              // porter filter
   will ->  (empty)          // stop filter
   rock ->  music, stone     // multi alias filter

Then you construct the query

   +(dog) +()  +(music OR stone)

which is reduced (according to the 'basic' approach above) to

  +dog +(music OR stone)
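
The reconstruction step for this example could look roughly like the
following, again in plain Java. The token mapping is hard-coded here
because how to derive it from a real analyzer is still the open question;
MappedQueryRebuilder and its MAPPING table are hypothetical names, not
Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene API: rebuilds a query from a
// per-token mapping of input token to analyzer output tokens.
class MappedQueryRebuilder {
    // Assumed mapping, hard-coded to match the example in this message.
    static final Map<String, List<String>> MAPPING = new HashMap<>();
    static {
        MAPPING.put("dogs", Arrays.asList("dog"));             // porter filter
        MAPPING.put("will", Collections.<String>emptyList());  // stop filter
        MAPPING.put("rock", Arrays.asList("music", "stone"));  // multi alias filter
    }

    static String rebuild(String query) {
        List<String> clauses = new ArrayList<>();
        for (String clause : query.trim().split("\\s+")) {
            if (clause.isEmpty()) continue;
            String prefix = "";
            String token = clause;
            char first = clause.charAt(0);
            if (first == '+' || first == '-') {
                prefix = String.valueOf(first);
                token = clause.substring(1);
            }
            // Tokens without a mapping entry pass through unchanged.
            List<String> out =
                    MAPPING.getOrDefault(token, Collections.singletonList(token));
            if (out.isEmpty()) continue; // 'basic' reduction: drop the empty clause
            if (out.size() == 1) {
                clauses.add(prefix + out.get(0));
            } else {
                clauses.add(prefix + "(" + String.join(" OR ", out) + ")");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // prints "+dog +(music OR stone)"
        System.out.println(rebuild("+dogs +will +rock"));
    }
}
```

The original clause structure (the + and - prefixes) is kept as is, and
only the term inside each clause is replaced by whatever the mapping says.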

If the second approach makes sense, we can discuss how to figure out the
token mapping of a given analyzer.

Tal


> -----Original Message-----
> From: Doug Cutting [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 07, 2001 12:06 PM
> To: 'Tal Dayan'; [EMAIL PROTECTED]
> Subject: RE: [Lucene-dev] QueryParser bug ?
>
>
> > From: Tal Dayan [mailto:[EMAIL PROTECTED]]
> >
> > The statement
> >
> >       final Query query = QueryParser.parse("+strong +will",
> >                                             "field",
> >                                             new StandardAnalyzer());
> >
> > Generates the following exception:
> >
> > com.lucene.queryParser.ParseException: Encountered "<EOF>" at
>
> Since "will" is on the stop list, this query will never return any
> documents.  So I think this should be an error, the problem is that the
> error message is uninformative.
>
> To make the message more informative you need to first check to see if
> there are stop words in the query, and warn the user of this.  You could
> do this by constructing a StandardAnalyzer without a stop list, and
> checking each of the words that comes out to see if it is a stop word.
> (I just added a StandardAnalyzer constructor that lets you specify the
> stop list, which makes this easier.)
>
> It's a little ugly to add this generically to QueryParser, since it
> requires a lot of knowledge about the analyzer you're using.  I suppose
> we could add a method like:
>   QueryParser.parse(String query, String field,
>                     Analyzer withoutStop, Analyzer withStop,
>                     String[] stopWords);
> That's a pretty complicated API!  Does anyone have better ideas?
>
> Doug
>


_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-dev