RE: QueryParser in 3.x

Scott Smith Fri, 17 Sep 2010 10:34:47 -0700

First, let me say that I didn't think the problem was in QueryParser and I 
apologize if that's how it sounded.  QueryParser is a central method to Lucene. 
 1 of me having problems with QueryParser, 1000's of others not.  Is the 
problem more likely in my code or lucene.  We'll all agree on the answer to 
that question.


As further proof, I ran the following code.  The first part is from Simon's 
email (thanks for that snippet) and the second part is from LIA2.

        // code from Willnauer email
        Analyzer a = new MyAnalyzer(Version.LUCENE_30);
        TokenStream stream = a.reusableTokenStream("body", new 
StringReader("Europabörsen")); 
        TermAttribute attr = stream.addAttribute(TermAttribute.class);
        while(stream.incrementToken())
        {
          System.out.println(attr.term());
        }

        // code from LIA2
        stream = a.tokenStream("body", new StringReader("Europabörsen"));
        TermAttribute term = stream.addAttribute(TermAttribute.class);
        while (stream.incrementToken())
        {
            System.out.print(term.term());
        }


The answer I got back was:
europabörsen
europaborsen

I realized the difference between these two was whether I was getting the 
reusableTokeStream or the tokenStream.  In looking at my code, the 
ASCIIFoldingFilter was not in the filter setup for the resusableTokenStream().  
It was for the tokenStream().  I added it to the reusableTokenStream and I now 
get the result I wanted.  The above code snippet generates the word without the 
umlaut in both cases.  So, problem solved.

Thanks to Simon for putting on the right track.

Scott


-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@googlemail.com] 
Sent: Friday, September 17, 2010 1:03 AM
To: java-user@lucene.apache.org
Subject: Re: QueryParser in 3.x

On Fri, Sep 17, 2010 at 1:06 AM, Scott Smith <ssm...@mainstreamdata.com> wrote:
> I recently upgraded to Lucene 3.0 and am seeing some new behavior that I 
> don't understand.  Perhaps someone can explain why.
>
>
>
> I have a custom analyzer.  Part of the analyzer uses the AsciiFoldingFilter.  
> If I run a word with an umlaut through that analyzer using the AnalyzerDemo 
> code in LIA2, as expected, I get the same word except that the umlauted 
> letter is now a simple ascii letter (no umlaut).  That's what I would expect 
> and want.
>
>
>
> If I create a Queryparser using the call "new QueryParser(LUCENE_30, "body", 
> myAnalyzer) and then call the parse() method passing the same word, I can see 
> that the query parser has not removed the umlaut.  The string it has is 
> "+body: Europabörsen".
>
This seems to be an issue with your analyzer rather than with the
QueryParser. Since QueryParser didn't really change its behavior in
3.0 except of some default values. Can you provide more info what you
did with your analyzer? Did you try running the term with umlaut chars
through your Analyzer / Tokenstream directly? Something like that:

Analyzer a = new MyAnalyzer();
TokenStream stream = a.reusableTokenStream("body", new
StringReader("Europabörsen"));
TermAttribute attr = stream.addAttribute(TermAttribute.class);
while(stream.incrementToken())
  System.out.println(attr.term());

simon
>
>
> I know I had to make a number of changes to the analyzer and the tokenizer to 
> upgrade to 3.x.  Is there something very different from the 2.x version that 
> I'm likely missing.
>
>
>
> Anyone have any thoughts?
>
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

RE: QueryParser in 3.x

Reply via email to