On Fri, Sep 17, 2010 at 7:34 PM, Scott Smith <ssm...@mainstreamdata.com> wrote:
> First, let me say that I didn't think the problem was in QueryParser, and I 
> apologize if that's how it sounded.  QueryParser is a central class in 
> Lucene.  One of me is having problems with it; thousands of others are not.  
> Is the problem more likely in my code or in Lucene?  We'll all agree on the 
> answer to that question.

Don't worry :)
>
> As further proof, I ran the following code.  The first part is from Simon's 
> email (thanks for that snippet) and the second part is from LIA2.
>
>        // code from Willnauer email
>        Analyzer a = new MyAnalyzer(Version.LUCENE_30);
>        TokenStream stream = a.reusableTokenStream("body",
>                new StringReader("Europabörsen"));
>        TermAttribute attr = stream.addAttribute(TermAttribute.class);
>        while(stream.incrementToken())
>        {
>          System.out.println(attr.term());
>        }
>
>        // code from LIA2
>        stream = a.tokenStream("body", new StringReader("Europabörsen"));
>        TermAttribute term = stream.addAttribute(TermAttribute.class);
>        while (stream.incrementToken())
>        {
>            System.out.print(term.term());
>        }
>
>
> The answer I got back was:
> europabörsen
> europaborsen
>
> I realized the difference between these two was whether I was getting the 
> reusableTokenStream() or the tokenStream().  Looking at my code, the 
> ASCIIFoldingFilter was not in the filter setup for reusableTokenStream(); 
> it was only in the setup for tokenStream().  I added it to 
> reusableTokenStream() and I now get the result I wanted: the above code 
> snippet generates the word without the umlaut in both cases.  So, problem 
> solved.
>
> Thanks to Simon for putting me on the right track.
Are you using Lucene 3.0? If so, take a look at ReusableAnalyzerBase,
which makes it much easier to build Analyzers and prevents code
duplication.
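
To illustrate why a single definition of the chain helps, here is a
minimal plain-Java sketch. It deliberately does not use Lucene classes:
SharedChainSketch, buildChain(), analyzeFresh() and analyzeReused() are
hypothetical names, and foldToAscii() is only a rough stand-in for
ASCIIFoldingFilter built on java.text.Normalizer. The point is just that
when the fresh path and the cached path both get their filters from one
method, a filter cannot go missing from only one of them, which is
roughly what ReusableAnalyzerBase gives you by having you define the
chain once in createComponents().

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Plain-Java stand-in, NOT Lucene's API: every name here is hypothetical.
public class SharedChainSketch {

    // Rough stand-in for ASCIIFoldingFilter (assumption): decompose
    // accented characters, then drop the combining marks.
    public static String foldToAscii(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}", "");
    }

    // The filter chain is declared in exactly one place.
    public static List<UnaryOperator<String>> buildChain() {
        List<UnaryOperator<String>> chain = new ArrayList<>();
        chain.add(t -> t.toLowerCase(Locale.ROOT));
        chain.add(SharedChainSketch::foldToAscii);
        return chain;
    }

    // Runs a token through every filter in order.
    private static String apply(List<UnaryOperator<String>> chain, String token) {
        for (UnaryOperator<String> filter : chain) {
            token = filter.apply(token);
        }
        return token;
    }

    private static List<UnaryOperator<String>> cached;

    // Analogue of tokenStream(): builds a fresh chain on every call.
    public static String analyzeFresh(String token) {
        return apply(buildChain(), token);
    }

    // Analogue of reusableTokenStream(): caches the chain, but still
    // obtains it from the same buildChain() method.
    public static String analyzeReused(String token) {
        if (cached == null) {
            cached = buildChain();
        }
        return apply(cached, token);
    }

    public static void main(String[] args) {
        // "\u00f6" is o-umlaut, escaped to avoid source-encoding issues.
        System.out.println(analyzeFresh("Europab\u00f6rsen"));
        System.out.println(analyzeReused("Europab\u00f6rsen"));
    }
}
```

Both calls print europaborsen, because neither path has its own copy of
the filter list to fall out of sync.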

simon
>
> Scott
>
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Friday, September 17, 2010 1:03 AM
> To: java-user@lucene.apache.org
> Subject: Re: QueryParser in 3.x
>
> On Fri, Sep 17, 2010 at 1:06 AM, Scott Smith <ssm...@mainstreamdata.com> 
> wrote:
>> I recently upgraded to Lucene 3.0 and am seeing some new behavior that I 
>> don't understand.  Perhaps someone can explain why.
>>
>>
>>
>> I have a custom analyzer, part of which uses the ASCIIFoldingFilter.  If I 
>> run a word with an umlaut through that analyzer using the AnalyzerDemo code 
>> from LIA2, I get the same word except that the umlauted letter is now a 
>> plain ASCII letter (no umlaut).  That's what I expect and want.
>>
>>
>>
>> If I create a QueryParser using the call "new QueryParser(LUCENE_30, "body", 
>> myAnalyzer)" and then call the parse() method passing the same word, I can 
>> see that the query parser has not removed the umlaut.  The string it has is 
>> "+body: Europabörsen".
>>
> This seems to be an issue with your analyzer rather than with the
> QueryParser, since QueryParser didn't really change its behavior in
> 3.0 except for some default values. Can you provide more info on what you
> did with your analyzer? Did you try running the term with umlaut chars
> through your Analyzer / TokenStream directly? Something like this:
>
> Analyzer a = new MyAnalyzer();
> TokenStream stream = a.reusableTokenStream("body", new
> StringReader("Europabörsen"));
> TermAttribute attr = stream.addAttribute(TermAttribute.class);
> while(stream.incrementToken())
>  System.out.println(attr.term());
>
> simon
>>
>>
>> I know I had to make a number of changes to the analyzer and the tokenizer 
>> to upgrade to 3.x.  Is there something very different from the 2.x version 
>> that I'm likely missing?
>>
>>
>>
>> Anyone have any thoughts?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

