On Fri, Sep 17, 2010 at 7:34 PM, Scott Smith <ssm...@mainstreamdata.com> wrote:
> First, let me say that I didn't think the problem was in QueryParser and I
> apologize if that's how it sounded. QueryParser is a central class in
> Lucene. 1 of me having problems with QueryParser, 1000's of others not. Is
> the problem more likely in my code or Lucene? We'll all agree on the answer
> to that question.

Don't worry :)

>
> As further proof, I ran the following code. The first part is from Simon's
> email (thanks for that snippet) and the second part is from LIA2.
>
> // code from Willnauer email
> Analyzer a = new MyAnalyzer(Version.LUCENE_30);
> TokenStream stream = a.reusableTokenStream("body",
>     new StringReader("Europabörsen"));
> TermAttribute attr = stream.addAttribute(TermAttribute.class);
> while (stream.incrementToken())
> {
>     System.out.println(attr.term());
> }
>
> // code from LIA2
> stream = a.tokenStream("body", new StringReader("Europabörsen"));
> TermAttribute term = stream.addAttribute(TermAttribute.class);
> while (stream.incrementToken())
> {
>     System.out.print(term.term());
> }
>
>
> The answer I got back was:
> europabörsen
> europaborsen
>
> I realized the difference between these two was whether I was getting the
> reusableTokenStream or the tokenStream. In looking at my code, the
> ASCIIFoldingFilter was not in the filter setup for
> reusableTokenStream(). It was for tokenStream(). I added it to
> reusableTokenStream() and I now get the result I wanted. The above code
> snippet generates the word without the umlaut in both cases. So, problem
> solved.
>
> Thanks to Simon for putting me on the right track.

Are you using Lucene 3.0? If so, take a look at ReusableAnalyzerBase, which
makes it much easier to build Analyzers and prevents code duplication.

simon

>
> Scott
>
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Friday, September 17, 2010 1:03 AM
> To: java-user@lucene.apache.org
> Subject: Re: QueryParser in 3.x
>
> On Fri, Sep 17, 2010 at 1:06 AM, Scott Smith <ssm...@mainstreamdata.com>
> wrote:
>> I recently upgraded to Lucene 3.0 and am seeing some new behavior that I
>> don't understand. Perhaps someone can explain why.
>>
>> I have a custom analyzer. Part of the analyzer uses the ASCIIFoldingFilter.
>> If I run a word with an umlaut through that analyzer using the AnalyzerDemo
>> code in LIA2, as expected, I get the same word except that the umlauted
>> letter is now a simple ASCII letter (no umlaut). That's what I would expect
>> and want.
>>
>> If I create a QueryParser using the call "new QueryParser(LUCENE_30, "body",
>> myAnalyzer)" and then call the parse() method passing the same word, I can
>> see that the query parser has not removed the umlaut. The string it has is
>> "+body: Europabörsen".
>>
> This seems to be an issue with your analyzer rather than with the
> QueryParser, since QueryParser didn't really change its behavior in
> 3.0 except for some default values. Can you provide more info on what you
> did with your analyzer? Did you try running the term with umlaut chars
> through your Analyzer / TokenStream directly? Something like this:
>
> Analyzer a = new MyAnalyzer();
> TokenStream stream = a.reusableTokenStream("body",
>     new StringReader("Europabörsen"));
> TermAttribute attr = stream.addAttribute(TermAttribute.class);
> while (stream.incrementToken())
>     System.out.println(attr.term());
>
> simon
>>
>> I know I had to make a number of changes to the analyzer and the tokenizer
>> to upgrade to 3.x. Is there something very different from the 2.x version
>> that I'm likely missing?
>>
>> Anyone have any thoughts?
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
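
The mix-up resolved in this thread (one entry point of the analyzer had the folding step, the other did not) can be illustrated without Lucene. What follows is only a minimal sketch, not code from the thread: the class and method names are hypothetical, and foldDiacritics is a rough stand-in for ASCIIFoldingFilter built on java.text.Normalizer. It does not reproduce the Analyzer or ReusableAnalyzerBase API. It only shows why defining the filter chain once keeps the two entry points from drifting apart.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical stand-in for an analyzer with two entry points, mirroring
// tokenStream() and reusableTokenStream() from the thread. No Lucene classes
// are used; each "stream" is reduced to a String-to-String transformation.
final class SharedChainSketch {

    // Rough substitute for ASCIIFoldingFilter (an assumption for illustration):
    // decompose accented characters (NFD), then drop the combining marks.
    static String foldDiacritics(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Rough substitute for a lowercasing filter.
    static String lowerCase(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Error-prone layout: each entry point spells out its own chain.
    static String buggyTokenStream(String term) {
        return foldDiacritics(lowerCase(term));
    }

    static String buggyReusableTokenStream(String term) {
        return lowerCase(term); // folding step forgotten here
    }

    // Safer layout: the chain is defined once and both entry points share it.
    static String buildChain(String term) {
        return foldDiacritics(lowerCase(term));
    }

    static String tokenStream(String term) {
        return buildChain(term);
    }

    static String reusableTokenStream(String term) {
        return buildChain(term);
    }

    public static void main(String[] args) {
        String word = "Europab\u00f6rsen"; // "Europabörsen"
        System.out.println(buggyReusableTokenStream(word)); // umlaut survives
        System.out.println(buggyTokenStream(word));         // umlaut folded
        System.out.println(reusableTokenStream(word));      // umlaut folded
        System.out.println(tokenStream(word));              // umlaut folded
    }
}
```

With the duplicated chains, the two buggy methods disagree on the same input, which matches the symptom reported above. With the shared buildChain, a filter added or removed in one place applies to both paths.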