I recently upgraded to Lucene 3.0 and am seeing some new behavior that I don't 
understand.  Perhaps someone can explain why.

 

I have a custom analyzer, part of which uses the ASCIIFoldingFilter.  If I run 
a word with an umlaut through that analyzer using the AnalyzerDemo code in 
LIA2, I get the same word back with the umlauted letter replaced by a plain 
ASCII letter (no umlaut).  That's what I expect and want.
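
To make the expected result concrete, here is a rough stand-in for the folding 
step that uses only the JDK (java.text.Normalizer).  It is not Lucene's 
ASCIIFoldingFilter and only approximates it for accented Latin letters; the 
class name FoldSketch and the method fold() are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper: approximates accent folding with the JDK only.
// This is NOT Lucene's ASCIIFoldingFilter, just a sketch of the expected effect.
final class FoldSketch {

    static String fold(String input) {
        // NFD decomposition splits "ö" into "o" plus U+0308 (combining diaeresis).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the combining marks, leaving only the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "Europab\u00f6rsen" is the word from this post, written with an escape
        // so the source file does not depend on its encoding.
        System.out.println(fold("Europab\u00f6rsen"));
    }
}
```

The folded form, with a plain "o" in place of the umlauted letter, is what I 
expected to see in the parsed query as well.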

 

If I create a QueryParser with the call "new QueryParser(LUCENE_30, "body", 
myAnalyzer)" and then call its parse() method with the same word, the 
resulting query still contains the umlaut.  The string it has is 
"+body: Europabörsen".

 

I know I had to make a number of changes to the analyzer and the tokenizer to 
upgrade to 3.x.  Is there something very different from the 2.x version that 
I'm likely missing?

 

Anyone have any thoughts?

 

 
