On Mon, Nov 29, 2010 at 12:51 PM, DM Smith <dmsmith...@gmail.com> wrote:
>
> I'd have to look to be sure: IIRC, Turkish was one. The treatment of 'i' was
> buggy. Russian had its own encoding that was replaced with UTF-8. The
> QueryParser had bug fixes. There is some effort to migrate away from stemmer
> to snowball, but at least the Dutch one is not "identical".
>
But none of these broke backwards compatibility: they all respect the Version constant!

The SnowballAnalyzer respects the Version constant for the buggy Turkish lowercasing. If you use Version.LUCENE_30 (or less), it lowercases wrongly, so you get your old buggy behavior.

Even the old buggy Dutch stemmer is still there: if you use DutchAnalyzer(Version.LUCENE_30) (or less), it stems incorrectly, so you get your old buggy behavior!

The Russian was the same way, and so was the QueryParser.

So I'm sorry, but I am left confused about where the backwards breaks are.
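
To make the pattern concrete, here is a minimal, self-contained sketch of gating a bug fix on a version constant. It is not Lucene's actual code: the class VersionGateSketch, its nested Version enum, and the lowercaseTurkish method are hypothetical stand-ins, and plain java.lang.String lowercasing stands in for the analyzer's filter chain. The only point is that the old behavior stays reachable when the caller passes the old constant.

```java
import java.util.Locale;

// Hypothetical illustration only; none of these names are Lucene's real classes.
class VersionGateSketch {

    // Stand-in for a version constant; declaration order defines "older" and "newer".
    enum Version {
        LUCENE_30, LUCENE_31;

        boolean onOrAfter(Version other) {
            return compareTo(other) >= 0;
        }
    }

    // Lowercases Turkish text, keeping the old (wrong) behavior for old versions.
    static String lowercaseTurkish(String text, Version matchVersion) {
        if (matchVersion.onOrAfter(Version.LUCENE_31)) {
            // Fixed path: Turkish rules map 'I' to dotless 'ı' (U+0131).
            return text.toLowerCase(Locale.forLanguageTag("tr"));
        }
        // Legacy path: locale-neutral rules map 'I' to 'i', which is wrong for
        // Turkish but matches what an existing index was built with.
        return text.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowercaseTurkish("ISIK", Version.LUCENE_30)); // isik
        System.out.println(lowercaseTurkish("ISIK", Version.LUCENE_31)); // ısık
    }
}
```

Someone who indexed with the old constant keeps getting terms that match the existing index, and only callers who opt in to the newer constant see the corrected output.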