Hi, Lucene dev community:

Our current code is based on Lucene 7.
In one analyzer test case, given the string "*Google's biologist’s*", the
tokenization result is *["google", "biologist"]*.

But after migrating the codebase to Lucene 9,
the result becomes *["googles", "biologist’s"]*.
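
One detail that may or may not be relevant: as written in this message, the two apostrophes in the test string look like different characters, a plain ASCII apostrophe (U+0027) in "Google's" and a typographic one (U+2019) in "biologist’s". I do not know whether this is related to the change. Below is a minimal standalone sketch in plain Java (no Lucene involved; the class and method names are made up for illustration) that checks which apostrophe characters a test string really contains:

```java
import java.util.ArrayList;
import java.util.List;

public class ApostropheCheck {

    // Hypothetical helper: lists every apostrophe-like code point in the text,
    // in order of appearance, formatted as "U+XXXX".
    static List<String> apostropheCodePoints(String text) {
        List<String> found = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            // U+0027 is the ASCII apostrophe, U+2019 the right single quotation mark.
            if (cp == 0x0027 || cp == 0x2019) {
                found.add(String.format("U+%04X", cp));
            }
        });
        return found;
    }

    public static void main(String[] args) {
        // Assumption: this mirrors the test input as it appears in this message.
        String input = "Google's biologist\u2019s";
        System.out.println(apostropheCodePoints(input));
    }
}
```

If the real test input turns out to contain only one kind of apostrophe, this observation can be ignored.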

It looks like some behavior has changed between the major versions.

But I cannot find exactly where the root cause is.
Could someone please provide some clues? Maybe some grammar has changed?

The analyzer uses the following three Lucene classes:

org.apache.lucene.analysis.core.FlattenGraphFilter;

org.apache.lucene.analysis.shingle.ShingleFilter;

org.apache.lucene.analysis.synonym.SynonymGraphFilter;


Thanks
