It sounds like an EnglishPossessiveFilter is missing from your analysis chain, and I don't think that is related to any of the filters you listed. Are there other Lucene filters you're using?
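
For reference, here is a rough plain-Java sketch of the kind of rule a possessive filter applies: drop a trailing apostrophe plus "s" from each token. This is only an illustration, not Lucene's code. The class and method names (PossessiveSketch, stripPossessive) are made up, and the set of apostrophe characters (ASCII ', U+2019, U+FF07) is my assumption about what such a filter would recognize.

```java
public class PossessiveSketch {

    // Hypothetical helper, not a Lucene API: removes a trailing possessive
    // suffix (apostrophe followed by s or S) from a single token.
    public static String stripPossessive(String token) {
        int len = token.length();
        if (len >= 2) {
            char apostrophe = token.charAt(len - 2);
            char last = token.charAt(len - 1);
            // Assumed apostrophe variants: ASCII, right single quote, fullwidth.
            boolean isApostrophe = apostrophe == '\''
                    || apostrophe == '\u2019'
                    || apostrophe == '\uFF07';
            if (isApostrophe && (last == 's' || last == 'S')) {
                return token.substring(0, len - 2);
            }
        }
        // No possessive suffix found: return the token unchanged.
        return token;
    }

    public static void main(String[] args) {
        // Lowercased tokens similar to the ones in the report.
        String[] tokens = {"google's", "biologist\u2019s", "googles"};
        for (String t : tokens) {
            System.out.println(t + " -> " + stripPossessive(t));
        }
    }
}
```

With a step like this in the chain, "google's" and "biologist’s" would come out as "google" and "biologist", matching the Lucene7 result you describe. A token whose apostrophe has already been removed, such as "googles", passes through unchanged.
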
Also, what exact versions are you upgrading from and to?

On Fri, Apr 28, 2023 at 10:20 AM MyCoy Z <mycoy.zh...@gmail.com> wrote:

> Hi, Lucene dev community:
>
> Our current code is based on Lucene7.
> In one analyzer test case, given the string "Google's biologist’s", the
> tokenization result is ["google", "biologist"].
>
> But after migrating the codebase to Lucene9,
> the result becomes ["googles", "biologist’s"].
>
> It looks like some behavior has changed between the major versions.
>
> But I cannot find exactly what the root cause is.
> Could someone please provide some clues? Maybe some grammar has changed?
>
> The analyzer uses the following three Lucene classes:
>
> org.apache.lucene.analysis.core.FlattenGraphFilter;
>
> org.apache.lucene.analysis.shingle.ShingleFilter;
>
> org.apache.lucene.analysis.synonym.SynonymGraphFilter;
>
>
> Thanks