Hi All! I think I've found a bug with sentence boundary detection, explained in detail here: https://github.com/apache/lucene/issues/11735
It affects the KeywordRepeatFilter + OpenNLPLemmatizer configuration, which is apparently considered common enough to be mentioned directly in the Solr documentation/examples: https://solr.apache.org/guide/7_3/language-analysis.html#opennlp-lemmatizer-filter

The bug should be fairly easy to verify with this test: https://github.com/kotman12/lucene/blob/8ecd42ec88685f47d42a88dd2536e879028af023/lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPLemmatizerFilterFactory.java#L298

I'd greatly appreciate it if someone could take a look. I'm also proposing a fix here: https://github.com/apache/lucene/pull/11734 but naturally I am open to other thoughts on how to approach this.

Thanks,
Luke