[ https://issues.apache.org/jira/browse/LUCENE-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957457#comment-16957457 ]
ASF subversion and git services commented on LUCENE-9006:
---------------------------------------------------------

Commit 517bfd0ab75adb59ad85797118d263bebcf11f52 in lucene-solr's branch refs/heads/jira/SOLR-13822 from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=517bfd0 ]

LUCENE-9006: WDGF catenateAll should come before parts
Fixes #953

> Ensure WordDelimiterGraphFilter always emits catenateAll token early
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9006
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Ideally, the first token of WDGF is the preserveOriginal (if configured to
> emit), and the second should be the catenateAll (if configured to emit). The
> deprecated WDF does this, but WDGF can sometimes emit one of the other tokens
> earlier when there is a non-emitted candidate sub-token.
> Example input "8-other" when only generateWordParts and catenateAll are
> enabled -- *not* generateNumberParts. WDGF internally sees the '8' but moves
> on. Ultimately, the "other" token and the catenated "8other" will appear at
> the same internal position, which by luck fools the sorter into emitting
> "other" first.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
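The ordering problem in the issue description can be illustrated with a small toy model. This is a sketch under stated assumptions, not Lucene's actual WordDelimiterGraphFilter code: the `SubToken` type, the `RANK` table, the position value 1, and the assumption that "other" is buffered before "8other" are all hypothetical, chosen only to reproduce the behavior the issue reports. It shows why a sort keyed on position alone leaves a tie to buffering order, and how a tie-breaker that ranks catenateAll ahead of ordinary parts gives the order the issue asks for.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tie-break ranks following the order the issue asks for:
# preserveOriginal first, then catenateAll, then the individual parts.
RANK = {"original": 0, "catenate_all": 1, "part": 2}


@dataclass(frozen=True)
class SubToken:
    text: str
    pos: int   # internal position assigned while buffering (illustrative)
    kind: str  # "original", "catenate_all" or "part"


def order_by_position_only(tokens: List[SubToken]) -> List[str]:
    # Python's sort is stable: tokens sharing a position keep their
    # buffering order, so whichever was buffered first wins the tie.
    return [t.text for t in sorted(tokens, key=lambda t: t.pos)]


def order_with_kind_tiebreak(tokens: List[SubToken]) -> List[str]:
    # Break position ties by kind so catenateAll precedes ordinary parts.
    return [t.text for t in sorted(tokens, key=lambda t: (t.pos, RANK[t.kind]))]


# "8-other" with only generateWordParts + catenateAll: the '8' is seen but not
# emitted, so "other" and the catenated "8other" share an internal position.
# Assumption: "other" happens to be buffered first.
buffered = [
    SubToken("other", pos=1, kind="part"),
    SubToken("8other", pos=1, kind="catenate_all"),
]

print(order_by_position_only(buffered))    # ['other', '8other']
print(order_with_kind_tiebreak(buffered))  # ['8other', 'other']
```

Ranking by kind only matters when positions tie, so tokens at different positions keep their normal left-to-right order in this model.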