[ 
https://issues.apache.org/jira/browse/LUCENE-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957457#comment-16957457
 ] 

ASF subversion and git services commented on LUCENE-9006:
---------------------------------------------------------

Commit 517bfd0ab75adb59ad85797118d263bebcf11f52 in lucene-solr's branch 
refs/heads/jira/SOLR-13822 from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=517bfd0 ]

LUCENE-9006: WDGF catenateAll should come before parts
Fixes #953


> Ensure WordDelimiterGraphFilter always emits catenateAll token early
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9006
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Ideally, the first token WDGF emits is the preserveOriginal token (if 
> configured to emit), and the second is the catenateAll token (if configured to 
> emit).  The deprecated WDF does this, but WDGF can sometimes emit another token 
> ahead of catenateAll when there is a non-emitted candidate sub-token.
> Example: input "8-other" with only generateWordParts and catenateAll enabled, 
> *not* generateNumberParts.  WDGF internally sees the '8' but skips it.  As a 
> result, the "other" token and the catenated "8other" end up at the same 
> internal position, and the sorter happens to emit "other" first.
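
The ordering problem described above can be illustrated with a small standalone sketch. This is a toy model, not Lucene's code: the class CatenateAllOrderSketch, the Tok type, the position values, the buffer order, and the tie-break rule are all assumptions made for illustration. It assumes the buffered tokens are sorted stably by start position and then by end position (longer first), so two tokens with identical positions keep their buffer order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: every name here is hypothetical and none of it is Lucene API.
public class CatenateAllOrderSketch {

    /** Hypothetical buffered token: its text and its internal start/end positions. */
    static final class Tok {
        final String text;
        final int startPos;
        final int endPos;
        final boolean catenateAll;

        Tok(String text, int startPos, int endPos, boolean catenateAll) {
            this.text = text;
            this.startPos = startPos;
            this.endPos = endPos;
            this.catenateAll = catenateAll;
        }
    }

    /**
     * Assumed buffer for input "8-other" with generateWordParts and catenateAll
     * enabled but not generateNumberParts. The "8" is seen but never buffered,
     * so "other" and "8other" end up with the same positions.
     */
    static List<Tok> bufferFor8Other() {
        List<Tok> buf = new ArrayList<>();
        buf.add(new Tok("other", 0, 1, false));  // word part, buffered first (assumption)
        buf.add(new Tok("8other", 0, 1, true));  // catenateAll, buffered afterwards
        return buf;
    }

    /** Returns the emitted token texts; the flag enables an explicit tie-break. */
    static List<String> emitOrder(boolean catenateAllFirst) {
        Comparator<Tok> cmp = (a, b) -> {
            if (a.startPos != b.startPos) {
                return Integer.compare(a.startPos, b.startPos); // earlier start first
            }
            if (a.endPos != b.endPos) {
                return Integer.compare(b.endPos, a.endPos);     // longer span first
            }
            if (catenateAllFirst && a.catenateAll != b.catenateAll) {
                return a.catenateAll ? -1 : 1;                  // catenateAll wins ties
            }
            return 0; // tie: the stable sort keeps buffer order
        };
        List<Tok> buf = bufferFor8Other();
        buf.sort(cmp); // List.sort is stable
        List<String> out = new ArrayList<>();
        for (Tok t : buf) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emitOrder(false)); // prints [other, 8other]
        System.out.println(emitOrder(true));  // prints [8other, other]
    }
}
```

Without the tie-break, the result depends only on the order in which the tokens happened to be buffered. The tie-break shown is one possible way to make the order deterministic in this model; it is not necessarily how the commit above resolves the issue.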



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
