[ https://issues.apache.org/jira/browse/LUCENE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888780#comment-16888780 ]
ASF subversion and git services commented on LUCENE-8916:
---------------------------------------------------------

Commit 1eb2a26c6cc9346827a321c3f883f17ea94ea013 in lucene-solr's branch refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1eb2a26 ]

LUCENE-8916: GraphTokenStreamFiniteStrings preserves all attributes

> GraphTokenStreamFiniteStrings.FiniteStringsTokenStream does not play well with subsequent TokenFilters
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8916
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8916
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> GraphTokenStreamFiniteStrings provides a view over multiple paths through a
> Token graph, which is useful when building queries over multiple length
> synonyms. This view is exposed as an iterator over simple TokenStreams.
> However, these TokenStreams do not work correctly when further wrapped in
> token filters, because they do not use a CharTermAttribute.
>
> For an example of issues this can cause, see
> https://github.com/elastic/elasticsearch/issues/43976, where elasticsearch
> uses a special shingle field to speed up phrase searches. Queries are
> converted to shingles if they have multiple terms. However, if the query
> resolves into a graph due to synonyms, then this conversion breaks because
> the FixedShingleFilter is given a token stream built by GTSFS; terms are set
> using BytesTermAttribute, but then read using CharTermAttribute, and as these
> have different backing implementations, FSF ends up emitting null tokens.
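
The mismatch described above can be illustrated with a small toy model. This is only a sketch, not Lucene code: BytesTerm, CharTerm, Attributes and the three helper methods are hypothetical stand-ins invented for illustration, and the sketch does not claim to show how the commit itself resolves the issue. It only shows why a consumer that reads a char-backed term sees nothing when the producer filled in a separate byte-backed term.

```java
import java.nio.charset.StandardCharsets;

// Toy model only: every class and method here is a hypothetical stand-in,
// not part of Lucene's API.
final class AttributeMismatchSketch {

    /** Stand-in for an attribute that stores the term as raw bytes. */
    static final class BytesTerm {
        byte[] bytes = new byte[0];
    }

    /** Stand-in for an attribute that stores the term as characters. */
    static final class CharTerm {
        final StringBuilder chars = new StringBuilder();
    }

    /** Stand-in for the attribute state shared by a stream and its filters. */
    static final class Attributes {
        final BytesTerm bytesTerm = new BytesTerm();
        final CharTerm charTerm = new CharTerm();
    }

    /** Producer as described in the issue: fills in only the byte-backed term. */
    static void produceBytesOnly(Attributes atts, String term) {
        atts.bytesTerm.bytes = term.getBytes(StandardCharsets.UTF_8);
    }

    /** Producer that fills in the char-backed term a downstream filter reads. */
    static void produceChars(Attributes atts, String term) {
        atts.charTerm.chars.setLength(0);
        atts.charTerm.chars.append(term);
    }

    /** Downstream filter stand-in: reads only the char-backed term. */
    static String filterReads(Attributes atts) {
        return atts.charTerm.chars.toString();
    }

    public static void main(String[] args) {
        Attributes broken = new Attributes();
        produceBytesOnly(broken, "fast");
        System.out.println("bytes-only producer, filter sees: '" + filterReads(broken) + "'");

        Attributes working = new Attributes();
        produceChars(working, "fast");
        System.out.println("char producer, filter sees: '" + filterReads(working) + "'");
    }
}
```

In this model the two holders have separate backing storage, so a value written to one is never visible through the other. That is the same shape as the failure reported for FixedShingleFilter, although the real attribute classes are considerably more involved.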