[ https://issues.apache.org/jira/browse/LUCENE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885109#comment-16885109 ]
Alan Woodward commented on LUCENE-8916:
---------------------------------------

Interestingly, the patch attached to LUCENE-8644 will fix this, as it makes FTSFS clone all attributes, rather than just saving terms and playing them back again in a synthetic token stream.

> GraphTokenStreamFiniteStrings.FiniteStringsTokenStream does not play well with subsequent TokenFilters
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8916
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8916
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>
> GraphTokenStreamFiniteStrings provides a view over multiple paths through a
> Token graph, which is useful when building queries over multiple length
> synonyms. This view is exposed as an iterator over simple TokenStreams.
> However, these TokenStreams do not work correctly when further wrapped in
> token filters, because they do not use a CharTermAttribute.
>
> For an example of issues this can cause, see
> https://github.com/elastic/elasticsearch/issues/43976, where elasticsearch
> uses a special shingle field to speed up phrase searches. Queries are
> converted to shingles if they have multiple terms. However, if the query
> resolves into a graph due to synonyms, then this conversion breaks because
> the FixedShingleFilter is given a token stream built by GTSFS; terms are set
> using BytesTermAttribute, but then read using CharTermAttribute, and as these
> have different backing implementations, FSF ends up emitting null tokens.
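
For readers who want to see the mismatch in isolation, here is a minimal sketch of the failure mode described in the issue. It deliberately does not use Lucene: `BytesTerm`, `CharTerm`, and `AttributeMismatchDemo` are hypothetical stand-ins that only model two term attributes with separate backing storage. In this toy model the lost term shows up as an empty string, whereas the issue reports null tokens from FixedShingleFilter. The "copy" variant is only a loose analogue of cloning attributes, not the LUCENE-8644 patch itself.

```java
import java.nio.charset.StandardCharsets;

// Simplified model of the problem described in the issue. These classes are
// hypothetical; they are NOT Lucene's attribute implementations.
final class AttributeMismatchDemo {

    // Plays the role of a byte-backed term attribute.
    static final class BytesTerm {
        private byte[] bytes = new byte[0];
        void set(byte[] value) { bytes = value.clone(); }
        byte[] get() { return bytes.clone(); }
    }

    // Plays the role of a char-backed term attribute with its own storage.
    static final class CharTerm {
        private final StringBuilder chars = new StringBuilder();
        void setEmpty() { chars.setLength(0); }
        void append(String s) { chars.append(s); }
        @Override public String toString() { return chars.toString(); }
    }

    // The producer writes only the byte-backed attribute, while the
    // downstream consumer reads only the char-backed one.
    static String readWithoutCopy(String term) {
        BytesTerm bytesTerm = new BytesTerm();
        CharTerm charTerm = new CharTerm();
        bytesTerm.set(term.getBytes(StandardCharsets.UTF_8));
        return charTerm.toString(); // never populated, so the term is lost
    }

    // The producer also populates the char-backed attribute, so the
    // downstream consumer sees the term.
    static String readWithCopy(String term) {
        BytesTerm bytesTerm = new BytesTerm();
        CharTerm charTerm = new CharTerm();
        bytesTerm.set(term.getBytes(StandardCharsets.UTF_8));
        charTerm.setEmpty();
        charTerm.append(new String(bytesTerm.get(), StandardCharsets.UTF_8));
        return charTerm.toString();
    }

    public static void main(String[] args) {
        System.out.println("without copy: '" + readWithoutCopy("wifi") + "'");
        System.out.println("with copy:    '" + readWithCopy("wifi") + "'");
    }
}
```

The point of the sketch is only that a value written through one attribute is invisible to a reader of the other unless something copies it across; how Lucene wires attributes together in an AttributeSource is more involved than this.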