[ 
https://issues.apache.org/jira/browse/LUCENE-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885109#comment-16885109
 ] 

Alan Woodward commented on LUCENE-8916:
---------------------------------------

Interestingly, the patch attached to LUCENE-8644 will fix this, as it makes
GTSFS clone all attributes, rather than just saving terms and playing them back
again in a synthetic token stream.

> GraphTokenStreamFiniteStrings.FiniteStringsTokenStream does not play well 
> with subsequent TokenFilters
> ------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8916
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8916
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>
> GraphTokenStreamFiniteStrings (GTSFS) provides a view over multiple paths
> through a token graph, which is useful when building queries over multi-word
> synonyms.  This view is exposed as an iterator over simple TokenStreams.
> However, these TokenStreams do not work correctly when further wrapped in
> token filters, because they do not use a CharTermAttribute.
> For an example of the issues this can cause, see
> https://github.com/elastic/elasticsearch/issues/43976, where Elasticsearch
> uses a special shingle field to speed up phrase searches.  Queries are
> converted to shingles if they have multiple terms.  However, if the query
> resolves into a graph due to synonyms, then this conversion breaks because
> the FixedShingleFilter (FSF) is given a token stream built by GTSFS; terms
> are set using BytesTermAttribute but then read using CharTermAttribute, and
> as these have different backing implementations, FSF ends up emitting null
> tokens.
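
The mismatch described above can be illustrated with a small, self-contained
model. This is only a sketch: BytesTerm, CharTerm, Token, replay and shingles
are hypothetical stand-ins invented for illustration, not Lucene classes or
APIs, and the sketch produces empty shingle parts where the report describes
null tokens. It assumes only what the description states: the producer writes
terms through a bytes-backed attribute, and the downstream filter reads them
through a char-backed attribute with separate storage.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AttributeMismatchSketch {

    // Hypothetical stand-in for a bytes-backed term attribute.
    static final class BytesTerm {
        byte[] bytes = new byte[0];
    }

    // Hypothetical stand-in for a char-backed term attribute.
    static final class CharTerm {
        final StringBuilder chars = new StringBuilder();
    }

    // A token carries both attributes, but they do not share storage.
    static final class Token {
        final BytesTerm bytesTerm = new BytesTerm();
        final CharTerm charTerm = new CharTerm();
    }

    // Producer that replays saved terms. It always writes the bytes-backed
    // attribute; it writes the char-backed one only when populateChars is true.
    static List<Token> replay(List<String> terms, boolean populateChars) {
        List<Token> tokens = new ArrayList<>();
        for (String term : terms) {
            Token token = new Token();
            token.bytesTerm.bytes = term.getBytes(StandardCharsets.UTF_8);
            if (populateChars) {
                token.charTerm.chars.append(term);
            }
            tokens.add(token);
        }
        return tokens;
    }

    // Downstream filter that reads only the char-backed attribute and joins
    // each adjacent pair of terms into a two-term shingle.
    static List<String> shingles(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String left = tokens.get(i).charTerm.chars.toString();
            String right = tokens.get(i + 1).charTerm.chars.toString();
            out.add(left + " " + right);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("new", "york", "city");
        // Bytes only: the filter sees no characters, so every shingle is blank.
        System.out.println(shingles(replay(terms, false)));
        // Both attributes populated: the filter sees the real terms.
        System.out.println(shingles(replay(terms, true)));
    }
}
```

In this model, the second call stands in for a producer that carries over
every attribute, so a filter reading either representation sees the term.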


