[jira] [Commented] (LUCENE-8723) Bad interaction between WordDelimiterGraphFilter, StopFilter and FlattenGraphFilter

Michael Sokolov (Jira) Tue, 31 Aug 2021 11:09:07 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407572#comment-17407572 ]


Michael Sokolov commented on LUCENE-8723:
-----------------------------------------

I wonder whether WDGF (WordDelimiterGraphFilter) and SynonymGraphFilter can 
now be used together as well? If all of our filters are now able to consume 
graphs, then we could actually remove the (currently deprecated) non-graph 
versions (SynonymFilter, WordDelimiterFilter).

> Bad interaction between WordDelimiterGraphFilter, StopFilter and 
> FlattenGraphFilter
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-8723
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8723
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 7.7.1, 8.0, 8.3
>            Reporter: Nicolás Lichtmaier
>            Priority: Major
>             Fix For: main (9.0), 8.10
>
>
> While debugging an issue (tokens missing after analysis), I enabled Java 
> assertions and uncovered a bug in the combination WordDelimiterGraphFilter + 
> StopFilter + FlattenGraphFilter.
> I could reproduce the issue in a small piece of code. This code fails with 
> an assertion error when assertions are enabled (-ea java option):
> {code:java}
>     // Chain under test: StandardTokenizer -> WordDelimiterGraphFilter -> StopFilter -> FlattenGraphFilter
>     CustomAnalyzer.Builder builder = CustomAnalyzer.builder();
>     builder.withTokenizer(StandardTokenizerFactory.class);
>     builder.addTokenFilter(WordDelimiterGraphFilterFactory.class, "preserveOriginal", "1");
>     builder.addTokenFilter(StopFilterFactory.class);
>     builder.addTokenFilter(FlattenGraphFilterFactory.class);
>     Analyzer analyzer = builder.build();
>
>     TokenStream ts = analyzer.tokenStream("*", new StringReader("x7in"));
>     ts.reset();
>     // Consume every token; with -ea the AssertionError is thrown from incrementToken()
>     while (ts.incrementToken()) {
>     }
> {code}
> This gives:
> {code}
> Exception in thread "main" java.lang.AssertionError: 2
>      at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:195)
>      at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
>      at com.wolfram.textsearch.AnalyzerError.main(AnalyzerError.java:32)
> {code}
> Maybe removing stop words after WordDelimiterGraphFilter is wrong, I don't 
> know. However, it is the only way to process stop words generated by that 
> filter. In any case, it should not eat tokens or trigger assertion failures.
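
One way to picture what the filter chain in the report might be doing is the plain-Java sketch below, which has no Lucene dependency. It assumes (the report does not state this) that WordDelimiterGraphFilter with preserveOriginal=1 turns "x7in" into the original token spanning three positions plus the parts "x", "7" and "in", and that the stop set in use removes "in". The Tok class and all helper names are hypothetical, the stop-word removal is a simplification of what StopFilter does, and FlattenGraphFilter itself is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TokenGraphSketch {

    /** Hypothetical stand-in for a token: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** ASSUMED output of WordDelimiterGraphFilter for "x7in" with preserveOriginal=1. */
    static List<Tok> assumedTokens() {
        return Arrays.asList(
            new Tok("x7in", 1, 3),  // original token, node 0 -> node 3
            new Tok("x", 0, 1),     // node 0 -> node 1
            new Tok("7", 1, 1),     // node 1 -> node 2
            new Tok("in", 1, 1));   // node 2 -> node 3
    }

    /** Simplified stop filter: drops stop words and carries their increment to the next kept token. */
    static List<Tok> removeStopWords(List<Tok> tokens, Set<String> stopWords) {
        List<Tok> kept = new ArrayList<>();
        int skipped = 0;
        for (Tok t : tokens) {
            if (stopWords.contains(t.term)) {
                skipped += t.posInc;
                continue;
            }
            kept.add(new Tok(t.term, t.posInc + skipped, t.posLen));
            skipped = 0;
        }
        return kept;
    }

    /** Nodes where some token ends but no token starts, excluding the last node of the graph. */
    static Set<Integer> deadEnds(List<Tok> tokens) {
        Set<Integer> starts = new TreeSet<>();
        Set<Integer> ends = new TreeSet<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;          // node the token leaves from
            starts.add(pos);
            ends.add(pos + t.posLen); // node the token arrives at
        }
        Set<Integer> dead = new TreeSet<>(ends);
        dead.removeAll(starts);
        if (!ends.isEmpty()) {
            dead.remove(Collections.max(ends)); // the final node is allowed to have no successor
        }
        return dead;
    }

    public static void main(String[] args) {
        List<Tok> before = assumedTokens();
        List<Tok> after = removeStopWords(before, Collections.singleton("in"));
        System.out.println("dead ends before stop removal: " + deadEnds(before));
        System.out.println("dead ends after stop removal:  " + deadEnds(after));
    }
}
```

Under those assumptions the graph has no dead ends before stop-word removal, while afterwards node 2 is one: the path "x", "7" stops there even though "x7in" still spans to node 3. Whether this gap is what trips the assertion in releaseBufferedToken is not confirmed by the report.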



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

