[ https://issues.apache.org/jira/browse/LUCENE-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Ribas updated LUCENE-6582:
------------------------------
    Attachment: after2.png

Another example image.

> SynonymFilter should generate a correct (or, at least, better) graph
> --------------------------------------------------------------------
>
>                 Key: LUCENE-6582
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6582
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Ian Ribas
>         Attachments: LUCENE-6582.patch, after.png, after2.png, before.png
>
>
> Some time ago, I had a problem with synonyms and phrase-type queries
> (actually, it was elasticsearch and I was using a match query with multiple
> terms and the "and" operator, as better explained here:
> https://github.com/elastic/elasticsearch/issues/10394).
> That issue led to some work on Lucene: LUCENE-6400 (where I helped a little
> with tests) and LUCENE-6401. This issue is also related to LUCENE-3843.
> Starting from the discussion on LUCENE-6400, I'm attempting to implement a
> solution. Here is a patch with a first step: the implementation to fix
> "SynFilter to be able to 'make positions'" (as was mentioned on the
> [issue|https://issues.apache.org/jira/browse/LUCENE-6400?focusedCommentId=14498554&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14498554]).
> In this way, the synonym filter generates a correct (or, at least, better)
> graph.
> As the synonym matching is greedy, I only had to worry about fixing the
> position length of the rules of the current match; no future or past synonyms
> would "span" over this match (please correct me if I'm wrong!). It did
> require more buffering: twice as much.
> The new behavior I added is not active by default: a new parameter has to be
> passed in a new constructor for {{SynonymFilter}}. The changes I made do
> change the token stream generated by the synonym filter, and I thought it
> would be better to let that be a voluntary decision for now.
> I did some refactoring on the code, but mostly on what I had to change for
> my implementation, so that the patch was not too hard to read. I created
> specific unit tests for the new implementation
> ({{TestMultiWordSynonymFilter}}) that should show how things will be with the
> new behavior.
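
To illustrate what "fixing the position length" means for the resulting graph, here is a minimal sketch in plain Java. It does not use Lucene and is not taken from the patch: the `Tok` class, the `arcs` helper, and the example rule "dns -> domain name service" are all hypothetical, chosen only to show how position increment and position length together define the arcs of a token graph. The "flat" stream is my assumption of how a multi-word synonym ends up stacked over the following input tokens when every token has position length 1; the "graph" stream shows the kind of output the issue is asking for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SynonymGraphSketch {

    /** Hypothetical stand-in for a token: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Converts a token stream into graph arcs written as "from->to:term". */
    static List<String> arcs(List<Tok> stream) {
        List<String> out = new ArrayList<>();
        int pos = -1; // position before the first token
        for (Tok t : stream) {
            pos += t.posInc;              // start node of this token
            int end = pos + t.posLen;     // end node of this token
            out.add(pos + "->" + end + ":" + t.term);
        }
        return out;
    }

    /** Assumed flat output: every token has position length 1. */
    static List<Tok> flatStream() {
        return Arrays.asList(
            new Tok("dns", 1, 1), new Tok("domain", 0, 1),
            new Tok("is", 1, 1), new Tok("name", 0, 1),
            new Tok("down", 1, 1), new Tok("service", 0, 1));
    }

    /** Graph-style output: "dns" spans the three synonym tokens. */
    static List<Tok> graphStream() {
        return Arrays.asList(
            new Tok("dns", 1, 3), new Tok("domain", 0, 1),
            new Tok("name", 1, 1), new Tok("service", 1, 1),
            new Tok("is", 1, 1), new Tok("down", 1, 1));
    }

    public static void main(String[] args) {
        System.out.println("flat:  " + arcs(flatStream()));
        System.out.println("graph: " + arcs(graphStream()));
    }
}
```

In the flat stream, "name" shares an arc with "is" and "service" with "down", so a phrase or "and" query can match paths such as "domain is service" that never existed in the text. In the graph stream, "dns" covers nodes 0 to 3 in one arc and "domain name service" covers the same span in three, with "is" starting only at node 3.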