Ian Ribas created LUCENE-6582:
---------------------------------
Summary: SynonymFilter should generate a correct (or, at least,
better) graph
Key: LUCENE-6582
URL: https://issues.apache.org/jira/browse/LUCENE-6582
Project: Lucene - Core
Issue Type: Bug
Reporter: Ian Ribas
Some time ago, I had a problem with synonyms and phrase-type queries (actually,
it was Elasticsearch, and I was using a match query with multiple terms and the
"and" operator, as better explained here:
https://github.com/elastic/elasticsearch/issues/10394).
That issue led to some work on Lucene: LUCENE-6400 (where I helped a little
with tests) and LUCENE-6401. This issue is also related to LUCENE-3843.
Starting from the discussion on LUCENE-6400, I'm attempting to implement a
solution. Here is a patch with a first step - the implementation to fix
"SynFilter to be able to 'make positions'" (as was mentioned on the
[issue|https://issues.apache.org/jira/browse/LUCENE-6400?focusedCommentId=14498554&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14498554]).
In this way, the synonym filter generates a correct (or, at least, better)
graph.
Because synonym matching is greedy, I only had to worry about fixing the
position lengths of the rules in the current match: no future or past synonym
would "span" over this match (please correct me if I'm wrong!). It did require
twice as much buffering.
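To make the position-length point concrete, here is a minimal sketch in plain Java. It does not use Lucene and is not the code from the patch. The {{Token}} class, the {{paths}} helper, and the rule "dns" => "domain name system" are made up for illustration, and the "flattened" token list is only my assumption of roughly what a stream looks like when the filter cannot make positions. Each token is treated as an arc that starts at its position and spans {{posLen}} positions, and the helper lists every term sequence readable from the first position to the last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenGraphSketch {

    /** Hypothetical stand-in for a token: term, position increment, position length. */
    public static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        public Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Lists every sequence of terms that leads from the first position to the last one. */
    public static List<String> paths(List<Token> tokens) {
        List<int[]> arcs = new ArrayList<>();
        int pos = -1; // the first token's increment of 1 puts it at position 0
        int end = 0;
        for (Token t : tokens) {
            pos += t.posInc;
            arcs.add(new int[] {pos, pos + t.posLen}); // arc from start node to end node
            end = Math.max(end, pos + t.posLen);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, arcs, 0, end, "", out);
        return out;
    }

    private static void walk(List<Token> tokens, List<int[]> arcs, int node, int end,
                             String prefix, List<String> out) {
        if (node == end) {
            out.add(prefix.trim());
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (arcs.get(i)[0] == node) {
                walk(tokens, arcs, arcs.get(i)[1], end, prefix + " " + tokens.get(i).term, out);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed flattened output: the synonym's extra words are stacked on later tokens.
        List<Token> flattened = Arrays.asList(
            new Token("dns", 1, 1), new Token("domain", 0, 1),
            new Token("is", 1, 1), new Token("name", 0, 1),
            new Token("cool", 1, 1), new Token("system", 0, 1));

        // Graph with positions "made": "dns" spans the three positions of its synonym.
        List<Token> graph = Arrays.asList(
            new Token("domain", 1, 1), new Token("dns", 0, 3),
            new Token("name", 1, 1), new Token("system", 1, 1),
            new Token("is", 1, 1), new Token("cool", 1, 1));

        // Eight paths, including phrases such as "domain is cool" that were never written.
        System.out.println(paths(flattened));
        // Two paths: "domain name system is cool" and "dns is cool".
        System.out.println(paths(graph));
    }
}
```

In the second stream only the two intended phrases can be read, which is what a phrase query or an "and" match needs. The longer arc on "dns" is the position length that has to be fixed for the rules of the current match.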
The new behavior I added is not active by default: a new parameter has to be
passed to a new constructor for {{SynonymFilter}}. The changes I made alter
the token stream generated by the synonym filter, and I thought it would be
better to keep that opt-in for now.
I did some refactoring of the code, but mostly of what I had to change for my
implementation, so that the patch would not be too hard to read. I created
specific unit tests for the new implementation ({{TestMultiWordSynonymFilter}})
that should show how things will work with the new behavior.