Michael McCandless created LUCENE-6638:
------------------------------------------
Summary: Factor graph flattening out of SynonymFilter
Key: LUCENE-6638
URL: https://issues.apache.org/jira/browse/LUCENE-6638
Project: Lucene - Core
Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.3, Trunk
Spinoff from LUCENE-6582.
SynonymFilter is very hairy, and has known nearly-impossible-to-fix bugs: it
produces the wrong graph, accepting some phrases it should reject and
rejecting some it should accept, because it never creates new positions.
This makes improvements like LUCENE-6582, which aims to fix some of those
bugs, unnecessarily hard.
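
To make the wrong-graph problem described above concrete, here is a toy
sketch. It is not Lucene code: the class TokenGraphSketch, the Arc type, and
the synonym rule "dns -> domain name service" applied to the input
"dns is up" are all assumptions chosen for illustration. It models each token
as an arc between positions and lists every phrase the graph accepts, first
when the extra synonym tokens are squeezed onto existing positions, then when
new positions are created for them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these names are Lucene APIs.
public class TokenGraphSketch {

    // A token as a graph arc from position `from` to position `to`
    // (`to - from` plays the role of a position length).
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }
    }

    // Collect every phrase spelled by a path from position 0 to `end`.
    static Set<String> phrases(List<Arc> arcs, int end) {
        Set<String> out = new TreeSet<>();
        walk(arcs, 0, end, "", out);
        return out;
    }

    private static void walk(List<Arc> arcs, int pos, int end,
                             String prefix, Set<String> out) {
        if (pos == end) {
            out.add(prefix);
            return;
        }
        for (Arc a : arcs) {
            if (a.from == pos) {
                String next = prefix.isEmpty() ? a.term : prefix + " " + a.term;
                walk(arcs, a.to, end, next, out);
            }
        }
    }

    // Assumed shape when no new positions are created: "name" and "service"
    // are stacked on the positions already used by "is" and "up".
    static List<Arc> withoutNewPositions() {
        List<Arc> arcs = new ArrayList<>();
        arcs.add(new Arc("dns", 0, 1));
        arcs.add(new Arc("domain", 0, 1));
        arcs.add(new Arc("is", 1, 2));
        arcs.add(new Arc("name", 1, 2));
        arcs.add(new Arc("up", 2, 3));
        arcs.add(new Arc("service", 2, 3));
        return arcs;
    }

    // Assumed shape when new positions are created: "dns" spans the three
    // positions of its synonym, and "is" / "up" are shifted after them.
    static List<Arc> withNewPositions() {
        List<Arc> arcs = new ArrayList<>();
        arcs.add(new Arc("dns", 0, 3));
        arcs.add(new Arc("domain", 0, 1));
        arcs.add(new Arc("name", 1, 2));
        arcs.add(new Arc("service", 2, 3));
        arcs.add(new Arc("is", 3, 4));
        arcs.add(new Arc("up", 4, 5));
        return arcs;
    }

    public static void main(String[] args) {
        System.out.println(phrases(withoutNewPositions(), 3));
        System.out.println(phrases(withNewPositions(), 5));
    }
}
```

In this model the first graph accepts bogus phrases such as "dns name up"
(too many phrases) and cannot spell "domain name service is up" at all (not
enough phrases), while the second accepts exactly "dns is up" and
"domain name service is up".
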
I'd like to pull the graph flattening out into its own token filter, and I
think I have a starting patch that works. I started with the "sausagizer"
stage on the branch from LUCENE-5012, but changed the approach so that it
should hit far fewer adversarial cases.
I think this should make SynonymFilter quite a bit simpler, hopefully to the
point where we can finally just fix its bugs.