One clarification to my previous comment: one requirement is to prevent false matches for "heart infarction" and "myocardial attack". The current synonym filter does not preserve the "path", or term ordering, within multi-term phrases, even if the query parser does present the full term sequence as a single input string.

Yes, the position information is preserved, but there is no "path" attribute that tells you whether "heart" came before "attack" or before "infarction".
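
To make the concern concrete, here is a toy model in plain Java. It is not Lucene's TokenStream or synonym filter API, and the class and method names are made up for illustration. It assumes the filter stacks "myocardial infarction" on the same two positions as "heart attack", and that a phrase matches when each of its words is found at consecutive positions.

```java
import java.util.List;
import java.util.Set;

// Toy model only: this does not use or mirror Lucene's API.
class FlattenedSynonyms {
    // Terms stacked per position after expanding "heart attack" with its synonym.
    static final List<Set<String>> POSITIONS = List.of(
        Set.of("heart", "myocardial"),    // position 0
        Set.of("attack", "infarction"));  // position 1

    // A phrase matches if word i is present at position start + i.
    static boolean phraseMatches(List<Set<String>> positions, String phrase) {
        String[] words = phrase.split(" ");
        for (int start = 0; start + words.length <= positions.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < words.length; i++) {
                if (!positions.get(start + i).contains(words[i])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String p : List.of("heart attack", "myocardial infarction",
                                "heart infarction", "myocardial attack")) {
            System.out.println(p + " -> " + phraseMatches(POSITIONS, p));
        }
    }
}
```

With positions alone, all four phrases match in this model, including the two cross-over phrases that should not.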

-- Jack Krupansky

-----Original Message-----
From: Robert Muir
Sent: Friday, January 25, 2013 9:47 AM
To: dev@lucene.apache.org
Subject: Re: Fixing query-time multi-word synonym issue

On Fri, Jan 25, 2013 at 9:19 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> Here's an example query with q.op=AND:
>
>    causes of heart attack
>
> And I have this synonym definition:
>
>    heart attack, myocardial infarction
>
> So, what is the alleged query parser fix so that the query is treated as:
>
>    causes of ("heart attack" OR "myocardial infarction")


That's actually inefficient and stupid to do. If you make a parser that
doesn't split on whitespace, you can just tell it to fold at index and
query time, just like stemming. No OR necessary.
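
A rough sketch of what folding could look like, assuming the analyzer gets to see the whole un-split string. Each synonym phrase is rewritten to one canonical token, and the same function is applied to indexed text and to queries. The class name, the map, and the "heart_attack" token are hypothetical, and plain substring replacement stands in for a real analysis chain (it ignores word boundaries).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of folding; not a Lucene analyzer.
class SynonymFolder {
    // Every phrase in the synonym set maps to the same canonical token.
    static final Map<String, String> CANONICAL = new LinkedHashMap<>();
    static {
        CANONICAL.put("myocardial infarction", "heart_attack");
        CANONICAL.put("heart attack", "heart_attack");
    }

    // Apply the same folding to document text and to query text.
    static String fold(String text) {
        String out = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : CANONICAL.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fold("causes of heart attack"));
        System.out.println(fold("Causes of myocardial infarction"));
        System.out.println(fold("heart infarction"));
    }
}
```

In this sketch both synonym phrases fold to the same single term, so a plain term match finds either one without an OR, while "heart infarction" is left alone and cannot match.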

But I think you are trying to get off topic. Again, the real problem
affecting 99%+ of users is that the Lucene queryparser splits on
whitespace.
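
A small illustration of why the split matters, again as a toy model with hypothetical names rather than the real queryparser. The analyze method stands in for an analysis chain with one multi-word synonym rule. If the parser splits on whitespace first, that rule only ever sees one word at a time and never fires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of parse order; not the Lucene queryparser.
class WhitespaceSplitDemo {
    // Stand-in for an analyzer with a single multi-word synonym rule.
    static String analyze(String text) {
        return text.replace("heart attack", "heart_attack");
    }

    // Parser splits on whitespace first, then analyzes each chunk alone.
    static List<String> splitThenAnalyze(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.add(analyze(chunk));
        }
        return out;
    }

    // Parser hands the whole string to the analyzer, then tokenizes.
    static List<String> analyzeThenSplit(String query) {
        return Arrays.asList(analyze(query).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(splitThenAnalyze("causes of heart attack"));
        System.out.println(analyzeThenSplit("causes of heart attack"));
    }
}
```

The first call yields four separate terms with the synonym rule untouched. The second yields three terms, with "heart_attack" as one of them.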

If this is fixed, then lots of things (not just synonyms, but other
basic shit that is broken today) start working too:
https://issues.apache.org/jira/browse/LUCENE-2605

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
