I never asked the QP to do Analysis. I simply mentioned this bug -
which is definitely what this is - so you could tackle it while you're at it.
It is definitely relevant to a discussion about reworking how the QP
determines what is a legitimate PhraseQuery and what is not.

The fix is quite easy, I believe: don't treat a double-quote as a trigger
for starting or ending a phrase unless it is followed by whitespace (or
another non-word character). An English query like 'Foo"bar"' (with no
enclosing quotes...) is invalid anyway (although it is not handled as such
at the moment).
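
To make the proposed rule concrete, here is a minimal sketch in Java. It is
not QueryParser code; the class and method names are hypothetical. It also
assumes one particular reading of the rule: a double-quote with a letter or
digit on both sides belongs to the word, and any other double-quote may open
or close a phrase.

```java
/** Hypothetical helper illustrating the rule; not part of Lucene. */
final class PhraseQuoteSketch {
    private PhraseQuoteSketch() {}

    /** True if index i is inside the string and holds a letter or digit. */
    private static boolean isWordChar(String s, int i) {
        return i >= 0 && i < s.length() && Character.isLetterOrDigit(s.charAt(i));
    }

    /**
     * Decides whether the character at index i is a double-quote that may
     * start or end a phrase. Assumption: a quote surrounded by word
     * characters on both sides (as in a Hebrew acronym) is part of the word.
     */
    static boolean isPhraseDelimiter(String query, int i) {
        if (query.charAt(i) != '"') {
            return false;
        }
        return !(isWordChar(query, i - 1) && isWordChar(query, i + 1));
    }
}
```

Under this reading, both quotes in "foo bar" remain phrase delimiters, while
the quote inside a Hebrew acronym, or the first quote in Foo"bar", does not
trigger a phrase.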

I cannot handle this on the application side, simply because there the
double-quote char is NOT a special character. As I mentioned, for Hebrew it
is part of the word, pretty much like Niqqud is. If the user has entered a
textual query with an acronym, there's no point in my parsing it once just
to escape what I suspect are acronyms and then sending it to the core QP, or
in building the queries myself. This is all the more true given the second
paragraph of this message: the fix is easy and is also correct for the
basic, non-Hebrew implementation.

Itamar.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Wednesday, May 12, 2010 4:25 PM
To: dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate
phrasequeries based on term count

On Wed, May 12, 2010 at 6:05 AM, Itamar Syn-Hershko <ita...@code972.com>
wrote:
> The QueryParser also fails to correctly parse Hebrew acronyms; 
> although not being an integral part of the current discussion, I 
> thought this would be the best place to bring that up.
>

Just as I don't think Analysis should do QueryParsing, I don't think
QueryParsing should do Analysis either.
Similar problems to this exist in other languages (I have to escape :
for some, because lucene wants to interpret it as a field name).

But this can be easily remedied on the application side: it's documented and
understood that the double-quote is a special character, and there is an
escape mechanism so you can escape the ones you think are acronyms.
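
For readers following the thread, the application-side workaround described
above might look roughly like the sketch below. It is only an illustration:
the class and method names are hypothetical, and it assumes that a
double-quote with a letter on both sides is an acronym quote and that a
backslash is the escape character.

```java
/** Hypothetical pre-processing step run before handing text to the parser. */
final class AcronymEscapeSketch {
    private AcronymEscapeSketch() {}

    /** Prefixes a backslash to every double-quote that sits between two letters. */
    static String escapeInWordQuotes(String query) {
        StringBuilder out = new StringBuilder(query.length() + 4);
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            boolean inWord = c == '"'
                    && i > 0 && Character.isLetter(query.charAt(i - 1))
                    && i + 1 < query.length() && Character.isLetter(query.charAt(i + 1));
            if (inWord) {
                out.append('\\'); // escape only the quotes guessed to be acronyms
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Quotes that enclose a phrase are left alone because they are not surrounded
by letters on both sides.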

This issue is about a buggy implementation: it's not documented and is
only internal to how the queryparser determines what is a phrase query or
not (and, contrary to what you would believe from the documentation, the
choice of whether or not to make a PhraseQuery is not based on syntax one
bit!)

--
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
commands, e-mail: dev-h...@lucene.apache.org


