I never asked the QP to do Analysis. I simply mentioned this bug, which it definitely is, so you could tackle it while you're at it. It is definitely relevant to a discussion about reworking how the QP determines what is a legitimate PhraseQuery and what is not.

The fix is quite easy, I believe: just make sure you don't treat a double-quote as a trigger for starting or ending a phrase unless it is followed by white-space (or another non-letter character). An English query like 'Foo"bar"' (with no enclosing quotes...) is invalid anyway (although it is not handled as such at the moment).

I cannot handle this on the application side, simply because there the double-quote char is NOT a special character. As I mentioned, for Hebrew it is part of the word, pretty much like Niqqud is. If the user has entered a textual query with an acronym, there's no point in me parsing it once just to escape what I suspect are acronyms and then sending it to the core QP, or just creating the queries myself.

All of this holds in light of my second paragraph in this message: the fix is easy and also correct for the basic, non-Hebrew implementation.

Itamar.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, May 12, 2010 4:25 PM
To: dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

On Wed, May 12, 2010 at 6:05 AM, Itamar Syn-Hershko <ita...@code972.com> wrote:
> The QueryParser also fails to correctly parse Hebrew acronyms;
> although not being an integral part of the current discussion, I
> thought this would be the best place to bring that up.
>

Just as I don't think Analysis should do QueryParsing, I don't think QueryParsing should do Analysis either.

Similar problems to this exist in other languages (I have to escape : for some, because Lucene wants to interpret it as a field name). But this can be easily remedied on the application side: it's documented and understood that the double-quote is a special character, and there is an escape mechanism so you can escape the ones you think are acronyms.

This issue is about a buggy implementation: it's not documented and is only internal to how the queryparser determines what is a phrase query or not (and, contrary to what you would believe from the documentation, the choice of whether or not to make a PhraseQuery is not based on syntax one bit!)

--
Robert Muir
rcm...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
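
A minimal sketch of the quote-handling rule Itamar proposes in the reply above, written as standalone Java and not as a patch to Lucene's QueryParser. The class and method names (QuoteBoundary, isInsideWord, phraseDelimiters) are hypothetical, and reading the rule as "a double-quote with a letter or digit on both sides belongs to the word" is an assumption about the intent, not something the thread spells out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: decides which double-quotes in a
// raw query string should count as phrase delimiters.
final class QuoteBoundary {

    // Assumption: a quote flanked by a letter or digit on BOTH sides is part
    // of a word (as in a Hebrew acronym) and must not open or close a phrase.
    static boolean isInsideWord(String query, int i) {
        return i > 0 && i < query.length() - 1
                && Character.isLetterOrDigit(query.charAt(i - 1))
                && Character.isLetterOrDigit(query.charAt(i + 1));
    }

    // Returns the indices of the double-quotes that act as phrase delimiters.
    static List<Integer> phraseDelimiters(String query) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) == '"' && !isInsideWord(query, i)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Ordinary phrase: both quotes touch white-space or the string start.
        System.out.println(phraseDelimiters("\"foo bar\" baz"));
        // Hebrew acronym (tsadi, he, quote, lamed): the quote sits between letters.
        System.out.println(phraseDelimiters("\u05E6\u05D4\"\u05DC"));
    }
}
```

Under this reading, the quotes in "foo bar" are still reported as delimiters, while the quote inside the Hebrew acronym is left alone. The check only looks at adjacent UTF-16 chars, so a quote that directly follows a combining mark such as Niqqud would need extra handling.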