Again, this is not a hack, and that was exactly my point. As I said:

> resolving this is very simple, by just applying a correct logic 
> (ignore double-quotes followed by a char) which isn't enforced today 
> and once it will be, it won't cause any cases of unexpected behavior.

Ignoring a double-quote in mid-word (one that is not followed by
whitespace), instead of tokenizing on it, is just as valid for English
queries as it is for Hebrew.
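
As a rough sketch of the rule described above (this is not Lucene's
QueryParser code; the class and method names are hypothetical, and reading
"mid-word" as "a letter on both sides of the quote" is an assumption), a
scanner could decide which double-quotes count as phrase delimiters like
this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class MidWordQuoteSketch {

    // Returns the indexes of double-quotes that would still be treated as
    // phrase delimiters. A quote with a letter directly before AND directly
    // after it is assumed to be mid-word (e.g. Hebrew acronyms written with
    // an ASCII quote in place of gershayim) and is skipped.
    static List<Integer> phraseQuotePositions(String query) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) != '"') {
                continue;
            }
            boolean letterBefore = i > 0
                    && Character.isLetter(query.charAt(i - 1));
            boolean letterAfter = i + 1 < query.length()
                    && Character.isLetter(query.charAt(i + 1));
            if (letterBefore && letterAfter) {
                continue; // mid-word quote: keep it as part of the term
            }
            positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        // A Hebrew acronym (tsadi, he, quote, lamed) followed by a phrase.
        String query = "\u05E6\u05D4\"\u05DC \"foo bar\"";
        System.out.println(phraseQuotePositions(query)); // [5, 13]
    }
}
```

Only the two quotes around "foo bar" are reported; the quote inside the
acronym is left in the term. Note that the same check also keeps a quote in
an input such as foo"bar, which is the behavior change for English queries
under discussion here.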

Itamar. 

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Thursday, May 13, 2010 3:24 AM
To: dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate
phrasequeries based on term count

Internationalization doesn't work by just piling hacks for language X,
language Y, and language Z on top of each other.

Just like I want the English hack removed, I strongly recommend against
adding any Hebrew hack.

On Wed, May 12, 2010 at 6:55 PM, Itamar Syn-Hershko <ita...@code972.com>
wrote:
> I think we understand each other perfectly well. I still think 
> resolving this is very simple, by just applying a correct logic 
> (ignore double-quotes followed by a char) which isn't enforced today 
> and once it will be, it won't cause any cases of unexpected behavior. 
> This isn't an analysis-related task, and I'm not sure what makes you 
> insist so strongly. I will be opening a dedicated JIRA ticket for this 
> discussion if this won't become part of the current one.
>
> Itamar.
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, May 13, 2010 1:42 AM
> To: dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't 
> generate phrasequeries based on term count
>
> On Wed, May 12, 2010 at 6:30 PM, Itamar Syn-Hershko 
> <ita...@code972.com>
> wrote:
>> Never did I request the QP to do Analysis. I simply mentioned this 
>> bug
>> - what this definitely is -
>
> It's definitely not a bug for Hebrew: there is a Unicode character for 
> gershayim (U+05F4), so technically this should be used according to
> Unicode.
>
> It's arguably your responsibility to convert your data to Unicode 
> before passing it through Lucene, and that includes disambiguating when a 
> double quote should be gershayim.
>
> --
> Robert Muir
> rcm...@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For 
> additional commands, e-mail: dev-h...@lucene.apache.org



--
Robert Muir
rcm...@gmail.com
