RE: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

Itamar Syn-Hershko Wed, 12 May 2010 22:21:45 -0700

Again, this is not a hack, and that was exactly my point. As I said:

> resolving this is very simple, by just applying a correct logic 
> (ignore double-quotes followed by a char) which isn't enforced today 
> and once it will be, it won't cause any cases of unexpected behavior.


Ignoring a double quote that appears mid-word (that is, one not followed by
whitespace) instead of tokenizing on it is just as valid for English queries
as it is for Hebrew.
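
To make the rule concrete, here is a rough sketch in Java. It is only an
illustration, not code from the Lucene QueryParser: the class and method
names are made up, and it assumes "mid-word" means the double quote has a
letter or digit on both sides. Under that assumption, a quote at the start
or end of the input, or next to whitespace, still delimits a phrase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: classifies double quotes in a raw query string.
final class MidWordQuotes {

    // Assumption: a quote is "mid-word" when both neighbours are letters or digits.
    static boolean isMidWordQuote(String query, int i) {
        if (query.charAt(i) != '"') {
            return false;
        }
        if (i == 0 || i == query.length() - 1) {
            return false; // a quote at either end of the input cannot be mid-word
        }
        return Character.isLetterOrDigit(query.charAt(i - 1))
            && Character.isLetterOrDigit(query.charAt(i + 1));
    }

    // Returns the indexes of the double quotes that would still act as phrase delimiters.
    static List<Integer> phraseDelimiters(String query) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < query.length(); i++) {
            if (query.charAt(i) == '"' && !isMidWordQuote(query, i)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hebrew acronym written with an ASCII double quote before its last letter.
        System.out.println(phraseDelimiters("\u05E6\u05D4\"\u05DC"));
        // An ordinary quoted phrase keeps both of its delimiters.
        System.out.println(phraseDelimiters("\"hello world\""));
    }
}
```

With this check, the quote inside a Hebrew acronym is left alone, while the
quotes around an ordinary phrase such as "hello world" are still reported as
delimiters.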

Itamar. 

-----Original Message-----
From: Robert Muir [mailto:[email protected]] 
Sent: Thursday, May 13, 2010 3:24 AM
To: [email protected]
Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate
phrasequeries based on term count

Internationalization doesn't work by just piling hacks for language X,
language Y, and language Z on top of each other.

Just like I want the English hack removed, I strongly recommend against
adding any Hebrew hack.

On Wed, May 12, 2010 at 6:55 PM, Itamar Syn-Hershko <[email protected]>
wrote:
> I think we understand each other perfectly well. I still think 
> resolving this is very simple, by just applying a correct logic 
> (ignore double-quotes followed by a char) which isn't enforced today 
> and once it will be, it won't cause any cases of unexpected behavior. 
> This isn't an analysis-related task, and I'm not sure what makes you 
> insist so strongly. I will open a dedicated JIRA ticket for this 
> discussion if it doesn't become part of the current one.
>
> Itamar.
>
> -----Original Message-----
> From: Robert Muir [mailto:[email protected]]
> Sent: Thursday, May 13, 2010 1:42 AM
> To: [email protected]
> Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't 
> generate phrasequeries based on term count
>
> On Wed, May 12, 2010 at 6:30 PM, Itamar Syn-Hershko 
> <[email protected]>
> wrote:
>> Never did I request the QP to do Analysis. I simply mentioned this bug
>> - what this definitely is -
>
> It's definitely not a bug for Hebrew: there is a Unicode character for 
> gershayim (U+05F4), so technically that is what should be used according 
> to Unicode.
>
> It's arguably your responsibility to convert your data to Unicode 
> before passing it through Lucene, and that includes disambiguating when a 
> double quote should be gershayim.
>
> --
> Robert Muir
> [email protected]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected] For 
> additional commands, e-mail: [email protected]



--
Robert Muir
[email protected]
