Again, this is not a hack, and that was exactly my point. As I said:

> resolving this is very simple, by just applying a correct logic
> (ignore double-quotes followed by a char) which isn't enforced today
> and once it will be, it won't cause any cases of unexpected behavior.

It is just as valid for English queries as it is for Hebrew to ignore a double quote in mid-word, rather than tokenizing on it, when it is not followed by whitespace.

Itamar.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, May 13, 2010 3:24 AM
To: dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count

Internationalization doesn't work by just piling hacks for language X, language Y, and language Z on top of each other. Just like I want the English hack removed, I strongly recommend against adding any Hebrew hack.

On Wed, May 12, 2010 at 6:55 PM, Itamar Syn-Hershko <ita...@code972.com> wrote:
> I think we understand each other perfectly well. I still think
> resolving this is very simple, by just applying a correct logic
> (ignore double-quotes followed by a char) which isn't enforced today
> and once it will be, it won't cause any cases of unexpected behavior.
> This isn't an analysis related task, and I'm not sure what makes you
> insist so bad. I will be opening a dedicated JIRA ticket for this
> discussion if this won't become part of the current one.
>
> Itamar.
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, May 13, 2010 1:42 AM
> To: dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't
> generate phrasequeries based on term count
>
> On Wed, May 12, 2010 at 6:30 PM, Itamar Syn-Hershko <ita...@code972.com> wrote:
>> Never did I request the QP to do Analysis. I simply mentioned this bug
>> - what this definitely is -
>
> It's definitely not a bug for Hebrew: there is a Unicode character for
> gershayim (U+05F4), so technically this should be used according to Unicode.
>
> It's arguably your responsibility to convert your data to Unicode
> before passing it thru Lucene, and that includes disambiguating when a
> double quote should be gershayim.
>
> --
> Robert Muir
> rcm...@gmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org

--
Robert Muir
rcm...@gmail.com
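
For illustration only, the pre-processing step discussed in this thread (deciding when an ASCII double quote is really gershayim, U+05F4, before the text reaches the query parser) might be sketched roughly as below. This is not Lucene code and not something either poster proposed verbatim: the function name `normalize_gershayim` is hypothetical, and the rule "a double quote directly between two Hebrew letters is gershayim" is an assumption made for this sketch.

```python
import re

# Assumption for this sketch: "mid-word" means the quote sits directly
# between two Hebrew letters, alef..tav (U+05D0..U+05EA).
_HEBREW_LETTER = "[\u05D0-\u05EA]"
_MID_WORD_QUOTE = re.compile(f'(?<={_HEBREW_LETTER})"(?={_HEBREW_LETTER})')

GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM


def normalize_gershayim(text: str) -> str:
    """Replace an ASCII double quote between two Hebrew letters with U+05F4.

    Quotes at a word boundary (start or end of a phrase) are left untouched,
    so ordinary phrase quoting keeps its meaning for the query parser.
    """
    return _MID_WORD_QUOTE.sub(GERSHAYIM, text)
```

Applied to query text before parsing, an acronym written with an ASCII quote in the middle would come out with U+05F4 instead, while a quoted phrase such as `"hello world"` would pass through unchanged. Whether this belongs in the application or in the query parser is exactly what the thread disagrees about.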