RE: 2.3.2 -> 2.4.0 StandardTokenizer issue

Philip Puffinburger Sat, 21 Feb 2009 09:19:51 -0800

Thanks for the suggestion. We're going to go over all of this information
and these suggestions next week to decide what we want to do.

-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Saturday, February 21, 2009 11:52 AM
To: java-user@lucene.apache.org
Subject: Re: 2.3.2 -> 2.4.0 StandardTokenizer issue

that was just a suggestion as a quick hack...

it still won't really fix the problem because some character + accent
combinations don't have composed forms.
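
A minimal sketch of that limitation, assuming NFC composition through the
JDK's java.text.Normalizer (Java 6 and later). The class and method names are
made up for illustration, and the two sample strings are just convenient
examples: "e" plus COMBINING ACUTE ACCENT (U+0301) has a precomposed form
(U+00E9), while "q" plus the same accent does not.

```java
import java.text.Normalizer;

public class ComposedFormCheck {
    // Hypothetical helper: number of UTF-16 code units left after NFC composition.
    static int nfcLength(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC).length();
    }

    public static void main(String[] args) {
        String eAcute = "e\u0301"; // e + combining acute: composes to U+00E9
        String qAcute = "q\u0301"; // q + combining acute: no precomposed form

        System.out.println(nfcLength(eAcute)); // prints 1
        System.out.println(nfcLength(qAcute)); // prints 2
    }
}
```

So composing the text before tokenizing helps only for the pairs that happen
to have a single code point. The combining mark in the second string is still
there afterwards and can still split the token.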

even if you added the entire Combining Diacritical Marks block to the JFlex
grammar, it's still wrong... what needs to be supported is the
\p{Word_Break = Extend} property, and so on.
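
A rough sketch of the idea, not of StandardTokenizer or its JFlex grammar: a
toy splitter that keeps a combining mark attached to the token it follows. It
assumes that the general categories Mn, Mc and Me are a good enough stand-in
for Word_Break = Extend, which is only an approximation of the real property.
The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkAwareSplitter {
    // Approximation only: treats general categories Mn, Mc and Me as "extending".
    static boolean isExtendLike(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Splits on anything that is not a letter, a digit, or a mark following one.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            boolean keep = Character.isLetterOrDigit(cp)
                || (current.length() > 0 && isExtendLike(cp));
            if (keep) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // "q" + combining acute + "a" stays one token; "b" is a second token.
        System.out.println(tokens("q\u0301a b").size()); // prints 2
    }
}
```

The point is that the mark is accepted because of its property, whatever block
it lives in, so it does not depend on listing individual code point ranges in
the grammar.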

