Steven A Rowe wrote:
Korean has been treated differently from Chinese and Japanese since LUCENE-461 <https://issues.apache.org/jira/browse/LUCENE-461>. The grouping of Hangul with digits was introduced in this issue.
I did find LUCENE-461 during my search, and grouping the words together is certainly a lot better *if* there are spaces between them. I have found several cases where there are no spaces, but they are relatively rare, and the way it breaks the text now appears to produce better hits than when it was separating the characters out.
Really I was just wondering about the digits being mixed in. Maybe it's legitimate to have a digit in the middle of a sequence of Hangul, like when we have AB3F for a product code with Latin characters.
You're right though: doing it differently would require a lot of jiggery to restrict the ranges down to each language again, instead of using [:letter:], which is much more convenient.
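To make the digit question concrete, here is a rough sketch using plain java.util.regex rather than the actual grammar behind the tokenizer, so treat it as an illustration only. The class name, the pattern names, and the sample string are all made up for this example. It compares a rule that lets digits join a run of Hangul with a rule that keeps the two apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; this is NOT the Lucene tokenizer grammar.
class HangulTokenSketch {
    // Rule A: Hangul characters and decimal digits form one token together.
    static final Pattern HANGUL_WITH_DIGITS =
            Pattern.compile("[\\p{IsHangul}\\p{Nd}]+");

    // Rule B: a run of Hangul and a run of digits are separate tokens.
    static final Pattern HANGUL_OR_DIGITS =
            Pattern.compile("\\p{IsHangul}+|\\p{Nd}+");

    // Collect every match of the pattern in the text, in order.
    static List<String> tokens(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Two Hangul syllables, the digit 3, one more syllable,
        // a space, then three more syllables (written as escapes).
        String text = "\uD55C\uAD6D3\uC5B4 \uD14C\uC2A4\uD2B8";
        System.out.println(tokens(HANGUL_WITH_DIGITS, text));
        System.out.println(tokens(HANGUL_OR_DIGITS, text));
    }
}
```

With rule A the digit stays inside the first token, much like AB3F staying whole as a Latin product code. With rule B the same input splits into Hangul, digit, Hangul before the space. Restricting either rule to one script is easy here only because the regex engine knows script names; doing the same with explicit character ranges is where the jiggery comes in.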
Daniel

-- 
Daniel Noll
Senior Developer, Nuix
http://nuix.com/
Forensic and eDiscovery Software: the world's most advanced email data analysis and eDiscovery software