Kathey Marsden <[EMAIL PROTECTED]> writes:

> Does anyone know of an easy built-in Java mechanism for Locale
> sensitive matching?
>
> I continue to work with a user trying to develop a strategy for
> language based string type handling in Derby 10.1.
> The ordering seems doable with the approach in
> http://wiki.apache.org/db-derby/LanguageBasedOrdering
> For <, =, > comparisons I was able to implement a LOCALE_COMPARE
> function pretty easily using Collators as well,
> but matching (LIKE replacement) seems harder. For example in
> Norwegian we need to have "aa" be treated as one character and in the
> US have it treated as two. So given the values acorn, aacorn, and
> aass (a Norwegian brewery), and matching "a.*", we should see three
> rows in English and just one in Norwegian.

[snip]

Hi Kathey,

It is true that in a Norwegian phone book, Wanvik is listed before
Waagan. However, Haas (which is not a Norwegian name) would be listed
before Hatlen. Likewise, geographical names from other countries could
have "aa" which should be treated as two characters in Norwegian
(Saarland, Saarbrücken, Haag).

Also, you could have composite words like "pizzaauksjon" (pizza
auction - whatever that is) which would be listed before "pizzabakar"
(pizza baker) in a dictionary. You could also have words where the
stem ends with an a and the ending starts with an a, like "dataa"
which consists of "data" (same word as in English) and "a" (definite
article, plural, neuter).

It is not possible to decide how "aa" should be treated without
knowing the context, so in general I think it is best if Derby just
treats "aa" as two characters and lets the application do the magic if
magic is required.

But, as many others have said, IANAL... (I Am Not A Linguist)

--
Knut Anders
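
PS: To make the phone book examples concrete, here is a minimal sketch using java.text.RuleBasedCollator. The rule strings are made up for illustration (lower-case a..z only, optionally with "aa" as a single letter after z); they are not the rules Java or Derby ship for any Norwegian locale, and the class and method names are hypothetical. It only shows that a blanket "aa is one character" rule gets Wanvik/Waagan right and Haas/Hatlen wrong.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical demo class; not part of Derby.
final class AaContractionDemo {

    // Builds a collator from illustrative rules: "< a < b ... < z",
    // optionally followed by "< aa" so that "aa" sorts as one letter after "z".
    static RuleBasedCollator buildCollator(boolean aaIsOneLetter) {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(' ');
        }
        if (aaIsOneLetter) {
            rules.append("< aa");
        }
        try {
            return new RuleBasedCollator(rules.toString());
        } catch (ParseException e) {
            throw new IllegalStateException("Bad collation rules", e);
        }
    }

    // The rules above only cover lower-case letters, so fold case first.
    static int compare(Collator collator, String left, String right) {
        return collator.compare(left.toLowerCase(Locale.ROOT),
                                right.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Collator twoLetters = buildCollator(false);
        Collator oneLetter = buildCollator(true);
        String[][] pairs = {
            {"Wanvik", "Waagan"},
            {"Haas", "Hatlen"},
            {"pizzaauksjon", "pizzabakar"},
        };
        for (String[] p : pairs) {
            // A negative result means the first word sorts before the second.
            System.out.println(p[0] + " vs " + p[1]
                + ": aa as two letters = " + compare(twoLetters, p[0], p[1])
                + ", aa as one letter = " + compare(oneLetter, p[0], p[1]));
        }
    }
}
```

With "aa" as one letter, Wanvik should sort before Waagan as in the phone book, but Haas then sorts after Hatlen and pizzaauksjon after pizzabakar, which is the wrong way round. Neither rule set is right for every word, which is why I would leave that decision to the application.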
