Hmm, no response for a long time. I have excluded these testcases so that HUT passes, at r646187.
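
For anyone skimming the thread, here is a minimal sketch of the Locale.toString() behavior that the HARMONY-5465 fix aligned with RI. The class and method names (LocaleVariantDemo, describe) are made up for illustration; only java.util.Locale is real API. With an empty country field, the separator for the country is still emitted, which is where the double underscore comes from.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Harmony or its test suite.
class LocaleVariantDemo {

    // Builds a Locale from its three fields and returns its string form.
    @SuppressWarnings("deprecation") // the Locale constructors are deprecated in recent JDKs
    static String describe(String language, String country, String variant) {
        return new Locale(language, country, variant).toString();
    }

    public static void main(String[] args) {
        // Empty country, non-empty variant: two underscores.
        System.out.println(describe("es", "", "TRADITIONAL")); // es__TRADITIONAL
        // Country only: one underscore, no trailing separator.
        System.out.println(describe("es", "ES", ""));          // es_ES
    }
}
```

As described in the quoted thread, ICU uses this string as its lookup key, so the single-underscore form never matched the traditional Spanish data.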
On 2/21/08, Tony Wu <[EMAIL PROTECTED]> wrote:
> A little further study.
>
> The collation is defined in CLDR. Please refer to the data in locale
> "es" [1]. There is a block describing the traditional collation. I
> quote a part of it below[2]. Let me try to explain a little bit about
> this definition.
>
> First, the term "traditional" is explicitly defined. You can also find
> the definition in UTS#35[3] which says "For a traditional-style sort
> (as in Spanish) ".
>
> Second, the data[2] indicates that the rule in traditional Spanish
> locale should be ... C<ch<<<Ch<<<CH. The tag <p> is "primary", which
> is to say the "ch" is a base-character.
>
> The conclusion is there IS a traditional Spanish collation rule which
> has a key "ch". The question is "Is it necessary for Harmony to
> support it or just to have the same behavior as RI?"
>
> [1]
> http://www.unicode.org/repository/*checkout*/cldr/common/collation/es.xml?rev=1.21
>
> [2]
> <collation type="traditional">
> - <rules>
> ...
> <reset>C</reset>
> <p>ch</p>
> <t>Ch</t>
> <t>CH</t>
> ...
> </rules>
> </collation>
>
> [3]
> http://www.unicode.org/reports/tr35/
>
>
> On 2/20/08, Alexei Zakharov <[EMAIL PROTECTED]> wrote:
> > ¡Buenos días!
> >
> > :) No, I'm not an expert in Spanish. But after reading your post I got
> > an impression that we have support for additional variant of Spanish
> > language comparing to RI. However, I've tried to find something about
> > traditional Spanish variant in ICU locale browser and found nothing. I
> > believe we should learn more about this problem before making any
> > decision.
> >
> > Regards,
> > Alexei
> >
> > 2008/2/19, Tony Wu <[EMAIL PROTECTED]>:
> > > Hi, all
> > >
> > > I'm investigating the regression[1] in text module. Actually these 5
> > > failures come down to one reason: the support of traditional Spanish
> > > character "ch". Following is my understanding.
> > >
> > > My fix for HARMONY-5465 makes the Locale.toString be compatible with
> > > RI.
> > > Before my commit, the toString() of the Locale with empty "country"
> > > field has only one underscore in the output but RI has two. For
> > > instance, new Locale("es","","TRADITIONAL").toString() returns
> > > "es_TRADITIONAL" in Harmony whereas "es__TRADITIONAL" in RI. Something
> > > interesting, ICU makes use of the output of toString() as keyword to
> > > indicate its Locale instance. That is to say, the 5 testcases passed
> > > before because they have not been tested in real traditional Spanish
> > > locale so that the character "ch" was interpreted as two separate
> > > characters "c" and "h". That is why we can set the offset to 1 in our
> > > testcases. After my commit, ICU finds the right Spanish locale so that
> > > its behavior is compatible with spec[2].
> > >
> > > One thing strange is that I can not get the traditional Spanish locale
> > > in RI. RI behaves the same no matter whether there is a variant
> > > "TRADITIONAL" or not. Spec does not say anything about the
> > > "traditional", but I googled to know that from 1998 the character "ch"
> > > has been cancelled in Spanish. I suppose that RI changed the behavior
> > > of Spanish locale but forgot to modify the spec accordingly.
> > >
> > > BTW for the normal Spanish Locale(new Locale("es","ES")), we have the
> > > same behavior with RI. Seems ICU supports the traditional Spanish in
> > > the form of new Locale("es","","TRADITIONAL") but RI does not. Run
> > > testcase below[3] on RI to show the differences.
> > >
> > > Is there any expert familiar with Spanish here? Need your advice.
> > >
> > > [1]
> > > http://people.apache.org/~smishura/r628209/Windows_x86/classlib-test/
> > >
> > > [2]
> > > spec says,
> > > For example, consider the following in Spanish:
> > >
> > > "ca" -> the first key is key('c') and second key is key('a').
> > > "cha" -> the first key is key('ch') and second key is key('a').
> > >
> > >
> > > [3]
> > > RuleBasedCollator rbColl = (RuleBasedCollator) Collator
> > >         .getInstance(new Locale("es", "", "TRADITIONAL"));
> > > String text = "cha";
> > > CollationElementIterator iterator = rbColl
> > >         .getCollationElementIterator(text);
> > > int keyNum = 0;
> > > while (iterator.next() != -1) {
> > >     keyNum++;
> > > }
> > > System.out.println("RI has " + keyNum + " keys");
> > >
> > > com.ibm.icu.text.RuleBasedCollator r =
> > >         (com.ibm.icu.text.RuleBasedCollator) com.ibm.icu.text.Collator
> > >         .getInstance(new Locale("es", "", "TRADITIONAL"));
> > > com.ibm.icu.text.CollationElementIterator it = r
> > >         .getCollationElementIterator(text);
> > > keyNum = 0;
> > > while (it.next() != -1) {
> > >     keyNum++;
> > > }
> > > System.out.println("ICU has " + keyNum + " keys");
> > >
> > >
> > >
> > > The output is:
> > > RI has 3 keys
> > > ICU has 2 keys
> > >
> > >
> > > --
> > > Tony Wu
> > > China Software Development Lab, IBM
> > >
> >
>
>
> --
> Tony Wu
> China Software Development Lab, IBM
>

--
Tony Wu
China Software Development Lab, IBM
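
P.S. To make the "ch as one key" idea from the quoted spec text concrete without depending on ICU or on any "es" locale data, here is a rough sketch that tailors a java.text.RuleBasedCollator by hand. The tailoring string follows the example given in the RuleBasedCollator Javadoc; the class and method names (ContractionDemo, baseCollator, withChContraction, countElements) are hypothetical, and this only illustrates the contraction rule, it is not how ICU or Harmony load the CLDR "traditional" data.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical demo class, not part of Harmony, ICU or the failing tests.
class ContractionDemo {

    // Plain en_US collator: "c" and "h" are unrelated letters here.
    static RuleBasedCollator baseCollator() {
        return (RuleBasedCollator) Collator.getInstance(Locale.US);
    }

    // Same rules plus a contraction that sorts "ch" as a unit right after "C".
    static RuleBasedCollator withChContraction() {
        String addOn = "& C < ch, cH, Ch, CH";
        try {
            return new RuleBasedCollator(baseCollator().getRules() + addOn);
        } catch (ParseException e) {
            throw new IllegalStateException("tailoring rule rejected", e);
        }
    }

    // Counts collation elements the same way the testcase in the thread does.
    static int countElements(RuleBasedCollator collator, String text) {
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        RuleBasedCollator plain = baseCollator();
        RuleBasedCollator tailored = withChContraction();
        // Without the contraction "cha" sorts before "cz"; with it, after.
        System.out.println("plain:    " + plain.compare("cha", "cz"));
        System.out.println("tailored: " + tailored.compare("cha", "cz"));
        System.out.println("plain elements:    " + countElements(plain, "cha"));
        System.out.println("tailored elements: " + countElements(tailored, "cha"));
    }
}
```

The comparison is the part to look at: once "ch" is a contraction, "cha" moves after "cz" because its first key is no longer the key for "c".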
