Actually, I was wrong when I fixed the first "bug" you reported. It is actually .NET that is buggy, though unlike older Mono it doesn't result in an unhandled exception.
http://demo.icu-project.org/icu-bin/nbrowser?t=\u03B1\u0313\u0345&s=&uv=0

To examine the C# implementation, try the code below:

    foreach (char c in "\u03B1\u0313\u0345".Normalize ())
        Console.Write ("{0:X04} ", (int) c);

.NET outputs:

    03B1 0313 0345

I have a fix that corrects the output to:

    1F80

I'll check in the fix soon. With the fix your test prints all "True".

Atsushi Eno

Atsushi Eno wrote:
> Hi Tom, and Tom :)
>
> I have tried the Hindle version of the test.
>
> Summary: the sample depends on a .NET bug; 2 .NET bugs, 1 Mono bug.
>
> This exactly shows that .NET normalization is buggy. Here is the
> result from ICU normalization results:
> http://demo.icu-project.org/icu-bin/nbrowser?t=\u00e1bc&s=&uv=0
>
> i.e. in NFKD, \u00e1bc must be decomposed to \u0061\u0301\u0062\u0063,
> while .NET returns the same string as the input.
>
> The sample code is confusing because it uses each "styleName" output
> as the next input. .NET does not correctly decompose it to
> \u0061\u0301\u0062\u0063, while Mono is correct. When it runs on Mono,
> it keeps using the correct NFKD as the next input to the following
> normalizations, and hence the difference in NFKC (i.e. we have no bug in
> normalizing the NFKC string, unlike what the test claims).
>
> I have created a modification that makes this more visible:
> http://pastebin.ca/1465907
>
> There does seem to be a Mono bug, though, in NFD-to-NFC and
> NFKD-to-NFKC composition. I have extracted a simpler test:
>
>     string s1 = "\u0061\u0301bc";
>     string s2 = "\u00e1bc";
>     Console.WriteLine (s1.Normalize () == s2);
>
> *Both* Mono and .NET say "False", but it must be "True". See the
> ICU conversion results:
> http://demo.icu-project.org/icu-bin/nbrowser?t=\u0061\u0301bc&s=&uv=0
> Its NFC must be \u00e1\u0062\u0063 (the string s2 above).
>
> I'll work on fixing the composition part of the issue.
>
> I haven't tried the Philpot version as I have never installed
> mbunit on this Windows machine - it'd be nicer if the sample just
> compiled and ran with only the standard libs, to make it possible to
> integrate it into our nunit tests.
>
> Atsushi Eno
>
>
> Tom Hindle wrote:
>> Attached is my small self-contained test case.
>> I think the output should be 5 trues.
>>
>> I am getting 2 Trues and 3 Fails on mono version r136435.
>>
>> Incidentally .NET returns 5 trues for this test case.
>>
>> Is there a Bugzilla entry for this issue?
>>
>>
>>
>> Also normalization-tables.h now has Windows line endings (CRLF).
>>
>> Thanks
>> Tom
>>
>> On Thu, 2009-06-18 at 13:51 -0700, Tom Philpot wrote:
>>> Here is a revision of the test case I sent earlier to the list that
>>> doesn't rely on any specific encoding (only uses '\uXXXX' characters).
>>>
>>> Hopefully this will be helpful.
>>>
>>> Tom
>>>
>>>
>>> On 6/18/09 1:49 PM, "Tom Hindle" <tom_hin...@sil.org> wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> With regard to recent Normalization changes I have just run our test
>>>> suite with recent mono r136422 - and am getting a number of
>>>> regressions.
>>>>
>>>>
>>>> For example:
>>>>
>>>> {
>>>>     string styleName = "\u00e1bc";
>>>>     StStyle style = new StStyle();
>>>>     Cache.LangProject.StylesOC.Add(style);
>>>>     style.Name = styleName;
>>>>
>>>>     FwStyleSheet.StyleInfoCollection styleCollection = new
>>>>         FwStyleSheet.StyleInfoCollection();
>>>>     styleCollection.Add(new BaseStyleInfo(style));
>>>>
>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormC)));
>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormD)));
>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKC)));
>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKD)));
>>>> }
>>>>
>>>> is now failing, as well as other larger unit tests.
>>>>
>>>> I will look into this further to try and produce an example test
>>>> program that doesn't contain references to our code base.
>>>>
>>>> Thanks
>>>> Tom
>>>>
>>>> On Thu, 2009-06-18 at 15:01 +0900, Atsushi Eno wrote:
>>>>> Hi,
>>>>>
>>>>> If you mean the test cases in the previous email, then that's what
>>>>> (I said) includes raw native encoding in your land (Latin1?) and is
>>>>> what I cannot read. Replace them all with ASCII representation
>>>>> (\uxxxx).
>>>>> Even if the attachment includes encoding (you mean BOMs?), it is
>>>>> not readable in some environments (like the text editor I use on
>>>>> Windows). Let me repeat, Latin1 is not universal. Don't depend on
>>>>> it (if you do).
>>>>>
>>>>> Atsushi Eno
>>>>>
>>>>>
>>>>> Tom Philpot wrote:
>>>>>> Atsushi,
>>>>>>
>>>>>> Thanks for the feedback. For some reason, the Mac when displaying
>>>>>> unicode always composes strings before display. I'll look at the
>>>>>> test case in corlib tomorrow when I get in to work. Would it be
>>>>>> helpful for the test cases if I gave you both the formD bytes and
>>>>>> the formC bytes that I think are correct for the test case I sent?
>>>>>> Perhaps the encoding did not come across in the attachment.
>>>>>>
>>>>>> We have a workaround for the Mac port of our app which would
>>>>>> require overriding string.Normalize to p/invoke to Mac OS X's
>>>>>> NSString library to do normalization. It would work, but we would
>>>>>> prefer not to have to ship a custom build of Mono. The
>>>>>> normalization on .NET appears to be "good enough" for our purposes
>>>>>> and we'd just like our Mac version to be consistent.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Atsushi Eno [mailto:atsushi...@veritas-vos-liberabit.com]
>>>>>> Sent: Wed 6/17/2009 7:51 PM
>>>>>> To: Tom Philpot
>>>>>> Cc: mono-devel-l...@ximian.com
>>>>>> Subject: Re: [Mono-dev] Unhandled Exception in Normalization.cs Combine()
>>>>>>
>>>>>> You seem to have embedded raw native encoding in your land that
>>>>>> is *not* understandable in Japan. Anyways the input string you
>>>>>> posted in the previous sample was already in FormC which will
>>>>>> look like "doing nothing" as the conversion results.
>>>>>>
>>>>>> There is a standalone normalization test generated from
>>>>>> normalization conformance test in
>>>>>> corlib/Mono.Globalization.Unicode. We fail about 26000. Far from
>>>>>> good, but still better than 35000 on .NET.
>>>>>>
>>>>>> Atsushi Eno
>>>>>>
>>>>>> Tom Philpot wrote:
>>>>>>> Now, string.Normalize(NormalizationForm.FormC) doesn't do
>>>>>>> anything using mono (r136228).
>>>>>>>
>>>>>>> I've attached some test cases which will hopefully help in
>>>>>>> tracking down what doesn't work.
>>>>>>>
>>>>>>> On 6/15/09 1:58 AM, "Atsushi Eno"
>>>>>>> <atsushi...@veritas-vos-liberabit.com> wrote:
>>>>>>>
>>>>>>>> Hi again,
>>>>>>>>
>>>>>>>> It should be now fixed in trunk.
>>>>>>>>
>>>>>>>> Atsushi Eno
>>>>>>>>
>>>>>>>> Atsushi Eno wrote:
>>>>>>>>> I'll have a look. However since 4 years have passed since I
>>>>>>>>> wrote it, I'll have to revisit the spec and will take not a
>>>>>>>>> little time.
>>>>>>>>>
>>>>>>>>> Atsushi Eno
>>>>>>>>>
>>>>> _______________________________________________
>>>>> Mono-devel-list mailing list
>>>>> Mono-devel-list@lists.ximian.com
>>>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
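
For anyone retracing the checks discussed in this thread, here is a minimal standalone sketch (not taken from any of the messages or attachments) that prints the code units of each normalization form for the three strings mentioned above. The class name NormalizationCheck and the helper Hex are made up for illustration; only string.Normalize and NormalizationForm come from the framework. Each form is computed from the original input rather than from a previous result, which avoids the chaining confusion described above. What it prints depends on the runtime; per the ICU results linked in the thread, NFC of \u03B1\u0313\u0345 should be 1F80, NFKD of \u00e1bc should be 0061 0301 0062 0063, and NFC of \u0061\u0301bc should be 00E1 0062 0063.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
static class NormalizationCheck
{
    // Formats each UTF-16 code unit as four hex digits separated by
    // spaces, e.g. "03B1 0313 0345".
    public static string Hex (string s)
    {
        var sb = new StringBuilder ();
        foreach (char c in s) {
            if (sb.Length > 0)
                sb.Append (' ');
            sb.Append (((int) c).ToString ("X4"));
        }
        return sb.ToString ();
    }

    static void Main ()
    {
        // The three inputs discussed in the thread.
        string [] inputs = { "\u03B1\u0313\u0345", "\u00e1bc", "\u0061\u0301bc" };
        NormalizationForm [] forms = {
            NormalizationForm.FormC, NormalizationForm.FormD,
            NormalizationForm.FormKC, NormalizationForm.FormKD
        };

        foreach (string input in inputs) {
            Console.WriteLine ("input: " + Hex (input));
            // Always normalize the original input, never a previous result.
            foreach (NormalizationForm form in forms)
                Console.WriteLine ("  {0,-6} {1}", form, Hex (input.Normalize (form)));
        }
    }
}
```

Comparing this output against the ICU pages linked above, form by form, should show which runtime deviates and in which direction (composition or decomposition).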