Atsushi,

That's great - our unit tests that involve Normalization stuff now pass with mono r136521.
Regression squashed from our perspective!
Thanks very much!

Tom

On Fri, 2009-06-19 at 19:04 +0900, Atsushi Eno wrote:
> Actually I was wrong in fixing the first "bug" you reported. It was
> actually .NET which is buggy, though unlike older Mono it doesn't result
> in an unhandled exception.
>
> http://demo.icu-project.org/icu-bin/nbrowser?t=\u03B1\u0313\u0345&s=&uv=0
>
> To examine the C# implementation, try the following:
>
>     foreach (char c in "\u03B1\u0313\u0345".Normalize ())
>         Console.Write ("{0:X04} ", (int) c);
>
> .NET outputs: 03B1 0313 0345
>
> I have a fix that corrects the output to: 1F80
>
> I'll check in the fix soon. With the fix, your test prints all "True".
>
> Atsushi Eno
>
>
> Atsushi Eno wrote:
> > Hi Tom, and Tom :)
> >
> > I have tried the Hindle version of the test.
> >
> > Summary: the sample depends on a .NET bug; there are 2 .NET bugs and
> > 1 Mono bug.
> >
> > This shows exactly that .NET Normalization is buggy. Here are the
> > ICU normalization results:
> > http://demo.icu-project.org/icu-bin/nbrowser?t=\u00e1bc&s=&uv=0
> >
> > i.e. in NFKD, \u00e1bc must be decomposed to \u0061\u0301\u0062\u0063,
> > while .NET returns the same string as the input.
> >
> > The sample code is confusing because it uses the "styleName" output
> > as the next input. .NET does not correctly decompose it to
> > \u0061\u0301\u0062\u0063, while Mono is correct. When it runs on Mono,
> > it keeps using the correct NFKD as the input to the following
> > normalizations, hence the difference in NFKC (i.e. we have no bug in
> > normalizing an NFKC string, unlike what the test claims).
> >
> > I have created a slightly more visible modification here:
> > http://pastebin.ca/1465907
> >
> > However, there seems to be a Mono bug in NFD-to-NFC and NFKD-to-NFKC
> > composition. I have extracted a simpler test:
> >
> >     string s1 = "\u0061\u0301bc";
> >     string s2 = "\u00e1bc";
> >     Console.WriteLine (s1.Normalize () == s2);
> >
> > *Both* Mono and .NET say "False", but it must be "True".
> > See the ICU conversion results:
> > http://demo.icu-project.org/icu-bin/nbrowser?t=\u0061\u0301bc&s=&uv=0
> > Its NFC must be \u00e1\u0062\u0063 (the string s2 above).
> >
> > I'll work on fixing the composition part of the issue.
> >
> > I haven't tried the Philpot version, as I have never installed
> > mbunit on this Windows machine - it'd be nicer if the sample
> > compiled and ran with just the standard libs, so that we could
> > integrate it into our nunit tests.
> >
> > Atsushi Eno
> >
> >
> > Tom Hindle wrote:
> >> Attached is my small, self-contained test case.
> >> I think the output should be 5 Trues.
> >>
> >> I'm getting 2 Trues and 3 Falses on mono version r136435.
> >>
> >> Incidentally, .NET returns 5 Trues for this test case.
> >>
> >> Is there a Bugzilla entry for this issue?
> >>
> >> Also, normalization-tables.h now has Windows line endings (CRLF).
> >>
> >> Thanks
> >> Tom
> >>
> >> On Thu, 2009-06-18 at 13:51 -0700, Tom Philpot wrote:
> >>> Here is a revision of the test case I sent earlier to the list that
> >>> doesn't rely on any specific encoding (it only uses '\uXXXX' characters).
> >>>
> >>> Hopefully this will be helpful.
> >>>
> >>> Tom
> >>>
> >>>
> >>> On 6/18/09 1:49 PM, "Tom Hindle" <tom_hin...@sil.org> wrote:
> >>>
> >>>> Hi Guys,
> >>>>
> >>>> With regard to the recent Normalization changes, I have just run our
> >>>> test suite with a recent mono (r136422) and am getting a number of
> >>>> regressions.
> >>>> > >>>> > >>>> For example: > >>>> > >>>> { > >>>> string styleName = "\u00e1bc"; > >>>> StStyle style = new StStyle(); > >>>> Cache.LangProject.StylesOC.Add(style); > >>>> style.Name = styleName; > >>>> > >>>> FwStyleSheet.StyleInfoCollection styleCollection = new > >>>> FwStyleSheet.StyleInfoCollection(); > >>>> styleCollection.Add(new BaseStyleInfo(style)); > >>>> > >>>> > >>> Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.F > >>>> ormC))); > >>> Assert.IsTrue(styleCollection.Contains(styleName.Normalize(Normalizat > >>>> ionForm.FormD))); > >>> Assert.IsTrue(styleCollection.Contains(styleName.Normalize > >>>> (NormalizationForm.FormKC))); > >>> Assert.IsTrue(styleCollection.Contains(styleName > >>>> .Normalize(NormalizationForm.FormKD))); > >>>> } > >>>> > >>>> is now failing, as well as other larger unit tests. > >>>> > >>>> I will look info this further to try and produce an example test > >>> program > >>>> that doesn't contain references to our code base. > >>>> > >>>> Thanks > >>>> Tom > >>>> > >>>> On Thu, 2009-06-18 at 15:01 +0900, Atsushi Eno wrote: > >>>>> Hi, > >>>>> > >>>>> If you mean the test cases by the previous email, then that's what > >>>>> (I said) includes raw native encoding in your land (Latin1?) and is > >>>>> what I cannot read. Replace them all with ASCII representation > >>> (\uxxxx). > >>>>> Even if the attachment includes encoding (you mean BOMs?), it is > >>>>> not readable in some environment (like the text editor I use on > >>>>> Windows). Let me repeat, Latin1 is not universal. Don't depend on > >>> it > >>>>> (if you do). > >>>>> > >>>>> Atsushi Eno > >>>>> > >>>>> > >>>>> Tom Philpot wrote: > >>>>>> Atsushi, > >>>>>> > >>>>>> Thanks for the feedback. For some reason, the Mac when displaying > >>>>>> unicode always composes strings before display. I'll look at the > >>> test > >>>>>> case in corlib tomorrow when I get in to work. 
> >>>>>> Would it be helpful for
> >>>>>> the test cases if I gave you both the FormD bytes and the FormC bytes
> >>>>>> that I think are correct for the test case I sent? Perhaps the encoding
> >>>>>> did not come across in the attachment.
> >>>>>>
> >>>>>> We have a workaround for the Mac port of our app, which would require
> >>>>>> overriding string.Normalize to p/invoke into Mac OS X's NSString library
> >>>>>> to do normalization. It would work, but we would prefer not to have to
> >>>>>> ship a custom build of Mono. The normalization on .NET appears to be
> >>>>>> "good enough" for our purposes, and we'd just like our Mac version to be
> >>>>>> consistent.
> >>>>>>
> >>>>>> Tom
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Atsushi Eno [mailto:atsushi...@veritas-vos-liberabit.com]
> >>>>>> Sent: Wed 6/17/2009 7:51 PM
> >>>>>> To: Tom Philpot
> >>>>>> Cc: mono-devel-l...@ximian.com
> >>>>>> Subject: Re: [Mono-dev] Unhandled Exception in Normalization.cs Combine()
> >>>>>> You seem to have embedded raw native encoding from your locale, which
> >>>>>> is *not* readable in Japan. Anyway, the input string you
> >>>>>> posted in the previous sample was already in FormC, so the
> >>>>>> conversion result will look like it is "doing nothing".
> >>>>>>
> >>>>>> There is a standalone normalization test generated from the normalization
> >>>>>> conformance test in corlib/Mono.Globalization.Unicode. We fail
> >>>>>> about 26000 cases. Far from good, but still better than 35000 on .NET.
> >>>>>>
> >>>>>> Atsushi Eno
> >>>>>>
> >>>>>> Tom Philpot wrote:
> >>>>>>> Now, string.Normalize(NormalizationForm.FormC) doesn't do anything using
> >>>>>>> mono (r136228).
> >>>>>>>
> >>>>>>> I've attached some test cases, which will hopefully help in tracking down
> >>>>>>> what doesn't work.
> >>>>>>>
> >>>>>>> On 6/15/09 1:58 AM, "Atsushi Eno" <atsushi...@veritas-vos-liberabit.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi again,
> >>>>>>>>
> >>>>>>>> It should now be fixed in trunk.
> >>>>>>>>
> >>>>>>>> Atsushi Eno
> >>>>>>>>
> >>>>>>>> Atsushi Eno wrote:
> >>>>>>>>> I'll have a look. However, since 4 years have passed since I wrote it,
> >>>>>>>>> I'll have to revisit the spec, and it will take a fair amount of time.
> >>>>>>>>>
> >>>>>>>>> Atsushi Eno
> >>>>>>>>>
> >>>>> _______________________________________________
> >>>>> Mono-devel-list mailing list
> >>>>> Mono-devel-list@lists.ximian.com
> >>>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list