Atsushi,

That's great - our unit tests that involve Normalization stuff now pass
with mono r136521.

Regression squashed from our perspective!

Thanks very much!
Tom

On Fri, 2009-06-19 at 19:04 +0900, Atsushi Eno wrote:
> Actually I was wrong at fixing the first "bug" you reported. It was
> actually .NET which is buggy, though unlike older Mono it doesn't result
> in an unhandled exception.
> 
> http://demo.icu-project.org/icu-bin/nbrowser?t=\u03B1\u0313\u0345&s=&uv=0
> 
> To examine the C# implementation, try the code below:
> 
>       foreach (char c in "\u03B1\u0313\u0345".Normalize ())
>               Console.Write ("{0:X04} ", (int) c);
> 
> .NET outputs: 03B1 0313 0345
> 
> I have a fix that corrects the output as: 1F80
> 
> I'll check in the fix soon. With the fix your test prints all "True".
> 
> Atsushi Eno
> 
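To see all four forms of the same input side by side, a minimal sketch along the lines of the loop above might look like this. The class name `NormalizationDump` and the helper `Hex` are made up for illustration; only `string.Normalize` and `NormalizationForm` come from the framework. No output is listed because it depends on the runtime under test; per the ICU page linked above, the FormC line should read 1F80.

```csharp
using System;
using System.Text;

class NormalizationDump
{
	// Hypothetical helper: formats each UTF-16 code unit of s as four
	// hex digits separated by spaces, e.g. "03B1 0313 0345".
	static string Hex (string s)
	{
		var sb = new StringBuilder ();
		foreach (char c in s) {
			if (sb.Length > 0)
				sb.Append (' ');
			sb.Append (((int) c).ToString ("X4"));
		}
		return sb.ToString ();
	}

	static void Main ()
	{
		// Same input as in the message above.
		string input = "\u03B1\u0313\u0345";

		// Normalize the original input once per form (FormC, FormD,
		// FormKC, FormKD) and print the resulting code units.
		foreach (NormalizationForm form in Enum.GetValues (typeof (NormalizationForm)))
			Console.WriteLine ("{0,-6} {1}", form, Hex (input.Normalize (form)));
	}
}
```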
> 
> Atsushi Eno wrote:
> > Hi Tom, and Tom :)
> > 
> > I have tried the Hindle version of the test.
> > 
> > Summary: the sample depends on .NET bug; 2 .NET bugs, 1 mono bug.
> > 
> > This exactly shows that .NET Normalization is buggy. Here is the
> > result from ICU normalization results:
> > http://demo.icu-project.org/icu-bin/nbrowser?t=\u00e1bc&s=&uv=0
> > 
> > i.e. in NFKD, \u00e1bc must be decomposed to \u0061\u0301\u0062\u0063,
> > while .NET returns the same string as the input.
> > 
> > The sample code is confusing because it feeds each "styleName" output
> > into the next normalization as input. .NET does not correctly decompose it to
> > \u0061\u0301\u0062\u0063, while Mono does. When the test runs on Mono,
> > it keeps using the correct NFKD result as the input to the following
> > normalizations, hence the difference in NFKC (i.e. we have no bug in
> > normalizing an NFKC string, contrary to what the test claims).
> > 
> > I have created a modification that makes this a bit more visible:
> > http://pastebin.ca/1465907
> > 
> > However, there seems to be a Mono bug in NFD-to-NFC and NFKD-to-NFKC
> > composition. I have extracted a simpler test:
> > 
> >     string s1 = "\u0061\u0301bc";
> >     string s2 = "\u00e1bc";
> >     Console.WriteLine (s1.Normalize () == s2);
> > 
> > *Both* Mono and .NET say "False", but it must be "True". See the
> > ICU conversion results:
> > http://demo.icu-project.org/icu-bin/nbrowser?t=\u0061\u0301bc&s=&uv=0
> > Its NFC must be \u00e1\u0062\u0063 (the string s2 above).
> > 
> > I'll work on fixing the composition part of the issue.
> > 
> > I haven't tried the Philpot version as I have never installed
> > mbunit on this Windows machine - it'd be nicer if the sample just
> > compiles and runs with only the standard libs, so that it can be
> > integrated into our nunit tests.
> > 
> > Atsushi Eno
> > 
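The point about not chaining results can be sketched as follows: every comparison starts from the original literals, so one wrong result cannot leak into the next check. This is only an illustration under that assumption, not the attached test; the class name `RoundTripCheck` is made up. Going by the ICU results linked above, a conforming implementation should print True on every line.

```csharp
using System;
using System.Text;

class RoundTripCheck
{
	static void Main ()
	{
		string decomposed = "\u0061\u0301bc"; // 'a' + combining acute accent, then "bc"
		string composed = "\u00e1bc";         // precomposed a-acute, then "bc"

		// Each call normalizes one of the original literals; no result is
		// reused as the input of a later normalization.
		// The == operator on strings is an ordinal comparison.
		Console.WriteLine ("NFC:  {0}", decomposed.Normalize (NormalizationForm.FormC) == composed);
		Console.WriteLine ("NFD:  {0}", composed.Normalize (NormalizationForm.FormD) == decomposed);
		Console.WriteLine ("NFKC: {0}", decomposed.Normalize (NormalizationForm.FormKC) == composed);
		Console.WriteLine ("NFKD: {0}", composed.Normalize (NormalizationForm.FormKD) == decomposed);
	}
}
```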
> > 
> > Tom Hindle wrote:
> >> Attached is my small self-contained test case.
> >> I think the output should be 5 Trues.
> >>
> >> I am getting 2 Trues and 3 Fails on mono version r136435.
> >>
> >> Incidentally, .NET returns 5 Trues for this test case.
> >>
> >> Is there a Bugzilla entry for this issue?
> >>
> >>
> >>
> >> Also, normalization-tables.h now has Windows line endings (CRLF).
> >>
> >> Thanks
> >> Tom
> >>
> >> On Thu, 2009-06-18 at 13:51 -0700, Tom Philpot wrote:
> >>> Here is a revision of the test case I sent earlier to the list that doesn't
> >>> rely on any specific encoding (only uses '\uXXXX' characters).
> >>>
> >>> Hopefully this will be helpful.
> >>>
> >>> Tom
> >>>
> >>>
> >>> On 6/18/09 1:49 PM, "Tom Hindle" <tom_hin...@sil.org> wrote:
> >>>
> >>>> Hi Guys,
> >>>>
> >>>> With regard to the recent Normalization changes, I have just run our test
> >>>> suite with recent mono r136422 and we are getting a number of
> >>>> regressions.
> >>>>
> >>>>
> >>>> For example:
> >>>>
> >>>> {
> >>>>     string styleName = "\u00e1bc";
> >>>>     StStyle style = new StStyle();
> >>>>     Cache.LangProject.StylesOC.Add(style);
> >>>>     style.Name = styleName;
> >>>>
> >>>>     FwStyleSheet.StyleInfoCollection styleCollection = new FwStyleSheet.StyleInfoCollection();
> >>>>     styleCollection.Add(new BaseStyleInfo(style));
> >>>>
> >>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormC)));
> >>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormD)));
> >>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKC)));
> >>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKD)));
> >>>> }
> >>>>
> >>>> is now failing, as well as other larger unit tests.
> >>>>
> >>>> I will look into this further to try and produce an example test program
> >>>> that doesn't contain references to our code base.
> >>>>
> >>>> Thanks
> >>>> Tom
> >>>>
> >>>> On Thu, 2009-06-18 at 15:01 +0900, Atsushi Eno wrote:
> >>>>> Hi,
> >>>>>
> >>>>> If you mean the test cases from the previous email, then that's what
> >>>>> (as I said) includes raw native encoding of your locale (Latin1?) and is
> >>>>> what I cannot read. Replace them all with an ASCII representation (\uxxxx).
> >>>>> Even if the attachment includes the encoding (you mean BOMs?), it is
> >>>>> not readable in some environments (like the text editor I use on
> >>>>> Windows). Let me repeat, Latin1 is not universal. Don't depend on it
> >>>>> (if you do).
> >>>>>
> >>>>> Atsushi Eno
> >>>>>
> >>>>>
> >>>>> Tom Philpot wrote:
> >>>>>> Atsushi,
> >>>>>>
> >>>>>> Thanks for the feedback. For some reason, the Mac when displaying
> >>>>>> unicode always composes strings before display. I'll look at the test
> >>>>>> case in corlib tomorrow when I get in to work. Would it be helpful for
> >>>>>> the test cases if I gave you both the formD bytes and the formC bytes
> >>>>>> that I think are correct for the test case I sent? Perhaps the encoding
> >>>>>> did not come across in the attachment.
> >>>>>>
> >>>>>> We have a workaround for the Mac port of our app which would require
> >>>>>> overriding string.Normalize to p/invoke to Mac OS X's NSString library
> >>>>>> to do normalization. It would work, but we would prefer not to have to
> >>>>>> ship a custom build of Mono. The normalization on .NET appears to be
> >>>>>> "good enough" for our purposes and we'd just like our Mac version to be
> >>>>>> consistent.
> >>>>>>
> >>>>>> Tom
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Atsushi Eno [mailto:atsushi...@veritas-vos-liberabit.com]
> >>>>>> Sent: Wed 6/17/2009 7:51 PM
> >>>>>> To: Tom Philpot
> >>>>>> Cc: mono-devel-l...@ximian.com
> >>>>>> Subject: Re: [Mono-dev] Unhandled Exception in Normalization.cs Combine()
> >>>>>>
> >>>>>> You seem to have embedded raw native encoding of your locale that
> >>>>>> is *not* understandable in Japan. Anyway, the input string you
> >>>>>> posted in the previous sample was already in FormC, so the
> >>>>>> conversion result will look like "doing nothing".
> >>>>>>
> >>>>>> There is a standalone normalization test generated from the normalization
> >>>>>> conformance test in corlib/Mono.Globalization.Unicode. We fail
> >>>>>> about 26000. Far from good, but still better than 35000 on .NET.
> >>>>>>
> >>>>>> Atsushi Eno
> >>>>>>
> >>>>>> Tom Philpot wrote:
> >>>>>>> Now, string.Normalize(NormalizationForm.FormC) doesn't do anything using
> >>>>>>> mono (r136228).
> >>>>>>>
> >>>>>>> I've attached some test cases which will hopefully help in tracking down
> >>>>>>> what doesn't work.
> >>>>>>>
> >>>>>>> On 6/15/09 1:58 AM, "Atsushi Eno" <atsushi...@veritas-vos-liberabit.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi again,
> >>>>>>>>
> >>>>>>>> It should be now fixed in trunk.
> >>>>>>>>
> >>>>>>>> Atsushi Eno
> >>>>>>>>
> >>>>>>>> Atsushi Eno wrote:
> >>>>>>>>> I'll have a look. However, since 4 years have passed since I wrote it,
> >>>>>>>>> I'll have to revisit the spec, and it will take quite some time.
> >>>>>>>>>
> >>>>>>>>> Atsushi Eno
> >>>>>>>>>
> >>>>> _______________________________________________
> >>>>> Mono-devel-list mailing list
> >>>>> Mono-devel-list@lists.ximian.com
> >>>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
> >>>
> > 
> 
