You certainly don't want CompareOrdinal! .NET is doing the right thing in this case, but so is Java.
The problem is that Ø is not a character that is used in US English (or any English, for that matter), so the actual order returned when doing a compare in a locale like en-US is not really important. What IS important is the result when you do the comparison in the context of a locale that DOES use the Ø character. If you change your .NET culture name or your Java locale to (for example) "da" (that is, Danish), then the results are the same.

So the bug, I believe, is in the test case, which is relying on behaviour that is, in my opinion, undefined.

Dean.

> -----Original Message-----
> From: George Aroush [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 14 December 2006 1:07 pm
> To: [email protected]
> Cc: [email protected]
> Subject: RE: Sort differences between .NET and Java in Lucene.Net 2.0
>
> Hi Joe and all,
>
> I don't think we can use CompareOrdinal() as it doesn't take locale into
> consideration.
>
> The issue is with the following function in
> Lucene.Net.Search.FieldSortedHitQueue.cs:
>
> public int Compare(ScoreDoc i, ScoreDoc j)
> {
>     return collator.Compare(index[i.doc].ToString(),
>         index[j.doc].ToString());
> }
>
> To demonstrate how Java and C# differ in the way they do compare, here is
> some sample code:
>
> // C# code: you get back -1 for 'res'
> string s1 = "H\u00D8T";
> string s2 = "HUT";
> System.Globalization.CultureInfo locale = new
>     System.Globalization.CultureInfo("en-US");
> System.Globalization.CompareInfo collator = locale.CompareInfo;
> int res = collator.Compare(s1, s2);
>
> // Java code: you get back 1 for 'res'
> String s1 = "H\u00D8T";
> String s2 = "HUT";
> Collator collator = Collator.getInstance(Locale.US);
> int res = collator.compare(s1, s2);
>
> Who is doing the right thing? Or am I missing additional calls before I
> can compare?
>
> My goal is to understand why the difference exists so that we can judge
> how serious this is and either fix it or accept it as a language
> difference.
>
> Btw, I am going to post this question on the Java Lucene mailing list to
> see what folks in Java land have to say.
>
> Regards,
>
> -- George Aroush
>
>
> -----Original Message-----
> From: Joe Shaw [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, December 13, 2006 1:35 PM
> To: [email protected]
> Cc: [email protected]
> Subject: RE: Sort differences between .NET and Java in Lucene.Net 2.0
>
> Hi,
>
> On Wed, 2006-12-13 at 11:35 -0500, George Aroush wrote:
> > This is why those two tests are failing and I wonder if this is a
> > defect in .NET or in the way the culture info is used in those two
> > languages or if there is more culture setting I have to do in .NET.
> >
> > My thinking is, in .NET during compare, "\u00D8" is being treated as
> > ASCII "O" and not the Unicode character that it really is.
>
> This isn't the case, because if so "HOT" would be equal to "H\u00D8T".
>
> I think that the sort order is just different between .NET and Java -- ie,
> the order is "O", "\u00D8", "U" in .NET but "O", "U", "\u00D8" in Java --
> at least in the culture you're using.
>
> If you're looking for the actual numerical values of the characters for
> comparison (in which "\u00D8" would be quite a bit higher than both "O"
> and "U"), you probably want to use String.CompareOrdinal().
>
> BTW, doing culture insensitive string comparisons might be a good thing to
> do anyway. From the MSDN docs for String.Compare(string, string):
>
>     The comparison uses the current culture to obtain
>     culture-specific information such as casing rules and the
>     alphabetic order of individual characters. For example, a
>     culture could specify that certain combinations of characters be
>     treated as a single character, or uppercase and lowercase
>     characters be compared in a particular way, or that the sorting
>     order of a character depends on the characters that precede or
>     follow it.
>
> For more info, see the String.Compare() docs:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemStringclassComparetopic.asp
>
> Joe
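For anyone who wants to check the locale point from the top of this message on the Java side, here is a rough sketch rather than a definitive test. The class name OSlashCollation and the helper compareIn are made up for illustration; only Collator, Locale.US and Locale.forLanguageTag are real JDK calls. It runs the same two strings through the en-US collator, a Danish ("da") collator, and a plain code-unit comparison (roughly what CompareOrdinal does in .NET). The printed signs may depend on the collation data shipped with your runtime, so no expected output is claimed here.

```java
import java.text.Collator;
import java.util.Locale;

public class OSlashCollation {
    // Hypothetical helper: compare two strings using the collation
    // rules of the given locale.
    static int compareIn(Locale locale, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String s1 = "H\u00D8T"; // "HØT"
        String s2 = "HUT";

        // Locale from the original test case.
        int enUs = compareIn(Locale.US, s1, s2);
        // A locale whose alphabet actually contains Ø.
        int danish = compareIn(Locale.forLanguageTag("da"), s1, s2);
        // Culture-insensitive comparison by UTF-16 code unit value.
        int ordinal = s1.compareTo(s2);

        System.out.println("en-US:   " + Integer.signum(enUs));
        System.out.println("da:      " + Integer.signum(danish));
        System.out.println("ordinal: " + Integer.signum(ordinal));
    }
}
```

The equivalent .NET check would construct a CultureInfo for "da" instead of "en-US" and compare the sign of the result with the "da" line printed here.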
