Locale string compare: Java vs. C#

2006-12-13 Thread George Aroush
Hi folks,

Over at Lucene.Net, I have run into two NUnit tests that fail with
Lucene.Net (C#) but pass with Lucene (Java).  The two tests are:
TestInternationalMultiSearcherSort and TestInternationalSort

After several hours of investigation, I narrowed the problem to what I
believe is a difference in the way Java and .NET implement compare.

The code in question is this method (found in FieldSortedHitQueue.java):

public final int compare (final ScoreDoc i, final ScoreDoc j) {
return collator.compare (index[i.doc], index[j.doc]);
}
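
[Editor's sketch, not part of the original message.] To make the role of the
collator in that method concrete, here is a minimal, self-contained sketch of
sorting document ids through a Collator. The class name CollatorSortSketch,
the helper sortDocs, and the sample array are hypothetical stand-ins for
illustration; they are not Lucene code, and the real FieldSortedHitQueue
works on a priority queue rather than a full sort.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorSortSketch {

    // "index" is a hypothetical stand-in for the per-document string cache
    // used in the thread's compare method: index[doc] is the sort key of doc.
    static Integer[] sortDocs(final String[] index, Locale locale) {
        final Collator collator = Collator.getInstance(locale);

        // Start with doc ids 0..n-1 in their natural order.
        Integer[] docs = new Integer[index.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }

        // Order the doc ids by the collator's view of their sort keys,
        // the same comparison the quoted compare method performs.
        Arrays.sort(docs, (i, j) -> collator.compare(index[i], index[j]));
        return docs;
    }

    public static void main(String[] args) {
        String[] index = { "HUT", "H\u00D8T", "HAT" };
        // The printed order depends on the platform's collation rules,
        // which is exactly what this thread is about.
        System.out.println(Arrays.toString(sortDocs(index, Locale.US)));
    }
}
```

Any disagreement between two platforms' collators on a single pair of keys
changes the order this produces, which would be enough to fail a test that
expects a fixed order.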

To demonstrate the compare problem (Java vs. .NET) I created this simple code
both in Java and C#:

// Java code: you get back 1 for 'diff'
// (needs: import java.text.Collator; import java.util.Locale;)
String s1 = "H\u00D8T";
String s2 = "HUT";
Collator collator = Collator.getInstance (Locale.US);
int diff = collator.compare(s1, s2);

// C# code: you get back -1 for 'res'
string s1 = "H\u00D8T";
string s2 = "HUT";
System.Globalization.CultureInfo locale =
    new System.Globalization.CultureInfo("en-US");
System.Globalization.CompareInfo collator = locale.CompareInfo;
int res = collator.Compare(s1, s2);

Java will give me back a 1 while .NET gives me back -1.

So, what I am trying to figure out is which one is doing the right thing?  Or
am I missing additional calls before I can compare?

My goal is to understand why the difference exists, so that I can judge how
serious this issue is and either find a fix for it or just document it as a
platform difference between Java and .NET.

Btw, this is based on Lucene 2.0 for both Java and C# Lucene.

Regards,

-- George Aroush


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Locale string compare: Java vs. C#

2006-12-13 Thread Chuck Williams
Surprising, but it looks to me like a bug in Java's collation rules for
en-US.  According to
http://developer.mimer.com/collations/charts/UCA_latin.htm, \u00D8
(which is Latin Capital Letter O With Stroke) should sort before U,
implying -1 is the correct result.  Java returns 1 at every strength
of the collator.  Maybe there is some other subtlety with this
character...
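
[Editor's sketch, not part of the original message.] The check across
strengths described above can be reproduced with something like the following.
The class name CollatorStrengthSketch and the helper compareAt are
hypothetical names chosen for illustration. The output is deliberately not
stated here, since it depends on the collation rules shipped with the JDK in
use.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthSketch {

    // Compare two strings with an en-US collator set to the given strength.
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        int[] strengths = {
            Collator.PRIMARY, Collator.SECONDARY,
            Collator.TERTIARY, Collator.IDENTICAL
        };
        String[] names = { "PRIMARY", "SECONDARY", "TERTIARY", "IDENTICAL" };

        // Print the comparison result for the thread's string pair
        // at each of the four standard strengths.
        for (int k = 0; k < strengths.length; k++) {
            int result = compareAt(strengths[k], "H\u00D8T", "HUT");
            System.out.println(names[k] + ": " + result);
        }
    }
}
```

If every line prints the same sign, the ordering of \u00D8 relative to U is
decided at the primary level, so lowering the strength would not work around
the difference.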

Chuck

