[cp-patches] FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mario Torre
Hi!

This patch moves the logic to build a CollationElementIterator from
RuleBasedCollator and Collator into the class CollationElementIterator
itself.

*This is not a fix*. Infact, actually the new Constructor just read a
string out of the iterator, without any processing.

This way we can declare the 1.3 completeness and I can start fixing the
CollationElementIterator in just one place.

I would like to have this in for 0.93, but I understand that this is not
a fix...

What do you think?

Mario
-- 
Lima Software, SO.PR.IND. s.r.l.
http://www.limasoftware.net/
pgp key: http://subkeys.pgp.net/

Please, support open standards:
http://opendocumentfellowship.org/petition/
http://www.nosoftwarepatents.com/
### Eclipse Workspace Patch 1.0
#P classpath
Index: java/text/CollationElementIterator.java
===
RCS file: /cvsroot/classpath/classpath/java/text/CollationElementIterator.java,v
retrieving revision 1.24
diff -u -r1.24 CollationElementIterator.java
--- java/text/CollationElementIterator.java	23 Jul 2005 20:25:15 -	1.24
+++ java/text/CollationElementIterator.java	6 Dec 2006 12:16:37 -
@@ -101,7 +101,8 @@
* to iterate over the specified codeString/code using the rules in the
* specified codeRuleBasedCollator/code.
*
-   * @param collator The codeRuleBasedCollation/code used for calculating collation values
+   * @param collator The codeRuleBasedCollation/code used for calculating
+   * collation values
* @param text The codeString/code to iterate over.
*/
   CollationElementIterator(RuleBasedCollator collator, String text)
@@ -111,6 +112,33 @@
 setText (text);
   }
 
+  /**
+   * This method initializes a new instance of codeCollationElementIterator/code
+   * to iterate over the specified codeCharacterIterator/code using the
+   * rules in the specified codeRuleBasedCollator/code.
+   *
+   * @param collator The codeRuleBasedCollation/code used for calculating
+   * collation values
+   * @param text The codeString/code to iterate over.
+   */
+  CollationElementIterator(RuleBasedCollator collator,
+   CharacterIterator source)
+  {
+this.collator = collator;
+
+// FIXME: does the same as CollationElementIterator(RuleBasedCollator,
+// String text) for now
+
+StringBuffer text = new StringBuffer();
+
+for (char c = source.first();
+ c != CharacterIterator.DONE;
+ c = source.next())
+  text.append(c);
+
+setText (text.toString());
+  }
+  
   RuleBasedCollator.CollationElement nextBlock()
   {
 if (index = text_decomposition.length)
Index: java/text/Collator.java
===
RCS file: /cvsroot/classpath/classpath/java/text/Collator.java,v
retrieving revision 1.17
diff -u -r1.17 Collator.java
--- java/text/Collator.java	24 Mar 2006 17:04:23 -	1.17
+++ java/text/Collator.java	6 Dec 2006 12:16:38 -
@@ -375,14 +375,6 @@
 this.strength = strength;
   }
 
-  // Decompose a single character and append results to the buffer.
-  // FIXME: for libgcj this is a native method which handles
-  // decomposition.  For Classpath, for now, it does nothing.
-  final void decomposeCharacter (char c, StringBuffer buf)
-  {
-buf.append (c);
-  }
-
   /**
* This is the current collation decomposition setting.
*/
Index: java/text/RuleBasedCollator.java
===
RCS file: /cvsroot/classpath/classpath/java/text/RuleBasedCollator.java,v
retrieving revision 1.32
diff -u -r1.32 RuleBasedCollator.java
--- java/text/RuleBasedCollator.java	22 Mar 2006 19:15:25 -	1.32
+++ java/text/RuleBasedCollator.java	6 Dec 2006 12:16:40 -
@@ -38,8 +38,6 @@
 
 package java.text;
 
-import gnu.classpath.NotImplementedException;
-
 import java.util.ArrayList;
 import java.util.HashMap;
 
@@ -894,7 +892,8 @@
 else
   v = (short) c;
 return new CollationElement( + c, (short) 0,
-(short) 0, (short) (last_tertiary_value + v), (short) 0, null, false);
+(short) 0, (short) (last_tertiary_value + v),
+(short) 0, null, false);
   }
 
   /**
@@ -922,18 +921,10 @@
*
* @return A codeCollationElementIterator/code for the specified codeString/code.
*/
-  public CollationElementIterator getCollationElementIterator(CharacterIterator source)
-throws NotImplementedException  // Because decomposeCharacter does not work
+  public
+  CollationElementIterator getCollationElementIterator(CharacterIterator source)
   {
-StringBuffer expand = new StringBuffer();
-
-// Right now we assume that we will read from the beginning of the string.
-for (char c = source.first();
-	 c != CharacterIterator.DONE;
-	 c = source.next())
-  decomposeCharacter(c, expand);
-
-return getCollationElementIterator(expand.toString());
+return new CollationElementIterator(this, 

Re: [cp-patches] FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mario Torre
Ops, sorry, actually the subject is a RFC...

Mario
-- 
Lima Software, SO.PR.IND. s.r.l.
http://www.limasoftware.net/
pgp key: http://subkeys.pgp.net/

Please, support open standards:
http://opendocumentfellowship.org/petition/
http://www.nosoftwarepatents.com/


signature.asc
Description: Questa è una parte del messaggio	firmata digitalmente


[cp-patches] Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mark Wielaard
Hi Mario,

On Wed, 2006-12-06 at 13:18 +0100, Mario Torre wrote:
 This patch moves the logic to build a CollationElementIterator from
 RuleBasedCollator and Collator into the class CollationElementIterator
 itself.
 
 *This is not a fix*. Infact, actually the new Constructor just read a
 string out of the iterator, without any processing.

But in the old code the decomposing is done, although not really in the
classpath case, only in the case of libgcj. Could you explain the
difference between classpath/libgcj here and how this actually helps us?
Can't we use the libgcj version?

 This way we can declare the 1.3 completeness and I can start fixing the
 CollationElementIterator in just one place.
 
 I would like to have this in for 0.93, but I understand that this is not
 a fix...
 
 What do you think?

It doesn't really add or remove functionality it seems. How is the user
better of with this version than they were with the old one? If it helps
you structure the code in a way that makes improving it better please do
go for it. But if the functionality doesn't really change I am not sure
the user will actually care which version is in 0.93 (I am not opposed
to adding it though).

Cheers,

Mark


signature.asc
Description: This is a digitally signed message part


[cp-patches] Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mario Torre
Il giorno mer, 06/12/2006 alle 13.50 +0100, Mark Wielaard ha scritto:
 Hi Mario,

Hi Mark! Thank you for the quick reply!

 But in the old code the decomposing is done, although not really in the
 classpath case, only in the case of libgcj. Could you explain the
 difference between classpath/libgcj here and how this actually helps us?
 Can't we use the libgcj version?

Yes, in the case of classpath it is not done at all. In case of gcj it
is broken though.

I'm not sure about how much broken it is. It maybe just a matter of
wrong data (gcj use the UCD 3.0, jdk 1.5 uses the UCD 4.0.0). This is
not that different, just few addictions, so I guess this is only part of
the problem.

The method used by gcj is not complete, this is sure.

The javadoc says that the following rules are defined:

* NO_DECOMPOSITION

jdk: accented characters will not be decomposed for collation.
gcj: no decomposition is performed at all.

* CANONICAL_DECOMPOSITION

jdk: characters that are canonical variants according to Unicode
standard will be decomposed for collation. Used for accented character.
gcj: read from canonical_decomposition array the values and use this
array to calculate decomposition.

* FULL_DECOMPOSITION

jdk: Unicode canonical variants and Unicode compatibility variants will
be decomposed for collation.
gcj: does the same as before, using a different array.

The last method should be the Compatibility decomposition named in the
Unicode Standard, if I'm not wrong.

What is clear to me is that we are doing the wrong thing here, as this
class and these methods are more complex than what we have (and I fear
another DecimalFormat...).

 It doesn't really add or remove functionality it seems. How is the user
 better of with this version than they were with the old one?

Actually yes, it is just to say that we have 1.3 complete... it is of no
use at all as is.

 If it helps you structure the code in a way that makes improving it better 
 please do
 go for it.

This is the reason. It makes sense to have all this functionality in one
place, as it is related to just this class. Unless, of course, reading
better the code and understanding it I find that even Collator and
RuleBasedCollator are wrong (I have no reason to think that now, but I
also know that this area is a bit in darkness, the javadoc does not
help, and there are no effective tests in mauve).

 But if the functionality doesn't really change I am not sure

True, and the drawback is to fool users into thinking that we have
implemented this functionality. I think I'll do as in DecimalFormat, I
will keep a local branch until all the functionality are in place and
then submit them for review.

The Unicode standard is well documented, I only have to find how it is
implemented in the jdk.

 Mark

Ciao,
Mario
-- 
Lima Software, SO.PR.IND. s.r.l.
http://www.limasoftware.net/
pgp key: http://subkeys.pgp.net/

Please, support open standards:
http://opendocumentfellowship.org/petition/
http://www.nosoftwarepatents.com/


signature.asc
Description: Questa è una parte del messaggio	firmata digitalmente


FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mario Torre
Hi!

This patch moves the logic to build a CollationElementIterator from
RuleBasedCollator and Collator into the class CollationElementIterator
itself.

*This is not a fix*. Infact, actually the new Constructor just read a
string out of the iterator, without any processing.

This way we can declare the 1.3 completeness and I can start fixing the
CollationElementIterator in just one place.

I would like to have this in for 0.93, but I understand that this is not
a fix...

What do you think?

Mario
-- 
Lima Software, SO.PR.IND. s.r.l.
http://www.limasoftware.net/
pgp key: http://subkeys.pgp.net/

Please, support open standards:
http://opendocumentfellowship.org/petition/
http://www.nosoftwarepatents.com/
### Eclipse Workspace Patch 1.0
#P classpath
Index: java/text/CollationElementIterator.java
===
RCS file: /cvsroot/classpath/classpath/java/text/CollationElementIterator.java,v
retrieving revision 1.24
diff -u -r1.24 CollationElementIterator.java
--- java/text/CollationElementIterator.java	23 Jul 2005 20:25:15 -	1.24
+++ java/text/CollationElementIterator.java	6 Dec 2006 12:16:37 -
@@ -101,7 +101,8 @@
* to iterate over the specified codeString/code using the rules in the
* specified codeRuleBasedCollator/code.
*
-   * @param collator The codeRuleBasedCollation/code used for calculating collation values
+   * @param collator The codeRuleBasedCollation/code used for calculating
+   * collation values
* @param text The codeString/code to iterate over.
*/
   CollationElementIterator(RuleBasedCollator collator, String text)
@@ -111,6 +112,33 @@
 setText (text);
   }
 
+  /**
+   * This method initializes a new instance of codeCollationElementIterator/code
+   * to iterate over the specified codeCharacterIterator/code using the
+   * rules in the specified codeRuleBasedCollator/code.
+   *
+   * @param collator The codeRuleBasedCollation/code used for calculating
+   * collation values
+   * @param text The codeString/code to iterate over.
+   */
+  CollationElementIterator(RuleBasedCollator collator,
+   CharacterIterator source)
+  {
+this.collator = collator;
+
+// FIXME: does the same as CollationElementIterator(RuleBasedCollator,
+// String text) for now
+
+StringBuffer text = new StringBuffer();
+
+for (char c = source.first();
+ c != CharacterIterator.DONE;
+ c = source.next())
+  text.append(c);
+
+setText (text.toString());
+  }
+  
   RuleBasedCollator.CollationElement nextBlock()
   {
 if (index = text_decomposition.length)
Index: java/text/Collator.java
===
RCS file: /cvsroot/classpath/classpath/java/text/Collator.java,v
retrieving revision 1.17
diff -u -r1.17 Collator.java
--- java/text/Collator.java	24 Mar 2006 17:04:23 -	1.17
+++ java/text/Collator.java	6 Dec 2006 12:16:38 -
@@ -375,14 +375,6 @@
 this.strength = strength;
   }
 
-  // Decompose a single character and append results to the buffer.
-  // FIXME: for libgcj this is a native method which handles
-  // decomposition.  For Classpath, for now, it does nothing.
-  final void decomposeCharacter (char c, StringBuffer buf)
-  {
-buf.append (c);
-  }
-
   /**
* This is the current collation decomposition setting.
*/
Index: java/text/RuleBasedCollator.java
===
RCS file: /cvsroot/classpath/classpath/java/text/RuleBasedCollator.java,v
retrieving revision 1.32
diff -u -r1.32 RuleBasedCollator.java
--- java/text/RuleBasedCollator.java	22 Mar 2006 19:15:25 -	1.32
+++ java/text/RuleBasedCollator.java	6 Dec 2006 12:16:40 -
@@ -38,8 +38,6 @@
 
 package java.text;
 
-import gnu.classpath.NotImplementedException;
-
 import java.util.ArrayList;
 import java.util.HashMap;
 
@@ -894,7 +892,8 @@
 else
   v = (short) c;
 return new CollationElement( + c, (short) 0,
-(short) 0, (short) (last_tertiary_value + v), (short) 0, null, false);
+(short) 0, (short) (last_tertiary_value + v),
+(short) 0, null, false);
   }
 
   /**
@@ -922,18 +921,10 @@
*
* @return A codeCollationElementIterator/code for the specified codeString/code.
*/
-  public CollationElementIterator getCollationElementIterator(CharacterIterator source)
-throws NotImplementedException  // Because decomposeCharacter does not work
+  public
+  CollationElementIterator getCollationElementIterator(CharacterIterator source)
   {
-StringBuffer expand = new StringBuffer();
-
-// Right now we assume that we will read from the beginning of the string.
-for (char c = source.first();
-	 c != CharacterIterator.DONE;
-	 c = source.next())
-  decomposeCharacter(c, expand);
-
-return getCollationElementIterator(expand.toString());
+return new CollationElementIterator(this, 

Re: [cp-patches] FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mario Torre
Ops, sorry, actually the subject is a RFC...

Mario
-- 
Lima Software, SO.PR.IND. s.r.l.
http://www.limasoftware.net/
pgp key: http://subkeys.pgp.net/

Please, support open standards:
http://opendocumentfellowship.org/petition/
http://www.nosoftwarepatents.com/


signature.asc
Description: Questa è una parte del messaggio	firmata digitalmente


Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mark Wielaard
Hi Mario,

On Wed, 2006-12-06 at 13:18 +0100, Mario Torre wrote:
 This patch moves the logic to build a CollationElementIterator from
 RuleBasedCollator and Collator into the class CollationElementIterator
 itself.
 
 *This is not a fix*. Infact, actually the new Constructor just read a
 string out of the iterator, without any processing.

But in the old code the decomposing is done, although not really in the
classpath case, only in the case of libgcj. Could you explain the
difference between classpath/libgcj here and how this actually helps us?
Can't we use the libgcj version?

 This way we can declare the 1.3 completeness and I can start fixing the
 CollationElementIterator in just one place.
 
 I would like to have this in for 0.93, but I understand that this is not
 a fix...
 
 What do you think?

It doesn't really add or remove functionality it seems. How is the user
better of with this version than they were with the old one? If it helps
you structure the code in a way that makes improving it better please do
go for it. But if the functionality doesn't really change I am not sure
the user will actually care which version is in 0.93 (I am not opposed
to adding it though).

Cheers,

Mark


signature.asc
Description: This is a digitally signed message part


Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness

2006-12-06 Thread Mario Torre
Il giorno mer, 06/12/2006 alle 13.50 +0100, Mark Wielaard ha scritto:
 Hi Mario,

Hi Mark! Thank you for the quick reply!

 But in the old code the decomposing is done, although not really in the
 classpath case, only in the case of libgcj. Could you explain the
 difference between classpath/libgcj here and how this actually helps us?
 Can't we use the libgcj version?

Yes, in the case of classpath it is not done at all. In case of gcj it
is broken though.

I'm not sure about how much broken it is. It maybe just a matter of
wrong data (gcj use the UCD 3.0, jdk 1.5 uses the UCD 4.0.0). This is
not that different, just few addictions, so I guess this is only part of
the problem.

The method used by gcj is not complete, this is sure.

The javadoc says that the following rules are defined:

* NO_DECOMPOSITION

jdk: accented characters will not be decomposed for collation.
gcj: no decomposition is performed at all.

* CANONICAL_DECOMPOSITION

jdk: characters that are canonical variants according to Unicode
standard will be decomposed for collation. Used for accented character.
gcj: read from canonical_decomposition array the values and use this
array to calculate decomposition.

* FULL_DECOMPOSITION

jdk: Unicode canonical variants and Unicode compatibility variants will
be decomposed for collation.
gcj: does the same as before, using a different array.

The last method should be the Compatibility decomposition named in the
Unicode Standard, if I'm not wrong.

What is clear to me is that we are doing the wrong thing here, as this
class and these methods are more complex than what we have (and I fear
another DecimalFormat...).

 It doesn't really add or remove functionality it seems. How is the user
 better of with this version than they were with the old one?

Actually yes, it is just to say that we have 1.3 complete... it is of no
use at all as is.

 If it helps you structure the code in a way that makes improving it better 
 please do
 go for it.

This is the reason. It makes sense to have all this functionality in one
place, as it is related to just this class. Unless, of course, reading
better the code and understanding it I find that even Collator and
RuleBasedCollator are wrong (I have no reason to think that now, but I
also know that this area is a bit in darkness, the javadoc does not
help, and there are no effective tests in mauve).

 But if the functionality doesn't really change I am not sure

True, and the drawback is to fool users into thinking that we have
implemented this functionality. I think I'll do as in DecimalFormat, I
will keep a local branch until all the functionality are in place and
then submit them for review.

The Unicode standard is well documented, I only have to find how it is
implemented in the jdk.

 Mark

Ciao,
Mario
-- 
Lima Software, SO.PR.IND. s.r.l.
http://www.limasoftware.net/
pgp key: http://subkeys.pgp.net/

Please, support open standards:
http://opendocumentfellowship.org/petition/
http://www.nosoftwarepatents.com/


signature.asc
Description: Questa è una parte del messaggio	firmata digitalmente