[cp-patches] FYI: Proposal for RuleBasedCollator and 1.3 completeness
Hi! This patch moves the logic to build a CollationElementIterator from RuleBasedCollator and Collator into the class CollationElementIterator itself. *This is not a fix*. Infact, actually the new Constructor just read a string out of the iterator, without any processing. This way we can declare the 1.3 completeness and I can start fixing the CollationElementIterator in just one place. I would like to have this in for 0.93, but I understand that this is not a fix... What do you think? Mario -- Lima Software, SO.PR.IND. s.r.l. http://www.limasoftware.net/ pgp key: http://subkeys.pgp.net/ Please, support open standards: http://opendocumentfellowship.org/petition/ http://www.nosoftwarepatents.com/ ### Eclipse Workspace Patch 1.0 #P classpath Index: java/text/CollationElementIterator.java === RCS file: /cvsroot/classpath/classpath/java/text/CollationElementIterator.java,v retrieving revision 1.24 diff -u -r1.24 CollationElementIterator.java --- java/text/CollationElementIterator.java 23 Jul 2005 20:25:15 - 1.24 +++ java/text/CollationElementIterator.java 6 Dec 2006 12:16:37 - @@ -101,7 +101,8 @@ * to iterate over the specified codeString/code using the rules in the * specified codeRuleBasedCollator/code. * - * @param collator The codeRuleBasedCollation/code used for calculating collation values + * @param collator The codeRuleBasedCollation/code used for calculating + * collation values * @param text The codeString/code to iterate over. */ CollationElementIterator(RuleBasedCollator collator, String text) @@ -111,6 +112,33 @@ setText (text); } + /** + * This method initializes a new instance of codeCollationElementIterator/code + * to iterate over the specified codeCharacterIterator/code using the + * rules in the specified codeRuleBasedCollator/code. + * + * @param collator The codeRuleBasedCollation/code used for calculating + * collation values + * @param text The codeString/code to iterate over. + */ + CollationElementIterator(RuleBasedCollator collator, + CharacterIterator source) + { +this.collator = collator; + +// FIXME: does the same as CollationElementIterator(RuleBasedCollator, +// String text) for now + +StringBuffer text = new StringBuffer(); + +for (char c = source.first(); + c != CharacterIterator.DONE; + c = source.next()) + text.append(c); + +setText (text.toString()); + } + RuleBasedCollator.CollationElement nextBlock() { if (index = text_decomposition.length) Index: java/text/Collator.java === RCS file: /cvsroot/classpath/classpath/java/text/Collator.java,v retrieving revision 1.17 diff -u -r1.17 Collator.java --- java/text/Collator.java 24 Mar 2006 17:04:23 - 1.17 +++ java/text/Collator.java 6 Dec 2006 12:16:38 - @@ -375,14 +375,6 @@ this.strength = strength; } - // Decompose a single character and append results to the buffer. - // FIXME: for libgcj this is a native method which handles - // decomposition. For Classpath, for now, it does nothing. - final void decomposeCharacter (char c, StringBuffer buf) - { -buf.append (c); - } - /** * This is the current collation decomposition setting. */ Index: java/text/RuleBasedCollator.java === RCS file: /cvsroot/classpath/classpath/java/text/RuleBasedCollator.java,v retrieving revision 1.32 diff -u -r1.32 RuleBasedCollator.java --- java/text/RuleBasedCollator.java 22 Mar 2006 19:15:25 - 1.32 +++ java/text/RuleBasedCollator.java 6 Dec 2006 12:16:40 - @@ -38,8 +38,6 @@ package java.text; -import gnu.classpath.NotImplementedException; - import java.util.ArrayList; import java.util.HashMap; @@ -894,7 +892,8 @@ else v = (short) c; return new CollationElement( + c, (short) 0, -(short) 0, (short) (last_tertiary_value + v), (short) 0, null, false); +(short) 0, (short) (last_tertiary_value + v), +(short) 0, null, false); } /** @@ -922,18 +921,10 @@ * * @return A codeCollationElementIterator/code for the specified codeString/code. */ - public CollationElementIterator getCollationElementIterator(CharacterIterator source) -throws NotImplementedException // Because decomposeCharacter does not work + public + CollationElementIterator getCollationElementIterator(CharacterIterator source) { -StringBuffer expand = new StringBuffer(); - -// Right now we assume that we will read from the beginning of the string. -for (char c = source.first(); - c != CharacterIterator.DONE; - c = source.next()) - decomposeCharacter(c, expand); - -return getCollationElementIterator(expand.toString()); +return new CollationElementIterator(this,
Re: [cp-patches] FYI: Proposal for RuleBasedCollator and 1.3 completeness
Ops, sorry, actually the subject is a RFC... Mario -- Lima Software, SO.PR.IND. s.r.l. http://www.limasoftware.net/ pgp key: http://subkeys.pgp.net/ Please, support open standards: http://opendocumentfellowship.org/petition/ http://www.nosoftwarepatents.com/ signature.asc Description: Questa è una parte del messaggio firmata digitalmente
[cp-patches] Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness
Hi Mario, On Wed, 2006-12-06 at 13:18 +0100, Mario Torre wrote: This patch moves the logic to build a CollationElementIterator from RuleBasedCollator and Collator into the class CollationElementIterator itself. *This is not a fix*. Infact, actually the new Constructor just read a string out of the iterator, without any processing. But in the old code the decomposing is done, although not really in the classpath case, only in the case of libgcj. Could you explain the difference between classpath/libgcj here and how this actually helps us? Can't we use the libgcj version? This way we can declare the 1.3 completeness and I can start fixing the CollationElementIterator in just one place. I would like to have this in for 0.93, but I understand that this is not a fix... What do you think? It doesn't really add or remove functionality it seems. How is the user better of with this version than they were with the old one? If it helps you structure the code in a way that makes improving it better please do go for it. But if the functionality doesn't really change I am not sure the user will actually care which version is in 0.93 (I am not opposed to adding it though). Cheers, Mark signature.asc Description: This is a digitally signed message part
[cp-patches] Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness
Il giorno mer, 06/12/2006 alle 13.50 +0100, Mark Wielaard ha scritto: Hi Mario, Hi Mark! Thank you for the quick reply! But in the old code the decomposing is done, although not really in the classpath case, only in the case of libgcj. Could you explain the difference between classpath/libgcj here and how this actually helps us? Can't we use the libgcj version? Yes, in the case of classpath it is not done at all. In case of gcj it is broken though. I'm not sure about how much broken it is. It maybe just a matter of wrong data (gcj use the UCD 3.0, jdk 1.5 uses the UCD 4.0.0). This is not that different, just few addictions, so I guess this is only part of the problem. The method used by gcj is not complete, this is sure. The javadoc says that the following rules are defined: * NO_DECOMPOSITION jdk: accented characters will not be decomposed for collation. gcj: no decomposition is performed at all. * CANONICAL_DECOMPOSITION jdk: characters that are canonical variants according to Unicode standard will be decomposed for collation. Used for accented character. gcj: read from canonical_decomposition array the values and use this array to calculate decomposition. * FULL_DECOMPOSITION jdk: Unicode canonical variants and Unicode compatibility variants will be decomposed for collation. gcj: does the same as before, using a different array. The last method should be the Compatibility decomposition named in the Unicode Standard, if I'm not wrong. What is clear to me is that we are doing the wrong thing here, as this class and these methods are more complex than what we have (and I fear another DecimalFormat...). It doesn't really add or remove functionality it seems. How is the user better of with this version than they were with the old one? Actually yes, it is just to say that we have 1.3 complete... it is of no use at all as is. If it helps you structure the code in a way that makes improving it better please do go for it. This is the reason. It makes sense to have all this functionality in one place, as it is related to just this class. Unless, of course, reading better the code and understanding it I find that even Collator and RuleBasedCollator are wrong (I have no reason to think that now, but I also know that this area is a bit in darkness, the javadoc does not help, and there are no effective tests in mauve). But if the functionality doesn't really change I am not sure True, and the drawback is to fool users into thinking that we have implemented this functionality. I think I'll do as in DecimalFormat, I will keep a local branch until all the functionality are in place and then submit them for review. The Unicode standard is well documented, I only have to find how it is implemented in the jdk. Mark Ciao, Mario -- Lima Software, SO.PR.IND. s.r.l. http://www.limasoftware.net/ pgp key: http://subkeys.pgp.net/ Please, support open standards: http://opendocumentfellowship.org/petition/ http://www.nosoftwarepatents.com/ signature.asc Description: Questa è una parte del messaggio firmata digitalmente
FYI: Proposal for RuleBasedCollator and 1.3 completeness
Hi! This patch moves the logic to build a CollationElementIterator from RuleBasedCollator and Collator into the class CollationElementIterator itself. *This is not a fix*. Infact, actually the new Constructor just read a string out of the iterator, without any processing. This way we can declare the 1.3 completeness and I can start fixing the CollationElementIterator in just one place. I would like to have this in for 0.93, but I understand that this is not a fix... What do you think? Mario -- Lima Software, SO.PR.IND. s.r.l. http://www.limasoftware.net/ pgp key: http://subkeys.pgp.net/ Please, support open standards: http://opendocumentfellowship.org/petition/ http://www.nosoftwarepatents.com/ ### Eclipse Workspace Patch 1.0 #P classpath Index: java/text/CollationElementIterator.java === RCS file: /cvsroot/classpath/classpath/java/text/CollationElementIterator.java,v retrieving revision 1.24 diff -u -r1.24 CollationElementIterator.java --- java/text/CollationElementIterator.java 23 Jul 2005 20:25:15 - 1.24 +++ java/text/CollationElementIterator.java 6 Dec 2006 12:16:37 - @@ -101,7 +101,8 @@ * to iterate over the specified codeString/code using the rules in the * specified codeRuleBasedCollator/code. * - * @param collator The codeRuleBasedCollation/code used for calculating collation values + * @param collator The codeRuleBasedCollation/code used for calculating + * collation values * @param text The codeString/code to iterate over. */ CollationElementIterator(RuleBasedCollator collator, String text) @@ -111,6 +112,33 @@ setText (text); } + /** + * This method initializes a new instance of codeCollationElementIterator/code + * to iterate over the specified codeCharacterIterator/code using the + * rules in the specified codeRuleBasedCollator/code. + * + * @param collator The codeRuleBasedCollation/code used for calculating + * collation values + * @param text The codeString/code to iterate over. + */ + CollationElementIterator(RuleBasedCollator collator, + CharacterIterator source) + { +this.collator = collator; + +// FIXME: does the same as CollationElementIterator(RuleBasedCollator, +// String text) for now + +StringBuffer text = new StringBuffer(); + +for (char c = source.first(); + c != CharacterIterator.DONE; + c = source.next()) + text.append(c); + +setText (text.toString()); + } + RuleBasedCollator.CollationElement nextBlock() { if (index = text_decomposition.length) Index: java/text/Collator.java === RCS file: /cvsroot/classpath/classpath/java/text/Collator.java,v retrieving revision 1.17 diff -u -r1.17 Collator.java --- java/text/Collator.java 24 Mar 2006 17:04:23 - 1.17 +++ java/text/Collator.java 6 Dec 2006 12:16:38 - @@ -375,14 +375,6 @@ this.strength = strength; } - // Decompose a single character and append results to the buffer. - // FIXME: for libgcj this is a native method which handles - // decomposition. For Classpath, for now, it does nothing. - final void decomposeCharacter (char c, StringBuffer buf) - { -buf.append (c); - } - /** * This is the current collation decomposition setting. */ Index: java/text/RuleBasedCollator.java === RCS file: /cvsroot/classpath/classpath/java/text/RuleBasedCollator.java,v retrieving revision 1.32 diff -u -r1.32 RuleBasedCollator.java --- java/text/RuleBasedCollator.java 22 Mar 2006 19:15:25 - 1.32 +++ java/text/RuleBasedCollator.java 6 Dec 2006 12:16:40 - @@ -38,8 +38,6 @@ package java.text; -import gnu.classpath.NotImplementedException; - import java.util.ArrayList; import java.util.HashMap; @@ -894,7 +892,8 @@ else v = (short) c; return new CollationElement( + c, (short) 0, -(short) 0, (short) (last_tertiary_value + v), (short) 0, null, false); +(short) 0, (short) (last_tertiary_value + v), +(short) 0, null, false); } /** @@ -922,18 +921,10 @@ * * @return A codeCollationElementIterator/code for the specified codeString/code. */ - public CollationElementIterator getCollationElementIterator(CharacterIterator source) -throws NotImplementedException // Because decomposeCharacter does not work + public + CollationElementIterator getCollationElementIterator(CharacterIterator source) { -StringBuffer expand = new StringBuffer(); - -// Right now we assume that we will read from the beginning of the string. -for (char c = source.first(); - c != CharacterIterator.DONE; - c = source.next()) - decomposeCharacter(c, expand); - -return getCollationElementIterator(expand.toString()); +return new CollationElementIterator(this,
Re: [cp-patches] FYI: Proposal for RuleBasedCollator and 1.3 completeness
Ops, sorry, actually the subject is a RFC... Mario -- Lima Software, SO.PR.IND. s.r.l. http://www.limasoftware.net/ pgp key: http://subkeys.pgp.net/ Please, support open standards: http://opendocumentfellowship.org/petition/ http://www.nosoftwarepatents.com/ signature.asc Description: Questa è una parte del messaggio firmata digitalmente
Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness
Hi Mario, On Wed, 2006-12-06 at 13:18 +0100, Mario Torre wrote: This patch moves the logic to build a CollationElementIterator from RuleBasedCollator and Collator into the class CollationElementIterator itself. *This is not a fix*. Infact, actually the new Constructor just read a string out of the iterator, without any processing. But in the old code the decomposing is done, although not really in the classpath case, only in the case of libgcj. Could you explain the difference between classpath/libgcj here and how this actually helps us? Can't we use the libgcj version? This way we can declare the 1.3 completeness and I can start fixing the CollationElementIterator in just one place. I would like to have this in for 0.93, but I understand that this is not a fix... What do you think? It doesn't really add or remove functionality it seems. How is the user better of with this version than they were with the old one? If it helps you structure the code in a way that makes improving it better please do go for it. But if the functionality doesn't really change I am not sure the user will actually care which version is in 0.93 (I am not opposed to adding it though). Cheers, Mark signature.asc Description: This is a digitally signed message part
Re: FYI: Proposal for RuleBasedCollator and 1.3 completeness
Il giorno mer, 06/12/2006 alle 13.50 +0100, Mark Wielaard ha scritto: Hi Mario, Hi Mark! Thank you for the quick reply! But in the old code the decomposing is done, although not really in the classpath case, only in the case of libgcj. Could you explain the difference between classpath/libgcj here and how this actually helps us? Can't we use the libgcj version? Yes, in the case of classpath it is not done at all. In case of gcj it is broken though. I'm not sure about how much broken it is. It maybe just a matter of wrong data (gcj use the UCD 3.0, jdk 1.5 uses the UCD 4.0.0). This is not that different, just few addictions, so I guess this is only part of the problem. The method used by gcj is not complete, this is sure. The javadoc says that the following rules are defined: * NO_DECOMPOSITION jdk: accented characters will not be decomposed for collation. gcj: no decomposition is performed at all. * CANONICAL_DECOMPOSITION jdk: characters that are canonical variants according to Unicode standard will be decomposed for collation. Used for accented character. gcj: read from canonical_decomposition array the values and use this array to calculate decomposition. * FULL_DECOMPOSITION jdk: Unicode canonical variants and Unicode compatibility variants will be decomposed for collation. gcj: does the same as before, using a different array. The last method should be the Compatibility decomposition named in the Unicode Standard, if I'm not wrong. What is clear to me is that we are doing the wrong thing here, as this class and these methods are more complex than what we have (and I fear another DecimalFormat...). It doesn't really add or remove functionality it seems. How is the user better of with this version than they were with the old one? Actually yes, it is just to say that we have 1.3 complete... it is of no use at all as is. If it helps you structure the code in a way that makes improving it better please do go for it. This is the reason. It makes sense to have all this functionality in one place, as it is related to just this class. Unless, of course, reading better the code and understanding it I find that even Collator and RuleBasedCollator are wrong (I have no reason to think that now, but I also know that this area is a bit in darkness, the javadoc does not help, and there are no effective tests in mauve). But if the functionality doesn't really change I am not sure True, and the drawback is to fool users into thinking that we have implemented this functionality. I think I'll do as in DecimalFormat, I will keep a local branch until all the functionality are in place and then submit them for review. The Unicode standard is well documented, I only have to find how it is implemented in the jdk. Mark Ciao, Mario -- Lima Software, SO.PR.IND. s.r.l. http://www.limasoftware.net/ pgp key: http://subkeys.pgp.net/ Please, support open standards: http://opendocumentfellowship.org/petition/ http://www.nosoftwarepatents.com/ signature.asc Description: Questa è una parte del messaggio firmata digitalmente