[jira] [Commented] (NET-418) File truncated when transfer on ftp
[ https://issues.apache.org/jira/browse/NET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080620#comment-13080620 ]

Sebb commented on NET-418:
--------------------------

What is the code you are using to upload the file?
Are you using binary or ASCII mode?

File truncated when transfer on ftp
-----------------------------------

Key: NET-418
URL: https://issues.apache.org/jira/browse/NET-418
Project: Commons Net
Issue Type: Bug
Components: FTP
Affects Versions: 3.0.1
Environment: Transfer from Windows Server 2008 R2 (64-bit) to Linux CentOS 5.5 x86_64, Pro FTPD A31
Reporter: PARENT JP
Attachments: notices.txt, notices_total.zip

The file is truncated after transfer: the original file is 17 172 261 bytes, the transferred file 17 170 762 bytes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
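Sebb's question about binary versus ASCII mode points at a common cause of this symptom. Commons Net's FTPClient uses the ASCII file type unless setFileType(FTP.BINARY_FILE_TYPE) is called, and an ASCII-mode upload from Windows to a Unix server normally stores each CRLF line ending as a bare LF, one byte shorter per line. The reported sizes differ by 17 172 261 - 17 170 762 = 1 499 bytes, which would match a text file of 1 499 CRLF-terminated lines; that line count is an assumption, since the report does not give it. The sketch below (hypothetical class name, plain Java, no Commons Net) only simulates that line-ending conversion so the arithmetic can be checked.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: simulates what an ASCII-mode FTP upload to a Unix
 * server typically does to a Windows text file (CRLF becomes LF).
 * This is not Commons Net code.
 */
final class AsciiModeSketch {

    /** Returns a copy of data with every CR that immediately precedes an LF removed. */
    static byte[] crlfToLf(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length);
        for (int i = 0; i < data.length; i++) {
            boolean crBeforeLf = data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(data[i]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "line 1\r\nline 2\r\nline 3\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] transferred = crlfToLf(original);
        // Each CRLF-terminated line loses exactly one byte.
        System.out.println(original.length + " -> " + transferred.length);

        // The sizes reported in NET-418.
        long lost = 17_172_261L - 17_170_762L;
        System.out.println("bytes lost: " + lost);
    }
}
```

If that is indeed the cause, the usual remedy with Commons Net is to call ftp.setFileType(FTP.BINARY_FILE_TYPE) after logging in and before storeFile(...), so the bytes are sent unchanged.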
[jira] [Resolved] (NET-418) File truncated when transfer on ftp
[ https://issues.apache.org/jira/browse/NET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved NET-418.
----------------------

Resolution: Not A Problem
[jira] [Closed] (NET-418) File truncated when transfer on ftp
[ https://issues.apache.org/jira/browse/NET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb closed NET-418.
--------------------
[jira] [Commented] (CODEC-125) Implement a Beider-Morse phonetic matching codec
[ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083198#comment-13083198 ]

Sebb commented on CODEC-125:
---------------------------

Changing the API means more than a version bump: for Commons it generally requires a change of package name and Maven ID, so that old and new versions can co-exist. So it's important to get the API right before release if at all possible.

In the case of brand-new code, it might be possible to document it as unstable, and therefore allow changes to the API. But this should be discussed on the developer list first.

Implement a Beider-Morse phonetic matching codec
------------------------------------------------

Key: CODEC-125
URL: https://issues.apache.org/jira/browse/CODEC-125
Project: Commons Codec
Issue Type: New Feature
Reporter: Matthew Pocock
Priority: Minor
Attachments: Rule$4$1-All_Objects.html, acz.patch, bm-gg.diff, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, bmpm.patch, comparator.patch, fightingMemoryChurn.patch, fightingMemoryChurn.patch, fixmeInvariant.patch, handleH.patch, majorFix.patch, performanceAndBugs.patch, testAllChars-mem-profile.html, testEncodeGna.patch

I have implemented Beider-Morse Phonetic Matching as a codec against the commons-codec SVN trunk. I would like to contribute this to commons-codec.
[jira] [Commented] (CODEC-125) Implement a Beider-Morse phonetic matching codec
[ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083428#comment-13083428 ]

Sebb commented on CODEC-125:
---------------------------

@Gary: if there is already a need to break compatibility, then I don't have an issue with changing the BMPM API at the same time.
[jira] [Commented] (CODEC-125) Implement a Beider-Morse phonetic matching codec
[ https://issues.apache.org/jira/browse/CODEC-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083501#comment-13083501 ]

Sebb commented on CODEC-125:
---------------------------

@Gary: yes, I know it's new; my concern is about the next release after adding BMPM. I am concerned that the BMPM API is not stable, and that an early release might entail an incompatible change later. Hence the suggestion to discuss whether that would require a package/Maven name change, given that few external classes would be using it.
[jira] [Updated] (CODEC-30) [codec] Character ö or é not mapped in soundex encoding
[ https://issues.apache.org/jira/browse/CODEC-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated CODEC-30:
----------------------

Description:
When calling soundex.soundex\(x) with x a string with a diacritical mark like ö or é the following exception occurs:

java.lang.ArrayIndexOutOfBoundsException: 131
    at org.apache.commons.codec.language.Soundex.map(Soundex.java:199)
    at org.apache.commons.codec.language.Soundex.getMappingCode(Soundex.java:157)

This happens when calling the difference(s1, s2)
in codec version 1.3-dev this exception occurs too

Cheers
Rogier

was:
When calling soundex.soundex(x) with x a string with a diacritical mark like ö or é the following exception occurs:

java.lang.ArrayIndexOutOfBoundsException: 131
    at org.apache.commons.codec.language.Soundex.map(Soundex.java:199)
    at org.apache.commons.codec.language.Soundex.getMappingCode(Soundex.java:157)

This happens when calling the difference(s1, s2)
in codec version 1.3-dev this exception occurs too

Cheers
Rogier

Escape inadvertent special sequence

[codec] Character ö or é not mapped in soundex encoding
-------------------------------------------------------

Key: CODEC-30
URL: https://issues.apache.org/jira/browse/CODEC-30
Project: Commons Codec
Issue Type: Bug
Affects Versions: 1.2
Environment: Operating System: All, Platform: All
Reporter: Rogier Selie
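The stack trace suggests that Soundex.map looks each character up in a table indexed by its offset from 'A'. If that table only covers A to Z, any letter outside that range, such as an upper-cased ö or é, produces an index past the end and the lookup fails with ArrayIndexOutOfBoundsException. The sketch below reconstructs such a lookup under that assumption; it is not the Commons Codec source, and SoundexMapSketch, mapUnchecked and map are hypothetical names. The checked variant shows one way to turn the crash into a descriptive IllegalArgumentException.

```java
/** Hypothetical reconstruction of a Soundex letter-to-digit lookup; not the Commons Codec source. */
final class SoundexMapSketch {

    // Standard US English Soundex digits for the letters A..Z.
    private static final char[] US_ENGLISH_MAPPING = "01230120022455012623010202".toCharArray();

    /** Unchecked lookup: throws ArrayIndexOutOfBoundsException for characters outside A-Z. */
    static char mapUnchecked(char ch) {
        return US_ENGLISH_MAPPING[ch - 'A'];
    }

    /** Checked lookup: rejects unmapped characters with a clear message. */
    static char map(char ch) {
        int index = ch - 'A';
        if (index < 0 || index >= US_ENGLISH_MAPPING.length) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return US_ENGLISH_MAPPING[index];
    }

    public static void main(String[] args) {
        System.out.println(map('R')); // R maps to digit 6
        char oUmlaut = Character.toUpperCase('\u00F6'); // O-umlaut, U+00D6
        try {
            mapUnchecked(oUmlaut);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e.getMessage());
        }
        try {
            map(oUmlaut);
        } catch (IllegalArgumentException e) {
            System.out.println("checked lookup failed: " + e.getMessage());
        }
    }
}
```

Whether unmapped letters should be rejected or silently skipped is a design choice; the sketch rejects them so that callers notice unsupported input instead of getting a misleading code.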
[jira] [Created] (CODEC-127) Non-ascii characters in test source files
Non-ascii characters in test source files - Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb Some of the test cases include characters in a native encoding (possibly UTF-8), rather than using Unicode escapes. This can cause a problem for IDEs if they don't know the encoding (e.g. cause compilation errors, which is how I found the issue), and possibly some transformations may corrupt the contents, e.g. fixing EOL. I think we should have a rule of using Unicode escapes for all such non-ascii characters. It's particularly important for non-ISO-8859-1 characters. Some example classes with non-ascii characters: {code} binary\Base64Test.java:96 byte[] decode = b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=); language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, language\ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}}; language\ColognePhoneticTest.java:137 {Meyer, M├╝ller}, language\ColognePhoneticTest.java:143 {ganz, G├ñnse}, language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S); language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N); language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:375 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:395 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); {code} The characters are probably not correct above, because I used a crude perl script to find them: {code} perl ne $.=1 if $s ne $ARGV;print qq($ARGV:$. 
$_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} language\SoundexTest.java:367 in particular is incorrect, because it's supposed to be a single character. Now one might think that native2ascii -encoding UTF-8 would fix that, but it gives: if (Character.isLetter('\ufffd')) which is an unknown character. Similarly for binary\Base64Test.java:96. It's not all that clear what the Unicode escapes should be in these cases, but probably not the unknown character. [Possibly the characters got mangled at some point, or maybe they have always been wrong] The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German), but given that the rest of the file uses unicode escaps, I think they should be changed too (but add comments to say what they are, e.g. o-umlaut, u-umlaut) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
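The rule proposed above amounts to what native2ascii does for correctly decoded text: each character outside ASCII is written as a backslash-u escape of its UTF-16 code unit. A minimal sketch, assuming the text has already been decoded with the right charset (escaping cannot recover characters that were lost earlier, which is why native2ascii produced '\ufffd' here). UnicodeEscaper and escapeNonAscii are hypothetical names; lower-case hex is chosen to match the '\ufffd' output quoted above.

```java
/** Hypothetical helper: escapes non-ASCII characters the way Java source escapes are written. */
final class UnicodeEscaper {

    /** Replaces each non-ASCII UTF-16 code unit with a backslash-u escape in lower-case hex. */
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                // Supplementary characters become two escapes, one per surrogate.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "M" + u-umlaut + "ller", built from a code point so this file stays pure ASCII.
        String name = "M" + (char) 0x00FC + "ller";
        System.out.println(escapeNonAscii(name));
    }
}
```

Following the suggestion above, each escaped literal in the tests would then carry a short comment naming the character, e.g. o-umlaut.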
[jira] [Commented] (CODEC-127) Non-ascii characters in test source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084611#comment-13084611 ]

Sebb commented on CODEC-127:
---------------------------

The problem is that it's not possible to see what the test data is in the IDE (apart from the German characters). Also, unless you tell SVN the encoding (e.g. via the svn:mime-type property), diff e-mails (and possibly conversion to local EOL) may suffer.

Saving IDE settings in SVN is a non-starter, because there are many different IDEs, and as far as I know it's not possible to have the settings picked up automatically anyway.

Have another look at the non-ISO-8859-1 characters and see if they are correct. I suspect not, as they all appear to be the Unicode replacement character (\ufffd), at least when treated as UTF-8.
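For readers without Perl, the one-liner's test (a line matching \P{ASCII}) can be approximated in Java. This is a sketch rather than the script used in this issue: it walks a directory tree, reads each .java file strictly as UTF-8 and prints file:line for every line containing a character above U+007F. NonAsciiScanner is a hypothetical name, and treating the sources as UTF-8 is an assumption; a file that is not valid UTF-8 is reported as such instead of being silently decoded into replacement characters.

```java
import java.io.IOException;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical Java counterpart of the Perl one-liner: lists lines containing non-ASCII characters. */
final class NonAsciiScanner {

    /** Returns the 1-based numbers of lines that contain at least one character above U+007F. */
    static List<Integer> nonAsciiLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).chars().anyMatch(c -> c > 0x7F)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> javaFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            javaFiles = walk.filter(p -> p.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        for (Path file : javaFiles) {
            List<String> lines;
            try {
                // Strict UTF-8: malformed input throws instead of turning into U+FFFD.
                lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            } catch (MalformedInputException e) {
                System.out.println(file + ": not valid UTF-8");
                continue;
            }
            for (int n : nonAsciiLines(lines)) {
                System.out.println(file + ":" + n + " " + lines.get(n - 1).trim());
            }
        }
    }
}
```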
[jira] [Commented] (CODEC-127) Non-ascii characters in test source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084743#comment-13084743 ] Sebb commented on CODEC-127: Here's the full list of lines containing non-ASCII characters: {code} java/org/apache/commons/codec/language/ColognePhonetic.java:264private static final char[][] PREPROCESS_MAP = new char[][]{{'\u00C4', 'A'}, // ├âÔÇ× java/org/apache/commons/codec/language/ColognePhonetic.java:265 {'\u00DC', 'U'}, // ├â┼ô java/org/apache/commons/codec/language/ColognePhonetic.java:266 {'\u00D6', 'O'}, // ├âÔÇô java/org/apache/commons/codec/language/ColognePhonetic.java:267 {'\u00DF', 'S'} // ├â┼© java/org/apache/commons/codec/language/ColognePhonetic.java:388 * Converts the string to upper case and replaces germanic umlauts, and the ├óÔé¼┼ô├â┼©├óÔé¼´┐¢. test/org/apache/commons/codec/binary/Base64Test.java:96byte[] decode = b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=); test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}}; test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {Meyer, M├╝ller}, test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {ganz, G├ñnse}, test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S); test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N); test/org/apache/commons/codec/language/SoundexTest.java:367if (Character.isLetter('´┐¢')) { test/org/apache/commons/codec/language/SoundexTest.java:369 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); test/org/apache/commons/codec/language/SoundexTest.java:375 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); 
test/org/apache/commons/codec/language/SoundexTest.java:387if (Character.isLetter('´┐¢')) { test/org/apache/commons/codec/language/SoundexTest.java:389 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); test/org/apache/commons/codec/language/SoundexTest.java:395 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { ├ícz, ├ítz, Ign├ícz, Ign├ítz, Ign├íc }; test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { Nu├▒ez, spanish, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { ─îapek, czech, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { K├╝├º├╝k, turkish, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { Ceau┼ƒescu, romanian, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { ╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é, greek, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { ðƒÐâÐêð║ð©ð¢, cyrillic, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { ÎøÎö΃, hebrew, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { ├ícz, any, EXACT }, test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { ├ítz, any, EXACT } }); {code} Note the comment at ColognePhonetic.java:388 - this does not seem to make sense in any encoding, but I could be wrong. Non-ascii characters in test source files - Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb Some of the test cases include characters in a native encoding (possibly UTF-8), rather than using Unicode escapes. This can cause a problem for IDEs if they don't know the encoding (e.g. cause compilation errors, which is how I found the issue), and possibly some transformations may corrupt the contents, e.g. fixing EOL. 
I think we should have a rule of using Unicode escapes for all such non-ascii characters. It's particularly important for non-ISO-8859-1 characters. Some example classes with non-ascii characters: {code} binary\Base64Test.java:96 byte[] decode = b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=); language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, language\ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt,
[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in test source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084743#comment-13084743 ]

Sebb edited comment on CODEC-127 at 8/14/11 12:04 AM:
-----------------------------------------------------

Note the comment at ColognePhonetic.java:388: this does not seem to make sense in any encoding, but I could be wrong. [You'll need to look at it in the source file itself; the Perl script I used is crude and does not display non-ASCII properly.]

The other dubious entries are:
Base64Test.java:96
DoubleMetaphoneTest.java:1222
DoubleMetaphoneTest.java:1227
and most of the SoundexTest.java entries.
[jira] [Commented] (CODEC-127) Non-ascii characters in test source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084752#comment-13084752 ]

Sebb commented on CODEC-127:
---------------------------

I've just compared the various versions of ColognePhonetic.java in trunk. The corruption of the comments on PREPROCESS_MAP occurred between r1080701 and r1087901 (April 1st, ironically). That change also corrupted other comments, and the string at line 382.

The SVN log message says "Annotate with @Override and @Deprecated"; were those added automatically, perhaps?
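The corrupted PREPROCESS_MAP comments look like classic double encoding: UTF-8 bytes decoded as Windows-1252 and then saved again as UTF-8, which turns each umlaut into two characters (Ä becomes Ã„, ß becomes ÃŸ). Whether that is what happened between r1080701 and r1087901 is not established here. The sketch below only demonstrates the round trip and its reversal; DoubleEncodingDemo is a hypothetical name, and it assumes the JDK provides the windows-1252 charset, which common JDKs do although the Java platform does not require it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of double encoding: UTF-8 bytes mis-read as windows-1252. */
final class DoubleEncodingDemo {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Encodes s as UTF-8, then wrongly decodes those bytes as windows-1252. */
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    /** Reverses misread() when every character maps back to a single windows-1252 byte. */
    static String repair(String s) {
        return new String(s.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A-umlaut, U-umlaut, O-umlaut and sharp s, given as code points to keep the source ASCII.
        String original = new String(new char[] {0x00C4, 0x00DC, 0x00D6, 0x00DF});
        String garbled = misread(original);
        System.out.println(garbled.length());                // two characters per original letter
        System.out.println(repair(garbled).equals(original)); // the round trip is reversible here
    }
}
```

The repair only works while no byte has been lost; once a sequence has been replaced by U+FFFD, as seems to have happened at the end of the line-388 comment, the original text cannot be recovered mechanically.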
[jira] [Commented] (CODEC-127) Non-ascii characters in test source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084753#comment-13084753 ]

Sebb commented on CODEC-127:
---------------------------

SoundexTest appears to have been corrupted between r1075426 and r1080414. The log comment says "Keep these files in UTF-8 encoding for proper Javadoc processing". However, I suspect the file was originally in ISO-8859-1, not UTF-8.
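Sebb's suspicion can be illustrated with a small experiment. In ISO-8859-1, ö is the single byte 0xF6, which is not valid UTF-8; a lenient UTF-8 decoder silently replaces it with U+FFFD, and saving the result as UTF-8 stores the bytes EF BF BD, so the original letter is gone for good. A strict CharsetDecoder configured with CodingErrorAction.REPORT rejects the input instead. This is a sketch of the mechanism under that assumption, not a reconstruction of the actual commit; Latin1AsUtf8Demo is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: ISO-8859-1 bytes decoded as UTF-8, leniently and strictly. */
final class Latin1AsUtf8Demo {

    /** Lenient decoding: malformed bytes become U+FFFD, as new String(bytes, UTF_8) does. */
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Strict decoding: returns true only if the bytes are well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // o-umlaut as written by an ISO-8859-1 editor: the single byte 0xF6.
        byte[] latin1 = {(byte) 0xF6};
        System.out.println(isValidUtf8(latin1));                       // false
        System.out.println(decodeLenient(latin1).charAt(0) == 0xFFFD); // true
        // Re-encoding the lenient result as UTF-8 yields EF BF BD: three bytes, letter lost.
        System.out.println(decodeLenient(latin1).getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```

A strict check like isValidUtf8 could, for example, be run over source files before re-saving them in a new encoding, so that a wrong assumption about the original encoding is caught rather than baked in.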
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085110#comment-13085110 ] Sebb commented on CODEC-127: What error do you get? Just curious. I now get: {code} commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}}; commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {Meyer, M├╝ller}, commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {ganz, G├ñnse}, commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S); commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N); commons-codec-generics/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { ├ícz, ├ítz, Ign├ícz, Ign├ítz, Ign├íc }; commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { Nu├▒ez, spanish, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { ─îapek, czech, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { K├╝├º├╝k, turkish, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { Ceau┼ƒescu, romanian, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { ╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é, greek, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { 
ðƒÐâÐêð║ð©ð¢, cyrillic, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { ÎøÎö΃, hebrew, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { ├ícz, any, EXACT }, commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { ├ítz, any, EXACT } }); {code} and {code} commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}}; commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {Meyer, M├╝ller}, commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {ganz, G├ñnse}, commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S); commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1232 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N); commons-codec/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { ├ícz, ├ítz, Ign├ícz, Ign├ítz, Ign├íc }; commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { Nu├▒ez, spanish, EXACT }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { ─îapek, czech, EXACT }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { K├╝├º├╝k, turkish, EXACT }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { Ceau┼ƒescu, romanian, EXACT }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { ╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é, greek, EXACT }, 
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { ðƒÐâÐêð║ð©ð¢, cyrillic, EXACT }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { ÎøÎö΃, hebrew, EXACT }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { ├ícz, any, EXACT }, commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { ├ítz, any, EXACT } }); {code} This was using an updated version of the script that uses File::Find to process directory traversal better. (Some lines shortened above by manually removing leading spaces) I think all the actual errors have now been
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Description: Some of the test cases include characters in a native encoding (possibly UTF-8), rather than using Unicode escapes. This can cause a problem for IDEs if they don't know the encoding (e.g. cause compilation errors, which is how I found the issue), and possibly some transformations may corrupt the contents, e.g. fixing EOL. I think we should have a rule of using Unicode escapes for all such non-ascii characters. It's particularly important for non-ISO-8859-1 characters. Some example classes with non-ascii characters: {code} binary\Base64Test.java:96 byte[] decode = b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=); language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, language\ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}}; language\ColognePhoneticTest.java:137 {Meyer, M├╝ller}, language\ColognePhoneticTest.java:143 {ganz, G├ñnse}, language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S); language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N); language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:375 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:395 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); {code} The characters are probably not correct above, because I used a crude perl script to find them: {code} perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. 
$_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} language\SoundexTest.java:367 in particular is incorrect, because it's supposed to be a single character. Now one might think that native2ascii -encoding UTF-8 would fix that, but it gives: if (Character.isLetter('\ufffd')) which is an unknown character. Similarly for binary\Base64Test.java:96. It's not all that clear what the Unicode escapes should be in these cases, but probably not the unknown character. [Possibly the characters got mangled at some point, or maybe they have always been wrong] The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German), but given that the rest of the file uses Unicode escapes, I think they should be changed too (but add comments to say what they are, e.g. o-umlaut, u-umlaut). was: Some of the test cases include characters in a native encoding (possibly UTF-8), rather than using Unicode escapes. This can cause a problem for IDEs if they don't know the encoding (e.g. cause compilation errors, which is how I found the issue), and possibly some transformations may corrupt the contents, e.g. fixing EOL. I think we should have a rule of using Unicode escapes for all such non-ascii characters. It's particularly important for non-ISO-8859-1 characters.
Some example classes with non-ascii characters: {code} binary\Base64Test.java:96 byte[] decode = b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=); language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 664645214}, language\ColognePhoneticTest.java:130 String[][] data = {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}}; language\ColognePhoneticTest.java:137 {Meyer, M├╝ller}, language\ColognePhoneticTest.java:143 {ganz, G├ñnse}, language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S); language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N); language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:375 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) { language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, this.getSoundexEncoder().encode(´┐¢)); language\SoundexTest.java:395 Assert.assertEquals(, this.getSoundexEncoder().encode(´┐¢)); {code} The characters are probably not correct above, because I used a crude perl script to find them: {code} perl ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} language\SoundexTest.java:367 in particular
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085128#comment-13085128 ] Sebb commented on CODEC-127: If you change Eclipse to set the container / resource / text file encoding to UTF-8 (since that is what the POM says), the files should display correctly, assuming they really are UTF-8. Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb
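Why the encoding setting matters can be shown with a minimal Java sketch: the same UTF-8 bytes round-trip when decoded as UTF-8, but turn into box-drawing symbols when decoded with a DOS code page, which matches the shape of the earlier script output (for example `m├Ânchengladbach`). The use of IBM850 (CP850) is an assumption inferred only from that output; it is an extended charset that a stripped-down Java runtime might not ship. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes shown through two different charsets.
public class EncodingMismatchDemo {

    // Encodes text as UTF-8, then decodes those bytes with the charset a
    // viewer (IDE, console) assumes, simulating a wrong encoding setting.
    static String readAs(String text, Charset assumed) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, assumed);
    }

    public static void main(String[] args) {
        String name = "m\u00f6nchengladbach"; // o-umlaut kept as an ASCII escape
        Charset cp850 = Charset.forName("IBM850"); // assumed console code page

        // Matching charset: the text round-trips unchanged.
        System.out.println(readAs(name, StandardCharsets.UTF_8).equals(name)); // prints true
        // Wrong charset: the two UTF-8 bytes of o-umlaut become two CP850 symbols.
        System.out.println(readAs(name, cp850).equals("m\u251c\u00c2nchengladbach")); // prints true
    }
}
```

Comparing strings rather than printing them keeps the result independent of the console's own output encoding.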
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085135#comment-13085135 ] Sebb commented on CODEC-127: See my fix to ColognePhoneticTest in trunk. That now shows native comments for all unicode escapes. Two of the otherwise lowercase names were previously converted to the Unicode for upper case umlauts; I wonder if that was a mistake?
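The question about lowercase names escaped with upper case umlauts could be checked mechanically. A minimal sketch, using a heuristic of our own rather than anything in Commons Codec: flag a string when all of its ASCII letters are lowercase but it contains an upper case non-ASCII letter, such as U+00D6 (O-umlaut) where U+00F6 (o-umlaut) was probably meant. The class and method names are hypothetical.

```java
// Hypothetical helper; the heuristic is an illustration, not a Codec API.
public class UmlautCaseCheck {

    // Returns true if every ASCII letter in s is lowercase and at least one
    // non-ASCII letter is uppercase, which hints at a mis-escaped character.
    static boolean looksMisEscaped(String s) {
        boolean nonAsciiUpper = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                if (Character.isUpperCase(c)) {
                    return false; // a real ASCII capital suggests deliberate casing
                }
            } else if (Character.isUpperCase(c)) {
                nonAsciiUpper = true;
            }
        }
        return nonAsciiUpper;
    }

    public static void main(String[] args) {
        System.out.println(looksMisEscaped("m\u00d6nchengladbach")); // capital O-umlaut: true
        System.out.println(looksMisEscaped("m\u00f6nchengladbach")); // small o-umlaut: false
        System.out.println(looksMisEscaped("M\u00fcller"));          // real capital M: false
    }
}
```

A check like this only points at candidates; whether a given test value should be upper or lower case still needs a human decision.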
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085145#comment-13085145 ] Sebb commented on CODEC-127: Sorry, forgot I was using a local module which handles DOS wildcards, see http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion Either pass each file in separately, or create Wild.pm and use: {code} perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} Wild.pm only works for one level of directories.
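Since Wild.pm only expands one directory level, a cross-platform alternative is to walk the whole tree from Java. The sketch below performs roughly the same check as the perl one-liner (report `file:line` and the line text for every line containing a non-ASCII byte). Reading as ISO-8859-1 is a deliberate choice so that every byte maps to one char and decoding never fails, whatever the real encoding; the class name and the `src` default root are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical tool: lists every line of every .java file under a root
// directory that contains at least one non-ASCII byte.
public class NonAsciiFinder {

    // True if any char is outside 7-bit ASCII. With ISO-8859-1 decoding,
    // this is the same as the raw line containing a byte above 0x7F.
    static boolean hasNonAscii(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 0x7F) {
                return true;
            }
        }
        return false;
    }

    // Returns "path:lineNumber line" entries, similar to the perl output.
    static List<String> scan(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) { // recurses into all levels
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".java"))
                        .sorted()
                        .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int n = 0; n < lines.size(); n++) {
                if (hasNonAscii(lines.get(n))) {
                    hits.add(file + ":" + (n + 1) + " " + lines.get(n));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "src"); // assumed default root
        for (String hit : scan(root)) {
            System.out.println(hit);
        }
    }
}
```

As with the perl script, the printed line text of a UTF-8 file will look garbled, because it is shown byte by byte; the file and line numbers are the reliable part.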
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085149#comment-13085149 ] Sebb commented on CODEC-127: It's not that one cannot edit UTF-8; the problem is that it is easy to mangle non-ASCII characters by mistake. The safest approach is to use only ASCII, i.e. Unicode escapes, which are valid in UTF-8, ISO-8859-1 and all likely default encodings. However, they are difficult to read, hence the comments on the lines. If the comments get mangled, it will be obvious, because they won't look right; and it's relatively easy to fix them from the Unicode escapes. I don't think it's an option to use native characters in the non-comment code, because we already know they can get corrupted, and the corruption won't necessarily cause errors. I don't see the harm in translating the code into comments; after all, the translation can be done again.
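The convention described here (ASCII-only code, Unicode escapes in literals, the native text in a trailing comment) can be generated with a small helper that does roughly what native2ascii does for the literal itself. A minimal sketch: the class and method names are hypothetical, the hex digits are lowercase as in the `'\ufffd'` output quoted in the issue, and characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which Java source accepts.

```java
// Hypothetical helper: rewrites non-ASCII characters as Java backslash-u escapes.
public class UnicodeEscaper {

    // Escapes every UTF-16 code unit above 0x7F; ASCII passes through unchanged.
    // Supplementary characters therefore become two escapes (a surrogate pair).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds a quoted literal followed by a comment holding the native text.
    // Assumes s contains no quotes or backslashes. The comment is only a
    // reading aid: if it gets mangled, the escape still holds the real value.
    static String literalWithComment(String s) {
        return "\"" + escape(s) + "\", // " + s;
    }

    public static void main(String[] args) {
        System.out.println(escape("M\u00fcller-L\u00fcdenscheidt")); // prints M\u00fcller-L\u00fcdenscheidt
    }
}
```

Keeping the escape as the source of truth means the comment can always be regenerated from it, which matches the reasoning in the comment above.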
[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085145#comment-13085145 ] Sebb edited comment on CODEC-127 at 8/15/11 4:55 PM: - Sorry, forgot I was using a local module which handles DOS wildcards, see http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion Either pass each file in separately, or create Wild.pm and use: {code} perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} Wild.pm only works for one level of directories. was (Author: s...@apache.org): Sorry, forgot I was using a local module which handles DOS wildcards, see http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion Either pass each file in separately, or create Wild.pm and use: {code} perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java {code} Wild.pm only works for one level of directories.
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085165#comment-13085165 ] Sebb commented on CODEC-127: Sorry, the closing quote was in the wrong place; it should have been before the file name params.
[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
[ https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085237#comment-13085237 ] Sebb commented on CHAIN-53: --- A major version bump is not required when changing the minimum Java version (though it would be sensible if making a major jump): http://commons.apache.org/releases/versioning.html Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions -- Key: CHAIN-53 URL: https://issues.apache.org/jira/browse/CHAIN-53 Project: Commons Chain Issue Type: Improvement Reporter: Elijah Zupancic Labels: newbie, patch As posted in the mailing list, I've done this work outside of an official branch. Here is the source: http://elijah.zupancic.name/projects/commons-chain-v2-proof-of-concept.tar.gz And here is a diff: http://elijah.zupancic.name/projects/uber-diff In this patch:
* Global upgrade to JDK 1.5
* Added @Override annotations
* Upgraded to the Servlet 2.5 API
* Upgraded to the Faces 2.1 API
* Upgraded to the Portlet 2.0 API
* Upgraded the Maven Parent POM version
* Added generics support to Command so that Command's API looks like: public interface Command<T extends Context> { ... boolean execute(T context) throws Exception; }
I'm very new to the ASF, and I was advised to file a bug in order to get the process started for these changes to be integrated.
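The proposed generic signature can be illustrated in a self-contained way. This is not the Commons Chain API: `Context` below is a hypothetical stand-in modeled as a map of named attributes, and `RequestContext`, `CountingCommand` and `GenericCommandDemo` are illustrative names. The point is that an implementation declares the exact context type it needs, so `execute` can use that type's methods without a cast.

```java
import java.util.HashMap;

// Hypothetical stand-in for a chain Context: a map of named attributes.
interface Context extends java.util.Map<String, Object> {}

// The generic Command signature proposed in CHAIN-53.
interface Command<T extends Context> {
    boolean execute(T context) throws Exception;
}

// Illustrative context subtype with a typed accessor.
class RequestContext extends HashMap<String, Object> implements Context {
    int hits() {
        Object v = get("hits");
        return v == null ? 0 : (Integer) v;
    }
}

// Illustrative command: increments a counter and lets processing continue.
class CountingCommand implements Command<RequestContext> {
    @Override
    public boolean execute(RequestContext context) {
        context.put("hits", context.hits() + 1); // typed access, no cast at the call site
        return false; // false = not complete, following the Chain convention
    }
}

public class GenericCommandDemo {
    public static void main(String[] args) throws Exception {
        RequestContext ctx = new RequestContext();
        Command<RequestContext> cmd = new CountingCommand();
        cmd.execute(ctx);
        cmd.execute(ctx);
        System.out.println(ctx.hits()); // prints 2
    }
}
```

Without the type parameter, `execute(Context)` would force each command to cast to its own context subtype; the bound moves that check to compile time.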
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085242#comment-13085242 ] Sebb commented on CODEC-127: Actually, DoubleMetaphoneTest is still corrupt; fixing now.

Non-ascii characters in source files
Key: CODEC-127
URL: https://issues.apache.org/jira/browse/CODEC-127
Project: Commons Codec
Issue Type: Bug
Reporter: Sebb

Some of the test cases include characters in a native encoding (possibly UTF-8) rather than Unicode escapes. This can cause problems for IDEs that don't know the encoding (e.g. compilation errors, which is how I found the issue), and some transformations, such as fixing EOLs, may corrupt the contents. I think we should have a rule of using Unicode escapes for all such non-ASCII characters. It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ASCII characters:
{code}
binary\Base64Test.java:96 byte[] decode = b64.decode("SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=");
language\ColognePhoneticTest.java:110 {"m├Ânchengladbach", "664645214"},
language\ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"M├╝ller-L├╝denscheidt", "65752682"}};
language\ColognePhoneticTest.java:137 {"Meyer", "M├╝ller"},
language\ColognePhoneticTest.java:143 {"ganz", "G├ñnse"},
language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "S");
language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "N");
language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:369 Assert.assertEquals("´┐¢000", this.getSoundexEncoder().encode("´┐¢"));
language\SoundexTest.java:375 Assert.assertEquals("", this.getSoundexEncoder().encode("´┐¢"));
language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:389 Assert.assertEquals("´┐¢000", this.getSoundexEncoder().encode("´┐¢"));
language\SoundexTest.java:395 Assert.assertEquals("", this.getSoundexEncoder().encode("´┐¢"));
{code}

The characters shown above are probably not correct, because I used a crude Perl script to find them:
{code}
perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
{code}

language\SoundexTest.java:367 in particular is incorrect, because the char literal is supposed to hold a single character. One might think that native2ascii -encoding UTF-8 would fix that, but it gives:
if (Character.isLetter('\ufffd'))
which is U+FFFD, the Unicode replacement character (an unknown character). The same applies to binary\Base64Test.java:96. It's not clear what the Unicode escapes should be in these cases, but probably not the replacement character. [Possibly the characters got mangled at some point, or maybe they have always been wrong.]

The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German). However, the rest of the file uses Unicode escapes, so I think they should be changed too (with comments saying what they are, e.g. o-umlaut, u-umlaut).
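For comparison, here is a minimal Java sketch of the same check the Perl one-liner performs. It is not part of Commons Codec. The class name NonAsciiScanner is hypothetical, and the sketch assumes the sources are meant to be UTF-8.

It reads the raw bytes and decodes them with new String(bytes, UTF_8), which substitutes U+FFFD for malformed input instead of throwing. It then tags any line that contains U+FFFD.

The ´┐¢ sequences in the listing appear to be the bytes EF BF BD (the UTF-8 encoding of U+FFFD) as displayed by a console using Windows code page 850. If so, the replacement character is already stored in those files. That would be consistent with native2ascii producing '\ufffd'.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Commons Codec): lists lines that contain
 * non-ASCII characters, like the perl one-liner. Assumes UTF-8 sources.
 */
public final class NonAsciiScanner {

    /** U+FFFD, the Unicode replacement character. */
    static final char REPLACEMENT = '\uFFFD';

    private NonAsciiScanner() {
    }

    /** True if any UTF-16 char lies outside 7-bit ASCII (above 0x7F). */
    static boolean hasNonAscii(String line) {
        return line.chars().anyMatch(c -> c > 0x7F);
    }

    /** Returns "name:lineNumber text" for each non-ASCII line; lines holding U+FFFD are tagged. */
    static List<String> scan(String name, String content) {
        List<String> hits = new ArrayList<>();
        String[] lines = content.split("\r?\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (hasNonAscii(line)) {
                String tag = line.indexOf(REPLACEMENT) >= 0 ? " [contains U+FFFD]" : "";
                hits.add(name + ":" + (i + 1) + " " + line.trim() + tag);
            }
        }
        return hits;
    }

    /** Reads a file as UTF-8; malformed byte sequences become U+FFFD rather than an exception. */
    static List<String> scan(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return scan(file.toString(), new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            scan(Paths.get(arg)).forEach(System.out::println);
        }
    }
}
```

Unlike the shell glob in the Perl version, this takes explicit file paths as arguments. The [contains U+FFFD] tag marks lines where, as with SoundexTest.java:367, the original character can no longer be recovered from the file itself.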
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085266#comment-13085266 ] Sebb commented on CODEC-127: Tried it here; works fine. Probably an error in your Wild.pm, because I see the same if I omit the -MWild option.
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Sebb: I get errors when I try your perl script on Windows with the latest perl (64 bit) from ActiveState. Rather than use this space to figure out why, can you please run it again and check if we are done with this ticket? Thank you, Gary)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Sorry, closing was in the wrong place; it should have been before the file name params)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Description:

Some of the test cases include characters in a native encoding (possibly UTF-8) rather than Unicode escapes. This can cause problems for IDEs that don't know the encoding (e.g. compilation errors, which is how I found the issue), and some transformations, such as fixing EOLs, may corrupt the contents. I think we should have a rule of using Unicode escapes for all such non-ASCII characters. It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ASCII characters:
{code}
binary\Base64Test.java:96 byte[] decode = b64.decode("SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=");
language\ColognePhoneticTest.java:110 {"m├Ânchengladbach", "664645214"},
language\ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"M├╝ller-L├╝denscheidt", "65752682"}};
language\ColognePhoneticTest.java:137 {"Meyer", "M├╝ller"},
language\ColognePhoneticTest.java:143 {"ganz", "G├ñnse"},
language\DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "S");
language\DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "N");
language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:369 Assert.assertEquals("´┐¢000", this.getSoundexEncoder().encode("´┐¢"));
language\SoundexTest.java:375 Assert.assertEquals("", this.getSoundexEncoder().encode("´┐¢"));
language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:389 Assert.assertEquals("´┐¢000", this.getSoundexEncoder().encode("´┐¢"));
language\SoundexTest.java:395 Assert.assertEquals("", this.getSoundexEncoder().encode("´┐¢"));
{code}

The characters shown above are probably not correct, because I used a crude Perl script to find them:
{code}
perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; .java
{code}

language\SoundexTest.java:367 in particular is incorrect, because the char literal is supposed to hold a single character. One might think that native2ascii -encoding UTF-8 would fix that, but it gives:
if (Character.isLetter('\ufffd'))
which is U+FFFD, the Unicode replacement character (an unknown character). The same applies to binary\Base64Test.java:96. It's not clear what the Unicode escapes should be in these cases, but probably not the replacement character. [Possibly the characters got mangled at some point, or maybe they have always been wrong.]

The ColognePhoneticTest.java cases are less serious, as the characters are valid ISO-8859-1 (accented German). However, the rest of the file uses Unicode escapes, so I think they should be changed too (with comments saying what they are, e.g. o-umlaut, u-umlaut).
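The proposed rule can be sketched as a small Java helper. It is hypothetical: the class name UnicodeEscaper is an assumption, and it is not part of Commons Codec or the JDK.

The helper replaces every char above 0x7F with a Java Unicode escape, roughly what native2ascii does, so the resulting source is pure ASCII. It works on UTF-16 chars, so a character outside the Basic Multilingual Plane comes out as two escapes (its surrogate pair). Java source accepts that form.

```java
/**
 * Hypothetical helper (not part of Commons Codec) sketching the proposed rule:
 * every char above 0x7F is written as a Java Unicode escape (backslash, 'u',
 * four lowercase hex digits), roughly what native2ascii produces.
 */
public final class UnicodeEscaper {

    private UnicodeEscaper() {
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c <= 0x7F) {
                sb.append(c); // plain ASCII is kept as-is
            } else {
                // Works per UTF-16 char, so a supplementary character becomes
                // two escapes (its surrogate pair), which javac accepts.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String arg : args) {
            System.out.println(escape(arg));
        }
    }
}
```

For example, escape("mönchengladbach") returns m\u00f6nchengladbach, which could then go into ColognePhoneticTest.java with a trailing comment such as // o-umlaut.

Escaping cannot bring back a character that is already lost. The corrupted literal at SoundexTest.java:367 still becomes '\ufffd'. Character.isLetter('\ufffd') returns false, because U+FFFD is a symbol (general category So), so that if guard can never be true.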
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: If I run the command as is, I get: {quote} Can't open perl script ne: No such file or directory {quote})
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Can you post your .pm here or email to ggregory at apache dot org? )
[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085110#comment-13085110 ] Sebb edited comment on CODEC-127 at 8/15/11 8:07 PM: - I now get:
{code}
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {"m├Ânchengladbach", "664645214"},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"M├╝ller-L├╝denscheidt", "65752682"}};
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {"Meyer", "M├╝ller"},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {"ganz", "G├ñnse"},
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1222 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "S");
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "N");
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { "├ícz", "├ítz", "Ign├ícz", "Ign├ítz", "Ign├íc" };
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { "Nu├▒ez", "spanish", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { "─îapek", "czech", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { "K├╝├º├╝k", "turkish", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { "Ceau┼ƒescu", "romanian", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { "╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é", "greek", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { "ðƒÐâÐêð║ð©ð¢", "cyrillic", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { "ÎøÎö΃", "hebrew", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { "├ícz", "any", EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { "├ítz", "any", EXACT } });
{code}
and
{code}
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110 {"m├Ânchengladbach", "664645214"},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130 String[][] data = {{"bergisch-gladbach", "174845214"}, {"M├╝ller-L├╝denscheidt", "65752682"}};
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137 {"Meyer", "M├╝ller"},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143 {"ganz", "G├ñnse"},
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "S");
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1232 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "N");
commons-codec/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93 String[] names = { "├ícz", "├ítz", "Ign├ícz", "Ign├ítz", "Ign├íc" };
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47 { "Nu├▒ez", "spanish", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49 { "─îapek", "czech", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52 { "K├╝├º├╝k", "turkish", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55 { "Ceau┼ƒescu", "romanian", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57 { "╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é", "greek", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58 { "ðƒÐâÐêð║ð©ð¢", "cyrillic", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59 { "ÎøÎö΃", "hebrew", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60 { "├ícz", "any", EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61 { "├ítz", "any", EXACT } });
{code}
This was using an updated version of the script that uses File::Find to handle directory traversal better. (Some lines above were shortened by manually removing leading spaces.) I think all the actual errors have now
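Walking the directory tree in the script, as the File::Find version does, avoids depending on the shell to expand a wildcard. On Windows, cmd.exe leaves wildcards to the program, which is presumably what the Wild.pm module mentioned in this thread is for.

A comparable Java sketch, assuming Java 8 or later, walks the tree with Files.walk. The class name SourceTreeScanner and the default root src/test are assumptions for illustration. This is not an existing Commons Codec tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Commons Codec): walks a source tree the way
 * the File::Find version of the script does, so no shell wildcard expansion is
 * needed, and reports lines containing chars above 0x7F.
 */
public final class SourceTreeScanner {

    private SourceTreeScanner() {
    }

    /** All regular .java files under root, sorted for stable output. */
    static List<Path> javaFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Returns "relativePath:lineNumber text" for each line with non-ASCII content. */
    static List<String> scanTree(Path root) throws IOException {
        List<String> hits = new ArrayList<>();
        for (Path file : javaFiles(root)) {
            // new String(bytes, UTF_8) maps malformed input to U+FFFD instead of throwing.
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String[] lines = content.split("\r?\n", -1);
            for (int i = 0; i < lines.length; i++) {
                if (lines[i].chars().anyMatch(c -> c > 0x7F)) {
                    hits.add(root.relativize(file) + ":" + (i + 1) + " " + lines[i].trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Default root is an assumption; pass the real test directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "src/test");
        scanTree(root).forEach(System.out::println);
    }
}
```

Each output line has the same path:line text shape as the listings above, with paths relative to the chosen root.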
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Typo - missing hyphen for flags)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Tried it here; works fine. Probably an error in your Wild.pm, because I see the same if I omit the -MWild option.)
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Perl: I did all that and I get:
{noformat}
C:\svn\org\apache\commons\trunks-proper\codec>perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
syntax error at -e line 1, near *.
Execution of -e aborted due to compilation errors.
{noformat}
I also have: PERL5OPT=-MWild in my environment. Gary) Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Arg:
{noformat}
C:\svn\org\apache\commons\trunks-proper\codec>perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
Can't open */*.java: Invalid argument.
{noformat}
) Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: Sorry, forgot I was using a local module which handles DOS wildcards, see http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion Either pass each file in separately, or create Wild.pm and use:
{code}
perl -MWild -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
{code}
Wild.pm only works for one level of directories.) Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb
[jira] [Updated] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated CODEC-127: --- Comment: was deleted (was: If I run:
{noformat}
perl -n -e $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
{noformat}
I get:
{noformat}
Can't open */*.java: Invalid argument.
{noformat}
) Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085301#comment-13085301 ] Sebb commented on CODEC-127: I think all the files are now fixed so that the code uses Unicode escapes; the only non-ASCII characters are now in comments. Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb
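Converting code to Unicode escapes, as described in this comment, amounts to replacing each non-ASCII character with a backslash-u escape of its UTF-16 code unit, much as native2ascii does. A minimal sketch, assuming the text has already been decoded with the correct charset (if it has not, as with the corrupted characters in this issue, the escapes will faithfully preserve the wrong U+FFFD). The class name UnicodeEscaper is hypothetical.

```java
/** Hypothetical helper: rewrites non-ASCII chars as Java Unicode escapes. */
public class UnicodeEscaper {

    public static String escape(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c <= 0x7F) {
                out.append(c); // plain ASCII is copied unchanged
            } else {
                // One escape per UTF-16 code unit, so a supplementary character
                // comes out as a surrogate pair of two escapes.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("M\u00fcller")); // prints the escaped form of "Müller"
    }
}
```

One caveat when applying this to source files: javac translates Unicode escapes everywhere, including comments, so a malformed backslash-u sequence even inside a comment is a compile error. That is one reason the remaining non-ASCII text in comments is better left as plain characters with a known file encoding.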
[jira] [Created] (JCI-67) Dubious use of mkdirs() return code
Dubious use of mkdirs() return code --- Key: JCI-67 URL: https://issues.apache.org/jira/browse/JCI-67 Project: Commons JCI Issue Type: Bug Reporter: Sebb Priority: Minor FileRestoreStore.java uses mkdirs() as follows:
{code}
final File parent = file.getParentFile();
if (!parent.exists()) {
    if (!parent.mkdirs()) {
        throw new IOException("could not create " + parent);
    }
}
{code}
Now mkdirs() returns true *only* if the method actually created the directories; it's theoretically possible for the directory to be created in the window between the exists() and mkdirs() invocations. Also, the initial exists() call is redundant, because that's what mkdirs() does anyway (in the RI implementation, at least). I suggest the following instead:
{code}
final File parent = file.getParentFile();
if (!parent.mkdirs() && !parent.exists()) {
    throw new IOException("could not create " + parent);
}
{code}
If mkdirs() returns false, the code then checks to see if the directory exists, so the throw statement is only reached if the parent really cannot be created. The same code also appears in AbstractTestCase and FilesystemAlterationMonitorTestCase.
[jira] [Commented] (JCI-67) Dubious use of mkdirs() return code
[ https://issues.apache.org/jira/browse/JCI-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085339#comment-13085339 ] Sebb commented on JCI-67: - Safer would be the following, as it checks the path is actually a directory:
{code}
final File parent = file.getParentFile();
if (!parent.mkdirs() && !parent.isDirectory()) {
    throw new IOException("could not create " + parent);
}
{code}
Dubious use of mkdirs() return code --- Key: JCI-67 URL: https://issues.apache.org/jira/browse/JCI-67 Project: Commons JCI Issue Type: Bug Reporter: Sebb Priority: Minor
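The idiom in this comment can be wrapped in a small helper. A minimal sketch, with the hypothetical names DirUtil and ensureDirectory: mkdirs() returns false both when it fails and when the directory already exists (for example because another thread created it between checks), so a false result is treated as an error only if the path is still not a directory afterwards.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical helper showing the mkdirs()/isDirectory() idiom. */
public final class DirUtil {

    private DirUtil() {
    }

    /** Makes sure dir exists as a directory, creating missing parents as needed. */
    public static void ensureDirectory(File dir) throws IOException {
        // mkdirs() returns false if it failed OR if dir already existed, so only
        // treat false as an error when dir is still not a directory.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("could not create " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File nested = new File(Files.createTempDirectory("dirutil").toFile(), "a/b");
        ensureDirectory(nested);             // creates a and a/b
        System.out.println(nested.mkdirs()); // false: the directory already exists
        ensureDirectory(nested);             // succeeds again instead of throwing
    }
}
```

Using isDirectory() rather than exists() also means a regular file sitting at that path is reported as an error instead of silently accepted.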
[jira] [Created] (IO-280) Dubious use of mkdirs() return code
Dubious use of mkdirs() return code --- Key: IO-280 URL: https://issues.apache.org/jira/browse/IO-280 Project: Commons IO Issue Type: Bug Reporter: Sebb Priority: Minor FileUtils.openOutputStream() has the following code:
{code}
File parent = file.getParentFile();
if (parent != null && parent.exists() == false) {
    if (parent.mkdirs() == false) {
        throw new IOException("File '" + file + "' could not be created");
    }
}
{code}
Now mkdirs() returns true only if the method actually created the directories; it's theoretically possible for the directory to be created in the window between the exists() and mkdirs() invocations. [Indeed the class actually checks for this in the forceMkdir() method] It would be safer to use:
{code}
File parent = file.getParentFile();
if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
    throw new IOException("Directory '" + parent + "' could not be created"); // note changed text
}
{code}
Similarly elsewhere in the class where mkdirs() is used.
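On Java 7 and later, java.nio.file.Files.createDirectories offers another way to avoid the exists()/mkdirs() window; this is an alternative, not what the issue proposes. Per its documentation it does not throw when the directory already exists, but does throw an IOException (typically FileAlreadyExistsException) when the path exists and is not a directory. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: create a file's parent directories with java.nio before writing. */
public final class ParentDirs {

    private ParentDirs() {
    }

    /** Creates all missing parent directories of file; a no-op if they already exist. */
    public static void createParentDirs(Path file) throws IOException {
        Path parent = file.getParent(); // null for a bare relative name such as "out.txt"
        if (parent != null) {
            // Succeeds if parent already exists as a directory, so there is no
            // separate existence check that another thread could race with.
            Files.createDirectories(parent);
        }
    }
}
```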
[jira] [Commented] (CODEC-127) Non-ascii characters in source files
[ https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085997#comment-13085997 ] Sebb commented on CODEC-127: I think Base64Test is OK - I looked back at the original commits, and found an uncorrupted version. By the way, it was only Test files that needed fixing, apart from ColognePhonetic, where the fixes were only needed in comments anyway. Non-ascii characters in source files Key: CODEC-127 URL: https://issues.apache.org/jira/browse/CODEC-127 Project: Commons Codec Issue Type: Bug Reporter: Sebb
[jira] [Commented] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088637#comment-13088637 ] Sebb commented on LANG-744: --- The static code should probably just catch Exception. Do we really want any RuntimeExceptions to escape into the calling code? StringUtils throws java.security.AccessControlException on Google App Engine Key: LANG-744 URL: https://issues.apache.org/jira/browse/LANG-744 Project: Commons Lang Issue Type: Bug Components: lang.* Affects Versions: 3.0.1 Environment: Google App Engine Reporter: Clément Denis In the static initializer of org.apache.commons.lang3.StringUtils, there is an attempt to load the class sun.text.Normalizer. Such a class is prohibited on Google App Engine, and the static initializer throws a java.security.AccessControlException.
{code}
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission accessClassInPackage.sun.text)
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:374)
    at java.security.AccessController.checkPermission(AccessController.java:546)
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
    at com.google.appengine.tools.development.DevAppServerFactory$CustomSecurityManager.checkPermission(DevAppServerFactory.java:166)
    at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1512)
    at java.lang.Class.checkMemberAccess(Class.java:2164)
    at java.lang.Class.getMethod(Class.java:1602)
    at org.apache.commons.lang3.StringUtils.<clinit>(StringUtils.java:739)
{code}
The exception should be caught in the catch clauses around loadClass("sun.text.Normalizer"). Commons lang 2 worked fine on GAE.
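A minimal sketch of the approach suggested in this comment: perform the optional reflective lookup inside a single catch (Exception), so that runtime exceptions such as AccessControlException (a subclass of SecurityException) cannot escape the static initializer. The class and method names are hypothetical and this is not the actual StringUtils code; the example field looks up the public java.text.Normalizer.normalize(CharSequence, Normalizer.Form) method rather than anything in sun.text.

```java
import java.lang.reflect.Method;
import java.text.Normalizer;

/** Hypothetical sketch of an optional, reflectively looked-up method. */
public final class OptionalMethod {

    private OptionalMethod() {
    }

    /**
     * Returns the named public method, or null if it cannot be obtained for any
     * reason: missing class, missing method, or a SecurityException (such as
     * AccessControlException) thrown by a restrictive SecurityManager.
     */
    public static Method find(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.getMethod(methodName, paramTypes);
        } catch (Exception e) { // deliberately broad: every failure means "not available"
            return null;
        }
    }

    // Initialized during class initialization; since find() catches every Exception
    // (though not Errors), a failed lookup cannot become an ExceptionInInitializerError.
    private static final Method NORMALIZE =
            find("java.text.Normalizer", "normalize", CharSequence.class, Normalizer.Form.class);

    public static boolean normalizerAvailable() {
        return NORMALIZE != null;
    }
}
```

Because every failure leads to the same fallback (null), one broad catch expresses the intent more directly than enumerating ClassNotFoundException, NoSuchMethodException and SecurityException separately.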
[jira] [Updated] (MATH-649) SimpleRegression needs the ability to suppress the intercept
[ https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated MATH-649: -- Summary: SimpleRegression needs the ability to suppress the intercept (was: SimpleRegression needs the ability to surpress the intercept) Typo SimpleRegression needs the ability to suppress the intercept Key: MATH-649 URL: https://issues.apache.org/jira/browse/MATH-649 Project: Commons Math Issue Type: New Feature Affects Versions: 1.2, 2.1, 2.2 Environment: JAVA Reporter: greg sterijevski Priority: Minor Labels: NOINTERCEPT, SIMPLEREGRESSION Fix For: 3.0 Attachments: simplereg, simpleregtest Original Estimate: 2h Remaining Estimate: 2h The SimpleRegression class is a useful class for running regressions involving one independent variable. It lacks the ability to constrain the constant to be zero. I am attaching a patch which gives a constructor for setting NOINT. I am also checking in two NIST data sets for noint estimation.
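For reference, suppressing the intercept means fitting y = b*x instead of y = a + b*x; minimizing the sum of (y - b*x)^2 gives the slope b = sum(x*y) / sum(x*x). A minimal self-contained sketch of that formula (the class NoInterceptRegression is hypothetical; it is not the patch attached to this issue, nor the Commons Math API):

```java
/** Hypothetical sketch: least-squares regression through the origin (no intercept). */
public final class NoInterceptRegression {

    private double sumXY; // running sum of x*y
    private double sumXX; // running sum of x*x

    /** Adds one observation (x, y). */
    public void addData(double x, double y) {
        sumXY += x * y;
        sumXX += x * x;
    }

    /** Slope b minimizing sum((y - b*x)^2); NaN if there is no data or every x is 0. */
    public double getSlope() {
        return sumXX == 0.0 ? Double.NaN : sumXY / sumXX;
    }

    public static void main(String[] args) {
        NoInterceptRegression r = new NoInterceptRegression();
        r.addData(1, 2);
        r.addData(2, 4);
        r.addData(3, 6);
        System.out.println(r.getSlope()); // prints 2.0, since (2 + 8 + 18) / (1 + 4 + 9) = 2
    }
}
```

Only two running sums are needed, which keeps the no-intercept case as cheap to update incrementally as the ordinary one.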
[jira] [Commented] (IO-281) WildcardFileFilter fails for wild card pattern with a '*' in it
[ https://issues.apache.org/jira/browse/IO-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088946#comment-13088946 ] Sebb commented on IO-281: - Are you sure that the dir variable points to the correct directory? Try printing it out, and/or removing the filter. WildcardFileFilter fails for wild card pattern with a '*' in it --- Key: IO-281 URL: https://issues.apache.org/jira/browse/IO-281 Project: Commons IO Issue Type: Bug Components: Filters Affects Versions: 1.3.2 Environment: Windows XP Reporter: Dean Schulze Priority: Blocker The code below reports no files found when there is a file matching the wild card pattern. If I enter this command in a DOS window in the same directory it finds the file, so the wild card pattern is correct as far as DOS is concerned:
{code}
C:\dean\clipper\src\metadata.mail>dir 320620110821433-*.RWD
 Directory of C:\dean\clipper\src\metadata.mail
08/22/2011  12:36 PM             9,728 320620110821433-1.RWD
               1 File(s)          9,728 bytes
               0 Dir(s)  50,033,049,600 bytes free
{code}
This code should work according to the docs but it reports no file found:
{code}
void testFileNameFilter() throws IOException {
    String fileNamePrefix = "320620110821433";
    File f = new File(fileNamePrefix + ".rwd");
    String filterString = fileNamePrefix + "-*.RWD";
    FileFilter filter = new WildcardFileFilter(filterString, IOCase.SYSTEM);
    File dir = f.getCanonicalFile();
    File[] existingFiles = dir.listFiles(filter);
    if (existingFiles != null)
        for (File f2 : existingFiles)
            System.out.println(f2.getName());
    else
        System.out.println("No files found for " + filterString);
}
{code}
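Sebb's question points at the probable cause: dir is built from f.getCanonicalFile(), i.e. the file 320620110821433.rwd itself rather than its directory, and File.listFiles() returns null when it is not called on a directory. A minimal sketch of the difference, assuming a small hand-rolled wildcard matcher in place of Commons IO's WildcardFileFilter so that it runs without the library; the class and method names are hypothetical, and the matcher is case-insensitive to mimic Windows behaviour.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.regex.Pattern;

/** Hypothetical sketch: a tiny wildcard FileFilter and a listing that checks its directory. */
public final class WildcardDemo {

    private WildcardDemo() {
    }

    /** Case-insensitive filter where '*' matches any run of characters and '?' exactly one. */
    public static FileFilter wildcard(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        Pattern compiled = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
        return f -> compiled.matcher(f.getName()).matches();
    }

    /** Lists matches in dir; listFiles() returns null for a non-directory, so fail loudly. */
    public static File[] list(File dir, String pattern) {
        File[] found = dir.listFiles(wildcard(pattern));
        if (found == null) {
            throw new IllegalArgumentException(dir + " is not a readable directory");
        }
        return found;
    }
}
```

With the reporter's names, searching the working directory (for example new File(".").getCanonicalFile()) would be expected to find 320620110821433-1.RWD, whereas passing f.getCanonicalFile() fails because f names a file.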
[jira] [Closed] (IO-281) WildcardFileFilter fails for wild card pattern with a '*' in it
[ https://issues.apache.org/jira/browse/IO-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb closed IO-281. --- WildcardFileFilter fails for wild card pattern with a '*' in it --- Key: IO-281 URL: https://issues.apache.org/jira/browse/IO-281 Project: Commons IO Issue Type: Bug Components: Filters Affects Versions: 1.3.2 Environment: Windows XP Reporter: Dean Schulze Priority: Blocker
[jira] [Resolved] (IO-281) WildcardFileFilter fails for wild card pattern with a '*' in it
[ https://issues.apache.org/jira/browse/IO-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb resolved IO-281. - Resolution: Invalid WildcardFileFilter fails for wild card pattern with a '*' in it --- Key: IO-281 URL: https://issues.apache.org/jira/browse/IO-281 Project: Commons IO Issue Type: Bug Components: Filters Affects Versions: 1.3.2 Environment: Windows XP Reporter: Dean Schulze Priority: Blocker
[jira] [Reopened] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb reopened LANG-744:

Why do we care which Exceptions can be generated? We take the same action in each case, so I don't see the point of enumerating the Exceptions unless a different action is to be taken for some of them. But even then, we would probably need a catch-all Exception.

StringUtils throws java.security.AccessControlException on Google App Engine

Key: LANG-744
URL: https://issues.apache.org/jira/browse/LANG-744
Project: Commons Lang
Issue Type: Bug
Components: lang.*
Affects Versions: 3.0.1
Environment: Google App Engine
Reporter: Clément Denis
Fix For: 3.0.2

In the static initializer of org.apache.commons.lang3.StringUtils, there is an attempt to load the class sun.text.Normalizer. Such a class is prohibited on Google App Engine, so the static initializer throws a java.security.AccessControlException.

{code}
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission accessClassInPackage.sun.text)
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:374)
    at java.security.AccessController.checkPermission(AccessController.java:546)
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:532)
    at com.google.appengine.tools.development.DevAppServerFactory$CustomSecurityManager.checkPermission(DevAppServerFactory.java:166)
    at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1512)
    at java.lang.Class.checkMemberAccess(Class.java:2164)
    at java.lang.Class.getMethod(Class.java:1602)
    at org.apache.commons.lang3.StringUtils.<clinit>(StringUtils.java:739)
{code}

The exception should be caught in the catch clauses around loadClass("sun.text.Normalizer"). Commons Lang 2 worked fine on GAE.
[jira] [Created] (LANG-746) NumberUtils does not handle upper-case hex: 0X and -0X
NumberUtils does not handle upper-case hex: 0X and -0X

Key: LANG-746
URL: https://issues.apache.org/jira/browse/LANG-746
Project: Commons Lang
Issue Type: Bug
Reporter: Sebb

NumberUtils.createNumber() should work equally for 0x1234 and 0X1234; currently 0X1234 generates a NumberFormatException.

Integer.decode() handles both upper and lower case hex.
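Because `Integer.decode()` already accepts both `0x` and `0X`, it makes a convenient reference to test against. A minimal sketch of a case-insensitive prefix check follows; `HexPrefix.isHex` is a hypothetical helper, not the `NumberUtils` code, and relies on the JDK's `String.regionMatches(boolean ignoreCase, ...)` overload.

```java
/** Hypothetical helper: detect a hex literal regardless of the case of the 'x'. */
final class HexPrefix {
    private HexPrefix() {}

    /** True if str is an optionally signed number starting with "0x", "0X" or "#". */
    static boolean isHex(String str) {
        int start = (str.startsWith("-") || str.startsWith("+")) ? 1 : 0;
        // ignoreCase = true makes "0x" and "0X" equivalent
        return str.regionMatches(true, start, "0x", 0, 2) || str.startsWith("#", start);
    }
}
```

For comparison, `Integer.decode("0x1234")` and `Integer.decode("0X1234")` both return 4660.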
[jira] [Updated] (LANG-746) NumberUtils does not handle upper-case hex: 0X and -0X
[ https://issues.apache.org/jira/browse/LANG-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated LANG-746:
Affects Version/s: 3.0, 3.0.1
Fix Version/s: 3.0.2
[jira] [Resolved] (LANG-746) NumberUtils does not handle upper-case hex: 0X and -0X
[ https://issues.apache.org/jira/browse/LANG-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved LANG-746.
Resolution: Fixed

URL: http://svn.apache.org/viewvc?rev=1160660&view=rev
Log:
LANG-746 NumberUtils does not handle upper-case hex: 0X and -0X

Modified:
    commons/proper/lang/trunk/src/main/java/org/apache/commons/lang3/math/NumberUtils.java
    commons/proper/lang/trunk/src/site/changes/changes.xml
    commons/proper/lang/trunk/src/test/java/org/apache/commons/lang3/math/NumberUtilsTest.java
[jira] [Created] (LANG-747) NumberUtils does not handle Long Hex numbers
NumberUtils does not handle Long Hex numbers

Key: LANG-747
URL: https://issues.apache.org/jira/browse/LANG-747
Project: Commons Lang
Issue Type: Bug
Reporter: Sebb

NumberUtils.createLong() does not handle hex numbers, but createInteger() handles both hex and octal. This seems odd.

NumberUtils.createNumber() assumes that hex numbers can only be Integer. Again, why not handle bigger hex numbers?

==

It is trivial to fix createLong(): just use Long.decode() instead of valueOf(). It's not clear why this was not done originally; the decode() method was added to both Integer and Long in Java 1.2.

Fixing createNumber() is also fairly easy: if the hex string has more than 8 digits, use Long.

Should we allow for leading zeros in an Integer? If not, the length check is trivial.
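A sketch of both fixes, under stated assumptions: `HexNumbers` and its methods are hypothetical, not the `NumberUtils` code. The digit-count rule needs one refinement at the boundary, since an 8-digit value such as `0x80000000` already exceeds `Integer.MAX_VALUE`. The sketch therefore ignores leading zeros (one possible answer to the question above), counts the remaining digits, and also checks the leading digit. Falling back to `BigInteger` beyond 64 bits is an addition not proposed in the issue.

```java
import java.math.BigInteger;

/** Hypothetical sketch; not the NumberUtils implementation. */
final class HexNumbers {
    private HexNumbers() {}

    /** createLong() equivalent: Long.decode accepts 0x/0X/# hex and leading-0 octal. */
    static Long createLong(String str) {
        return Long.decode(str);
    }

    /** Parses signed "0x"/"0X"/"#" hex into the narrowest of Integer, Long, BigInteger. */
    static Number createHexNumber(String str) {
        int pos = 0;
        boolean negative = false;
        if (str.startsWith("-") || str.startsWith("+")) {
            negative = str.charAt(0) == '-';
            pos = 1;
        }
        if (str.regionMatches(true, pos, "0x", 0, 2)) {
            pos += 2;
        } else if (str.startsWith("#", pos)) {
            pos += 1;
        } else {
            throw new NumberFormatException("Not a hex number: " + str);
        }
        String digits = str.substring(pos);
        if (digits.isEmpty()) {
            throw new NumberFormatException("No hex digits: " + str);
        }
        for (int i = 0; i < digits.length(); i++) {
            if (Character.digit(digits.charAt(i), 16) < 0) {
                throw new NumberFormatException("Bad hex digit in: " + str);
            }
        }
        // Leading zeros do not change the magnitude, so skip them before counting.
        int first = 0;
        while (first < digits.length() - 1 && digits.charAt(first) == '0') {
            first++;
        }
        String significant = digits.substring(first);
        int count = significant.length();
        int lead = Character.digit(significant.charAt(0), 16);
        // 8 digits fit in an int only if the top digit is 0-7 (magnitude <= 0x7FFFFFFF).
        // Simplification: -0x80000000 would fit in an int but is returned as a Long.
        if (count < 8 || (count == 8 && lead <= 7)) {
            int v = Integer.parseInt(significant, 16);
            return Integer.valueOf(negative ? -v : v);
        }
        if (count < 16 || (count == 16 && lead <= 7)) {
            long v = Long.parseLong(significant, 16);
            return Long.valueOf(negative ? -v : v);
        }
        BigInteger v = new BigInteger(significant, 16);
        return negative ? v.negate() : v;
    }
}
```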
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090224#comment-13090224 ]

Sebb commented on MATH-650:

Any change needs to bear in mind that the fields need to remain thread-safe, i.e. whatever generates them must publish the values safely to all threads. This is currently achieved by using the static{} block with final fields.

Seems to me there are two possible approaches to fix this:
- improve the performance of the existing code
- change the code to use initialisation on demand, so only the required parts are initialised. Static holder classes can probably be used here to ensure safe publication.

floor() does not need the calculated fields, so the second approach would speed it up.

Note that the FastMath version of floor() is likely to have similar performance to Math.floor(), as it's not an algorithm that benefits from (or indeed needs) the FastMath approach. It would be useful to have performance figures for more complicated calculations, where FastMath should start to show benefits.

FastMath has static code which slows the first access to FastMath

Key: MATH-650
URL: https://issues.apache.org/jira/browse/MATH-650
Project: Commons Math
Issue Type: Improvement
Affects Versions: Nightly Builds
Environment: Android 2.3 (Dalvik VM with JIT)
Reporter: Alexis Robert
Priority: Minor

Working on an Android application using Orekit, I've discovered that a simple FastMath.floor() takes about 4 to 5 seconds on a 1 GHz Nexus One phone (only the first time it's called). I've launched the Android profiling tool (traceview), and the problem seems to be linked with the static portion of FastMath code named "// Initialize tables".

The timing resulted in:
- FastMath.slowexp (40.8%)
- FastMath.expint (39.2%)
  - FastMath.quadmult() (95.6% of expint)
- FastMath.slowlog (18.2%)

Hoping that would help.
Thanks!
Alexis Robert
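A minimal sketch of the "static holder class" approach, assuming a single hypothetical table (e^n for small integer n) in place of FastMath's real tables; none of the names below are FastMath's. The JVM initialises a class at most once, under a lock, on its first active use (JLS §12.4), so the holder's final array is safely published to all threads without extra synchronisation, and a method such as floor() that never touches the holder never pays for building the table.

```java
/** Hypothetical sketch of initialisation on demand; table names are not FastMath's. */
final class LazyTables {
    private LazyTables() {}

    /** Set by the holder's static initialiser so the laziness can be observed. */
    static volatile boolean tablesInitialized = false;

    /** Initialised only when TABLE is first read. */
    private static final class ExpTableHolder {
        static final int OFFSET = 8; // compile-time constant: reading it does NOT trigger init
        static final double[] TABLE = buildTable();

        private static double[] buildTable() {
            double[] t = new double[2 * OFFSET + 1];
            for (int i = 0; i < t.length; i++) {
                t[i] = Math.exp(i - OFFSET); // stands in for an expensive computation
            }
            tablesInitialized = true;
            return t;
        }
    }

    /** Needs no tables, so calling it never initialises the holder. */
    static double floor(double x) {
        return Math.floor(x);
    }

    /** e^n for -8 <= n <= 8, read from the lazily built table. */
    static double expInt(int n) {
        double[] table = ExpTableHolder.TABLE; // first access runs the holder's static init
        return table[n + ExpTableHolder.OFFSET];
    }
}
```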
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090308#comment-13090308 ]

Sebb commented on MATH-650:

bq. I wonder if we should have some way to compute these tables at compile time and have them simply loaded without recomputation.

Not sure the compiler can create the values. But we could add code to print out the generated data, and then incorporate it back into the source. There should be no need to update it once created; however, to check the ongoing accuracy of the tables, the generating code could be moved into a test class and used to compare against the fixed data. This would probably require some package-protected helper methods to give access to the private data. Or the generator code could remain in the FastMath class, to be called by the unit test code only.
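A minimal sketch of the "print the generated data and paste it back into the source" idea, with hypothetical helper names. It relies on `Double.toString()` emitting enough digits to distinguish the value from every other double, so the printed literal compiles back to exactly the same value. The comparison helper is what a unit test would assert: the frozen table equals a fresh recomputation bit for bit.

```java
import java.util.Arrays;

/** Hypothetical helpers for freezing a computed table into Java source. */
final class TableSource {
    private TableSource() {}

    /** Renders values as a Java array initialiser that can be pasted into a class. */
    static String toJavaArrayLiteral(String name, double[] values) {
        StringBuilder sb = new StringBuilder();
        sb.append("private static final double[] ").append(name).append(" = {\n");
        for (double v : values) {
            sb.append("    ").append(literal(v)).append(",\n"); // trailing comma is legal Java
        }
        return sb.append("};\n").toString();
    }

    /** Double.toString round-trips exactly; non-finite values need named constants. */
    private static String literal(double v) {
        if (Double.isNaN(v)) return "Double.NaN";
        if (v == Double.POSITIVE_INFINITY) return "Double.POSITIVE_INFINITY";
        if (v == Double.NEGATIVE_INFINITY) return "Double.NEGATIVE_INFINITY";
        return Double.toString(v);
    }

    /** Arrays.equals compares doubles via doubleToLongBits, i.e. bit for bit. */
    static boolean matchesRecomputed(double[] frozen, double[] recomputed) {
        return Arrays.equals(frozen, recomputed);
    }
}
```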
[jira] [Commented] (NET-420) Retrieving files from AS400 FTP systems returns null timestamps in FTPFile.getTimestamp
[ https://issues.apache.org/jira/browse/NET-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091494#comment-13091494 ]

Sebb commented on NET-420:

Can you provide access to an AS400 FTP server for testing purposes? If not, please at least provide a full listing as returned by the LIST command.

Retrieving files from AS400 FTP systems returns null timestamps in FTPFile.getTimestamp

Key: NET-420
URL: https://issues.apache.org/jira/browse/NET-420
Project: Commons Net
Issue Type: Bug
Components: FTP
Affects Versions: 2.0
Environment: Commons Net 2.0, FTP System: AS400 systems
Reporter: Ramya Rajendiran
Priority: Critical

We are trying to list files from AS400 systems and retrieve the timestamps from these files using the following code:

{code}
FTPClientConfig conf = new FTPClientConfig(FTPClientConfig.SYST_AS400);
conf.setDefaultDateFormatStr("MM/dd/yy HH:mm:ss");
ftpClient.configure(conf);
ftpClient.connect(hostName);
FTPFile[] files = ftpClient.listFiles(remoteFileName);
Calendar timeStamp = files[0].getTimestamp();
{code}

The timeStamp returned is always null. I have also tried setting various other parsers, but that does not work either:

{code}
FTPListParseEngine engine = ftpClient.initiateListParsing("org.apache.commons.net.ftp.parser.OS400FTPEntryParser", remoteFileName);
FTPFile[] files = engine.getNext(25);
{code}

The LIST command, which is used internally in the FTPClient, retrieves the timestamps successfully. However, after parsing, the FTPFile has a null value for the timestamp field. I tried the latest Commons Net 3.0.1 and the problem still exists. Please help us fix this problem. It is critical to us.
[jira] [Resolved] (IO-280) Dubious use of mkdirs() return code
[ https://issues.apache.org/jira/browse/IO-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved IO-280.
Resolution: Fixed

Dubious use of mkdirs() return code

Key: IO-280
URL: https://issues.apache.org/jira/browse/IO-280
Project: Commons IO
Issue Type: Bug
Reporter: Sebb
Priority: Minor

FileUtils.openOutputStream() has the following code:

{code}
File parent = file.getParentFile();
if (parent != null && parent.exists() == false) {
    if (parent.mkdirs() == false) {
        throw new IOException("File '" + file + "' could not be created");
    }
}
{code}

Now mkdirs() returns true only if the method actually created the directories. It's theoretically possible for another thread or process to create the directory in the window between the exists() and mkdirs() invocations, in which case mkdirs() returns false and the code throws even though the directory exists. [Indeed, the class already checks for this in the forceMkdir() method.]

It would be safer to use:

{code}
File parent = file.getParentFile();
if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
    throw new IOException("Directory '" + parent + "' could not be created"); // note changed text
}
{code}

Similarly elsewhere in the class where mkdirs() is used.
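The proposed check can be wrapped in a small helper and exercised on a temporary directory. `ParentDirs.ensureParentDirectory` is a hypothetical name, not a Commons IO method; the logic is the `!mkdirs() && !isDirectory()` test from the issue.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper applying the race-tolerant mkdirs() check. */
final class ParentDirs {
    private ParentDirs() {}

    /** Creates the parent directory of file if needed, tolerating a concurrent creator. */
    static void ensureParentDirectory(File file) throws IOException {
        File parent = file.getParentFile();
        // mkdirs() returns false both on failure and when the directory already exists
        // (including when someone else created it first), so only treat it as an error
        // if the directory still is not there afterwards.
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Directory '" + parent + "' could not be created");
        }
    }
}
```

Calling it twice for the same path is harmless; it throws only when the parent path is occupied by something that is not a directory, or cannot be created at all.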
[jira] [Resolved] (NET-421) Problem connecting to TLS/SSL SMTP server using explicit mode
[ https://issues.apache.org/jira/browse/NET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved NET-421.
Resolution: Fixed

Problem connecting to TLS/SSL SMTP server using explicit mode

Key: NET-421
URL: https://issues.apache.org/jira/browse/NET-421
Project: Commons Net
Issue Type: Bug
Components: SMTP
Affects Versions: 3.0, 3.0.1
Reporter: Oliver Saggau
Priority: Critical
Fix For: 3.1

I just tried to send an email through the Gmail servers by doing the following:

{code}
AuthenticatingSMTPClient client = new AuthenticatingSMTPClient();
client.connect("smtp.gmail.com", 587); // reply: 220 220 mx.google.com ESMTP
client.login();                        // reply: 250 250 mx.google.com at your service
client.execTLS();                      // reply: 220 2.0.0 Ready to start TLS
client.auth(AUTH_METHOD.PLAIN, username, password); // exception
...
{code}

Unfortunately, after execTLS() I get a MalformedServerReplyException. I looked at the SMTPSClient source code and found that the reader/writer are wrong after execTLS() is called. The performSSLNegotiation() method sets _input_ and _output_ to the new input/output streams from the SSLSocket, but the reader/writer still point to the values set inside _connectAction_().

Possible fix for this issue:

{code}
public boolean execTLS() throws SSLException, IOException {
    if (!SMTPReply.isPositiveCompletion(sendCommand("STARTTLS"))) {
        return false;
        //throw new SSLException(getReplyString());
    }
    performSSLNegotiation();
    // Re-wrap the new SSL streams; the old reader/writer still use the plain socket streams.
    _reader = new CRLFLineReader(new InputStreamReader(_input_, encoding));
    _writer = new BufferedWriter(new OutputStreamWriter(_output_, encoding));
    return true;
}
{code}
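The failure mode in the report is that a reader built once over the original stream keeps reading from that stream after the stream field is replaced. It can be reproduced without a network. The class below is a hypothetical stand-in, not SMTPSClient: two in-memory streams play the roles of the plain socket stream and the SSL stream, and the `rebuildReader` flag corresponds to applying or omitting the proposed fix.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a client that caches a reader over its input stream. */
final class StreamSwapDemo {
    private InputStream input;
    private BufferedReader reader;

    StreamSwapDemo(InputStream initial) {
        input = initial;
        reader = newReader(input); // like _connectAction_(): built once over the plain stream
    }

    private static BufferedReader newReader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));
    }

    /** Replaces the stream, as performSSLNegotiation() does; optionally re-wraps it. */
    void switchStream(InputStream replacement, boolean rebuildReader) {
        input = replacement;
        if (rebuildReader) {
            reader = newReader(input); // the proposed fix
        }
    }

    String readReply() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static InputStream ascii(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Without the rebuild, the next reply still comes from the old stream; with it, the reply comes from the new one.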
[jira] [Updated] (NET-421) Problem connecting to TLS/SSL SMTP server using explicit mode
[ https://issues.apache.org/jira/browse/NET-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated NET-421:
Affects Version/s: 3.0.1
Fix Version/s: 3.1
[jira] [Commented] (NET-420) Retrieving files from AS400 FTP systems returns null timestamps in FTPFile.getTimestamp
[ https://issues.apache.org/jira/browse/NET-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093790#comment-13093790 ]

Sebb commented on NET-420:

Your code is using:

bq. conf.setDefaultDateFormatStr("MM/dd/yy HH:mm:ss");

yet the list looks like the following:

bq. -rwxrwxrwx 1 RAMYARAJ 0 22 Aug 25 22:31 file.txt

It's not surprising that the entries are not being parsed. Try using a date format string that corresponds with the listing output, for example:

{code}
conf.setDefaultDateFormatStr("dd MMM yy HH:mm:ss");
{code}
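Before passing a pattern to `setDefaultDateFormatStr()`, it can be checked against a sample date taken from the actual listing using only the JDK's `SimpleDateFormat` in strict mode. `DateFormatCheck.matches` is a hypothetical helper. The sample `Aug 25 22:31` assumes a Unix-style reading of the listing (permissions, links, owner, group, size, date, name); if that reading is wrong, substitute the real date text from your own LIST output.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;

/** Hypothetical helper: does a date pattern consume an entire sample string? */
final class DateFormatCheck {
    private DateFormatCheck() {}

    static boolean matches(String pattern, String sample) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setLenient(false); // reject out-of-range fields instead of rolling them over
        ParsePosition pos = new ParsePosition(0);
        // Require the whole sample to be consumed, not just a prefix.
        return format.parse(sample, pos) != null && pos.getIndex() == sample.length();
    }
}
```

The numeric month pattern `MM` cannot consume a textual month such as `Aug`, which is consistent with the null timestamps reported.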
[jira] [Updated] (NET-420) Retrieving files from AS400 FTP systems returns null timestamps in FTPFile.getTimestamp
[ https://issues.apache.org/jira/browse/NET-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated NET-420:
Priority: Minor (was: Critical)
[jira] [Updated] (NET-420) Retrieving files from AS400 FTP systems returns null timestamps in FTPFile.getTimestamp
[ https://issues.apache.org/jira/browse/NET-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated NET-420:
Affects Version/s: 3.0.1
[jira] [Commented] (NET-153) Add getCause method to CopyStreamException
[ https://issues.apache.org/jira/browse/NET-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093823#comment-13093823 ]

Sebb commented on NET-153:

Yes, the update does seem to have been lost.

However, now that we require Java 1.5, and getCause()/initCause() were added to Throwable in 1.4, would it not be better to use the underlying cause field from Throwable? I.e. rather than overriding getCause(), we should store the cause using initCause(), and update the method as follows:

{code}
public IOException getIOException() {
    return (IOException) getCause(); // cast needed: Throwable.getCause() returns Throwable
}
{code}

Once Java 1.6 is a minimum requirement, this could be simplified further by passing the cause to the super(message, cause) constructor instead of calling initCause().

Add getCause method to CopyStreamException

Key: NET-153
URL: https://issues.apache.org/jira/browse/NET-153
Project: Commons Net
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Dan Godfrey
Priority: Trivial
Fix For: 2.0
Attachments: CopyStreamException.patch

Add a getCause method to CopyStreamException that has the same signature as Throwable#getCause from JDK 1.4 and returns the wrapped IOException. On Java 1.4 and later this will override the existing getCause method, and hence include the IOException's stack trace in the CopyStreamException's stack trace; on Java 1.3 it will simply be ignored.
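A minimal sketch of the suggested change. It uses a hypothetical class name so it is not mistaken for the actual Commons Net `CopyStreamException`. The cause is stored in Throwable's own field via `initCause()`, which is allowed exactly once and only when no cause was passed to the superclass constructor. `getIOException()` then casts it back.

```java
import java.io.IOException;

/** Hypothetical sketch modelled on the comment; not the Commons Net class. */
class CopyStreamFailure extends IOException {
    private static final long serialVersionUID = 1L;
    private final long totalBytesTransferred;

    CopyStreamFailure(String message, long bytesTransferred, IOException exception) {
        super(message);           // on Java 1.6+: super(message, exception) instead
        initCause(exception);     // Java 1.4+: keep the cause in Throwable's own field
        totalBytesTransferred = bytesTransferred;
    }

    long getTotalBytesTransferred() {
        return totalBytesTransferred;
    }

    IOException getIOException() {
        return (IOException) getCause(); // safe: the constructor only stores an IOException
    }
}
```

Because the cause lives in Throwable, `printStackTrace()` includes the wrapped exception's "Caused by:" section without any override of `getCause()`.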
[jira] [Commented] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095211#comment-13095211 ]

Sebb commented on LANG-744:

The message will be thrown even if the sun method is not needed; that does not seem right.

If the sun method is unavailable, the code that conditionally calls it throws an UnsupportedOperationException with the message "The stripAccents(CharSequence) method requires at least Java 1.6 or a Sun JVM".

We could record the Exception in the static block, and add it as the cause of the UOE. It would then only appear when necessary.
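The approach suggested above is to catch everything while probing, remember the failure, and surface it only as the cause of the UnsupportedOperationException when the feature is actually used. It can be sketched as follows. The class and method names are hypothetical and this is not the code committed for LANG-744. In StringUtils the probe would run once, e.g. as a `static final` field initialiser. A single `catch (Exception e)` covers `ClassNotFoundException`, `NoSuchMethodException` and `SecurityException` (of which `AccessControlException` is a subclass), matching the "same action in each case" argument.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch: an optional static method whose absence is reported lazily. */
final class OptionalMethod {
    private final Method method;     // null if unavailable
    private final Exception failure; // why it is unavailable, or null

    OptionalMethod(String className, String methodName, Class<?>... paramTypes) {
        Method m = null;
        Exception f = null;
        try {
            m = Class.forName(className).getMethod(methodName, paramTypes);
        } catch (Exception e) {
            // Catch-all: every failure gets the same treatment, so just record it.
            f = e;
        }
        method = m;
        failure = f;
    }

    boolean isAvailable() {
        return method != null;
    }

    /** Invokes the static method, or throws a UOE carrying the recorded cause. */
    Object invoke(Object... args) {
        if (method == null) {
            throw new UnsupportedOperationException("Method unavailable on this JVM", failure);
        }
        try {
            return method.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The test uses `java.lang.Math.abs(int)` only because it is guaranteed to exist.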
[jira] [Commented] (CHAIN-54) upgrate JUnit dependency to latest released version and adapt tests
[ https://issues.apache.org/jira/browse/CHAIN-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095982#comment-13095982 ]

Sebb commented on CHAIN-54:

The latest version is 4.9. AFAIK this is backwards compatible, so tests don't *have* to be updated. However, updating tests to the new annotation-based style can make them easier to read and maintain.

upgrate JUnit dependency to latest released version and adapt tests

Key: CHAIN-54
URL: https://issues.apache.org/jira/browse/CHAIN-54
Project: Commons Chain
Issue Type: Improvement
Affects Versions: 2.0
Reporter: Simone Tripodi
Assignee: Simone Tripodi
Fix For: 2.0

The JUnit dependency has to be migrated to the latest stable 4.x release, and the tests consequently have to be updated.
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096971#comment-13096971 ]

Sebb commented on MATH-650:

I think the simplest approach would be to just print out the values of the arrays at the end of the static block, and feed them back into the code.

Rather than deleting the setup code, it could be left as documentation - either commented out or disabled via an if (false) block.

I've done most of the work to implement this. Thoughts?
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097063#comment-13097063 ]

Sebb commented on MATH-650:

Yes, I did suggest that in an earlier comment. However, it turns out to be quite a bit of extra work, which I was hoping to avoid. Also, there is already a unit test which compares the accuracy with Dfp.
[jira] [Commented] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098076#comment-13098076 ]

Sebb commented on LANG-744:

Reworked static init in r1165701.
[jira] [Commented] (MATH-621) BOBYQA is missing in optimization
[ https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098079#comment-13098079 ] Sebb commented on MATH-621: --- Hover over the comment you want to edit; there will be an edit icon in the top rhs of the grey box. BOBYQA is missing in optimization - Key: MATH-621 URL: https://issues.apache.org/jira/browse/MATH-621 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Reporter: Dr. Dietmar Wolz Fix For: 3.0 Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, BOBYQAOptimizer.java.patch, BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, bobyqaoptimizer0.4.zip, bobyqav0.3.zip Original Estimate: 8h Remaining Estimate: 8h During experiments with space flight trajectory optimizations I recently observed, that the direct optimization algorithm BOBYQA http://plato.asu.edu/ftp/other_software/bobyqa.zip from Mike Powell is significantly better than the simple Powell algorithm already in commons.math. It uses significantly lower function calls and is more reliable for high dimensional problems. You can replace CMA-ES in many more application cases by BOBYQA than by the simple Powell optimizer. I would like to contribute a Java port of the algorithm. I maintained the structure of the original FORTRAN code, so the code is fast but not very nice. License status: Michael Powell has sent the agreement via snail mail - it hasn't arrived yet. Progress: The attached patch relative to the trunk contains both the optimizer and the related unit tests - which are all green now. Performance: Performance difference (number of function evaluations) PowellOptimizer / BOBYQA for different test functions (taken from the unit test of BOBYQA, dimension=13 for most of the tests. 
Rosen = 9350 / 1283
MinusElli = 118 / 59
Elli = 223 / 58
ElliRotated = 8626 / 1379
Cigar = 353 / 60
TwoAxes = 223 / 66
CigTab = 362 / 60
Sphere = 223 / 58
Tablet = 223 / 58
DiffPow = 421 / 928
SsDiffPow = 614 / 219
Ackley = 757 / 97
Rastrigin = 340 / 64
The number for DiffPow should be discussed with Michael Powell; I will send him the details. Open Problems: Some checkstyle violations because of the original Fortran source:
- Original method comments were copied and don't follow the javadoc standard
- Multiple variable declarations in one line, as in the original source
- Problems related to goto conversions: gotos not convertible into loops were translated into a finite automaton (switch statement), which means no default in the switch and fall-through from the previous case; these are usually bad style, but the checks make no sense here.
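To illustrate the "finite automaton (switch statement)" technique mentioned above: a Fortran-style routine with numbered labels and gotos can be translated into a loop around a switch on a state variable that holds the current label. The routine below is a hypothetical toy (Euclid's gcd written as if it had labels 10, 20 and 30), not code from the BOBYQA port.

```java
final class GotoAsSwitch {
    private GotoAsSwitch() {
    }

    /**
     * Greatest common divisor of non-negative a and b, written as a label-driven state machine.
     * Hypothetical Fortran-like original:
     *   10 IF (B .EQ. 0) GOTO 30
     *   20 T = MOD(A, B); A = B; B = T; GOTO 10
     *   30 RETURN A
     */
    static long gcd(long a, long b) {
        int label = 10; // current "goto" target
        while (true) {
            switch (label) {
                case 10:
                    label = (b == 0) ? 30 : 20;
                    break;
                case 20: {
                    long t = a % b;
                    a = b;
                    b = t;
                    label = 10; // GOTO 10
                    break;
                }
                case 30:
                    return a;
                default:
                    throw new IllegalStateException("unknown label " + label);
            }
        }
    }
}
```

In a larger translation, a label that simply continues into the next one becomes a case without a break, which is exactly the fall-through that checkstyle flags.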
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098883#comment-13098883 ] Sebb commented on MATH-650: --- FastMath has been updated to use preset tables, eliminating the static setup code. @Alexis: Would you be able to check whether the changes have helped on Android? There is a SNAPSHOT available at: https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math/3.0-SNAPSHOT/ commons-math-3.0-20110907.123252-61.jar FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor Working on an Android application using Orekit, I've discovered that a simple FastMath.floor() takes about 4 to 5 secs on a 1GHz Nexus One phone (only the first time it's called). I've launched the Android profiling tool (traceview) and the problem seems to be linked with the static portion of FastMath code named // Initialize tables. The timing resulted in:
- FastMath.slowexp (40.8%)
- FastMath.expint (39.2%)
  - FastMath.quadmult() (95.6% of expint)
- FastMath.slowlog (18.2%)
Hoping that would help. Thanks! Alexis Robert
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099185#comment-13099185 ] Sebb commented on MATH-650: --- It appears that the new code is almost twice as fast as the old. However, it can still take 20-30ms to initialise the class. This seems to be because of the large array initialisations. I hacked the code to comment out most of the array entries, leaving just one or two in each of the large arrays, and that made startup about 6 times as fast - about 6-7ms. [Of course that code won't work properly] So it might be worth attempting initialisation on demand, using a static holder class that contains the pre-calculated data. There was also a slight speed-up from removing all the unused initialisation code and its data items. FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099597#comment-13099597 ] Sebb commented on MATH-650: --- Yes, I looked at IODH. It turns out that the holder is not required. Instead, one can use a static nested class which contains the initial data: {code}
public class FastMath {
    private static class lnMant {
        private static final double LN_MANT[][] = {
            ...
        };
    }
    ...
        double d = lnMant.LN_MANT[i][j]; // was double d = LN_MANT[i][j];
    ...
}
{code} Very simple to implement; doing that plus commenting out all init code and data results in a speed-up of about 6 times for FastMath.max(). It does not seem to affect the performance of method calls once their table(s) have been loaded. What remains to be decided is what to do with the init code. Some of it might be useful in its own right - Taylor expansions for sine/cosine etc. Perhaps create another class (SlowMath anyone?) in the same package. And/or move it to the test tree? FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor
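A self-contained sketch of the static nested class idiom described above, assuming the table can be stood in for by a small computed array (the real FastMath tables are literal precomputed arrays). The JVM initializes a nested class only on its first active use, such as reading one of its non-constant static fields, so methods that never touch the table never pay for it. The names LazyTables and LnTable and the initialization counter are hypothetical; the counter exists only to make the laziness observable.

```java
final class LazyTables {
    private static int holderInits = 0; // demonstration only: counts table initializations

    private LazyTables() {
    }

    /** Holder class: initialized by the JVM on first access to LN, exactly once, thread-safely. */
    private static final class LnTable {
        static final double[] LN = build();

        private static double[] build() {
            holderInits++;
            double[] t = new double[1024];
            for (int k = 0; k < t.length; k++) {
                t[k] = Math.log1p(k / 1024.0); // stand-in for precomputed data
            }
            return t;
        }
    }

    /** Uses the table, so the first call triggers LnTable initialization. */
    static double lnOnePlus(int k) {
        return LnTable.LN[k];
    }

    /** Never references LnTable, so it never triggers table initialization. */
    static double abs(double x) {
        return Math.abs(x);
    }

    static int holderInitCount() {
        return holderInits;
    }
}
```

As the thread notes, this only helps callers that avoid some of the tables; code that eventually touches every table still pays the full cost, just later.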
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099617#comment-13099617 ] Sebb commented on MATH-650: --- Another snapshot uploaded as commons-math-3.0-20110907.222813-62.jar FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099657#comment-13099657 ] Sebb commented on MATH-650: --- In my tests, I found that pre-calculating the data is about twice as fast as calculating it in the static block. That seems a worthwhile improvement to me. Converting the larger preset data tables to IOD (initialisation on demand) gives a massive improvement for routines that don't need any of the IOD tables, and gives corresponding improvements for methods that only use some of the IOD tables. It's also trivial to do, so I did it. Tidying up the code to move the now-unused init code makes only a minor improvement to load times, but is worth it for readability and maintenance. You can do the tests if you want. Math 2.2 is the original FastMath implementation; https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-math/3.0-SNAPSHOT/ has the jars:
- commons-math-3.0-20110907.123252-61.jar - preset arrays
- commons-math-3.0-20110907.222813-62.jar - the IOD code
So yes, if an application makes lots of calls the overhead will gradually be amortised, but the initial overhead is very large. In my test I used exp(1000), which uses an IOD table. The repeat time for that is about 5000ns. The first-call times are approx 40,000,000ns (original) and 8,000,000ns (current). I agree that the lazy init does not help applications that use all the tables. However, all applications perform better if the table calculation is done beforehand. FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor
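A rough harness in the spirit of the first-call versus repeat-call measurements above, assuming the function under test is passed in as a DoubleUnaryOperator (Java 8+). It times two successive calls with System.nanoTime. To capture class-initialization cost, the first call must be the first use of the class in a fresh JVM, and single-shot timings like these are noisy, so the numbers are indicative only. The class name FirstCallTimer is hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

final class FirstCallTimer {
    private FirstCallTimer() {
    }

    /** Returns {firstCallNanos, secondCallNanos} for two successive evaluations of f at x. */
    static long[] firstAndRepeat(DoubleUnaryOperator f, double x) {
        long t0 = System.nanoTime();
        double first = f.applyAsDouble(x);   // may include one-off class initialization
        long t1 = System.nanoTime();
        double second = f.applyAsDouble(x);  // closer to steady-state cost (JIT aside)
        long t2 = System.nanoTime();
        if (Double.compare(first, second) != 0) {
            throw new IllegalStateException("f is not deterministic at " + x);
        }
        return new long[] {t1 - t0, t2 - t1};
    }
}
```

With a snapshot jar on the classpath, calling firstAndRepeat(x -> org.apache.commons.math.util.FastMath.exp(x), 1000.0) as the first use of FastMath would approximate the exp(1000) comparison; Math::exp works as a baseline.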
[jira] [Commented] (MATH-658) Dead code in FastMath.pow(double, double) and some improvement in test coverage
[ https://issues.apache.org/jira/browse/MATH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100397#comment-13100397 ] Sebb commented on MATH-658: --- The body format (3rd line onwards) is OK, but the header lines are incorrect. They should look something like: {code}
Index: src/main/java/org/apache/commons/math/util/FastMath.java
===
--- src/main/java/org/apache/commons/math/util/FastMath.java (revision 1166437)
+++ src/main/java/org/apache/commons/math/util/FastMath.java (working copy)
{code} This was created by updating the file in an SVN working copy and then creating the patch (I used Eclipse, specifying project-relative mode, but svn diff would produce much the same output). Your patch has completely different names and paths for the input and output files: {code}
--- D:/DOCUME~1/tanguyy/LOCALS~1/Temp/FastMath.java-revBASE.svn000.tmp.java jeu. sept. 8 16:28:36 2011
+++ D:/DONNEES/ATELIER_JAVA/workspace/Commons-Math_Trunk/src/main/java/org/apache/commons/math/util/FastMath.java jeu. sept. 8 16:10:02 2011
{code} This means it's impossible to apply the patch automatically. However, it's not too difficult to fix the header lines, e.g. in the above case to: {code}
--- FastMath.java jeu. sept. 8 16:28:36 2011
+++ FastMath.java jeu. sept. 8 16:10:02 2011
{code} and the patch can then be applied in the appropriate directory. No need to resubmit these particular patches, but if you submit any more please use the proper unified diff format relative to the top-level project directory, so paths start with src/. Dead code in FastMath.pow(double, double) and some improvement in test coverage --- Key: MATH-658 URL: https://issues.apache.org/jira/browse/MATH-658 Project: Commons Math Issue Type: Improvement Reporter: Yannick TANGUY Priority: Minor Fix For: 3.0 Attachments: FastMath.java.diff, FastMathTest.java, FastMathTest.java.diff This issue concerns the FastMath class and its test class.
(1) In the double pow(double, double) function, there are 2 if blocks with identical conditions. The second one can be removed.
if (y < 0 && y == yi && (yi & 1) == 1) { return Double.NEGATIVE_INFINITY; }
// this block is never used - to be removed
if (y < 0 && y == yi && (yi & 1) == 1) { return -0.0; }
if (y > 0 && y == yi && (yi & 1) == 1) { return -0.0; }
(2) To obtain better code coverage, we added some test cases in FastMathTest.java (see attached file):
- Added test for log1p
- Added tests in testPowSpecialCases()
- Added tests for 100% coverage of acos().
- Added tests for 100% coverage of asin().
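For reference, the special cases those blocks implement can be checked against java.lang.Math.pow, whose documented contract for a negative-zero base is: a negative odd integer exponent gives -Infinity, a positive odd integer exponent gives -0.0, other negative exponents give +Infinity, and other positive exponents give +0.0. The helper below is a hypothetical sketch of that case analysis with the duplicate branch removed; it is not the FastMath source. It also guards the odd-integer test for |y| >= 2^53, where every double is an even integer and a plain (long) cast could mislead.

```java
final class NegativeZeroPow {
    private NegativeZeroPow() {
    }

    /** True if y is an odd integer; every double with |y| >= 2^53 is an even integer. */
    static boolean isOddInteger(double y) {
        if (Double.isNaN(y) || Double.isInfinite(y) || Math.abs(y) >= 0x1p53) {
            return false;
        }
        long yi = (long) y;
        return y == yi && (yi & 1) == 1;
    }

    /** pow(-0.0, y) following the java.lang.Math.pow special cases. */
    static double powOfNegativeZero(double y) {
        if (Double.isNaN(y)) {
            return Double.NaN;
        }
        if (y == 0) {
            return 1.0; // x^0 == 1 for any x, including -0.0
        }
        boolean odd = isOddInteger(y);
        if (y < 0) {
            return odd ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        }
        return odd ? -0.0 : 0.0; // y > 0
    }
}
```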
[jira] [Resolved] (MATH-658) Dead code in FastMath.pow(double, double) and some improvement in test coverage
[ https://issues.apache.org/jira/browse/MATH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb resolved MATH-658. --- Resolution: Fixed Patches applied. Dead code in FastMath.pow(double, double) and some improvement in test coverage --- Key: MATH-658 URL: https://issues.apache.org/jira/browse/MATH-658 Project: Commons Math Issue Type: Improvement Reporter: Yannick TANGUY Priority: Minor Fix For: 3.0 Attachments: FastMath.java.diff, FastMathTest.java, FastMathTest.java.diff
[jira] [Updated] (LANG-747) NumberUtils does not handle Long Hex numbers
[ https://issues.apache.org/jira/browse/LANG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated LANG-747: -- Component/s: lang.math.* NumberUtils does not handle Long Hex numbers Key: LANG-747 URL: https://issues.apache.org/jira/browse/LANG-747 Project: Commons Lang Issue Type: Bug Components: lang.math.* Reporter: Sebb NumberUtils.createLong() does not handle hex numbers, but createInteger() handles hex and octal. This seems odd. NumberUtils.createNumber() assumes that hex numbers can only be Integer. Again, why not handle bigger Hex numbers? == It is trivial to fix createLong() - just use Long.decode() instead of valueOf(). It's not clear why this was not done originally - the decode() method was added to both Integer and Long in Java 1.2. Fixing createNumber() is also fairly easy - if the hex string has more than 8 digits, use Long. Should we allow for leading zeros in an Integer? If not, the length check is trivial.
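A minimal sketch of the length check proposed above, under two assumptions: leading zeros are ignored when counting digits, and an 8-digit value whose first significant digit is 8-F also goes to Long, because values from 0x80000000 upwards no longer fit in an int (the same reasoning applies at 16 digits for long). The method name createHexNumber and the BigInteger fallback are hypothetical, not the NumberUtils API. The check is magnitude-based, so the most negative int and long values are conservatively promoted to the next wider type.

```java
import java.math.BigInteger;

final class HexNumbers {
    private HexNumbers() {
    }

    /** Hypothetical: parse "0x1F", "-0X80000000" etc. into the smallest of Integer, Long, BigInteger. */
    static Number createHexNumber(String str) {
        boolean negative = str.startsWith("-");
        String body = negative ? str.substring(1) : str;
        if (!body.startsWith("0x") && !body.startsWith("0X")) {
            throw new NumberFormatException(str + " is not a hex number");
        }
        String digits = body.substring(2);
        if (digits.isEmpty()) {
            throw new NumberFormatException(str + " has no hex digits");
        }
        int start = 0;
        while (start < digits.length() - 1 && digits.charAt(start) == '0') {
            start++; // leading zeros do not change the magnitude
        }
        int count = digits.length() - start;
        char first = digits.charAt(start); // '8'..'9', 'a'..'f', 'A'..'F' all compare > '7'
        if (count > 16 || (count == 16 && first > '7')) {
            BigInteger value = new BigInteger(digits, 16);
            return negative ? value.negate() : value;
        }
        if (count > 8 || (count == 8 && first > '7')) {
            return Long.decode(str);
        }
        return Integer.decode(str);
    }
}
```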
[jira] [Created] (LANG-752) Fix createLong() so it behaves like createInteger()
Fix createLong() so it behaves like createInteger() --- Key: LANG-752 URL: https://issues.apache.org/jira/browse/LANG-752 Project: Commons Lang Issue Type: Sub-task Reporter: Sebb NumberUtils.createLong() does not handle hex numbers, but createInteger() handles hex and octal. Fix it by using Long.decode() instead of Long.valueOf().
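The difference the fix relies on, as a small runnable check: Long.valueOf parses decimal only, while Long.decode also accepts the 0x, 0X and # hex prefixes and a leading 0 for octal, just like Integer.decode. The createLong below is a simplified stand-in that keeps only a null check and the decode call; it is not the actual NumberUtils source.

```java
final class DecodeDemo {
    private DecodeDemo() {
    }

    /** Simplified stand-in for a createLong(String) that uses decode() rather than valueOf(). */
    static Long createLong(String str) {
        if (str == null) {
            return null;
        }
        return Long.decode(str); // accepts "0x1F", "#1F", "017" (octal), "-0x10", "42"
    }

    /** True if Long.valueOf rejects the string, as it does for any hex literal. */
    static boolean valueOfRejects(String str) {
        try {
            Long.valueOf(str);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```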
[jira] [Resolved] (LANG-752) Fix createLong() so it behaves like createInteger()
[ https://issues.apache.org/jira/browse/LANG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb resolved LANG-752. --- Resolution: Fixed Fix Version/s: 3.0.2 Fix createLong() so it behaves like createInteger() --- Key: LANG-752 URL: https://issues.apache.org/jira/browse/LANG-752 Project: Commons Lang Issue Type: Sub-task Components: lang.math.* Reporter: Sebb Fix For: 3.0.2
[jira] [Commented] (CONFIGURATION-454) Malformed pom uploaded to repositories
[ https://issues.apache.org/jira/browse/CONFIGURATION-454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100497#comment-13100497 ] Sebb commented on CONFIGURATION-454: Maven Central won't allow updates to jars etc. because that would mean builds were not repeatable. They do allow hashes and sigs to be replaced, and they may allow the url to be fixed as it does not affect builds. It's worth reporting just in case. Malformed pom uploaded to repositories -- Key: CONFIGURATION-454 URL: https://issues.apache.org/jira/browse/CONFIGURATION-454 Project: Commons Configuration Issue Type: Bug Components: Build Affects Versions: 1.6 Reporter: Kevin Meyer Priority: Minor Labels: maven The pom downloaded, for example, from: http://uk.maven.org/maven2/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.pom is damaged: e.g. the url is given as: <url>http://commons.apache.org/${pom.artifactId.substring(8)}/</url> <directory>${basedir}</directory> etc. This affects the generated licenses for other projects. Compare with commons-collections, which is fine. The problem seems to be in trunk/project.xml?
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100733#comment-13100733 ] Sebb commented on MATH-650: --- bq. As far as I understand, only the tables that are used are loaded. Yes. I used lazy init for the larger tables only. There are two paired tables, each pair in its own class, and another table in a third class. The tables are only referenced where they are used. FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor
[jira] [Commented] (MATH-658) Dead code in FastMath.pow(double, double) and some improvement in test coverage
[ https://issues.apache.org/jira/browse/MATH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100734#comment-13100734 ] Sebb commented on MATH-658: --- Thanks - I did see the wildcard import, but left it as is, since it is test code and so less important. Dead code in FastMath.pow(double, double) and some improvement in test coverage --- Key: MATH-658 URL: https://issues.apache.org/jira/browse/MATH-658 Project: Commons Math Issue Type: Improvement Reporter: Yannick TANGUY Priority: Minor Fix For: 3.0 Attachments: FastMath.java.diff, FastMathTest.java, FastMathTest.java.diff
[jira] [Commented] (MATH-658) Dead code in FastMath.pow(double, double) and some improvement in test coverage
[ https://issues.apache.org/jira/browse/MATH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100746#comment-13100746 ] Sebb commented on MATH-658: --- Also, just noticed some tab characters in the test class patch which I have just fixed. We don't allow tabs. Dead code in FastMath.pow(double, double) and some improvement in test coverage --- Key: MATH-658 URL: https://issues.apache.org/jira/browse/MATH-658 Project: Commons Math Issue Type: Improvement Reporter: Yannick TANGUY Priority: Minor Fix For: 3.0 Attachments: FastMath.java.diff, FastMathTest.java, FastMathTest.java.diff
[jira] [Commented] (MATH-658) Dead code in FastMath.pow(double, double) and some improvement in test coverage
[ https://issues.apache.org/jira/browse/MATH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13101203#comment-13101203 ] Sebb commented on MATH-658: --- Thanks, format looks OK now. @Luc - sorry, should have noticed the incorrect testing code. If I'm being picky, I'd say that code such as {code}
// Logp of -1.0 should be -Inf
Assert.assertTrue(Double.isInfinite(FastMath.log1p(-1.0)));
{code} would be better expressed as {code}
Assert.assertTrue("Logp of -1.0 should be -Inf", Double.isInfinite(FastMath.log1p(-1.0)));
{code} because it's then obvious what the error is without needing to check which line has failed. [And what if the test class has been amended since the test run?] No need to resubmit; I can fix that later, but please consider it for future patches. Dead code in FastMath.pow(double, double) and some improvement in test coverage --- Key: MATH-658 URL: https://issues.apache.org/jira/browse/MATH-658 Project: Commons Math Issue Type: Improvement Reporter: Yannick TANGUY Priority: Minor Fix For: 3.0 Attachments: FastMath.java.diff, FastMathTest.java.diff
[jira] [Resolved] (MATH-658) Dead code in FastMath.pow(double, double) and some improvement in test coverage
[ https://issues.apache.org/jira/browse/MATH-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb resolved MATH-658. --- Resolution: Fixed Hope this is now better resolved ... Dead code in FastMath.pow(double, double) and some improvement in test coverage --- Key: MATH-658 URL: https://issues.apache.org/jira/browse/MATH-658 Project: Commons Math Issue Type: Improvement Reporter: Yannick TANGUY Priority: Minor Fix For: 3.0 Attachments: FastMath.java.diff, FastMathTest.java.diff
[jira] [Updated] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated MATH-650: -- Attachment: FastMathLoadCheck.java Very simple test to demonstrate effect of IOD and calculate. Requires that FastMath.USE_PRECOMPUTED_TABLES be set to package-protected and non-final. Should be set back to final before release. FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor Attachments: FastMathLoadCheck.java
[jira] [Issue Comment Edited] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102300#comment-13102300 ] Sebb edited comment on MATH-650 at 9/11/11 4:48 PM: Very simple test to demonstrate effect of IOD and calculate. Requires that FastMath.USE_PRECOMPUTED_TABLES be set to package-protected and non-final. Should be set back to final before release. was (Author: s...@apache.org): Very simple test to demonstrate effect of IOD and calculate. Requires that FastMath.USE_PRECOMPUTED_TABLES be set to package-protected and non-final. Should be set back to final before release. FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor Attachments: FastMathLoadCheck.java
[jira] [Commented] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102665#comment-13102665 ] Sebb commented on LANG-744: --- It might be worth changing the static init to a lazy init (IOD). This would reduce the overhead for applications that don't call stripAccents. Even if it is possible to change permissions without reloading the class, I'm not sure we check the methods each time. StringUtils throws java.security.AccessControlException on Google App Engine Key: LANG-744 URL: https://issues.apache.org/jira/browse/LANG-744 Project: Commons Lang Issue Type: Bug Components: lang.* Affects Versions: 3.0.1 Environment: Google App Engine Reporter: Clément Denis Fix For: 3.0.2
[jira] [Commented] (CHAIN-57) Chain 2.0 does not build on older JDKs
[ https://issues.apache.org/jira/browse/CHAIN-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102691#comment-13102691 ] Sebb commented on CHAIN-57: --- I just committed an alternate fix that works for me on Sun Java 1.5, but does not require an unchecked cast. Chain 2.0 does not build on older JDKs -- Key: CHAIN-57 URL: https://issues.apache.org/jira/browse/CHAIN-57 Project: Commons Chain Issue Type: Bug Affects Versions: 2.0 Environment: OS name: linux version: 2.6.35-30-generic arch: amd64 Family: unix Ubuntu 10.10 x64 Versions Tested: {noformat}
ibm-java2-x86_64-50 (1.5 j9vmxa6423ifx-20110624) [SUCCESS]
Sun/Oracle 1.5.0_22 [FAILURE]
OpenJdk 1.6.0_20 [SUCCESS]
Sun/Oracle 1.6.0_11 [FAILURE]
Sun/Oracle 1.6.0_21 [FAILURE]
Sun/Oracle 1.6.0_27 [SUCCESS]
ibm-java-x86_64-60 (1.6 jvmxa6460-20081105_25433) [FAILURE]
ibm-java-x86_64-60 (1.6 jvmxa6460sr9-20110624_85526) [SUCCESS]
{noformat} Reporter: Elijah Zupancic Priority: Minor Older versions of the JDK, irrespective of vendor, fail to compile Chain v2. I recommend that we do not make any code changes, but rather tell users in the documentation to compile with a newer JDK version. The following is the typical output of a failed build. This particular output is from building with the Sun/Oracle JDK 1.6.0_21. {noformat}
mvn clean package
[INFO] Scanning for projects...
[INFO] [INFO] Building Commons Chain [INFO]task-segment: [clean, package] [INFO] [INFO] artifact org.apache.maven.plugins:maven-idea-plugin: checking for updates from internal [INFO] Repository 'internal' will be blacklisted [INFO] [clean:clean {execution: default-clean}] [INFO] Deleting /home/elijah/dev/version-2.0-work/target [INFO] [antrun:run {execution: javadoc.resources}] [INFO] Executing tasks main: [copy] Copying 2 files to /home/elijah/dev/version-2.0-work/target/apidocs/META-INF [INFO] Executed tasks [INFO] Setting property: classpath.resource.loader.class = 'org.codehaus.plexus.velocity.ContextClassLoaderResourceLoader'. [INFO] Setting property: velocimacro.messages.on = 'false'. [INFO] Setting property: resource.loader = 'classpath'. [INFO] Setting property: resource.manager.logwhenfound = 'false'. [INFO] [remote-resources:process {execution: default}] [INFO] [resources:resources {execution: default-resources}] [INFO] Using 'iso-8859-1' encoding to copy filtered resources. [INFO] Copying 2 resources to META-INF [INFO] [compiler:compile {execution: default-compile}] [INFO] Compiling 63 source files to /home/elijah/dev/version-2.0-work/target/classes [INFO] [bundle:manifest {execution: bundle-manifest}] [WARNING] Warning in manifest for commons-chain:commons-chain:jar:2.0-SNAPSHOT : Did not find matching referal for !javax.portlet [INFO] [resources:testResources {execution: default-testResources}] [INFO] Using 'iso-8859-1' encoding to copy filtered resources. 
[INFO] Copying 2 resources [INFO] [compiler:testCompile {execution: default-testCompile}] [INFO] Compiling 37 source files to /home/elijah/dev/version-2.0-work/target/test-classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/elijah/dev/version-2.0-work/src/test/java/org/apache/commons/chain/generic/DispatchCommandTestCase.java:[141,42] type parameters of <T>T cannot be determined; no unique maximal instance exists for type variable T with upper bounds T,java.lang.Object [INFO] 1 error [INFO] - [INFO] [ERROR] BUILD FAILURE [INFO] [INFO] Compilation failure /home/elijah/dev/version-2.0-work/src/test/java/org/apache/commons/chain/generic/DispatchCommandTestCase.java:[141,42] type parameters of <T>T cannot be determined; no unique maximal instance exists for type variable T with upper bounds T,java.lang.Object [INFO] [INFO] For more information, run Maven with the -e switch [INFO] [INFO] Total time: 15 seconds [INFO] Finished at: Wed Sep 07 08:09:12 PDT 2011 [INFO] Final Memory: 51M/300M [INFO] {noformat}
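The "no unique maximal instance exists for type variable T with upper bounds T,java.lang.Object" error is typical of older javac releases when a generic method returns the result of another generic method, so that T must be inferred from a target that is itself a type variable. A minimal sketch, assuming line 141 follows that pattern (the class `TypedStore` is hypothetical; this is not the Chain 2.0 code and not necessarily the fix that was committed), shows the usual workaround: an explicit type witness, which skips inference and needs no unchecked cast at the call site.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example of the inference pattern that trips some 1.5/1.6 javac releases. */
public class TypedStore {
    private final Map<String, Object> values = new HashMap<String, Object>();

    public void put(String key, Object value) {
        values.put(key, value);
    }

    /** Generic getter: the caller's target type decides T. */
    @SuppressWarnings("unchecked")
    public <T> T get(String key) {
        return (T) values.get(key);
    }

    /**
     * Writing "return get(key);" here asks javac to infer T from another type
     * variable, which some older compilers reject. The explicit witness
     * this.<T>get(key) avoids inference altogether.
     */
    public <T> T lookup(String key) {
        return this.<T>get(key);
    }

    public static void main(String[] args) {
        TypedStore store = new TypedStore();
        store.put("answer", 42);
        Integer answer = store.lookup("answer");
        System.out.println(answer); // 42
    }
}
```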
[jira] [Issue Comment Edited] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102665#comment-13102665 ] Sebb edited comment on LANG-744 at 9/12/11 6:07 PM: It might be worth changing the static init to a lazy init (IOD). This would reduce the overhead for applications that don't call stripAccents. Even if it is possible to change permissions without reloading the class, I don't think we should check the methods each time. was (Author: s...@apache.org): It might be worth changing the static init to a lazy init (IOD). This would reduce the overhead for applications that don't call stripAccents. Even if it is possible to change permissions without reloading the class, I'm not sure we check the methods each time. StringUtils throws java.security.AccessControlException on Google App Engine Key: LANG-744 URL: https://issues.apache.org/jira/browse/LANG-744 Project: Commons Lang Issue Type: Bug Components: lang.* Affects Versions: 3.0.1 Environment: Google App Engine Reporter: Clément Denis Fix For: 3.0.2 Attachments: LANG-744.patch In the static initializer of org.apache.commons.lang3.StringUtils, there is an attempt to load the class sun.text.Normalizer. Such a class is prohibited on Google App Engine, and the static initializer throws a java.security.AccessControlException. 
{code} Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission accessClassInPackage.sun.text) at java.security.AccessControlContext.checkPermission(AccessControlContext.java:374) at java.security.AccessController.checkPermission(AccessController.java:546) at java.lang.SecurityManager.checkPermission(SecurityManager.java:532) at com.google.appengine.tools.development.DevAppServerFactory$CustomSecurityManager.checkPermission(DevAppServerFactory.java:166) at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1512) at java.lang.Class.checkMemberAccess(Class.java:2164) at java.lang.Class.getMethod(Class.java:1602) at org.apache.commons.lang3.StringUtils.<clinit>(StringUtils.java:739) {code} The exception should be caught in the catch clauses around loadClass("sun.text.Normalizer"). Commons Lang 2 worked fine on GAE.
[jira] [Updated] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated LANG-744: -- Attachment: LANG-744.patch Patch to convert the static checks to IOD. StringUtils throws java.security.AccessControlException on Google App Engine Key: LANG-744 URL: https://issues.apache.org/jira/browse/LANG-744 Project: Commons Lang Issue Type: Bug Components: lang.* Affects Versions: 3.0.1 Environment: Google App Engine Reporter: Clément Denis Fix For: 3.0.2 Attachments: LANG-744.patch In the static initializer of org.apache.commons.lang3.StringUtils, there is an attempt to load the class sun.text.Normalizer. Such a class is prohibited on Google App Engine, and the static initializer throws a java.security.AccessControlException. {code} Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission accessClassInPackage.sun.text) at java.security.AccessControlContext.checkPermission(AccessControlContext.java:374) at java.security.AccessController.checkPermission(AccessController.java:546) at java.lang.SecurityManager.checkPermission(SecurityManager.java:532) at com.google.appengine.tools.development.DevAppServerFactory$CustomSecurityManager.checkPermission(DevAppServerFactory.java:166) at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1512) at java.lang.Class.checkMemberAccess(Class.java:2164) at java.lang.Class.getMethod(Class.java:1602) at org.apache.commons.lang3.StringUtils.<clinit>(StringUtils.java:739) {code} The exception should be caught in the catch clauses around loadClass("sun.text.Normalizer"). Commons Lang 2 worked fine on GAE.
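The initialization-on-demand holder (IOD) idiom defers expensive setup until it is first needed: the JVM initializes a nested class only when it is first used, and class initialization is thread-safe without explicit locking. A minimal sketch, assuming a hypothetical `AccentStripper` class rather than the contents of LANG-744.patch, with a counter standing in for the reflective Normalizer checks and java.text.Normalizer (Java 6+) doing the actual work:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/** Hypothetical sketch of lazy initialization via the initialization-on-demand holder idiom. */
public final class AccentStripper {

    /** Counts how often the expensive setup ran; exposed only for the demo. */
    static int setupRuns = 0;

    private AccentStripper() {
    }

    /** Its static initializer runs on first access to Holder.PATTERN, not when AccentStripper loads. */
    private static final class Holder {
        static final Pattern PATTERN;

        static {
            setupRuns++; // stands in for the reflective lookups done in the real static init
            PATTERN = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
        }
    }

    /** A cheap method that never touches Holder, so it never pays the setup cost. */
    public static boolean isEmpty(CharSequence cs) {
        return cs == null || cs.length() == 0;
    }

    /** Only this method triggers initialization of Holder. */
    public static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return Holder.PATTERN.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(isEmpty("") + " setupRuns=" + setupRuns);
        System.out.println(stripAccents("Cl\u00e9ment") + " setupRuns=" + setupRuns);
    }
}
```

Applications that only call methods like `isEmpty` never initialize `Holder`, so a failure or cost in that setup affects only callers of `stripAccents`.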
[jira] [Commented] (COMPRESS-157) Wrong EOF detection in CBZip2InputStream
[ https://issues.apache.org/jira/browse/COMPRESS-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106082#comment-13106082 ] Sebb commented on COMPRESS-157: --- Are you sure the class is in Compress? I can only find BZip2CompressorInputStream, which does not have the problem. Commons VFS does have a file called CBZip2InputStream, and there is one instance of casting read() to a char:
{code}
while (bsLive < 1)
{
    char ch = 0;
    try
    {
        ch = (char) inputStream.read();
    }
    catch (IOException e)
    {
        compressedStreamEOF();
    }

    bsBuff = (bsBuff << 8) | (ch & 0xff);
    bsLive += 8;
}
{code}
That does look wrong. Wrong EOF detection in CBZip2InputStream Key: COMPRESS-157 URL: https://issues.apache.org/jira/browse/COMPRESS-157 Project: Commons Compress Issue Type: Bug Reporter: Jan Priority: Minor The following snippet from CBZip2InputStream does a wrong EOF check. The char 'thech' will never be equal to the integer '-1'. You have to check for #read() returning -1 before casting to char. I found the bug in http://svn.wikimedia.org/svnroot/mediawiki/trunk/mwdumper/src/org/apache/commons/compress/bzip2/, not in your trunk.
{noformat}
int zzi;
char thech = 0;
try {
    thech = (char)m_input.read();
} catch( IOException e ) {
    compressedStreamEOF();
}
if( thech == -1 ) //HERE
{
    compressedStreamEOF();
}
{noformat}
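The fix the reporter describes is to keep the result of read() as an int, test it against -1, and only then narrow it. Casting first turns -1 into the char 0xFFFF (65535), so a later comparison with -1 can never succeed. A minimal sketch; the class `EofCheck`, the method `readByteOrFail` and the use of EOFException are illustrative assumptions, not the code in Compress or VFS:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of the EOF check described in COMPRESS-157. */
public class EofCheck {

    /** Buggy pattern: the cast maps -1 to 0xFFFF, so the comparison is always false. */
    static boolean buggyDetectsEof(InputStream in) throws IOException {
        char thech = (char) in.read();
        return thech == -1; // char is promoted to int 65535, which never equals -1
    }

    /** Correct pattern: compare the int result with -1 before narrowing it. */
    static int readByteOrFail(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            throw new EOFException("unexpected end of compressed stream");
        }
        return b; // guaranteed to be in 0..255 once EOF is ruled out
    }

    public static void main(String[] args) throws IOException {
        System.out.println(buggyDetectsEof(new ByteArrayInputStream(new byte[0]))); // false: EOF slips through
        try {
            readByteOrFail(new ByteArrayInputStream(new byte[0]));
        } catch (EOFException e) {
            System.out.println("EOF detected");
        }
    }
}
```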
[jira] [Created] (VFS-363) Wrong EOF detection in CBZip2InputStream
Wrong EOF detection in CBZip2InputStream Key: VFS-363 URL: https://issues.apache.org/jira/browse/VFS-363 Project: Commons VFS Issue Type: Bug Reporter: Sebb See https://issues.apache.org/jira/browse/COMPRESS-157
[jira] [Commented] (MATH-650) FastMath has static code which slows the first access to FastMath
[ https://issues.apache.org/jira/browse/MATH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107265#comment-13107265 ] Sebb commented on MATH-650: --- Thanks, useful to know. It would be interesting to know the times for the second invocation as well. FastMath has static code which slows the first access to FastMath - Key: MATH-650 URL: https://issues.apache.org/jira/browse/MATH-650 Project: Commons Math Issue Type: Improvement Affects Versions: Nightly Builds Environment: Android 2.3 (Dalvik VM with JIT) Reporter: Alexis Robert Priority: Minor Attachments: FastMathLoadCheck.java, LucTestPerformance.java Working on an Android application using Orekit, I've discovered that a simple FastMath.floor() call takes about 4 to 5 seconds on a 1 GHz Nexus One phone (only the first time it's called). I ran the Android profiling tool (traceview), and the problem seems to be linked to the static portion of the FastMath code marked "// Initialize tables". The timing breakdown was:
- FastMath.slowexp (40.8%)
- FastMath.expint (39.2%)
  - FastMath.quadmult() (95.6% of expint)
- FastMath.slowlog (18.2%)
Hoping that helps. Thanks! Alexis Robert
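The second-invocation timing matters because a class's static initializer runs only once, on first use, so only the first call should pay for the table setup. A rough, hypothetical harness (`SlowInitMath` stands in for FastMath and does not reproduce its tables) that times the first and second calls with System.nanoTime might look like this; absolute numbers depend heavily on the VM and device:

```java
/** Hypothetical harness timing the first and second call into a class with a costly static initializer. */
public class FirstCallTiming {

    /** Stand-in for FastMath: the table is built once, when the class is first used. */
    static final class SlowInitMath {
        static int initCount = 0;
        static final double[] EXP_TABLE = new double[1 << 16];

        static {
            initCount++;
            for (int i = 0; i < EXP_TABLE.length; i++) {
                EXP_TABLE[i] = Math.exp(i / 1024.0); // deliberately expensive setup
            }
        }

        static double floor(double x) {
            return Math.floor(x); // cheap, but the first call still triggers the static initializer
        }
    }

    /** Returns the elapsed nanoseconds of a single floor() call. */
    static long timeOneCall(double x) {
        long start = System.nanoTime();
        SlowInitMath.floor(x);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long first = timeOneCall(2.5);  // includes class initialization
        long second = timeOneCall(3.5); // class already initialized
        System.out.println("first call:  " + first + " ns");
        System.out.println("second call: " + second + " ns");
        System.out.println("initializer runs: " + SlowInitMath.initCount);
    }
}
```

If the second call is fast, the cost is purely one-off initialization, which points towards deferring or precomputing the tables rather than speeding up floor() itself.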
[jira] [Updated] (DBUTILS-80) DbUtils.loadDriver catches Throwable
[ https://issues.apache.org/jira/browse/DBUTILS-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated DBUTILS-80: Attachment: dbutils-80.patch DbUtils.loadDriver catches Throwable Key: DBUTILS-80 URL: https://issues.apache.org/jira/browse/DBUTILS-80 Project: Commons DbUtils Issue Type: Bug Reporter: Sebb Attachments: dbutils-80.patch DbUtils.loadDriver catches Throwable, which is a very bad idea. It should just catch Exception. Suggested patch to follow (it also simplifies the code).
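Catching Throwable also swallows Errors such as OutOfMemoryError, which the caller cannot meaningfully recover from. A minimal sketch of the narrower catch, assuming a hypothetical `DriverLoader.loadDriver` with the same boolean-returning shape as DbUtils.loadDriver; it is not the attached patch:

```java
/** Hypothetical sketch: load a JDBC driver class, catching Exception rather than Throwable. */
public final class DriverLoader {

    private DriverLoader() {
    }

    /**
     * Returns true if the class could be loaded and instantiated (JDBC drivers
     * usually register themselves with DriverManager during class initialization).
     * Errors are deliberately not caught and propagate to the caller.
     */
    public static boolean loadDriver(String driverClassName) {
        try {
            Class.forName(driverClassName).getDeclaredConstructor().newInstance();
            return true;
        } catch (Exception e) {
            // ClassNotFoundException, NoSuchMethodException, InstantiationException,
            // IllegalAccessException, InvocationTargetException, ...
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadDriver("java.lang.Object"));         // true
        System.out.println(loadDriver("com.example.NoSuchDriver")); // false
    }
}
```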
[jira] [Updated] (DBUTILS-80) DbUtils.loadDriver catches Throwable
[ https://issues.apache.org/jira/browse/DBUTILS-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb updated DBUTILS-80: Affects Version/s: 1.3 DbUtils.loadDriver catches Throwable Key: DBUTILS-80 URL: https://issues.apache.org/jira/browse/DBUTILS-80 Project: Commons DbUtils Issue Type: Bug Affects Versions: 1.3 Reporter: Sebb Attachments: dbutils-80.patch DbUtils.loadDriver catches Throwable, which is a very bad idea. It should just catch Exception. Suggested patch to follow (it also simplifies the code).
[jira] [Commented] (DBUTILS-81) DbUtils.loadDriver() uses Class.forName()
[ https://issues.apache.org/jira/browse/DBUTILS-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112470#comment-13112470 ] Sebb commented on DBUTILS-81: - Have you a suggested patch for this? DbUtils.loadDriver() uses Class.forName() - Key: DBUTILS-81 URL: https://issues.apache.org/jira/browse/DBUTILS-81 Project: Commons DbUtils Issue Type: Bug Reporter: Simone Tripodi The {{Class.forName()}} statement should be avoided because of potential OSGi issues: Commons components are OSGi bundles! The ClassLoader should be used instead to [load classes|http://download.oracle.com/javase/6/docs/api/java/lang/ClassLoader.html#loadClass(java.lang.String)], and a new method should be added that accepts a custom ClassLoader.
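A hypothetical overload that takes the caller's ClassLoader could look like the sketch below. `ClassLoader.loadClass` does not initialize the class, so the sketch also instantiates it, which runs the static initializer where JDBC drivers typically register themselves. The class name, the overload and the choice of the thread context class loader as the default are assumptions, not the DbUtils API.

```java
/** Hypothetical sketch: load a driver through an explicit ClassLoader instead of Class.forName(String). */
public final class LoaderAwareDriverLoader {

    private LoaderAwareDriverLoader() {
    }

    /** Uses the thread context class loader, falling back to this class's own loader. */
    public static boolean loadDriver(String driverClassName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = LoaderAwareDriverLoader.class.getClassLoader();
        }
        return loadDriver(cl, driverClassName);
    }

    /** Loads via the supplied loader (for example an OSGi bundle's loader) and instantiates the class. */
    public static boolean loadDriver(ClassLoader classLoader, String driverClassName) {
        try {
            Class<?> driverClass = classLoader.loadClass(driverClassName);
            // loadClass() does not run static initializers; creating an instance does
            driverClass.getDeclaredConstructor().newInstance();
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Usage: `LoaderAwareDriverLoader.loadDriver(bundleClassLoader, "org.example.Driver")`, where both arguments are placeholders for the caller's own loader and driver class.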
[jira] [Resolved] (DBUTILS-80) DbUtils.loadDriver catches Throwable
[ https://issues.apache.org/jira/browse/DBUTILS-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb resolved DBUTILS-80. - Resolution: Fixed DbUtils.loadDriver catches Throwable Key: DBUTILS-80 URL: https://issues.apache.org/jira/browse/DBUTILS-80 Project: Commons DbUtils Issue Type: Bug Affects Versions: 1.3 Reporter: Sebb Attachments: dbutils-80.patch DbUtils.loadDriver catches Throwable, which is a very bad idea. It should just catch Exception. Suggested patch to follow (it also simplifies the code).
[jira] [Commented] (LANG-744) StringUtils throws java.security.AccessControlException on Google App Engine
[ https://issues.apache.org/jira/browse/LANG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112992#comment-13112992 ] Sebb commented on LANG-744: --- Any objection to applying the patch to convert the method checks to IOD? That will remove the overhead for applications that don't use stripAccents. StringUtils throws java.security.AccessControlException on Google App Engine Key: LANG-744 URL: https://issues.apache.org/jira/browse/LANG-744 Project: Commons Lang Issue Type: Bug Components: lang.* Affects Versions: 3.0.1 Environment: Google App Engine Reporter: Clément Denis Fix For: 3.0.2 Attachments: LANG-744.patch In the static initializer of org.apache.commons.lang3.StringUtils, there is an attempt to load the class sun.text.Normalizer. Such a class is prohibited on Google App Engine, and the static initializer throws a java.security.AccessControlException. {code} Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission accessClassInPackage.sun.text) at java.security.AccessControlContext.checkPermission(AccessControlContext.java:374) at java.security.AccessController.checkPermission(AccessController.java:546) at java.lang.SecurityManager.checkPermission(SecurityManager.java:532) at com.google.appengine.tools.development.DevAppServerFactory$CustomSecurityManager.checkPermission(DevAppServerFactory.java:166) at java.lang.SecurityManager.checkPackageAccess(SecurityManager.java:1512) at java.lang.Class.checkMemberAccess(Class.java:2164) at java.lang.Class.getMethod(Class.java:1602) at org.apache.commons.lang3.StringUtils.<clinit>(StringUtils.java:739) {code} The exception should be caught in the catch clauses around loadClass("sun.text.Normalizer"). Commons Lang 2 worked fine on GAE.
[jira] [Commented] (VFS-360) Migrate to HttpComponent HttpClient from the old Commons HttpClient
[ https://issues.apache.org/jira/browse/VFS-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114105#comment-13114105 ] Sebb commented on VFS-360: -- HC4 uses a different groupId/artifactId and different package names, so AFAICT VFS could be updated without affecting Jackrabbit. Migrate to HttpComponent HttpClient from the old Commons HttpClient --- Key: VFS-360 URL: https://issues.apache.org/jira/browse/VFS-360 Project: Commons VFS Issue Type: Improvement Reporter: Gary D. Gregory Migrate to HttpComponent HttpClient from the old Commons HttpClient.
[jira] [Resolved] (NET-460) _retrieveFile() blocks calling thread, on FTP I/O till the time file transfer is complete
[ https://issues.apache.org/jira/browse/NET-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebb resolved NET-460. -- Resolution: Duplicate _retrieveFile() blocks calling thread, on FTP I/O till the time file transfer is complete - Key: NET-460 URL: https://issues.apache.org/jira/browse/NET-460 Project: Commons Net Issue Type: Improvement Components: FTP Affects Versions: 3.1 Environment: linux/windows Reporter: Agent Vinod Labels: newbie, patch The function _retrieveFile in FTPClient.java does not respond to interrupts from the calling thread. For example, a basic FTP client application has one main (parent) thread and one child thread. The main thread handles all functions except the FTPClient download/upload; the child thread handles only FTPClient-related functions, mainly _retrieveFile(). Steps to reproduce:
1) The main thread starts the child thread.
2) The child thread is downloading a file using _retrieveFile(String command, String remote, OutputStream local).
3) After some time, the main thread interrupts the child thread to stop (abort) the download.
Expected behavior: the child thread immediately aborts the download and dies. Observed behavior: the child thread stays blocked in _retrieveFile(String command, String remote, OutputStream local) until the file finishes downloading. Only then does the child thread respond to the interrupt from the parent thread.
My workaround (file FTPClient.java, class FTPClient):
Step 1: Declare a field: private Socket mySocket;
Step 2: In protected boolean _retrieveFile(String command, String remote, OutputStream local) throws IOException, comment out the local declaration Socket socket; and use mySocket (the field declared in step 1) instead.
Step 3: In public boolean abort() throws IOException, add the statement Util.closeQuietly(mySocket); before return FTPReply.isPositiveCompletion(abor());
This way, whenever the main thread calls abort(), the active download blocked on mySocket in _retrieveFile() is immediately interrupted, raising an exception that stops the child thread (the caller must, of course, catch this exception properly). I am not sure whether this is the right way to do it, and I am afraid it may break something else. I would ask the core developers to look for a better solution than this workaround. Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (NET-460) _retrieveFile() blocks calling thread, on FTP I/O till the time file transfer is complete
[ https://issues.apache.org/jira/browse/NET-460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259566#comment-13259566 ] Sebb commented on NET-460: -- The abort() method sends an ABOR command to the server; that is its only function. If the server fails to honour the ABOR command, that is not the fault of Commons NET. Unfortunately, many FTP servers seem to stop processing the control channel whilst a data transfer is in progress. The abort() method cannot be extended as suggested. However, it might be possible to provide a new method which allows the data socket to be closed as suggested. This is covered under NET-419. _retrieveFile() blocks calling thread, on FTP I/O till the time file transfer is complete - Key: NET-460 URL: https://issues.apache.org/jira/browse/NET-460 Project: Commons Net Issue Type: Improvement Components: FTP Affects Versions: 3.1 Environment: linux/windows Reporter: Agent Vinod Labels: newbie, patch The function _retrieveFile in FTPClient.java does not respond to interrupts from the calling thread. For example, a basic FTP client application has one main (parent) thread and one child thread. The main thread handles all functions except the FTPClient download/upload; the child thread handles only FTPClient-related functions, mainly _retrieveFile(). Steps to reproduce:
1) The main thread starts the child thread.
2) The child thread is downloading a file using _retrieveFile(String command, String remote, OutputStream local).
3) After some time, the main thread interrupts the child thread to stop (abort) the download.
Expected behavior: the child thread immediately aborts the download and dies. Observed behavior: the child thread stays blocked in _retrieveFile(String command, String remote, OutputStream local) until the file finishes downloading. Only then does the child thread respond to the interrupt from the parent thread.
My workaround (file FTPClient.java, class FTPClient):
Step 1: Declare a field: private Socket mySocket;
Step 2: In protected boolean _retrieveFile(String command, String remote, OutputStream local) throws IOException, comment out the local declaration Socket socket; and use mySocket (the field declared in step 1) instead.
Step 3: In public boolean abort() throws IOException, add the statement Util.closeQuietly(mySocket); before return FTPReply.isPositiveCompletion(abor());
This way, whenever the main thread calls abort(), the active download blocked on mySocket in _retrieveFile() is immediately interrupted, raising an exception that stops the child thread (the caller must, of course, catch this exception properly). I am not sure whether this is the right way to do it, and I am afraid it may break something else. I would ask the core developers to look for a better solution than this workaround. Thanks.
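The workaround relies on the fact that closing a socket from another thread makes a read blocked on that socket fail at once, whereas Thread.interrupt() generally does not wake a platform thread blocked in classic java.net.Socket I/O. A self-contained sketch of that mechanism, using a local socket pair in place of an FTP data connection (`CloseToUnblock` is a hypothetical name, and this is not the API discussed in NET-419):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo: a read blocked on a socket is released by closing the socket from another thread. */
public class CloseToUnblock {

    /** Returns the exception the blocked reader saw, or null if its read completed normally. */
    public static IOException demo() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket dataSocket = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) { // peer never writes, so reads on dataSocket block

            AtomicReference<IOException> seen = new AtomicReference<>();
            CountDownLatch aboutToRead = new CountDownLatch(1);

            // Plays the role of the child thread stuck in a transfer
            Thread transfer = new Thread(() -> {
                try {
                    InputStream in = dataSocket.getInputStream();
                    aboutToRead.countDown();
                    in.read(); // blocks until data arrives, EOF, or the socket is closed
                } catch (IOException e) {
                    seen.set(e);
                }
            });
            transfer.start();

            aboutToRead.await();
            Thread.sleep(200);  // give the transfer thread time to block in read()
            dataSocket.close(); // the "abort": closing the data socket unblocks read()
            transfer.join(5000);
            return seen.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // typically java.net.SocketException: Socket closed
    }
}
```

Closing the socket even if the reader has not yet entered read() still yields a SocketException, so the outcome does not depend on the sleep being long enough.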