[jira] [Commented] (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816432#comment-17816432 ]

Elliotte Rusty Harold commented on LANG-607:

ping, should be closed per comment

> StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
> --
>
>                 Key: LANG-607
>                 URL: https://issues.apache.org/jira/browse/LANG-607
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>    Affects Versions: 2.5
>         Environment: java version "1.6.0_16"
> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> Microsoft Windows [Version 6.0.6002]
> Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
> Java version: 1.6.0_16
> Java home: C:\Program Files\Java\jdk1.6.0_16\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows vista" version: "6.0" arch: "amd64" Family: "windows"
>            Reporter: Gary Gregory
>            Assignee: Gary D. Gregory
>            Priority: Minor
>             Fix For: Patch Needed
>
>         Attachments: LANG-607.diff
>
>
> The StringUtils.containsAny methods incorrectly match Unicode 2.0+ supplementary characters.
> For example, define a test fixture to be the Unicode character U+20000, where U+20000 is written in Java source as "\uD840\uDC00":
>     private static final String CharU20000 = "\uD840\uDC00";
>     private static final String CharU20001 = "\uD840\uDC01";
> You can see Unicode supplementary characters correctly implemented in the JRE call:
>     assertEquals(-1, CharU20000.indexOf(CharU20001));
> But this is broken:
>     assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
>     assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));
> This is fine:
>     assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
>     assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
>     assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
>     assertEquals(false, StringUtils.contains(CharU20000, CharU20001));
> because the method calls the JRE to perform the match.
> More than you want to know:
> - http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
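The false positive reported above can be reproduced without Commons Lang. The sketch below is only an illustration, not the code in LANG-607.diff: the class and method names are hypothetical, and it assumes the faulty behavior comes from comparing UTF-16 code units one at a time. U+20000 and U+20001 share the high surrogate D840, so a per-char comparison matches on that shared unit, while a per-code-point comparison does not.

```java
final class ContainsAnySketch {

    // Two supplementary characters that share a high surrogate
    // and differ only in the low surrogate.
    static final String CHAR_U20000 = "\uD840\uDC00";
    static final String CHAR_U20001 = "\uD840\uDC01";

    /** Compares UTF-16 code units, so a shared high surrogate counts as a match. */
    static boolean containsAnyByChar(String str, String searchChars) {
        for (int i = 0; i < str.length(); i++) {
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Compares whole code points, so a surrogate pair only matches as a unit. */
    static boolean containsAnyByCodePoint(String str, String searchChars) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (searchChars.indexOf(cp) >= 0) {
                return true;
            }
            i += Character.charCount(cp); // advance 2 chars for a supplementary code point
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsAnyByChar(CHAR_U20000, CHAR_U20001));      // true (false positive)
        System.out.println(containsAnyByCodePoint(CHAR_U20000, CHAR_U20001)); // false
        System.out.println(containsAnyByCodePoint(CHAR_U20000 + CHAR_U20001, CHAR_U20001)); // true
    }
}
```

The code-point variant relies on String.indexOf(int), which accepts supplementary code points; that is the same JRE behavior the description credits for StringUtils.contains working correctly.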
[jira] [Commented] (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235713#comment-17235713 ]

Arturo Bernal commented on LANG-607:

Hi [~ggregory]
We can close this issue. It's already fixed:
[https://github.com/apache/commons-lang/blob/master/src/test/java/org/apache/commons/lang3/StringUtilsContainsTest.java#L156]
[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979781#action_12979781 ]

Niall Pemberton commented on LANG-607:

Is the work complete on this? I have ported the changes back to the 2.x branch (I copied the Character.isHighSurrogate(char) method from Apache Harmony into CharUtils)

StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
--
                Key: LANG-607
                URL: https://issues.apache.org/jira/browse/LANG-607
            Project: Commons Lang
         Issue Type: Bug
         Components: lang.*
   Affects Versions: 2.5
           Reporter: Gary Gregory
           Assignee: Gary Gregory
           Priority: Minor
            Fix For: 3.0
        Attachments: LANG-607.diff

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
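For readers unfamiliar with the check that was copied into CharUtils for the 2.x back-port: a high surrogate is any UTF-16 code unit in the range D800 to DBFF. The following is a minimal sketch of such a range test, written from that definition; it is not the Apache Harmony source, and the class name is hypothetical.

```java
final class SurrogateCheckSketch {

    /** Returns true if ch lies in the UTF-16 high-surrogate range D800..DBFF. */
    static boolean isHighSurrogate(char ch) {
        return ch >= '\uD800' && ch <= '\uDBFF';
    }

    public static void main(String[] args) {
        String u20000 = "\uD840\uDC00";
        System.out.println(isHighSurrogate(u20000.charAt(0))); // true
        System.out.println(isHighSurrogate(u20000.charAt(1))); // false (low surrogate)
        System.out.println(isHighSurrogate('a'));              // false
    }
}
```

On Java 5 and later the JDK method Character.isHighSurrogate(char) gives the same answer, so a local copy is only needed where that method cannot be assumed.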
[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979785#action_12979785 ]

Gary Gregory commented on LANG-607:

I am pretty sure that I did not complete the task. Lots of nooks and crannies...
[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979877#action_12979877 ]

Niall Pemberton commented on LANG-607:

Is the work that's been done so far OK to go into a release? I'm wondering whether I should revert it from the 2.x branch before releasing 2.6, or is what's been done in trunk (and ported to 2.x) good to go?
[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979894#action_12979894 ]

Sebb commented on LANG-607:

Looks like the following condition could be taken out of the loop:

    CharUtils.isHighSurrogate(ch)

as there's no point rechecking it for each search character. I don't know if the code is otherwise correct.
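The hoisting suggested in this comment can be pictured with a nested loop in which the surrogate test depends only on the input character and is therefore evaluated once per input position rather than once per search character. The sketch below is an assumption about the general loop shape, not the code committed to trunk or the 2.x branch; the class name is hypothetical, and it uses the JDK's Character methods in place of CharUtils.isHighSurrogate.

```java
final class HoistedSurrogateCheck {

    /**
     * Returns true if str contains any character from searchChars.
     * A well-formed surrogate pair in str matches only when both of its
     * code units appear consecutively in searchChars. An unpaired
     * surrogate in str is compared as a single code unit.
     */
    static boolean containsAny(String str, char... searchChars) {
        if (str == null || searchChars == null) {
            return false;
        }
        final int strLen = str.length();
        for (int i = 0; i < strLen; i++) {
            final char ch = str.charAt(i);
            // Hoisted out of the inner loop: the result depends only on str.
            final boolean pair = Character.isHighSurrogate(ch)
                    && i + 1 < strLen
                    && Character.isLowSurrogate(str.charAt(i + 1));
            for (int j = 0; j < searchChars.length; j++) {
                if (searchChars[j] != ch) {
                    continue;
                }
                if (!pair) {
                    return true; // single code unit matched
                }
                if (j + 1 < searchChars.length && searchChars[j + 1] == str.charAt(i + 1)) {
                    return true; // both halves of the pair matched
                }
            }
            if (pair) {
                i++; // the low surrogate was already examined as part of the pair
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String u20000 = "\uD840\uDC00";
        String u20001 = "\uD840\uDC01";
        System.out.println(containsAny(u20000, u20001.toCharArray()));          // false
        System.out.println(containsAny(u20000 + u20001, u20001.toCharArray())); // true
    }
}
```

Skipping the low surrogate after a pair has been examined also keeps the second half of one supplementary character from matching the second half of a different one.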
[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.
[ https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979905#action_12979905 ]

Gary Gregory commented on LANG-607:

What is there is good to go but I did not cover all of our APIs.