[jira] [Commented] (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

2024-02-11 Thread Elliotte Rusty Harold (Jira)


[ 
https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816432#comment-17816432
 ] 

Elliotte Rusty Harold commented on LANG-607:


Ping: this should be closed, per the earlier comment.

> StringUtils methods do not handle Unicode 2.0+ supplementary characters 
> correctly.
> --
>
> Key: LANG-607
> URL: https://issues.apache.org/jira/browse/LANG-607
> Project: Commons Lang
>  Issue Type: Bug
>  Components: lang.*
>Affects Versions: 2.5
> Environment: java version "1.6.0_16"
> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
> Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
> Microsoft Windows [Version 6.0.6002]
> Apache Maven 2.2.1 (r801777; 2009-08-06 12:16:01-0700)
> Java version: 1.6.0_16
> Java home: C:\Program Files\Java\jdk1.6.0_16\jre
> Default locale: en_US, platform encoding: Cp1252
> OS name: "windows vista" version: "6.0" arch: "amd64" Family: "windows"
>Reporter: Gary Gregory
>Assignee: Gary D. Gregory
>Priority: Minor
> Fix For: Patch Needed
>
> Attachments: LANG-607.diff
>
>
> StringUtils.containsAny methods incorrectly match Unicode 2.0+ 
> supplementary characters.
> For example, define test fixtures for the Unicode characters U+20000 and
> U+20001, which are written in Java source as the surrogate pairs
> "\uD840\uDC00" and "\uD840\uDC01":
>   private static final String CharU20000 = "\uD840\uDC00";
>   private static final String CharU20001 = "\uD840\uDC01";
> The JRE handles supplementary characters correctly:
>   assertEquals(-1, CharU20000.indexOf(CharU20001));
> But this is broken:
>   assertEquals(false, StringUtils.containsAny(CharU20000, CharU20001));
>   assertEquals(false, StringUtils.containsAny(CharU20001, CharU20000));
> Both strings begin with the same high surrogate, \uD840, so a
> char-by-char comparison reports a match even though the code points differ.
> This is fine:
>   assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20000));
>   assertEquals(true, StringUtils.contains(CharU20000 + CharU20001, CharU20001));
>   assertEquals(true, StringUtils.contains(CharU20000, CharU20000));
>   assertEquals(false, StringUtils.contains(CharU20000, CharU20001));
> because the method calls the JRE to perform the match.
> More than you want to know:
> - http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
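To make the failure concrete, here is a minimal, self-contained sketch contrasting char-by-char matching (the way the broken containsAny compares UTF-16 units) with code-point matching (the way String.indexOf treats supplementary characters). The class and method names (SupplementaryContainsAny, naiveContainsAny, codePointContainsAny) are hypothetical illustrations, the code does not call Commons Lang, and it assumes Java 5 or later and well-formed UTF-16 input.

```java
/**
 * Contrasts char-by-char matching with code-point matching for supplementary characters.
 * Both helpers are hypothetical illustrations, not Commons Lang code.
 */
public final class SupplementaryContainsAny {

    public static final String CHAR_U20000 = "\uD840\uDC00"; // U+20000 as a surrogate pair
    public static final String CHAR_U20001 = "\uD840\uDC01"; // U+20001 as a surrogate pair

    /** Compares UTF-16 code units one at a time, so a shared high surrogate counts as a hit. */
    public static boolean naiveContainsAny(String str, String searchChars) {
        for (int i = 0; i < str.length(); i++) {
            for (int j = 0; j < searchChars.length(); j++) {
                if (str.charAt(i) == searchChars.charAt(j)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Compares whole code points; assumes both strings are well-formed UTF-16. */
    public static boolean codePointContainsAny(String str, String searchChars) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            // String.indexOf(int) looks for the full surrogate pair when cp is supplementary.
            if (searchChars.indexOf(cp) >= 0) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(naiveContainsAny(CHAR_U20000, CHAR_U20001));                   // true (wrong)
        System.out.println(codePointContainsAny(CHAR_U20000, CHAR_U20001));               // false
        System.out.println(codePointContainsAny(CHAR_U20000 + CHAR_U20001, CHAR_U20001)); // true
    }
}
```

The naive version reproduces the reported symptom in both argument orders, while the code-point version agrees with the JRE's String.indexOf result.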



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

2020-11-19 Thread Arturo Bernal (Jira)


[ 
https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17235713#comment-17235713
 ] 

Arturo Bernal commented on LANG-607:


Hi [~ggregory]

We can close this issue. It's already fixed:

https://github.com/apache/commons-lang/blob/master/src/test/java/org/apache/commons/lang3/StringUtilsContainsTest.java#L156



[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

2011-01-10 Thread Niall Pemberton (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979781#action_12979781
 ] 

Niall Pemberton commented on LANG-607:
--

Is the work complete on this?

I have ported the changes back to the 2.x branch. (I copied the
Character.isHighSurrogate(char) method from Apache Harmony into CharUtils.)
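Character.isHighSurrogate(char) first appeared in Java 5, so a branch that must still run on older JDKs needs its own copy. Below is a minimal sketch of such a helper, assuming only the Unicode definition of high surrogates (U+D800 to U+DBFF). The class name SurrogateChecks is hypothetical, and this is a plain range check, not the Harmony source that was copied into CharUtils.

```java
/** Hypothetical stand-in for a pre-Java 5 CharUtils.isHighSurrogate helper. */
public final class SurrogateChecks {

    static final char MIN_HIGH_SURROGATE = '\uD800';
    static final char MAX_HIGH_SURROGATE = '\uDBFF';

    /** Returns true if ch lies in the high-surrogate range U+D800..U+DBFF. */
    public static boolean isHighSurrogate(char ch) {
        return MIN_HIGH_SURROGATE <= ch && ch <= MAX_HIGH_SURROGATE;
    }

    public static void main(String[] args) {
        // Cross-check against the JDK 5+ implementation for every char value.
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (isHighSurrogate((char) c) != Character.isHighSurrogate((char) c)) {
                throw new AssertionError("mismatch at " + c);
            }
        }
        System.out.println(isHighSurrogate('\uD840')); // true
        System.out.println(isHighSurrogate('\uDC00')); // false
    }
}
```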




[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

2011-01-10 Thread Gary Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979785#action_12979785
 ] 

Gary Gregory commented on LANG-607:
---

I am pretty sure that I did not complete the task. Lots of nooks and crannies...




[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

2011-01-10 Thread Niall Pemberton (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979877#action_12979877
 ] 

Niall Pemberton commented on LANG-607:
--

Is the work that's been done so far OK to go into a release? I'm wondering
whether I should revert it from the 2.x branch before releasing 2.6, or
whether what's been done in trunk (and ported to 2.x) is good to go.




[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

2011-01-10 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979894#action_12979894
 ] 

Sebb commented on LANG-607:
---

It looks like the following condition could be hoisted out of the inner loop:

CharUtils.isHighSurrogate(ch)

as there's no point rechecking it for each search character.

I don't know if the code is otherwise correct.
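Sebb's suggestion can be illustrated with a sketch of a surrogate-aware containsAny(CharSequence, char[]) in which the text-side surrogate test runs once per text character, outside the loop over search characters. The class name HoistedContainsAny is hypothetical, the pair-matching rules (a pair matches only an identical pair, a single char only a single char) are assumptions made for this example rather than the behavior of the LANG-607 patch, and it uses the JDK 5+ Character methods instead of the CharUtils copy on the 2.x branch.

```java
/**
 * Sketch of a surrogate-aware containsAny in which the per-text-character
 * surrogate check is hoisted out of the inner loop.
 * Hypothetical illustration; not the code committed for LANG-607.
 */
public final class HoistedContainsAny {

    public static boolean containsAny(CharSequence cs, char[] searchChars) {
        if (cs == null || searchChars == null) {
            return false;
        }
        int csLen = cs.length();
        int searchLen = searchChars.length;
        for (int i = 0; i < csLen; i++) {
            char ch = cs.charAt(i);
            // Depends only on the text position, so compute it once per text character
            // instead of once per (text character, search character) combination.
            boolean textPair = Character.isHighSurrogate(ch)
                    && i + 1 < csLen && Character.isLowSurrogate(cs.charAt(i + 1));
            for (int j = 0; j < searchLen; j++) {
                boolean searchPair = Character.isHighSurrogate(searchChars[j])
                        && j + 1 < searchLen && Character.isLowSurrogate(searchChars[j + 1]);
                // A pair matches only an identical pair; a single char matches only a single char.
                if (textPair == searchPair && searchChars[j] == ch
                        && (!textPair || searchChars[j + 1] == cs.charAt(i + 1))) {
                    return true;
                }
                if (searchPair) {
                    j++; // step over the low surrogate of this search pair
                }
            }
            if (textPair) {
                i++; // step over the low surrogate of this text pair
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String u20000 = "\uD840\uDC00";
        String u20001 = "\uD840\uDC01";
        System.out.println(containsAny(u20000, u20001.toCharArray()));          // false
        System.out.println(containsAny(u20000 + u20001, u20001.toCharArray())); // true
        System.out.println(containsAny("abc", new char[] {'z', 'c'}));          // true
    }
}
```

The search-side pair test still runs for every (i, j) combination; the same idea could be taken one step further by precomputing it once, for example by converting searchChars to an array of code points before the outer loop.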






[jira] Commented: (LANG-607) StringUtils methods do not handle Unicode 2.0+ supplementary characters correctly.

2011-01-10 Thread Gary Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/LANG-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979905#action_12979905
 ] 

Gary Gregory commented on LANG-607:
---

What is there is good to go, but I did not cover all of our APIs.
