[jira] [Comment Edited] (LANG-1406) StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase

HiuFung Kwok (JIRA) Wed, 08 Aug 2018 05:05:20 -0700


    [ 
https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573108#comment-16573108
 ]


HiuFung Kwok edited comment on LANG-1406 at 8/8/18 12:04 PM:
-------------------------------------------------------------

Hi all,

After a bit of research, this seems to be a known issue when a String object 
contains certain Unicode characters 
([ref|https://www.quora.com/Is-Javas-toLowercase-string-method-reliable-for-Unicode]
 ): String.toLowerCase() can return a result whose length differs from the 
original string.

In this case "\u0130x".toLowerCase() becomes a String object with three chars, 
[ i,  ̇, x] (the second one is U+0307, combining dot above), while the original 
string has only two, [ İ, x].

So, given this longer result from the .toLowerCase() method, 
StringUtils.replaceIgnoreCase ends up trying to access a position that does 
not exist in the original string: index 3 in this case, while str.length() is 2.
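
To make the mismatch concrete, here is a minimal sketch that prints the UTF-16 code units before and after lower-casing. The class name LowerCaseLengthDemo and the helper codeUnits are made up for illustration, and Locale.ROOT is my assumption to keep the output independent of the default locale (a Turkish locale maps U+0130 to a plain "i" instead). The index arithmetic in the comments is a simplified reading of the failure, not the actual library code.

```java
import java.util.Locale;

final class LowerCaseLengthDemo {

    // Hypothetical helper: renders the UTF-16 code units of s as hex.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        String lower = text.toLowerCase(Locale.ROOT);

        System.out.println(text.length() + ": " + codeUnits(text));   // 2: 0130 0078
        System.out.println(lower.length() + ": " + codeUnits(lower)); // 3: 0069 0307 0078

        // "x" sits at index 2 in the lowered copy but at index 1 in the original,
        // so an offset computed on the lowered copy (2 + "x".length() = 3)
        // points past the end of the original string (length 2).
        System.out.println(lower.indexOf("x")); // 2
        System.out.println(text.indexOf("x"));  // 1
    }
}
```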

The fix I came up with is to replace .toLowerCase() with .toUpperCase(), in 
order to avoid the misinterpretation caused by .toLowerCase() when performing 
case-insensitive comparisons.
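
A rough sketch of the idea, not the code from the commit: the class UpperCaseSearchSketch and the helper indexOfIgnoreCase are hypothetical names, and Locale.ROOT is again my assumption. It shows that upper-casing keeps "\u0130x" at two chars, so an index found in the upper-cased copy is still valid in the original. The last line is a reminder that upper-casing is not length-preserving for every character either.

```java
import java.util.Locale;

final class UpperCaseSearchSketch {

    // Hypothetical helper: case-insensitive indexOf done by upper-casing both sides.
    // The returned index is only valid for the original text when upper-casing
    // did not change the length of the text before the match.
    static int indexOfIgnoreCase(String text, String search) {
        String upperText = text.toUpperCase(Locale.ROOT);
        String upperSearch = search.toUpperCase(Locale.ROOT);
        return upperText.indexOf(upperSearch);
    }

    public static void main(String[] args) {
        String text = "\u0130x";

        // U+0130 is already an upper-case letter, so the length stays 2.
        System.out.println(text.toUpperCase(Locale.ROOT).length()); // 2
        System.out.println(indexOfIgnoreCase(text, "x"));           // 1

        // Other characters do grow when upper-cased, e.g. U+00DF becomes "SS".
        System.out.println("\u00DF".toUpperCase(Locale.ROOT).length()); // 2
    }
}
```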

Fix: 
[https://github.com/HiuKwok/commons-lang/commit/e0f6c7802b5e721019a602bf30b31c79dbf6d233]

Test case: 
https://github.com/HiuKwok/commons-lang/commit/590f90889bf61a5570bd98b78e73410a07d7410b

 

 


> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
>                 Key: LANG-1406
>                 URL: https://issues.apache.org/jira/browse/LANG-1406
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>            Reporter: Michael Ryan
>            Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() == 
> text.toLowerCase().length(), which is not true for certain characters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
