[ https://issues.apache.org/jira/browse/LANG-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573108#comment-16573108 ]
HiuFung Kwok edited comment on LANG-1406 at 8/8/18 12:04 PM:
-------------------------------------------------------------

Hi all,

After a bit of research, this seems to be a known issue when a String contains certain Unicode characters ([ref|https://www.quora.com/Is-Javas-toLowercase-string-method-reliable-for-Unicode]): String.toLowerCase() can return a result whose length differs from the input. In this case "\u0130x" becomes a String of three chars, [i, ̇, x], instead of the two chars [İ, x].

Because StringUtils.replaceIgnoreCase relies on offsets taken from that longer lowercased copy, it ends up trying to access a position that does not exist in the original string: index 3 in this case, while str.length() is 2.

The fix I came up with replaces .toLowerCase() with .toUpperCase(), to avoid the length change introduced by .toLowerCase() when performing case-insensitive comparisons.

Fix: [https://github.com/HiuKwok/commons-lang/commit/e0f6c7802b5e721019a602bf30b31c79dbf6d233]

Test case: https://github.com/HiuKwok/commons-lang/commit/590f90889bf61a5570bd98b78e73410a07d7410b

> StringIndexOutOfBoundsException in StringUtils.replaceIgnoreCase
> ----------------------------------------------------------------
>
>                 Key: LANG-1406
>                 URL: https://issues.apache.org/jira/browse/LANG-1406
>             Project: Commons Lang
>          Issue Type: Bug
>          Components: lang.*
>            Reporter: Michael Ryan
>            Priority: Major
>
> STEPS TO REPRODUCE:
> {code}
> StringUtils.replaceIgnoreCase("\u0130x", "x", "")
> {code}
> EXPECTED: "\u0130" is returned.
> ACTUAL: StringIndexOutOfBoundsException
> This happens because the replace method is assuming that text.length() ==
> text.toLowerCase().length(), which is not true for certain characters.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
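
For anyone wanting to check the length mismatch described in this issue, here is a minimal sketch using only the JDK. The class name LowerCaseLengthDemo and both helper methods are hypothetical names made up for illustration; they are not part of Commons Lang and are not the code in the linked commits. Locale.ROOT is passed explicitly as an assumption, because the lowercase mapping of "\u0130" is locale-dependent (a Turkish locale maps it to a single "i"). The second helper shows one possible way to search case-insensitively with String.regionMatches, so that every offset refers to the original string.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Commons Lang.
class LowerCaseLengthDemo {

    // True when lowercasing changes the number of chars in the text,
    // i.e. when offsets found in the lowercased copy may not fit the original.
    static boolean lowerCaseChangesLength(String text) {
        return text.toLowerCase(Locale.ROOT).length() != text.length();
    }

    // Case-insensitive search that never builds a converted copy:
    // regionMatches(true, ...) compares char by char, so the returned
    // index is always a valid offset into the original text.
    static int indexOfIgnoreCase(String text, String search, int from) {
        int last = text.length() - search.length();
        for (int i = Math.max(from, 0); i <= last; i++) {
            if (text.regionMatches(true, i, search, 0, search.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "\u0130x";
        System.out.println(text.length());                           // 2
        System.out.println(text.toLowerCase(Locale.ROOT).length());  // 3
        System.out.println(text.toUpperCase(Locale.ROOT).length());  // 2
        System.out.println(indexOfIgnoreCase(text, "X", 0));         // 1
    }
}
```

The output is consistent with the report: the lowercased copy has three chars while the original has two, so an offset of 3 is out of range for the original. Uppercasing keeps the length for this particular input, but that is not guaranteed for every string (for example, "\u00DF" uppercases to "SS"), which is why the sketch compares regions of the original text instead.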