[ https://issues.apache.org/jira/browse/LANG-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067496#comment-13067496 ]
Henri Yandell commented on LANG-720: ------------------------------------ Note that we'll release this in 3.0.1. 3.0 will go out with this as a known issue and 3.0.1 will follow (August). > StringEscapeUtils.escapeXml(input) outputs wrong results when an input > contains characters in Supplementary Planes. > ------------------------------------------------------------------------------------------------------------------- > > Key: LANG-720 > URL: https://issues.apache.org/jira/browse/LANG-720 > Project: Commons Lang > Issue Type: Bug > Components: lang.*, lang.text.translate.* > Affects Versions: 3.0 > Reporter: Taro Yabuki > Labels: patch > Fix For: 3.0.1 > > Attachments: CharSequenceTranslator.java.20110714.diff > > > Hello. > I use StringEscapeUtils.escapeXml(input) to escape special characters for XML. > This method outputs wrong results when input contains characters in > Supplementary Planes. > String str1 = "\uD842\uDFB7" + "A"; > String str2 = StringEscapeUtils.escapeXml(str1); > // The value of str2 must be equal to the one of str1, > // because str1 does not contain characters to be escaped. > // However, str2 is diffrent from str1. > System.out.println(URLEncoder.encode(str1, "UTF-16BE")); //%D8%42%DF%B7A > System.out.println(URLEncoder.encode(str2, "UTF-16BE")); //%D8%42%DF%B7%FF%FD > The cause of this problem is that the loop to translate input character by > character is wrong. > In CharSequenceTranslator.translate(CharSequence input, Writer out), > loop counter "i" moves from 0 to Character.codePointCount(input, 0, > input.length()), > but it should move from 0 to input.length(). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira