[ https://issues.apache.org/jira/browse/LANG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Henri Yandell closed LANG-480.
------------------------------

    Resolution: Fixed

svn ci -m "Applying Alexander Kjall's patch from LANG-480; along with a unit test made from his example. Fixes unicode conversion above U+00FFFF being done into 2 characters"
Sending        src/java/org/apache/commons/lang/Entities.java
Sending        src/test/org/apache/commons/lang/StringEscapeUtilsTest.java
Transmitting file data ..
Committed revision 749095.

> StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LANG-480
>                 URL: https://issues.apache.org/jira/browse/LANG-480
>             Project: Commons Lang
>          Issue Type: Bug
>    Affects Versions: 2.4
>         Environment: doesn't matter
>            Reporter: Alexander Kjäll
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: lang-480.patch
>
>
> Characters that Java represents internally as two chars (a surrogate pair)
> are incorrectly converted by the function: each surrogate is escaped as a
> separate numeric entity. The following test displays the problem quite nicely:
>
> import org.apache.commons.lang.*;
>
> public class J2 {
>     public static void main(String[] args) throws Exception {
>         // this is the utf8 representation of the character:
>         // COUNTING ROD UNIT DIGIT THREE
>         // in unicode
>         // codepoint: U+1D362
>         byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };
>         // output is: &#55348;&#57186;
>         // should be: &#119650;
>         System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
>     }
> }
>
> Should be very quick to fix, feel free to drop me an email if you want a
> patch.
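
For illustration, the behaviour the report asks for can be sketched by walking the string by code point instead of by char. This is only a minimal sketch and is not the patch committed in revision 749095. The class name CodePointEscaper and the method escapeNonAscii are hypothetical, and the sketch assumes that every code point above 0x7F becomes a decimal numeric entity; it does not handle named entities or the markup characters <, > and &.

```java
public final class CodePointEscaper {
    private CodePointEscaper() {}

    /**
     * Hypothetical helper: escapes every code point above 0x7F as a decimal
     * numeric character reference. A surrogate pair is read as one code point,
     * so it yields one entity rather than two.
     */
    public static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);      // combines a valid surrogate pair
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);       // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D362 COUNTING ROD UNIT DIGIT THREE, the character from the report
        String rod = new String(Character.toChars(0x1D362));
        System.out.println("'" + escapeNonAscii(rod) + "'"); // prints '&#119650;'
    }
}
```

Iterating with charAt alone would see the two surrogates 0xD834 and 0xDF62 separately, which is how the two-entity output in the report arises.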