[ https://issues.apache.org/jira/browse/LANG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666149#action_12666149 ]
James Carman commented on LANG-480:
-----------------------------------

Of course it is. :) My point was that we would be engaging in reflection nastiness, and it might not be worth it. If Alexander needs a release sooner, I would suggest doing an "internal" release from trunk with the changes applied and then "upgrading" when we get a newer release out. I don't like the idea of building in the reflection stuff: we get no compiler checking that way, and it leads to unreadable code.

> StringEscapeUtils.escapeHtml incorrectly converts unicode characters above U+00FFFF into 2 characters
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LANG-480
>                 URL: https://issues.apache.org/jira/browse/LANG-480
>             Project: Commons Lang
>          Issue Type: Bug
>   Affects Versions: 2.4
>         Environment: doesn't matter
>            Reporter: Alexander Kjäll
>            Priority: Minor
>         Attachments: lang-480.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Characters that Java represents internally as two chars are incorrectly converted by the function. The following test demonstrates the problem:
>
> import org.apache.commons.lang.*;
>
> public class J2 {
>     public static void main(String[] args) throws Exception {
>         // UTF-8 encoding of the character COUNTING ROD UNIT DIGIT THREE,
>         // Unicode code point U+1D362
>         byte[] data = new byte[] { (byte)0xF0, (byte)0x9D, (byte)0x8D, (byte)0xA2 };
>         // output is: &#55348;&#57186;
>         // should be: &#119650;
>         System.out.println("'" + StringEscapeUtils.escapeHtml(new String(data, "UTF8")) + "'");
>     }
> }
>
> Should be very quick to fix; feel free to drop me an email if you want a patch.
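The bug comes from escaping each UTF-16 `char` separately: U+1D362 is stored as the surrogate pair 0xD834 0xDF62, so two references (`&#55348;&#57186;`) come out instead of one (`&#119650;`). A minimal sketch of the code-point-aware approach follows, assuming Java 5 or later, where `String.codePointAt` and `Character.charCount` exist. The class and method names are hypothetical; this is not the attached lang-480.patch or the Commons Lang API, and it deliberately skips the named entities (`&amp;`, `&eacute;`, ...) that the real `escapeHtml` also produces.

```java
// Hypothetical sketch: NOT the Commons Lang implementation or lang-480.patch.
// Requires Java 5+ for String.codePointAt / Character.charCount.
public class SupplementaryEscapeSketch {

    // Escapes every character above U+007F as a decimal numeric character
    // reference, treating a UTF-16 surrogate pair as ONE code point.
    // Named entities (&amp;, &lt;, &eacute;, ...) are intentionally omitted.
    public static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Returns the full code point when a valid surrogate pair starts
            // at index i, otherwise the value of the single char at i.
            int cp = input.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            // Advance by 2 chars for a surrogate pair, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Same bytes as the reporter's test: UTF-8 for U+1D362.
        byte[] data = { (byte) 0xF0, (byte) 0x9D, (byte) 0x8D, (byte) 0xA2 };
        String s = new String(data, "UTF-8");
        System.out.println(s.length());          // 2 (one surrogate pair)
        System.out.println(escapeNonAscii(s));   // &#119650;
    }
}
```

Because `codePointAt` and `charCount` only appeared in Java 5, a fix along these lines in a release that still targets older JDKs would either need hand-written surrogate arithmetic or the reflection approach discussed above.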