Characters from the Unicode Supplemental Multilingual Plane included in story definitions get rendered incorrectly in HTML
--------------------------------------------------------------------------------
Key: JBEHAVE-374
URL: http://jira.codehaus.org/browse/JBEHAVE-374
Project: JBehave
Issue Type: Bug
Affects Versions: 3.0.3
Environment: Windows 7, 64-bit
Reporter: Alistair Dutton
Priority: Minor

If a story file includes characters from the Unicode Supplementary Multilingual Plane (code points U+10000 upwards), the HTML report generated from the test run does not HTML-escape those characters correctly.

For example, given a story file with the following scenario:

------------
Scenario: Some scenario

Given some situation
When I do something
Then the result is 𐐆
------------

(The "dagger"-type character is actually code point U+10406 - see http://en.wikibooks.org/wiki/Unicode/Character_reference/10000-10FFF)

The resulting HTML report will have the "dagger" character escaped as &#55297;&#56326; - these are the two surrogate code units of the character (used in UTF-16 only), so the result is rendered as gibberish in HTML. The escape should be &#66566;

NOTE: This is NOT a bug in JBehave per se - the bug is in the StringEscapeUtils class of commons-lang. A related bug has already been raised (and fixed) in commons-lang: https://issues.apache.org/jira/browse/LANG-617. Although the commons-lang bug report relates to XML escaping rather than HTML escaping, it seems likely that the fix will cover both. Unfortunately, the fix is in commons-lang 3.0...
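To illustrate the difference described above, here is a minimal Java sketch that contrasts escaping per UTF-16 char with escaping per Unicode code point. The class name CodePointHtmlEscaper and both method names are hypothetical and exist only for this illustration; this is not the commons-lang implementation, nor the fix applied in LANG-617. It uses only standard JDK calls (String.codePointAt, Character.charCount).

```java
public final class CodePointHtmlEscaper {

    private CodePointHtmlEscaper() {}

    /** Escapes by code point, so a surrogate pair becomes ONE numeric reference. */
    public static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);  // joins a valid surrogate pair into one code point
            i += Character.charCount(cp);   // advance by 1 or 2 chars accordingly
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);               // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';'); // decimal numeric reference
                    }
            }
        }
        return out.toString();
    }

    /** Escapes by UTF-16 char; reproduces the faulty behaviour from the report. */
    public static String escapePerChar(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';'); // each surrogate half escaped separately
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String step = "Then the result is \uD801\uDC06"; // U+10406 written as a surrogate pair
        System.out.println(escape(step));        // Then the result is &#66566;
        System.out.println(escapePerChar(step)); // Then the result is &#55297;&#56326;
    }
}
```

The per-char variant is included only to show where the two surrogate values come from; it skips the markup characters, so it is not a usable escaper on its own.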