Eugene Shkel created XALANJ-2593:
------------------------------------

             Summary: Incorrect showing of supplementary characters in attributes
                 Key: XALANJ-2593
                 URL: https://issues.apache.org/jira/browse/XALANJ-2593
             Project: XalanJ2
          Issue Type: Bug
      Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects.  Anybody can view the issue.)
          Components: Serialization
    Affects Versions: 2.7.2
         Environment: Win 7 x64, Java 1.6 
            Reporter: Eugene Shkel
            Assignee: Steven J. Hathaway


In Xalan 2.7.2, supplementary characters (see 
http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html for 
details) are serialized incorrectly in attribute values.
For example, I need to output the characters 𣎴 (&#144308;) and 𠘨 (&#132648;) in 
attribute "y" of element "x".
Expected result: {code}<?xml version="1.0" encoding="UTF-8"?><x y="&#144308; - &#132648;"/>{code}
Actual result with Xalan 2.7.2: {code}<?xml version="1.0" encoding="UTF-8"?><x y="&#55372;&#57268; - &#55361;&#56872;"/>{code}
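Why four references instead of two: Java strings are UTF-16, so each supplementary character is stored as a high/low surrogate pair, i.e. two char values. The following standalone sketch (the class and method names are illustrative, not part of Xalan) uses Character.toChars to print those code units for the two characters in this report; they are exactly the four numbers in the actual result.

{code:java}
import java.util.Arrays;

// Illustrative class, not part of Xalan
public class SurrogateDemo {
    // Returns the UTF-16 code units (Java chars) that encode a code point, widened to int.
    static int[] utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        int[] result = new int[units.length];
        for (int i = 0; i < units.length; i++) {
            result[i] = units[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // The two characters from this report: U+233B4 and U+20628
        for (int cp : new int[] {144308, 132648}) {
            System.out.println("&#" + cp + "; -> " + Arrays.toString(utf16Units(cp)));
        }
    }
}
{code}

Running it prints "&#144308; -> [55372, 57268]" and "&#132648; -> [55361, 56872]".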

Code snippet to reproduce the problem:
{code}
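import java.io.StringReader;
import javax.xml.transform.Result;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XalanSupplementaryAttrTest {
    public static void main(String[] argv) throws Exception {
        // Run with Xalan 2.7.2 on the classpath; otherwise the JDK's built-in XSLT may be used
        TransformerFactory tFactory = TransformerFactory.newInstance();
        // Stylesheet copies /xslt/search/value1 into attribute "y" of element "x"
        StreamSource stylesource = new StreamSource(new StringReader(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
                + "<xsl:template match=\"/\"><x y=\"{xslt/search/value1}\"/></xsl:template>"
                + "</xsl:stylesheet>"));
        Transformer transformer = tFactory.newTransformer(stylesource);
        // Input holds two supplementary characters: U+233B4 and U+20628
        StreamSource source = new StreamSource(new StringReader(
                "<?xml version=\"1.0\"?><xslt><search><value1>𣎴 - 𠘨</value1></search></xslt>"));
        Result result = new StreamResult(System.out);
        transformer.transform(source, result);
    }
}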
{code}
The problem is in the method 
org.apache.xml.serializer.ToStream.writeAttrString(Writer, String, String), in this branch:
{code}
            if (m_charInfo.shouldMapAttrChar(ch)) {
                // The character is supposed to be replaced by a String
                // e.g.   '&'  -->  "&amp;"
                // e.g.   '<'  -->  "&lt;"
                accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
            }
{code}
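This branch does not handle multi-char sequences such as supplementary 
characters, which the Java platform stores as a pair of UTF-16 surrogate chars. 
Each surrogate is therefore processed on its own and ends up in the following 
fallback branch of the same method: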
{code}
            else {
                // This is a fallback plan, we should never get here
                // but if the character wasn't previously handled
                // (i.e. isn't in the encoding, etc.) then what
                // should we do?  We choose to write out a character ref
                writer.write("&#");
                writer.write(Integer.toString(ch));
                writer.write(';');
            }
{code}
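The effect of that fallback can be reproduced in isolation. The following sketch is not Xalan code: FallbackDemo and its two methods are hypothetical, and a StringBuilder stands in for the Writer. perCharRefs writes one reference per Java char, as the quoted branch does, while perCodePointRefs walks the string by code point with String.codePointAt and Character.charCount, so a surrogate pair yields a single reference.

{code:java}
// Hypothetical demo class, not part of Xalan
public class FallbackDemo {
    // Mimics the fallback branch quoted above: one "&#N;" per Java char, so a
    // supplementary character becomes two references, one per surrogate.
    static String perCharRefs(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            out.append("&#").append(Integer.toString(ch)).append(';');
        }
        return out.toString();
    }

    // Code-point-aware variant: codePointAt combines a valid surrogate pair into one
    // code point, and charCount (2 for supplementary characters) advances past it.
    static String perCodePointRefs(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.append("&#").append(cp).append(';');
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\uD84C\uDFB4"; // U+233B4 as a surrogate pair
        System.out.println(perCharRefs(value));      // &#55372;&#57268;
        System.out.println(perCodePointRefs(value)); // &#144308;
    }
}
{code}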
PS: I can't attach a patch file, so here is the patch inline:
{code}
--- src\org\apache\xml\serializer\ToStream.java 2014-03-26 17:21:30 +0200
+++ src\org\apache\xml\serializer\ToStream.java 2014-09-09 19:09:30 +0300
@@ -2112,8 +2112,13 @@
                 // e.g.   '&'  -->  "&amp;"
                 // e.g.   '<'  -->  "&lt;"
                 accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
-            }
-            else {
+            } else if (Encodings.isHighUTF16Surrogate(ch)) {
+                // more than single input character can be processed
+                // within accumDefaultEscape()
+                // so we set appropriate value for loop for().
+                i = accumDefaultEscape(writer, ch, i, stringChars, len, false, true);
+
+            } else {
                 if (0x0 <= ch && ch <= 0x1F) {
                     // Range 0x00 through 0x1F inclusive
                     // This covers the non-whitespace control characters
{code}
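The essential idea of the patch is that a helper which consumes more than one input char must report how far it advanced, and the caller's loop must pick up that index. The following self-contained sketch shows the pattern under stated assumptions: PatchSketch, writeSurrogatePair and escapeAttr are hypothetical names, Character.isHighSurrogate stands in for Encodings.isHighUTF16Surrogate, and the helper's return convention (index of the low surrogate) is chosen for this sketch and is not necessarily the contract of Xalan's accumDefaultEscape.

{code:java}
// Hypothetical sketch of the patched loop shape, not Xalan code
public class PatchSketch {
    // Hypothetical helper (not Xalan's accumDefaultEscape): writes one reference for
    // the surrogate pair starting at chars[i] and returns the index of the low
    // surrogate, so the caller's for-loop increment moves past the whole pair.
    static int writeSurrogatePair(StringBuilder out, char[] chars, int i, int len) {
        char high = chars[i];
        if (i + 1 >= len || !Character.isLowSurrogate(chars[i + 1])) {
            throw new IllegalArgumentException("Unpaired high surrogate at index " + i);
        }
        int codePoint = Character.toCodePoint(high, chars[i + 1]);
        out.append("&#").append(codePoint).append(';');
        return i + 1;
    }

    // Loop shaped like the patched writeAttrString branch; all other characters are
    // appended unchanged here to keep the sketch short.
    static String escapeAttr(String value) {
        char[] chars = value.toCharArray();
        int len = chars.length;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < len; i++) {
            char ch = chars[i];
            if (Character.isHighSurrogate(ch)) {
                i = writeSurrogatePair(out, chars, i, len);
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttr("\uD84C\uDFB4 - \uD841\uDE28"));
    }
}
{code}

Returning the index of the low surrogate rather than the index after it keeps the for loop's own i++ from skipping the character that follows the pair. For the input from this report, escapeAttr returns "&#144308; - &#132648;", matching the expected result.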



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
