I'm getting some very strange results from E4X and normalize() when working with CDATA text nodes, especially when those text nodes may contain strings that, unescaped, represent CDATA end tags.
Consider the following code:

------------------------------------------------------------
var x:XML = <test />;
x.appendChild('<![CDATA[ test1 ]]]]>');
x.appendChild('<![CDATA[> test2 ]]>');

trace ('--- before normalize (string value) ---');
trace (x.toString());
trace ('--- before normalize (full xml) ---');
trace (x.toXMLString());
trace ("\n");

x.normalize();

trace ('--- after normalize (string value) ---');
trace (x.toString());
trace ('--- after normalize (full xml) ---');
trace (x.toXMLString());
trace ("\n");

var xAsString:String = x.toXMLString();
x = XML(xAsString);

trace ('--- after reparse (string value) ---');
trace (x.toString());
trace ('--- after reparse (full xml) ---');
trace (x.toXMLString());
trace ("\n");
------------------------------------------------------------

Here's the output when using Flash Player 10.0.12.36 Debug on Linux:

--- before normalize (string value) ---
test1 ]]> test2
--- before normalize (full xml) ---
<test>
  <![CDATA[ test1 ]]]]>
  <![CDATA[> test2 ]]>
</test>

--- after normalize (string value) ---
test1 ]]> test2
--- after normalize (full xml) ---
<test><![CDATA[ test1 ]]> test2 ]]></test>

--- after reparse (string value) ---
test1 test2 ]]>
--- after reparse (full xml) ---
<test>
  <![CDATA[ test1 ]]>
  test2 ]]>
</test>

Note how the call to .normalize() causes the text of <test> to be concatenated into one incorrectly formatted CDATA node, containing an unescaped "]]>" end-of-CDATA marker. The resulting XML is invalid and will not parse with other XML parsers, such as libxml2's xmllint:

badxml.xml:1: parser error : Sequence ']]>' not allowed in content
<test><![CDATA[ test1 ]]> test2 ]]></test>

Using Flash's E4X to re-parse this XML does not throw an error, but the resulting XML does not represent the original XML in any way.
It appears that the XML parser switches out of "CDATA-mode" when reaching the first end-of-CDATA marker (between 'test1' and 'test2'), and then enters some sort of "lenient parser mode" where it "helpfully" converts the bare '>' after test2 into '&gt;'. Of course, the resulting string value for <test>'s text node is very different from its original contents (compare 'after normalize (string value)' to 'after reparse (string value)').

On the other hand, not calling .normalize() causes the resulting XML to contain a newline "\n" character between the two original CDATA text nodes, which, when parsed by other XML readers, usually results in "]]\n>", or worse "]]\n >".

Does anyone have any experience with how to properly embed strings containing "xml-ish" content with E4X?
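
For reference, the round trip expected from a conforming parser can be checked outside Flash. What follows is only a minimal sketch, written in Python because its standard library ships an expat-based parser; it is not E4X and says nothing about how Flash Player works internally. The helper names cdata_wrap, text_of and is_well_formed are hypothetical. It assumes the usual convention of splitting every "]]>" across two adjacent CDATA sections with nothing in between, which is the same split used in the appendChild() calls above.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def cdata_wrap(text):
    """Wrap text in CDATA, splitting each ']]>' across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def text_of(xml_source):
    """Parse xml_source and return the character data of the root element."""
    root = minidom.parseString(xml_source).documentElement
    # Text and CDATA section nodes both expose their content as .data
    return "".join(node.data for node in root.childNodes)


def is_well_formed(xml_source):
    """Return True if the stdlib parser accepts xml_source."""
    try:
        minidom.parseString(xml_source)
        return True
    except ExpatError:
        return False


original = " test1 ]]> test2 "

# Adjacent sections, nothing between them: the string survives the round trip.
adjacent = "<test>" + cdata_wrap(original) + "</test>"
print(adjacent)
print(repr(text_of(adjacent)))  # ' test1 ]]> test2 '

# A newline between the two sections (the un-normalized, pretty-printed shape)
# becomes part of the text and lands inside the ']]>' sequence.
pretty = "<test><![CDATA[ test1 ]]]]>\n<![CDATA[> test2 ]]></test>"
print(repr(text_of(pretty)))

# The shape produced after normalize() has a bare ']]>' in element content.
merged = "<test><![CDATA[ test1 ]]> test2 ]]></test>"
print(is_well_formed(merged))
```

Under those assumptions, only the first form gives back the original string, so whatever E4X serializes would need to keep the two CDATA sections adjacent, with no whitespace inserted and no merging into a single section.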