I don't think you can represent codepoint 24 in well-formed XML, with or without a CDATA.
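A minimal sketch of the same check in Python, using only the standard library. is_xml_char is a hypothetical helper that transcribes production [2] Char from the XML recommendation cited below. is_well_formed (also hypothetical) asks Python's bundled expat parser, not MarkLogic, whether a string is well-formed. Neither reflects how MarkLogic Server itself validates input.

```python
import xml.etree.ElementTree as ET


def is_xml_char(cp: int) -> bool:
    """Hypothetical helper: True if code point cp matches XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)            # TAB, LF, CR are the only allowed C0 controls
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD        # excludes surrogates, FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if Python's expat-based parser accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Same test document as below, with codepoint 24 (0x18) inside the CDATA section.
doc = "<doc><![CDATA[This is a test \x18 if not escaped, this should not work]]></doc>"

print(is_xml_char(24))                           # False: below 0x20 and not TAB, LF or CR
print(is_well_formed(doc))                       # False: the CDATA wrapper does not help
print(is_well_formed(doc.replace("\x18", "")))   # True once the control character is gone
print(is_well_formed("<doc>&#24;</doc>"))        # False: a character reference is rejected too
```

The last check reflects the "Legal Character" well-formedness constraint in XML 1.0: characters referred to by character references must also match Char, so escaping codepoint 24 does not make it legal either.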
http://www.w3.org/TR/xquery/#doc-xquery-CDataSectionContents defines the CDATA section as containing Char, and refers to NT-Char from XML.

  [108] CDataSectionContents ::= (Char* - (Char* ']]>' Char*))
  [157] Char ::= [http://www.w3.org/TR/REC-xml#NT-Char]

http://www.w3.org/TR/REC-xml/#NT-Char

  [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
      /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

Codepoint 24 (0x18) is less than 0x20, and is not 0x9, 0xA, or 0xD. So as I read the spec, codepoint 24 (0x18) isn't allowed in CDATA. If that's correct, then it may be a bug that the server allowed your test case to run at all.

To check this with another tool, I placed a test XML doc in a file and tried xmllint:

  $ xmllint /tmp/cp24
  /tmp/cp24:1: parser error : CData section not finished
  This is a tes
  <doc><![CDATA[This is a test ▒ if not escaped, this should not work]]></doc>
                               ^
  /tmp/cp24:1: parser error : PCDATA invalid Char value 24
  ...

So xmllint seems to agree with my interpretation of the W3C recommendations.

For more discussion of CDATA and MarkLogic Server, you might find http://marklogic.markmail.org/search/?q=cdata interesting.

-- Mike

On 2010-07-15 10:32, cashatzer-markm...@yahoo.com wrote:
> We are having a problem where Marklogic appears to be removing the CDATA
> sections that we have wrapped our text elements with. This is causing
> Marklogic replication to fail for documents with content that needs to be
> escaped; the document is saved successfully in the "source" database, but
> when it goes to replicate a document containing content that must be escaped
> to a "target" database, it fails.
>
> Below is an example of what I am referring to. NOTE that there is SUPPOSED
> to be a non-printable character between "test" and "if" in the text below.
> This non-printable character is being converted to by Marklogic.
> When Marklogic replication tries to send that text over to the target, it fails
> with a similar error (XDMP-DOCCHARREF) to the error shown below. I can send
> a text file containing this text with the non-printable character if needed
> to duplicate this problem.
>
> Why is MarkLogic stripping the CDATA sections? It should not do this.
> Applications should not have to reprocess their documents to put CDATA
> sections around text that was already wrapped previously.
>
>
> EXAMPLE:
>
> cqsh> xdmp:document-insert("testcdata.xml",
> -> <doc><![CDATA[This is a test if not escaped, this should not
> work]]></doc>);
> Done (0.04 sec)
>
> cqsh> for $i in //doc return $i;
> <doc>This is a test if not escaped, this should not work</doc>
> Done (0.01 sec)
> cqsh> xdmp:document-insert("testcdata.xml",
> -> <doc>This is a test if not escaped, this should not work</doc>);
> ---------------------------
> XQuery Error
> ---------------------------
> Message: XDMP-CHARREF: (err:XPST0003) Invalid character reference "24 if not
> escaped, this should not work"
> --STACK DUMP--
> line number: 1
> context item: null
> context position: 0
> uri: /eval
> variable bindings:

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general