[ https://issues.apache.org/jira/browse/JCR-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727405#comment-17727405 ]
Julian Reschke commented on JCR-4935:
-------------------------------------

I agree that there is a problem here, but the JCR spec in fact defines the handling here (fortunately).

See <[https://developer.adobe.com/experience-manager/reference-materials/spec/jcr/2.0/7_Export.html]>:

{quote}
 * If, after conversion to string and entity escaping is performed, the string form of a value still contains characters which cannot appear in an XML document (neither as literals nor as character references{^}[13|https://developer.adobe.com/experience-manager/reference-materials/spec/jcr/2.0/7_Export.html#sdfootnote13sym]{^}) then:
 ## The string form is further encoded using Base64 encoding.
 ## The attribute xsi:type=“xsd:base64Binary” is added to the <sv:value> element.
 ## The namespace mappings for xsi and xsd are added to the exported XML document so that the xsi:type attribute is within their scope. The namespace declarations required are xmlns:xsd=“http://www.w3.org/2001/XMLSchema” and xmlns:xsi=“http://www.w3.org/2001/XMLSchema-instance”. Note that the prefixes representing these two namespaces need not be _literally_ “xsd” and “xsi”. Any two prefixes are permitted as long as the corresponding namespace declarations are changed accordingly.
{quote}

> session.exportDocumentView() generates unparsable XML if a JCR Property
> contains invalid XML character
> ------------------------------------------------------------------------------------------------------
>
>                 Key: JCR-4935
>                 URL: https://issues.apache.org/jira/browse/JCR-4935
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-jcr-commons
>    Affects Versions: 2.21.17
>            Reporter: Yegor Kozlov
>            Assignee: Julian Reschke
>            Priority: Major
>         Attachments: image-2023-05-29-14-58-05-591.png
>
>
> I came across this issue in AEM, where user content can contain all kinds of
> special characters.
> In my case it was a 0x3 character (^C) in a node property
> which was written in the JCR XML as-is, and it resulted in an unparsable
> output.
> !image-2023-05-29-14-58-05-591.png|width=968,height=305!
> IMO control characters, non-characters and out-of-unicode-range characters
> should be skipped when writing XML. These can come from user data and can act
> as a "poison pill" breaking the export/import functionality.
>
> The PR is coming.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
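
For illustration, the Base64 fallback quoted from the spec above could be sketched roughly as follows. This is not Jackrabbit's actual exporter code: the class `SvValueSketch` and its methods are hypothetical names, the output is a bare system-view `<sv:value>` element without the surrounding namespace declarations, and the use of UTF-8 to turn the string form into bytes before Base64 encoding is an assumption made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Jackrabbit: shows the spec's Base64 fallback.
final class SvValueSketch {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // True if every code point may appear in an XML 1.0 document.
    // Unpaired surrogates show up as code points in D800-DFFF and are rejected.
    static boolean isXmlSafe(String s) {
        return s.codePoints().allMatch(SvValueSketch::isXmlChar);
    }

    // Minimal escaping for element content; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Emits the value as text when it is XML-safe, otherwise Base64-encodes it
    // and marks the element with xsi:type="xsd:base64Binary".
    static String toSvValue(String value) {
        if (isXmlSafe(value)) {
            return "<sv:value>" + escape(value) + "</sv:value>";
        }
        // Assumption: the string form is converted to bytes as UTF-8.
        String b64 = Base64.getEncoder()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
        return "<sv:value xsi:type=\"xsd:base64Binary\">" + b64 + "</sv:value>";
    }

    public static void main(String[] args) {
        // A value containing the 0x3 control character from the report.
        System.out.println(toSvValue("a\u0003b"));
    }
}
```

With the 0x3 character from the report, `toSvValue("a\u0003b")` yields `<sv:value xsi:type="xsd:base64Binary">YQNi</sv:value>`, so the value survives a round trip instead of being dropped or breaking the parser.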