[ 
https://issues.apache.org/jira/browse/JCR-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17727405#comment-17727405
 ] 

Julian Reschke commented on JCR-4935:
-------------------------------------

I agree that there is a problem here, but fortunately the JCR spec already 
defines how this case is to be handled. See 
<https://developer.adobe.com/experience-manager/reference-materials/spec/jcr/2.0/7_Export.html>:

 
{quote} * If, after conversion to string and entity escaping is performed, the 
string form of a value still contains characters which cannot appear in an XML 
document (neither as literals nor as character 
references{^}[13|https://developer.adobe.com/experience-manager/reference-materials/spec/jcr/2.0/7_Export.html#sdfootnote13sym]{^})
 then:

 ## The string form is further encoded using Base64 encoding.

 ## The attribute xsi:type=“xsd:base64Binary” is added to the <sv:value> 
element.

 ## The namespace mappings for xsi and xsd are added to the exported XML 
document so that the xsi:type attribute is within their scope. The namespace 
declarations required are xmlns:xsd=“http://www.w3.org/2001/XMLSchema” and 
xmlns:xsi=“http://www.w3.org/2001/XMLSchema-instance”. Note that the prefixes 
representing these two namespaces need not be _literally_ “xsd” and “xsi”. Any 
two prefixes are permitted as long as the corresponding namespace declarations 
are changed accordingly.
{quote}
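
As a rough illustration of the rule quoted above (not Jackrabbit's actual code), the check and the encoding step could look like the sketch below. The class and method names are hypothetical. The validity test follows the XML 1.0 "Char" production, and UTF-8 as the byte encoding of the string form is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of Jackrabbit or the JCR API.
final class XmlValueEncoding {

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns true if the value contains at least one character XML 1.0 cannot represent. */
    static boolean needsBase64(String value) {
        // codePoints() reports an unpaired surrogate as its own value, which isXmlChar rejects.
        return !value.codePoints().allMatch(XmlValueEncoding::isXmlChar);
    }

    /** Base64-encodes the string form of the value (assumption: UTF-8 bytes). */
    static String toBase64(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

For the 0x3 character from the report, needsBase64("a\u0003b") is true and toBase64("a\u0003b") yields "YQNi". An exporter following the spec would write that text into the <sv:value> element together with the xsi:type attribute.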

> session.exportDocumentView() generates unparsable XML if a JCR Property 
> contains invalid XML character
> ------------------------------------------------------------------------------------------------------
>
>                 Key: JCR-4935
>                 URL: https://issues.apache.org/jira/browse/JCR-4935
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-jcr-commons
>    Affects Versions: 2.21.17
>            Reporter: Yegor Kozlov
>            Assignee: Julian Reschke
>            Priority: Major
>         Attachments: image-2023-05-29-14-58-05-591.png
>
>
> I came across this issue in AEM, where user content can contain all kinds of 
> special characters. In my case it was a 0x3 character (^C) in a node property 
> which was written to the JCR XML as-is, and it resulted in an unparsable 
> output. 
> !image-2023-05-29-14-58-05-591.png|width=968,height=305!
> IMO control characters, non-characters and out-of-Unicode-range characters 
> should be skipped when writing XML. These can come from user data and can act 
> as a "poison pill", breaking the export/import functionality. 
>  
> The PR is coming.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
