[ 
https://issues.apache.org/jira/browse/JCR-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17959774#comment-17959774
 ] 

Konrad Windszus commented on JCR-5153:
--------------------------------------

It seems that escaping surrogates first requires converting from Java's UTF-16 
to UTF-8, because https://www.rfc-editor.org/rfc/rfc3986#section-2.1 only 
defines percent-encoding per octet (two hex digits per octet), so the UTF-8 
octets of the character are needed. According to 
[https://www.rfc-editor.org/rfc/rfc3629], section 3:

{quote}
The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16
encoding form (as surrogate pairs) and do not directly represent
characters.  When encoding in UTF-8 from UTF-16 data, it is necessary
to first decode the UTF-16 data to obtain character numbers, which
are then encoded in UTF-8 as described above.
{quote}
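
A minimal sketch of what that conversion could look like in Java, assuming percent-encoding of the UTF-8 octets is the desired escape form. The class and method names are hypothetical and are not part of jackrabbit-jcr-commons; this only illustrates the decode-then-encode order that RFC 3629 describes, not how {{Text.escapeIllegalJcrChars(String)}} works today or should be fixed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jackrabbit: illustrates the
// UTF-16 -> code point -> UTF-8 -> %XX order only.
final class SurrogateEscapeSketch {

    /** Percent-encodes a single Unicode code point via its UTF-8 octets. */
    static String percentEncodeCodePoint(int codePoint) {
        // RFC 3629 prohibits encoding U+D800..U+DFFF in UTF-8, so an
        // unpaired surrogate is rejected here (an assumption of this sketch;
        // a real fix might choose a different policy).
        if (codePoint >= Character.MIN_SURROGATE && codePoint <= Character.MAX_SURROGATE) {
            throw new IllegalArgumentException(
                    "unpaired surrogate: U+" + Integer.toHexString(codePoint).toUpperCase());
        }
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(utf8.length * 3);
        for (byte b : utf8) {
            // RFC 3986 section 2.1: "%" followed by two hex digits per octet.
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Percent-encodes every code point of the input (for demonstration only). */
    static String percentEncodeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins a valid surrogate pair into one code point.
            int cp = s.codePointAt(i);
            sb.append(percentEncodeCodePoint(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is stored in Java as the surrogate pair D83D DE00.
        System.out.println(percentEncodeAll("\uD83D\uDE00")); // prints %F0%9F%98%80
    }
}
```

The point of iterating with {{codePointAt}} rather than {{charAt}} is that the two surrogate chars are never escaped individually; they are first combined into one code point, which then yields four UTF-8 octets.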

> Text.escapeIllegalJcrChars(String) does not escape all illegal JCR characters
> -----------------------------------------------------------------------------
>
>                 Key: JCR-5153
>                 URL: https://issues.apache.org/jira/browse/JCR-5153
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-jcr-commons
>    Affects Versions: 2.23.1
>            Reporter: Konrad Windszus
>            Priority: Major
>
> The grammar at 
> https://s.apache.org/jcr-2.0-spec/3_Repository_Model.html#3.2.2%20Local%20Names
>  defines which characters are valid within a local JCR name. However, the 
> method {{Text.escapeIllegalJcrChars(String)}} does not properly escape:
> # Unicode characters which are outside the char range defined by 
> https://www.w3.org/TR/xml/#NT-Char



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
