[
https://issues.apache.org/jira/browse/JCR-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17959774#comment-17959774
]
Konrad Windszus commented on JCR-5153:
--------------------------------------
It seems that escaping surrogates first requires a conversion from Java's
UTF-16 to UTF-8, because https://www.rfc-editor.org/rfc/rfc3986#section-2.1
only defines percent-encoding per octet (two hex digits each), which in
practice means the octets of UTF-8 encoded characters. According to
[https://www.rfc-editor.org/rfc/rfc3629], section 3:
{quote}
The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16
encoding form (as surrogate pairs) and do not directly represent
characters. When encoding in UTF-8 from UTF-16 data, it is necessary
to first decode the UTF-16 data to obtain character numbers, which
are then encoded in UTF-8 as described above.
{quote}
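A minimal sketch of that two-step conversion, assuming every code point should be percent-encoded. This is not the actual {{Text.escapeIllegalJcrChars(String)}} implementation; the class name {{SurrogateEscapeSketch}} and the method {{percentEncodeUtf8}} are hypothetical. A real fix would presumably escape only the characters that are illegal in a JCR local name.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateEscapeSketch {

    // Hypothetical helper: decodes the UTF-16 input into code points and
    // percent-encodes the UTF-8 octets of each one (two hex digits per octet).
    static String percentEncodeUtf8(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            // Step 1: decode UTF-16 (possibly a surrogate pair) to a code point.
            int codePoint = input.codePointAt(i);
            int charCount = Character.charCount(codePoint);
            if (charCount == 1 && Character.isSurrogate(input.charAt(i))) {
                // A lone surrogate has no valid UTF-8 form (RFC 3629, section 3).
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            }
            // Step 2: encode the code point as UTF-8 and escape each octet.
            byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                out.append('%').append(String.format("%02X", b & 0xFF));
            }
            i += charCount;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String grinningFace = "\uD83D\uDE00"; // U+1F600 as a UTF-16 surrogate pair
        System.out.println(percentEncodeUtf8(grinningFace)); // prints "%F0%9F%98%80"
    }
}
```

The surrogate pair becomes four escaped octets instead of two separately escaped 16-bit values. An unpaired surrogate is rejected in this sketch because it has no UTF-8 representation.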
> Text.escapeIllegalJcrChars(String) does not escape all illegal JCR characters
> -----------------------------------------------------------------------------
>
> Key: JCR-5153
> URL: https://issues.apache.org/jira/browse/JCR-5153
> Project: Jackrabbit Content Repository
> Issue Type: Bug
> Components: jackrabbit-jcr-commons
> Affects Versions: 2.23.1
> Reporter: Konrad Windszus
> Priority: Major
>
> The grammar at
> https://s.apache.org/jcr-2.0-spec/3_Repository_Model.html#3.2.2%20Local%20Names
> defines which characters are valid within a local JCR name. However, the
> method {{Text.escapeIllegalJcrChars(String)}} does not properly escape:
> # Unicode characters which are outside the char range defined by
> https://www.w3.org/TR/xml/#NT-Char