[ https://issues.apache.org/jira/browse/JCR-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17959774#comment-17959774 ]
Konrad Windszus commented on JCR-5153:
--------------------------------------

It seems that escaping surrogates first requires converting from Java's UTF-16 to UTF-8, because https://www.rfc-editor.org/rfc/rfc3986#section-2.1 only defines percent-encoding for octets (two hex digits per octet).

According to https://www.rfc-editor.org/rfc/rfc3629, section 3:
{quote}
The definition of UTF-8 prohibits encoding character numbers between U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding form (as surrogate pairs) and do not directly represent characters. When encoding in UTF-8 from UTF-16 data, it is necessary to first decode the UTF-16 data to obtain character numbers, which are then encoded in UTF-8 as described above.
{quote}

> Text.escapeIllegalJcrChars(String) does not escape all illegal JCR characters
> -----------------------------------------------------------------------------
>
>                 Key: JCR-5153
>                 URL: https://issues.apache.org/jira/browse/JCR-5153
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-jcr-commons
>    Affects Versions: 2.23.1
>            Reporter: Konrad Windszus
>            Priority: Major
>
> The grammar at
> https://s.apache.org/jcr-2.0-spec/3_Repository_Model.html#3.2.2%20Local%20Names
> defines which characters are valid within a local JCR name. However, the
> method {{Text.escapeIllegalJcrChars(String)}} does not properly escape:
> # Unicode characters which are outside the char range defined by
> https://www.w3.org/TR/xml/#NT-Char

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
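
The conversion described in the comment above (decode the UTF-16 data into character numbers, encode each one as UTF-8, then percent-encode every resulting octet) could look roughly like the following. This is only a sketch: the class `SurrogateEscapeSketch` and the method `percentEncodeUtf8` are hypothetical names, not part of jackrabbit-jcr-commons, and for simplicity the method escapes every character rather than only the ones that are illegal in a JCR local name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the Jackrabbit implementation.
final class SurrogateEscapeSketch {

    /** Percent-encodes every code point of the input as its UTF-8 octets. */
    static String percentEncodeUtf8(String input) {
        StringBuilder out = new StringBuilder();
        // Walk the string by code point so that a surrogate pair is first
        // decoded into a single character number (RFC 3629, section 3).
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);

            // Encode that single code point as UTF-8. An unpaired surrogate
            // has no UTF-8 form; Java's encoder substitutes '?' for it.
            byte[] utf8 = new String(Character.toChars(codePoint))
                    .getBytes(StandardCharsets.UTF_8);

            // RFC 3986, section 2.1: '%' followed by two hex digits per octet.
            for (byte b : utf8) {
                out.append('%')
                   .append(Character.toUpperCase(Character.forDigit((b >> 4) & 0xF, 16)))
                   .append(Character.toUpperCase(Character.forDigit(b & 0xF, 16)));
            }
        }
        return out.toString();
    }
}
```

For example, `SurrogateEscapeSketch.percentEncodeUtf8("\uD83D\uDE00")` (the surrogate pair for U+1F600) returns `%F0%9F%98%80`, i.e. four escaped octets for one character rather than two separately escaped surrogates.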