[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492282#comment-15492282
 ] 

Ben Fortuna commented on COCOON-2352:
-------------------------------------

So I've looked at XMLEncoder, and it seems that the fix will require a change 
to the method signature - specifically XMLEncoder.encode(char c):

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java#L88

Unfortunately this also means the Encoder interface needs to change, so will 
need an exercise to identify what else implements this interface. The proposed 
change would be something like:

public char[] Encoder.encode(char[] chars)

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/Encoder.java#L36

I'm happy to implement a fix and submit a pull request, just looking for some 
acknowledgement of the issue before proceeding.


> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core
>            Reporter: Ben Fortuna
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer 
> doesn't support Unicode surrogate pairs to represent higher order unicode 
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to