[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582333#comment-15582333 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Great! I've tested the snapshot against my code and it looks good. Many thanks for your assistance. :-)

> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
>                 Key: COCOON-2352
>                 URL: https://issues.apache.org/jira/browse/COCOON-2352
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core, Blocks: Serializers
>    Affects Versions: 2.1.12
>            Reporter: Ben Fortuna
>            Assignee: Francesco Chicchiriccò
>             Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer
> doesn't support Unicode surrogate pairs to represent higher order unicode
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
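As background to the issue description, here is a minimal sketch of how a UTF-16 surrogate pair maps to a single code point, using only `java.lang.Character`. The class and method names (`SurrogatePairDemo`, `combine`, `toNumericReference`) are hypothetical, and the decimal `&#...;` output format is an assumption for illustration; it is not taken from Cocoon's XMLEncoder.

```java
/** Hypothetical demo class; not part of Cocoon. */
class SurrogatePairDemo {

    /** Combines a high/low surrogate pair into one Unicode code point. */
    public static int combine(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("Not a valid surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    /** Formats a code point as a decimal numeric character reference (assumed format). */
    public static String toNumericReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        // The pair used in the test discussed in this thread.
        int codePoint = combine('\uD83C', '\uDF40');
        System.out.println(Integer.toHexString(codePoint)); // prints "1f340"
        System.out.println(toNumericReference(codePoint));  // prints "&#127808;"
    }
}
```

An encoder that looks at each `char` in isolation sees two separate surrogate values rather than the code point 127808 (U+1F340), which is the behaviour the issue describes.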
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582339#comment-15582339 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

You're welcome ;-)
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582202#comment-15582202 ]

Hudson commented on COCOON-2352:
--------------------------------

SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #116 (See [https://builds.apache.org/job/Cocoon%202.1.X/116/])
[COCOON-2352] Applying further changes to better deal with HTML encoding (ilgrosso: [http://svn.apache.org/viewvc/?view=rev=1765265])
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582183#comment-15582183 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

My bad: problem solved, committed to (Cocoon 2.1)

* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java

and (Maven artifact, with SNAPSHOT already redeployed):

* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/test/java/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.java
[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147 ]

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:46 PM:
----------------------------------------------------------------

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

    char[] expectedValue = encoder.encode((char) 127808);
    // surrogate 1/2
    assertTrue(encoder.encode('\uD83C').length == 0);
    // surrogate 2/2
    assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));

was (Author: fortuna):
Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147 ]

Ben Fortuna commented on COCOON-2352:
-------------------------------------

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}
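The sequence in the comment above implies a stateful, char-at-a-time encoder: the high surrogate produces no output and the low surrogate produces the reference for the whole code point. A minimal sketch of that idea follows, under loud assumptions: `BufferingEncoderSketch` is a hypothetical stand-in and is not Cocoon's XMLEncoder, it emits a decimal `&#...;` reference for every character as a simplification, and the exception messages are illustrative. The sketch builds its expected value from an `int` code point, because in Java the narrowing cast `(char) 127808` keeps only the low 16 bits (0xF340).

```java
import java.util.Arrays;

/** Hypothetical stand-in for a char-at-a-time encoder; not Cocoon's XMLEncoder. */
class BufferingEncoderSketch {
    private static final char[] EMPTY = new char[0];

    // Buffered high surrogate; 0 means nothing is pending.
    private char pendingHigh;

    /** Encodes one char, holding back a high surrogate until its low half arrives. */
    public char[] encode(char c) {
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0;
            if (!Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Expected low surrogate char");
            }
            return reference(Character.toCodePoint(high, c));
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;
            return EMPTY; // nothing to emit until the pair is complete
        }
        if (Character.isLowSurrogate(c)) {
            throw new IllegalArgumentException("Unexpected low surrogate char");
        }
        return reference(c);
    }

    /** Decimal numeric character reference for a code point (assumed output format). */
    public static char[] reference(int codePoint) {
        return ("&#" + codePoint + ";").toCharArray();
    }

    public static void main(String[] args) {
        BufferingEncoderSketch encoder = new BufferingEncoderSketch();
        char[] expected = reference(127808);
        System.out.println(encoder.encode('\uD83C').length == 0);                 // prints "true"
        System.out.println(Arrays.equals(expected, encoder.encode('\uDF40')));    // prints "true"
    }
}
```

Because the encoder keeps state between calls, the two halves have to be passed in order and back to back, which matches the point about encoding the surrogate pairs together.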
[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147 ]

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:45 PM:
----------------------------------------------------------------

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```

was (Author: fortuna):
Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still has the old code. I noticed the error is on line 42, but the test I submitted only has 33 lines.

Note it is important for the test to encode the surrogate pairs together, which is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}
[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs
[ https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581450#comment-15581450 ]

Francesco Chicchiriccò commented on COCOON-2352:
------------------------------------------------

With the new test code, I receive

java.lang.IllegalArgumentException: Expected low surrogate char
	at org.apache.cocoon.components.serializers.encoding.XMLEncoder.encode(XMLEncoder.java:97)
	at org.apache.cocoon.components.serializers.encoding.XMLEncoderTestCase.testEncodingSurrogatePairs(XMLEncoderTestCase.java:42)

when running the test.
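When chasing an error of this kind, it can help to check whether the input itself contains an unpaired surrogate before it reaches the encoder. The helper below is a hypothetical diagnostic sketch (the names `SurrogateCheck` and `firstUnpairedSurrogate` are not from Cocoon), and it makes no claim about what actually caused the exception in this thread.

```java
/** Hypothetical diagnostic helper; not part of Cocoon. */
class SurrogateCheck {

    /** Returns the index of the first unpaired surrogate in s, or -1 if every surrogate is paired. */
    public static int firstUnpairedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of a valid pair
                } else {
                    return i; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnpairedSurrogate("\uD83C\uDF40")); // prints "-1"
        System.out.println(firstUnpairedSurrogate("a\uD83Cb"));     // prints "1"
    }
}
```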