[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582333#comment-15582333
 ] 

Ben Fortuna commented on COCOON-2352:
-

Great! I've tested the snapshot against my code and it looks good. Many thanks 
for your assistance. :-)

> XMLEncoder doesn't support Unicode surrogate pairs
> --
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
> Issue Type: Bug
> Components: * Cocoon Core, Blocks: Serializers
> Affects Versions: 2.1.12
> Reporter: Ben Fortuna
> Assignee: Francesco Chicchiriccò
> Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I noticed that the XMLEncoder used by HTMLSerializer does not 
> handle the Unicode surrogate pairs that represent supplementary characters 
> (code points above U+FFFF).
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22
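
To illustrate the kind of problem described above, here is a minimal sketch, not the Cocoon code. It compares a per-char encoder with a code-point-aware one. The class and method names are hypothetical, and whether the pre-fix XMLEncoder produced exactly this output is an assumption; the sketch only shows why encoding each UTF-16 code unit separately cannot yield a valid reference for a supplementary character.

```java
final class NumericReferenceSketch {

    // Naive approach: one numeric reference per UTF-16 code unit.
    // A supplementary character is split into two surrogate references,
    // neither of which is a legal XML character reference.
    static String encodePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#x").append(Integer.toHexString(c).toUpperCase()).append(';');
            }
        }
        return out.toString();
    }

    // Code-point-aware approach: a surrogate pair is combined into a
    // single code point before the reference is written.
    static String encodePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            i += Character.charCount(cp); // advance by 1 or 2 code units
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String supplementary = "\uD83C\uDF40"; // U+1F340 as a surrogate pair
        System.out.println(encodePerChar(supplementary));      // &#xD83C;&#xDF40;
        System.out.println(encodePerCodePoint(supplementary)); // &#x1F340;
    }
}
```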



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582339#comment-15582339
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


You're welcome ;-)



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582202#comment-15582202
 ] 

Hudson commented on COCOON-2352:


SUCCESS: Integrated in Jenkins build Cocoon 2.1.X #116
(See [https://builds.apache.org/job/Cocoon%202.1.X/116/])
[COCOON-2352] Applying further changes to better deal with HTML encoding 
(ilgrosso: [http://svn.apache.org/viewvc/?view=rev=1765265])
* (edit) BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582183#comment-15582183
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


My bad: the problem is now solved. The fix is committed to the Cocoon 2.1 branch:

* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X/src/blocks/serializers/test/org/apache/cocoon/components/serializers/encoding/XMLEncoderTestCase.java

and to the Maven artifact (SNAPSHOT already redeployed):

* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/main/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java
* http://svn.apache.org/repos/asf/cocoon/subprojects/cocoon-serializers-charsets/trunk/src/test/java/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.java



[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:46 PM:


Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));






[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna commented on COCOON-2352:
-

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}
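
For reference, the numbers in the snippet above are related as follows: 127808 is code point U+1F340, and its UTF-16 form is the pair 0xD83C, 0xDF40. The sketch below works through that arithmetic in plain Java; the class and helper names are hypothetical. It also shows that in Java a `(char)` cast of 127808 is a narrowing conversion that keeps only the low 16 bits. The linked test is written in Groovy, so this remark covers Java semantics only.

```java
final class SurrogatePairArithmetic {

    // High (leading) surrogate: 0xD800 plus the top 10 bits of (cp - 0x10000).
    static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Low (trailing) surrogate: 0xDC00 plus the bottom 10 bits of (cp - 0x10000).
    static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    public static void main(String[] args) {
        int codePoint = 127808; // 0x1F340

        // The JDK performs the same split.
        char[] units = Character.toChars(codePoint);
        System.out.printf("U+%X -> 0x%04X 0x%04X%n",
                codePoint, (int) units[0], (int) units[1]); // U+1F340 -> 0xD83C 0xDF40

        // Recombining the pair gives back the original code point.
        System.out.println(Character.toCodePoint(high(codePoint), low(codePoint))); // 127808

        // A Java char is 16 bits wide, so the cast drops the upper bits.
        char narrowed = (char) codePoint;
        System.out.printf("0x%04X%n", (int) narrowed); // 0xF340
    }
}
```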




[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:45 PM:


Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```





[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581450#comment-15581450
 ] 

Francesco Chicchiriccò commented on COCOON-2352:


With the new test code, I receive

java.lang.IllegalArgumentException: Expected low surrogate char
    at org.apache.cocoon.components.serializers.encoding.XMLEncoder.encode(XMLEncoder.java:97)
    at org.apache.cocoon.components.serializers.encoding.XMLEncoderTestCase.testEncodingSurrogatePairs(XMLEncoderTestCase.java:42)

when running the test.
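
As a rough illustration of how a char-at-a-time encoder could raise an exception like this one, here is a hedged sketch. It is not the actual XMLEncoder source: the class name, the buffering field, and the choice to emit a numeric reference for every character are assumptions made for the example. The idea is that a high surrogate is buffered and produces no output, and the following call must supply the matching low surrogate.

```java
final class StatefulEncoderSketch {

    // Buffered high surrogate; 0 means nothing is pending.
    private char pendingHigh = 0;

    // Encodes one UTF-16 code unit at a time (hypothetical API shape).
    char[] encode(char c) {
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0; // reset state whether or not the pair is valid
            if (!Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("Expected low surrogate char");
            }
            return reference(Character.toCodePoint(high, c));
        }
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;     // wait for the second half of the pair
            return new char[0];  // nothing to emit yet
        }
        // Simplification: every other char becomes a numeric reference,
        // including a lone low surrogate, which a real encoder should reject.
        return reference(c);
    }

    private static char[] reference(int codePoint) {
        return ("&#x" + Integer.toHexString(codePoint).toUpperCase() + ";").toCharArray();
    }

    public static void main(String[] args) {
        StatefulEncoderSketch encoder = new StatefulEncoderSketch();
        System.out.println(encoder.encode('\uD83C').length);     // 0
        System.out.println(new String(encoder.encode('\uDF40'))); // &#x1F340;
    }
}
```

With an encoder of this shape, the order of calls matters: a high surrogate followed by any char other than a low surrogate triggers the exception, so a test has to feed both halves of the pair in consecutive calls.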
