[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-19 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590441#comment-15590441
 ] 

Ben Fortuna commented on COCOON-2352:
-

My sincerest apologies, but I discovered a bug in the patch I submitted. 
I had assumed we could cast an int to a char to encode the higher-order 
Unicode characters, but of course this isn't possible (a Java char is only 16 
bits wide), which is why Unicode surrogate pairs exist in the first place.

So I had to make a slight change to the code (again) - I have updated two 
files: XMLEncoder and XMLEncoderTestCase to ensure that after combining a 
surrogate pair to a code point we are then correctly encoding the int value as 
an HTML-compatible string.

https://github.com/apache/cocoon/pull/3/files
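
To illustrate the problem described above, here is a minimal Java sketch (not the code from the pull request). It assumes a decimal numeric character reference as the "HTML-compatible string"; the exact output format used by XMLEncoder is not shown in this thread, and the class and method names are hypothetical. Character.toCodePoint() requires Java 5 or later.

```java
// Hypothetical demo class; not part of Cocoon.
class CodePointCastDemo {

    /** Formats a code point as a decimal numeric character reference (assumed format). */
    static String toNumericReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        // Combine the surrogate pair D83C/DF40 into a single code point (0x1F340).
        int codePoint = Character.toCodePoint('\uD83C', '\uDF40');

        // A char holds only 16 bits, so the cast silently drops the high bits.
        char truncated = (char) codePoint;
        System.out.println(Integer.toHexString(truncated));

        // Encoding the int value itself preserves the full code point.
        System.out.println(toNumericReference(codePoint));
    }
}
```

The cast yields 0xF340, a private-use character, rather than U+1F340, which is why the int value has to be encoded directly.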

Thanks again, and fingers crossed there are no more changes required. :-)

> XMLEncoder doesn't support Unicode surrogate pairs
> --
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
>  Issue Type: Bug
>  Components: * Cocoon Core, Blocks: Serializers
>Affects Versions: 2.1.12
>Reporter: Ben Fortuna
>Assignee: Francesco Chicchiriccò
> Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer 
> doesn't support Unicode surrogate pairs to represent higher order unicode 
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582333#comment-15582333
 ] 

Ben Fortuna commented on COCOON-2352:
-

Great! I've tested the snapshot against my code and it looks good. Many thanks 
for your assistance. :-)



[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:46 PM:


Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));




was (Author: fortuna):
Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna commented on COCOON-2352:
-

Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}




[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:45 PM:


Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```



was (Author: fortuna):
Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note it is important for the test to encode the surrogate pairs together, which 
is why I had the sequence like this:

{code}
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
{code}




[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-16 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580714#comment-15580714
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/16/16 11:39 PM:


Yes sorry, I forgot to mention I had updated the unit test also. See the same 
PR for the changes (3 lines in the test method).

https://github.com/apache/cocoon/pull/2/files#diff-4f5d5b9cb8b320832b3f0dfb8183a1b9R28




was (Author: fortuna):
Yes sorry, I forgot to mention I had updated the unit test also. See the same 
PR for the changes.



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-16 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580714#comment-15580714
 ] 

Ben Fortuna commented on COCOON-2352:
-

Yes sorry, I forgot to mention I had updated the unit test also. See the same 
PR for the changes.



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-13 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574193#comment-15574193
 ] 

Ben Fortuna commented on COCOON-2352:
-

[~ilgrosso] Fantastic, thanks. I've used the snapshot dependency to test the 
fix in my project and I did notice one more thing: whilst it does create the 
Unicode character correctly from the surrogate pair, it doesn't actually 
HTML-encode the character. 

In order to fix this I've created another pull request, which simply encodes 
the unicode character created from the surrogate pair:

https://github.com/apache/cocoon/pull/2/files#diff-2b4ac8dab4cdcce4c7ffd948c2490b52R101

I hope it isn't too much trouble to apply this change also, I'm confident this 
is the last change required. Many thanks.




[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-12 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570179#comment-15570179
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/12/16 11:15 PM:


[~ilgrosso] I am happy to have this issue closed, however it would be good if 
there was a snapshot JAR available to verify the functionality. Specifically I 
am hoping this change will make it into this artefact:

http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.cocoon%22%20AND%20a%3A%22cocoon-serializers-charsets%22

Will a new version be produced with the next release? Many thanks for your 
efforts.




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-12 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570179#comment-15570179
 ] 

Ben Fortuna commented on COCOON-2352:
-

[~ilgrosso] I am happy to have this issue closed, however it would be good if 
there was a snapshot JAR available to verify the functionality. Specifically I 
am hoping this change will make it into this artefact:

http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.cocoon%22%20AND%20a%3A%22cocoon-serializers-charsets%22

Will a new version be produced with the next release? Many thanks for your 
efforts.



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-10 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563730#comment-15563730
 ] 

Ben Fortuna commented on COCOON-2352:
-

Hmm, I guess from that failed build that you are still maintaining 
compatibility with Java 1.4 (Character.isLowSurrogate() was introduced in 1.5). 
I guess we can work around that, although I'm not sure anyone is using Java 1.4 
anymore. ;)
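
One way to work around the missing Java 5 methods is to test the UTF-16 surrogate ranges directly. The sketch below is an assumption about how that could look, not the code that was committed; the class and method names are hypothetical, and it uses only language features available in Java 1.4.

```java
// Hypothetical helper; not part of Cocoon. Avoids Character.isHighSurrogate(),
// Character.isLowSurrogate() and Character.toCodePoint(), which need Java 5.
class SurrogateRanges {

    /** True if c lies in the high (leading) surrogate range D800..DBFF. */
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    /** True if c lies in the low (trailing) surrogate range DC00..DFFF. */
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    /** Combines a valid high/low surrogate pair into a supplementary code point. */
    static int toCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }
}
```

The arithmetic in toCodePoint() is the standard UTF-16 decoding formula, so it gives the same result as Character.toCodePoint() for valid pairs.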




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-09 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560988#comment-15560988
 ] 

Ben Fortuna commented on COCOON-2352:
-

I've just created a pull request in github to add support for surrogate pairs.

https://github.com/apache/cocoon/pull/1

Summary of changes:

* Added an instance variable to XMLEncoder to record the first surrogate of the 
pair - NOTE: this means the XMLEncoder is no longer thread safe. This may have 
implications I'm not aware of (e.g. if an encoder instance is shared between 
threads).
* Added a unit test to demonstrate the behaviour - NOTE: I needed to add the 
serializers project to the test classpath; I'm not sure if there is a better 
way to do this with the ant config.

I look forward to any feedback or comments.

regards,
ben


> XMLEncoder doesn't support Unicode surrogate pairs
> --
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
>  Issue Type: Bug
>  Components: * Cocoon Core, Blocks: Serializers
>Reporter: Ben Fortuna
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer 
> doesn't support Unicode surrogate pairs to represent higher order unicode 
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22





[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-16 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495624#comment-15495624
 ] 

Ben Fortuna commented on COCOON-2352:
-

Ok, I'll first create a unit test to demonstrate the issue. I'd prefer not to 
change the Encoder interface so I'll see if it's possible to just update 
XMLEncoder.

I have looked at the EncodingSerializer, however I think a surrogate pair needs 
to be encoded "together", so the logic really needs to be in the delegate 
encoder (i.e. XMLEncoder).




[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-16 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495623#comment-15495623
 ] 

Ben Fortuna commented on COCOON-2352:
-

Ok, I'll first create a unit test to demonstrate the issue. I'd prefer not to 
change the Encoder interface so I'll see if it's possible to just update 
XMLEncoder.

I have looked at the EncodingSerializer, however I think a surrogate pair needs 
to be encoded "together", so the logic really needs to be in the delegate 
encoder (i.e. XMLEncoder).




[jira] [Issue Comment Deleted] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-16 Thread Ben Fortuna (JIRA)

 [ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Fortuna updated COCOON-2352:

Comment: was deleted

(was: Ok, I'll first create a unit test to demonstrate the issue. I'd prefer 
not to change the Encoder interface so I'll see if it's possible to just update 
XMLEncoder.

I have looked at the EncodingSerializer, however I think a surrogate pair needs 
to be encoded "together", so the logic really needs to be in the delegate 
encoder (i.e. XMLEncoder).
)



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-15 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495081#comment-15495081
 ] 

Ben Fortuna commented on COCOON-2352:
-

Hi Francesco,

The JAR I am using is: org.apache.cocoon:cocoon-serializers-charsets:1.0.2 - 
which appears to have been built in 2012. It looks like it came from the 
BRANCH_2_1_X branch but I can't be certain.

I will try to make a patch - the easiest for me would be a pull request on 
GitHub, but if you prefer a patch file I can do that also. 

I am looking at the unit tests in the project and it is a little difficult to 
get my head around. Would you prefer that I write a unit test using htmlunit, 
or junit, or no preference? It appears tests haven't been updated for a number 
of years. Many thanks.


> XMLEncoder doesn't support Unicode surrogate pairs
> --
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
>  Issue Type: Bug
>  Components: * Cocoon Core
>Reporter: Ben Fortuna
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer 
> doesn't support Unicode surrogate pairs to represent higher order unicode 
> characters.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22





[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-14 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492310#comment-15492310
 ] 

Ben Fortuna commented on COCOON-2352:
-

A possibly less intrusive approach would be to leave the method signatures as 
is, but when a surrogate char is detected, record it and return an empty char 
array. Expect the second surrogate in the pair to be encoded next and return 
the correct char array result (if the second surrogate of the pair doesn't 
follow, throw an encoding exception).
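
The approach described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the eventual patch: the class name is hypothetical, IllegalStateException stands in for whatever "encoding exception" the real code would throw, non-surrogate characters are simply passed through, rejecting a lone low surrogate is my own assumption, and the decimal reference format is assumed.

```java
// Hypothetical sketch; not the Cocoon XMLEncoder.
class PairingEncoderSketch {

    // First surrogate of a pair awaiting its partner; 0 means nothing is pending.
    // This per-instance state is what makes the encoder unsafe to share between threads.
    private char pendingHigh = 0;

    char[] encode(char c) {
        if (pendingHigh != 0) {
            char high = pendingHigh;
            pendingHigh = 0;
            if (c < 0xDC00 || c > 0xDFFF) {
                throw new IllegalStateException("expected low surrogate");
            }
            // Standard UTF-16 decoding of the pair into one code point.
            int codePoint = ((high - 0xD800) << 10) + (c - 0xDC00) + 0x10000;
            return ("&#" + codePoint + ";").toCharArray();
        }
        if (c >= 0xD800 && c <= 0xDBFF) {
            // Record the first surrogate and emit nothing yet.
            pendingHigh = c;
            return new char[0];
        }
        if (c >= 0xDC00 && c <= 0xDFFF) {
            throw new IllegalStateException("unpaired low surrogate");
        }
        // Simplification: a real encoder would also escape other characters here.
        return new char[] { c };
    }
}
```

The method signature stays encode(char), so callers that feed characters one at a time do not need to change, at the cost of the encoder carrying state between calls.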



[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-09-14 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15492282#comment-15492282
 ] 

Ben Fortuna commented on COCOON-2352:
-

So I've looked at XMLEncoder, and it seems that the fix will require a change 
to the method signature - specifically XMLEncoder.encode(char c):

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/XMLEncoder.java#L88

Unfortunately this also means the Encoder interface needs to change, so we 
will need to identify what else implements this interface. The proposed 
change would be something like:

public char[] Encoder.encode(char[] chars)

https://github.com/apache/cocoon/blob/3ce60f6ecb257b138fc68077bc562a871df045e5/src/blocks/serializers/java/org/apache/cocoon/components/serializers/encoding/Encoder.java#L36

I'm happy to implement a fix and submit a pull request, just looking for some 
acknowledgement of the issue before proceeding.
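
For comparison, an array-based signature like the one proposed above would let the encoder see both halves of a pair in a single call, so no state needs to be kept between calls. The following is only a sketch of that idea: the class name is hypothetical, the decimal reference format is assumed, and unpaired surrogates and ordinary characters are passed through unchanged for simplicity.

```java
// Hypothetical sketch of a stateless, array-based encode; not the Cocoon API.
class ArrayEncoderSketch {

    static char[] encode(char[] chars) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            boolean high = c >= 0xD800 && c <= 0xDBFF;
            boolean lowFollows = i + 1 < chars.length
                    && chars[i + 1] >= 0xDC00 && chars[i + 1] <= 0xDFFF;
            if (high && lowFollows) {
                // Decode the pair into one code point and emit a single reference.
                int codePoint = ((c - 0xD800) << 10) + (chars[i + 1] - 0xDC00) + 0x10000;
                out.append("&#").append(codePoint).append(';');
                i++; // skip the low surrogate that was just consumed
            } else {
                // Simplification: a real encoder would escape other characters too.
                out.append(c);
            }
        }
        return out.toString().toCharArray();
    }
}
```

The trade-off is the one noted in the comment: this shape requires changing the Encoder interface and every implementation of it.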




[jira] [Created] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-08-18 Thread Ben Fortuna (JIRA)
Ben Fortuna created COCOON-2352:
---

 Summary: XMLEncoder doesn't support Unicode surrogate pairs
 Key: COCOON-2352
 URL: https://issues.apache.org/jira/browse/COCOON-2352
 Project: Cocoon
  Issue Type: Bug
  Components: * Cocoon Core
Reporter: Ben Fortuna


Whilst investigating an issue with the Sling project and support for emoji 
characters, I've come to notice that the XMLEncoder used by HTMLSerializer 
doesn't support Unicode surrogate pairs to represent higher order unicode 
characters.

A simple unit test that demonstrates this issue is here:

https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy

More background info here also: SLING-5973

This seems to have been identified/addressed in other Apache projects also:

https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22




