[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-22 Thread Andreas Krantz (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334409#comment-16334409
 ] 

Andreas Krantz commented on XERCESC-2130:
-

https://issues.apache.org/jira/browse/XERCESC-1854

describes that Xerces could be used to write files that could no longer be read.

[http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp?r1=768978=1226891]

introduced the new method
DOMLSSerializerImpl::ensureValidString,
which fails to accept the characters #x10000-#x10FFFF.
These valid characters cannot be represented by a single 16-bit XMLCh; two 
16-bit XMLCh values are needed.

To encode those characters, the range D800 - DFFF is used.

[https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF]

There is one leading (high) 16-bit XMLCh and one trailing (low) 16-bit XMLCh.
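
As a worked illustration of the leading/trailing split, here is a minimal sketch in plain C++. The helper name toSurrogatePair is hypothetical and is not part of Xerces-C++; char16_t stands in for a 16-bit XMLCh, and the caller is assumed to pass a code point in the range #x10000-#x10FFFF.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not a Xerces-C++ function): split a code point in
// the range 0x10000..0x10FFFF into its UTF-16 leading and trailing units.
// The caller is assumed to pass a value inside that range.
std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint)
{
    // Subtracting 0x10000 leaves a 20-bit value.
    const std::uint32_t offset = codePoint - 0x10000;
    // The upper 10 bits go into the leading (high) unit, 0xD800..0xDBFF.
    const char16_t leading = static_cast<char16_t>(0xD800 + (offset >> 10));
    // The lower 10 bits go into the trailing (low) unit, 0xDC00..0xDFFF.
    const char16_t trailing = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
    return {leading, trailing};
}
```

For example, the emoticon U+1F600 becomes the pair D83D DE00, and both units lie inside the D800 - DFFF range.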
Checking 
[http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/util/XMLChar.cpp]
will show you that the

{{isXMLChar}}

method already in use is aware of this and can be used to validate characters 
that span two XMLCh values.
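
To make the idea concrete, here is a rough sketch of a pair-aware check. It does not use the Xerces API and is not the attached patch: all function names are hypothetical, char16_t stands in for XMLCh, and the ranges are the XML 1.0 Char production.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration; these are not Xerces-C++ functions.
inline bool isLeadingSurrogate(char16_t c)  { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isTrailingSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// XML 1.0 Char production for one code unit outside the surrogate range.
inline bool isSingleUnitXmlChar(char16_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD);
}

// Returns true if every code unit is a valid XML 1.0 character on its own
// or belongs to a well-formed leading/trailing surrogate pair.
bool isValidXmlUtf16(const std::u16string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t c = text[i];
        if (isLeadingSurrogate(c)) {
            // A leading unit must be followed directly by a trailing unit.
            if (i + 1 >= text.size() || !isTrailingSurrogate(text[i + 1]))
                return false;
            ++i; // skip the trailing unit, it was checked as part of the pair
        } else if (!isSingleUnitXmlChar(c)) {
            // Covers control characters, 0xFFFE/0xFFFF and lone trailing units.
            return false;
        }
    }
    return true;
}
```

With a check of this shape, a string containing an emoticon passes, while a lone D83D or DE00 is still rejected. Every well-formed pair decodes to a code point in #x10000-#x10FFFF, so no further range test is needed for pairs.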

*An easy fix would be:*
 * *reopen XERCESC-1854*
 * *clear the body of ensureValidString so that it does nothing*
 * *make sure the fix is redistributed so that #x10000-#x10FFFF can be 
written again*

I have used Xerces for over a decade, and writing invalid files has always been 
possible. So it does no harm to remove this broken feature (introduced in 
3.2.0) again.

P.S.: Signing a CLA does not seem that easy. I am checking.

> UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 
> 3.2.0 (e.g. emoticons)
> 
>
> Key: XERCESC-2130
> URL: https://issues.apache.org/jira/browse/XERCESC-2130
> Project: Xerces-C++
> Issue Type: Bug
> Components: DOM
> Affects Versions: 3.2.0
> Reporter: Andreas Krantz
> Priority: Critical
> Attachments: fix.patch, patch_.cpp, reproduce.cpp
>
>
> The solution for XERCESC-1854 introduced the method
> {{DOMLSSerializerImpl::ensureValidString}},
> which has an error in its validation.
> The method validates XMLCh values, which represent UTF-16 code units.
> [Valid Characters|https://www.w3.org/TR/REC-xml/#NT-Char] #x9 | #xA | #xD | 
> [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> are the valid UTF-32 characters.
> The UTF-16 surrogate range xD800 - xDFFF is used to represent 
> [#x10000-#x10FFFF] and should not be treated as invalid.
> *The reader treats this correctly and does not complain, which leads to 
> asymmetric behavior:*
> Reading DOM => OK
> Save back DOM => Exception
> I tried to attach an example to show the behavior.
> The method used,
> {{bool XMLChar1_1::isXMLChar(const XMLCh toCheck, const XMLCh toCheck2)}},
> already has an optional second parameter to check surrogate values.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: c-dev-unsubscr...@xerces.apache.org
For additional commands, e-mail: c-dev-h...@xerces.apache.org



[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-22 Thread Scott Cantor (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334309#comment-16334309
 ] 

Scott Cantor commented on XERCESC-2130:
---

I don't believe the patch would cross any threshold of significance as to 
require a CLA, but that doesn't really matter. I am not touching this code, it 
would be just as likely to break ten other Unicode features as fix anything (I 
say that from a position of ignorance, I couldn't possibly know what it would 
do).

Rolling back whatever broke this, depending on what *that* fixed, would be a 
more likely fix from my perspective. But I don't know anything about the 
original change, I didn't make it.

If somebody else understands any of this and wants to take responsibility for 
the change, have at it.




[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-22 Thread Roger Leigh (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334103#comment-16334103
 ] 

Roger Leigh commented on XERCESC-2130:
--

Regarding signing, I did my work on my employer's time for at least some of it, 
so I had to get them to also sign a corporate CLA.  It wasn't a problem, but it 
was a massive pain due to it taking about six months to be approved.  May be 
easier for smaller organisations with less tortuous bureaucracy!




[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-22 Thread Andreas Krantz (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334092#comment-16334092
 ] 

Andreas Krantz commented on XERCESC-2130:
-

An alternative would be a rollback of {{DOMLSSerializerImpl}}, that is, going 
back to not checking.
As it stands, Xerces 3.2.0 has broken writer code. You will not be able to 
save any messenger message that includes an emoticon.

*So it is important to come up with a 3.2.1 before 3.2.0, including this 
issue, is picked up by many distributions.*

P.S.: I think I must first figure out whether I can just sign this. I am not a 
legal expert either.




[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-22 Thread Roger Leigh (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334086#comment-16334086
 ] 

Roger Leigh commented on XERCESC-2130:
--

I'm not a legal expert, and I don't know where the Apache organisation draws 
the line between trivial and non-trivial contributions which require a CLA, but 
I suspect this counts as non-trivial.  I think you would need to fill out an 
[individual CLA|https://www.apache.org/licenses/#clas] to allow this to be 
included.  However, others might wish to correct me if I'm wrong.




[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-22 Thread Andreas Krantz (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16334076#comment-16334076
 ] 

Andreas Krantz commented on XERCESC-2130:
-

Could someone please tell me what I need to do to get this bug fix in?

*Potentially a 3.2.1 should be produced!*




[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-12 Thread Andreas Krantz (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323900#comment-16323900
 ] 

Andreas Krantz commented on XERCESC-2130:
-

I am new to the Apache process. I only found the issue when migrating our 
product to Xerces 3.2.0.
So I currently have no Apache CLA.




[jira] [Commented] (XERCESC-2130) UTF16 Surrgate values 0xD800-0xDFFF can not longer be written with xerces 3.2.0 (e.g. emoticons)

2018-01-12 Thread Roger Leigh (JIRA)

[ 
https://issues.apache.org/jira/browse/XERCESC-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323829#comment-16323829
 ] 

Roger Leigh commented on XERCESC-2130:
--

Ouch, emoticons were a bit of a low blow!

I'm not sure what the threshold is for contributions to require it, but have 
you already submitted the Apache CLA?

Thanks,
Roger
