[ 
https://issues.apache.org/jira/browse/XERCESC-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084051#comment-16084051
 ] 

Scott Cantor commented on XERCESC-2063:
---------------------------------------

The limitation arises because the character becomes a surrogate pair in 
UTF-16, so the string length in that encoding comes back as 2, much as a 
Unicode-ignorant strlen() against the original UTF-8 buffer would return 4.
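
To make the three possible "lengths" concrete, here is a minimal sketch in 
standard C++ (it does not use Xerces-C++ itself). It assumes well-formed 
UTF-16 input, and the names countCodePoints and checkReportedCharacter are 
hypothetical helpers introduced only for illustration. It shows that the 
reported character is 4 bytes in UTF-8, 2 code units in UTF-16, and 1 
Unicode code point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Count Unicode code points in a UTF-16 string by skipping the low
// (trailing) half of each surrogate pair. Assumes well-formed UTF-16.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        // Low surrogates occupy the range 0xDC00..0xDFFF.
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++n;
        }
    }
    return n;
}

// Check the character from the report: bytes F0 9D 90 80 encode U+1D400.
void checkReportedCharacter() {
    const char utf8[] = "\xF0\x9D\x90\x80";
    const std::u16string utf16 = u"\U0001D400";

    assert(std::strlen(utf8) == 4);       // UTF-8 bytes
    assert(utf16.size() == 2);            // UTF-16 code units (surrogate pair)
    assert(countCodePoints(utf16) == 1);  // Unicode code points
}
```

Calling checkReportedCharacter() from a main() function passes all three 
assertions. A length check that compares the UTF-16 code unit count against 
maxLength would therefore see 2 for this value, which is consistent with 
the error message in the report.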

> A 4 byte UTF-8 character incorrectly failing maxLength facet.
> -------------------------------------------------------------
>
>                 Key: XERCESC-2063
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2063
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Validating Parser (XML Schema)
>    Affects Versions: 3.1.3
>         Environment: Windows (Affects all OS)
>            Reporter: Greg Iwinski
>
> A 4 byte UTF-8 character incorrectly failing maxLength facet.
> The data is F0 9D 90 80 and is a 4-byte UTF-8 sequence to represent 1 
> character.
> It is failing with
> Error at file input.xml, line 4, char 17
>   Message: value '??' has length '2' which exceeds maxLength facet value '1'
> when running sax2count.exe.
> This looks like a limitation but I could not find any documentation about it 
> in the bug list.
> **Input XML**
> <?xml version="1.1" encoding="UTF-8"?>
> <Root xmlns="http://www.example.org/Test" 
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
> xsi:schemaLocation="http://www.example.org/Test
> Input.xsd">
>       <Data>𝐀</Data>
> </Root>
> **Schema**
> <?xml version="1.0" encoding="UTF-8"?>
> <schema targetNamespace="http://www.example.org/Test" 
> elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema" 
> xmlns:tns="http://www.example.org/Test">
> <element name="Root">
> <complexType>
> <sequence>
> <element name="Data">
> <simpleType>
> <restriction base="string">
> <maxLength value="1"/>
> </restriction>
> </simpleType>
> </element>
> </sequence>
> </complexType>
> </element>
> </schema>



