Hi Taki,

I wasn't implying that we wouldn't fix this. Just hoping that whatever we
end up doing is better than the obvious solution.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

"Taki Kamiya" <[EMAIL PROTECTED]> wrote on 08/06/2008 01:02:54 AM:

> Hi Michael,
>
> I think it deserves to be fixed at some point, because it causes
> some valid schemas to be considered invalid, rather than causing
> invalid schemas to be processed as valid.
>
> I created the test case a while back, in preparation for Vista's
> extended support for a wider variety of fonts, which enables users
> to use non-BMP characters in ways they had not been able to before.
> We modified our in-house schema processor to take surrogate pairs
> into account whenever we check the length, minLength and maxLength
> facets, and have not seen any major performance penalty because
> of that, which is probably partly because the use of length is not
> very common in practice.
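
A minimal sketch of what a surrogate-aware facet check could look like, assuming code points are the intended unit of length. This is neither the in-house processor's code nor Xerces code; LengthFacetSketch and satisfies are hypothetical names. It relies on the fact that a string of n chars holds between ceil(n/2) and n code points, so the full count is only needed when those bounds do not already decide the answer:

```java
// Hypothetical helper for illustration only; not part of Xerces.
class LengthFacetSketch {

    // Returns true if the length of value, measured in Unicode code points,
    // lies in [minLength, maxLength]. For an exact length facet, pass the
    // same number for both bounds.
    static boolean satisfies(String value, int minLength, int maxLength) {
        int chars = value.length();        // UTF-16 code units
        int lowerBound = (chars + 1) / 2;  // every code point uses at most 2 chars

        // Cheap rejections: the code point count n obeys lowerBound <= n <= chars.
        if (chars < minLength) return false;
        if (lowerBound > maxLength) return false;

        // Cheap acceptance: both bounds already fall inside the allowed range.
        if (chars <= maxLength && lowerBound >= minLength) return true;

        // Otherwise count code points; a surrogate pair counts once.
        int codePoints = value.codePointCount(0, chars);
        return codePoints >= minLength && codePoints <= maxLength;
    }

    public static void main(String[] args) {
        String value = new String(Character.toChars(0x2000B));
        // The default value from the test schema against <xsd:length value="1"/>.
        System.out.println(satisfies(value, 1, 1));
    }
}
```

Strings with no surrogates that are comfortably inside the bounds never reach the counting step, which may be one reason such a check costs little in practice.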
>
> Thanks!
>
> -taki
>
> ________________________________
>
> From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 05, 2008 6:00 PM
> To: j-users@xerces.apache.org
> Subject: Re: single non-BMP character counted as two characters
>
>
>
> Hi Taki,
>
> It's a long-standing bug/limitation. Xerces uses String.length()
> (which returns the length of the string in chars rather than Unicode
> code points) for checking the length facet.
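
To make the difference concrete, here is a minimal Java sketch (not Xerces code; the class and method names are made up for illustration) comparing String.length() with String.codePointCount() for U+2000B, the character from the test schema:

```java
// Illustrative only; not taken from Xerces.
class LengthDemo {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int charLength(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Build U+2000B from its code point rather than typing the surrogates.
        String value = new String(Character.toChars(0x2000B));
        System.out.println(charLength(value));      // 2
        System.out.println(codePointLength(value)); // 1
    }
}
```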
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [EMAIL PROTECTED]
>
> "Taki Kamiya" <[EMAIL PROTECTED]> wrote on 08/05/2008 08:20:38 PM:
>
> > Hi,
> >
> > The following schema, which is supposedly valid, results in this error:
> >
> >   cvc-length-valid: Value '𠀋' with length = '2' is not facet-valid
> >   with respect to length '1' for type '#AnonType_act'.
> >
> > The default value "&#x2000B;" for attribute "a" is a single
> > non-BMP character.
> > It is as though a surrogate pair is counted as two characters.
> >
> > Regards,
> >
> > -taki
> >
> >
> >
> > <xsd:schema targetNamespace="urn:foo"
> >            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> >            xmlns:foo="urn:foo">
> >
> > <xsd:complexType name="ct">
> >   <xsd:attribute name="a" default="&#x2000B;"><!-- single character in SIP (U+2000B) -->
> >     <xsd:simpleType>
> >       <xsd:restriction base="xsd:string">
> >         <xsd:length value="1"/>
> >       </xsd:restriction>
> >     </xsd:simpleType>
> >   </xsd:attribute>
> > </xsd:complexType>
> >
> > </xsd:schema>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
