Hi Taki,

I wasn't implying that we wouldn't fix this. Just hoping that whatever we end up doing is better than the obvious solution.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]

"Taki Kamiya" <[EMAIL PROTECTED]> wrote on 08/06/2008 01:02:54 AM:

> Hi Michael,
>
> I think it should deserve fixed at some point, because it is making
> some valid schemas considered invalid, instead of making invalid
> schemas to be processed as being valid.
>
> I created the test case about a while back, in preparation for vista's
> extended support for more variety of fonts, which enables users
> to use non-BMP characters in ways they had not been able to before.
> We modified in-house schema processor to take surrogate pairs
> into account whenever we check length, minLength and maxLength
> facet, and has not seen any major performance penalty because
> of that, which is probably partly because the use of length is not
> very common in practice.
>
> Thanks!
>
> -taki
>
> ________________________________
>
> From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 05, 2008 6:00 PM
> To: j-users@xerces.apache.org
> Subject: Re: single non-BMP character counted as two characters
>
> Hi Taki,
>
> It's a long standing bug/limitation. Xerces uses String.length()
> (which returns the length of the string in chars rather than Unicode
> code points) for checking the length facet.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [EMAIL PROTECTED]
> E-mail: [EMAIL PROTECTED]
>
> "Taki Kamiya" <[EMAIL PROTECTED]> wrote on 08/05/2008 08:20:38 PM:
>
> > Hi,
> >
> > The following schema, which is supposedly valid, results in this error:
> >
> > cvc-length-valid: Value '𠀋' with length = '2' is not facet-valid
> > with respect to length '1'
> > for type '#AnonType_act'.
> >
> > The default value "𠀋" for attribute "a" is a single non-BMP character.
> > It is as though a surrogate pair is counted as two characters.
> >
> > Regards,
> >
> > -taki
> >
> >
> > <xsd:schema targetNamespace="urn:foo"
> >     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
> >     xmlns:foo="urn:foo">
> >
> >   <xsd:complexType name="ct">
> >     <xsd:attribute name="a" default="𠀋"><!-- single character in SIP (U+2000B) -->
> >       <xsd:simpleType>
> >         <xsd:restriction base="xsd:string">
> >           <xsd:length value="1"/>
> >         </xsd:restriction>
> >       </xsd:simpleType>
> >     </xsd:attribute>
> >   </xsd:complexType>
> >
> > </xsd:schema>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
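
For reference, the difference between counting chars and counting code points can be seen in a few lines of plain Java. This is only a sketch to illustrate the behaviour discussed in this thread, not Xerces code and not a proposed patch: the class CodePointLength and the helper lengthInCodePoints are hypothetical names made up for the example. The only library calls used are the standard String.length(), String.codePointCount(int, int) and Character.toChars(int).

```java
// Hypothetical illustration only; none of these names exist in Xerces.
class CodePointLength {

    // Counts Unicode code points, so a surrogate pair counts once.
    static int lengthInCodePoints(String value) {
        return value.codePointCount(0, value.length());
    }

    public static void main(String[] args) {
        // U+2000B, the non-BMP character used as the default value in the schema.
        String value = new String(Character.toChars(0x2000B));

        // String.length() counts UTF-16 chars: the surrogate pair counts twice.
        System.out.println(value.length());            // prints 2

        // Counting code points treats the pair as one character.
        System.out.println(lengthInCodePoints(value)); // prints 1
    }
}
```

Whether a code point count is cheap enough to use for every length, minLength and maxLength check is a separate question; the sketch only shows why the two counts disagree for this value.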