[
https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445809#comment-17445809
]
Mukul Gandhi edited comment on XERCESJ-1716 at 11/20/21, 10:51 AM:
-------------------------------------------------------------------
I got a chance to look at the original bug report in this thread.
Instead of:
{code:xml}
<xs:simpleType name="SimpleText255NotBlankType">
  <xs:annotation>
    <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:minLength value="1"/>
    <xs:maxLength value="255"/>
    <xs:pattern value=".*[^\s].*"/>
  </xs:restriction>
</xs:simpleType>
{code}
we can write the following, which validates very quickly against the provided
XML document long_string.xml:
{code:xml}
<xs:simpleType name="SimpleText255NotBlankType">
  <xs:annotation>
    <xs:documentation xml:lang="en">String of maximum 255 characters, not blank</xs:documentation>
  </xs:annotation>
  <xs:restriction base="xs:string">
    <xs:pattern value="[^\s]{1,255}"/>
  </xs:restriction>
</xs:simpleType>
{code}
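To compare the two definitions outside a validator, here is a minimal sketch that approximates each one with java.util.regex. Assumptions: `Pattern.matches` anchors the whole value, just as an XSD pattern facet is implicitly anchored, and Java's `\s` and `.` are close to, but not identical to, their XSD counterparts. The class and method names are hypothetical. Under this approximation the two definitions are not equivalent: "hello world" satisfies the original type but not `[^\s]{1,255}`, which rejects every whitespace character rather than only all-blank values.

{code:java}
import java.util.regex.Pattern;

/**
 * Approximates the two SimpleText255NotBlankType definitions with java.util.regex.
 * Assumption: Pattern.matches() anchors the whole value, like an XSD pattern facet.
 * Java's \s (space, \t, \n, \x0B, \f, \r) is close to XSD's \s (space, \t, \n, \r).
 */
public class NotBlankComparison {
    // Original definition: minLength=1, maxLength=255, pattern ".*[^\s].*"
    static boolean original(String v) {
        int len = v.codePointCount(0, v.length()); // XSD lengths count characters
        return len >= 1 && len <= 255 && Pattern.matches(".*[^\\s].*", v);
    }

    // Suggested replacement: a single pattern "[^\s]{1,255}" and no length facets
    static boolean suggested(String v) {
        return Pattern.matches("[^\\s]{1,255}", v);
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "hello world", "   ", "", "x".repeat(256)};
        for (String s : samples) {
            String label = s.length() > 20 ? "<" + s.length() + " chars>" : "\"" + s + "\"";
            System.out.printf("%-14s original=%-5b suggested=%b%n",
                    label, original(s), suggested(s));
        }
    }
}
{code}

For these samples, "hello" is accepted by both, "hello world" is accepted only by the original definition, and the all-blank, empty, and 256-character values are rejected by both.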
I think that the Xerces XSD processor, in general, should not evaluate the
xs:minLength and xs:maxLength facets before the xs:pattern facet. The XSD
specification doesn't prescribe any such ordering, so implementers may treat
the order of facet evaluation within a simple type as implementation-dependent.
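Since a value is valid only when every facet holds, the evaluation order can change how much work a check does but not its verdict. The following minimal sketch illustrates this with java.util.regex standing in for the XSD regex engine; the class, field, and method names are hypothetical and this is not the Xerces API. It evaluates the facets of SimpleText255NotBlankType in two orders, stopping at the first failing facet, and counts how often the pattern facet actually runs.

{code:java}
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public class LengthFirstFacets {
    static int regexRuns = 0; // how many times the pattern facet actually ran

    static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");

    // Facets of SimpleText255NotBlankType; XSD lengths count characters (code points).
    static final Predicate<String> MIN_LENGTH = v -> v.codePointCount(0, v.length()) >= 1;
    static final Predicate<String> MAX_LENGTH = v -> v.codePointCount(0, v.length()) <= 255;
    static final Predicate<String> PATTERN = v -> {
        regexRuns++;
        return NOT_BLANK.matcher(v).matches();
    };

    /** All facets must hold; evaluation stops at the first facet that fails. */
    static boolean validate(String value, List<Predicate<String>> facets) {
        for (Predicate<String> facet : facets) {
            if (!facet.test(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Predicate<String>> patternFirst = List.of(PATTERN, MIN_LENGTH, MAX_LENGTH);
        List<Predicate<String>> lengthFirst = List.of(MIN_LENGTH, MAX_LENGTH, PATTERN);
        String tooLong = "a".repeat(1_000);

        regexRuns = 0;
        boolean a = validate(tooLong, patternFirst);
        int runsA = regexRuns;

        regexRuns = 0;
        boolean b = validate(tooLong, lengthFirst);
        int runsB = regexRuns;

        // Same verdict either way; only the amount of work differs.
        System.out.println("pattern first: " + a + ", regex runs = " + runsA);
        System.out.println("length first:  " + b + ", regex runs = " + runsB);
    }
}
{code}

For the 1,000-character value both orders return false; the pattern-first order runs the regex once, while the length-first order never runs it. Java's engine happens to be fast on this input, so the sketch counts regex runs rather than timing them.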
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESJ-1716
> URL: https://issues.apache.org/jira/browse/XERCESJ-1716
> Project: Xerces2-J
> Issue Type: Improvement
> Reporter: Márk Petrényi
> Assignee: Mukul Gandhi
> Priority: Major
> Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255 characters, not
> blank</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="xsd:string">
> <xsd:minLength value="1"/>
> <xsd:maxLength value="255"/>
> <xsd:pattern value=".*[^\s].*"/>
> </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a really long string (ca. 1,000,000 characters) is
> provided as a value in the input XML, we would expect it to be rejected quickly
> because of its length. Instead, validation takes several minutes, because the
> regex is evaluated before the maxLength restriction.
> We found a workaround for the issue: define the simpleType this way
> (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255
> characters</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="xsd:string">
> <xsd:minLength value="1"/>
> <xsd:maxLength value="255"/>
> <xsd:pattern value=".{1,255}"/>
> </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255 characters, not
> blank</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="SimpleText255Type">
> <xsd:pattern value=".*[^\s].*"/>
> </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the XSSimpleType implementation builds a
> Vector of the regex patterns: the {{.{1,255}}} pattern is evaluated first and
> fails relatively quickly, so the time-consuming second regex is never checked.
> It would be great to have the regex pattern checked after the other XSD
> restrictions (minLength, maxLength, etc.), or to have control over the
> validation order, thus avoiding unnecessarily slow validations and the use of
> a workaround based on undocumented features.
> I attached the XSDs referenced above and an XML file containing a long string
> value. The problem can be reproduced using the SourceValidator from the
> Xerces2-J samples:
> The original XSD, with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround XSD, with normal run time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
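The SourceValidator runs above can also be approximated in one self-contained program through the standard JAXP validation API (javax.xml.validation). This is a sketch under stated assumptions: the schemas are inlined from the two types quoted above (annotations omitted), the element name `text` and the class and method names are hypothetical, and a synthetic value of repeated `a` characters stands in for the content of long_string.xml, which is not reproduced here, so timings may differ from the report. Without extra jars the factory typically resolves to the JDK's bundled, Xerces-derived implementation; with Xerces2-J on the classpath the lookup may select Xerces2-J instead.

{code:java}
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class PatternTiming {
    static final String HEAD = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>";
    static final String TAIL =
        "<xsd:element name='text' type='SimpleText255NotBlankType'/></xsd:schema>";

    // Mirrors unsafe.xsd: length facets and the pattern in a single restriction step.
    static final String UNSAFE = HEAD
        + "<xsd:simpleType name='SimpleText255NotBlankType'><xsd:restriction base='xsd:string'>"
        + "<xsd:minLength value='1'/><xsd:maxLength value='255'/>"
        + "<xsd:pattern value='.*[^\\s].*'/>"
        + "</xsd:restriction></xsd:simpleType>" + TAIL;

    // Mirrors workaround.xsd: the bounded pattern lives in a base type.
    static final String WORKAROUND = HEAD
        + "<xsd:simpleType name='SimpleText255Type'><xsd:restriction base='xsd:string'>"
        + "<xsd:minLength value='1'/><xsd:maxLength value='255'/>"
        + "<xsd:pattern value='.{1,255}'/>"
        + "</xsd:restriction></xsd:simpleType>"
        + "<xsd:simpleType name='SimpleText255NotBlankType'><xsd:restriction base='SimpleText255Type'>"
        + "<xsd:pattern value='.*[^\\s].*'/>"
        + "</xsd:restriction></xsd:simpleType>" + TAIL;

    /** Validates <text>value</text> against the given schema; returns true if valid. */
    static boolean isValid(String xsd, String value) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, a validity error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader("<text>" + value + "</text>")));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // The value length can be raised from the command line; start modestly.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 1_000;
        String longValue = "a".repeat(n); // only 'a' characters, so no XML escaping is needed
        for (String[] c : new String[][] {{"unsafe", UNSAFE}, {"workaround", WORKAROUND}}) {
            long start = System.nanoTime();
            boolean ok = isValid(c[1], longValue);
            System.out.printf("%s: valid=%b, %.1f ms%n", c[0], ok, (System.nanoTime() - start) / 1e6);
        }
    }
}
{code}

Run it as `java PatternTiming 100000` to try a longer value. For any length above 255 both schemas should report the value as invalid; the printed times, which include schema compilation, show whether the facet ordering matters for the implementation in use.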
--
This message was sent by Atlassian Jira
(v8.20.1#820001)