Márk Petrényi created XERCESJ-1716:
--------------------------------------
Summary: Validating XML against XSD is slow for long strings if
pattern restrictions are defined, even if maxLength is restricted.
Key: XERCESJ-1716
URL: https://issues.apache.org/jira/browse/XERCESJ-1716
Project: Xerces2-J
Issue Type: Bug
Reporter: Márk Petrényi
Attachments: long_string.xml, unsafe.xsd, workaround.xsd
Validating XML against XSD is slow for long strings if pattern restrictions are
defined, even if maxLength is restricted.
We have the following simple type defined in our xsd (unsafe.xsd):
{code:xml}
<xsd:simpleType name="SimpleText255NotBlankType">
<xsd:annotation>
<xsd:documentation xml:lang="en">String of maximum 255 characters, not
blank</xsd:documentation>
</xsd:annotation>
<xsd:restriction base="xsd:string">
<xsd:minLength value="1"/>
<xsd:maxLength value="255"/>
<xsd:pattern value=".*[^\s].*"/>
</xsd:restriction>
</xsd:simpleType>
{code}
When a very long string (ca. 1,000,000 characters) is provided as a value in
the input XML, we would expect it to be rejected quickly because it exceeds
maxLength. Instead, validation takes several minutes, because the regex is
evaluated before the maxLength restriction is checked.
We found a workaround: define the simpleType in two derivation steps
(workaround.xsd):
{code:xml}
<xsd:simpleType name="SimpleText255Type">
<xsd:annotation>
<xsd:documentation xml:lang="en">String of maximum 255
characters</xsd:documentation>
</xsd:annotation>
<xsd:restriction base="xsd:string">
<xsd:minLength value="1"/>
<xsd:maxLength value="255"/>
<xsd:pattern value=".{1,255}"/>
</xsd:restriction>
</xsd:simpleType>
<xsd:simpleType name="SimpleText255NotBlankType">
<xsd:annotation>
<xsd:documentation xml:lang="en">String of maximum 255 characters, not
blank</xsd:documentation>
</xsd:annotation>
<xsd:restriction base="SimpleText255Type">
<xsd:pattern value=".*[^\s].*"/>
</xsd:restriction>
</xsd:simpleType>
{code}
The workaround only works because the XSSimpleType implementation builds a
Vector of the regex patterns, so the {{.{1,255}}} pattern is evaluated first.
It fails relatively quickly, so the time-consuming second regex is never
checked.
It would be great to have the regex pattern checked after the other XSD
restrictions (minLength, maxLength, etc.) have been validated, or to have
control over the validation order. That would avoid unnecessarily slow
validations and the need for a workaround based on undocumented behavior.
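To illustrate the requested ordering, here is a minimal sketch in plain Java. It is not Xerces code: the class {{FacetOrderSketch}} and its members are hypothetical, and {{java.util.regex}} only approximates the XSD regex dialect (for example, {{\s}} and {{.}} differ slightly). It shows the length facets being checked first, so the regex is never evaluated for an over-long value.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not how Xerces is implemented.
class FacetOrderSketch {

    // Java approximation of the XSD pattern .*[^\s].* from unsafe.xsd.
    static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");

    // Counts how often the regex was actually evaluated.
    static int patternEvaluations = 0;

    static boolean isValid(String value, int minLength, int maxLength, Pattern pattern) {
        // Count characters as code points rather than UTF-16 units.
        int length = value.codePointCount(0, value.length());
        if (length < minLength || length > maxLength) {
            return false; // rejected by the cheap length check, regex untouched
        }
        patternEvaluations++;
        return pattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'x');
        String longValue = new String(chars);

        System.out.println(isValid(longValue, 1, 255, NOT_BLANK)); // false
        System.out.println(isValid("hello", 1, 255, NOT_BLANK));   // true
        System.out.println(patternEvaluations);                    // 1
    }
}
```

With this ordering the cost of rejecting an over-long value depends only on counting its characters, not on the behavior of the regex engine.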
I attached the XSDs referenced above and an XML file containing a long string
value. The problem can be reproduced using the SourceValidator from the
Xerces2-J samples:
The original xsd with slow validation:
{code:java}
java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
{code}
The workaround xsd with normal run-time:
{code:java}
java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
{code}
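For anyone who prefers to time this without the attachments, a rough equivalent can be sketched with the standard JAXP validation API. This is only an approximation of the attached files: the root element {{text}}, the inline schema wrapper, and the class {{ValidationTimingSketch}} are assumptions made for illustration. Which implementation runs depends on the classpath; the JDK's built-in one is derived from Xerces, but I have not confirmed that it shows the same slowdown.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical harness; element name and schema wrapper are assumptions.
class ValidationTimingSketch {

    // Same facets as SimpleText255NotBlankType in unsafe.xsd, inlined on one element.
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xsd:element name=\"text\"><xsd:simpleType>"
        + "<xsd:restriction base=\"xsd:string\">"
        + "<xsd:minLength value=\"1\"/><xsd:maxLength value=\"255\"/>"
        + "<xsd:pattern value=\".*[^\\s].*\"/>"
        + "</xsd:restriction></xsd:simpleType></xsd:element></xsd:schema>";

    static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // The text must not contain XML markup characters such as '<' or '&'.
    static boolean isValid(String text) {
        Validator validator = compileSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader("<text>" + text + "</text>")));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Length of the generated value; pass a larger number to probe the slowdown.
        int length = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        char[] chars = new char[length];
        Arrays.fill(chars, 'x');
        long start = System.nanoTime();
        boolean valid = isValid(new String(chars));
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("valid=" + valid + " after " + millis + " ms");
    }
}
```

Running it with increasing lengths (for example 1000, 100000, 1000000) gives a rough picture of how the rejection time grows with the input size.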