[
https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mukul Gandhi updated XERCESJ-1716:
----------------------------------
Issue Type: Improvement (was: Bug)
I'm changing the issue type of this report to Improvement, since it describes
a possible performance improvement.
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESJ-1716
> URL: https://issues.apache.org/jira/browse/XERCESJ-1716
> Project: Xerces2-J
> Issue Type: Improvement
> Reporter: Márk Petrényi
> Priority: Major
> Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255 characters, not
> blank</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="xsd:string">
> <xsd:minLength value="1"/>
> <xsd:maxLength value="255"/>
> <xsd:pattern value=".*[^\s].*"/>
> </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a very long string (ca. 1000000 characters) is
> provided as a value in the input XML, we would expect it to be rejected
> quickly because of its length. Instead, validation takes several minutes,
> since the regex is evaluated before the maxLength restriction.
> We found a workaround for the issue if we define the simpleType this way
> (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255
> characters</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="xsd:string">
> <xsd:minLength value="1"/>
> <xsd:maxLength value="255"/>
> <xsd:pattern value=".{1,255}"/>
> </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255 characters, not
> blank</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="SimpleText255Type">
> <xsd:pattern value=".*[^\s].*"/>
> </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the implementation of XSSimpleType builds
> a Vector of the regex patterns, so the {{.{1,255}}} pattern is evaluated
> first. It fails relatively quickly, and thus the time-consuming second regex
> is never checked.
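The short-circuit described above can be sketched as follows. This is only an illustration under stated assumptions: `OrderedPatterns` and `firstFailing` are hypothetical names, `java.util.regex` stands in for the Xerces regex engine, and the ordering (base-type pattern first) is taken from the description in the report, not from the Xerces source.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration; this is not Xerces code. */
final class OrderedPatterns {

    /** Returns the index of the first pattern that rejects the value, or -1 if all match. */
    static int firstFailing(List<Pattern> patterns, String value) {
        for (int i = 0; i < patterns.size(); i++) {
            if (!patterns.get(i).matcher(value).matches()) {
                return i; // later patterns are never evaluated
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed order: the base type's pattern comes before the derived type's pattern.
        List<Pattern> patterns = List.of(
                Pattern.compile(".{1,255}"),      // from SimpleText255Type
                Pattern.compile(".*[^\\s].*"));   // from SimpleText255NotBlankType

        String longValue = "a".repeat(1_000_000);
        // The bounded pattern rejects the value, so the second pattern is skipped.
        System.out.println(firstFailing(patterns, longValue));
    }
}
```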
> It would be great to have the regex pattern checked after the other XSD
> restrictions (minLength, maxLength, etc.) have been validated, or to have
> control over the validation order, thus avoiding unnecessarily slow
> validations and the use of a workaround based on undocumented features.
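A minimal sketch of the requested ordering, assuming the length facets are checked before any pattern is evaluated. The class `CheapFacetsFirst`, its method, and the evaluation counter are hypothetical and exist only for illustration; `java.util.regex` again stands in for the XSD regex engine, and nothing here reflects how Xerces is actually implemented.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration of "cheap facets first"; this is not Xerces code. */
final class CheapFacetsFirst {

    /** Counts pattern evaluations so the short-circuit can be observed. */
    static int patternEvaluations = 0;

    static boolean isValid(String value, int minLength, int maxLength, List<Pattern> patterns) {
        // XSD measures string length in characters, so count code points
        // rather than UTF-16 units. This is a single linear pass.
        int length = value.codePointCount(0, value.length());
        if (length < minLength || length > maxLength) {
            return false; // rejected before any regex runs
        }
        for (Pattern pattern : patterns) {
            patternEvaluations++;
            if (!pattern.matcher(value).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = List.of(Pattern.compile(".*[^\\s].*"));

        System.out.println(isValid("a".repeat(1_000_000), 1, 255, patterns)); // false (too long)
        System.out.println(isValid("hello", 1, 255, patterns));               // true
        System.out.println(isValid("   ", 1, 255, patterns));                 // false (blank)
        System.out.println("pattern evaluations: " + patternEvaluations);
    }
}
```

With this ordering the oversized value never reaches the regex, so its cost is bounded by the length check alone.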
> I attached the XSDs referenced above and an XML file containing a long
> string value. The problem can be reproduced using the SourceValidator from
> the Xerces2-J samples:
> The original xsd with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround xsd with normal run-time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
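For measuring the difference without the sample class, the same check can be sketched with the standard JAXP validation API. `TimedValidation` is a hypothetical name, and which schema implementation gets loaded (the JDK's built-in one or Xerces2-J) depends on the classpath, so timings may differ from those in the report.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical timing harness; not part of the Xerces2-J samples. */
final class TimedValidation {

    /** Validates the instance against the schema and prints the elapsed time. */
    static boolean isValid(Source schemaSource, Source xmlSource)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaSource); // throws if the schema itself is broken
        Validator validator = schema.newValidator();

        long start = System.nanoTime();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(xmlSource);
            return true;
        } catch (SAXException e) {
            System.err.println(e.getMessage());
            return false;
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("validation took %d ms%n", elapsedMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = schema file (e.g. unsafe.xsd), args[1] = instance (e.g. long_string.xml)
        boolean valid = isValid(new StreamSource(new File(args[0])),
                                new StreamSource(new File(args[1])));
        System.out.println(valid ? "valid" : "invalid");
    }
}
```

Running it once with unsafe.xsd and once with workaround.xsd against long_string.xml should show the two timings side by side.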