[
https://issues.apache.org/jira/browse/XERCESJ-1716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968107#comment-16968107
]
Mukul Gandhi commented on XERCESJ-1716:
---------------------------------------
Out of curiosity, I tried your unsafe.xsd schema with the type
"SimpleText255NotBlankType" rewritten as follows:
<xs:simpleType name="SimpleText255NotBlankType">
<xs:annotation>
<xs:documentation xml:lang="en">String of maximum 255 characters, not
blank</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:minLength value="1"/>
<xs:maxLength value="255"/>
<xs:assertion test="not(contains($value, ' '))"/>
</xs:restriction>
</xs:simpleType>
(I'm using an XSD 1.1 <assertion> facet instead of <pattern>.)
This validates quickly against the .xml document you've posted,
so it seems to be another workaround for your use case.
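If the value is also checked in application code, the cheap length bounds can be tested before the regex there as well. The following is only a minimal sketch, assuming the string has already been extracted from the document. The class and method names are hypothetical, and java.util.regex is not the regex engine Xerces uses, so it illustrates the ordering and nothing more:

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Xerces.
class LengthFirstCheck {
    // Same expression as the <pattern> facet in unsafe.xsd.
    private static final Pattern NOT_BLANK = Pattern.compile(".*[^\\s].*");

    static boolean isAcceptable(String value) {
        // Cheap bounds check first, mirroring minLength=1 and maxLength=255.
        // Note: String.length() counts UTF-16 code units, which can differ
        // from the XSD character count for supplementary characters.
        if (value.length() < 1 || value.length() > 255) {
            return false;
        }
        // Only strings of at most 255 code units ever reach the regex.
        return NOT_BLANK.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("hello"));
        System.out.println(isAcceptable("   "));
    }
}
```

With this ordering an over-long value is rejected without the pattern being evaluated at all.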
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESJ-1716
> URL: https://issues.apache.org/jira/browse/XERCESJ-1716
> Project: Xerces2-J
> Issue Type: Bug
> Reporter: Márk Petrényi
> Priority: Major
> Attachments: long_string.xml, unsafe.xsd, workaround.xsd
>
>
> Validating XML against XSD is slow for long strings if pattern restrictions
> are defined, even if maxLength is restricted.
> We have the following simple type defined in our xsd (unsafe.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255NotBlankType">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255 characters, not
> blank</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="xsd:string">
> <xsd:minLength value="1"/>
> <xsd:maxLength value="255"/>
> <xsd:pattern value=".*[^\s].*"/>
> </xsd:restriction>
> </xsd:simpleType>
> {code}
> The problem is that when a very long string (ca. 1000000 characters) is provided
> as a value in the input xml, we would expect it to be rejected
> quickly because of its length. In practice the validation takes several minutes,
> since the regex gets evaluated before the maxLength restriction.
> We found a workaround for the issue: defining the simpleType this way
> (workaround.xsd):
> {code:xml}
> <xsd:simpleType name="SimpleText255Type">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255
> characters</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="xsd:string">
> <xsd:minLength value="1"/>
> <xsd:maxLength value="255"/>
> <xsd:pattern value=".{1,255}"/>
> </xsd:restriction>
> </xsd:simpleType>
> <xsd:simpleType name="SimpleText255NotBlankType">
> <xsd:annotation>
> <xsd:documentation xml:lang="en">String of maximum 255 characters, not
> blank</xsd:documentation>
> </xsd:annotation>
> <xsd:restriction base="SimpleText255Type">
> <xsd:pattern value=".*[^\s].*"/>
> </xsd:restriction>
> </xsd:simpleType>
> {code}
> The workaround only works because the implementation of XSSimpleType
> builds a Vector of the regex patterns, so the {{.{1,255}}} pattern is
> evaluated first. It fails relatively quickly, and the time-consuming
> second regex is never checked.
> It would be great to have the regex pattern checked after the other
> xsd restrictions (minLength, maxLength, etc.) have been validated, or to have
> control over the validation ordering, thus avoiding unnecessarily slow
> validations and the use of a workaround based on undocumented features.
> I attached the XSDs referenced above and an XML document containing a long
> string value. The problem can be reproduced using the SourceValidator from the
> Xerces2-J samples:
> The original xsd with slow validation:
> {code:java}
> java jaxp.SourceValidator -a unsafe.xsd -i long_string.xml
> {code}
> The workaround xsd with normal run-time:
> {code:java}
> java jaxp.SourceValidator -a workaround.xsd -i long_string.xml
> {code}
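The same type can also be exercised programmatically through the standard JAXP validation API (javax.xml.validation). The following is a minimal sketch, assuming an inline copy of the type from unsafe.xsd (annotation omitted) and a hypothetical <text> wrapper element that is not in the attached files. It uses short inputs, so it shows which values the schema accepts rather than reproducing the slowdown:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class NotBlankTypeDemo {
    // Inline copy of SimpleText255NotBlankType from unsafe.xsd. The <text>
    // element is a hypothetical wrapper so there is something to validate.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='text' type='SimpleText255NotBlankType'/>"
      + "<xsd:simpleType name='SimpleText255NotBlankType'>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:minLength value='1'/>"
      + "<xsd:maxLength value='255'/>"
      + "<xsd:pattern value='.*[^\\s].*'/>"
      + "</xsd:restriction>"
      + "</xsd:simpleType>"
      + "</xsd:schema>";

    // Returns true if <text>value</text> validates against the schema.
    // The value must not contain XML markup characters such as '<' or '&'.
    static boolean isValid(String value) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        String xml = "<text>" + value + "</text>";
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The default error handler throws on the first validation error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String tooLong = new String(new char[300]).replace('\0', 'a');
        System.out.println("hello   -> " + isValid("hello"));
        System.out.println("blank   -> " + isValid("   "));
        System.out.println("300 x a -> " + isValid(tooLong));
    }
}
```

Replacing the 300-character value with one of about 1000000 characters should correspond to the long_string.xml case, although whether and how strongly the slowdown shows up depends on the validator implementation on the classpath.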