[
https://issues.apache.org/jira/browse/XERCESJ-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741769#comment-16741769
]
Mukul Gandhi commented on XERCESJ-1705:
---------------------------------------
Here are my findings after analyzing your bug report and its file attachments.
Your XML file is about 2.3 MB in size. That is a reasonably sized XML file, not
a large one, and asserts should run fast on such a file. Within your XML, there
are 91000 sibling A elements in total, and the structure of each element A is
very shallow. The assert in your XSD is evaluated once for each element A. On my
localhost, the XSD validation took about 25 sec (the same order as the timing
you've reported).
I think a single assert evaluation on one A takes a tiny amount of time. The
total time (25 sec. or 20 sec.) for all assert evaluations is the time needed
to repeat one fast assert 91000 times. I don't think this is a performance bug
in Xerces. As a comparison, consider the following Java code (a simple
repetition done 91000 times):
    long start = System.currentTimeMillis();
    for (int idx = 0; idx < 91000; idx++) {
        System.out.println(idx);
    }
    long end = System.currentTimeMillis();
    System.out.println((end - start) + "ms");
The time reported by this code on my localhost is about 15 sec. Is this a
performance bug in this Java code? I don't think so. There are simply too many
repetitions.
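For a sense of scale, the per-assert cost implied by the figures above can be
worked out directly. The sketch below is only arithmetic on the reported totals
(25 sec. and 20 sec. over 91000 elements); the class and method names are
made up for illustration. It gives roughly 0.27 ms and 0.22 ms per assert.

```java
class PerAssertCost {
    // Average cost of one evaluation in milliseconds, given a total in seconds.
    static double perEvaluationMillis(double totalSeconds, int evaluations) {
        return totalSeconds * 1000.0 / evaluations;
    }

    public static void main(String[] args) {
        int elements = 91000; // number of sibling A elements in the reported XML
        System.out.println("25 sec total: " + perEvaluationMillis(25, elements) + " ms per assert");
        System.out.println("20 sec total: " + perEvaluationMillis(20, elements) + " ms per assert");
    }
}
```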
If we remove your assert from its original place and instead place an assert as
shown in the XSD fragment below (copied from your example),
    <xsd:element name="root">
      <xsd:complexType>
        <xsd:sequence ...>
          ...
        </xsd:sequence>
        <xsd:assert test="every $a in A satisfies ((not($a/B) and $a/C) or ($a/B and not($a/C)))"/>
      </xsd:complexType>
    </xsd:element>
This new assert conceptually does the same thing as your assert, but it is
evaluated only once (on a much bigger tree fragment). With this change, the XML
validation time on my localhost is about 6.5 sec (a huge improvement over the
original time).
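To reproduce timings like these, a small harness around the standard
javax.xml.validation API may help. This is only a sketch, not the setup used
for the numbers above: the class name and the command-line usage are
hypothetical, and it assumes the Xerces-J build with XSD 1.1 support is on the
classpath so that the 1.1 schema language URI resolves to a SchemaFactory.

```java
import java.io.File;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

class ValidationTimer {
    // Schema language URI for XSD 1.1; assumes a Xerces-J XSD 1.1 build is on the classpath.
    static final String XSD11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    // Compiles the schema, validates the instance once, and returns the
    // elapsed validation time in milliseconds (schema compilation excluded).
    static long timeValidation(String schemaLanguage, StreamSource xsd, StreamSource xml)
            throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(schemaLanguage);
        Schema schema = factory.newSchema(xsd);
        Validator validator = schema.newValidator();
        long start = System.nanoTime();
        validator.validate(xml); // throws SAXException if the instance is invalid
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java ValidationTimer PROBLEM.xsd PROBLEM.xml
        long ms = timeValidation(XSD11,
                new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])));
        System.out.println(ms + " ms");
    }
}
```

Running it once with the per-A assert and once with the single assert on root
gives the two figures to compare. A single run includes JVM warm-up, so
repeated runs would give steadier numbers.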
My conclusion is that the issue reported by your examples is not a Xerces
performance bug.
> Validation against asserts (1.1) is slow and takes up a lot of memory for
> larger files.
> ---------------------------------------------------------------------------------------
>
> Key: XERCESJ-1705
> URL: https://issues.apache.org/jira/browse/XERCESJ-1705
> Project: Xerces2-J
> Issue Type: Bug
> Components: XML Schema 1.1 Structures
> Affects Versions: 2.12.0
> Reporter: Gerben Abbink
> Priority: Major
> Attachments: PROBLEM.xml, PROBLEM.xsd
>
>
> The validation of xml against asserts in XMLSchema 1.1 is slow and takes up a
> lot of memory for larger xml files. I have created a simple test xml file
> with lots of repetition and a corresponding xml schema to show the problem.
> It takes 20 sec. to validate the xml against the xml schema. When I remove
> the asserts in the xml schema it takes just 1 second to validate. Testing was
> done from the command prompt on a modern Windows machine with 8GByte memory.
> To compare, I have also validated the xml file against the xml schema in
> XMLSpy. With asserts it takes 2 sec., without the asserts 1 sec. (XMLSpy does
> not use Xerces.)
>