[
https://issues.apache.org/jira/browse/XERCESJ-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741769#comment-16741769
]
Mukul Gandhi edited comment on XERCESJ-1705 at 1/15/19 6:02 AM:
----------------------------------------------------------------
Here are my findings after analyzing your bug report, using the file
attachments you provided. Your XML file is about 2.3 MB in size. That is a
reasonably sized file, not a large one, and asserts should run fast on such an
XML file. Your XML contains 91000 sibling A elements. The structure of element
A is very shallow (i.e. it has very little depth). The assert in your XSD is
evaluated on each element A. On my localhost, the XSD validation took about
25 sec (the same order as the timing you've reported).
I think an assert evaluation on a single A element takes very little time. The
total time (25 sec. or 20 sec.) for all assert evaluations is the time needed
to repeat one fast assert 91000 times. I don't think this is a performance bug
in Xerces. As a comparison, consider the following Java code (a simple
repetition done 91000 times),
    long start = System.currentTimeMillis();
    for (int idx = 0; idx < 91000; idx++) {
        System.out.println(idx);
    }
    long end = System.currentTimeMillis();
    System.out.println((end - start) + "ms");
The time reported by this code on my localhost is about 15 sec. Is this a
performance bug in this Java code? I don't think so. There are simply too many
repetitions.
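For anyone who wants to repeat this kind of comparison on their own machine,
here is a minimal sketch of a reusable timing helper. The names LoopTiming and
timeMillis are made up for illustration and have nothing to do with Xerces.
Keep in mind that a loop like this spends most of its time writing to the
console, so the number it reports will likely depend a lot on the terminal.

```java
import java.util.function.IntConsumer;

class LoopTiming {
    // Runs body for idx = 0 .. repetitions-1 and returns the elapsed
    // wall-clock time in milliseconds. (Hypothetical helper, for illustration.)
    static long timeMillis(int repetitions, IntConsumer body) {
        long start = System.nanoTime();
        for (int idx = 0; idx < repetitions; idx++) {
            body.accept(idx);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Same repetition count as the number of A elements in the report.
        long ms = timeMillis(91000, idx -> System.out.println(idx));
        System.out.println(ms + " ms");
    }
}
```

Swapping the println for a cheaper body gives a rough idea of how much of the
total is per-iteration work and how much is output.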
If we remove your assert from its original place and instead place an assert
as shown in the XSD fragment below (the non-assert parts are copied from your
example),
    <xsd:element name="root">
      <xsd:complexType>
        <xsd:sequence ...>
          ...
        </xsd:sequence>
        <xsd:assert test="every $a in A satisfies ((not($a/B) and $a/C) or
                          ($a/B and not($a/C)))"/>
      </xsd:complexType>
    </xsd:element>
This new assert conceptually does the same thing as your assert, but it is
evaluated only once (on a much bigger XML tree fragment). The XML validation
time after this change, on my localhost, is about 6.5 sec (a huge improvement
compared to the original time).
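To make the condition in that assert concrete, here is a small sketch in plain
Java (JDK DOM only, no Xerces assert engine involved) that checks the same
rule: every A child of the root has exactly one of B or C. The class and
method names are hypothetical, and the sketch assumes unqualified element
names with A sitting directly under the root, as the XPath in the fragment
suggests.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ExactlyOneOfBOrC {
    // True if parent has a direct child element with the given name.
    static boolean hasChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    // Mirrors: every $a in A satisfies
    //   ((not($a/B) and $a/C) or ($a/B and not($a/C)))
    static boolean everyAHasExactlyOne(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Element root = doc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && "A".equals(n.getNodeName())) {
                Element a = (Element) n;
                // The rule fails when B and C are both present or both absent.
                if (hasChild(a, "B") == hasChild(a, "C")) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(everyAHasExactlyOne("<root><A><B/></A><A><C/></A></root>")); // true
        System.out.println(everyAHasExactlyOne("<root><A><B/><C/></A></root>"));        // false
    }
}
```

The single pass over all A elements here corresponds to the one evaluation of
the "every ... satisfies" expression on root, as opposed to one separate
evaluation per A.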
The title of your bug report says, "takes up a lot of memory for larger files".
Based on your sample files and my explanation above, I think that's not true.
The longer time taken to report the final validation outcome is due to the
large number of 'assert' evaluations (i.e. iterations). I have a feeling that
the memory used by the JVM during this process is not high.
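Since the memory point is only a feeling, one rough way to check it would be
to compare used heap before and after the validation call. The sketch below
is an assumption-laden illustration: HeapProbe and its methods are made-up
names, the Runnable stands in for whatever code runs the validation, and the
reading is approximate because the garbage collector can run at any time.

```java
class HeapProbe {
    // Heap currently in use by the JVM, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs task and returns the approximate change in used heap, in bytes.
    // The result can be negative if a garbage collection happens during the run.
    static long heapDelta(Runnable task) {
        System.gc(); // only a hint to the JVM, not a guarantee
        long before = usedHeapBytes();
        task.run();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the code that runs the validation.
        long delta = heapDelta(() -> {
            byte[] block = new byte[8 * 1024 * 1024];
            block[0] = 1;
        });
        System.out.println("approx. heap delta: " + (delta / 1024) + " KB");
    }
}
```

This measures the difference at the end of the run, not the peak, so a
profiler or the JVM's -verbose:gc output would give a more reliable picture.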
My conclusion is that the issue reported by your examples is not a Xerces
performance bug.
> Validation against asserts (1.1) is slow and takes up a lot of memory for
> larger files.
> ---------------------------------------------------------------------------------------
>
> Key: XERCESJ-1705
> URL: https://issues.apache.org/jira/browse/XERCESJ-1705
> Project: Xerces2-J
> Issue Type: Bug
> Components: XML Schema 1.1 Structures
> Affects Versions: 2.12.0
> Reporter: Gerben Abbink
> Priority: Major
> Attachments: PROBLEM.xml, PROBLEM.xsd
>
>
> The validation of XML against asserts in XML Schema 1.1 is slow and takes up
> a lot of memory for larger XML files. I have created a simple test XML file
> with lots of repetition and a corresponding XML schema to show the problem.
> It takes 20 sec. to validate the XML against the XML schema. When I remove
> the asserts in the XML schema it takes just 1 second to validate. Testing was
> done from the command prompt on a modern Windows machine with 8 GB of memory.
> To compare, I have also validated the XML file against the XML schema in
> XMLSpy. With asserts it takes 2 sec., without the asserts 1 sec. (XMLSpy does
> not use Xerces.)
>