[ 
https://issues.apache.org/jira/browse/XERCESJ-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16741769#comment-16741769
 ] 

Mukul Gandhi edited comment on XERCESJ-1705 at 1/15/19 5:54 AM:
----------------------------------------------------------------

Here are my findings after analyzing your bug report, using the file 
attachments you provided. Your XML file is about 2.3 MB in size. This is a 
reasonably sized XML file, not a large one, and asserts should run fast on 
such a file. Within your XML, there are 91000 sibling A elements. The 
structure of element A is very shallow (i.e. its height is very small). The 
assert in your XSD is evaluated on each element A. On my localhost, the XSD 
validation took about 25 sec (the same order of magnitude as the timing you've 
reported).

I think an assert evaluation on one A XML element takes very little time. The 
total time (25 sec. or 20 sec.) for all assert evaluations is the time needed 
to repeat one fast assert 91000 times. I don't think this is a performance bug 
in Xerces. As a comparison, consider the following Java code (a simple 
repetition done 91000 times):

long start = System.currentTimeMillis();
for (int idx = 0; idx < 91000; idx++) {
    System.out.println(idx);
}
long end = System.currentTimeMillis();
System.out.println((end - start) + "ms");

The time reported by this code on my localhost is about 15 sec. Is this a 
performance bug in this Java code? I don't think so. There are simply too many 
repetitions.

If we remove your assert from its original place and instead put an assert on 
the root element, as shown in the XSD fragment below (the non-assert parts are 
copied from your example),

<xsd:element name="root">
   <xsd:complexType>
      <xsd:sequence ...>
         ...
      </xsd:sequence>
      <xsd:assert test="every $a in A satisfies ((not($a/B) and $a/C) or
                        ($a/B and not($a/C)))"/>
   </xsd:complexType>
</xsd:element>

This new assert conceptually does the same thing as your assert, but it is 
evaluated only once (on a much bigger XML tree fragment). With this change, 
the XML validation time on my localhost is about 6.5 sec (a large improvement 
over the original 25 sec).
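
To illustrate why the two assert placements give the same answer, here is a 
minimal sketch that uses only the JDK's built-in XPath 1.0 engine. It is not 
Xerces' XSD 1.1 assert processor (which evaluates XPath 2.0), so the root-level 
expression is an XPath 1.0 rewrite of the "every ... satisfies" test. The class 
name, the helper names and the tiny inline documents are hypothetical, chosen 
only for illustration. The sketch compares results, not timings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Xerces.
final class AssertEquivalenceSketch {

    // Condition checked per A element: exactly one of B or C is present.
    static final String PER_A = "(not(B) and C) or (B and not(C))";

    // XPath 1.0 stand-in (assumption) for "every $a in A satisfies ...":
    // true when no A child violates the per-element condition.
    static final String ON_ROOT = "not(A[not(" + PER_A + ")])";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Mimics an assert placed on A: one XPath evaluation per A element.
    static boolean checkEachA(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression cond = xpath.compile(PER_A);
        NodeList as = (NodeList) xpath.evaluate("/root/A", doc, XPathConstants.NODESET);
        for (int i = 0; i < as.getLength(); i++) {
            Boolean holds = (Boolean) cond.evaluate(as.item(i), XPathConstants.BOOLEAN);
            if (!holds) {
                return false;
            }
        }
        return true;
    }

    // Mimics an assert placed on root: a single XPath evaluation.
    static boolean checkOnRoot(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(ON_ROOT, doc.getDocumentElement(),
                XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        Document ok = parse("<root><A><B/></A><A><C/></A></root>");
        Document bad = parse("<root><A><B/></A><A><B/><C/></A></root>");
        System.out.println(checkEachA(ok) + " " + checkOnRoot(ok));   // true true
        System.out.println(checkEachA(bad) + " " + checkOnRoot(bad)); // false false
    }
}
```

Both checks accept the first document and reject the second, where one A 
contains both B and C. The per-element variant performs one evaluation for each 
A, while the root variant performs a single evaluation over all A children.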

My conclusion is that the issue reported by your examples is not a Xerces 
performance bug.


was (Author: mukul_gandhi):
Here are my findings after analyzing your bug report with the file attachments. 
Your XML file is about 2.3 MB in size. This is a reasonably sized XML file, 
not a large one, and asserts should run fast on such a file. Within your XML, 
there are 91000 sibling A elements. The structure of element A is very 
shallow. The assert in your XSD is evaluated on each element A. On my 
localhost, the XSD validation took about 25 sec (the same order of magnitude 
as the timing you've reported).

I think an assert evaluation on one A takes very little time. The total time 
(25 sec. or 20 sec.) for all assert evaluations is the time needed to repeat 
one fast assert 91000 times. I don't think this is a performance bug in Xerces. 
As a comparison, consider the following Java code (a simple repetition done 
91000 times):

long start = System.currentTimeMillis();
for (int idx = 0; idx < 91000; idx++) {
    System.out.println(idx);
}
long end = System.currentTimeMillis();
System.out.println((end - start) + "ms");

The time reported by this code on my localhost is about 15 sec. Is this a 
performance bug in this Java code? I don't think so. There are simply too many 
repetitions.

If we remove your assert from its original place and instead put an assert on 
the root element, as shown in the XSD fragment below (copied from your 
example),

<xsd:element name="root">
   <xsd:complexType>
      <xsd:sequence ...>
         ...
      </xsd:sequence>
      <xsd:assert test="every $a in A satisfies ((not($a/B) and $a/C) or
                        ($a/B and not($a/C)))"/>
   </xsd:complexType>
</xsd:element>

This new assert conceptually does the same thing as your assert, but it is 
evaluated only once (on a much bigger tree fragment). With this change, the 
XML validation time on my localhost is about 6.5 sec (a large improvement over 
the original 25 sec).

My conclusion is that the issue reported by your examples is not a Xerces 
performance bug.

> Validation against asserts (1.1) is slow and takes up a lot of memory for 
> larger files.
> ---------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1705
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1705
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.1 Structures
>    Affects Versions: 2.12.0
>            Reporter: Gerben Abbink
>            Priority: Major
>         Attachments: PROBLEM.xml, PROBLEM.xsd
>
>
> The validation of XML against asserts in XML Schema 1.1 is slow and takes up 
> a lot of memory for larger XML files. I have created a simple test XML file 
> with lots of repetition and a corresponding XML schema to show the problem.
> It takes 20 sec. to validate the XML against the XML schema. When I remove 
> the asserts in the XML schema, it takes just 1 second to validate. Testing 
> was done from the command prompt on a modern Windows machine with 8 GB of 
> memory.
> To compare, I have also validated the XML file against the XML schema in 
> XMLSpy. With asserts it takes 2 sec., without the asserts 1 sec. (XMLSpy does 
> not use Xerces.)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
