[ 
https://issues.apache.org/jira/browse/XERCESJ-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258027#comment-16258027
 ] 

Mukul Gandhi commented on XERCESJ-1684:
---------------------------------------

I have a few observations, listed below.

1) Case 1: I commented out all of your <assert> tags and added my own <assert> 
tag, as shown below:

<xs:element name="AuditFile">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="Header" minOccurs="1" />
                <xs:element name="MasterFiles">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element ref="GeneralLedgerAccounts" minOccurs="0" />
                            <xs:element ref="Customer" minOccurs="0" maxOccurs="unbounded" />
                            <xs:element ref="Supplier" minOccurs="0" maxOccurs="unbounded" />
                            <xs:element ref="Product" minOccurs="0" maxOccurs="unbounded" />
                            <xs:element ref="TaxTable" minOccurs="0" />
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element ref="GeneralLedgerEntries" minOccurs="0" />
                <xs:element ref="SourceDocuments" minOccurs="0" />
            </xs:sequence>
            <xs:assert test="1 = 1" />
        </xs:complexType>
        ... more XSD

With this change, when I run Xerces XSD 1.1 validation using your XML and XSD, 
I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

(My workstation has a fairly decent development configuration.)
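
One thing worth checking with an error like this: the heap available to the 
validator is bounded by the JVM's heap settings, not by the workstation's 
physical memory. Below is a minimal sketch that prints the limits the JVM is 
actually running with. The class name HeapReport and the helper describe() are 
illustrative only and are not part of Xerces.

```java
class HeapReport {
    // Formats a byte count as whole mebibytes.
    static String describe(long bytes) {
        return (bytes / (1024L * 1024L)) + " MB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to (controlled by -Xmx).
        System.out.println("max heap:       " + describe(rt.maxMemory()));
        // Heap currently reserved by the JVM (starts at -Xms).
        System.out.println("reserved heap:  " + describe(rt.totalMemory()));
        // Unused portion of the reserved heap.
        System.out.println("free in heap:   " + describe(rt.freeMemory()));
    }
}
```

Running it with and without an explicit -Xmx value shows whether the setting 
is reaching the process that performs the validation.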

In the Xerces XSD 1.1 implementation, whenever an <assert> is encountered 
during a validation episode, we construct a DOM for the part of the document 
that the <assert> applies to and evaluate the assertion against it. For my 
example <assert> above, that DOM spans the entire XML document. Your XML 
document is 53.3 MB, which is huge. The memory needed to process the <assert> 
comes on top of the resources consumed by the other validation tasks.
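
To illustrate why the position of an <assert> matters, here is a rough sketch 
that compares the number of DOM nodes under the root element with the number 
under a single lower-level element. It is not the Xerces internal code. The 
class AssertScope, the helper names, and the tiny sample document are all made 
up for the illustration, and it only uses the standard JAXP DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AssertScope {
    // Hypothetical miniature of the document shape (no whitespace between tags).
    static final String SAMPLE =
        "<AuditFile><Header/><MasterFiles>"
        + "<Customer><CustomerID>1</CustomerID></Customer>"
        + "<Customer><CustomerID>2</CustomerID></Customer>"
        + "</MasterFiles></AuditFile>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the node itself plus all element and text descendants.
    static int countNodes(Node node) {
        int count = 1;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        Node root = doc.getDocumentElement();
        Node firstCustomer = doc.getElementsByTagName("Customer").item(0);
        // An assertion on the root needs every node; one on Customer needs only its subtree.
        System.out.println("nodes under AuditFile: " + countNodes(root));
        System.out.println("nodes under one Customer: " + countNodes(firstCustomer));
    }
}
```

On a 53.3 MB document the difference between the two counts is what separates 
a small, short-lived tree from one that holds the whole input in memory.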

Your XSD document is also pretty large, at about 1644 lines.

2) Case 2: When the XSD document you posted contains no <assert> at all, 
validation completes successfully in about 21.5 seconds from the command line. 
I would call this high latency. If this validation were part of a web app, for 
example, that response time would be considered too long for typical 
requirements.

My overall recommendations are therefore:
1) Use <assert> at lower levels of the XML tree (if you can).
2) Refactor your XSD into multiple XSDs and join them via <include>/<import>. 
I'm not sure whether this will solve your current problem, but it will at least 
improve the design of your XSDs.
3) Experiment with the JVM parameters -Xms and -Xmx when running validation.
4) Design the validation system so that it takes smaller XML documents as 
input. I imagine you could split the input into multiple XML documents (you can 
start with 2-3) and then chain multiple validation episodes.
5) You could also move some validation tasks to other parts of the application 
architecture instead of doing them in XSD.
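
As a rough sketch of suggestion 4, the code below compiles a schema once and 
then runs a separate validation episode for each small document, collecting 
the failures. To stay runnable on a stock JDK it uses the built-in XSD 1.0 
SchemaFactory. With the Xerces XSD 1.1 build you would obtain an XSD 1.1 
SchemaFactory instead. The class name ChunkedValidation, the tiny Customer 
schema, and the sample documents are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ChunkedValidation {
    // Hypothetical schema: a Customer must contain exactly one CustomerID.
    static final String CUSTOMER_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Customer'><xs:complexType><xs:sequence>"
        + "<xs:element name='CustomerID' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compile the schema once; the Schema object can be reused for every chunk.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // One validation episode per document; returns a message for each invalid one.
    static List<String> validateAll(Schema schema, List<String> documents) throws IOException {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(documents.get(i))));
            } catch (SAXException e) {
                // With no ErrorHandler set, the first validation error is thrown.
                failures.add("document " + i + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        List<String> documents = Arrays.asList(
            "<Customer><CustomerID>1</CustomerID></Customer>",
            "<Customer/>");
        for (String failure : validateAll(compile(CUSTOMER_XSD), documents)) {
            System.out.println(failure);
        }
    }
}
```

Because each Validator only sees one small document, the memory needed per 
episode is bounded by the size of the largest chunk rather than by the whole 
input.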

IMHO, this bug report doesn't warrant a fix. I'm sorry if you are not happy 
with my reply.

> Very high memory usage validating XSD 1.1 (+memory leak)
> --------------------------------------------------------
>
>                 Key: XERCESJ-1684
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1684
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: JAXP (javax.xml.validation)
>         Environment: windows java 1.8
>            Reporter: Simon Sprott
>            Priority: Critical
>         Attachments: SAFTPT_1_04_01_XSD11_Full.zip, tmp.zip
>
>
> using the 1.1 code branch built from
> http://svn.apache.org/repos/asf/xerces/java/branches/xml-schema-1.1-dev
> Using the built in sample validator 
> java jaxp.SourceValidator -xsd11 -a "SAFTPT_1_04_01_XSD11_Full.xsd" -i 
> "tmp.xml"
> The validation consumes huge amounts of memory (breaks at 10 GB on my 
> system), on smaller sample files validation does complete, but still consumes 
> a very large quantity of memory.
> Furthermore it seems to retain the memory allocated via a reference in 
> org.eclipse.wst.xml.xpath2.processor.internal.DefaultRSFactory._factory after 
> the validator has completed (this is not evident in the cmd line sample as 
> the process ends, but I have observed it in my own code).
> If I was to guess I would say that the results of the XPath queries are being 
> cached via DefaultRSFactory and never released.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

