[
https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425287#comment-17425287
]
Christopher Sahnwaldt edited comment on XERCESJ-1227 at 10/7/21, 12:19 AM:
---------------------------------------------------------------------------
For what it's worth, I finally uploaded the sparse array version of CMStateSet
I wrote 15 years ago to GitHub:
[https://github.com/jcsahnwaldt/xerces-sparse-CMStateSet/blob/master/src/org/apache/xerces/impl/dtd/models/CMStateSet.java]
I copied the XML and XSD test files provided by [~mukul_gandhi] above. On my
machine, validating test.xml with maxOccurs.xsd takes about 7.5 to 8 seconds
with the original Xerces code, and 1.7 to 2 seconds with the sparse CMStateSet.
Still much slower than the {{assert count(*) le 5000}} solution, but maybe it
helps.
Disclaimer: I haven't tested the CMStateSet code thoroughly. It seems to work
well, but it may also be slower than the original version in other use cases.
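For readers unfamiliar with the idea, here is a minimal sketch of what a sparse state set can look like. It is not the code from the repository linked above: the class name {{SparseStateSet}} and its methods are hypothetical, and it assumes only that few bits are set relative to the total number of DFA positions. It keeps the indexes of the set bits in a sorted array, so memory and union cost depend on the number of set bits rather than on the size of the expanded content model.

```java
import java.util.Arrays;

/** Hypothetical sparse set of state positions; an illustration, not Xerces code. */
final class SparseStateSet {
    // Sorted, duplicate-free indexes of the bits that are currently set.
    private int[] bits = new int[4];
    private int size = 0;

    /** Returns true if the bit at the given index is set (binary search). */
    boolean getBit(int index) {
        return Arrays.binarySearch(bits, 0, size, index) >= 0;
    }

    /** Sets the bit at the given index, keeping the array sorted. */
    void setBit(int index) {
        int pos = Arrays.binarySearch(bits, 0, size, index);
        if (pos >= 0) {
            return; // already set
        }
        int insertAt = -(pos + 1);
        if (size == bits.length) {
            bits = Arrays.copyOf(bits, Math.max(4, size * 2));
        }
        System.arraycopy(bits, insertAt, bits, insertAt + 1, size - insertAt);
        bits[insertAt] = index;
        size++;
    }

    /** Adds every bit of the other set to this one by merging two sorted arrays. */
    void union(SparseStateSet other) {
        int[] merged = new int[size + other.size];
        int i = 0, j = 0, n = 0;
        while (i < size && j < other.size) {
            int a = bits[i], b = other.bits[j];
            if (a < b) {
                merged[n++] = a; i++;
            } else if (a > b) {
                merged[n++] = b; j++;
            } else {
                merged[n++] = a; i++; j++;
            }
        }
        while (i < size) merged[n++] = bits[i++];
        while (j < other.size) merged[n++] = other.bits[j++];
        bits = merged;
        size = n;
    }

    boolean isEmpty() {
        return size == 0;
    }

    /** Number of bits that are set. */
    int cardinality() {
        return size;
    }
}
```

A dense bit set of the same capacity would allocate one bit per possible position even when only a handful are set; the trade-off is that {{getBit}} and {{setBit}} become logarithmic and linear respectively instead of constant time, which is one reason a sparse version may be slower in other use cases.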
> Poor performance / OutOfMemoryError for sequences, choices and nested with
> large minOccurs/maxOccurs
> ----------------------------------------------------------------------------------------------------
>
> Key: XERCESJ-1227
> URL: https://issues.apache.org/jira/browse/XERCESJ-1227
> Project: Xerces2-J
> Issue Type: Bug
> Components: XML Schema 1.0 Structures, XML Schema 1.1 Structures
> Affects Versions: 2.9.0
> Reporter: Michael Glavassevich
> Priority: Minor
> Labels: gsoc, gsoc2014, mentor
>
> We now handle large minOccurs/maxOccurs on element/wildcard particles more
> gracefully by creating a compact representation in the DFA and using counters
> to check the occurrence constraints. However, we still fully expand the
> content model for minOccurs/maxOccurs on sequences and choices, which could
> still lead to an OutOfMemoryError or very poor performance (i.e. it could
> still take several minutes to build the DFA). Sequences, choices and nested
> minOccurs/maxOccurs are somewhat trickier to handle. We would need a more
> general solution than the one implemented for elements and wildcards to
> improve those.
> With the introduction of XML Schema 1.1 support we would also need to
> consider how to improve this for the enhanced xs:all model groups.
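The counter approach mentioned in the issue description can be illustrated with a small sketch. This is not the Xerces implementation: the class {{OccurrenceCounter}} and its methods are hypothetical, and it covers only the simple case of a single repeated particle. Instead of expanding the particle into roughly one copy per permitted occurrence, the validator keeps one counter and compares it against the bounds.

```java
/** Hypothetical occurrence counter for one repeated particle; not Xerces code. */
final class OccurrenceCounter {
    /** Marker for maxOccurs="unbounded" (an assumption of this sketch). */
    static final int UNBOUNDED = -1;

    private final int minOccurs;
    private final int maxOccurs;
    private int count = 0;

    OccurrenceCounter(int minOccurs, int maxOccurs) {
        if (minOccurs < 0 || (maxOccurs != UNBOUNDED && maxOccurs < minOccurs)) {
            throw new IllegalArgumentException("invalid occurrence bounds");
        }
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
    }

    /** Records one more occurrence; returns false if maxOccurs would be exceeded. */
    boolean accept() {
        if (maxOccurs != UNBOUNDED && count >= maxOccurs) {
            return false;
        }
        count++;
        return true;
    }

    /** Returns true if enough occurrences have been seen for the particle to end. */
    boolean canFinish() {
        return count >= minOccurs;
    }
}
```

The memory used is constant regardless of how large maxOccurs is. Sequences, choices and nested occurrence constraints are harder because a single counter no longer identifies where in the group the validator currently is, which is the more general problem this issue describes.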