[ https://issues.apache.org/jira/browse/XERCESJ-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593297#comment-13593297 ]

Jens Dittrich commented on XERCESJ-1276:
----------------------------------------

All, 

I did some work on the Xerces-J 2.11 sources and made the following changes:

* I implemented a TreeSet<List<Comparable>> for storing the value tuples of 
unique and key constraints in XMLSchemaValidator$ValueStoreBase.
* I decided against a HashSet because my use case requires a dynamically 
growing data structure: I validate on the fly with JAXB and do not know the 
number of entries in advance. As I understand the HashMap.resize() 
implementation, it runs in O(n), right?
* I made every interface in org.apache.xerces.xs.datatypes extend 
java.util.Comparable and, where required, implemented the corresponding 
compareTo operations in org.apache.xerces.impl.dv.* using lexicographical 
comparison (see ObjectListImpl.compareTo, for instance).
* I had to add some casts in XSSimpleTypeDecl and one in SchemaGrammar. That 
is where I would expect ClassCastExceptions when running the Xerces unit 
tests, which are not available to me.
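The first and third points can be illustrated with a rough sketch. This is a hypothetical example, not the attached patch: the class TupleStoreSketch and its methods are invented, field values are modeled as plain Comparable objects instead of the org.apache.xerces.xs.datatypes types, and Java 8 lambda syntax is used for brevity. Each key tuple is a List<Comparable>; the TreeSet orders tuples lexicographically, so a duplicate check costs O(log n) comparisons instead of a scan over every stored tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a sorted tuple store; not the Xerces ValueStoreBase API.
@SuppressWarnings({"rawtypes", "unchecked"})
public class TupleStoreSketch {

    // Lexicographic order over tuples: compare element by element; if one tuple
    // is a prefix of the other, the shorter one sorts first. Assumes values at
    // the same position are mutually comparable (otherwise ClassCastException).
    static final Comparator<List<Comparable>> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    private final TreeSet<List<Comparable>> tuples = new TreeSet<>(LEXICOGRAPHIC);

    /** Stores a tuple; returns false if an equal tuple was already present (a duplicate key). */
    public boolean add(List<Comparable> tuple) {
        // Copy so later changes to the caller's list cannot break the tree's ordering.
        return tuples.add(new ArrayList<>(tuple));
    }

    /** O(log n) lookup via the comparator instead of a linear traversal. */
    public boolean contains(List<Comparable> tuple) {
        return tuples.contains(tuple);
    }

    public static void main(String[] args) {
        TupleStoreSketch store = new TupleStoreSketch();
        List<Comparable> first = Arrays.asList("order-1", 42);
        List<Comparable> second = Arrays.asList("order-2", 7);
        List<Comparable> repeat = Arrays.asList("order-1", 42);
        System.out.println(store.add(first));    // true
        System.out.println(store.add(second));   // true
        System.out.println(store.add(repeat));   // false: duplicate key tuple
        System.out.println(store.contains(second)); // true
    }
}
```

A balanced tree like this needs no capacity estimate up front, which matches the concern about an unknown number of entries; the trade-off is that every value type must define a total order.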

Regards,
Jens.

                
> Improve performance of XML Schema Identity-constraint validation --- XMLSchemaValidator$ValueStoreBase.contains() is painfully slow.
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1276
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1276
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Structures
>    Affects Versions: 2.6.2, 2.9.1
>            Reporter: Kenny MacLeod
>              Labels: gsoc, gsoc2013, mentor
>         Attachments: Xerces-J-src.2.11.0_patch1276.txt, XMLSchemaValidator.java
>
>
> Under certain conditions, the contains() method in 
> XMLSchemaValidator$ValueStoreBase can cripple the performance of parsing and 
> validation.
> I'm not sure what those conditions are, but as a rough guide: I was using 
> JAXB2 to deserialize a 22 MB XML file. Without schema validation, it took 5 
> seconds; with validation, it took over 3 minutes (JDK 1.5.0_10 on win32). My 
> profiler pointed the finger squarely at that method in XMLSchemaValidator.
> My suspicions were further aroused when I saw this comment in the source:
> public boolean contains() {
>             // REVISIT: we can improve performance by using hash codes, instead of
>             // traversing global vector that could be quite large.
> This is present in the Xerces 2.6.2 bundled with JDK 1.5.0_10, and also in 
> the 2.9.1 source.
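The REVISIT comment contrasts traversing a global vector with using hash codes. The difference can be sketched as follows; this is a hypothetical illustration with invented names (KeyLookupSketch, containsLinear, containsHashed, countDuplicates), not Xerces code. A key tuple is modeled as a List<Object>, whose equals and hashCode are defined element-wise by the java.util.List contract, so it can be used directly as a HashSet element. This assumes each field value's equals/hashCode reflect schema value equality, which real datatype values would have to guarantee. The linear check costs O(n) per lookup, so checking n keys is O(n^2) overall; the hashed check is O(1) expected per lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch contrasting two duplicate-key checks; not the Xerces API.
public class KeyLookupSketch {

    /** Linear scan over every stored tuple, like traversing a global vector: O(n) per call. */
    static boolean containsLinear(List<List<Object>> stored, List<Object> tuple) {
        for (List<Object> candidate : stored) {
            if (candidate.equals(tuple)) {
                return true;
            }
        }
        return false;
    }

    /** Hash-based check using List.equals/hashCode: O(1) expected per call. */
    static boolean containsHashed(Set<List<Object>> stored, List<Object> tuple) {
        return stored.contains(tuple);
    }

    /** Counts tuples that repeat an earlier tuple, using the hashed store. */
    static int countDuplicates(List<List<Object>> input) {
        Set<List<Object>> seen = new HashSet<>();
        int duplicates = 0;
        for (List<Object> tuple : input) {
            if (!seen.add(tuple)) { // add() returns false when the tuple is already present
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<List<Object>> input = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            input.add(Arrays.asList("id", i % 50_000)); // each key value occurs twice
        }
        System.out.println(countDuplicates(input)); // 50000

        Set<List<Object>> hashed = new HashSet<>(input);
        List<Object> probe = Arrays.asList("id", 49_999);
        // Both checks agree; only their cost differs.
        System.out.println(containsLinear(input, probe) + " " + containsHashed(hashed, probe)); // true true
    }
}
```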

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
