[ 
https://issues.apache.org/jira/browse/XERCESJ-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597277#comment-16597277
 ] 

Antti S. Lankila commented on XERCESJ-1276:
-------------------------------------------

I had to revisit this issue because the xercesImpl.jar provided above did not 
work for me: the unique constraint did not trigger with that implementation, 
but did trigger with the JDK's own Xerces. I did not investigate exactly why 
the patch failed. Instead, I wrote a similar implementation that removes the 
O(N^2) cost of the constraint check.

If there is interest in finally fixing this bug in Xerces, I can provide my 
patch in low, "as-is" quality to assist such an effort. I do not recommend it 
for inclusion. Note that my Eclipse save actions removed unnecessary casts and 
added some extra @Override annotations, which bloat the patch for no reason, 
so please ignore those changes as noise. The key idea is that I created a 
container class called HeldValue, which holds the value, valueType and 
itemValueType of a specific attribute as a single entity. I then implemented 
equals() and hashCode() for HeldValue so that I could replace fValues with a 
LinkedHashSet<HeldValue>.
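
As a rough illustration of the idea (not the code from the attached patch), a 
HeldValue-style container might look like the sketch below. The field types, 
the UniqueValueStore class and its method names are assumptions made for this 
example; the real patch works with Xerces-internal types, and its equality 
rules may be more involved.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the HeldValue container described above.
// Field types are assumptions; Xerces uses its own internal types here.
final class HeldValue {
    private final Object value;
    private final short valueType;
    private final Object itemValueType; // assumed: item types of a list value, or null

    HeldValue(Object value, short valueType, Object itemValueType) {
        this.value = value;
        this.valueType = valueType;
        this.itemValueType = itemValueType;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof HeldValue)) {
            return false;
        }
        HeldValue other = (HeldValue) o;
        return valueType == other.valueType
                && Objects.equals(value, other.value)
                && Objects.equals(itemValueType, other.itemValueType);
    }

    @Override
    public int hashCode() {
        // Must be derived from the same fields that equals() compares.
        return Objects.hash(value, valueType, itemValueType);
    }
}

// Hypothetical value store: a hash-based set replaces the linear scan over a
// vector, so a duplicate check takes expected constant time per value.
final class UniqueValueStore {
    private final Set<HeldValue> values = new LinkedHashSet<>();

    // Returns false if an equal value was already stored (a uniqueness violation).
    boolean add(HeldValue v) {
        return values.add(v);
    }

    boolean contains(HeldValue v) {
        return values.contains(v);
    }
}
```

With a structure like this, checking N values costs roughly O(N) in total 
instead of O(N^2), and LinkedHashSet still preserves insertion order.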

Initially, the approach did not work: the unique check did not trigger on a 
file I knew had an error. On investigation, I realized that hashCode() methods 
are missing from some container classes in Xerces, which prevents the JDK's 
LinkedHashSet from finding the instances even though they now compare as 
equal. Because Xerces mostly uses the Object type in its method signatures, I 
did not know exactly which classes were missing hashCode(), so I just added a 
hashCode() to every class that defined equals() but not hashCode().
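
The failure mode described above can be reproduced outside Xerces. The two 
classes below are hypothetical and exist only for this example. EqualsOnly 
overrides equals() but inherits the identity-based hashCode() from Object, so 
a hash-based set will almost always look in the wrong bucket and miss an equal 
instance. EqualsAndHash adds a matching hashCode() and is found reliably.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: equals() is overridden, hashCode() is not.
final class EqualsOnly {
    final String text;

    EqualsOnly(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EqualsOnly && ((EqualsOnly) o).text.equals(text);
    }
    // hashCode() deliberately omitted: Object's identity hash is used, so two
    // equal instances will (almost always) have different hash codes.
}

// Hypothetical class: the same equals(), plus a consistent hashCode().
final class EqualsAndHash {
    final String text;

    EqualsAndHash(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof EqualsAndHash && ((EqualsAndHash) o).text.equals(text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }
}

final class HashContractDemo {
    // Lookup of an equal-but-distinct instance; unreliable without hashCode().
    static boolean lookupWithoutHashCode() {
        Set<Object> set = new HashSet<>();
        set.add(new EqualsOnly("x"));
        return set.contains(new EqualsOnly("x"));
    }

    // Lookup of an equal-but-distinct instance; works because hashes match.
    static boolean lookupWithHashCode() {
        Set<Object> set = new HashSet<>();
        set.add(new EqualsAndHash("x"));
        return set.contains(new EqualsAndHash("x"));
    }
}
```

This is why a uniqueness check built on a hash set can silently pass when the 
stored values come from classes that define equals() alone.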

[^xerces-fast-unique-check.diff]

 

> Improve performance of XML Schema Identity-constraint validation --- 
> XMLSchemaValidator$ValueStoreBase.contains() is painfully slow.
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1276
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1276
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Structures
>    Affects Versions: 2.6.2, 2.9.1
>            Reporter: Kenny MacLeod
>            Priority: Major
>              Labels: gsoc, gsoc2013, mentor
>         Attachments: XMLSchemaValidator.java, 
> Xerces-J-src.2.11.0_patch1276.txt, xerces-binaries-patched-over-2.11.0.zip, 
> xerces-fast-unique-check.diff, xerces-value-store.txt
>
>
> Under certain conditions, the contains() method in 
> XMLSchemaValidator$ValueStoreBase can cripple the performance of parsing and 
> validation.
> I'm not sure what those conditions are, but as a guideline figure I was using 
> JAXB2 to deserialize a 22meg XML file.  Without schema validation, it took 5 
> seconds.  With validation, it took over 3 minutes (JDK 1.5.0_10 on win32). My 
> profiler pointed the finger squarely at that method XMLSchemaValidator.
> Suspicions were aroused further when seeing this comment in the source:
> public boolean contains() {
>             // REVISIT: we can improve performance by using hash codes, instead of
>             // traversing global vector that could be quite large.
> This is present in Xerces 2.6.2 contained with JDK1.5.0_10, and also in the 
> source for 2.9.1.



