[ 
https://issues.apache.org/jira/browse/XERCESJ-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606866#comment-16606866
 ] 

Mukul Gandhi commented on XERCESJ-1701:
---------------------------------------

I'm tempted to suggest a non-XSD workaround for this problem. I think the Java 
SAX parser will have no trouble with the volume of data in your large XML file 
(I ran the Xerces sax.Counter sample on it, and it completed successfully in 
about 9-10 seconds). You would need to write a Java component that parses the 
XML with a SAX parser. Within the SAX endElement callback, you can construct a 
String that is the concatenation of the fields of your key, and store this 
composite key string in a HashSet that lives for the whole parse. HashSet.add 
returns false when the value is already present, so you can detect an attempt 
to add a duplicate and stop processing (i.e. you can stop parsing at the first 
duplicate key value detected).
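
The approach described above could look roughly like the following Java sketch. 
It is only an illustration under several assumptions: the class name 
DuplicateKeyCheck and the method findFirstDuplicate are hypothetical, the 
element names (r and its four fields) are taken from the xs:key in the issue, 
the elements are assumed to carry no namespace prefix, and the field values are 
compared as trimmed strings rather than as typed XSD values. The separator 
character is assumed never to occur in the data, and all keys are held in 
memory, so the heap must be large enough for the number of rows.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; element names come from the xs:key definition in the issue.
final class DuplicateKeyCheck {
    static final String ROW = "r";
    static final List<String> FIELDS = Arrays.asList(
            "CountryCode", "Date", "AnalyticsArrangementKey", "PaymentType");
    // Unit separator: assumed never to appear inside a field value.
    static final String SEP = "\u001F";

    static final class KeyHandler extends DefaultHandler {
        private final Set<String> seen = new HashSet<>();
        private final Map<String, String> row = new HashMap<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inRow;
        String duplicate; // set just before the parse is aborted

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (ROW.equals(qName)) {
                inRow = true;
                row.clear();
            }
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inRow) {
                text.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (!inRow) {
                return;
            }
            if (FIELDS.contains(qName)) {
                row.put(qName, text.toString().trim());
            } else if (ROW.equals(qName)) {
                inRow = false;
                List<String> values = new ArrayList<>();
                for (String field : FIELDS) {
                    // A missing field is treated as empty here; xs:key would reject it.
                    values.add(row.getOrDefault(field, ""));
                }
                String key = String.join(SEP, values);
                // Set.add returns false when the value is already present.
                if (!seen.add(key)) {
                    duplicate = key;
                    throw new SAXException("Duplicate key, stopping parse");
                }
            }
        }
    }

    /** Returns the first repeated composite key, or null if all keys are unique. */
    static String findFirstDuplicate(InputStream xml)
            throws IOException, SAXException, ParserConfigurationException {
        KeyHandler handler = new KeyHandler();
        try {
            // The default factory is not namespace-aware, so qName is the name as written.
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        } catch (SAXException e) {
            if (handler.duplicate == null) {
                throw e; // a genuine parse error, not our early exit
            }
        }
        return handler.duplicate;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String dup = findFirstDuplicate(in);
            System.out.println(dup == null
                    ? "No duplicate keys found"
                    : "Duplicate key: " + dup.replace(SEP, " | "));
        }
    }
}
```

Throwing a SAXException from endElement is what stops the parser early; the 
handler records the offending key first so the caller can tell this deliberate 
stop apart from a real parse error.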

In relation to this, the following information may also be helpful, 
https://stackoverflow.com/questions/20870879/why-set-is-not-allowed-duplicate-value-which-kind-of-mechanism-used-behind-them.

> Xerces-J 2.12.0: XSD 1.1 PK constraint scalability issue
> --------------------------------------------------------
>
>                 Key: XERCESJ-1701
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1701
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: JAXP (javax.xml.validation)
>    Affects Versions: 2.12.0
>         Environment: Windows 10, 64-bit
>            Reporter: Yitzhak Khabinsky
>            Priority: Major
>              Labels: XSD, key
>             Fix For: 2.12.0
>
>         Attachments: SubscriberCountFact.zip
>
>
> Hello,
> The test case is very simple:
>  * XML file, size 700 MB
>  * XSD file
> XSD is enforcing the following:
>  * XML structure
>  * Data elements/attributes data types
>  * *PK constraint, composite primary key based on four elements*
>  * No asserts/assertions/CTAs
> <xs:key name="PK">
>   <xs:selector xpath="r"/>
>   <xs:field xpath="CountryCode"/>
>   <xs:field xpath="Date"/>
>   <xs:field xpath="AnalyticsArrangementKey"/>
>   <xs:field xpath="PaymentType"/>
> </xs:key>
>  
>  Saxon Java EE completes the XSD validation in 2 minutes.
>  Xerces-J 2.12.0 cannot finish it at all, running for many hours.
>  If I comment out the *xs:key* constraint, Xerces has no problem finishing 
> the validation.



