[ 
https://issues.apache.org/jira/browse/XERCESJ-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104065#comment-16104065
 ] 

Adam Leggett commented on XERCESJ-970:
--------------------------------------

I realize this is a very old ticket, and seemingly fixed in the later JRE 
editions of Xerces, but I have seen this manifest recently when we added a 
library to our project which has a dependency on xercesImpl-2.10.0.jar.  It 
took me a while to track down that the old library was the culprit.  We were 
seeing processing times of over 2 minutes with 12MB base64 encoded PNG files, 
which dropped to under 2 seconds when the old library was removed.  Perhaps 
this ticket is obsolete, but given that the JRE edition is fixed, I'm curious 
as to whether this was ever fixed publicly.  The fix version here is 'None' 
and its priority was deemed 'Minor'.

To help others out, I'm going to mention a few keywords (SAXBuilder, 
org.jdom.Document and build), as it took a while to find this old thread.
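
For anyone trying to confirm the same thing, here is a rough sketch of a check 
that can be run on the affected classpath.  It is not taken from our project: 
the class name ParserCheck, the helper names and the default comment size are 
all made up for illustration, and it assumes the slow code path is reached 
through the standard JAXP lookup.  It prints which SAXParserFactory 
implementation gets resolved and times a parse of a document that is almost 
entirely one large comment.  If a standalone xercesImpl jar is being picked 
up, I would expect a class name under org.apache.xerces rather than the JDK's 
internal com.sun.org.apache.xerces.internal package.

```java
import java.io.StringReader;
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic class; none of these names come from the ticket.
final class ParserCheck {

    /** Class name of the SAXParserFactory that JAXP resolves on this classpath. */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Location the factory class was loaded from, if the JVM exposes one. */
    static String factoryLocation() {
        CodeSource src = SAXParserFactory.newInstance().getClass()
                .getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // Classes shipped inside the JDK usually report no code source.
            return "(no code source; probably the JDK's built-in parser)";
        }
        return src.getLocation().toString();
    }

    /** Builds a small document whose only content is one comment of the given length. */
    static String buildXmlWithComment(int commentChars) {
        StringBuilder sb = new StringBuilder(commentChars + 32);
        sb.append("<root><!--");
        for (int i = 0; i < commentChars; i++) {
            sb.append('x'); // plain filler; avoids "--", which is illegal inside a comment
        }
        sb.append("--></root>");
        return sb.toString();
    }

    /** Parses the document with a no-op handler and returns the elapsed milliseconds. */
    static long timeParseMillis(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        long start = System.nanoTime();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler());
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Default size is arbitrary; pass a larger number to mimic a bigger payload.
        int size = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        System.out.println("Factory:  " + factoryClassName());
        System.out.println("Location: " + factoryLocation());
        System.out.println("Parse of " + size + "-char comment took "
                + timeParseMillis(buildXmlWithComment(size)) + " ms");
    }
}
```

Running it once with the suspect jar on the classpath and once without should 
show whether the factory changes and whether the parse time changes with it.  
This only exercises a large comment, as in the original report; our own case 
involved large base64 text, so treat it as a starting point rather than an 
exact reproduction.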

> Large comments are extremely slow to parse
> ------------------------------------------
>
>                 Key: XERCESJ-970
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-970
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XNI
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.6.1, 2.6.2
>         Environment: Windows XP running Java 1.4.2
>            Reporter: Sean Griffin
>            Priority: Minor
>         Attachments: comments.txt
>
>
> Very large comments drastically increase the parsing time for both SAX and 
> DOM implementations.  Running the sax.Counter and dom.Counter samples with a 
> 410KB file where the entire thing is uncommented results in parse times in 
> the 100ms to 300ms range.  However, if I comment out 95% of the file and run 
> the same samples the parse times jump to between 40 and 50 seconds.  I ran 
> the same samples using the Aelfred parser shipped with Saxon 7.9 and, while 
> the file with the large comment was slower than without the comment, it 
> jumped by only 100ms or so.
> I briefly compared the code between the two parsers, and they don't look 
> significantly different when it comes to handling comments.  The only main 
> difference I noticed was around low/high byte character checks.  I suspect it 
> is an inefficiency in the XMLStringBuffer class, but I'm not seeing anything.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

