Performance issues related to XMLBeans

Yaniv Kunda Tue, 29 Jul 2014 01:49:19 -0700

I encountered several performance issues that pertain to the way POI uses
XmlBeans:


https://issues.apache.org/bugzilla/show_bug.cgi?id=44581

https://issues.apache.org/bugzilla/show_bug.cgi?id=51585

https://issues.apache.org/bugzilla/show_bug.cgi?id=55280

https://issues.apache.org/bugzilla/show_bug.cgi?id=56556



They all seem to be rooted in the following XmlBeans bug:

https://issues.apache.org/jira/browse/XMLBEANS-389



The XMLBeans recommendation is that when accessing long XML documents, or
when performance matters, one should use XmlCursor and not getXXXList().

Performance obviously matters for POI users, especially with big XLSX files,
so this is very relevant. However, I'm not sure converting to XmlCursor would
be easy; replacing XmlBeans with another library might even be preferable.



As a shorter-term solution, I propose we sweep the code and replace all
usages of getXXXList() with getXXXArray(), which is much faster because it
creates a snapshot of the XML data instead of a live view.

This will be trivial for read-only operations (e.g. bug #56556) and easy
for write operations: just read the array, work on it or create a new one,
and set the result back to the proxy.
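
To make the write case concrete, here is a minimal sketch of the read, modify, set-back round trip. RowProxy, getCellArray() and setCellArray() are hypothetical stand-ins I made up so the example runs without any dependencies; the real generated proxies follow the same getXXXArray()/setXXXArray() naming but are not used here.

```java
import java.util.Arrays;

final class ArrayRoundTrip {
    /** Hypothetical stand-in for a generated proxy; not a real XmlBeans or POI type. */
    static final class RowProxy {
        private String[] cells = new String[0];

        // Returns a snapshot copy, mirroring what getXXXArray() is described as doing.
        String[] getCellArray() { return cells.clone(); }

        // Replaces all children in one call, mirroring setXXXArray().
        void setCellArray(String[] newCells) { cells = newCells.clone(); }
    }

    /** Read the snapshot once, build the new array locally, then write it back once. */
    static void removeEmptyCells(RowProxy row) {
        String[] snapshot = row.getCellArray();
        String[] kept = Arrays.stream(snapshot)
                .filter(c -> !c.isEmpty())
                .toArray(String[]::new);
        row.setCellArray(kept);
    }

    public static void main(String[] args) {
        RowProxy row = new RowProxy();
        row.setCellArray(new String[] {"a", "", "b", ""});
        removeEmptyCells(row);
        System.out.println(Arrays.toString(row.getCellArray())); // prints [a, b]
    }
}
```

The point of the pattern is that the proxy is touched exactly twice, once to read and once to write, regardless of how many elements change.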

Also, some existing attempts to mitigate this by calling toArray() on the
returned list can be replaced with getXXXArray(), since toArray() iterates
over the elements, which is very inefficient in the XmlBeans implementation.
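
To illustrate why a live view can be so much slower than a snapshot, here is a toy model, not the XmlBeans code itself. It assumes that every indexed lookup on the live list walks the underlying nodes from the start, which is my reading of the behaviour behind the bug above; Store, getItemList() and getItemArray() are hypothetical names. The model counts node visits instead of measuring time.

```java
import java.util.AbstractList;
import java.util.List;

final class LiveViewVsSnapshot {
    /** Hypothetical stand-in for an XML store: a singly linked chain of child nodes. */
    static final class Store {
        private static final class Node {
            final String value;
            Node next;
            Node(String value) { this.value = value; }
        }

        private Node head;
        private Node tail;
        private int count;
        long steps; // number of node visits so far

        void add(String value) {
            Node n = new Node(value);
            if (head == null) { head = n; } else { tail.next = n; }
            tail = n;
            count++;
        }

        /** Live view: each get(i) walks the chain from the head (assumed behaviour). */
        List<String> getItemList() {
            return new AbstractList<String>() {
                @Override public String get(int index) {
                    if (index < 0 || index >= count) throw new IndexOutOfBoundsException();
                    Node n = head;
                    steps++;
                    for (int i = 0; i < index; i++) { n = n.next; steps++; }
                    return n.value;
                }
                @Override public int size() { return count; }
            };
        }

        /** Snapshot: a single pass over the chain. */
        String[] getItemArray() {
            String[] out = new String[count];
            int i = 0;
            for (Node n = head; n != null; n = n.next) { out[i++] = n.value; steps++; }
            return out;
        }
    }

    static Store filled(int n) {
        Store s = new Store();
        for (int i = 0; i < n; i++) s.add("item" + i);
        return s;
    }

    static long stepsViaIndexedList(int n) {
        Store s = filled(n);
        List<String> live = s.getItemList();
        for (int i = 0; i < live.size(); i++) live.get(i);
        return s.steps;
    }

    static long stepsViaToArray(int n) {
        Store s = filled(n);
        s.getItemList().toArray(); // inherited toArray() iterates, calling get(i) per element
        return s.steps;
    }

    static long stepsViaArray(int n) {
        Store s = filled(n);
        s.getItemArray();
        return s.steps;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("indexed list: " + stepsViaIndexedList(n)); // 500500
        System.out.println("toArray():    " + stepsViaToArray(n));     // 500500
        System.out.println("array:        " + stepsViaArray(n));       // 1000
    }
}
```

In this model, both the indexed loop and toArray() on the live list cost n(n+1)/2 node visits, while the snapshot costs n. Whether the real implementation degrades in exactly this way should be checked against the linked issues.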



Let's discuss a way to do this across the board. I can contribute either by
supplying patches or by committing the changes myself; I'm already a
committer for other projects and have a CLA.



Regards,

Yaniv
