I encountered several performance issues that pertain to the way POI uses XmlBeans:

https://issues.apache.org/bugzilla/show_bug.cgi?id=44581
https://issues.apache.org/bugzilla/show_bug.cgi?id=51585
https://issues.apache.org/bugzilla/show_bug.cgi?id=55280
https://issues.apache.org/bugzilla/show_bug.cgi?id=56556

They all seem to be rooted in the following XmlBeans bug:
https://issues.apache.org/jira/browse/XMLBEANS-389

The XmlBeans recommendation is that when accessing long XML documents, or when performance matters, one should use XmlCursor and not getXXXList(). Performance obviously matters for POI users, especially with big XLSX files, so this is very much relevant. I'm not sure converting to XmlCursor would be easy, though; replacing XmlBeans with another library might even be preferable.

As a shorter-term solution, I propose we sweep the code and replace all usages of getXXXList() with getXXXArray(), which is much faster since it creates a snapshot of the XML data instead of a live view. This will be trivial for read-only operations (e.g. bug 56556) and easy for write operations: just read the array, work on it or create a new one, and set the result back on the proxy.

Also, some existing attempts to mitigate this, such as calling toArray() on the returned list, can be replaced with getXXXArray(), since toArray() iterates over the elements, which is very inefficient in the XmlBeans implementation.

Let's discuss a way to do this across the board. I'm able to contribute by either supplying patches or committing myself; I'm already a committer for other projects and have a CLA.

Regards,
Yaniv
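
P.S. To make the proposed pattern concrete, here is a rough, self-contained sketch. It does not use XmlBeans: RowsProxy is a hypothetical stand-in for a generated proxy, and the cost model (each indexed access on the live list re-walks the siblings from the start) is my assumption about why the list view is slow, not something taken from the XmlBeans code. The step counter only illustrates the difference between a live view and a snapshot.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SnapshotVsLiveView {

    /** Hypothetical stand-in for a generated XmlBeans proxy (not the real API). */
    static class RowsProxy {
        private String[] store;
        long steps = 0; // simulated traversal steps, for illustration only

        RowsProxy(String... rows) { store = rows.clone(); }

        /** Live view: ASSUMED to re-walk the siblings on every indexed access. */
        List<String> getRowList() {
            return new AbstractList<String>() {
                @Override public String get(int i) {
                    if (i < 0 || i >= store.length) throw new IndexOutOfBoundsException();
                    steps += i + 1; // walk from the first sibling to the i-th
                    return store[i];
                }
                @Override public int size() { return store.length; }
            };
        }

        /** Snapshot: one pass over the data, then plain array access. */
        String[] getRowArray() {
            steps += store.length;
            return store.clone();
        }

        /** Sets a (possibly modified) snapshot back on the proxy in one call. */
        void setRowArray(String[] rows) {
            steps += rows.length;
            store = rows.clone();
        }
    }

    /** Write path: read the array, work on it, set the result back. */
    static void upperCaseAll(RowsProxy proxy) {
        String[] rows = proxy.getRowArray();
        for (int i = 0; i < rows.length; i++) {
            rows[i] = rows[i].toUpperCase(Locale.ROOT);
        }
        proxy.setRowArray(rows);
    }

    public static void main(String[] args) {
        String[] data = new String[1000];
        Arrays.fill(data, "r");

        // Read path via the live list: cost grows quadratically in this model.
        RowsProxy viaList = new RowsProxy(data);
        List<String> live = viaList.getRowList();
        for (int i = 0; i < live.size(); i++) live.get(i);

        // Read path via the snapshot: a single linear pass.
        RowsProxy viaArray = new RowsProxy(data);
        String[] snapshot = viaArray.getRowArray();
        for (int i = 0; i < snapshot.length; i++) { String ignored = snapshot[i]; }

        System.out.println("list steps:  " + viaList.steps);
        System.out.println("array steps: " + viaArray.steps);
    }
}
```

In the real code the change would be the same shape: wherever we loop over getXXXList(), take getXXXArray() once before the loop, and for writes call setXXXArray() once at the end instead of mutating the live list element by element.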
