[ https://issues.apache.org/jira/browse/XMLBEANS-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18007610#comment-18007610 ]
PJ Fanning commented on XMLBEANS-438: ------------------------------------- That code is still a major draw on POI performance. I intend to make some changes in XMLBeans 5.4.0 but must admit that the key method (arraySetterHelper) that has terrible performance is going to be very hard to refactor without breaking things. So far, I'm concentrating on some less complex methods in XMLBeans. https://github.com/apache/xmlbeans/pull/21 is my initial change but I intend to come back to this and do a 2nd PR to try to refactor the arraySetterHelper method. My initial attempts there caused issues. > O(n^2) operation in xmlbeans would like to make O(n log n) or better > -------------------------------------------------------------------- > > Key: XMLBEANS-438 > URL: https://issues.apache.org/jira/browse/XMLBEANS-438 > Project: XMLBeans > Issue Type: Improvement > Affects Versions: Version 2.4 > Environment: Using XMLbeans inside of POI when saving spreadsheets in > the XLSX file format. > Reporter: Bryce Alcock > Assignee: PJ Fanning > Priority: Major > Labels: OOXML, XLSX, XmlComplexContentImpl, Xobj > Fix For: 5.4.0 > > Attachments: Main.java, PoiTest.jar, ResponseTimeVsRowCount.png > > > Using POI to save a spread sheet in XLSX format causes an O(n^2) operation to > occur. > Specifically: > Just a Quick overview > it seems like XmlComplexContentImpl.arraySetterHelper iterates over all > elements in the spread sheet. > then in side of that iteration, the Xobj.find_element_user(Qname,int) is an > Order n operation. > Just making Xobj.find_element_user a O(log n) would completely solve the > problem. > ANALYSIS: O(n^2) in XML Beans: (Around Line 1153 or so pf > XmlComplexContentImpl.java method arraySetterHelper) > ----------------------------------------------------------- > // Order N (this is invoked 1 time for each element in sources (~120,000 > records in my case) > for ( i++, j++; i < sources.length; i++, j++) > { > long t1 = System.currentTimeMillis(); > XmlCursor c = sources[i].isImmutable() ? null : sources[i].newCursor(); > getCursorTime += (System.currentTimeMillis()-t1); > long t2=System.currentTimeMillis(); > if (c != null && c.toParent() && c.getObject() == this) > { > ifCheckTime += (System.currentTimeMillis()-t2); > long t3 = System.currentTimeMillis(); > c.dispose(); > long t4 = System.currentTimeMillis(); > // THIS CAUSES O(n^2) SEE CODE BELOW, but this on O(n) for each record. > // THIS IS THE INEFFICIENCY in XMLBeans... > current = store.find_element_user( elemName, j ); > findUserElement += (System.currentTimeMillis() - t4); > if (current == sources[i]) > ; // advance > else > { > // Fall back to the general case > break; > } > insideIfTime += (System.currentTimeMillis()-t3); > } > else > { > ifCheckTime += (System.currentTimeMillis()-t2); > c.dispose(); > // Insert before the current element > TypeStoreUser user = store.insert_element_user( elemName, j ); > ((XmlObjectBase) user).set( sources[i] ); > } > } > ---------------- Method from store.find_element_user(QName,int i)--------- > // THIS METHOD IS 0(n) > // but this is called n times by XmlComplexContentImpl.java around line > 1153.... > public TypeStoreUser find_element_user ( QName name, int i ) > { > for ( Xobj x = _firstChild ; x != null ; x = x._nextSibling ) > if (x.isElem() && x._name.equals( name ) && --i < 0) > return x.getUser(); > return null; > } > I will add a sample code showing exactly how to reproduce this issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org For additional commands, e-mail: dev-h...@poi.apache.org