Hi All,
There are performance issues with the JDK 1.5-style accessor methods in
ooxml-schemas-1.1.jar, and my concern is whether this is a blocker for POI-3.7.
In ooxml-schemas-1.0.jar a collection of XmlBeans could be accessed via
getXXXArray(), e.g.
    CTRow[] rows = sheet.getRowArray();
ooxml-schemas-1.1.jar was compiled with JDK-1.5 support and the
preferred way of accessing collections is via getXXXList():
    List<CTRow> rows = sheet.getRowList();
XmlBeans seems to force users to use getXXXList(), because all
getXXXArray() accessors are marked deprecated. So we changed everything
to use getXXXList() and thought we were fine :).
I always thought that getXXXList() and getXXXArray() were synonyms and
that the returned List was a wrapper around the array. I also thought that
the following two forms of walking the sheet matrix were equivalent:
    // old-style getXXXArray()
    for (CTRow row : sheet.getRowArray()) {
        for (CTCell cell : row.getCArray()) {
        }
    }
    // new getXXXList()
    for (CTRow row : sheet.getRowList()) {
        for (CTCell cell : row.getCList()) {
        }
    }
It turns out this is NOT so: getXXXArray() is much faster than
getXXXList(). I analyzed the auto-generated source code and found that
the two accessors work differently.
A call to getXXXArray() performs an XPATH request to the underlying DOM
and returns the selected beans.
A call to getXXXList() does no work up front. The returned List is a custom
subclass of AbstractList whose overridden List.get(int index) sends an
XPATH request. This means that an XPATH request is sent on every iteration
step, i.e. on every call to List.get(int index).
You won't notice much difference for small files, but the larger the
DOM, the more dramatic the difference becomes.
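To make the difference described above concrete, here is a minimal,
self-contained sketch in plain Java. It does not use XmlBeans or POI: the
class LazyListCost, the visits counter and findChild() are hypothetical
stand-ins, and the assumption that a lookup by index has to walk past all
earlier siblings is mine, not something taken from the generated code. It
only models an array accessor that does its work once versus a List whose
get() repeats a lookup on every call.

```java
import java.util.AbstractList;
import java.util.List;

public class LazyListCost {
    /** Counts simulated tree-node visits so both access styles can be compared. */
    static long visits = 0;

    /** Stand-in for a DOM lookup (assumption): reaching child #index walks past earlier siblings. */
    static String findChild(String[] children, int index) {
        for (int i = 0; i <= index; i++) {
            visits++;
        }
        return children[index];
    }

    /** Array-style accessor: one pass over the children, then plain array access. */
    static String[] getRowArray(String[] children) {
        String[] copy = new String[children.length];
        for (int i = 0; i < children.length; i++) {
            visits++;
            copy[i] = children[i];
        }
        return copy;
    }

    /** List-style accessor: no work up front, a fresh lookup on every get(). */
    static List<String> getRowList(final String[] children) {
        return new AbstractList<String>() {
            @Override public String get(int index) { return findChild(children, index); }
            @Override public int size() { return children.length; }
        };
    }

    /** Builds n dummy rows. */
    static String[] rows(int n) {
        String[] r = new String[n];
        for (int i = 0; i < n; i++) {
            r[i] = "row" + i;
        }
        return r;
    }

    /** Node visits needed to walk n rows through the array accessor. */
    static long countArrayWalk(int n) {
        String[] data = rows(n);
        visits = 0;
        for (String row : getRowArray(data)) {
            // loop body intentionally empty, as in the snippets above
        }
        return visits;
    }

    /** Node visits needed to walk n rows through the lazy list (the iterator calls get(i)). */
    static long countListWalk(int n) {
        String[] data = rows(n);
        visits = 0;
        for (String row : getRowList(data)) {
            // loop body intentionally empty, as in the snippets above
        }
        return visits;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1000, 10000}) {
            System.out.println(n + " rows: array=" + countArrayWalk(n)
                    + " visits, list=" + countListWalk(n) + " visits");
        }
    }
}
```

Under these assumptions the array walk costs n visits and the list walk
costs n(n+1)/2, so for 1000 rows that is 1000 versus 500500. The real
numbers depend on how XmlBeans resolves each request, but the shape of the
growth is the point of the sketch.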
Below are my benchmarks. I ran the code snippets above against sample
sheets of different sizes.
matrix
(rows x columns)    getXXXArray()    getXXXList()
  100 x 100              35 ms            70 ms
 1000 x 100             150 ms           700 ms
 5000 x 100             570 ms          4900 ms
10000 x 100            3600 ms         27000 ms
I'm going to produce the release artifacts by Friday. There is no time
to fix this problem - it is a serious change and I don't want to
accidentally break anything. I'm inclined to think it is OK to release -
we did three betas and so far the feedback has been positive. I plan POI-3.8
for Dec-Jan and we can defer the fix until then.
What do people think?
Regards,
Yegor