I've been poking around a bit in the 3.0 branch. One issue that
pops out is that the parallel arrays in ValueRecordsAggregate are
sparse. This makes iteration slower (empty cells have to be skipped)
and uses more memory. In addition, all of the arrays must be the same
size, even though some are inherently sparser (e.g. formulaptgs).
One possibility is to eliminate totally empty rows and columns from each
of the data sets. Separate row and column index arrays would be needed
for each data set to map into the value arrays. Random access would use
linear search on the index arrays, followed by a simple lookup into the
value array.
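A minimal sketch of that layout (the names here are invented for
illustration, not actual POI code):

    // One data set with totally empty rows and columns removed.
    // rowIndex/colIndex map compacted positions back to sheet coordinates.
    class CompactedValues {
        private int[] rowIndex;    // rowIndex[i] = sheet row of stored row i
        private int[] colIndex;    // colIndex[j] = sheet column of stored column j
        private double[][] values; // values[i][j] = value at (rowIndex[i], colIndex[j])

        // Random access: linear search on the index arrays, then a
        // simple lookup into the value array.
        public double get(int row, int col) {
            int i = find(rowIndex, row);
            int j = find(colIndex, col);
            if (i < 0 || j < 0) {
                throw new IllegalArgumentException("cell is empty");
            }
            return values[i][j];
        }

        private static int find(int[] index, int target) {
            for (int k = 0; k < index.length; k++) {
                if (index[k] == target) {
                    return k;
                }
            }
            return -1;
        }
    }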
For even better performance and memory use, I would recommend moving
away from the current sheet.getRow(i).getCell(j) model. That model
forces a certain level of object creation which is not always
necessary. Something like a JDBC ResultSet is more appropriate for
managing large amounts of data, where you reuse the same object, and
just the underlying values change as you iterate. Example:
    Cell cell = new ColumnIterator(sheet, 3); // the iterator doubles as the cell
    while (cell.next()) {
        if (cell.getType() == Cell.STRING) {
            System.out.println(cell.getValue());
        }
    }
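For this to hang together, ColumnIterator would implement a Cell
interface roughly along these lines (names and constants are my own
invention, not existing POI API):

    interface Cell {
        int NUMERIC = 0; // hypothetical type constants
        int STRING = 1;

        boolean next();    // advance to the next cell; false at end of column
        int getType();     // type of the cell the cursor is on
        Object getValue(); // value of the cell the cursor is on
    }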
In a read-only, low-memory scenario, this type of API would allow you to
get rid of *all* the data storage in ValueRecordsAggregate. The
iterators could advance over the underlying RandomAccessFile directly,
doing the necessary conversions on the fly. For modifying or writing
sheets, something like ValueRecordsAggregate will still be necessary.
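A rough sketch of such a streaming cursor, assuming for illustration
a file of raw 8-byte doubles (real BIFF record parsing would be more
involved):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Hypothetical read-only cursor: nothing is stored per cell; each
    // next() reads the following value straight from the file.
    class StreamingColumnIterator {
        private final RandomAccessFile file;
        private double current;

        StreamingColumnIterator(RandomAccessFile file) {
            this.file = file;
        }

        boolean next() throws IOException {
            if (file.getFilePointer() >= file.length()) {
                return false; // no more records
            }
            current = file.readDouble(); // convert on the fly
            return true;
        }

        double getValue() {
            return current;
        }
    }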
Chris