Do you think the tradeoff would be worth it in a read-write scenario? I'm
leaning towards completing this level of refactoring before going another
round. Though if you want to have a go at it, PLEASE do so.
There are still several places left where we can remove object creation. I
was going for the least radical and least invasive changes first and figured
we'd iterate from there.
We actually have an API for read-only which will become more efficient soon.
It's a reactor pattern which allows you to specify *what* types of data
you're interested in. I plan to add more granularity ("only interested in
rows x-y or columns i-n", etc). I think this is actually more efficient
than a cursor approach, though please attempt to persuade me otherwise.
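
To make the reactor idea concrete, here is a minimal sketch of the shape I
mean. It is not the actual POI event API: SketchRecord, SketchListener,
SketchRequest and the type names "LABEL"/"NUMBER" are all made up for
illustration, and restrictRows() stands in for the extra granularity that
does not exist yet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one cell record read from a sheet. */
final class SketchRecord {
    final String type;   // made-up type name, e.g. "NUMBER" or "LABEL"
    final int row;
    final int col;
    final Object value;

    SketchRecord(String type, int row, int col, Object value) {
        this.type = type;
        this.row = row;
        this.col = col;
        this.value = value;
    }
}

/** Callback invoked only for records the caller registered interest in. */
interface SketchListener {
    void onRecord(SketchRecord record);
}

/** Maps record types to listeners, with an optional row window (assumed feature). */
final class SketchRequest {
    private final Map<String, List<SketchListener>> listeners = new HashMap<>();
    private int firstRow = 0;
    private int lastRow = Integer.MAX_VALUE;

    void addListener(String type, SketchListener listener) {
        listeners.computeIfAbsent(type, k -> new ArrayList<>()).add(listener);
    }

    /** "Only interested in rows first..last" (inclusive). */
    void restrictRows(int first, int last) {
        this.firstRow = first;
        this.lastRow = last;
    }

    /** Pushes each record to its interested listeners; everything else is skipped. */
    void dispatch(Iterable<SketchRecord> records) {
        for (SketchRecord r : records) {
            if (r.row < firstRow || r.row > lastRow) {
                continue; // outside the requested row window
            }
            List<SketchListener> interested = listeners.get(r.type);
            if (interested == null) {
                continue; // nobody asked for this record type
            }
            for (SketchListener l : interested) {
                l.onRecord(r);
            }
        }
    }
}
```

The point of the design is that the reader drives and the caller only sees
what it asked for, so nothing has to be stored for the types nobody wants.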
-Andy
PS:
The Andy Queue (just so everyone asking for things from me knows where they
stand):
1. A SuperLink proposal (today/tomorrow)
2. Finish the JBoss XDoclet training that I'm working on (today/tomorrow)
3. Performance/etc testing Chris's POIFS2 (sometime in the next few days)
4. Finish JBossMail SMTP service
5. Work on resolving unit test errors in 3.0 / Work on 2.0 bugs
On 7/29/03 2:30 PM, "Chris Nokleberg" <[EMAIL PROTECTED]> wrote:
> I've been poking around a little bit into the 3.0 branch. One issue that
> pops out is that the parallel arrays in ValueRecordsAggregate are
> sparse. This makes iteration slower (have to skip empty cells) and uses
> more memory. In addition, all of the arrays must be the same size, even
> though some are inherently sparser (e.g. formulaptgs).
>
> One possibility is to eliminate totally empty rows and columns from each
> of the data sets. Separate row and column index arrays would be needed
> for each data set to map into the value arrays. Random access would use
> linear search on the index arrays, followed by a simple lookup into the
> value array.
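
A rough sketch of the index-array idea described above, for one data set of
doubles. CompactGrid and its layout are hypothetical, not what
ValueRecordsAggregate does today, and using NaN as the "no value" marker is
an assumption made only to keep the example short.

```java
/** Stores values only for rows and columns that actually contain data. */
final class CompactGrid {
    private final int[] rows;      // sheet row numbers that were kept
    private final int[] cols;      // sheet column numbers that were kept
    private final double[] values; // rows.length * cols.length entries, row-major

    CompactGrid(int[] rows, int[] cols, double[] values) {
        if (values.length != rows.length * cols.length) {
            throw new IllegalArgumentException("values must hold rows * cols entries");
        }
        this.rows = rows;
        this.cols = cols;
        this.values = values;
    }

    /** Linear search of an index array; returns -1 if the key was eliminated. */
    private static int indexOf(int[] index, int key) {
        for (int i = 0; i < index.length; i++) {
            if (index[i] == key) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the value at (row, col), or NaN if that row or column is not stored. */
    double get(int row, int col) {
        int r = indexOf(rows, row);
        int c = indexOf(cols, col);
        if (r < 0 || c < 0) {
            return Double.NaN;
        }
        return values[r * cols.length + c];
    }
}
```

For example, a grid built with rows {2, 7} and columns {0, 5, 9} holds six
values instead of a full 8 x 10 block, at the cost of two short scans per
random access.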
>
> For even better performance and memory use, I would recommend moving
> away from the current sheet.getRow(i).getCell(j) model. That model
> forces a certain level of object creation which is not always
> necessary. Something like a JDBC ResultSet is more appropriate for
> managing large amounts of data, where you reuse the same object, and
> just the underlying values change as you iterate. Example:
>
>     Cell cell = new ColumnIterator(sheet, 3);
>     while (cell.next()) {
>         if (cell.getType() == Cell.STRING) {
>             System.out.println(cell.getValue());
>         }
>     }
>
> In a read-only, low-memory scenario, this type of API would allow you to
> get rid of *all* the data storage in ValueRecordsAggregate. The
> iterators could advance over the underlying RandomAccessFile directly,
> doing the necessary conversions on the fly. For modifying or writing
> sheets, something like ValueRecordsAggregate will still be necessary.
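
For comparison, a self-contained sketch of the cursor style from the example
above. ColumnCursor is a hypothetical name, and the Object[] backing one
column is only a stand-in for whatever storage or file the cursor would
really advance over, so this shows the API shape rather than any memory
saving.

```java
/** Reusable cursor over one column; the same object is repositioned on each step. */
final class ColumnCursor {
    static final int NUMBER = 1;
    static final int STRING = 2;

    private final Object[] column; // stand-in data: null means an empty cell
    private int pos = -1;

    ColumnCursor(Object[] column) {
        this.column = column;
    }

    /** Advances to the next non-empty cell; returns false once the column is exhausted. */
    boolean next() {
        while (++pos < column.length) {
            if (column[pos] != null) {
                return true;
            }
        }
        return false;
    }

    /** Row index of the current cell; valid only after next() returned true. */
    int getRow() {
        return pos;
    }

    int getType() {
        return column[pos] instanceof String ? STRING : NUMBER;
    }

    Object getValue() {
        return column[pos];
    }
}
```

Usage mirrors the quoted loop: create one ColumnCursor, call next() until it
returns false, and read getType()/getValue() on the same object each time.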
>
> Chris
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
--
Andrew C. Oliver
http://www.superlinksoftware.com/poi.jsp
Custom enhancements and Commercial Implementation for Jakarta POI
http://jakarta.apache.org/poi
For Java and Excel, Got POI?