Re: Reducing HSSF memory use

Andrew C. Oliver Tue, 29 Jul 2003 11:46:31 -0700

Do you think the tradeoff would be worth it in a read-write scenario?  I'm
leaning towards getting this level of refactoring done before going
another round.  Though if you want to take a go at it, PLEASE do so.

There are still several places left where we can remove object creation.  I
was going for the least radical and invasive and figured we'd iterate from
there.

We actually have an API for read-only which will become more efficient soon.
It's a reactor pattern which allows you to specify *what* types of data
you're interested in.  I plan to add more granularity ("only interested in
rows x-y or columns i-n", etc).  I think this is actually more efficient
than a cursor approach, though please attempt to persuade me otherwise.
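
To make the reactor idea concrete, here is a rough, self-contained sketch.
The names (RecordListener, RecordDispatcher, the NUMBER/LABEL/FORMULA
constants) are made up for illustration and are not the actual HSSF classes,
and the "file" is just a pair of in-memory arrays.  The point is only that
records of a type nobody registered for are skipped without anything being
built for them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback interface: one call per record the caller asked for.
interface RecordListener {
    void onRecord(int type, String payload);
}

// Hypothetical dispatcher: maps record types to the listeners interested in them.
class RecordDispatcher {
    private final Map<Integer, List<RecordListener>> listeners = new HashMap<>();

    void register(int type, RecordListener listener) {
        listeners.computeIfAbsent(type, k -> new ArrayList<>()).add(listener);
    }

    // Walks the record stream once and returns how many callbacks were made.
    int dispatch(int[] types, String[] payloads) {
        int delivered = 0;
        for (int i = 0; i < types.length; i++) {
            List<RecordListener> interested = listeners.get(types[i]);
            if (interested == null) {
                continue; // nobody registered for this type: skip it
            }
            for (RecordListener listener : interested) {
                listener.onRecord(types[i], payloads[i]);
                delivered++;
            }
        }
        return delivered;
    }
}

class ReactorSketch {
    // Made-up record type ids, not real BIFF record sids.
    static final int NUMBER = 1, LABEL = 2, FORMULA = 3;

    static List<String> collectLabels() {
        RecordDispatcher dispatcher = new RecordDispatcher();
        List<String> seen = new ArrayList<>();
        dispatcher.register(LABEL, (type, payload) -> seen.add(payload));
        dispatcher.dispatch(
            new int[] {NUMBER, LABEL, FORMULA, LABEL},
            new String[] {"42", "foo", "A1+B1", "bar"});
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectLabels()); // prints [foo, bar]
    }
}
```

Restricting by row or column range would presumably be one more filter
checked at the same point in the dispatch loop.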

-Andy

PS:

The Andy Queue (just so everyone asking for things from me knows where they
stand):

1. A SuperLink proposal (today/tomorrow)
2. Finish the JBoss XDoclet training that I'm working on (today/tomorrow)
3. Performance/etc testing Chris's POIFS2 (sometime in the next few days)
4. Finish JBossMail SMTP service
5. Work on resolving unit test errors in 3.0 / Work on 2.0 bugs

On 7/29/03 2:30 PM, "Chris Nokleberg" <[EMAIL PROTECTED]> wrote:

> I've been poking around a little bit into the 3.0 branch. One issue that
> pops out is that the parallel arrays in ValueRecordsAggregate are
> sparse. This makes iteration slower (have to skip empty cells) and uses
> more memory. In addition, all of the arrays must be the same size, even
> though some are inherently sparser (e.g. formulaptgs).
> 
> One possibility is to eliminate totally empty rows and columns from each
> of the data sets. Separate row and column index arrays would be needed
> for each data set to map into the value arrays. Random access would use
> linear search on the index arrays, followed by a simple lookup into the
> value array.
> 
> For even better performance and memory use, I would recommend moving
> away from the current sheet.getRow(i).getCell(j) model. That model
> forces a certain level of object creation which is not always
> necessary. Something like a JDBC ResultSet is more appropriate for
> managing large amounts of data, where you reuse the same object, and
> just the underlying values change as you iterate. Example:
> 
> Cell cell = new ColumnIterator(sheet, 3);
> while (cell.next()) {
>     if (cell.getType() == Cell.STRING) {
>        System.out.println(cell.getValue());
>     }
> }
> 
> In a read-only, low-memory scenario, this type of API would allow you to
> get rid of *all* the data storage in ValueRecordsAggregate. The
> iterators could advance over the underlying RandomAccessFile directly,
> doing the necessary conversions on the fly. For modifying or writing
> sheets, something like ValueRecordsAggregate will still be necessary.
> 
> Chris
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
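
A rough sketch of the compacted layout proposed in the quoted message above
(drop empty rows and columns, keep row and column index arrays, linear search
then a direct lookup).  Everything here is an assumption for illustration: the
class name CompactGrid, a single data set of doubles, and the `present` flags
used to tell an empty slot from a stored 0.0.  None of it comes from the 3.0
branch.

```java
import java.util.TreeSet;

// Hypothetical compact store for one data set of a sheet.
class CompactGrid {
    private final int[] rowIndex;    // sheet row numbers that hold data, ascending
    private final int[] colIndex;    // sheet column numbers that hold data, ascending
    private final double[] values;   // rowIndex.length x colIndex.length, row-major
    private final boolean[] present; // true where a slot actually holds a value

    // Builds the store from parallel arrays of (row, column, value) triples.
    CompactGrid(int[] rows, int[] cols, double[] vals) {
        rowIndex = distinctSorted(rows);
        colIndex = distinctSorted(cols);
        values = new double[rowIndex.length * colIndex.length];
        present = new boolean[values.length];
        for (int i = 0; i < vals.length; i++) {
            int slot = find(rowIndex, rows[i]) * colIndex.length + find(colIndex, cols[i]);
            values[slot] = vals[i];
            present[slot] = true;
        }
    }

    private static int[] distinctSorted(int[] xs) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int x : xs) {
            set.add(x);
        }
        int[] out = new int[set.size()];
        int i = 0;
        for (int x : set) {
            out[i++] = x;
        }
        return out;
    }

    // Linear search, as in the proposal; returns -1 when the key is absent.
    private static int find(int[] index, int key) {
        for (int i = 0; i < index.length; i++) {
            if (index[i] == key) {
                return i;
            }
        }
        return -1;
    }

    // Random access: search both index arrays, then one array lookup.
    Double get(int row, int col) {
        int r = find(rowIndex, row);
        int c = find(colIndex, col);
        if (r < 0 || c < 0) {
            return null; // the whole row or column was eliminated as empty
        }
        int slot = r * colIndex.length + c;
        return present[slot] ? Double.valueOf(values[slot]) : null;
    }

    // Number of value slots actually allocated.
    int capacity() {
        return values.length;
    }
}
```

With cells at (2,5), (2,900) and (1000,5), this allocates 4 value slots rather
than an array sized to the full 1001 x 901 extent.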

-- 
Andrew C. Oliver
http://www.superlinksoftware.com/poi.jsp
Custom enhancements and Commercial Implementation for Jakarta POI

http://jakarta.apache.org/poi
For Java and Excel, Got POI?
