[
https://issues.apache.org/jira/browse/SANSELAN-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gary Lucas updated SANSELAN-76:
-------------------------------
Attachment: Tracker_76_Test_5_May_2012.patch
Patch showing changes
> Reduce memory use of TIFF readers
> ---------------------------------
>
> Key: SANSELAN-76
> URL: https://issues.apache.org/jira/browse/SANSELAN-76
> Project: Commons Sanselan
> Issue Type: Improvement
> Components: Format: TIFF
> Reporter: Gary Lucas
> Attachments: Tracker_76_Test_5_May_2012.patch
>
> Original Estimate: 80h
> Remaining Estimate: 80h
>
> This Tracker Item proposes changes to the TIFF file readers to address memory
> issues when reading very large images from TIFF files. The TIFF format is
> used extensively in technical applications such as aerial photographs,
> satellite images, and digital raster maps which feature very large image
> sizes. For example, the public-domain Natural Earth Data set features raster
> files sized 21,600 by 10,800 pixels (222.5 megapixels). Although this
> example is unusually large, image sizes of 25 to 100 megapixels are common
> for such applications.
> Unfortunately, when Sanselan reads a TIFF image, it consumes nearly twice as
> much memory as is necessary. The reader operates in two stages. First, it
> reads the entire source file into memory; then it builds the output image,
> also in memory. In the example file mentioned above, the source data runs
> from 83.19 to 373 megabytes (depending on compression). Thus Sanselan would
> require a minimum of 83.19 + 4*222.5 = 973 megabytes to produce an image for
> one of these files (allowing 4 bytes per pixel in the output BufferedImage).
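The memory figures in this paragraph can be reproduced with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and it assumes that "mega" means 2^20, which is the reading under which 21,600 by 10,800 pixels comes out at 222.5.

```java
/** Hypothetical helper for checking the memory arithmetic; not part of Sanselan. */
final class TiffMemoryEstimate {
    static final double MIB = 1024.0 * 1024.0;

    /** Bytes needed to hold a decoded image at the given bytes per pixel. */
    static long outputBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        int width = 21_600;
        int height = 10_800;
        double sourceMiB = 83.19; // smallest source size quoted in the issue text
        double outputMiB = outputBytes(width, height, 4) / MIB;

        System.out.printf("pixels (2^20 units): %.1f%n", (double) width * height / MIB);
        System.out.printf("output image:        %.1f MiB%n", outputMiB);
        System.out.printf("source + output:     %.1f MiB%n", sourceMiB + outputMiB);
    }
}
```

The last line printed is the peak for the two-stage reader; a reader that holds only one strip or tile of source data at a time would peak close to the output image size alone.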
> Fortunately, TIFF files are organized so that they can be read a piece at a
> time. TIFF files are divided into either strips or tiles and, if data
> compression is used, each piece is compressed individually. Thus no piece
> depends on any other.
> This item proposes to implement two changes:
> 1) Allow the TIFF data reader to read the files one piece at a time while
> constructing the buffered image. Thus the memory use for reading would be no
> larger than the piece size. This would be an internal change, so the
> external appearance of the Sanselan getBufferedImage methods would not change.
> 2) Provide new API elements that permit applications to read the strips or
> tiles from TIFF files individually. This change would support
> applications that need to access very large TIFF files without committing
> the memory to store a BufferedImage for the entire file (a 222.5 megapixel
> image requires 890 megabytes, which is a lot even by contemporary standards).
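A minimal sketch of what the two changes could look like, assuming uncompressed 8-bit grayscale strips so that decoding stays trivial. None of the names below (StripwiseTiffSketch, readStrip, build, demo) come from Sanselan; they are invented for illustration, and the strip offsets are passed in rather than parsed from a TIFF directory.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical sketch; none of these names belong to the Sanselan API. */
final class StripwiseTiffSketch {

    /** Reads the raw bytes of one strip (the kind of call change 2 might expose). */
    static byte[] readStrip(RandomAccessFile file, long offset, int byteCount)
            throws IOException {
        byte[] strip = new byte[byteCount];
        file.seek(offset);
        file.readFully(strip);
        return strip;
    }

    /** Builds the image one strip at a time (change 1). */
    static BufferedImage build(RandomAccessFile file, int width, int height,
                               int rowsPerStrip, long[] stripOffsets) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int s = 0; s < stripOffsets.length; s++) {
            int firstRow = s * rowsPerStrip;
            int rows = Math.min(rowsPerStrip, height - firstRow); // last strip may be short
            byte[] strip = readStrip(file, stripOffsets[s], width * rows);
            for (int r = 0; r < rows; r++) {
                for (int x = 0; x < width; x++) {
                    int gray = strip[r * width + x] & 0xFF;
                    image.setRGB(x, firstRow + r, (gray << 16) | (gray << 8) | gray);
                }
            }
            // The strip buffer is unreachable after this iteration, so only
            // one strip of raw source data is live at any time.
        }
        return image;
    }

    /** Writes a 4x5 test raster to a temporary file and reads it back in 2-row strips. */
    static BufferedImage demo() {
        try {
            File temp = File.createTempFile("strips", ".raw");
            temp.deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(temp, "rw")) {
                byte[] raster = new byte[4 * 5];
                for (int i = 0; i < raster.length; i++) {
                    raster[i] = (byte) (i * 10);
                }
                file.write(raster);
                return build(file, 4, 5, 2, new long[] {0, 8, 16});
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An application that cannot afford the full BufferedImage would call something like readStrip directly and process each piece as it arrives; a real implementation would also have to decompress each strip and handle tiles and other pixel layouts.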
> There is one minor issue in this implementation that is easily addressed.
> Sanselan reads images from ByteSources that can be either random-access files
> or sequential-access input streams. In the case of sequential-access input
> streams, it may be hard to perform a partial read on a TIFF directory. In
> such a case, the TIFF access routines might have to resort to reading the
> entire source data into memory, as they currently do. This would simply be a
> limitation of the implementation.
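The difference between the two kinds of source can be sketched as follows. The BlockSource interface and both implementations are hypothetical stand-ins written for this note; they are not Sanselan's ByteSource classes, whose actual signatures are not shown in this message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical abstraction over "give me these bytes of the source". */
interface BlockSource {
    byte[] getBlock(long offset, int length);
}

/** Random access: seek to the requested block and read only that block. */
final class FileBlockSource implements BlockSource {
    private final RandomAccessFile file;

    FileBlockSource(RandomAccessFile file) {
        this.file = file;
    }

    @Override
    public byte[] getBlock(long offset, int length) {
        try {
            byte[] block = new byte[length];
            file.seek(offset);
            file.readFully(block);
            return block;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Sequential access: fall back to buffering the whole stream once. */
final class StreamBlockSource implements BlockSource {
    private final byte[] all;

    StreamBlockSource(InputStream in) {
        this.all = readAll(in);
    }

    private static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] getBlock(long offset, int length) {
        int start = Math.toIntExact(offset);
        return Arrays.copyOfRange(all, start, start + length);
    }
}
```

With this split the reader code is the same for both cases; only the stream-backed source pays the cost of holding the whole file in memory.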
> There is one issue that may make this change a bit problematic. The TIFF
> processors depend on accessing a class called TiffDataElement that contains a
> public array of bytes called "data". The most expeditious way of
> implementing the enhancement is to make this element private and add an
> accessor that either returns the data from internal memory or else loads it
> on-demand. Unfortunately, because the data element is scoped to public,
> there is a chance that some existing applications are using it directly. In
> hindsight, it is clear that scoping this element as public was a mistake, but
> it may be too late to fix it. So care will be required to ensure that
> compatibility remains. The most likely solution seems to be to implement a
> new class for passing raw data from the source TIFF files to the DataReader
> implementations.
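One possible shape for such a new class is sketched below, assuming an accessor that either returns bytes already held in memory or fetches them when asked. RawDataElement and its members are hypothetical names chosen for this example; the existing TiffDataElement class would be left untouched for compatibility.

```java
import java.util.function.Supplier;

/** Hypothetical carrier for raw TIFF data; not an existing Sanselan class. */
final class RawDataElement {
    final long offset;
    final int length;
    private final byte[] data;             // non-null when held in memory
    private final Supplier<byte[]> loader; // non-null when loaded on demand

    /** Element whose bytes are already in memory. */
    RawDataElement(long offset, byte[] data) {
        this.offset = offset;
        this.length = data.length;
        this.data = data;
        this.loader = null;
    }

    /** Element whose bytes are fetched from the source each time they are requested. */
    RawDataElement(long offset, int length, Supplier<byte[]> loader) {
        this.offset = offset;
        this.length = length;
        this.data = null;
        this.loader = loader;
    }

    /** Returns the in-memory bytes if present, otherwise loads them on demand. */
    byte[] getData() {
        return data != null ? data : loader.get();
    }
}
```

The on-demand variant deliberately does not cache what it loads, so memory is held only while the caller keeps the returned array. Caching would save repeated reads at the cost of the memory this item is trying to reclaim.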