[ 
https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739288#comment-17739288
 ] 

Gary Lucas commented on IMAGING-356:
------------------------------------

I haven't studied the changes that were made, so I can't offer any 
authoritative recommendations on the approach.  Instead, I have a few general 
observations about the way TIFF files work that may be useful in figuring out 
how to tackle the problem.  Or perhaps not. So take them with a grain of salt.

TIFF files are kind of a special case in terms of image formats. First off, one 
can never assume that a TIFF file is going to be accessed in order.  It is 
common for the "directory" section of the file (which tells how it's 
organized) to come last rather than first. And, of course, a TIFF file may have 
multiple directories (because it may contain multiple images).  Second, TIFF 
files are typically quite large, often in the hundreds of megabytes range, and 
sometimes in the gigabyte range.  So it is often preferable not to keep the 
entire thing in memory. In many cases, an application will not access the 
entire file, but only a subsection.  For example, a mapping program displaying 
an aerial photograph might only access the subsection of the photograph that is 
actually visible on the map. And finally, I note that TIFF files are often not 
images at all, but are used to store numerical raster data (such as Earth 
elevation and ocean depth data). 

All of this means that the file-access pattern for a TIFF file is a closer fit 
to the idea of a random-access file than to the idea of a sequential IO 
channel such as a network socket or a serial device.  I know that the PNG 
format (the only other one I've studied in depth) was designed with network 
access specifically in mind.  The TIFF format evolved before network access was 
in the ascendancy as it is today.
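
As a rough illustration of the random-access pattern described above, here is 
a minimal Java sketch that reads the 8-byte header of a classic (non-BigTIFF) 
file and then seeks directly to the first directory, wherever it sits in the 
file.  The class and method names are hypothetical and are not part of 
Commons Imaging; only the header layout (byte-order mark, magic number 42, 
4-byte directory offset) comes from the TIFF specification.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not a Commons Imaging class.
public class TiffDirectoryPeek {

    /** Returns the number of entries in the first directory (IFD) of a classic TIFF file. */
    public static int firstDirectoryEntryCount(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            // 8-byte header: byte-order mark, magic number 42, offset of first IFD.
            byte[] header = new byte[8];
            raf.readFully(header);

            ByteOrder order;
            if (header[0] == 'I' && header[1] == 'I') {
                order = ByteOrder.LITTLE_ENDIAN;
            } else if (header[0] == 'M' && header[1] == 'M') {
                order = ByteOrder.BIG_ENDIAN;
            } else {
                throw new IOException("Not a TIFF file: bad byte-order mark");
            }

            ByteBuffer buf = ByteBuffer.wrap(header).order(order);
            if (buf.getShort(2) != 42) {
                throw new IOException("Not a classic TIFF file: bad magic number");
            }
            // The offset is an unsigned 32-bit value.
            long ifdOffset = buf.getInt(4) & 0xFFFFFFFFL;

            // The directory may sit anywhere, including the end of the file,
            // so jump straight to it instead of reading everything before it.
            raf.seek(ifdOffset);
            byte[] count = new byte[2];
            raf.readFully(count);
            return ByteBuffer.wrap(count).order(order).getShort(0) & 0xFFFF;
        }
    }
}
```

Nothing between the header and the directory is read, which is the point: a 
sequential stream would have to consume or skip all of it first.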

That being said, even the original Commons Imaging approach to TIFF file IO 
wasn't quite a perfect fit. For one thing, the original authors open and close 
a file multiple times (as they access each part of the file). That is 
suboptimal since opening and closing a file carries its own performance 
overhead.  Also, when I was looking at refactoring Commons Imaging IO to 
implement Closeable in support of try-with-resources blocks, I didn't see a way 
to accomplish that without a significant rewrite and compatibility-breaking 
changes to the public API.  
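
To make the open-once idea concrete, here is a minimal sketch of a byte source 
that holds a single file handle for its whole lifetime and implements 
Closeable so it can be used in a try-with-resources block.  The class name 
and its read method are assumptions made for illustration; this is not the 
existing Commons Imaging API, nor a proposal for how it should be changed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical class for illustration only; not part of Commons Imaging.
public class OpenOnceByteSource implements Closeable {

    // One handle, opened in the constructor and reused for every read.
    private final RandomAccessFile file;

    public OpenOnceByteSource(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    /** Reads exactly {@code length} bytes starting at {@code offset}. */
    public byte[] read(long offset, int length) throws IOException {
        file.seek(offset);
        byte[] block = new byte[length];
        file.readFully(block);
        return block;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

A caller would wrap it as 
try (OpenOnceByteSource src = new OpenOnceByteSource(path)) { ... } and issue 
as many read calls as needed (header, directories, strips) against the same 
handle, paying the open and close cost once.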



> TIFF reading extremely slow in version 1.0-SNAPSHOT
> ---------------------------------------------------
>
>                 Key: IMAGING-356
>                 URL: https://issues.apache.org/jira/browse/IMAGING-356
>             Project: Commons Imaging
>          Issue Type: Bug
>          Components: Format: TIFF
>    Affects Versions: 1.0
>            Reporter: Gary Lucas
>            Priority: Major
>
> I am using the latest code from github (1.0-SNAPSHOT downloaded from github 
> of June 2023) to read a 300 megabyte TIFF file.  Version 1.0-alpha3 required 
> 673 milliseconds to read that file.  The new code requires upward of 15 
> minutes.   Clearly something got broken since the last release.
> The TIFF file is a 10000x10000 pixel 4 byte image format organized in strips. 
>  The bottleneck appears to occur in the TiffReader getTiffRawImageData method 
> which reads raw data from the file in preparation of creating a BufferedImage 
> object.
> I suspect that there may be a general slowness of file access.  In debugging, 
> even reading the initial metadata (22 TIFF tags) took a couple of seconds.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
