This is, I hope, the first of a regular feature in which I explicate an OIIO
scenario that seems particularly interesting or helpful for a wider audience.
Enjoy.
[ Note: this one is much longer and more detailed than I'll generally aim for;
but the topic was very rich to explore. ]
An OIIO user recently had the problem of wanting to generate three different
reduced-resolution preview images from each source image, using oiiotool. This
had to be done for each of a huge number of source images, so performance was
important, and they also wanted to keep memory usage low in their server-side
app.
The source images were very large greyscale TIFFs with an odd aspect ratio.
We'll reproduce the test case like this:
oiiotool -pattern checker 2272x152780 1 -d uint8 -o big.tif
That's an admittedly big and oddly-shaped file. Nonetheless, the goal was to
produce three successive resizes, with the longest side being 1024, 384, and
128 pixels, saved as JPEG files. (Ours is not to reason why, ours is but to do
and die.)
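To get a feel for why this file is so punishing, here's the back-of-envelope arithmetic (my own numbers, derived only from the dimensions in the -pattern command above):

```python
# Size of the test image: 2272 x 152780 pixels, 1 greyscale channel.
width, height, channels = 2272, 152780, 1

uint8_bytes = width * height * channels   # 1 byte per sample as stored on disk
float_bytes = uint8_bytes * 4             # 4 bytes per sample once expanded to float

print(f"uint8 pixels: {uint8_bytes / 2**20:.0f} MB")   # ~331 MB
print(f"float pixels: {float_bytes / 2**30:.2f} GB")   # ~1.29 GB
```

That ~1.3 GB float footprint lines up suspiciously well with the 1.4 GB peak memory we see in the baseline measurements.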
Baseline:
So the naive approach is,
oiiotool big.tif -resize 0x1024 -o 1024.jpg
oiiotool big.tif -resize 0x384 -o 384.jpg
oiiotool big.tif -resize 0x128 -o 128.jpg
These three commands take a total of 10m40s (on my 2015 MacBook Pro, quad core)
and use a peak of 1.4 GB of memory. Yikes!
Aside: note the -resize 0x1024 ... when one dimension of a resize is 0, it
means to select a value that preserves the original aspect ratio, given the
constraint of the dimension you did specify.
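In code, the rule works out like this (a sketch of the behavior just described, using my own helper name; OIIO's exact rounding may differ):

```python
def auto_dimension(src_w, src_h, dst_w, dst_h):
    # A 0 in one dimension means: derive it from the other dimension so that
    # the source aspect ratio is preserved.
    if dst_h == 0:
        dst_h = max(1, round(src_h * dst_w / src_w))
    elif dst_w == 0:
        dst_w = max(1, round(src_w * dst_h / src_h))
    return dst_w, dst_h

# -resize 0x1024 on our 2272x152780 source: height becomes 1024,
# width follows from the aspect ratio.
print(auto_dimension(2272, 152780, 0, 1024))   # (15, 1024)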
Step 1: Use successive resizes in the same oiiotool command line
The main reason it's so expensive is the extreme resize (152k down to 1k, and
even 152k down to 128, vertically), which means that each output pixel must
sample an absurd number of pixels in the source image. We can save the time of
the latter two resizes by resizing successively from the lower-res images as we
go (i.e., resize the 1024 image to 384, then the 384 to 128).
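A rough cost model (my own estimate, not a measurement of OIIO internals) shows why chaining helps: with a filter whose footprint scales with the downscale ratio, each resize pass reads roughly one full copy of its input, so what matters is the height of the image you resize *from*.

```python
SRC = 152780                 # vertical resolution of the source
TARGETS = [1024, 384, 128]

# Each pass reads ~input_height rows of pixel data (dst_h output rows, each
# sampling ~input_height/dst_h input rows).
naive = SRC + SRC + SRC      # three resizes, all from the original
chained = SRC + 1024 + 384   # 152780 -> 1024 -> 384 -> 128

print(f"naive reads ~{naive} rows, chained ~{chained}: "
      f"{naive / chained:.1f}x fewer")   # ~3.0x
```

The measured win turns out to be bigger than 3x, partly because a single invocation also reads and decodes big.tif from disk only once.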
There's also no reason to use three separate invocations of oiiotool. Note that
oiiotool processes its commands strictly from left to right; it's fine to
include multiple -o outputs, interspersed among the other commands, and each
will output the results as they stand at that point in the command sequence.
oiiotool big.tif -resize 0x1024 -o 1024.jpg -resize 0x384 -o 384.jpg -resize
0x128 -o 128.jpg
Time: 1m50s Peak memory: 1.4GB
Improvement so far: 5.8x speed, 1x memory
Step 2: Speed up the expensive resize with a cheaper filter
The slow resize speed is exacerbated by the fact that the default downsize
filter is a "lanczos3" filter, which is quite wide. That's great for high
quality resizes of reasonable magnification. But in this case, the resize is so
extreme that I didn't think anybody would notice the fine points of filtering
quality, and these are intended as low-res previews, not "final quality"
delivery images. So I hypothesized that a box filter would be faster and look
fine for this purpose.
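Here's why the filter choice matters so much at this scale (back-of-envelope again, assuming the filter footprint scales with the downscale ratio; lanczos3 has a total width of 6, a box filter a width of 1):

```python
ratio = 152780 / 1024        # ~149x vertical downscale for the 1024 preview

taps_lanczos3 = ratio * 6    # lanczos3: support radius 3, total width 6
taps_box = ratio * 1         # box: total width 1

print(f"lanczos3: ~{taps_lanczos3:.0f} source rows per output row")  # ~895
print(f"box:      ~{taps_box:.0f} source rows per output row")       # ~149
```

And each lanczos3 tap also costs a windowed-sinc weight evaluation, while a box tap is just an add, so the real gap is larger than the 6x tap count alone suggests.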
oiiotool big.tif -resize:filter=box 0x1024 -o 1024.jpg -resize:filter=box 0x384
-o 384.jpg -resize:filter=box 0x128 -o 128.jpg
(excuse the line wrap, that is a single command)
Time: 4s Peak memory: 1.4GB
Improvement so far: 160x speed, 1.0x memory
Aside: stats show that of the 4s it now takes, 1.9s was file I/O, so we're
already at the point where even if we could make the "resize" be infinitely
fast, we could squeeze out no more than an additional factor of 2 in speed.
Step 3: Use --native to prevent expansion to float internally
By default, oiiotool converts all images to float pixels internally, so that
any math you ask it to do will be at full precision. Also, this is usually the
fastest option, since it converts to float just once per pixel, whereas if it
kept it in uint8, say, it might end up converting to float and back for every
individual math op in the resize. But what the heck, let's see what the time vs
memory tradeoff is in this case, where a bit of accuracy loss is probably
acceptable for a thumbnail preview image. We'll use the --native option:
oiiotool --native big.tif -resize:filter=box 0x1024 -o 1024.jpg
-resize:filter=box 0x384 -o 384.jpg -resize:filter=box 0x128 -o 128.jpg
Time: 3.3s Peak memory: 361MB
Improvement so far: 194x speed, 4x memory
Aside: I said that I expected maximum speed to be when the internal
representation was float. Why is it faster now? I assume that by reducing the
size of the image in memory, more could fit into processor cache at any given
time, so we have probably made the performance somewhat less bottlenecked on
RAM speed. For more reasonable images where the working set can fit into cache
and the math itself is dominating the time, I do expect the up-front float
conversion to be faster as well as more accurate.
Step 4: Restrict the ImageCache size
The way oiiotool works, small images are read directly into RAM whole, but big
images (like this one) are backed by ImageCache, which tries to limit memory
size. But oiiotool's default is to let the ImageCache use up to 4 GB. Perhaps
we can squeeze down the working size of the cache even further, without
sacrificing much time? Let's use the --cache argument (the value is the cache
size in MB, with a default of 4096):
oiiotool --cache 50 --native big.tif -resize:filter=box 0x1024 -o 1024.jpg
-resize:filter=box 0x384 -o 384.jpg -resize:filter=box 0x128 -o 128.jpg
Time: 3.25s Peak memory: 77MB
We got lucky again -- reducing the cache size didn't seem to hurt performance
at all. It would have, if we had more random or repeated access to the big
source image. But in this case, we are doing just one resize of the original
big input image, and it's reasonably coherent in its access pattern -- the
"working set" at any given time never exceeded the size of the ImageCache, so
there were no redundant reads from the file.
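A quick working-set estimate (my own back-of-envelope, not a statement about ImageCache internals) suggests why 50 MB was plenty: a box-filtered downscale to 1024 rows only needs a band of ~150 consecutive source rows in memory at a time.

```python
src_h, dst_h = 152780, 1024
width, bytes_per_pixel = 2272, 1       # greyscale uint8 (thanks to --native)

rows_in_flight = src_h / dst_h         # ~150 source rows feed one output row
band_bytes = rows_in_flight * width * bytes_per_pixel

print(f"~{band_bytes / 2**10:.0f} KB of source data in play at once")  # ~331 KB
```

A few hundred KB of live data fits comfortably in a 50 MB cache, which is why shrinking the cache cost us nothing here.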
The reason capping the memory footprint was so important is that the number of
processes they could run simultaneously on each physical server was limited by
memory (swapping would kill performance). This reduction allowed them to run 28
simultaneous instances of this process on the same server without swapping --
double what they could deploy before -- and thus gave another 2x boost in total
throughput for processing their whole database.
Final results:
Time: 3.25s Peak memory: 77MB
Total improvement: 194x speed, 20x memory
Moral: Power users, make sure you know what every obscure oiiotool command line
option does!
--
Larry Gritz
[email protected]
_______________________________________________
Oiio-dev mailing list
[email protected]
http://lists.openimageio.org/listinfo.cgi/oiio-dev-openimageio.org