Max Eliaser created BATIK-1343:
----------------------------------
Summary: [PATCH] PNG compression level hint and TGA encoder for
higher throughput bulk rasterization
Key: BATIK-1343
URL: https://issues.apache.org/jira/browse/BATIK-1343
Project: Batik
Issue Type: New Feature
Components: batik-test-old, SVG Rasterizer
Affects Versions: trunk
Reporter: Max Eliaser
Attachments: benchmark.sh, png_patch_main.diff,
png_patch_rasterizer_app.diff, rip_svgs.sh, tga_patch_main.diff,
tga_patch_rasterizer_app.diff, tga_unit_test_files.zip
Hello Apache. I, along with several of my colleagues, have been using the Batik
library as a dependency-of-a-dependency in production for some time. We wanted
to contribute back all the changes we made for our use.
I want to extend the gratitude of Amazon Web Services. We have found Batik very
useful in AWS Elemental MediaConvert, as part of the rendering pipeline for the
"style passthrough" feature documented here:
https://docs.aws.amazon.com/mediaconvert/latest/ug/burn-in-output-captions.html
We use SVG as an intermediate format when rendering captions, and this SVG data
then gets rasterized by Batik. When we were doing our performance testing, we
found that the PNG encoding used by Batik became a significant bottleneck.
This patch series, which I authored myself on behalf of Amazon Web Services,
adds some more raster formats to the Batik library and its demo rasterizer app,
allowing the user to select different tradeoffs of compression ratio and
encoding time.
I originally developed these patches against Batik 1.2, but I have ported them
to the latest trunk (r1904320) and retested them thoroughly. I also ensured
that each individual patch in the series passes unit tests.
h2. Patches
These patches are meant to be applied in this order.
# [^png_patch_main.diff] adds a tunable parameter for the zlib compression level
to the internal PNG encoder using the "hints" mechanism, exposing the
functionality to other Java software that calls into the Batik library.
Previously, Batik always used the highest compression level, 9, when encoding
PNGs. My benchmark results below show that lower levels achieve higher
throughput for large bulk conversions without too much cost in compression
ratio.
# [^png_patch_rasterizer_app.diff] exposes the PNG compression level hint
through the included svgrasterizer demo app.
# [^tga_patch_main.diff] adds an encoder for the Truevision TGA (Targa) raster
file format. This older format uses simple run-length encoding (RLE)
compression, which makes it dramatically faster to encode than even the lowest
PNG compression level, at the expense of a worse compression ratio. TGA could
be a good choice when the highest throughput is desired, but the format is
limited to 8 bits per channel, and my encoder doesn't implement the paletted
modes. Because a patch cannot carry binary files, you must also extract
[^tga_unit_test_files.zip] into your Batik tree when applying this patch; it
contains the resources needed by the unit tests I added. I tried to cover all
edge cases in the RLE algorithm using crafted pixel contents in the unit test
input files.
# [^tga_patch_rasterizer_app.diff] exposes the TGA encoder through the included
svgrasterizer demo app.
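For readers unfamiliar with TGA's RLE scheme, here is a minimal sketch of how one scanline can be packed. It is not the code from [^tga_patch_main.diff]; the class and method names are hypothetical. It assumes 32-bit ARGB pixels stored as Java ints and written in TGA's BGRA byte order, and it encodes a single scanline at a time because, as far as I know, the TGA 2.0 specification recommends that packets not span scanlines.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of TGA run-length packets; not the patch's encoder.
final class TgaRleSketch {
    // A packet header stores (count - 1) in 7 bits, so a packet holds 1..128 pixels.
    static final int MAX_PACKET = 128;

    // Encodes one scanline of ARGB pixels into TGA RLE packets.
    static byte[] encodeScanline(int[] argb) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < argb.length) {
            // Measure the run of identical pixels starting at i.
            int run = 1;
            while (i + run < argb.length && run < MAX_PACKET
                    && argb[i + run] == argb[i]) {
                run++;
            }
            if (run >= 2) {
                // Run-length packet: high bit set, then a single pixel value.
                out.write(0x80 | (run - 1));
                writePixel(out, argb[i]);
                i += run;
            } else {
                // Raw packet: gather literals until a run of 2+ begins
                // or the packet is full.
                int start = i;
                int count = 1;
                i++;
                while (i < argb.length && count < MAX_PACKET
                        && !(i + 1 < argb.length && argb[i + 1] == argb[i])) {
                    count++;
                    i++;
                }
                out.write(count - 1); // high bit clear marks a raw packet
                for (int k = start; k < start + count; k++) {
                    writePixel(out, argb[k]);
                }
            }
        }
        return out.toByteArray();
    }

    // TGA stores 32-bit pixels as blue, green, red, alpha.
    static void writePixel(ByteArrayOutputStream out, int argb) {
        out.write(argb & 0xFF);
        out.write((argb >>> 8) & 0xFF);
        out.write((argb >>> 16) & 0xFF);
        out.write((argb >>> 24) & 0xFF);
    }

    public static void main(String[] args) {
        int[] line = {1, 1, 1, 2, 3, 3};
        System.out.println(encodeScanline(line).length + " bytes");
    }
}
```

The per-pixel work is a comparison and a few byte writes, which is why this kind of encoder can be much cheaper than a zlib pass. The cost is that content with few repeated neighbouring pixels barely compresses, and raw packets add one header byte per packet.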
h2. Testing
The original implementation against Batik 1.2 has been in production for around
a year now and has proven itself to be solid. Existing unit tests are passing
with this patch series, and new unit tests have been added.
To demonstrate the performance benefits, I implemented a quick-and-dirty
benchmark. I wrote a script [^rip_svgs.sh] to scrape 292 SVG files from
Wikimedia Commons. I then wrote another script [^benchmark.sh] (which should be
run from the top-level trunk directory) to do bulk conversions of all SVGs in
each encoding mode. The script shows how long each mode took to encode the SVGs
and the total size of output files, to give an idea of the tradeoff of
throughput vs compression ratio. A few of the SVGs fail to render, but these
get ignored silently.
Although the Wikimedia SVGs I looked at all seemed innocuous enough, I did not
personally verify that every single one is safe for work, which is why I'm
distributing the script instead of the SVGs. The benchmark also runs the
rasterizer with {{-scriptSecurityOff}}, so there's that too. Not liable for
damages from running this benchmark etc etc etc.
Here are my own results, collected on a Core i7-5930K:
{code}
TGA
real 0m27.714s
user 1m13.000s
sys 0m2.529s
79M /tmp/svgbenches/tga

PNG compression level 1
real 0m35.124s
user 1m17.808s
sys 0m2.538s
40M /tmp/svgbenches/png1

PNG compression level 2
real 0m35.884s
user 1m21.007s
sys 0m2.443s
39M /tmp/svgbenches/png2

PNG compression level 3
real 0m36.440s
user 1m22.461s
sys 0m2.645s
39M /tmp/svgbenches/png3

PNG compression level 4
real 0m39.022s
user 1m27.379s
sys 0m2.671s
33M /tmp/svgbenches/png4

PNG compression level 5
real 0m41.311s
user 1m32.538s
sys 0m2.600s
33M /tmp/svgbenches/png5

PNG compression level 6
real 0m41.878s
user 1m27.781s
sys 0m2.545s
32M /tmp/svgbenches/png6

PNG compression level 7
real 0m42.547s
user 1m26.767s
sys 0m2.220s
32M /tmp/svgbenches/png7

PNG compression level 8
real 1m0.883s
user 1m45.745s
sys 0m1.958s
32M /tmp/svgbenches/png8

PNG compression level 9
real 1m33.160s
user 2m27.109s
sys 0m2.784s
31M /tmp/svgbenches/png9
{code}
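The speed-versus-size trend in these numbers comes from the zlib compression level. For anyone who wants to see that knob in isolation, here is a small sketch using the standard {{java.util.zip.Deflater}} class. It is not taken from the patches and does not use Batik or the new hint; the class name, helper name, and synthetic input are my own assumptions, and absolute timings will differ from the benchmark above because no SVG rendering is involved.

```java
import java.util.zip.Deflater;

// Hypothetical illustration of zlib compression levels; not Batik code.
final class DeflateLevelSketch {
    // Returns the number of bytes zlib produces for the input at the given level.
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[8192];
            int total = 0;
            // Drain the compressor; only the byte count is kept.
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) {
        // Synthetic, moderately repetitive input standing in for raster data.
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (((i / 64) % 251) ^ (i % 7));
        }
        for (int level = 1; level <= 9; level++) {
            long t0 = System.nanoTime();
            int size = compressedSize(data, level);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.printf("level %d: %d bytes in %d ms%n", level, size, ms);
        }
    }
}
```

On typical inputs the higher levels spend noticeably more time for a small reduction in size, which matches the shape of the results above, though the exact figures depend on the data and the machine.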