Max Eliaser created BATIK-1343:
----------------------------------

             Summary: [PATCH] PNG compression level hint and TGA encoder for 
higher throughput bulk rasterization
                 Key: BATIK-1343
                 URL: https://issues.apache.org/jira/browse/BATIK-1343
             Project: Batik
          Issue Type: New Feature
          Components: batik-test-old, SVG Rasterizer
    Affects Versions: trunk
            Reporter: Max Eliaser
         Attachments: benchmark.sh, png_patch_main.diff, 
png_patch_rasterizer_app.diff, rip_svgs.sh, tga_patch_main.diff, 
tga_patch_rasterizer_app.diff, tga_unit_test_files.zip

Hello Apache. I, along with several of my colleagues, have been using the Batik 
library as a dependency-of-a-dependency in production for some time. We wanted 
to contribute back all the changes we made for our use.

I want to extend the gratitude of Amazon Web Services. We have found Batik very 
useful in AWS Elemental MediaConvert, as part of the rendering pipeline for the 
"style passthrough" feature documented here: 
https://docs.aws.amazon.com/mediaconvert/latest/ug/burn-in-output-captions.html

We use SVG as an intermediate format when rendering captions, and this SVG data 
then gets rasterized by Batik. When we were doing our performance testing, we 
found that the PNG encoding used by Batik became a significant bottleneck.

This patch series, which I authored myself on behalf of Amazon Web Services, 
adds some more raster formats to the Batik library and its demo rasterizer app, 
allowing the user to select different tradeoffs of compression ratio and 
encoding time.

I originally developed these patches against Batik 1.2, but I have ported them 
to the latest trunk (r1904320) and retested them thoroughly. I also ensured 
that each individual patch in the series passes unit tests.

h2. Patches

These patches are meant to be applied in this order.
# [^png_patch_main.diff] adds a tunable parameter for zlib compression level to 
the internal PNG encoder using the "hints" mechanism, exposing the 
functionality to other Java software that calls into the Batik library. 
Previously, Batik would always use the highest compression level of 9 when 
encoding PNGs, but my benchmark results below show how using other values can 
achieve higher throughput for large bulk conversions, without too much cost in 
compression ratio.
# [^png_patch_rasterizer_app.diff] exposes the PNG compression level hint 
through the included svgrasterizer demo app.
# [^tga_patch_main.diff] adds an encoder for the Truevision TGA (Targa) raster 
file format. This is an older file format that uses simple RLE compression, 
making it dramatically faster than even the lowest PNG compression level, at 
the expense of a worse compression ratio. TGA could be a good choice when the 
highest throughput is desired, but the format is limited to 8 bits per channel, 
and my encoder doesn't implement the paletted modes. Since I cannot include 
binary files in a patch, you must also extract [^tga_unit_test_files.zip] into 
your Batik tree when applying this patch; it contains the resources needed by 
the unit tests I added. I tried to cover all edge cases in the RLE algorithm 
using crafted pixel contents in the unit test input files.
# [^tga_patch_rasterizer_app.diff] exposes the TGA encoder through the included 
svgrasterizer demo app.
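
To give a feel for why the RLE scheme mentioned in the TGA patch is so cheap to compute, here is a minimal sketch of TGA-style run-length packets for one scanline of 32-bit pixels. It is an illustration only and is not the code in the attached patch: the class name {{TgaRleSketch}} and its methods are hypothetical, it covers only the packet layer (no TGA file header or footer), and it assumes pixels arrive as packed ARGB ints.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration of TGA RLE packets; not taken from the patch. */
public class TgaRleSketch {
    // A packet header stores (count - 1) in 7 bits, so a packet holds 1..128 pixels.
    private static final int MAX_PACKET = 128;

    /** Encodes one scanline of packed ARGB pixels as TGA RLE packets (BGRA byte order). */
    public static byte[] encodeScanline(int[] argb) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < argb.length) {
            // Measure the run of identical pixels starting at i.
            int run = 1;
            while (i + run < argb.length && run < MAX_PACKET && argb[i + run] == argb[i]) {
                run++;
            }
            if (run >= 2) {
                // Run-length packet: high bit set, then a single pixel value.
                out.write(0x80 | (run - 1));
                writePixel(out, argb[i]);
                i += run;
            } else {
                // Raw packet: gather pixels until a repeat begins or the packet is full.
                int start = i;
                int count = 1;
                i++;
                while (i < argb.length && count < MAX_PACKET
                        && !(i + 1 < argb.length && argb[i] == argb[i + 1])) {
                    count++;
                    i++;
                }
                // High bit clear, followed by `count` literal pixels.
                out.write(count - 1);
                for (int k = start; k < start + count; k++) {
                    writePixel(out, argb[k]);
                }
            }
        }
        return out.toByteArray();
    }

    /** Writes one pixel in the byte order TGA uses for 32 bpp data: B, G, R, A. */
    private static void writePixel(ByteArrayOutputStream out, int argb) {
        out.write(argb & 0xFF);          // blue
        out.write((argb >>> 8) & 0xFF);  // green
        out.write((argb >>> 16) & 0xFF); // red
        out.write((argb >>> 24) & 0xFF); // alpha
    }

    public static void main(String[] args) {
        int red = 0xFFFF0000, green = 0xFF00FF00, blue = 0xFF0000FF;
        byte[] packets = encodeScanline(new int[] {red, red, red, green, blue, blue});
        System.out.println(packets.length + " bytes of RLE data");
    }
}
```

In the {{main}} example, the six pixels become three packets (a run of 3, a raw packet of 1, a run of 2) of 5 bytes each, versus 24 bytes uncompressed. The encoder only ever compares neighbouring pixels, which is why this kind of compression costs so little CPU time compared to zlib.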

h2. Testing

The original implementation against Batik 1.2 has been in production for around 
a year now and has proven itself to be solid. Existing unit tests are passing 
with this patch series, and new unit tests have been added.

To demonstrate the performance benefits, I implemented a quick-and-dirty 
benchmark. I wrote a script [^rip_svgs.sh] to scrape 292 SVG files from 
Wikimedia Commons. I then wrote another script [^benchmark.sh] (which should be 
run from the top-level trunk directory) to do bulk conversions of all SVGs in 
each encoding mode. The script shows how long each mode took to encode the SVGs 
and the total size of output files, to give an idea of the tradeoff of 
throughput vs compression ratio. A few of the SVGs fail to render, but these 
get ignored silently.
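
For readers who want to see the time/size tradeoff without setting up the full benchmark, the same kind of sweep can be sketched directly against {{java.util.zip.Deflater}}, the JDK's zlib wrapper. This is a rough stand-in under stated assumptions, not what [^benchmark.sh] does: the class {{DeflateLevelSweep}} and its synthetic test image are hypothetical, no SVG rendering or PNG filtering is involved, and the numbers it prints will not match the results below.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch: sweep zlib compression levels over a synthetic RGBA buffer. */
public class DeflateLevelSweep {

    /** Compresses the input at the given zlib level (1 = fastest, 9 = smallest). */
    public static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Decompresses data produced by deflate(); used only to check the round trip. */
    public static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length && !inflater.finished()) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) {
                    break; // no progress: truncated or unexpected input
                }
                off += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }

    /** Builds a blocky RGBA test pattern; real rendered SVGs will behave differently. */
    public static byte[] syntheticImage(int width, int height) {
        byte[] px = new byte[width * height * 4];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int o = (y * width + x) * 4;
                byte v = (byte) ((x / 16) * 8 + (y / 16));
                px[o] = v;
                px[o + 1] = (byte) (v * 3);
                px[o + 2] = (byte) ~v;
                px[o + 3] = (byte) 0xFF;
            }
        }
        return px;
    }

    public static void main(String[] args) {
        byte[] data = syntheticImage(1024, 1024);
        for (int level = 1; level <= 9; level++) {
            long t0 = System.nanoTime();
            byte[] z = deflate(data, level);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.printf("level %d: %d bytes in %d ms%n", level, z.length, ms);
        }
    }
}
```

A single-shot timing like this is noisy (JIT warm-up, caching), so treat the output as a rough indication only; the real benchmark measures whole rasterizer runs over many files.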

Although the Wikimedia SVGs I looked at all seemed innocuous enough, I did not 
personally verify that every single one is safe for work, which is why I'm 
distributing the script instead of the SVGs. The benchmark also runs the 
rasterizer with {{-scriptSecurityOff}}, so there's that too. Not liable for 
damages from running this benchmark etc etc etc.

Here are my own results, collected on a Core i7-5930K:
{code}
TGA

real    0m27.714s
user    1m13.000s
sys    0m2.529s
79M    /tmp/svgbenches/tga


PNG compression level 1

real    0m35.124s
user    1m17.808s
sys    0m2.538s
40M    /tmp/svgbenches/png1


PNG compression level 2

real    0m35.884s
user    1m21.007s
sys    0m2.443s
39M    /tmp/svgbenches/png2


PNG compression level 3

real    0m36.440s
user    1m22.461s
sys    0m2.645s
39M    /tmp/svgbenches/png3


PNG compression level 4

real    0m39.022s
user    1m27.379s
sys    0m2.671s
33M    /tmp/svgbenches/png4


PNG compression level 5

real    0m41.311s
user    1m32.538s
sys    0m2.600s
33M    /tmp/svgbenches/png5


PNG compression level 6

real    0m41.878s
user    1m27.781s
sys    0m2.545s
32M    /tmp/svgbenches/png6


PNG compression level 7

real    0m42.547s
user    1m26.767s
sys    0m2.220s
32M    /tmp/svgbenches/png7


PNG compression level 8

real    1m0.883s
user    1m45.745s
sys    0m1.958s
32M    /tmp/svgbenches/png8


PNG compression level 9

real    1m33.160s
user    2m27.109s
sys    0m2.784s
31M    /tmp/svgbenches/png9


{code}


