[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475382#comment-16475382
 ] 

Emmeran Seehuber commented on PDFBOX-4184:
------------------------------------------

I found the bug in my code while changing the sum method (in an attempt to 
optimize it). The PNG Average filter was simply implemented incorrectly; I had 
the formula wrong. The bug surfaced with your sample image after I changed 
estCompressSum().
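
For reference, here is a minimal sketch of the Average filter as the PNG specification defines it: each filtered byte is the raw byte minus floor((left + up) / 2), modulo 256, where "left" is the byte bpp positions earlier in the same row and "up" is the byte at the same position in the previous row. This is not the code from the patch; the class and method names are invented for illustration.

```java
class PngAverageFilter {
    // Applies the PNG Average filter (filter type 3) to one row.
    // prior is the previous unfiltered row, or null for the first row.
    // bpp is the number of bytes per complete pixel (at least 1).
    static byte[] filterAverage(byte[] row, byte[] prior, int bpp) {
        byte[] out = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            int left = i >= bpp ? row[i - bpp] & 0xFF : 0;
            int up = prior != null ? prior[i] & 0xFF : 0;
            // The sum is formed on unsigned values without overflow, then halved.
            // The cast to byte performs the modulo 256 step.
            out[i] = (byte) ((row[i] & 0xFF) - ((left + up) >>> 1));
        }
        return out;
    }

    // Inverse operation, as a decoder would perform it.
    static byte[] unfilterAverage(byte[] filtered, byte[] prior, int bpp) {
        byte[] out = new byte[filtered.length];
        for (int i = 0; i < filtered.length; i++) {
            // "left" must come from the already reconstructed output.
            int left = i >= bpp ? out[i - bpp] & 0xFF : 0;
            int up = prior != null ? prior[i] & 0xFF : 0;
            out[i] = (byte) ((filtered[i] & 0xFF) + ((left + up) >>> 1));
        }
        return out;
    }
}
```

A round trip through filterAverage() and unfilterAverage() with the same prior row should reproduce the input exactly, which makes a cheap regression test for this kind of formula mistake.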

I've also implemented a benchmark. It is probably only suitable for trunk, 
since JMH now requires Java 1.7, so you need JDK 1.7+ even to compile it. 
I also tested with JDK 10.

I thought that the predictor would be faster than the "simple" way. But it is 
not, and at the moment I have no further ideas for how to optimize it.
{code:java}
Benchmark                                   (zipLevel)   Mode  Cnt    Score    Error  Units
LosslessFactoryBenchmark.predictor                   3  thrpt    5  114.055 ± 10.120  ops/s
LosslessFactoryBenchmark.predictor                   6  thrpt    5   79.463 ± 15.921  ops/s
LosslessFactoryBenchmark.predictor                   9  thrpt    5   16.542 ±  7.951  ops/s
LosslessFactoryBenchmark.predictorBig                3  thrpt    5    1.355 ±  0.585  ops/s
LosslessFactoryBenchmark.predictorBig                6  thrpt    5    1.360 ±  0.045  ops/s
LosslessFactoryBenchmark.predictorBig                9  thrpt    5    1.135 ±  0.021  ops/s
LosslessFactoryBenchmark.predictorBigBytes           3  thrpt    5    1.420 ±  0.028  ops/s
LosslessFactoryBenchmark.predictorBigBytes           6  thrpt    5    1.286 ±  0.052  ops/s
LosslessFactoryBenchmark.predictorBigBytes           9  thrpt    5    1.073 ±  0.014  ops/s
LosslessFactoryBenchmark.rgbOnly                     3  thrpt    5  248.467 ±  8.199  ops/s
LosslessFactoryBenchmark.rgbOnly                     6  thrpt    5  126.354 ±  9.548  ops/s
LosslessFactoryBenchmark.rgbOnly                     9  thrpt    5   13.954 ±  1.092  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  3  thrpt    5    7.939 ±  0.395  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  6  thrpt    5    3.278 ±  0.038  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  9  thrpt    5    1.248 ±  0.080  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             3  thrpt    5    3.380 ±  0.229  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             6  thrpt    5    2.108 ±  0.064  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             9  thrpt    5    1.064 ±  0.023  ops/s
{code}
I've tested both your "old" rgbOnly code and the predictor using the zip levels 
3, 6 and 9. The images used are your sample image and that image scaled up 10x, 
once as an INT bitmap (Big) and once as a 3BYTE bitmap (BigBytes). Only at the 
maximum zip level is the predictor on par with rgbOnly; in all other cases it 
is slower. But for the big image the compressed size differs hugely at zip 
level 9: 58077 (Predictor) vs. 167808 (RGB Only).
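
To make the speed/size tradeoff easy to reproduce without JMH, here is a rough stdlib-only sketch that deflates the same synthetic image data with and without a PNG "Sub" predictor at zip levels 3, 6 and 9. It is not the benchmark from the patch: the class name, the gradient test image and the choice of the Sub filter are assumptions made for illustration, and a real /Predictor 15 stream would also carry a filter-type byte at the start of each row, which is omitted here.

```java
import java.util.zip.Deflater;

class PredictorVsPlain {
    // PNG "Sub" filter: each byte minus the byte bpp positions to its left (mod 256).
    // rowBytes is the length of one image row in bytes.
    static byte[] subFilter(byte[] data, int rowBytes, int bpp) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            int offsetInRow = i % rowBytes;
            int left = offsetInRow >= bpp ? data[i - bpp] & 0xFF : 0;
            out[i] = (byte) ((data[i] & 0xFF) - left);
        }
        return out;
    }

    // Returns the number of bytes Deflater produces for the input at the given level.
    static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // Synthetic 8-bit RGB gradient, 3 bytes per pixel (hypothetical test data).
    static byte[] syntheticGradient(int width, int height) {
        byte[] d = new byte[width * height * 3];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int o = (y * width + x) * 3;
                d[o] = (byte) x;
                d[o + 1] = (byte) (x + y);
                d[o + 2] = (byte) y;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int width = 512, height = 512;
        byte[] plain = syntheticGradient(width, height);
        byte[] predicted = subFilter(plain, width * 3, 3);
        for (int level : new int[] {3, 6, 9}) {
            System.out.println("level " + level
                    + ": plain=" + deflatedSize(plain, level)
                    + " predictor=" + deflatedSize(predicted, level));
        }
    }
}
```

The numbers this prints depend entirely on the input image, so they say nothing about the figures above; the sketch only shows where the extra work (the filtering pass) and the potential size gain (more repetitive input for deflate) come from.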

So I wonder whether it would be better to let the user choose between simple 
encoding and predictor encoding, as there is a tradeoff between speed and 
size. What do you think about the API?

[^lossless_predictor_based_imageencoding_v3.patch]

I haven't tested against the govdocs yet; I'll try to let that test run in the 
background today. For me this patch is still WIP and not ready to be committed.

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-4184
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4184
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.9
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>             Fix For: 2.0.10, 3.0.0 PDFBox
>
>         Attachments: LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support to write 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as 
> the images are currently not efficiently encoded. I.e. you could use PNG 
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> tradeoff speed vs compression ratio. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
