[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475382#comment-16475382 ]
Emmeran Seehuber commented on PDFBOX-4184:
------------------------------------------

I've found the bug in my code by changing the sum method (while trying to optimize the code). The PNG Average filter was simply implemented wrong; I just got the formula wrong. The bug was triggered by your sample image once I had changed estCompressSum().

I've also implemented a benchmark. But the benchmark is likely only for trunk, as JMH now needs Java 1.7, so to even compile the benchmark you need JDK 1.7+. I also tested with JDK 10.

I thought that the predictor would be faster than the "simple" way. But no, it is not, and at the moment I don't have any further idea how I could optimize it.

{code:java}
Benchmark                                   (zipLevel)   Mode  Cnt    Score    Error  Units
LosslessFactoryBenchmark.predictor                   3  thrpt    5  114.055 ± 10.120  ops/s
LosslessFactoryBenchmark.predictor                   6  thrpt    5   79.463 ± 15.921  ops/s
LosslessFactoryBenchmark.predictor                   9  thrpt    5   16.542 ±  7.951  ops/s
LosslessFactoryBenchmark.predictorBig                3  thrpt    5    1.355 ±  0.585  ops/s
LosslessFactoryBenchmark.predictorBig                6  thrpt    5    1.360 ±  0.045  ops/s
LosslessFactoryBenchmark.predictorBig                9  thrpt    5    1.135 ±  0.021  ops/s
LosslessFactoryBenchmark.predictorBigBytes           3  thrpt    5    1.420 ±  0.028  ops/s
LosslessFactoryBenchmark.predictorBigBytes           6  thrpt    5    1.286 ±  0.052  ops/s
LosslessFactoryBenchmark.predictorBigBytes           9  thrpt    5    1.073 ±  0.014  ops/s
LosslessFactoryBenchmark.rgbOnly                     3  thrpt    5  248.467 ±  8.199  ops/s
LosslessFactoryBenchmark.rgbOnly                     6  thrpt    5  126.354 ±  9.548  ops/s
LosslessFactoryBenchmark.rgbOnly                     9  thrpt    5   13.954 ±  1.092  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  3  thrpt    5    7.939 ±  0.395  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  6  thrpt    5    3.278 ±  0.038  ops/s
LosslessFactoryBenchmark.rgbOnlyBig                  9  thrpt    5    1.248 ±  0.080  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             3  thrpt    5    3.380 ±  0.229  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             6  thrpt    5    2.108 ±  0.064  ops/s
LosslessFactoryBenchmark.rgbOnlyBigBytes             9  thrpt    5    1.064 ±  0.023  ops/s
{code}

I've tested both your "old" rgbOnly code and the predictor using zip levels 3, 6 and 9. The images used are your sample image and that image scaled up 10x to an INT bitmap (Big) and to a 3BYTE bitmap (BigBytes). Only when compressing with the maximum zip level is the predictor on par with rgbOnly; in all other cases it is slower. But for the big image there is a huge difference in compressed size at zip level 9: 58077 (predictor) vs. 167808 (RGB only). So I wonder whether it would be better to let the user choose between simple encoding and predictor encoding, as there is a tradeoff between speed and size. What do you think about the API?

[^lossless_predictor_based_imageencoding_v3.patch]

I've not yet tested against the govdocs; I'll try to let this test run in the background today. For me this patch is still WIP, not ready to be committed.

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -----------------------------------------------------------------
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.9
> Reporter: Emmeran Seehuber
> Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: LoadGovdocs.java,
> lossless_predictor_based_imageencoding.patch,
> lossless_predictor_based_imageencoding_v2.patch,
> lossless_predictor_based_imageencoding_v3.patch,
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf,
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images
> correctly. I've integrated a test for this here:
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-bit TYPE_CUSTOM images with DataType == USHORT, but this
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as
> the images are currently not efficiently encoded. E.g. you could use PNG
> encodings to get better compression (by adding a COSName.DECODE_PARMS with
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is
> something for a later patch. It would also need another API, as there is a
> tradeoff between speed and compression ratio.
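For reference, here is a minimal sketch of the five PNG row filters as the PNG specification defines them. The Average filter is computed as Filt(x) = Orig(x) - floor((Orig(a) + Orig(b)) / 2), with the sum taken in full int precision before halving. The per-row filter choice uses the minimum-sum-of-absolute-differences heuristic suggested by the PNG specification. That choice is an assumption, since the actual body of estCompressSum() in the patch isn't shown here. The class and method names (PngRowFilterSketch, filterByte, encode) are hypothetical and not PDFBox API. Each output row is prefixed with its filter-type byte. This is the row layout a PDF reader expects when DecodeParms has a Predictor of 10 or higher, where 15 means the encoder picks the filter per row.

{code:java}
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical sketch of PNG row filtering; not the PDFBox implementation. */
public class PngRowFilterSketch {

    // Paeth predictor as defined in the PNG specification.
    static int paeth(int a, int b, int c) {
        int p = a + b - c;
        int pa = Math.abs(p - a);
        int pb = Math.abs(p - b);
        int pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) {
            return a;
        }
        return pb <= pc ? b : c;
    }

    // Filters byte i of the current row. a = byte bpp positions to the left,
    // b = byte above, c = byte above-left; missing neighbours count as 0.
    static byte filterByte(int type, byte[] cur, byte[] prev, int i, int bpp) {
        int x = cur[i] & 0xFF;
        int a = i >= bpp ? cur[i - bpp] & 0xFF : 0;
        int b = prev != null ? prev[i] & 0xFF : 0;
        int c = (prev != null && i >= bpp) ? prev[i - bpp] & 0xFF : 0;
        switch (type) {
            case 0: return (byte) x;                     // None
            case 1: return (byte) (x - a);               // Sub
            case 2: return (byte) (x - b);               // Up
            case 3: return (byte) (x - ((a + b) >>> 1)); // Average: int sum, then floor(/2)
            case 4: return (byte) (x - paeth(a, b, c));  // Paeth
            default: throw new IllegalArgumentException("filter type " + type);
        }
    }

    // Filters all rows of data (rowBytes bytes each). For every row the filter whose
    // output has the smallest sum of absolute values (bytes read as signed) wins, and
    // the row is written as one filter-type byte followed by the filtered bytes.
    static byte[] encode(byte[] data, int rowBytes, int bpp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] candidate = new byte[rowBytes];
        byte[] best = new byte[rowBytes];
        byte[] prev = null;
        for (int off = 0; off + rowBytes <= data.length; off += rowBytes) {
            byte[] cur = Arrays.copyOfRange(data, off, off + rowBytes);
            long bestSum = Long.MAX_VALUE;
            int bestType = 0;
            for (int type = 0; type <= 4; type++) {
                long sum = 0;
                for (int i = 0; i < rowBytes; i++) {
                    candidate[i] = filterByte(type, cur, prev, i, bpp);
                    sum += Math.abs(candidate[i]);
                }
                if (sum < bestSum) {
                    bestSum = sum;
                    bestType = type;
                    System.arraycopy(candidate, 0, best, 0, rowBytes);
                }
            }
            out.write(bestType);
            out.write(best, 0, rowBytes);
            prev = cur;
        }
        return out.toByteArray();
    }
}
{code}

For 16-bit RGB samples, bpp would be 6 (three components of two bytes each). The Flate-compressed output of encode() would then form the image stream data.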
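The quoted description restricts the patch to 16-bit TYPE_CUSTOM images with DataType == USHORT. Here is a minimal sketch that detects such an image with the standard java.awt.image API. It then serializes the color samples as big-endian 16-bit values, the byte order PDF uses for BitsPerComponent 16. The class and method names (Ushort16Sketch, is16BitCustom, toBigEndianRgb16, newRgb16) are hypothetical and not taken from the patch. Alpha is simply dropped here, whereas a real writer would emit it separately, e.g. as an SMask.

{code:java}
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.WritableRaster;

/** Hypothetical sketch; not the code from the attached patch. */
public class Ushort16Sketch {

    // True for TYPE_CUSTOM images backed by USHORT samples.
    static boolean is16BitCustom(BufferedImage img) {
        return img.getType() == BufferedImage.TYPE_CUSTOM
                && img.getRaster().getDataBuffer().getDataType() == DataBuffer.TYPE_USHORT;
    }

    // Writes the color bands (alpha, if any, is the last band and is skipped)
    // as big-endian 16-bit values, row by row.
    static byte[] toBigEndianRgb16(BufferedImage img) {
        WritableRaster raster = img.getRaster();
        int w = img.getWidth();
        int h = img.getHeight();
        int colors = img.getColorModel().getNumColorComponents();
        byte[] out = new byte[w * h * colors * 2];
        int[] pixel = new int[raster.getNumBands()];
        int pos = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                raster.getPixel(x, y, pixel);
                for (int b = 0; b < colors; b++) {
                    out[pos++] = (byte) (pixel[b] >>> 8); // high byte first
                    out[pos++] = (byte) pixel[b];         // then low byte
                }
            }
        }
        return out;
    }

    // Builds an opaque 16-bit sRGB image of the same kind, e.g. for tests.
    static BufferedImage newRgb16(int w, int h) {
        ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        ComponentColorModel cm = new ComponentColorModel(cs, false, false,
                Transparency.OPAQUE, DataBuffer.TYPE_USHORT);
        WritableRaster raster = cm.createCompatibleWritableRaster(w, h);
        return new BufferedImage(cm, raster, false, null);
    }
}
{code}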