[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700641#comment-16700641 ]

Tim Allison edited comment on PDFBOX-4184 at 11/27/18 4:15 PM:
---

Re-opening to add literal govdocs1 test file 032163.jpg

was (Author: talli...@mitre.org):
Re-opening to add attachment

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
> Issue Type: Improvement
> Components: Writing
> Affects Versions: 2.0.9
> Reporter: Emmeran Seehuber
> Assignee: Tim Allison
> Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 032163.jpg, 16bit.png, LoadGovdocs.java,
> fix_profile_use.patch, fix_profile_use3.patch, fix_profile_use4.patch,
> images.zip, lossless_predictor_based_imageencoding.patch,
> lossless_predictor_based_imageencoding_v2.patch,
> lossless_predictor_based_imageencoding_v3.patch,
> lossless_predictor_based_imageencoding_v4.patch,
> lossless_predictor_based_imageencoding_v5.patch,
> lossless_predictor_based_imageencoding_v6.patch,
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf,
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf,
> size_compare.txt
>
>
> The attached patch adds support for writing 16 bit per component images
> correctly. I've integrated a test for this here:
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as
> the images are currently not efficiently encoded. E.g. you could use PNG
> encodings to get a better compression (by adding a COSName.DECODE_PARMS with
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is
> something for a later patch. It would also need another API, as there is a
> tradeoff between speed and compression ratio.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
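The issue description above talks about 16-bit TYPE_CUSTOM images whose DataType is USHORT. As a rough illustration only, the following sketch builds such an image with plain java.awt classes and serializes its samples with the high-order byte first, which is the byte order PDF uses for multi-byte image samples. The class and method names (UShortSamples, create16BitRgb, toBigEndianBytes) are hypothetical and are not taken from PDFBox or from the attached patch.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.WritableRaster;

// Hypothetical helper for illustration; not part of PDFBox or the patch.
final class UShortSamples
{
    /** Builds an opaque RGB image with 16 bits per component (TYPE_CUSTOM, USHORT buffer). */
    static BufferedImage create16BitRgb(int width, int height)
    {
        ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        ColorModel cm = new ComponentColorModel(cs, false, false,
                Transparency.OPAQUE, DataBuffer.TYPE_USHORT);
        WritableRaster raster = cm.createCompatibleWritableRaster(width, height);
        return new BufferedImage(cm, raster, false, null);
    }

    /** Serializes all samples row by row, two bytes per sample, high-order byte first. */
    static byte[] toBigEndianBytes(BufferedImage image)
    {
        WritableRaster raster = image.getRaster();
        if (raster.getDataBuffer().getDataType() != DataBuffer.TYPE_USHORT)
        {
            throw new IllegalArgumentException("expected a USHORT raster");
        }
        int bands = raster.getNumBands();
        byte[] out = new byte[image.getWidth() * image.getHeight() * bands * 2];
        int[] pixel = new int[bands];
        int pos = 0;
        for (int y = 0; y < image.getHeight(); y++)
        {
            for (int x = 0; x < image.getWidth(); x++)
            {
                raster.getPixel(x, y, pixel);
                for (int sample : pixel)
                {
                    out[pos++] = (byte) (sample >>> 8); // high-order byte
                    out[pos++] = (byte) sample;         // low-order byte
                }
            }
        }
        return out;
    }
}
```

Reading samples through getPixel() is slower than touching the DataBuffer directly, but it keeps the sketch independent of the raster's internal band layout.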
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551917#comment-16551917 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 9/21/18 5:44 PM:
--

I did a size comparison. It went over the zip files from 0 to 18. The attachment lists the files where the size of the predictor compression was at least 5% over the size of the "old" compression. Almost all of the files are JPEG files of the kind that shouldn't have been JPEG compressed in the first place. JPEG is for photographs, not for charts or anything with sharp edges.
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620827#comment-16620827 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 9/19/18 4:39 PM:
--

The CMYK test fails. There are many 1-differences like this if I modify the test so that it reports differences without failing:

expected: but was: ;
expected: but was: ;
expected: but was: ;
expected: but was: ;
expected: but was: ;

This is not much, but I wonder why it works for you. What OS and what Java are you using? I tested this on W10 with jdk8 latest.
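The comment above describes changing the test so that it reports differences instead of failing on the first one. A minimal sketch of that idea, using only java.awt, could look as follows. ImageDiff and its differences() method are hypothetical names, and comparing packed ARGB values from getRGB() is an assumption made for brevity; it is not necessarily how the PDFBox test helpers compare images.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the PDFBox test code.
final class ImageDiff
{
    /**
     * Collects every pixel whose largest per-channel difference exceeds the
     * given tolerance. A tolerance of 0 reports all differing pixels.
     */
    static List<String> differences(BufferedImage expected, BufferedImage actual, int tolerance)
    {
        if (expected.getWidth() != actual.getWidth() || expected.getHeight() != actual.getHeight())
        {
            throw new IllegalArgumentException("image sizes differ");
        }
        List<String> report = new ArrayList<>();
        for (int y = 0; y < expected.getHeight(); y++)
        {
            for (int x = 0; x < expected.getWidth(); x++)
            {
                int e = expected.getRGB(x, y);
                int a = actual.getRGB(x, y);
                if (e == a)
                {
                    continue;
                }
                int maxDelta = 0;
                // Compare alpha, red, green and blue separately.
                for (int shift = 24; shift >= 0; shift -= 8)
                {
                    int delta = Math.abs(((e >>> shift) & 0xFF) - ((a >>> shift) & 0xFF));
                    maxDelta = Math.max(maxDelta, delta);
                }
                if (maxDelta > tolerance)
                {
                    report.add(String.format("(%d,%d) expected:<%08x> but was:<%08x>", x, y, e, a));
                }
            }
        }
        return report;
    }
}
```

With a tolerance of 1, off-by-one rounding differences between color conversions would be ignored, while anything larger would still be listed.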
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618743#comment-16618743 ]

Emmeran Seehuber edited comment on PDFBOX-4184 at 9/18/18 12:31 PM:

[~tilman] If you have an ICC profile on an image that is not the built-in sRGB profile, you need the ICC profile; otherwise you will just get plain wrong colors. You should not look at (r,g,b) or (c,m,y,k) as concrete color values, but rather as vectors within the color space. Without a profile describing the vector space/color space you have no idea what real colors the vector values result in. DeviceRGB is (on screen) often interpreted as sRGB. But what DeviceCMYK means is really up to the concrete interpreting device, i.e. it will look different on every printer (brightness, color, ...). So DeviceCMYK as a colorspace for an image mostly means "random", unless you are explicitly targeting one specific printer. The ICC profile describes how to transform the color-vector data into other colorspaces, e.g. into sRGB to view on the screen or into the concrete ICC profile of the printing device.

If you load images in Java using ImageIO you usually (especially when using TwelveMonkeys) get an sRGB image. So you would never hit this code path. If you want to load an image with its real color profile, you must pass a specially prepared BufferedImage (i.e. one with the right profile) into ImageIO. So you won't get an image with a color space other than sRGB by accident. If you have an image with an ICC profile, you always want the image to be written with that ICC profile, because you explicitly care about it.

Regarding file size bloat: Yes, the ICC profiles will add up, especially if you have many images. The correct solution would be an ICC_Profile <-> PDICCBased cache in the document, so that the same profile does not get encoded twice. Should I implement such a cache? In my application I manually deduplicate the ICC profiles at the moment.

The attached patch [^fix_profile_use4.patch] fixes the test driver and also specifies an "Alternate" colorspace for the profile, for all those devices which cannot handle ICC profiles. With the correct ICC_Profile specified, the sRGB->ISO Coated->sRGB "roundtrip" now also works correctly, so the image can be compared with the original image.
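The profile cache proposed above could be sketched roughly like this. The sketch stays independent of PDFBox by using a type parameter where a real implementation would presumably store a PDICCBased object. IccProfileCache and its methods are hypothetical names, and keying on the complete profile bytes is an assumption; a digest of the bytes would be a reasonable alternative for large profiles.

```java
import java.awt.color.ICC_Profile;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-document cache for illustration; T stands in for the
// PDF-side object (for example a PDICCBased color space in PDFBox).
final class IccProfileCache<T>
{
    // ByteBuffer compares by content, so equal profile bytes map to one entry.
    private final Map<ByteBuffer, T> cache = new HashMap<>();
    private final Function<ICC_Profile, T> factory;

    IccProfileCache(Function<ICC_Profile, T> factory)
    {
        this.factory = factory;
    }

    /** Returns the cached object for this profile, creating it on first use. */
    T get(ICC_Profile profile)
    {
        ByteBuffer key = ByteBuffer.wrap(profile.getData());
        return cache.computeIfAbsent(key, k -> factory.apply(profile));
    }

    int size()
    {
        return cache.size();
    }
}
```

A caller would create one cache per document and route every image's profile through get(), so that images sharing a profile also share the encoded profile stream.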
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617995#comment-16617995 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 9/17/18 6:56 PM:
--

Thanks, the change makes sense, but I'd like to have a "no longer failing" test for this, i.e. one where the generated PDF looks different from the image due to the missing ICC profile.

Another problem is that {{testCreateLosslessFromImageCMYK}} now fails. I wonder if the ICC profile is needed for CMYK? I also see the danger that PDFs get bigger if each image now has a (different) ICC profile. And what about b/w images?
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530102#comment-16530102 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 7/2/18 3:53 PM:
-

I looked at the sizes of the PDF test result files. Have a look at bitmask4babgr.pdf and intargb.pdf. This isn't just space needed for the extra dictionary. In bitmask4babgr.pdf, the first image had a compressed size of 214 and now it has a size of 701. OTOH the file PDFBOX-4184-032163.pdf had a size of 36240 and now 31607, and only 27007 after modifying estCompressSum() to sum += Math.abs(aDataRawRowSub);

I'm wondering about the logic of chooseDataRowToWrite(). You're choosing the compression method based on the result of estCompressSum(), which is the sum of the byte values. How would this have any influence on compression? Why would a sequence of 00 have a different compression length than a sequence of FF?

Your comment mentions "This is just the recommend algorithm in the spec" and surprisingly, this is true: see [https://medium.com/@duhroach/how-png-works-f1174e3cc7b7], which recommends using the abs of the signed values (which I tried above). That doesn't make things better for the non-photo files, though. The same recommendation, with more details: [https://www.w3.org/TR/PNG-Encoders.html#E.Filter-selection]

I think we should count colors and/or consider the bit depth, or the geometric size of the image, i.e. something below 25x25 is probably an icon rather than a photograph. The current situation might have a negative impact on the openhtmltopdf project, because many web pages have small icons.
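The filter-selection heuristic discussed above (try each PNG filter on a row and keep the one with the smallest sum of absolute values, treating the filtered bytes as signed) can be sketched in plain Java as follows. This is an illustration of the heuristic described in the linked PNG encoder notes, not the code from the patch; PngFilterChooser and its methods are hypothetical names and do not correspond to chooseDataRowToWrite() or estCompressSum().

```java
// Hypothetical illustration of the PNG "minimum sum of absolute differences"
// heuristic; not the implementation from the attached patch.
final class PngFilterChooser
{
    /** Applies PNG filter type 0..4 (None, Sub, Up, Average, Paeth) to one row. */
    static byte[] filter(int type, byte[] row, byte[] prev, int bytesPerPixel)
    {
        byte[] out = new byte[row.length];
        for (int i = 0; i < row.length; i++)
        {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            int up = prev[i] & 0xFF;
            int upLeft = i >= bytesPerPixel ? prev[i - bytesPerPixel] & 0xFF : 0;
            int predicted;
            switch (type)
            {
                case 0: predicted = 0; break;
                case 1: predicted = left; break;
                case 2: predicted = up; break;
                case 3: predicted = (left + up) / 2; break;
                case 4: predicted = paeth(left, up, upLeft); break;
                default: throw new IllegalArgumentException("unknown filter type " + type);
            }
            out[i] = (byte) ((row[i] & 0xFF) - predicted); // difference modulo 256
        }
        return out;
    }

    static int paeth(int a, int b, int c)
    {
        int p = a + b - c;
        int pa = Math.abs(p - a);
        int pb = Math.abs(p - b);
        int pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc)
        {
            return a;
        }
        return pb <= pc ? b : c;
    }

    /** Sums the filtered bytes as signed values, so 0xFF counts as 1, not 255. */
    static long absSum(byte[] filtered)
    {
        long sum = 0;
        for (byte v : filtered)
        {
            sum += Math.abs(v);
        }
        return sum;
    }

    /** Returns the filter type with the smallest sum; the lowest type wins ties. */
    static int chooseFilter(byte[] row, byte[] prev, int bytesPerPixel)
    {
        int best = 0;
        long bestSum = Long.MAX_VALUE;
        for (int type = 0; type <= 4; type++)
        {
            long sum = absSum(filter(type, row, prev, bytesPerPixel));
            if (sum < bestSum)
            {
                bestSum = sum;
                best = type;
            }
        }
        return best;
    }
}
```

The sum is taken over the filtered residuals rather than the raw pixel bytes: residuals clustered near zero suggest that the predictor fits the row, which tends to help the deflate stage. As the comment above observes, this remains a heuristic and may do poorly on small or low-color images.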
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529218#comment-16529218 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 7/1/18 8:08 PM:
-

For copyright reasons we can't include some of the files in the repository. File 032163.jpg comes from a government site but I couldn't find any details. And of course we don't know the copyright of the arrow picture from https://github.com/danfickle/openhtmltopdf/issues/173 .

No need to resubmit anything.
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528728#comment-16528728 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 6/30/18 2:29 PM:
--

There's a new problem and I don't know why this didn't come up before. See this code:
{code:java}
public void testCreateLosslessFrom16BitPNG() throws IOException
{
    PDDocument document = new PDDocument();
    BufferedImage image = ImageIO.read(this.getClass().getResourceAsStream("16bit.png"));

    // the source image is 16 bit RGBA: 4 USHORT samples per pixel
    assertEquals(64, image.getColorModel().getPixelSize());
    assertEquals(Transparency.TRANSLUCENT, image.getColorModel().getTransparency());
    assertEquals(4, image.getRaster().getNumDataElements());
    assertEquals(java.awt.image.DataBuffer.TYPE_USHORT, image.getRaster().getDataBuffer().getDataType());

    PDImageXObject ximage = LosslessFactory.createFromImage(document, image);

    int w = image.getWidth();
    int h = image.getHeight();
    validate(ximage, 16, w, h, "png", PDDeviceRGB.INSTANCE.getName());
    System.out.println(ximage.getImage());
    checkIdent(image, ximage.getImage());
    checkIdentRGB(image, ximage.getOpaqueImage());

    // the alpha channel must end up in an 8 bit gray soft mask
    assertNotNull(ximage.getSoftMask());
    validate(ximage.getSoftMask(), 8, w, h, "png", PDDeviceGray.INSTANCE.getName());
    assertEquals(35, colorCount(ximage.getSoftMask().getImage()));

    doWritePDF(document, ximage, testResultsDir, "png16bit.pdf");
}
{code}
The test fails because the softmask is all 0. For some reason, {{alphaImageData}} is not filled when {{prepareImageXObject}} is called by {{preparePredictorPDImage}}. Could it be that you forgot to handle the transparency when the PredictorEncoder path is taken?

That test wasn't public because the test file (a file from one of your users) is probably copyrighted somehow. Maybe in my previous tests I had deleted it to allow the patch to be applied, or I had tested your patch on an unmodified project, or on my other computer.
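The failing assertion above concerns the soft mask, which the test expects to be an 8-bit gray image derived from the alpha channel of a 16-bit source. As a hedged sketch of what filling such mask data could involve, the following uses only java.awt to pull the alpha samples out of an image and scale them down to 8 bits. AlphaExtractor and its methods are hypothetical names; this is not the code path in LosslessFactory, and whether the patch scales alpha this way is an assumption.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.WritableRaster;

// Hypothetical helper for illustration; not the LosslessFactory code.
final class AlphaExtractor
{
    /** Builds a translucent RGBA image with 16 bits per component. */
    static BufferedImage create16BitRgba(int width, int height)
    {
        ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        ColorModel cm = new ComponentColorModel(cs, true, false,
                Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
        WritableRaster raster = cm.createCompatibleWritableRaster(width, height);
        return new BufferedImage(cm, raster, false, null);
    }

    /**
     * Returns one 8 bit alpha sample per pixel, row by row, or null if the
     * image has no alpha channel. Wider alpha samples keep their high bits.
     */
    static byte[] extractAlpha8(BufferedImage image)
    {
        WritableRaster alpha = image.getAlphaRaster();
        if (alpha == null)
        {
            return null;
        }
        ColorModel cm = image.getColorModel();
        int bits = cm.getComponentSize(cm.getNumComponents() - 1); // alpha is last
        if (bits < 8)
        {
            throw new IllegalArgumentException("alpha with fewer than 8 bits is not handled");
        }
        int shift = bits - 8;
        int w = image.getWidth();
        int h = image.getHeight();
        byte[] out = new byte[w * h];
        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                out[y * w + x] = (byte) (alpha.getSample(x, y, 0) >>> shift);
            }
        }
        return out;
    }
}
```

A mask array that stays all zero, as in the failing test, would correspond to this step being skipped or its result never being written to the mask stream.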
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528728#comment-16528728 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 6/30/18 2:05 PM:
--

There's a new problem and I don't know why this didn't come up before. See this code:
{code:java}
public void testCreateLosslessFrom16BitPNG() throws IOException
{
    PDDocument document = new PDDocument();
    BufferedImage image = ImageIO.read(this.getClass().getResourceAsStream("16bit.png"));

    assertEquals(64, image.getColorModel().getPixelSize());
    assertEquals(Transparency.TRANSLUCENT, image.getColorModel().getTransparency());
    assertEquals(4, image.getRaster().getNumDataElements());
    assertEquals(java.awt.image.DataBuffer.TYPE_USHORT, image.getRaster().getDataBuffer().getDataType());

    PDImageXObject ximage = LosslessFactory.createFromImage(document, image);

    int w = image.getWidth();
    int h = image.getHeight();
    validate(ximage, 16, w, h, "png", PDDeviceRGB.INSTANCE.getName());
    System.out.println(ximage.getImage());
    checkIdent(image, ximage.getImage());
    checkIdentRGB(image, ximage.getOpaqueImage());

    assertNotNull(ximage.getSoftMask());
    validate(ximage.getSoftMask(), 8, w, h, "png", PDDeviceGray.INSTANCE.getName());
    assertEquals(35, colorCount(ximage.getSoftMask().getImage()));

    doWritePDF(document, ximage, testResultsDir, "png16bit.pdf");
}
{code}
The test fails because the softmask is all 0. For some reason, {{alphaImageData}} is not filled when {{prepareImageXObject}} is called by {{preparePredictorPDImage}}. Could it be that when the PredictorEncoder path is taken, that you forgot to handle the transparency?
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473037#comment-16473037 ] Tilman Hausherr edited comment on PDFBOX-4184 at 5/12/18 11:05 AM: --- If you find a bug in your code, please create a failing test. Ideally this would include an image that fails. Usually the govdocs images are from the US government but we need to be sure, e.g. by doing a reverse search on google images. was (Author: tilman): If you find a bug in your code, please create a failing test. Ideally this would include an image that fails. Usually these images are from the US government but we need to be sure, e.g. by doing a reverse search on google images. > [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.10, 3.0.0 PDFBox > > Attachments: LoadGovdocs.java, > lossless_predictor_based_imageencoding.patch, > lossless_predictor_based_imageencoding_v2.patch, > pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, > png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf > > > The attached patch add support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this > is what you usually get when you read a 16 bit PNG file. > This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173]. > The patch is against 2.0.9, but should apply to 3.0.0 too. > There is still some room for improvements when writing lossless images, as > the images are currently not efficiently encoded. I.e. 
you could use PNG > encodings to get a better compression. (By adding a COSName.DECODE_PARMS with > a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is > something for a later patch. It would also need another API, as there is a > tradeoff speed vs compression ratio. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org For additional commands, e-mail: dev-h...@pdfbox.apache.org
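To illustrate the "PREDICTOR == 15" idea from the issue description: with PNG prediction every image row is stored with a leading filter type byte, followed by the filtered row bytes, and the result is then deflated. The sketch below shows only filter type 1 (Sub), where each byte is replaced by its difference to the byte one pixel to the left. The class and method names are invented for this example, and it is not the encoder from the attached patches, which may choose filters per row in a different way.

```java
// Illustrative only: PNG "Sub" filter (type 1) for a single row.
final class PngSubFilterSketch {

    // bytesPerPixel would be 6 for 16 bit RGB, 3 for 8 bit RGB.
    static byte[] encodeSub(byte[] row, int bytesPerPixel) {
        byte[] out = new byte[row.length + 1];
        out[0] = 1; // filter type byte that precedes every predicted row
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            out[i + 1] = (byte) ((row[i] & 0xFF) - left); // difference modulo 256
        }
        return out;
    }

    // Inverse operation, as a decoder would apply it after inflating.
    static byte[] decodeSub(byte[] filtered, int bytesPerPixel) {
        byte[] row = new byte[filtered.length - 1];
        for (int i = 0; i < row.length; i++) {
            int left = i >= bytesPerPixel ? row[i - bytesPerPixel] & 0xFF : 0;
            row[i] = (byte) ((filtered[i + 1] & 0xFF) + left);
        }
        return row;
    }
}
```

Neighbouring pixels are often similar, so the filtered row tends to contain many small values, which deflate usually compresses better than the raw samples. Trying several filter types per row costs CPU time, which is the speed versus compression ratio tradeoff mentioned above.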
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473018#comment-16473018 ]

Emmeran Seehuber edited comment on PDFBOX-4184 at 5/12/18 10:27 AM:

The Govdocs corpus is a little bit big ... I'll let those tests run in the office on Monday, as my iMac there is faster at processing that many documents...

Regarding directly using the DeflaterOutputStream: I do this to be able to *stream* compress the image data, so that the image data is compressed row by row. This leads to less memory used while compressing and better CPU cache usage (the data of one row is still in cache when it's fed to zip), as opposed to first encoding the image into one big byte buffer (which means doubling the memory needed for the image) and then compressing it at the end.

Of course, when constructing a DeflaterOutputStream it should use the Filter.SYSPROP_DEFLATELEVEL setting. I've refactored the code for this into its own method, Filter.getCompressionLevel(). See the updated patch.

[^lossless_predictor_based_imageencoding_v2.patch] - This is still work in progress, not to be committed yet (need to analyze those image mismatches in the govdocs first)

was (Author: rototor):

The Govdocs corpus is a little bit big ... I'll let those tests run in the office on Monday, as my iMac there is faster at processing that many documents...

Regarding directly using the DeflaterOutputStream: I do this to be able to *stream* compress the image data, so that the image data is compressed row by row. This leads to less memory used while compressing and better CPU cache usage (the data of one row is still in cache when it's fed to zip), as opposed to first encoding the image into one big byte buffer (which means doubling the memory needed for the image) and then compressing it at the end.

Of course, when constructing a DeflaterOutputStream it should use the Filter.SYSPROP_DEFLATELEVEL setting. I've refactored the code for this into its own method, Filter.getCompressionLevel(). See the updated patch.

[^lossless_predictor_based_imageencoding_v2.patch] - This is as still work in progress, not to be committed yet (need to analyze those image mismatches in the govdocs first)

> [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.10, 3.0.0 PDFBox > > Attachments: LoadGovdocs.java, > lossless_predictor_based_imageencoding.patch, > lossless_predictor_based_imageencoding_v2.patch, > pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, > png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf > > > The attached patch add support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this > is what you usually get when you read a 16 bit PNG file. > This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173]. > The patch is against 2.0.9, but should apply to 3.0.0 too. > There is still some room for improvements when writing lossless images, as > the images are currently not efficiently encoded. I.e. you could use PNG > encodings to get a better compression. (By adding a COSName.DECODE_PARMS with > a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is > something for a later patch. It would also need another API, as there is a > tradeoff speed vs compression ratio.
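The row-by-row streaming described in the comment above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code from the patch: RowSource is a hypothetical interface standing in for whatever produces the encoded rows, and the compression level is passed in as a plain int rather than being read from the Filter.SYSPROP_DEFLATELEVEL setting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class RowStreamingDeflateSketch {

    // Hypothetical source of encoded rows; only one row is materialized at a time.
    interface RowSource {
        int height();
        int rowLength();
        void readRow(int y, byte[] dest);
    }

    static byte[] compress(RowSource source, int level) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(level);
        byte[] row = new byte[source.rowLength()]; // reused buffer, one row only
        try (DeflaterOutputStream zip = new DeflaterOutputStream(sink, deflater)) {
            for (int y = 0; y < source.height(); y++) {
                source.readRow(y, row);
                zip.write(row); // the row is handed to zlib right after it was produced
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not released by close()
        }
        return sink.toByteArray();
    }

    // Round-trip helper used to check the result.
    static byte[] inflate(byte[] data) {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with first filling one large byte array and compressing it afterwards, the only uncompressed data held here is a single row buffer, which is the memory saving the comment refers to.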
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472558#comment-16472558 ] Tilman Hausherr edited comment on PDFBOX-4184 at 5/11/18 7:50 PM: -- I forgot to mention, we're planning a release soon, I prefer to wait until after the release before deciding to commit to 2.0. was (Author: tilman): I forgot to mention, we're planning a release soon, I prefer to wait until after the release. > [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.10, 3.0.0 PDFBox > > Attachments: LoadGovdocs.java, > lossless_predictor_based_imageencoding.patch, > pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, > png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf > > > The attached patch add support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this > is what you usually get when you read a 16 bit PNG file. > This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173]. > The patch is against 2.0.9, but should apply to 3.0.0 too. > There is still some room for improvements when writing lossless images, as > the images are currently not efficiently encoded. I.e. you could use PNG > encodings to get a better compression. (By adding a COSName.DECODE_PARMS with > a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is > something for a later patch. It would also need another API, as there is a > tradeoff speed vs compression ratio. 
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472478#comment-16472478 ] Tilman Hausherr edited comment on PDFBOX-4184 at 5/11/18 6:55 PM: -- Please run the tool I just uploaded... If a test fails, something is written to System.err and the two files are saved in the local directory. I get a few "hits": 001/001229.png: images not equal 001/001230.png: images not equal and also some jpg images. Without the change, this doesn't happen. I suspect that the differences are minor, but IMHO there shouldn't be any at all... was (Author: tilman): Please run the tool I just uploaded... I get a few "hits": 001/001229.png: images not equal 001/001230.png: images not equal and also some jpg images. Without the change, this doesn't happen. I suspect that the differences are minor, but IMHO there shouldn't be any at all... > [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.10, 3.0.0 PDFBox > > Attachments: LoadGovdocs.java, > lossless_predictor_based_imageencoding.patch, > pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, > png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf > > > The attached patch add support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this > is what you usually get when you read a 16 bit PNG file. > This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173]. > The patch is against 2.0.9, but should apply to 3.0.0 too. 
> There is still some room for improvements when writing lossless images, as > the images are currently not efficiently encoded. I.e. you could use PNG > encodings to get a better compression. (By adding a COSName.DECODE_PARMS with > a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is > something for a later patch. It would also need another API, as there is a > tradeoff speed vs compression ratio.
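The source of the uploaded tool (LoadGovdocs.java) is not quoted in this thread, so the following is only a sketch of the kind of "images not equal" check the comment describes, assuming a plain pixel-by-pixel comparison through getRGB. The class and method names are made up for illustration.

```java
import java.awt.image.BufferedImage;

// Hypothetical comparison helper, not the code from LoadGovdocs.java.
final class ImageCompareSketch {

    // Returns the number of differing pixels, or -1 if the dimensions do not match.
    static int countDifferingPixels(BufferedImage expected, BufferedImage actual) {
        if (expected.getWidth() != actual.getWidth() || expected.getHeight() != actual.getHeight()) {
            return -1;
        }
        int differing = 0;
        for (int y = 0; y < expected.getHeight(); y++) {
            for (int x = 0; x < expected.getWidth(); x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }
}
```

For a lossless encoder the count should be 0 for every input. Note that getRGB reduces each channel to 8 bits, so a check like this cannot detect differences in the low bits of 16 bit images; comparing raster samples would be needed for that.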
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429420#comment-16429420 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 4/7/18 3:32 PM:

Thanks... I'll commit this within the next few days... I managed to create such an Image (IrfanView says it has "64 BitsPerPixel") so we can also have a local test but I didn't manage to have a failure, i.e. a bad PDF like with [the image from your issue|https://user-images.githubusercontent.com/29379074/36145630-f304cd0e-10d7-11e8-942c-66eb8040be70.png]:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0: image.setRGB(x, y, 0x); break;
                case 1: image.setRGB(x, y, 0xFF00FF00); break;
                case 2: image.setRGB(x, y, 0xFFFF); break;
                case 3: image.setRGB(x, y, 0x); break;
            }
        }
    }
}
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f, page.getMediaBox().getHeight() - image.getHeight());
}
{code}

was (Author: tilman):

Thanks... I'll commit this within the next few days... I managed to create such an Image (IrfanView says it has "64 BitsPerPixel") so we can also have a local test but I didn't manage to have a failure, i.e. a bad PDF like with the image from your issue:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0: image.setRGB(x, y, 0x); break;
                case 1: image.setRGB(x, y, 0xFF00FF00); break;
                case 2: image.setRGB(x, y, 0xFFFF); break;
                case 3: image.setRGB(x, y, 0x); break;
            }
        }
    }
}
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f, page.getMediaBox().getHeight() - image.getHeight());
}
{code}

> [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.10, 3.0.0 PDFBox > > Attachments: pdfbox_support_16bit_image_write.patch > > > The attached patch add support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this > is what you usually get when you read a 16 bit PNG file. > This would also fix
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429420#comment-16429420 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 4/7/18 3:32 PM:

Thanks... I'll commit this within the next few days... I managed to create such an Image (IrfanView says it has "64 BitsPerPixel") so we can also have a local test but I didn't manage to have a failure, i.e. a bad PDF like with [the image from the github issue|https://user-images.githubusercontent.com/29379074/36145630-f304cd0e-10d7-11e8-942c-66eb8040be70.png]:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0: image.setRGB(x, y, 0x); break;
                case 1: image.setRGB(x, y, 0xFF00FF00); break;
                case 2: image.setRGB(x, y, 0xFFFF); break;
                case 3: image.setRGB(x, y, 0x); break;
            }
        }
    }
}
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f, page.getMediaBox().getHeight() - image.getHeight());
}
{code}

was (Author: tilman):

Thanks... I'll commit this within the next few days... I managed to create such an Image (IrfanView says it has "64 BitsPerPixel") so we can also have a local test but I didn't manage to have a failure, i.e. a bad PDF like with [the image from your issue|https://user-images.githubusercontent.com/29379074/36145630-f304cd0e-10d7-11e8-942c-66eb8040be70.png]:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0: image.setRGB(x, y, 0x); break;
                case 1: image.setRGB(x, y, 0xFF00FF00); break;
                case 2: image.setRGB(x, y, 0xFFFF); break;
                case 3: image.setRGB(x, y, 0x); break;
            }
        }
    }
}
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f, page.getMediaBox().getHeight() - image.getHeight());
}
{code}

> [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.10, 3.0.0 PDFBox > > Attachments: pdfbox_support_16bit_image_write.patch > > > The attached patch add support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but
[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images
[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429420#comment-16429420 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 4/7/18 3:31 PM:

Thanks... I'll commit this within the next few days... I managed to create such an Image (IrfanView says it has "64 BitsPerPixel") so we can also have a local test but I didn't manage to have a failure, i.e. a bad PDF like with the image from your issue:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0: image.setRGB(x, y, 0x); break;
                case 1: image.setRGB(x, y, 0xFF00FF00); break;
                case 2: image.setRGB(x, y, 0xFFFF); break;
                case 3: image.setRGB(x, y, 0x); break;
            }
        }
    }
}
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f, page.getMediaBox().getHeight() - image.getHeight());
}
{code}

was (Author: tilman):

Thanks... I'll commit this within the next few days... I managed to create such an image so we can also have a local test but I didn't manage to have a failure, i.e. a bad PDF like with the image from your issue:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0: image.setRGB(x, y, 0x); break;
                case 1: image.setRGB(x, y, 0xFF00FF00); break;
                case 2: image.setRGB(x, y, 0xFFFF); break;
                case 3: image.setRGB(x, y, 0x); break;
            }
        }
    }
}
PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f, page.getMediaBox().getHeight() - image.getHeight());
}
{code}

> [PATCH]: Support simple lossless compression of 16 bit RGB images > - > > Key: PDFBOX-4184 > URL: https://issues.apache.org/jira/browse/PDFBOX-4184 > Project: PDFBox > Issue Type: Improvement > Components: Writing >Affects Versions: 2.0.9 >Reporter: Emmeran Seehuber >Priority: Minor > Fix For: 2.0.10, 3.0.0 PDFBox > > Attachments: pdfbox_support_16bit_image_write.patch > > > The attached patch add support to write 16 bit per component images > correctly. I've integrated a test for this here: > [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9] > It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this > is what you usually get when you read a 16 bit PNG file. > This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173]. > The patch is against 2.0.9, but should apply to 3.0.0 too. > There is still some room for improvements when