[jira] [Commented] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

Emmeran Seehuber (JIRA) Tue, 18 Sep 2018 02:08:47 -0700


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618743#comment-16618743
 ]


Emmeran Seehuber commented on PDFBOX-4184:
------------------------------------------

[~tilman] If an image carries an ICC profile other than the built-in sRGB 
profile, you need that profile; otherwise the colors will simply be wrong. 
You should not look at (r,g,b) or (c,m,y,k) as concrete color values, but 
rather as vectors within the color space. Without a profile describing the 
vector space/color space you have no idea what real colors the vector values 
result in. DeviceRGB is (on screen) often interpreted as sRGB. But what 
DeviceCMYK means is really up to the concrete interpreting device, i.e. it 
will look different on every printer (brightness, color, ...). So DeviceCMYK 
as a color space for an image mostly means "random", unless you are 
explicitly targeting one specific printer. 

The ICC profile describes how to transform the color-vector-data into other 
colorspaces, e.g. into sRGB to view on the screen or the concrete ICC profile 
of the printing device. 
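
To illustrate the "vector, not color" point, here is a minimal sketch using 
only the JDK's built-in color spaces. It interprets the same (r,g,b) vector 
once as sRGB and once as linear RGB and converts both to sRGB. The class and 
method names are made up for this example, and linear RGB merely stands in 
for "some profile other than sRGB"; this is not code from the patch.

```java
import java.awt.color.ColorSpace;

class ColorVectorDemo {
    /** Interprets the components in the given color space and returns sRGB values. */
    static float[] toSrgb(ColorSpace space, float[] components) {
        return space.toRGB(components);
    }

    public static void main(String[] args) {
        float[] vector = {0.5f, 0.5f, 0.5f};

        // Same numbers, two different color spaces (both ship with the JDK).
        ColorSpace srgb = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        ColorSpace linear = ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB);

        float[] asSrgb = toSrgb(srgb, vector);
        float[] asLinear = toSrgb(linear, vector);

        // The linear RGB interpretation comes out visibly brighter on screen.
        System.out.printf("as sRGB:       %.3f %.3f %.3f%n", asSrgb[0], asSrgb[1], asSrgb[2]);
        System.out.printf("as linear RGB: %.3f %.3f %.3f%n", asLinear[0], asLinear[1], asLinear[2]);
    }
}
```

Dropping the profile amounts to always taking the first branch, whatever 
color space the data was actually encoded in.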

If you load images in Java using ImageIO you usually (especially when using 
TwelveMonkeys) get an sRGB image, so you would never hit this path. If you 
want to load an image with its real color profile, you must pass a specially 
prepared BufferedImage (i.e. one with the right profile) into ImageIO. So you 
won't get an image with a color space other than sRGB by accident.
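
A rough sketch of what such a prepared BufferedImage could look like, 
assuming the destination is handed to the reader through 
ImageReadParam.setDestination. The class and method names are hypothetical. 
Whether a given reader plugin honors such a destination depends on the 
plugin, so treat this as an illustration rather than a recipe.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.WritableRaster;
import javax.imageio.ImageReadParam;

class ProfiledDestination {
    /** Builds an 8 bit per component image whose color model carries the given profile. */
    static BufferedImage create(ICC_Profile profile, int width, int height) {
        ColorSpace space = new ICC_ColorSpace(profile);
        // No alpha, one byte per component; the component count comes from the profile.
        ColorModel model = new ComponentColorModel(
                space, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        WritableRaster raster = model.createCompatibleWritableRaster(width, height);
        return new BufferedImage(model, raster, false, null);
    }

    /** Wraps the destination in a read param that can be passed to ImageReader.read. */
    static ImageReadParam readParamFor(BufferedImage destination) {
        ImageReadParam param = new ImageReadParam();
        param.setDestination(destination);
        return param;
    }

    public static void main(String[] args) {
        // Stand-in profile; a real caller would use the profile embedded in the file.
        ICC_Profile profile = ICC_Profile.getInstance(ColorSpace.CS_LINEAR_RGB);
        BufferedImage destination = create(profile, 64, 64);
        ImageReadParam param = readParamFor(destination);
        System.out.println("sRGB destination? "
                + param.getDestination().getColorModel().getColorSpace().isCS_sRGB());
    }
}
```

In real use the width and height would have to match the image being read, 
and the profile would come from the image file itself.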

If you have an image with an ICC profile, you always want the image in this 
color space with the attached profile, as it is already not so easy to get 
the image in anything other than sRGB.

Regarding file size bloat: Yes, the ICC profiles will add up, especially if 
you have many images. The correct solution would be an ICC_Profile <-> 
PDICCBased cache in the document, so that the same profile does not get 
encoded twice. Should I implement such a cache? In my application I manually 
deduplicate the ICC profiles at the moment.
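
A minimal sketch of how such a cache might work, assuming profiles are 
deduplicated by a SHA-256 digest of their raw bytes. The class 
IccProfileCache is hypothetical and not part of PDFBox; the type parameter T 
stands in for PDICCBased so that the sketch runs with the JDK alone.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_Profile;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class IccProfileCache<T> {
    private final Map<String, T> byDigest = new HashMap<>();
    private final Function<ICC_Profile, T> factory;

    /** The factory creates the per-document object (e.g. the encoded profile stream). */
    IccProfileCache(Function<ICC_Profile, T> factory) {
        this.factory = factory;
    }

    /** Returns the cached object for a profile, creating it only on first use. */
    T get(ICC_Profile profile) {
        String key = digest(profile.getData());
        return byDigest.computeIfAbsent(key, k -> factory.apply(profile));
    }

    /** Keys on profile content rather than object identity. */
    private static String digest(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        IccProfileCache<String> cache =
                new IccProfileCache<>(p -> "encoded " + p.getData().length + " bytes");
        ICC_Profile srgb = ICC_Profile.getInstance(ColorSpace.CS_sRGB);
        // The second lookup returns the very same object; nothing is encoded twice.
        System.out.println(cache.get(srgb) == cache.get(srgb));
    }
}
```

One cache instance per document would keep the cached objects from leaking 
across documents.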

The attached patch [^fix_profile_use4.patch] fixes the test driver and also 
specifies an "Alternate" color space for the profile, for all those devices 
which cannot handle ICC profiles. With the correct ICC_Profile specified, the 
"roundtrip" sRGB->ISO Coated->sRGB now also works correctly, so the image can 
be compared with the original image.

 

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-4184
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4184
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.9
>            Reporter: Emmeran Seehuber
>            Priority: Minor
>             Fix For: 2.0.12, 3.0.0 PDFBox
>
>         Attachments: 16bit.png, LoadGovdocs.java, fix_profile_use.patch, 
> fix_profile_use3.patch, fix_profile_use4.patch, images.zip, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf, 
> size_compare.txt
>
>
> The attached patch adds support to write 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvements when writing lossless images, as 
> the images are currently not efficiently encoded. I.e. you could use PNG 
> encodings to get a better compression. (By adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> tradeoff speed vs compression ratio. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
