[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-11-27 Thread Tim Allison (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700641#comment-16700641 ]

Tim Allison edited comment on PDFBOX-4184 at 11/27/18 4:15 PM:
---

Re-opening to add the literal govdocs1 test file 032163.jpg.


was (Author: talli...@mitre.org):
Re-opening to add attachment

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Assignee: Tim Allison
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 032163.jpg, 16bit.png, LoadGovdocs.java, 
> fix_profile_use.patch, fix_profile_use3.patch, fix_profile_use4.patch, 
> images.zip, lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> lossless_predictor_based_imageencoding_v6.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf, 
> size_compare.txt
>
>
> The attached patch adds support for writing 16-bit-per-component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-bit TYPE_CUSTOM images with DataType == USHORT, but this 
> is what you usually get when you read a 16-bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not encoded efficiently. E.g. you could use PNG 
> predictor encoding to get better compression (by adding a COSName.DECODE_PARMS 
> with COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> trade-off between speed and compression ratio. 
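With /Predictor 15 in the /DecodeParms, the Flate-decoded data is expected to be PNG-filtered: every row starts with a filter-type byte, followed by the filtered row bytes. A minimal sketch of the encoder side, assuming packed 16-bit RGB rows in big-endian byte order (6 bytes per pixel) and using only the PNG "Sub" filter. The class and method names are hypothetical and not PDFBox API; the matching /DecodeParms would also need /Colors 3, /BitsPerComponent 16 and /Columns set to the image width.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Hypothetical sketch: PNG-filter 16-bit RGB rows ("Sub" only) and Flate-compress them. */
class PngPredictorSketch {
    static final int BYTES_PER_PIXEL = 6; // 3 components x 2 bytes

    /** Applies the PNG Sub filter to one row and prefixes the filter-type byte 1. */
    static byte[] filterSub(byte[] row) {
        byte[] out = new byte[row.length + 1];
        out[0] = 1; // PNG filter type 1 = Sub
        for (int i = 0; i < row.length; i++) {
            int left = i >= BYTES_PER_PIXEL ? row[i - BYTES_PER_PIXEL] & 0xFF : 0;
            out[i + 1] = (byte) ((row[i] & 0xFF) - left);
        }
        return out;
    }

    /** Converts unsigned 16-bit samples (r,g,b,r,g,b,...) into one big-endian byte row. */
    static byte[] toBigEndianRow(int[] samples) {
        byte[] row = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            row[2 * i] = (byte) (samples[i] >>> 8);
            row[2 * i + 1] = (byte) samples[i];
        }
        return row;
    }

    /** Filters every row and deflates the concatenated result. */
    static byte[] encode(int[][] rows) {
        ByteArrayOutputStream filtered = new ByteArrayOutputStream();
        for (int[] samples : rows) {
            byte[] f = filterSub(toBigEndianRow(samples));
            filtered.write(f, 0, f.length);
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(filtered.toByteArray());
        deflater.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            compressed.write(buf, 0, n);
        }
        deflater.end();
        return compressed.toByteArray();
    }
}
```

Because the type byte is stored per row, an encoder is free to pick a different filter for every row; trying several filters per row is where the speed vs. compression trade-off comes in.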



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-21 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551917#comment-16551917 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 9/21/18 5:44 PM:
--

I did a size comparison over the zip files 0 to 18. The attachment lists the 
files where the size with predictor compression was at least 5% larger than 
with the "old" compression. Almost all of them are JPEG files of the kind that 
shouldn't have been JPEG-compressed in the first place. JPEG is for 
photographs, not for charts or anything with sharp edges.
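The selection criterion used for the list can be written down as a small check, assuming both compressed sizes are already known. A minimal sketch; the class and method names are hypothetical and the arithmetic is kept in integers to avoid rounding at the 5% boundary.

```java
/** Hypothetical check mirroring the "at least 5% larger" criterion. */
class SizeCompare {
    /** True if the predictor-encoded size is at least 5% larger than the old size. */
    static boolean predictorIsWorse(long oldSize, long predictorSize) {
        // equivalent to predictorSize >= oldSize * 1.05, without floating point
        return predictorSize * 100 >= oldSize * 105;
    }

    /** Relative growth in percent, e.g. 5.0 for a 5% increase. */
    static double growthPercent(long oldSize, long predictorSize) {
        return (predictorSize - oldSize) * 100.0 / oldSize;
    }
}
```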






[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-19 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620827#comment-16620827 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 9/19/18 4:39 PM:
--

The CMYK test fails. If I modify the test so that it reports differences 
instead of failing, there are many off-by-one differences like this:

 

expected:  but was: ;

expected:  but was: ;

expected:  but was: ;

expected:  but was: ;

expected:  but was: ;

This is not much, but I wonder why it works for you. Which OS and which Java 
version are you using? I tested this on Windows 10 with the latest JDK 8.
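To report off-by-one differences instead of failing at the first mismatch, the comparison can count channels that differ at all and, separately, those that exceed a tolerance. A minimal sketch over packed ARGB ints; the class name and the tolerance parameter are hypothetical and not the PDFBox test helpers.

```java
/** Hypothetical helper: compare ARGB pixels channel by channel with a tolerance. */
class PixelDiff {
    /**
     * Returns {channels that differ at all, channels that differ by more than tolerance}.
     * Both arrays hold packed 0xAARRGGBB pixels of images with the same size.
     */
    static int[] countDifferences(int[] expected, int[] actual, int tolerance) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("image sizes differ");
        }
        int anyDiff = 0;
        int bigDiff = 0;
        for (int i = 0; i < expected.length; i++) {
            // check the B, G, R and A channels of each pixel
            for (int shift = 0; shift <= 24; shift += 8) {
                int e = (expected[i] >>> shift) & 0xFF;
                int a = (actual[i] >>> shift) & 0xFF;
                int d = Math.abs(e - a);
                if (d > 0) {
                    anyDiff++;
                }
                if (d > tolerance) {
                    bigDiff++;
                }
            }
        }
        return new int[] {anyDiff, bigDiff};
    }
}
```

The pixel arrays can be obtained with `image.getRGB(0, 0, w, h, null, 0, w)`; with a tolerance of 1, the differences reported above would show up only in the first count.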






[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-18 Thread Emmeran Seehuber (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618743#comment-16618743 ]

Emmeran Seehuber edited comment on PDFBOX-4184 at 9/18/18 12:31 PM:


[~tilman] If an image has an ICC profile that is not the built-in sRGB 
profile, you need that ICC profile, otherwise you will simply get wrong 
colors. You should not look at (r,g,b) or (c,m,y,k) as concrete color values, 
but rather as vectors within the color space. Without a profile describing the 
vector space/color space you have no idea what real colors the vector values 
result in. DeviceRGB is (on screen) often interpreted as sRGB, but what 
DeviceCMYK means is really up to the concrete interpreting device, e.g. it 
will look different on every printer (brightness, color, ...). So DeviceCMYK as 
a color space for an image mostly means "random", unless you are explicitly 
targeting one specific printer. 

The ICC profile describes how to transform the color vector data into other 
color spaces, e.g. into sRGB to view it on the screen, or into the ICC profile 
of the concrete printing device. 
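That a color vector has no meaning without its color space can be shown with the JDK's own profiles: the same component values interpreted in the built-in linear RGB space and in sRGB give different sRGB colors. A minimal sketch using only java.awt.color; the linear RGB space is just a convenient stand-in for any non-sRGB profile, and the class name is hypothetical.

```java
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.color.ICC_Profile;

/** Hypothetical demo: one component vector, two color spaces, two different sRGB results. */
class SameVectorDifferentColor {
    /** Interprets the components in the given predefined space and converts them to sRGB. */
    static float[] toSrgb(int colorSpaceId, float[] components) {
        // build a color space from the JDK's built-in ICC profile for this id
        ICC_Profile profile = ICC_Profile.getInstance(colorSpaceId);
        ColorSpace cs = new ICC_ColorSpace(profile);
        return cs.toRGB(components); // sRGB components in the range 0..1
    }
}
```

Calling `toSrgb(ColorSpace.CS_sRGB, v)` and `toSrgb(ColorSpace.CS_LINEAR_RGB, v)` with `v = {0.5f, 0.5f, 0.5f}` returns roughly mid-gray in the first case and a noticeably lighter gray in the second.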

If you load images in Java using ImageIO, you usually (especially when using 
TwelveMonkeys) get an sRGB image, so you would never hit this code path. If 
you want to load an image with its real color profile, you must pass a 
specially prepared BufferedImage (i.e. one with the right profile) into 
ImageIO. So you won't get an image with a color space other than sRGB by 
accident.
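Whether an image actually carries a non-sRGB profile can be checked on its ColorModel before deciding to embed an ICC profile. A minimal sketch; `needsEmbeddedProfile` is a hypothetical name, and the linear RGB image below merely simulates an image that was read into a specially prepared BufferedImage.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.WritableRaster;

/** Hypothetical helpers for deciding whether an image needs its own ICC profile. */
class ProfileCheck {
    /** True if the image's color space is ICC based and not the built-in sRGB space. */
    static boolean needsEmbeddedProfile(BufferedImage image) {
        ColorSpace cs = image.getColorModel().getColorSpace();
        return cs instanceof ICC_ColorSpace && !cs.isCS_sRGB();
    }

    /** Creates an 8-bit RGB image whose color model uses the JDK's linear RGB profile. */
    static BufferedImage linearRgbImage(int w, int h) {
        ColorSpace linear = ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB);
        ComponentColorModel cm = new ComponentColorModel(linear, false, false,
                Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        WritableRaster raster = cm.createCompatibleWritableRaster(w, h);
        return new BufferedImage(cm, raster, false, null);
    }
}
```

Note that an ordinary TYPE_BYTE_GRAY image uses the built-in CS_GRAY space, which this check also reports as non-sRGB, so grayscale images would probably need a separate rule.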

If you have an image with an ICC profile, you always want the image to be 
written with that ICC profile, because you explicitly care about it.

Regarding file size bloat: yes, the ICC profiles will add up, especially if 
you have many images. The correct solution would be an ICC_Profile <-> 
PDICCBased cache in the document, so that the same profile does not get 
encoded twice. Should I implement such a cache? In my application I currently 
deduplicate the ICC profiles manually.
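Such an ICC_Profile <-> PDICCBased cache could key on the raw profile bytes, so that identical profiles coming from different images map to one color space object. A minimal sketch, generic over the cached value so it compiles without PDFBox (in PDFBox the value would be a PDICCBased); the class name and the factory parameter are hypothetical.

```java
import java.awt.color.ICC_Profile;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-document cache: one value per distinct ICC profile content. */
class IccProfileCache<T> {
    private final Map<ByteBuffer, T> cache = new HashMap<>();
    private final Function<ICC_Profile, T> factory;

    IccProfileCache(Function<ICC_Profile, T> factory) {
        this.factory = factory;
    }

    /** Returns the cached value for a profile with equal bytes, creating it on first use. */
    T get(ICC_Profile profile) {
        // ByteBuffer.equals/hashCode compare content, so equal bytes give an equal key
        ByteBuffer key = ByteBuffer.wrap(profile.getData());
        return cache.computeIfAbsent(key, k -> factory.apply(profile));
    }

    int size() {
        return cache.size();
    }
}
```

Keying on content rather than on object identity matters because every decoded image usually gets its own ICC_Profile instance, even when the bytes are identical.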

The attached patch [^fix_profile_use4.patch] fixes the test driver and also 
specifies an "Alternate" color space for the profile, for all those devices 
which cannot handle ICC profiles. With the correct ICC profile specified, the 
"roundtrip" sRGB->ISO Coated->sRGB now also works correctly, so the image can 
be compared with the original image.
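The /Alternate of an ICCBased color space is usually derived from the number of components N in the profile: DeviceGray for 1, DeviceRGB for 3, DeviceCMYK for 4, which matches the fallback the PDF specification describes when /Alternate is omitted. A minimal sketch of that mapping with java.awt.color; the method name is hypothetical and the result is just the PDF name as a string.

```java
import java.awt.color.ICC_Profile;

/** Hypothetical mapping from an ICC profile to a device color space name for /Alternate. */
class AlternateColorSpace {
    static String alternateFor(ICC_Profile profile) {
        int n = profile.getNumComponents();
        switch (n) {
            case 1:
                return "DeviceGray";
            case 3:
                return "DeviceRGB";
            case 4:
                return "DeviceCMYK";
            default:
                throw new IllegalArgumentException("no device alternate for " + n + " components");
        }
    }
}
```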

 




[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-09-17 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617995#comment-16617995 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 9/17/18 6:56 PM:
--

Thanks, the change makes sense, but I'd like to have a "no longer failing" test 
for this, i.e. where the generated PDF looks different than the image due to 
the missing ICC profile. Another problem is that 
{{testCreateLosslessFromImageCMYK}} now fails. I wonder if the ICC profile is 
needed for CMYK? I also see the danger that PDFs get bigger, if each image now 
has a (different) ICC profile. And what about b/w images?






[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-02 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530102#comment-16530102 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 7/2/18 3:53 PM:
-

I looked at the sizes of the PDF test result files. Have a look at 
bitmask4babgr.pdf and intargb.pdf. This isn't just space needed for the extra 
dictionary. In bitmask4babgr.pdf, the first image had a compressed size of 214 
and now it has a size of 701.

On the other hand, the file PDFBOX-4184-032163.pdf had a size of 36240 and 
now has 31607, and only 27007 after modifying estCompressSum() to

sum += Math.abs(aDataRawRowSub);
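The difference between the two estimates is easiest to see on a filtered row with small negative differences: read as unsigned bytes, -1 becomes 255, so a row that Deflate compresses very well looks "expensive". A minimal sketch; the method names are hypothetical and not the ones in the patch.

```java
/** Hypothetical comparison of two row-cost estimates for PNG filter selection. */
class FilterCostEstimate {
    /** Sum of the filtered bytes read as unsigned values (0..255). */
    static long unsignedSum(byte[] filteredRow) {
        long sum = 0;
        for (byte b : filteredRow) {
            sum += b & 0xFF;
        }
        return sum;
    }

    /** Sum of the absolute values of the filtered bytes read as signed values (-128..127). */
    static long absSignedSum(byte[] filteredRow) {
        long sum = 0;
        for (byte b : filteredRow) {
            sum += Math.abs(b);
        }
        return sum;
    }
}
```

A row of four 0xFF bytes scores 1020 with the unsigned sum but 4 with the signed one, while a row of four 0x00 bytes scores 0 with both, even though both rows compress equally well.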

I'm wondering about the logic of chooseDataRowToWrite(). You're choosing the 
compression method based on the result of estCompressSum() which is the sum of 
the byte values. How would this have any influence on compression? Why would a 
sequence of 00 have a different compression length than a sequence of FF? Your 
comment mentions "This is just the recommend algorithm in the spec" and 
surprisingly, this is true:

[https://medium.com/@duhroach/how-png-works-f1174e3cc7b7]
That one recommends using the absolute values of the signed bytes (which is 
what I tried above), but that doesn't make things better for the non-photo 
files.

Same here with more details:
 [https://www.w3.org/TR/PNG-Encoders.html#E.Filter-selection]
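The heuristic on the W3C page filters each row with all five PNG filters and keeps the one whose output has the smallest sum of absolute values, treating the output bytes as signed. A minimal sketch, assuming the raw current row, the raw previous row (all zeros for the first row) and bpp, the number of bytes per complete pixel (e.g. 6 for 16-bit RGB); the class is hypothetical and is not the patch's chooseDataRowToWrite().

```java
/** Hypothetical per-row PNG filter selection (minimum sum of absolute signed values). */
class PngFilterChooser {
    /** PNG Paeth predictor: the neighbor closest to a + b - c. */
    static int paeth(int a, int b, int c) {
        int p = a + b - c;
        int pa = Math.abs(p - a), pb = Math.abs(p - b), pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) {
            return a;
        }
        return pb <= pc ? b : c;
    }

    /** Filters row with PNG filter type 0..4 (None, Sub, Up, Average, Paeth). */
    static byte[] filter(int type, byte[] row, byte[] prev, int bpp) {
        byte[] out = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            int x = row[i] & 0xFF;
            int a = i >= bpp ? row[i - bpp] & 0xFF : 0;  // left
            int b = prev[i] & 0xFF;                       // up
            int c = i >= bpp ? prev[i - bpp] & 0xFF : 0; // upper left
            int predicted;
            switch (type) {
                case 0: predicted = 0; break;
                case 1: predicted = a; break;
                case 2: predicted = b; break;
                case 3: predicted = (a + b) / 2; break;
                case 4: predicted = paeth(a, b, c); break;
                default: throw new IllegalArgumentException("filter type " + type);
            }
            out[i] = (byte) (x - predicted);
        }
        return out;
    }

    /** Returns the filter type whose output has the smallest sum of |signed byte|. */
    static int chooseFilter(byte[] row, byte[] prev, int bpp) {
        int best = 0;
        long bestSum = Long.MAX_VALUE;
        for (int type = 0; type <= 4; type++) {
            long sum = 0;
            for (byte v : filter(type, row, prev, bpp)) {
                sum += Math.abs(v);
            }
            if (sum < bestSum) {
                bestSum = sum;
                best = type;
            }
        }
        return best;
    }
}
```

This only estimates compressibility per row; it says nothing about whether predictor encoding beats the unfiltered data as a whole, which is what the size comparison above is about.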

I think we should count colors and/or consider the bit depth, or the geometric 
size of the image: something below 25x25 is more likely an icon than a 
photograph.
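One way to act on this would be to skip predictor encoding for small or low-color images. A minimal sketch; the 25x25 limit comes from the comment above (interpreted here as a pixel count), while the limit of 256 colors and the method name are arbitrary assumptions, not tuned values.

```java
import java.awt.image.BufferedImage;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical heuristic: use the predictor only for larger images with many colors. */
class PredictorHeuristic {
    static final int MIN_PIXELS = 25 * 25;  // smaller images are probably icons
    static final int MAX_FEW_COLORS = 256;  // assumption: palette-like images

    static boolean shouldUsePredictor(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        if ((long) w * h < MIN_PIXELS) {
            return false;
        }
        // count distinct ARGB values, stopping as soon as the limit is exceeded
        Set<Integer> colors = new HashSet<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                colors.add(image.getRGB(x, y));
                if (colors.size() > MAX_FEW_COLORS) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Note that getRGB() reduces 16-bit samples to 8 bits, so this counts approximate colors for 16-bit images.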

The current situation might have a negative impact on the openhtmltopdf 
project, because many web pages have small icons.




[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-07-01 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529218#comment-16529218 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 7/1/18 8:08 PM:
-

For copyright reasons we can't include some of the files in the repository. 
File 032163.jpg comes from a government site but I couldn't find any details. 
And of course we don't know the copyright of the arrow picture from 
https://github.com/danfickle/openhtmltopdf/issues/173 .

No need to resubmit anything.






[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-06-30 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528728#comment-16528728 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 6/30/18 2:29 PM:
--

There's a new problem and I don't know why this didn't come up before. See this 
code:
{code:java}
    // validate, checkIdent, checkIdentRGB, colorCount, doWritePDF and
    // testResultsDir are helpers of the surrounding test class
    public void testCreateLosslessFrom16BitPNG() throws IOException
    {
        PDDocument document = new PDDocument();
        BufferedImage image = ImageIO.read(this.getClass().getResourceAsStream("16bit.png"));

        // 4 components (RGBA) x 16 bits, stored in a USHORT raster
        assertEquals(64, image.getColorModel().getPixelSize());
        assertEquals(Transparency.TRANSLUCENT, image.getColorModel().getTransparency());
        assertEquals(4, image.getRaster().getNumDataElements());
        assertEquals(java.awt.image.DataBuffer.TYPE_USHORT,
                image.getRaster().getDataBuffer().getDataType());

        PDImageXObject ximage = LosslessFactory.createFromImage(document, image);

        int w = image.getWidth();
        int h = image.getHeight();
        // the color data should be written with 16 bits per component
        validate(ximage, 16, w, h, "png", PDDeviceRGB.INSTANCE.getName());
        System.out.println(ximage.getImage());
        checkIdent(image, ximage.getImage());
        checkIdentRGB(image, ximage.getOpaqueImage());

        // the alpha channel should end up in an 8-bit DeviceGray soft mask
        assertNotNull(ximage.getSoftMask());
        validate(ximage.getSoftMask(), 8, w, h, "png", PDDeviceGray.INSTANCE.getName());
        assertEquals(35, colorCount(ximage.getSoftMask().getImage()));

        doWritePDF(document, ximage, testResultsDir, "png16bit.pdf");
    }
{code}
The test fails because the soft mask is all zeros. For some reason, 
{{alphaImageData}} is not filled when {{prepareImageXObject}} is called by 
{{preparePredictorPDImage}}. Could it be that you forgot to handle 
transparency when the PredictorEncoder path is taken?
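For reference, filling the soft-mask data from a 16-bit RGBA raster amounts to reading band 3 of every pixel and, since the test expects an 8-bit soft mask, keeping the high byte. A minimal sketch using only java.awt.image; `extractAlpha8` is a hypothetical name and this is not the patch's prepareImageXObject()/alphaImageData code.

```java
import java.awt.image.Raster;

/** Hypothetical helper: pull the alpha band out of an RGBA raster as 8-bit samples. */
class AlphaExtraction {
    /** Returns band 3 (alpha) of every pixel, row by row, reduced to 8 bits. */
    static byte[] extractAlpha8(Raster raster, int bitsPerComponent) {
        int w = raster.getWidth();
        int h = raster.getHeight();
        byte[] alpha = new byte[w * h];
        int shift = bitsPerComponent - 8; // 8 for 16-bit samples, 0 for 8-bit samples
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sample = raster.getSample(raster.getMinX() + x, raster.getMinY() + y, 3);
                alpha[y * w + x] = (byte) (sample >>> shift);
            }
        }
        return alpha;
    }
}
```

If the predictor path builds its own row data from the color bands only, a step like this has to run separately, otherwise the soft mask stays at its initial all-zero state.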

That test wasn't public because the test file (a file from one of your users) 
is probably copyrighted. Maybe in my previous tests I had deleted it to allow 
the patch to be applied, or I had tested your patch on an unmodified project, 
or on my other computer.




[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-06-30 Thread Tilman Hausherr (JIRA)


[ https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528728#comment-16528728 ]

Tilman Hausherr edited comment on PDFBOX-4184 at 6/30/18 2:05 PM:
--

There's a new problem and I don't know why this didn't come up before. See this 
code:
{code:java}
    // validate, checkIdent, checkIdentRGB, colorCount, doWritePDF and
    // testResultsDir are helpers of the surrounding test class
    public void testCreateLosslessFrom16BitPNG() throws IOException
    {
        PDDocument document = new PDDocument();
        BufferedImage image = ImageIO.read(this.getClass().getResourceAsStream("16bit.png"));

        // 4 components (RGBA) x 16 bits, stored in a USHORT raster
        assertEquals(64, image.getColorModel().getPixelSize());
        assertEquals(Transparency.TRANSLUCENT, image.getColorModel().getTransparency());
        assertEquals(4, image.getRaster().getNumDataElements());
        assertEquals(java.awt.image.DataBuffer.TYPE_USHORT,
                image.getRaster().getDataBuffer().getDataType());

        PDImageXObject ximage = LosslessFactory.createFromImage(document, image);

        int w = image.getWidth();
        int h = image.getHeight();
        // the color data should be written with 16 bits per component
        validate(ximage, 16, w, h, "png", PDDeviceRGB.INSTANCE.getName());
        System.out.println(ximage.getImage());
        checkIdent(image, ximage.getImage());
        checkIdentRGB(image, ximage.getOpaqueImage());

        // the alpha channel should end up in an 8-bit DeviceGray soft mask
        assertNotNull(ximage.getSoftMask());
        validate(ximage.getSoftMask(), 8, w, h, "png", PDDeviceGray.INSTANCE.getName());
        assertEquals(35, colorCount(ximage.getSoftMask().getImage()));

        doWritePDF(document, ximage, testResultsDir, "png16bit.pdf");
    }
{code}
The test fails because the soft mask is all zeros. For some reason, 
{{alphaImageData}} is not filled when {{prepareImageXObject}} is called by 
{{preparePredictorPDImage}}. Could it be that you forgot to handle 
transparency when the PredictorEncoder path is taken?


was (Author: tilman):
There's a new problem and I don't know why this didn't come up before. See this 
code:
{code:java}
public void testCreateLosslessFrom16BitPNG() throws IOException
{
    PDDocument document = new PDDocument();
    BufferedImage image = ImageIO.read(this.getClass().getResourceAsStream("16bit.png"));

    assertEquals(64, image.getColorModel().getPixelSize());
    assertEquals(Transparency.TRANSLUCENT, image.getColorModel().getTransparency());
    assertEquals(4, image.getRaster().getNumDataElements());
    assertEquals(java.awt.image.DataBuffer.TYPE_USHORT,
            image.getRaster().getDataBuffer().getDataType());

    PDImageXObject ximage = LosslessFactory.createFromImage(document, image);

    int w = image.getWidth();
    int h = image.getHeight();
    validate(ximage, 16, w, h, "png", PDDeviceRGB.INSTANCE.getName());
    System.out.println(ximage.getImage());
    checkIdent(image, ximage.getImage());
    checkIdentRGB(image, ximage.getOpaqueImage());

    assertNotNull(ximage.getSoftMask());
    validate(ximage.getSoftMask(), 8, w, h, "png", PDDeviceGray.INSTANCE.getName());
    assertEquals(35, colorCount(ximage.getSoftMask().getImage()));

    doWritePDF(document, ximage, testResultsDir, "png16bit.pdf");
}
{code}
The test fails because the soft mask is all zeros. For some reason, 
{{alphaImageData}} is not filled when {{prepareImageXObject}} is called by 
{{preparePredictorPDImage}}.

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.12, 3.0.0 PDFBox
>
> Attachments: 16bit.png, LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> lossless_predictor_based_imageencoding_v3.patch, 
> lossless_predictor_based_imageencoding_v4.patch, 
> lossless_predictor_based_imageencoding_v5.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not efficiently encoded. E.g. you could use PNG 
> 

[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-05-12 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473037#comment-16473037
 ] 

Tilman Hausherr edited comment on PDFBOX-4184 at 5/12/18 11:05 AM:
---

If you find a bug in your code, please create a failing test. Ideally this 
would include an image that fails. The govdocs images are usually from the US 
government, but we need to be sure, e.g. by doing a reverse search on Google 
Images.


was (Author: tilman):
If you find a bug in your code, please create a failing test. Ideally this 
would include an image that fails. These images are usually from the US 
government, but we need to be sure, e.g. by doing a reverse search on Google 
Images.

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not efficiently encoded. E.g. you could use PNG 
> encodings to get better compression (by adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> trade-off between speed and compression ratio.
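With a /Predictor value of 10 to 15 in /DecodeParms (the PNG predictors; 15 is "PNG optimum"), every row of the encoded data starts with a PNG filter-type byte, so an encoder may choose a different filter per row. As a rough sketch, here is the PNG Paeth filter applied to one row, and its inverse. The class name is made up; bytesPerPixel for 16-bit RGB would be 6 (3 components x 2 bytes), and a real encoder would also have to set /Colors, /BitsPerComponent and /Columns in /DecodeParms.

```java
// Hypothetical sketch of PNG row filtering (Paeth, filter type 4) as used by
// the PDF PNG predictors; not PDFBox code.
final class PngRowFilterSketch
{
    static final int FILTER_PAETH = 4;

    private PngRowFilterSketch()
    {
    }

    // Paeth predictor from the PNG specification: a = left, b = above, c = upper left.
    static int paeth(int a, int b, int c)
    {
        int p = a + b - c;
        int pa = Math.abs(p - a);
        int pb = Math.abs(p - b);
        int pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc)
        {
            return a;
        }
        return pb <= pc ? b : c;
    }

    // Encodes one row: filter-type byte followed by the filtered bytes.
    // prev is the previous row (all zeros for the first row).
    static byte[] paethRow(byte[] cur, byte[] prev, int bytesPerPixel)
    {
        byte[] out = new byte[cur.length + 1];
        out[0] = (byte) FILTER_PAETH;
        for (int i = 0; i < cur.length; i++)
        {
            int a = i >= bytesPerPixel ? cur[i - bytesPerPixel] & 0xFF : 0;
            int b = prev[i] & 0xFF;
            int c = i >= bytesPerPixel ? prev[i - bytesPerPixel] & 0xFF : 0;
            // Byte arithmetic wraps modulo 256, as PNG requires
            out[i + 1] = (byte) (cur[i] - paeth(a, b, c));
        }
        return out;
    }

    // Inverse operation, as a PDF reader applies it when decoding.
    static byte[] unPaethRow(byte[] filtered, byte[] prev, int bytesPerPixel)
    {
        byte[] cur = new byte[filtered.length - 1];
        for (int i = 0; i < cur.length; i++)
        {
            int a = i >= bytesPerPixel ? cur[i - bytesPerPixel] & 0xFF : 0;
            int b = prev[i] & 0xFF;
            int c = i >= bytesPerPixel ? prev[i - bytesPerPixel] & 0xFF : 0;
            cur[i] = (byte) (filtered[i + 1] + paeth(a, b, c));
        }
        return cur;
    }
}
```

Choosing the filter per row (e.g. the one whose output has the smallest sum of absolute values) is where the speed vs. compression trade-off mentioned above comes in.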



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-05-12 Thread Emmeran Seehuber (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473018#comment-16473018
 ] 

Emmeran Seehuber edited comment on PDFBOX-4184 at 5/12/18 10:27 AM:


The Govdocs corpus is a little bit big... I'll let those tests run in the 
office on Monday, as my iMac there is faster at processing that many documents...

Regarding using the DeflaterOutputStream directly: I do this to be able to 
*stream* compress the image data, so that the image data is compressed row by 
row. This uses less memory while compressing and makes better use of the CPU 
cache (the data of one row is still in cache when it's fed to zip), as opposed 
to first encoding the image into one big byte buffer (which means doubling the 
memory needed for the image) and then compressing it at the end. Of course, 
when constructing a DeflaterOutputStream it should use the 
Filter.SYSPROP_DEFLATELEVEL setting. I've refactored the code for this into its 
own method, Filter.getCompressionLevel(). See the updated patch 
[^lossless_predictor_based_imageencoding_v2.patch]. This is still work in 
progress and not to be committed yet (I need to analyze those image mismatches 
in the govdocs first).
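A minimal sketch of the row-by-row approach described above, assuming the rows are already in PDF sample order and using PNG filter type 0 (None) for simplicity. The class is hypothetical, and the compression level is a plain parameter here; in the patch it would come from the Filter.SYSPROP_DEFLATELEVEL setting via Filter.getCompressionLevel(), whose exact signature isn't shown in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch: each row is fed to the compressor as soon as it is
// ready, so no second full-size buffer of uncompressed image data is built.
final class StreamingDeflateSketch
{
    private StreamingDeflateSketch()
    {
    }

    // level: zlib level 0..9 (e.g. Deflater.BEST_COMPRESSION)
    static byte[] compressRows(byte[][] rows, int level) throws IOException
    {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(level); // zlib-wrapped output, as /FlateDecode expects
        try (OutputStream zip = new DeflaterOutputStream(sink, deflater))
        {
            for (byte[] row : rows)
            {
                zip.write(0);   // PNG filter-type byte: 0 = None
                zip.write(row); // the row goes straight into the compressor
            }
        }
        finally
        {
            // close() does not end a caller-supplied Deflater, so free it here
            deflater.end();
        }
        return sink.toByteArray();
    }
}
```

In a real encoder the sink would be the PDF stream's output stream rather than a ByteArrayOutputStream.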


was (Author: rototor):
The Govdocs corpus is a little bit big... I'll let those tests run in the 
office on Monday, as my iMac there is faster at processing that many documents...

Regarding using the DeflaterOutputStream directly: I do this to be able to 
*stream* compress the image data, so that the image data is compressed row by 
row. This uses less memory while compressing and makes better use of the CPU 
cache (the data of one row is still in cache when it's fed to zip), as opposed 
to first encoding the image into one big byte buffer (which means doubling the 
memory needed for the image) and then compressing it at the end. Of course, 
when constructing a DeflaterOutputStream it should use the 
Filter.SYSPROP_DEFLATELEVEL setting. I've refactored the code for this into its 
own method, Filter.getCompressionLevel(). See the updated patch 
[^lossless_predictor_based_imageencoding_v2.patch]. This is still work in 
progress and not to be committed yet (I need to analyze those image mismatches 
in the govdocs first).

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> lossless_predictor_based_imageencoding_v2.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not efficiently encoded. E.g. you could use PNG 
> encodings to get better compression (by adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> trade-off between speed and compression ratio.






[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-05-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472558#comment-16472558
 ] 

Tilman Hausherr edited comment on PDFBOX-4184 at 5/11/18 7:50 PM:
--

I forgot to mention: we're planning a release soon, so I'd prefer to wait until 
after the release before deciding whether to commit this to 2.0.


was (Author: tilman):
I forgot to mention: we're planning a release soon, so I'd prefer to wait until 
after the release.

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not efficiently encoded. E.g. you could use PNG 
> encodings to get better compression (by adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> trade-off between speed and compression ratio.






[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-05-11 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472478#comment-16472478
 ] 

Tilman Hausherr edited comment on PDFBOX-4184 at 5/11/18 6:55 PM:
--

Please run the tool I just uploaded...

If a test fails, something is written to System.err and the two files are saved 
in the local directory.

I get a few "hits":

001/001229.png: images not equal
001/001230.png: images not equal

and also some jpg images. Without the change, this doesn't happen. I suspect 
that the differences are minor, but IMHO there shouldn't be any at all...
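One way to make such a check strict is to compare raw raster samples rather than getRGB() values, since getRGB() reduces 16-bit samples to 8 bits. This is a hypothetical sketch, not the uploaded LoadGovdocs.java tool, and it assumes both images share the same band layout and bit depth.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.util.Arrays;

// Hypothetical strict comparison helper (not the uploaded test tool).
final class ImageCompareSketch
{
    private ImageCompareSketch()
    {
    }

    // Returns null if every sample matches, otherwise a description of the first mismatch.
    static String firstDifference(BufferedImage expected, BufferedImage actual)
    {
        if (expected.getWidth() != actual.getWidth()
                || expected.getHeight() != actual.getHeight())
        {
            return "size differs";
        }
        Raster r1 = expected.getRaster();
        Raster r2 = actual.getRaster();
        if (r1.getNumBands() != r2.getNumBands())
        {
            return "band count differs";
        }
        int[] p1 = new int[r1.getNumBands()];
        int[] p2 = new int[r2.getNumBands()];
        for (int y = 0; y < expected.getHeight(); y++)
        {
            for (int x = 0; x < expected.getWidth(); x++)
            {
                // Raw samples at native depth, so 16-bit values are compared exactly
                r1.getPixel(x, y, p1);
                r2.getPixel(x, y, p2);
                if (!Arrays.equals(p1, p2))
                {
                    return "pixel (" + x + "," + y + "): "
                            + Arrays.toString(p1) + " vs " + Arrays.toString(p2);
                }
            }
        }
        return null;
    }
}
```

For images whose color models differ (e.g. a JPEG and its decoded PDF copy), an ARGB comparison via getRGB() is the practical fallback, at the cost of 8-bit precision.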


was (Author: tilman):
Please run the tool I just uploaded... I get a few "hits":

001/001229.png: images not equal
001/001230.png: images not equal

and also some jpg images. Without the change, this doesn't happen. I suspect 
that the differences are minor, but IMHO there shouldn't be any at all...

> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: LoadGovdocs.java, 
> lossless_predictor_based_imageencoding.patch, 
> pdfbox_support_16bit_image_write.patch, png16-arrow-bad-no-smask.pdf, 
> png16-arrow-bad.pdf, png16-arrow-good-no-mask.pdf, png16-arrow-good.pdf
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when writing lossless images, as 
> the images are currently not efficiently encoded. E.g. you could use PNG 
> encodings to get better compression (by adding a COSName.DECODE_PARMS with 
> a COSName.PREDICTOR == 15 and encoding the images as PNG). But this is 
> something for a later patch. It would also need another API, as there is a 
> trade-off between speed and compression ratio.






[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-04-07 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429420#comment-16429420
 ] 

Tilman Hausherr edited comment on PDFBOX-4184 at 4/7/18 3:32 PM:
-

Thanks... I'll commit this within the next few days... I managed to create such 
an image (IrfanView says it has "64 BitsPerPixel"), so we can also have a local 
test, but I didn't manage to produce a failure, i.e. a bad PDF like the one 
produced with [the image from your 
issue|https://user-images.githubusercontent.com/29379074/36145630-f304cd0e-10d7-11e8-942c-66eb8040be70.png]:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0:
                    image.setRGB(x, y, 0x);
                    break;
                case 1:
                    image.setRGB(x, y, 0xFF00FF00);
                    break;
                case 2:
                    image.setRGB(x, y, 0xFFFF);
                    break;
                case 3:
                    image.setRGB(x, y, 0x);
                    break;
            }
        }
    }
}

PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f,
            page.getMediaBox().getHeight() - image.getHeight());
}

{code}
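The patch description restricts support to TYPE_CUSTOM images backed by a USHORT data buffer. A hedged sketch of that check, plus writing one row of RGB samples as 16-bit big-endian bytes (high-order byte first, the order PDF uses for /BitsPerComponent 16). The names are hypothetical, the image is assumed to have at least three bands, and alpha (band 3, if present) is assumed to go to a separate soft mask.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;

// Hypothetical sketch, not PDFBox code.
final class Ushort16Sketch
{
    private Ushort16Sketch()
    {
    }

    // The case the patch targets: a custom-typed image backed by unsigned shorts.
    static boolean isUshortCustom(BufferedImage image)
    {
        return image.getType() == BufferedImage.TYPE_CUSTOM
                && image.getRaster().getDataBuffer().getDataType() == DataBuffer.TYPE_USHORT;
    }

    // Packs bands 0..2 (R, G, B) of row y as 2 bytes per sample, high byte first.
    static byte[] rgbRow16(BufferedImage image, int y)
    {
        Raster raster = image.getRaster();
        int w = image.getWidth();
        byte[] out = new byte[w * 3 * 2];
        int[] pixel = new int[raster.getNumBands()];
        int pos = 0;
        for (int x = 0; x < w; x++)
        {
            raster.getPixel(x, y, pixel);
            for (int band = 0; band < 3; band++)
            {
                out[pos++] = (byte) (pixel[band] >>> 8); // high byte
                out[pos++] = (byte) pixel[band];         // low byte
            }
        }
        return out;
    }
}
```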



was (Author: tilman):
Thanks... I'll commit this within the next few days... I managed to create such 
an image (IrfanView says it has "64 BitsPerPixel"), so we can also have a local 
test, but I didn't manage to produce a failure, i.e. a bad PDF like the one 
produced with the image from your issue:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0:
                    image.setRGB(x, y, 0x);
                    break;
                case 1:
                    image.setRGB(x, y, 0xFF00FF00);
                    break;
                case 2:
                    image.setRGB(x, y, 0xFFFF);
                    break;
                case 3:
                    image.setRGB(x, y, 0x);
                    break;
            }
        }
    }
}

PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f,
            page.getMediaBox().getHeight() - image.getHeight());
}

{code}


> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: pdfbox_support_16bit_image_write.patch
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix 

[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-04-07 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429420#comment-16429420
 ] 

Tilman Hausherr edited comment on PDFBOX-4184 at 4/7/18 3:32 PM:
-

Thanks... I'll commit this within the next few days... I managed to create such 
an image (IrfanView says it has "64 BitsPerPixel"), so we can also have a local 
test, but I didn't manage to produce a failure, i.e. a bad PDF like the one 
produced with [the image from the GitHub 
issue|https://user-images.githubusercontent.com/29379074/36145630-f304cd0e-10d7-11e8-942c-66eb8040be70.png]:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0:
                    image.setRGB(x, y, 0x);
                    break;
                case 1:
                    image.setRGB(x, y, 0xFF00FF00);
                    break;
                case 2:
                    image.setRGB(x, y, 0xFFFF);
                    break;
                case 3:
                    image.setRGB(x, y, 0x);
                    break;
            }
        }
    }
}

PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f,
            page.getMediaBox().getHeight() - image.getHeight());
}

{code}



was (Author: tilman):
Thanks... I'll commit this within the next few days... I managed to create such 
an image (IrfanView says it has "64 BitsPerPixel"), so we can also have a local 
test, but I didn't manage to produce a failure, i.e. a bad PDF like the one 
produced with [the image from your 
issue|https://user-images.githubusercontent.com/29379074/36145630-f304cd0e-10d7-11e8-942c-66eb8040be70.png]:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0:
                    image.setRGB(x, y, 0x);
                    break;
                case 1:
                    image.setRGB(x, y, 0xFF00FF00);
                    break;
                case 2:
                    image.setRGB(x, y, 0xFFFF);
                    break;
                case 3:
                    image.setRGB(x, y, 0x);
                    break;
            }
        }
    }
}

PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f,
            page.getMediaBox().getHeight() - image.getHeight());
}

{code}


> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: pdfbox_support_16bit_image_write.patch
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but 

[jira] [Comment Edited] (PDFBOX-4184) [PATCH]: Support simple lossless compression of 16 bit RGB images

2018-04-07 Thread Tilman Hausherr (JIRA)

[ 
https://issues.apache.org/jira/browse/PDFBOX-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429420#comment-16429420
 ] 

Tilman Hausherr edited comment on PDFBOX-4184 at 4/7/18 3:31 PM:
-

Thanks... I'll commit this within the next few days... I managed to create such 
an image (IrfanView says it has "64 BitsPerPixel"), so we can also have a local 
test, but I didn't manage to produce a failure, i.e. a bad PDF like the one 
produced with the image from your issue:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0:
                    image.setRGB(x, y, 0x);
                    break;
                case 1:
                    image.setRGB(x, y, 0xFF00FF00);
                    break;
                case 2:
                    image.setRGB(x, y, 0xFFFF);
                    break;
                case 3:
                    image.setRGB(x, y, 0x);
                    break;
            }
        }
    }
}

PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f,
            page.getMediaBox().getHeight() - image.getHeight());
}

{code}



was (Author: tilman):
Thanks... I'll commit this within the next few days... I managed to create such 
an image, so we can also have a local test, but I didn't manage to produce a 
failure, i.e. a bad PDF like the one produced with the image from your issue:
{code}
ColorModel colorModel = new ComponentColorModel(ColorSpace.getInstance(ColorSpace.CS_LINEAR_RGB),
        true, false, Transparency.TRANSLUCENT, DataBuffer.TYPE_USHORT);
WritableRaster raster = Raster.createInterleavedRaster(DataBuffer.TYPE_USHORT, 256, 256, 4, null);
BufferedImage image = new BufferedImage(colorModel, raster, false, null);
for (int x = 0; x < image.getWidth(); ++x)
{
    for (int y = 0; y < image.getHeight(); ++y)
    {
        if (x == y)
        {
            switch (x % 4)
            {
                case 0:
                    image.setRGB(x, y, 0x);
                    break;
                case 1:
                    image.setRGB(x, y, 0xFF00FF00);
                    break;
                case 2:
                    image.setRGB(x, y, 0xFFFF);
                    break;
                case 3:
                    image.setRGB(x, y, 0x);
                    break;
            }
        }
    }
}

PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);
try (PDPageContentStream cs = new PDPageContentStream(doc, page))
{
    cs.drawImage(LosslessFactory.createFromImage(doc, image), 0f,
            page.getMediaBox().getHeight() - image.getHeight());
}

{code}


> [PATCH]: Support simple lossless compression of 16 bit RGB images
> -
>
> Key: PDFBOX-4184
> URL: https://issues.apache.org/jira/browse/PDFBOX-4184
> Project: PDFBox
>  Issue Type: Improvement
>  Components: Writing
>Affects Versions: 2.0.9
>Reporter: Emmeran Seehuber
>Priority: Minor
> Fix For: 2.0.10, 3.0.0 PDFBox
>
> Attachments: pdfbox_support_16bit_image_write.patch
>
>
> The attached patch adds support for writing 16 bit per component images 
> correctly. I've integrated a test for this here: 
> [https://github.com/rototor/pdfbox-graphics2d/commit/8bf089cb74945bd4f0f15054754f51dd5b361fe9]
> It only supports 16-Bit TYPE_CUSTOM with DataType == USHORT images - but this 
> is what you usually get when you read a 16 bit PNG file.
> This would also fix [https://github.com/danfickle/openhtmltopdf/issues/173].
> The patch is against 2.0.9, but should apply to 3.0.0 too.
> There is still some room for improvement when