Re: PDF's, TIFF's and JAI, oh my!

Jeremias Maerki Thu, 16 Apr 2009 05:14:13 -0700

On 16.04.2009 13:25:26 Daniel Wilson wrote:
> Yes, AI beginning with version 10 is PDF.  That's what brought me to PDFBox
> in the first place ... 3 years ago I think.


Ah!

> What is the PDFDebugger tool you're using?  That sounds like a tremendous
> help!

The built-in org.apache.pdfbox.PDFDebugger. :-) Helped me a number of
times already.

> FlateDecode with predictor 15 ... right I see that.  The "getColorSpace()
> returned NULL" is from PDXObjectImage.getColorSpace().  There's a branch
> there for a null CS when the filter is CCITTFAX_DECODE.  In that case it
> assumes PDDeviceGray.  I tried the same thing, but just got exceptions so
> thought I was headed down the wrong road.
> 
> Do you have any ideas about what should be there?

I'm not sure that PDDeviceGray is actually right here. After all, the
image references a Separation color space. Assuming DeviceGray for CCITT
assumes that all bi-level images are black&white. The image in your
example could just as well be CCITT compressed. The first thing I'd do
is try to get the XObject to report the right PDF color space, i.e.
PDSeparation. For rendering the PDF, that obviously doesn't help, yet. I
suspect that you may have to write a subclass of
java.awt.color.ColorSpace that takes a PDSeparation and implements the
abstract methods on ColorSpace based on the tintTransform function. At
least that seems to be the clean approach. Not sure if there's a
short-cut, because only that function lets you know how to convert the
zeroes and ones of the bitmap to their effective color values (in sRGB
or XYZ).
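
A minimal sketch of that ColorSpace idea, with the assumptions flagged:
TintTransform is a hypothetical stand-in for whatever function the
PDSeparation exposes (it is not a PDFBox type), it is assumed to deliver
sRGB components directly, and TYPE_GRAY is used only because
java.awt.color.ColorSpace has no dedicated single-component type for
separations.

```java
import java.awt.color.ColorSpace;

// Hypothetical stand-in for the separation's tint transform; NOT a PDFBox type.
// Assumed to map a tint in [0,1] to sRGB components in [0,1].
interface TintTransform {
    float[] toRGB(float tint);
}

// Single-component color space that delegates conversion to the tint transform.
final class SeparationColorSpaceSketch extends ColorSpace {
    private final TintTransform tintTransform;

    SeparationColorSpaceSketch(TintTransform tintTransform) {
        // One component (the tint); TYPE_GRAY is an assumption, see above.
        super(ColorSpace.TYPE_GRAY, 1);
        this.tintTransform = tintTransform;
    }

    @Override
    public float[] toRGB(float[] colorvalue) {
        // The tint is the only component of this color space.
        return tintTransform.toRGB(colorvalue[0]);
    }

    @Override
    public float[] toCIEXYZ(float[] colorvalue) {
        // Go through sRGB, since that is what the transform is assumed to return.
        return ColorSpace.getInstance(ColorSpace.CS_sRGB).toCIEXYZ(toRGB(colorvalue));
    }

    @Override
    public float[] fromRGB(float[] rgbvalue) {
        // The reverse direction is left out of this sketch.
        throw new UnsupportedOperationException("reverse conversion not implemented");
    }

    @Override
    public float[] fromCIEXYZ(float[] colorvalue) {
        throw new UnsupportedOperationException("reverse conversion not implemented");
    }
}
```

With a transform such as t -> new float[] {1f, 1f - t, 1f - t}, tint 0
comes out white and tint 1 comes out red.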

Anyway, a bi-level image in Java usually uses an IndexColorModel
with two entries in the lookup table. However, I think IndexColorModel
only supports sRGB color, not arbitrary color components. So you may
actually be able to skip the ColorSpace subclass and calculate the sRGB
values for 0 and 1 directly. However, that only works as long as you
work with an image with fewer than 8 bits per sample. As soon as the
image uses 8 bits or more, a color lookup table is inefficient and the
ColorSpace subclass becomes useful again.
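
For the lookup-table route, something along these lines might do as a
starting point. BiLevelImageSketch, toImage and the two rgb parameters
are made-up names; the sRGB values for 0 and 1 are assumed to have been
computed beforehand, and the bitmap is assumed to be packed 1 bit per
pixel, most significant bit first, with each row padded to a byte
boundary.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.awt.image.IndexColorModel;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

final class BiLevelImageSketch {

    // Wraps packed 1-bit data in a BufferedImage whose two palette entries
    // are the given 0xRRGGBB values for sample 0 and sample 1.
    static BufferedImage toImage(byte[] packedBits, int width, int height,
                                 int rgbForZero, int rgbForOne) {
        byte[] r = {(byte) (rgbForZero >> 16), (byte) (rgbForOne >> 16)};
        byte[] g = {(byte) (rgbForZero >> 8), (byte) (rgbForOne >> 8)};
        byte[] b = {(byte) rgbForZero, (byte) rgbForOne};

        // 1 bit per pixel, 2 entries in the lookup table.
        IndexColorModel colorModel = new IndexColorModel(1, 2, r, g, b);

        // The raster shares the byte array; no pixel data is copied.
        DataBufferByte buffer = new DataBufferByte(packedBits, packedBits.length);
        WritableRaster raster = Raster.createPackedRaster(buffer, width, height, 1, null);

        return new BufferedImage(colorModel, raster, false, null);
    }
}
```

Passing 0x000000 and 0xFFFFFF as the two entries gives a plain
black&white image; swapping in the separation's colors only changes the
lookup table, not the bitmap.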

This all looks very much like non-trivial stuff. I'm no color expert so
I'm not sure if I got the above about right. But as a first step, I'd try
to create an IndexColorModel for black&white with a gray color space to
see if you can at least get the bitmap displayed, even if the colors
will be wrong. From there, you can probably try to ease closer to the
final solution.

HTH & good luck!


> Thanks!
> 
> Daniel Wilson
> 
> On Thu, Apr 16, 2009 at 3:04 AM, Jeremias Maerki
> <d...@jeremias-maerki.ch> wrote:
> 
> > Hehe, I didn't make the link that a *.ai file could also be a PDF file.
> > So now I know where I have to look. I learned something myself just now.
> >
> > So, I opened ArchiveRGB.ai with the PDFDebugger and took a look at the
> > image contained in the file. There are two images, both of which are
> > 1bit black/white images which are configured as image masks (which I
> > find strange). The images may well come from a TIFF file originally but
> > what ends up in the PDF has nothing to do with TIFF. The compression
> > used is not even T.4 or T.6 (the most common compression schemes for
> > bi-level images); instead, the FlateDecode filter with predictor 15
> > (PNG optimum) is used. TIFF 6.0 doesn't even support that compression
> > type. So Adobe Illustrator loaded the TIFF file but put it in the PDF
> > as something else. Predictor 15 should be supported by PDFBox if I
> > interpret the source code correctly. So the problem displaying that
> > PDF probably doesn't lie with decompressing the image but with
> > painting it. At any rate, I suggest you not get distracted by the
> > fact that the original image might have been a TIFF once. Another
> > problem (and probably the key) will be the
> > "getColorSpace() returned NULL" error in the log. The two
> > images seem to be using a Separation color space, which means that the
> > black pixels in the bi-level images are not painted in black but in the
> > color specified in the separation color space. I'd start by looking in
> > PDPixelMap to see why it gets null as the color space. HTH
> >
> > On 16.04.2009 02:31:19 Daniel Wilson wrote:
> > > Thanks for the explanation, Jeremias, but I'm not sure you're correct.
> > >
> > > Here's what the customer who created the artwork told me:
> > >
> > > Daniel,
> > > To your question below, are you asking "what process is used" for the
> > > background color that you are displaying as bright lime green? That is a
> > > normal CMYK spot color as far as the background goes. The next layer is
> > > the <text> that is a dark green color and on top of it is a tiff scan to
> > > give it that faded look. Then the final layer is the text in white.
> > > Matt
> > >
> > > That helps I think, Matt.
> > >
> > > I think the "faded look" is what I'm having trouble with.  I'm seeing a
> > > faded look on another one that almost looks like water droplets or
> > > something.
> > >
> > > Is that also a tiff scan?
> > >
> > > On that tiff scan, are you using that tiff semi-transparently?  I think
> > > I'm either not using the tiff, setting it to fully transparent so it's
> > > not seen at all, or moving it behind other layers.
> > >
> > > Thanks for the help.
> > >
> > > Daniel Wilson
> > >
> > > Yes I believe we have a scan that looks like water droplets or something
> > > similar; we have about every type of scan possible. Yes I'm sure the scan
> > > has to be transparent to give it that effect.
> > >
> > > So ... they claim they are putting a TIFF into the PDF file.  And ...
> > > isn't that what PDInlinedImage and PDXObjectImage are all about?
> > >
> > > I have gotten permission and posted another example (red instead of
> > > green) in the trunk\test\input\rendering folder as ArchiveRGB.ai.
> > >
> > > Thanks!
> > >
> > > Daniel Wilson
> > >
> > >
> > > On Wed, Apr 15, 2009 at 4:53 PM, Jeremias Maerki
> > > <d...@jeremias-maerki.ch> wrote:
> > >
> > > > Daniel, I think there's a misunderstanding. PDF doesn't contain TIFF
> > > > files. Nor PNGs. In a way you could say PDFs can contain JPEGs, as
> > > > the DCTDecode filter handles practically the same data as a raw JPEG
> > > > file. In FOP we can basically embed a baseline JPEG file 1:1 without
> > > > decompression in a PDF. But the same is not true for TIFF and PNG. What
> > > > PDF uses are the PNG predictors to increase image compression over
> > > > plain deflate. But that's not the same as PNG. I've tried to embed PNGs
> > > > in PDFs without decompressing them in FOP and I didn't manage for some
> > > > reason.
> > > >
> > > > By transparent TIFF, you mean black/white 1bit images. Is that right?
> > > > When you're talking about TIFF, are you not rather talking about the
> > > > CCITTFaxDecode filter which uses the compression algorithms defined in
> > > > the ITU T.4 and T.6 specifications (CCITT Fax Group 3 and 4)? Like PDF,
> > > > TIFF uses those algorithms, but that's not the same as embedding TIFF.
> > > > In FOP, we can transfer CCITT encoded image data extracted from TIFF
> > > > into PDFs without decompression, much like JPEG data.
> > > >
> > > > I've just had a closer look at CCITTFaxDecodeFilter in PDFBox. If I
> > > > interpret the code correctly, it actually just embeds the stream data
> > > > in a TIFF wrapper which is loaded (now by ImageIO?) somewhere else. Not
> > > > what I expected. This was probably a work-around to make use of JAI's
> > > > codec for this kind of image. I guess what you're really looking for is
> > > > a decompressor (and eventually a compressor) for ITU T.4 and T.6.
> > > >
> > > > A suitably licensed decompressor can be found in Apache XML Graphics
> > > > Commons:
> > > >
> > > >
> > http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/codec/tiff/TIFFFaxDecoder.java?view=markup
> > > >
> > > > This could be integrated into an implementation of the CCITTFaxDecode
> > > > filter in PDFBox.
> > > >
> > > > For the compression side, I'm currently working on a T.4/T.6 compressor
> > > > but that's not finished, yet. I need that for FOP's PDF, PS, AFP and
> > > > TIFF output. I'm implementing it as an OutputStream subclass so it can
> > > > easily be integrated in Sanselan, PDFBox or whatever. The only two
> > > > problems left are finding the right place to put it in the end and
> > > > finding the time to finish it.
> > > >
> > > > BTW, Sanselan doesn't have a CCITT/T.4/T.6 implementation, yet, so it
> > > > won't be a help right now.
> > > >
> > > > So if I got this right, a full TIFF codec is only needed if you wanted
> > > > PDFBox to be able to read TIFFs when embedding them in a new PDF or
> > > > when you extract bitmaps from a PDF and want to save them as external
> > > > image files. For PDF viewing, you only need the T.4/T.6 decompressor.
> > > >
> > > > I hope I'm making sense.
> > > >
> > > > On 15.04.2009 21:52:25 Daniel Wilson wrote:
> > > > > Some PDF's have transparent TIFF's in them.
> > > > >
> > > > > They come into the PDXObject arena ... as a PDPixelMap.  But there we
> > > > > are best prepared to handle JPEG's and PNG's.
> > > > >
> > > > > Most sources on rendering a TIFF in Java say to use JAI.  Someone
> > > > > (Jukka I think) went to a good deal of trouble to excise JAI from
> > > > > PDFBox due to licensing restrictions.
> > > > >
> > > > > Lizardworks' TIFF library is about 10 years old, lacks Deflate
> > > > > decompression, and is licensed under the Library GPL.
> > > > > http://www.lizardworks.com/libs.html
> > > > >
> > > > > So I don't think it is an option.
> > > > >
> > > > > The TIFF spec is 121 pages long in its own right.
> > > > > http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf
> > > > > That's a lot simpler than PDF, but doing our own implementation would
> > > > > be a non-trivial undertaking.
> > > > >
> > > > > Any ideas on how to proceed?
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Daniel Wilson
> > > >
> > > >
> > > >
> > > >
> > > > Jeremias Maerki
> > > >
> > > >
> >
> >
> >
> >
> > Jeremias Maerki
> >
> >




Jeremias Maerki
