Hello again list!

On Fri, 24 Feb 2006, Leonard Rosenthol wrote:

At 09:39 AM 2/24/2006, Petter Nyström wrote:

One of the basic things I want to do is to pull images out of a PDF, let third-party software modify these images and then plug the modified images back into the PDF without changing the document layout. Can iText do this?

       No.

In fact, I am not aware of ANY non-commercial library that will provide that level of functionality (specifically the "putting back" part - the extraction is easy).

You could use iText to extract the images, though you'd also need a VERY detailed understanding of image handling and color management in order to make sure that the extracted data was in the correct form.

On Sat, 25 Feb 2006, Leonard Rosenthol wrote:

Image data in PDF is either in JPEG/JFIF format (which can just be written out to a file) - OR it is simply an array of "pixels" in the specified colorspace. So in the latter case (which is probably the more common), you would need to transform the data into something usable in JPEG, TIFF, etc. This may involve not only the file format, but also colorspace handling, since PDF supports 11 colorspaces while JPEG (for example) only does 2.
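To make the "array of pixels" case concrete, here is a minimal sketch in plain Java (JDK only, no iText calls). It assumes the simplest possible situation: the stream has already been decompressed, the colorspace is DeviceRGB, and there are 8 bits per component with no Decode array. The class and method names (RawRgbSamples, toImage, toPng) are made up for illustration. Any other colorspace (CMYK, Indexed, ICC-based, and so on) would need a real conversion step that is not shown here.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper: not part of iText or of any PDF library.
class RawRgbSamples {

    // Builds an image from 8-bit samples laid out R, G, B per pixel, row by row.
    // Assumes DeviceRGB, 8 bits per component, already-decompressed data.
    static BufferedImage toImage(byte[] samples, int width, int height) {
        int expected = width * height * 3;
        if (samples.length != expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " bytes, got " + samples.length);
        }
        BufferedImage image =
            new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = samples[i++] & 0xFF; // mask to read the byte as unsigned
                int g = samples[i++] & 0xFF;
                int b = samples[i++] & 0xFF;
                image.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return image;
    }

    // Encodes the image as PNG so that other tools can open it.
    static byte[] toPng(BufferedImage image) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A 2x1 image: one red pixel followed by one green pixel.
        byte[] samples = { (byte) 255, 0, 0, 0, (byte) 255, 0 };
        BufferedImage image = toImage(samples, 2, 1);
        byte[] png = toPng(image);
        System.out.println(image.getWidth() + "x" + image.getHeight()
            + " image, " + png.length + " PNG bytes");
    }
}
```

PNG is used here only because the JDK can write it losslessly; the same BufferedImage could be handed to any other ImageIO writer.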

I have been trying to accomplish this - to extract an image from a test document by using iText. And to begin with I am assuming that the image is stored as a JPEG. (Is there, by the way, a way of reading out the storage format from the PDF document? I used pdfimages from the xpdf package to extract the images from a test document - and it wrote them as JPEG, but that may not be a guarantee for the images being stored as JPEG in the document, I suppose.)
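On the storage-format question: in the PDF itself, the /Filter entry of the image's stream dictionary names the encoding, and DCTDecode is the one that means JPEG. As a rough cross-check that needs no PDF library at all, the undecoded stream bytes can be tested for the JPEG start-of-image marker. The sketch below is plain Java; the class and method names are made up, and how the raw bytes are obtained from iText is deliberately left out.

```java
// Hypothetical helper: checks only the leading bytes, nothing more.
class JpegSniffer {

    // A JPEG stream begins with the SOI marker FF D8, followed by the
    // FF byte that introduces the next marker.
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
            && data.length >= 3
            && (data[0] & 0xFF) == 0xFF
            && (data[1] & 0xFF) == 0xD8
            && (data[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        byte[] jpegStart = { (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0 };
        byte[] pngStart = { (byte) 0x89, 'P', 'N', 'G' };
        System.out.println(looksLikeJpeg(jpegStart));
        System.out.println(looksLikeJpeg(pngStart));
    }
}
```

This is only a heuristic on the bytes; the /Filter entry remains the authoritative answer, and a stream may also carry more than one filter.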

The short version of my problem is that I do not know how to approach even this simplified task. If there are any good examples of something similar, a pointer would be great!

The longer version is that I am having difficulty understanding how iText works under the hood. It reads a PDF document, but how is the document data stored in the program? From studying a bit of the iText source code, it seems that several of the PDF objects, especially some PDF dictionaries, are read and stored as tailor-made Java data structures. On the other hand, it looks as if the PDF document is also stored in its entirety as a plain byte buffer. I am just not getting things straight here. =)

Could someone give me a quick overview of how iText does its work behind the scenes? I think that understanding would be crucial if I am going to use iText for this.

Again, thanks a lot for all the help!

Regards,

Petter Nyström

