Hello again list!
On Fri, 24 Feb 2006, Leonard Rosenthol wrote:
> At 09:39 AM 2/24/2006, Petter Nyström wrote:
>> One of the basic things I want to do is to pull images out of a PDF,
>> let third-party software modify these images and then plug the modified
>> images back into the PDF without changing the document layout. Can
>> iText do this?
>
> No.
>
> In fact, I am not aware of ANY non-commercial library that will
> provide that level of functionality (specifically the "putting back"
> part - the extraction is easy).
>
> You could use iText to extract the images, though you'd also need
> a VERY detailed understanding of image handling and color management in
> order to make sure that the extracted data was in the correct form.

On Sat, 25 Feb 2006, Leonard Rosenthol wrote:
> Image data in PDF is either in JPEG/JFIF format (which can just
> be written out to a file) - OR it is simply an array of "pixels" in the
> specified colorspace. So in the latter case (which is probably the more
> common), you would need to transform the data into something usable in
> JPEG, TIFF, etc. This may include not only file format, but also
> colorspace handling since PDF supports 11 colorspaces while JPEG (for
> example) only does 2.
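
As a rough sketch of the second case described above (a plain array of
samples), this is how I imagine the conversion step could look in plain
Java, without touching iText at all. It assumes the stream has already
been decompressed, that there are 8 bits per component, and that the
colorspace is DeviceGray or DeviceRGB. The class and method names are
made up for illustration, and ICC profiles, /Decode arrays, masks and
the other colorspaces are ignored entirely.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of iText.
class RawSamplesSketch {

    // Builds an RGB image from already-decoded 8-bit samples.
    // components: 1 = DeviceGray, 3 = DeviceRGB; anything else is rejected.
    static BufferedImage fromRawSamples(byte[] samples, int width, int height, int components) {
        if (components != 1 && components != 3) {
            throw new IllegalArgumentException("only DeviceGray and DeviceRGB are handled here");
        }
        if (samples.length < width * height * components) {
            throw new IllegalArgumentException("not enough sample data");
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = samples[i++] & 0xFF;
                // For gray data the single sample is reused for all three channels.
                int g = (components == 3) ? (samples[i++] & 0xFF) : r;
                int b = (components == 3) ? (samples[i++] & 0xFF) : r;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    // Encodes the image as PNG so that ordinary image tools can open it.
    static byte[] toPng(BufferedImage img) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(img, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

PNG is used here only because it is lossless and ships with the JDK; the
same BufferedImage could be handed to any other ImageIO writer.
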
I have been trying to do exactly this: extract an image from a test
document using iText. To begin with, I am assuming that the image is
stored as a JPEG. (Is there, by the way, a way to read the storage
format out of the PDF document? I used pdfimages from the xpdf package to
extract the images from a test document, and it wrote them as JPEG, but
I suppose that does not guarantee that the images are stored as JPEG in
the document.)
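
To make the storage-format question concrete: as far as I understand the
PDF format, the /Filter entry in an image stream's dictionary names the
encoding, and /DCTDecode means the stream data is JPEG. The sketch below
is plain Java with made-up names (it is not iText API). It maps a filter
name to a rough description and, as a cross-check, tests whether a byte
array starts with the JPEG start-of-image marker.

```java
// Hypothetical helper, not part of iText.
class StorageFormatSketch {

    // Rough meaning of a PDF stream filter name (given without the leading slash).
    static String describeFilter(String filterName) {
        if (filterName == null) {
            return "uncompressed raw samples";
        }
        switch (filterName) {
            case "DCTDecode":
                return "JPEG data";
            case "JPXDecode":
                return "JPEG 2000 data";
            case "CCITTFaxDecode":
                return "CCITT fax data";
            case "JBIG2Decode":
                return "JBIG2 data";
            case "FlateDecode":
            case "LZWDecode":
            case "RunLengthDecode":
                return "compressed raw samples";
            default:
                return "unknown or non-image filter";
        }
    }

    // JPEG data begins with the SOI marker FF D8, followed by another FF marker byte.
    static boolean looksLikeJpeg(byte[] data) {
        return data != null
                && data.length >= 3
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8
                && (data[2] & 0xFF) == 0xFF;
    }
}
```

A stream may also list several filters in an array, which this sketch
does not handle.
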
The short version of my problem is that I do not see how to attack even
this simplified task. If there are any good examples that do something
similar, a pointer would be great!
The longer version is that I am having difficulty understanding how
iText works under the hood. It reads a PDF document, but how is the
document data stored in the program? From studying a bit of the iText
source code, it seems that several of the PDF objects, especially some PDF
dictionaries, are read and stored as tailor-made Java data structures.
On the other hand, it looks as if the PDF document is stored in its
entirety as a plain byte buffer. I am just not getting things straight
here. =)
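
To illustrate how both observations could be true at once, here is a toy
model of what I imagine might be going on. It is purely my guess and uses
invented class names, not iText's actual classes: the whole file sits in
a byte buffer, dictionaries are parsed into Java maps, and a stream
object only remembers where its data lives in the buffer until someone
asks for the bytes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model, not iText's real design.
class LazyDocumentSketch {

    // A stream: its parsed dictionary plus the location of its data in the file.
    static final class StreamObject {
        final Map<String, String> dictionary;
        final int offset;
        final int length;

        StreamObject(Map<String, String> dictionary, int offset, int length) {
            this.dictionary = Collections.unmodifiableMap(new HashMap<>(dictionary));
            this.offset = offset;
            this.length = length;
        }
    }

    private final byte[] fileBytes; // the complete document, kept as-is

    LazyDocumentSketch(byte[] fileBytes) {
        this.fileBytes = fileBytes;
    }

    // Records where a stream's data starts; the size comes from its /Length entry.
    StreamObject describeStream(Map<String, String> dictionary, int dataOffset) {
        int length = Integer.parseInt(dictionary.get("Length"));
        return new StreamObject(dictionary, dataOffset, length);
    }

    // The data is copied out of the buffer only when it is actually requested.
    byte[] readStreamBytes(StreamObject stream) {
        return Arrays.copyOfRange(fileBytes, stream.offset, stream.offset + stream.length);
    }
}
```

If iText works anything like this, the small structural objects would
live as Java objects while the bulky image data would stay in the buffer.
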
Could someone give me a quick overview of how iText does its work behind
the scenes? I think understanding that will be crucial if I am going
to use iText for this.
Again, thanks a lot for all the help!
Regards,
Petter Nyström
_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions