Re: [iText-questions] Can iText replace images etc.

Petter Nyström Mon, 06 Mar 2006 07:22:09 -0800

Hello again list!

On Fri, 24 Feb 2006, Leonard Rosenthol wrote:

At 09:39 AM 2/24/2006, Petter Nyström wrote:
One of the basic things I want to do is to pull images out of a PDF, let third-party software modify these images and then plug the modified images back into the PDF without changing the document layout. Can iText do this?
       No.
In fact, I am not aware of ANY non-commercial library that will provide that level of functionality (specifically the "putting back" part - the extraction is easy).
You could use iText to extract the images, though you'd also need a VERY detailed understanding of image handling and color management in order to make sure that the extracted data was in the correct form.


On Sat, 25 Feb 2006, Leonard Rosenthol wrote:

Image data in PDF is either in JPEG/JFIF format (which can just be written out to a file) - OR it is simply an array of "pixels" in the specified colorspace. So in the latter case (which is probably the more common), you would need to transform the data into something usable in JPEG, TIFF, etc. This may include not only file format, but also colorspace handling since PDF supports 11 colorspaces while JPEG (for example) only does 2.
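
For illustration, here is a minimal sketch of the "array of pixels" case described above. It uses only the JDK, not iText. It assumes the samples have already been decompressed, that the colorspace is DeviceRGB with 8 bits per component, and that the rows carry no padding. The class and method names are hypothetical, and any other colorspace would need real conversion work that this sketch does not attempt.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of iText: wraps raw PDF image samples in a
// BufferedImage so that standard Java imaging code can work with them.
final class RawSamplesSketch {

    /**
     * Packs interleaved 8-bit R,G,B samples (row-major, no row padding) into
     * an image. Assumes DeviceRGB with 8 bits per component; other
     * colorspaces are NOT handled here.
     */
    static BufferedImage fromDeviceRgb(byte[] samples, int width, int height) {
        if (samples.length != width * height * 3) {
            throw new IllegalArgumentException(
                "expected " + (width * height * 3) + " bytes, got " + samples.length);
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Java bytes are signed, so mask each sample to 0..255.
                int r = samples[i++] & 0xFF;
                int g = samples[i++] & 0xFF;
                int b = samples[i++] & 0xFF;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }
}
```

The resulting image could then be written out with `javax.imageio.ImageIO.write(img, "png", file)` and handed to third-party software.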

I have been trying to accomplish this - to extract an image from a test document by using iText. And to begin with I am assuming that the image is stored as a JPEG. (Is there, by the way, a way of reading out the storage format from the PDF document? I used pdfimages from the xpdf package to extract the images from a test document - and it wrote them as JPEG, but that may not be a guarantee for the images being stored as JPEG in the document, I suppose.)
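
On the storage-format question above: in the PDF format, an image stream's dictionary names its encoding in the /Filter entry, and DCTDecode is the filter used for JPEG data. Below is a rough sketch of how that name, once read from the dictionary, might be interpreted. It uses only the JDK. The class and method names are hypothetical, and it assumes a single filter name passed without the leading slash (a stream may also carry an array of filters, which is not handled here).

```java
// Hypothetical helper, not part of iText: interprets the /Filter name of a
// PDF image stream and sanity-checks the stream bytes.
final class ImageFilterSketch {

    /** Maps one filter name (no leading slash) to a rough storage format. */
    static String describe(String filterName) {
        if (filterName == null) {
            return "uncompressed raw samples"; // no /Filter entry at all
        }
        switch (filterName) {
            case "DCTDecode":
                return "JPEG";
            case "JPXDecode":
                return "JPEG 2000";
            case "CCITTFaxDecode":
                return "CCITT fax";
            case "JBIG2Decode":
                return "JBIG2";
            case "FlateDecode":
            case "LZWDecode":
            case "RunLengthDecode":
                return "compressed raw samples";
            default:
                return "unknown";
        }
    }

    /** Returns true if the data starts with the JPEG start-of-image marker FF D8. */
    static boolean looksLikeJpeg(byte[] data) {
        return data != null && data.length >= 2
            && (data[0] & 0xFF) == 0xFF
            && (data[1] & 0xFF) == 0xD8;
    }
}
```

Only in the "JPEG" case can the stream bytes be written straight to a .jpg file; checking the first two bytes with `looksLikeJpeg` is a cheap sanity test before doing so.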

The short version of my problem is that I do not see how I should approach even this simplified task. If there are any good examples doing something similar to this, a pointer would be great!

The longer version is that I am having difficulties understanding how iText works under the hood. It reads a PDF document, but how is the document data stored in the program? By studying a bit of the iText source code it seems as if several of the PDF objects, especially some PDF dictionaries, are read and stored as tailor-made Java data structures. While on the other hand, it looks as if the PDF document is stored in its entirety as a plain byte buffer. I am just not getting things straight here. =)

Could someone give me a quick rundown of how iText does its stuff behind the scenes? I think that would be crucial understanding for me if I am going to use iText to do my work.


Again, thanks a lot for all the help!

Regards,

Petter Nyström


