RE: [iText-questions] Can iText replace images etc.

Paulo Soares Mon, 06 Mar 2006 07:54:22 -0800

The first thing you have to understand is that nothing is simple when dealing 
with PDFs. Read chapter 3 and section 4.8 of the PDF Reference until you know 
them by heart (no kidding). iText has classes, such as PdfArray and 
PdfDictionary, that mirror the structures in the PDF Reference. To get an image:


- get the page where the image is with PdfReader.getPageN()
- get the resource dictionary, then the XObject dictionary and finally the image

You'll have to apply the stream's filters to decode the image data or, in the 
case of JPEG images, just copy the stream bytes as they are.
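
The last step can be sketched in plain Java. This is only a minimal sketch 
under stated assumptions, not iText's own code: it assumes you have already 
obtained the raw stream bytes and the name from the stream's /Filter entry 
through iText, and the class ImageStreamDecoder and its decode method are 
hypothetical names made up for this example. It covers just two filters from 
the PDF Reference, /DCTDecode (the bytes are already a JPEG) and /FlateDecode 
(zlib/deflate), and ignores filter arrays and any predictor in /DecodeParms.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class ImageStreamDecoder {

    /**
     * Hypothetical helper: decodes the raw bytes of an image stream
     * according to the name found in the stream's /Filter entry.
     */
    public static byte[] decode(String filterName, byte[] raw) {
        if ("/DCTDecode".equals(filterName)) {
            // DCT-encoded data is JPEG: the bytes can be written to a .jpg file as-is.
            return raw.clone();
        }
        if ("/FlateDecode".equals(filterName)) {
            // Deflate-compressed data: inflating yields the raw samples
            // in the image's colorspace (not a ready-made image file).
            return inflate(raw);
        }
        throw new IllegalArgumentException("Filter not handled in this sketch: " + filterName);
    }

    private static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the data is truncated or needs a preset dictionary.
                    throw new IllegalArgumentException("Truncated or unsupported deflate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Stream is not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

For a /DCTDecode stream the result can be saved directly as a JPEG file. For a 
/FlateDecode stream the result is only the pixel samples, so you still need the 
width, height, bits per component and colorspace from the image dictionary to 
turn them into a usable image.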

Paulo 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On 
> Behalf Of Petter Nyström
> Sent: Monday, March 06, 2006 3:22 PM
> To: [email protected]
> Subject: Re: [iText-questions] Can iText replace images etc.
> 
> Hello again list!
> 
> On Fri, 24 Feb 2006, Leonard Rosenthol wrote:
> 
> > At 09:39 AM 2/24/2006, Petter Nyström wrote:
> >
> >> One of the basic things I want to do is to pull images out of a PDF,
> >> let third-party software modify these images and then plug the modified
> >> images back into the PDF without changing the document layout. Can
> >> iText do this?
> >
> >        No.
> >
> >        In fact, I am not aware of ANY non-commercial library that will
> > provide that level of functionality (specifically the "putting back"
> > part - the extraction is easy).
> >
> >        You could use iText to extract the images, though you'd also need
> > a VERY detailed understanding of image handling and color management in
> > order to make sure that the extracted data was in the correct form.
> 
> On Sat, 25 Feb 2006, Leonard Rosenthol wrote:
> 
> >        Image data in PDF is either in JPEG/JFIF format (which can just
> > be written out to a file) - OR it is simply an array of "pixels" in the
> > specified colorspace.  So in the latter case (which is probably the more
> > common), you would need to transform the data into something usable in
> > JPEG, TIFF, etc.  This may include not only file format, but also
> > colorspace handling since PDF supports 11 colorspaces while JPEG (for
> > example) only does 2.
> 
> I have been trying to accomplish this - to extract an image from a test
> document by using iText. And to begin with I am assuming that the image is
> stored as a JPEG. (Is there, by the way, a way of reading out the storage
> format from the PDF document? I used pdfimages from the xpdf package to
> extract the images from a test document - and it wrote them as JPEG, but
> that may not be a guarantee for the images being stored as JPEG in the
> document, I suppose.)
> 
> The short version of my problem is that I do not see how I should be
> attacking even this simplified task. If there are any good examples doing
> something similar to this, a pointer would be great!
> 
> The longer version is that I am having difficulties understanding how
> iText works under the hood. It reads a PDF document, but how is the
> document data stored in the program? By studying a bit of the iText source
> code it seems as if several of the PDF objects, especially some PDF
> dictionaries, are read and stored as tailor-made Java data structures.
> While on the other hand, it looks as if the PDF document is stored in its
> entirety as a plain byte buffer. I am just not getting things straight
> here.  =)
> 
> Could someone give me a quick read-up on how iText does its stuff behind
> the scenes? I think that'd be crucial understanding for me if I am going
> to use iText to do my work.
> 
> Again, thanks a lot for all the help!
> 
> Regards,
> 
> Petter Nyström
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking 
> scripting language
> that extends applications into web and mobile media. Attend 
> the live webcast
> and join the prime developer group breaking into this new 
> coding territory!
> http://sel.as-us.falkag.net/sel?cmd=k&kid0944&bid$1720&dat1642
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> 


