Re: [iText-questions] Images extraction.

Mark Storer Tue, 25 Aug 2009 09:48:02 -0700

There are a number of potential issues with this technique.  And, to add to the 
confusion, sometimes it might work just fine.
 
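1) Image XObjects (/Subtype /Image) can use just about any filter (or, in some 
cases, a chain of filters) supported by PDF:
ASCIIHexDecode (generally combined with something else)
ASCII85Decode (ditto)
LZWDecode
FlateDecode (zlib/deflate, as used in zip)
RunLengthDecode
CCITTFaxDecode (fax machine)
JBIG2Decode
DCTDecode (jpeg)
JPXDecode (JPEG 2000)
Writing the raw stream bytes to a .jpg file only gives you a valid image when 
the stream's one and only filter happens to be DCTDecode.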
 
2) Pixel images can be described "in line" within a page's content stream 
(between the BI and EI operators).  Your algorithm, even if it adjusted for the 
various possible stream filters, would never find these images, because they 
are not separate stream objects in the xref table.
 
3) Some things look like images but are actually line art or even characters in 
some strange font, so there is no image object to extract.
 
4) Some images may be "striped": chopped up into smaller parts and drawn 
seamlessly next to each other for various reasons.  Even if you successfully 
extracted the pieces, you currently have no way of assembling them into the 
larger whole (save "manually").
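
To make point 1 concrete, here is a minimal sketch (not iText code) of the 
decision the original loop skips: whether the raw, still-encoded stream bytes 
can be written to disk as an image file at all.  The class and method names are 
made up for illustration, and the filter names are assumed to have already been 
read from the stream's /Filter entry as plain strings without the leading 
slash.  Even for DCTDecode, the resulting file may display oddly (CMYK data, 
/Decode arrays), so treat this as a first cut only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of iText.
class ImageFilterGuess {

    /**
     * Suggests a file extension for the RAW (still-encoded) stream bytes,
     * or returns null when the raw bytes are not a standalone image file
     * and must be decoded first.
     *
     * filterNames: the stream's /Filter entries, in order, without the slash.
     */
    static String extensionForRawBytes(List<String> filterNames) {
        if (filterNames.size() != 1) {
            // No filter at all (bare samples) or a filter chain
            // (e.g. ASCII85Decode followed by DCTDecode).
            return null;
        }
        switch (filterNames.get(0)) {
            case "DCTDecode":
                return "jpg"; // the stream data is a JPEG
            case "JPXDecode":
                return "jp2"; // the stream data is JPEG 2000
            default:
                return null;  // Flate, LZW, CCITT, JBIG2, ...: decode first
        }
    }

    public static void main(String[] args) {
        System.out.println(extensionForRawBytes(Arrays.asList("DCTDecode")));
        System.out.println(extensionForRawBytes(Arrays.asList("FlateDecode")));
        System.out.println(extensionForRawBytes(
                Arrays.asList("ASCII85Decode", "DCTDecode")));
        System.out.println(extensionForRawBytes(Collections.<String>emptyList()));
    }
}
```

This is why dumping the raw bytes into a .jpg "sometimes works": it only works 
for the streams where this method would return "jpg".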
 
 
Your best bet for all the "low hanging fruit" images (discrete XObject Images) 
is going to be:
1) Extract the decoded bytes using getStreamBytes (not the Raw version, which 
returns the bytes still encoded by the stream's filters)
2) Get the width, height, bits per component, and color space (which gives the 
number of color components) out of the XObject Image
3) Pass all that data to some image API and use it to write out the image in 
your chosen format.
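
As a rough sketch of step 3 only, using nothing but the JDK: the class and 
method names below are invented for illustration.  It assumes the samples have 
already been decoded (e.g. by PdfReader.getStreamBytes), that there are 8 bits 
per component, and that the color space is plain DeviceGray (1 component) or 
DeviceRGB (3 components).  Indexed, CMYK, ICC-based, 1-bit and masked images 
all need more work than this.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of iText.
class DecodedSamplesToImage {

    /**
     * Builds an RGB image from decoded sample bytes.
     * Assumes 8 bits per component, rows stored top to bottom with no
     * padding, and 1 (gray) or 3 (RGB) components per pixel.
     */
    static BufferedImage toImage(byte[] samples, int width, int height,
                                 int components) {
        if (components != 1 && components != 3) {
            throw new IllegalArgumentException(
                    "only 1 (gray) or 3 (RGB) components handled here");
        }
        if (samples.length < width * height * components) {
            throw new IllegalArgumentException("not enough sample data");
        }
        BufferedImage img =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = samples[i++] & 0xFF;
                // For gray data, reuse the single sample for all channels.
                int g = (components == 3) ? (samples[i++] & 0xFF) : r;
                int b = (components == 3) ? (samples[i++] & 0xFF) : r;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        // Two pixels: one red, one blue.
        byte[] samples = {(byte) 255, 0, 0, 0, 0, (byte) 255};
        BufferedImage img = toImage(samples, 2, 1, 3);
        File out = File.createTempFile("extracted", ".png");
        ImageIO.write(img, "png", out);
        System.out.println("Wrote " + out.getAbsolutePath());
    }
}
```

In the real thing, width, height and bits per component would come from the 
image dictionary (the /Width, /Height and /BitsPerComponent entries), and the 
component count from its /ColorSpace.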
 
For the rest?  That's a long row to hoe.
 
PS: You can compare PdfNames directly.  
 
pdfsubtype.equals(PdfName.IMAGE)
 
instead of: 
 
pdfsubtype.toString().equals(PdfName.IMAGE.toString())
 
 
 
--Mark Storer 
  Senior Software Engineer 
  Cardiff.com


#include <disclaimer> 
typedef std::Disclaimer<Cardiff> DisCard; 

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, August 25, 2009 2:00 AM
To: [email protected]
Subject: [iText-questions] Images extraction.



Hi Team,

I am trying to get the images from a PDF file. I am using the code below to do 
it, but I am getting 'Invalid Image'.

 

PdfReader chartReader = new PdfReader("C:\\Raj\\Test\\signtest.pdf");
for (int i = 0; i < chartReader.getXrefSize(); i++) {
  PdfObject pdfobj = chartReader.getPdfObject(i);
  if (pdfobj != null && pdfobj.isStream()) {
    PdfStream stream = (PdfStream) pdfobj;
    PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
    //System.out.println("Stream subType: " + pdfsubtype);
    if (pdfsubtype != null &&
        pdfsubtype.toString().equals(PdfName.IMAGE.toString())) {
      writeToFile(stream, i);
    }
  }
}

 

private static void writeToFile(PdfStream stream, int i) throws IOException {
  byte[] image = PdfReader.getStreamBytesRaw((PRStream) stream);
  System.out.println("Image bytes length:" + image.length);
  //byte[] image = PdfReader.getStreamBytes((PRStream) stream);
  FileOutputStream fw =
      new FileOutputStream("C:\\Raj\\Test\\beastpics\\" + i + ".jpg");
  fw.write(image);
  fw.flush();
  fw.close();
}

 

Can you please help me fix the problem? Please find attached the PDF file and 
the output image files.

 

Thanks for your support.

 

 

Rajasekhar
BestBuy Accenture IDC | Desk +91 80 418 63392 | Mobile +91 94482 26670 | AIM: 
paletiraja 

 



