There are a number of potential issues with this technique. And, to add to the
confusion, sometimes it might work just fine.
1) /Subtype /Image objects can use just about any filter (or, in some cases, a
chain of filters) supported by PDF:
    ASCIIHexDecode (generally combined with something else)
    ASCII85Decode (ditto)
    LZWDecode
    FlateDecode (zip)
    RunLengthDecode
    CCITTFaxDecode (fax machine)
    JBIG2Decode
    DCTDecode (jpeg)
    JPXDecode (jpeg 2000)
2) Pixel images can be described "in line" within a page's content stream.
Your algorithm, even if it adjusted for the various possible stream filters,
would never find these images.
3) Some things that look like images are actually line art or even characters
in some strange font, so there is no image object to extract.
4) Some images may be "striped": chopped up into smaller parts and drawn
seamlessly next to each other for various reasons. Even if you successfully
extracted the pieces, you currently have no way of assembling them into the
larger whole (save "manually").
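
To illustrate point 1, and why dumping the raw bytes "sometimes works": when an image's only filter is DCTDecode, the raw stream bytes are already JPEG data, so writing them to a .jpg file happens to produce a valid image. With most other filters it does not. Below is a minimal sketch in plain Java (no iText calls). The class FilterTriage, the Action enum, and the triage method are hypothetical names made up for this illustration, and the filter names are assumed to be passed in without the leading slash, in the order they appear in the stream's /Filter entry.

```java
import java.util.List;

public class FilterTriage {

    /** What the raw (still encoded) stream bytes are good for. */
    public enum Action {
        RAW_SAMPLES,             // no filter: bytes are bare pixel samples
        RAW_IS_JPEG,             // exactly DCTDecode: bytes are JPEG data
        RAW_IS_JPEG2000,         // exactly JPXDecode: bytes are JPEG 2000 data
        NEEDS_DEDICATED_DECODER, // CCITT fax or JBIG2 somewhere in the chain
        DECODE_FIRST             // anything else: undo the filters first
    }

    /** Classifies a filter chain; names come without the leading slash. */
    public static Action triage(List<String> filters) {
        if (filters.isEmpty()) {
            return Action.RAW_SAMPLES;
        }
        if (filters.size() == 1) {
            String only = filters.get(0);
            if (only.equals("DCTDecode")) {
                return Action.RAW_IS_JPEG;
            }
            if (only.equals("JPXDecode")) {
                return Action.RAW_IS_JPEG2000;
            }
        }
        for (String f : filters) {
            if (f.equals("CCITTFaxDecode") || f.equals("JBIG2Decode")) {
                return Action.NEEDS_DEDICATED_DECODER;
            }
        }
        // Covers Flate, LZW, RunLength, the ASCII filters, and any chain
        // such as ASCII85Decode followed by DCTDecode.
        return Action.DECODE_FIRST;
    }
}
```

Only the RAW_IS_JPEG case matches what the code in the question does. In every other case the ".jpg" file will contain something an image viewer cannot read.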
Your best bet for all the "low hanging fruit" images (discrete XObject Images)
is going to be:
1) Extract the decoded bytes using PdfReader.getStreamBytes (not
getStreamBytesRaw, which returns the bytes still encoded)
2) Get the width/height/bits-per-component/number-of-colors (color space) out
of the XObject Image
3) Pass all that data to some image API and use it to write out the image in
your chosen format.
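
Step 3 can be sketched with the JDK's own image API. This is only an illustration under narrow assumptions: the bytes are already decoded samples (the result of step 1), there are 8 bits per component, the image is either 1-component gray or 3-component RGB, and there is no /Decode array, mask, or indexed palette. The class SamplesToImage and its methods are hypothetical names, not part of iText.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class SamplesToImage {

    /**
     * Builds an image from decoded samples, rows stored top to bottom.
     * Assumes 8 bits per component; components must be 1 (gray) or 3 (RGB).
     */
    public static BufferedImage fromSamples(byte[] samples, int width,
                                            int height, int components) {
        if (components != 1 && components != 3) {
            throw new IllegalArgumentException("only 1 or 3 components handled");
        }
        if (samples.length < width * height * components) {
            throw new IllegalArgumentException("not enough sample data");
        }
        BufferedImage img =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int p = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = samples[p++] & 0xFF;
                // For gray images, reuse the single sample for all channels.
                int g = components == 3 ? samples[p++] & 0xFF : r;
                int b = components == 3 ? samples[p++] & 0xFF : r;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    /** Writes the image as PNG; fails loudly if no PNG writer is found. */
    public static void writePng(BufferedImage img, File out) throws IOException {
        if (!ImageIO.write(img, "png", out)) {
            throw new IOException("no PNG writer available");
        }
    }
}
```

Other bit depths and color spaces (1-bit, CMYK, Indexed, and so on) need their own unpacking before the pixels can be handed to the image API.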
For the rest? That's a long row to hoe.
PS: You can compare PdfNames directly:
    pdfsubtype.equals(PdfName.IMAGE)
instead of:
    pdfsubtype.toString().equals(PdfName.IMAGE.toString())
--Mark Storer
Senior Software Engineer
Cardiff.com
#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;
-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, August 25, 2009 2:00 AM
To: [email protected]
Subject: [iText-questions] Images extraction.
Hi Team,
I am trying to get the images from a PDF file. I am using the API below to do
it, but I am getting 'Invalid Image':
    PdfReader chartReader = new PdfReader("C:\\Raj\\Test\\signtest.pdf");
    for (int i = 0; i < chartReader.getXrefSize(); i++) {
        PdfObject pdfobj = chartReader.getPdfObject(i);
        if (pdfobj != null && pdfobj.isStream()) {
            PdfStream stream = (PdfStream) pdfobj;
            PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
            //System.out.println("Stream subType: " + pdfsubtype);
            if (pdfsubtype != null &&
                    pdfsubtype.toString().equals(PdfName.IMAGE.toString())) {
                writeToFile(stream, i);
            }
        }
    }

    private static void writeToFile(PdfStream stream, int i) throws IOException {
        byte[] image = PdfReader.getStreamBytesRaw((PRStream) stream);
        System.out.println("Image bytes length:" + image.length);
        //byte[] image = PdfReader.getStreamBytes((PRStream) stream);
        FileOutputStream fw = new FileOutputStream("C:\\Raj\\Test\\beastpics\\" + i + ".jpg");
        fw.write(image);
        fw.flush();
        fw.close();
    }
Can you please help me fix the problem? Please find attached the PDF file and
the output image files.
Thanks for your support.
Rajasekhar