> I am currently using iText to create and copy
> PDFs. I have some pdfs that have blank pages
> (no text, no image), and I would like to detect
> and remove those pages when I come across them.
> If I can detect them, then I can remove them,
> however I've had trouble detecting them. Any
> suggestions?
PdfImportedPage is a subclass of PdfContentByte. As such, you can get access
to its internal ByteBuffer through getInternalBuffer().
If your pages are REALLY empty, then this will return an empty buffer. I don't
think the buffer will ever be null, but it never hurts to check. The bad news
is that some applications will draw a white rectangle over the empty page...
leading to a /visibly/ empty page where getInternalBuffer().size() > 0. At
this point you have a couple options:
1) If all your PDFs are coming from the same source, they'll probably all have
the same 'empty' format, which you can examine through the ByteBuffer.
2) If all you're worried about are text and images (in other words, there's
no line art), you can examine the page's resource dictionary... though you've
got to go through a couple of extra steps to get there:
---
boolean noFontsOrImages = true;
try {
    PdfDictionary pageDict = reader.getPageN(myPageNum);
    // We need to examine the resource dictionary for /Font or
    // /XObject keys. If either is present, it's almost
    // certainly actually used on the page -> not blank.
    // getPdfObject() resolves an indirect reference, if there is one.
    PdfDictionary resDict = (PdfDictionary)
        PdfReader.getPdfObject( pageDict.get( PdfName.RESOURCES ) );
    if (resDict != null) {
        noFontsOrImages = resDict.get( PdfName.FONT ) == null &&
                          resDict.get( PdfName.XOBJECT ) == null;
    }
} catch (IOException ioe) { //getPFX() can throw an ioe
    // cry to momma.
}
---
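For option 1, here is a rough sketch of what "examining the bytes" might look like. It is plain Java and not part of iText: the class name BlankContentCheck, the method looksBlank() and the list of "harmless" operators are all my own assumptions, and it expects the decoded content stream of a single page as a byte array (for instance from PdfReader.getPageContent(), if your iText version has it). It treats a page as blank only when the stream is empty or does nothing but paint white-filled rectangles, and it gives up (returns false) on anything it doesn't recognize.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText.
final class BlankContentCheck {
    // Operators assumed harmless on a blank page: save/restore state, set
    // transform, line width or colour, build a rectangle, fill, clip, end path.
    private static final Set<String> ALLOWED_OPS = new HashSet<>(Arrays.asList(
            "q", "Q", "cm", "w", "g", "G", "rg", "RG", "re", "f", "f*", "n", "W", "W*"));
    private static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");

    // Returns true if the content stream is empty or only fills rectangles in white.
    static boolean looksBlank(byte[] content) {
        String text = new String(content, StandardCharsets.ISO_8859_1).trim();
        if (text.isEmpty()) {
            return true; // REALLY empty page
        }
        List<Double> operands = new ArrayList<>();
        boolean fillIsWhite = false; // the default fill colour is black
        for (String token : text.split("\\s+")) {
            if (NUMBER.matcher(token).matches()) {
                operands.add(Double.parseDouble(token));
                continue;
            }
            if (!ALLOWED_OPS.contains(token)) {
                return false; // text, images, strings, names, anything unknown
            }
            switch (token) {
                case "g":  fillIsWhite = allOnes(operands, 1); break; // gray fill
                case "rg": fillIsWhite = allOnes(operands, 3); break; // RGB fill
                case "Q":  fillIsWhite = false; break; // colour may revert: be conservative
                case "f":
                case "f*":
                    if (!fillIsWhite) {
                        return false; // something non-white gets painted
                    }
                    break;
                default:
                    break;
            }
            operands.clear();
        }
        return true;
    }

    private static boolean allOnes(List<Double> operands, int expected) {
        if (operands.size() != expected) {
            return false;
        }
        for (double value : operands) {
            if (value != 1.0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] page = "q 1 1 1 rg 0 0 612 792 re f Q".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("looks blank: " + looksBlank(page));
    }
}
```

The check errs on the side of "not blank", so it will miss some visibly empty pages (white set through CMYK, say), but it shouldn't throw away a page that has real content on it. Adjust the operator list to whatever your PDF producer actually emits.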
If all this is still giving you incorrect results, you might have to go with
something like Ghostscript (a PDF renderer) and examine its output... which is
expensive both in CPU cycles and memory.
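If you do end up rendering, "examining the output" could be as simple as checking that every pixel is (nearly) white. A minimal sketch, assuming you've already rendered the page to a raster image and loaded it into a BufferedImage (ImageIO.read() will do that for a PNG); the class BlankRasterCheck, the method isAllWhite() and the tolerance parameter are made up for illustration:

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: decides whether a rendered page image is all white.
final class BlankRasterCheck {
    // tolerance is how far below 255 each colour channel may fall and still
    // count as white (0 means pure white only).
    static boolean isAllWhite(BufferedImage image, int tolerance) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int red = (rgb >> 16) & 0xFF;
                int green = (rgb >> 8) & 0xFF;
                int blue = rgb & 0xFF;
                if (255 - red > tolerance || 255 - green > tolerance
                        || 255 - blue > tolerance) {
                    return false; // found a visibly non-white pixel
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A freshly created TYPE_INT_RGB image is all black, so paint it white first.
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                image.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println("all white: " + isAllWhite(image, 0));
    }
}
```

A low render resolution is probably enough for this and keeps the CPU and memory cost down, though very fine marks could get lost that way.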
--Mark Storer
Senior Software Engineer
Cardiff Software
#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;