> I am currently using iText to create and copy 
> PDFs. I have some pdfs that have blank pages 
> (no text, no image), and I would like to detect 
> and remove those pages when I come across them.  
> If I can detect them, then I can remove them, 
> however I've had trouble detecting them. Any 
> suggestions?

PdfImportedPage is a subclass of PdfContentByte (by way of PdfTemplate).  As 
such, you can get access to its internal ByteBuffer through getInternalBuffer().

If your pages are REALLY empty, then this will return an empty buffer.  I don't 
think the buffer will ever be null, but it never hurts to check.  The bad news 
is that some applications will draw a white rectangle over the empty page, 
leading to a /visibly/ empty page where getInternalBuffer().size() > 0.  At 
this point you have a couple of options:

1) If all your PDFs are coming from the same source, they'll probably all have 
the same 'empty' format, which you can examine through the ByteBuffer.

2) If all you're worried about are text and images (in other words, there's 
no line art), you can examine the page's resource dictionary... though you've 
got to go through a couple of extra steps to get there:

---

boolean noFontsOrImages = true;
try {
  PdfDictionary pageDict = reader.getPageN(myPageNum);
  // We need to examine the resource dictionary for /Font or 
  // /XObject keys.  If either is present, it's almost
  // certainly actually used on the page -> not blank.
  // /Resources may be an indirect reference, so resolve it with
  // PdfReader.getPdfObject() before casting.
  PdfDictionary resDict = (PdfDictionary)
      PdfReader.getPdfObject( pageDict.get( PdfName.RESOURCES ) );
  if (resDict != null) {
    noFontsOrImages = resDict.get( PdfName.FONT ) == null &&
                      resDict.get( PdfName.XOBJECT ) == null;
  } 
} catch (IOException ioe) { //getPFX() can throw an ioe
  // cry to moma.
}

---
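
For option 1, here's a rough sketch of how you might compare a page's content 
stream against a known 'empty' pattern.  It's plain Java and only looks at raw 
bytes; I'm assuming you fetch those yourself (for example with something like 
reader.getPageContent(pageNum)).  The class name, the method names, and the 
white-rectangle pattern in the usage note are all made up for illustration, so 
substitute whatever your producer actually writes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of iText): decides whether a page's
// content stream is empty or matches a known "blank page" pattern.
final class BlankContentCheck {

    // Collapse runs of whitespace so that differences in line breaks
    // and spacing do not affect the comparison.
    static String normalize(byte[] content) {
        String text = new String(content, StandardCharsets.ISO_8859_1);
        return text.trim().replaceAll("\\s+", " ");
    }

    // True if the stream contains only whitespace, or if it is identical
    // (ignoring whitespace differences) to the supplied blank pattern.
    static boolean looksBlank(byte[] content, byte[] knownBlank) {
        String normalized = normalize(content);
        return normalized.isEmpty() || normalized.equals(normalize(knownBlank));
    }
}
```

You would call it along the lines of 
BlankContentCheck.looksBlank(pageBytes, "1 1 1 rg 0 0 612 792 re f".getBytes()), 
where the second argument is whatever your source application emits for an 
empty page.  An exact match is deliberately strict: it will miss blank pages 
with a different page size or operator order, but it won't throw away a page 
that has real content.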

If all this is still giving you incorrect results, you might have to go with 
something like Ghostscript (a PDF renderer) and examine its output, which is 
expensive in both CPU cycles and memory.
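
If you do end up rendering, the check on the output can be fairly simple.  
Here's a minimal sketch that assumes you've already rasterized the page with 
an external renderer and loaded the result into a BufferedImage (for instance 
via ImageIO).  The class name, method name, and tolerance parameter are my own 
invention, not anything the renderer or iText provides.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: inspects an already-rendered page image.
final class RenderedPageCheck {

    // True if every pixel is white, or within `tolerance` of white on
    // each of the red, green, and blue channels (0 means exact white).
    static boolean isAllWhite(BufferedImage page, int tolerance) {
        for (int y = 0; y < page.getHeight(); y++) {
            for (int x = 0; x < page.getWidth(); x++) {
                int rgb = page.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (255 - r > tolerance || 255 - g > tolerance || 255 - b > tolerance) {
                    return false; // found a visibly non-white pixel
                }
            }
        }
        return true;
    }
}
```

A small tolerance helps if the renderer antialiases or if the "white" 
rectangle is really an off-white.  Rendering at a low resolution keeps the 
cost down, at the risk of missing very fine marks.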

--Mark Storer 
  Senior Software Engineer 
  Cardiff Software 
#include <disclaimer> 
typedef std::Disclaimer<Cardiff> DisCard; 

