Re: [iText-questions] Modify Content

Mark Storer Fri, 04 Aug 2006 11:36:37 -0700

> because
> the image was added into a PDF file by myself, I know
> exactly where I put it and all the other information
> about the image, so is that enough to enable me to
> delete the image later with iText?


If you're feeling a bit brave, yes... or if your PDF-Fu is Mighty.  But if that 
were the case, you wouldn't need to ask.  Read on, Brave One.

Content added by your iText app (so long as no other program has made changes) 
will be the first and/or last content streams in a given page's /Contents array.

There's an example called "AddWatermarkPageNumbers" on the "iText by Example" 
page: http://itextdocs.lowagie.com/tutorial/.  If you'll have a look at the 
output PDF in a text editor (Or a tool like windjack.com's PDF Can Opener in 
Acrobat) you'll see that there are at least three content streams in each page's
/Contents array.  The first is the 'under' content, the second is the original 
content stream (and there can be more than one stream from the original), and 
the last is the 'over' content.

If all you've added was that image, removing it is a 'simple' matter of 
deleting the appropriate content stream from the /Contents array.  You'll have 
to dig around in the PDF object model... using dictionaries, arrays and streams 
in this case:

----

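// this is all off the cuff, so might need some tweaking.
// iText 1.x API; killOverContent / killUnderContent are your own flags.
import com.lowagie.text.pdf.*;
import java.util.ArrayList;

PdfDictionary pageDict = myReader.getPageN( pageNum ); // 1-based page number
// /Contents is usually an indirect reference, so resolve it first.
PdfObject pageCont = PdfReader.getPdfObject( pageDict.get( PdfName.CONTENTS ) );
if (pageCont != null && pageCont.isArray()) {
  PdfArray contArray = (PdfArray)pageCont;
  // how strange.  PdfArrays don't have any 'remove' methods,
  // but they do expose their internal ArrayList. ;)
  ArrayList contGuts = contArray.getArrayList();
  if (killOverContent && contGuts.size() > 0) {
    contGuts.remove( contGuts.size() - 1 ); // last stream = 'over' content
  }
  if (killUnderContent && contGuts.size() > 0) {
    contGuts.remove( 0 );                   // first stream = 'under' content
  }
}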

----
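The index juggling in the snippet above can be tried without iText at all.  A 
minimal plain-Java sketch, assuming the /Contents array is modeled as a 
java.util.List (ContentsTrim and trim() are hypothetical names, not iText API). 
The point it shows: re-check the size before each removal, so a one-element 
array is never emptied twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContentsTrim {

    // Hypothetical helper: drops the 'over' (last) and/or 'under' (first)
    // entry of a list standing in for a page's /Contents array.
    // The size is re-checked before each removal.
    static <T> void trim(List<T> contents, boolean killOver, boolean killUnder) {
        if (killOver && !contents.isEmpty()) {
            contents.remove(contents.size() - 1);
        }
        if (killUnder && !contents.isEmpty()) {
            contents.remove(0);
        }
    }

    public static void main(String[] args) {
        List<String> contents =
                new ArrayList<>(Arrays.asList("under", "original", "over"));
        trim(contents, true, false);
        System.out.println(contents); // prints [under, original]
    }
}
```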

On the other hand, if you've added a number of different things other than the 
image at the same time, you have a bit more work ahead of you.

Option 1: Dig up the image you added and make it invisible.
1a: By setting its width & height to zero.
1b: By removing all its image data (which involves altering an existing content 
stream)

Option 2: Modify the way you add the image to make it easier to dig out of the 
page's content, then parse and alter the appropriate content stream.

To make it easier to find, you can use the Marked Content system:

---

myContentByte.beginMarkedContentSequence( new PdfName( "IndianGiver" ) );
...  // add the image here
myContentByte.endMarkedContentSequence();

---

To actually get the content stream's bytes... that's tough.  Let's see here...  
Ah!  Not so bad after all.  All the elements of a page's /Contents array are 
(indirect references to) PdfStream objects.  If they're from a PdfReader 
(rather than created by a PdfWriter) they're instances of PRStream (a 
descendant of PdfStream).  You can call PdfReader's static 
getStreamBytes( PRStream yourContents ) and get back a byte[].

You'll need to search that byte array for "/IndianGiver BMC" (whatever tag you 
passed to beginMarkedContentSequence(), followed by the Begin Marked Content 
operator; in a content stream, operands come before their operator), and yank 
out everything up to and including the matching "EMC".

(AND, you'll want to save the name operand that precedes the "Do" operator 
inside that sequence, because that's the name of the image resource you'll 
want to yank later.  It isn't necessarily the last token before "EMC".)

You'll also need to create a new PdfStream with your altered version of that 
byte array.  You cannot modify it in place:

---

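// contArray: the page's /Contents array from earlier
// streamSourceIndex: the position of the stream you edited in that array
PRStream modifiedContent = new PRStream( myReader, modifiedBytes );
contArray.getArrayList().set( streamSourceIndex, modifiedContent );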

---

You have to make these changes before connecting the PdfReader to a PdfWriter 
or the changes may not 'take'.
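The byte surgery itself can be sketched in plain Java.  This is a minimal 
sketch, assuming the tag appears exactly as "/IndianGiver BMC" with 
white-space around it, that the sequence is not nested, and that "EMC" never 
shows up inside a string or name within it.  MarkedContentYank, Cut and yank() 
are hypothetical names; the bytes would come from getStreamBytes() and the 
result would feed the PRStream constructor above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class MarkedContentYank {

    /** Result of cutting one marked-content sequence out of a content stream. */
    public static final class Cut {
        public final byte[] bytes;       // content with the sequence removed
        public final String xobjectName; // operand of the last Do inside it, or null

        Cut(byte[] bytes, String xobjectName) {
            this.bytes = bytes;
            this.xobjectName = xobjectName;
        }
    }

    // PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
    private static boolean isWhite(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // Finds 'needle' at or after 'from', accepting only matches bounded by
    // white space (or the ends of the array) on both sides.
    private static int indexOfToken(byte[] hay, byte[] needle, int from) {
        outer:
        for (int i = from; i <= hay.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (hay[i + j] != needle[j]) continue outer;
            }
            int after = i + needle.length;
            boolean okBefore = i == 0 || isWhite(hay[i - 1]);
            boolean okAfter = after == hay.length || isWhite(hay[after]);
            if (okBefore && okAfter) return i;
        }
        return -1;
    }

    /** Removes the first "/tag BMC ... EMC" sequence; no nesting support. */
    public static Cut yank(byte[] content, String tag) {
        byte[] begin = ("/" + tag + " BMC").getBytes(StandardCharsets.ISO_8859_1);
        byte[] end = "EMC".getBytes(StandardCharsets.ISO_8859_1);
        int start = indexOfToken(content, begin, 0);
        if (start < 0) return new Cut(content, null);
        int stop = indexOfToken(content, end, start + begin.length);
        if (stop < 0) return new Cut(content, null);
        stop += end.length;

        // Remember the name operand of the last "Do" so the XObject can be
        // dropped from the page resources afterwards.
        String inner = new String(content, start, stop - start, StandardCharsets.ISO_8859_1);
        String[] tokens = inner.trim().split("\\s+");
        String name = null;
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].equals("Do") && tokens[i - 1].startsWith("/")) {
                name = tokens[i - 1].substring(1);
            }
        }

        // Copy everything before and after the sequence.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(content, 0, start);
        out.write(content, stop, content.length - stop);
        return new Cut(out.toByteArray(), name);
    }

    public static void main(String[] args) {
        String page = "q BT /F1 12 Tf (Hello) Tj ET Q\n"
                + "/IndianGiver BMC\nq 100 0 0 50 10 20 cm /img0 Do Q\nEMC\n";
        Cut cut = yank(page.getBytes(StandardCharsets.ISO_8859_1), "IndianGiver");
        System.out.print(new String(cut.bytes, StandardCharsets.ISO_8859_1));
        System.out.println("image resource: " + cut.xobjectName); // image resource: img0
    }
}
```

A plain byte scan was chosen over a real content-stream parser to keep the 
sketch short; anything fancier than iText's own output (strings containing 
"EMC", nested marked content, inline images) would need a proper tokenizer.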

For space reasons, you'll also want to hunt down the image and drop it from the 
page's resource dictionary, then call myReader.removeUnusedObjects().  Just 
removing the reference to the image in the page resources and contents isn't 
enough to remove it from the file.

---

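PdfDictionary pageDict = myReader.getPageN( pageNum ); // 1-based page number, not a 0-based index
// Either dictionary may be an indirect reference, so resolve it first.
PdfDictionary resDict = (PdfDictionary) PdfReader.getPdfObject( pageDict.get( PdfName.RESOURCES ) );
if (resDict != null) { // it should always be a dictionary
  PdfDictionary xobjDict = (PdfDictionary) PdfReader.getPdfObject( resDict.get( PdfName.XOBJECT ) ); // again, always a dict
  if (xobjDict != null) {
    // the name you saved earlier, without its leading '/'
    xobjDict.remove( new PdfName( nameOfImageYouSavedEarlier ) );
  }
}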
---
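One precaution worth considering before the remove() call: make sure nothing 
left on the page still paints that image.  A minimal plain-Java sketch, 
assuming the /XObject dictionary is modeled as a Map keyed by name (without 
the '/') and the page's remaining content streams as byte arrays; DropXObject, 
stillUsed() and dropIfUnused() are hypothetical names, and the check only looks 
for the literal "/name Do" token pair.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DropXObject {

    // True if some stream still contains the token pair "/name Do".
    static boolean stillUsed(List<byte[]> streams, String name) {
        String operand = "/" + name;
        for (byte[] s : streams) {
            String[] tokens = new String(s, StandardCharsets.ISO_8859_1).trim().split("\\s+");
            for (int i = 1; i < tokens.length; i++) {
                if (tokens[i].equals("Do") && tokens[i - 1].equals(operand)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Removes 'name' from the map standing in for the page's /XObject
    // dictionary, but only when no remaining stream paints it.
    static boolean dropIfUnused(Map<String, Object> xobjects, List<byte[]> streams, String name) {
        if (stillUsed(streams, name)) {
            return false;
        }
        return xobjects.remove(name) != null;
    }

    public static void main(String[] args) {
        Map<String, Object> xobjects = new HashMap<>();
        xobjects.put("img0", "image stream");
        xobjects.put("img1", "image stream");
        List<byte[]> remaining = Arrays.asList(
                "q 10 0 0 10 0 0 cm /img1 Do Q".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(dropIfUnused(xobjects, remaining, "img0")); // true
        System.out.println(dropIfUnused(xobjects, remaining, "img1")); // false
    }
}
```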

That's a lot of black-belt Pdf Fu going on there.  It becomes much easier to 
understand what's going on when you know what's already there, and what you 
want it to look like when you're done.  That's where opening PDFs in a text 
editor (or something like PDF Can Opener, no affiliation) is INVALUABLE.

When you can snatch the content stream from my hand, young grasshopper...

--Mark Storer
  Senior Software Engineer
  Cardiff Software

#include <disclaimer>
typedef std::Disclaimer<Cardiff> DisCard;
