If you can get the creator of the PDF to do so, the easiest way I found was
to have them create a bookmark for each document and then split on the
bookmarks. I have set it up with our customers so that each bookmark contains
metadata that we use for indexing purposes, and when I do the split I use
the bookmark as the PDF name. That makes the files very easy to work with
later on in the process, and a bookmark seemed to be the most natural way to
mark the start of each document. The last page of each PDF is then the page
of the next bookmark minus one (for the final document, the last page of the
container).
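
As a rough sketch of that page arithmetic: the code below is plain Java with
no iText calls, and the class and method names are made up for illustration.
It assumes the bookmark titles and their 1-based start pages have already
been read from the container (for example with iText's SimpleBookmark) and
are in ascending page order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: turns bookmark start pages into page ranges.
final class BookmarkRanges {

    /** One output document: a file name derived from the bookmark title and an inclusive page range. */
    static final class Range {
        final String name;
        final int firstPage;
        final int lastPage;

        Range(String name, int firstPage, int lastPage) {
            this.name = name;
            this.firstPage = firstPage;
            this.lastPage = lastPage;
        }
    }

    /**
     * titles and startPages are parallel lists, one entry per bookmark.
     * Each document ends on the page before the next bookmark; the final
     * document ends on the last page of the container.
     */
    static List<Range> compute(List<String> titles, List<Integer> startPages, int totalPages) {
        if (titles.size() != startPages.size()) {
            throw new IllegalArgumentException("titles and startPages must have the same length");
        }
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < startPages.size(); i++) {
            int first = startPages.get(i);
            int last = (i + 1 < startPages.size()) ? startPages.get(i + 1) - 1 : totalPages;
            if (first < 1 || last < first || last > totalPages) {
                throw new IllegalArgumentException("bookmark " + i + " gives an invalid page range");
            }
            ranges.add(new Range(sanitize(titles.get(i)) + ".pdf", first, last));
        }
        return ranges;
    }

    /** Replaces runs of characters that are awkward in file names with a single underscore. */
    static String sanitize(String title) {
        return title.trim().replaceAll("[^A-Za-z0-9._-]+", "_");
    }
}
```

Each resulting range can then be handed to whatever does the actual page
copying, with the sanitized bookmark title as the output file name.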

 

I did this 6-7 years ago using iText 2.1.7 and the first book, so there may
be better or easier ways to do this in the newer versions. If the container
documents are very large, you will run into performance issues. I found that
making two passes, the first to break the original into more manageable
chunks of 10,000 documents each and the second to break those chunks into
the individual PDFs using multiple threads, reduced the processing time from
22+ hours to about 4-5 hours. But these are 1.5 GB files containing ~500,000
four- to six-page documents.
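
The shape of that two-pass approach, as a sketch only: plain Java, with the
actual PDF work left to a caller-supplied worker. The names TwoPassSplit,
chunk and processInParallel are hypothetical, and the chunk size and thread
count are parameters you would tune for your own files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch of the two-pass idea; it does not touch any PDF itself.
final class TwoPassSplit {

    /** Pass 1: group the documents into consecutive chunks of at most chunkSize items. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(i, end)));
        }
        return chunks;
    }

    /** Pass 2: hand each chunk to the worker on a fixed-size thread pool and wait for all of them. */
    static <T> void processInParallel(List<List<T>> chunks, int threads, Consumer<List<T>> worker) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> c : chunks) {
                pending.add(pool.submit(() -> worker.accept(c)));
            }
            for (Future<?> f : pending) {
                f.get(); // blocks until the chunk is done and surfaces any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real run the worker would open one intermediate chunk file and write
out its individual PDFs, so each thread works on its own, much smaller file
rather than on the 1.5 GB original.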

 

And this will only work if the original PDF was created with each document
as a series of consecutive pages.

 

From: Scott Harris [mailto:sharris...@comcast.net] 
Sent: Tuesday, November 22, 2011 12:36 PM
To: itext-questions@lists.sourceforge.net
Subject: [iText-questions] - Extracting multiple PDF docs from a
consolidated PDF "container" doc...

 

iText Community,

 

I'm new to this list so I hope my first outreach adheres to all standards &
protocol for participation...

 

[OBJECTIVE]: I have PDF files already concatenated (by a mechanism other
than iText) into a single PDF. I need to burst - or "de-stack" those files
back into the individual constituent documents contained in the concatenated
"parent-container" PDF. Chapter 6 of 'iText In Action' shows us how to burst
documents at the page level. I need to burst documents not by page, but at
arbitrary points in the content marked by particular delimiters denoting the
start and end of each original PDF document.

 

I'm not looking to be handed the answer, just a pointer on where best to
start in learning how to leverage the API for basic functions like this that
might not be explicitly captured in the examples.

 

<+> I have 'iText in Action - 2nd Edition' as an eBook, and all examples
fully loaded and executing in an IDE project.

 

Thanks very much in advance for any initial direction.

 

-smh

 

 
