Hi Jason,
 
I've had some experience in this regard and with large files in general, and
I agree with Paulo/Leonard that this can be a tricky process.
 
My experience is that the file difference is not that great, but it can add
up, especially if you have a lot of fonts. Font and FontDescriptor objects
are replicated on a per-page basis, so if you have more than a handful per
page, this starts to add up.
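To put rough numbers on "it adds up", here is a back-of-envelope estimator in plain Java (no iText involved). The class name and every input are hypothetical. `copies` stands for imported documents or pages, whichever level the font objects are actually duplicated at in your files. `bytesPerFont` is something you would measure from one of your own statements.

```java
// Back-of-envelope estimate of bytes spent on duplicated font objects
// when each copy (imported document or page) carries its own set.
// All inputs are hypothetical; measure them from your own files.
final class FontOverheadEstimate {

    /**
     * Extra bytes compared with sharing a single copy of each font.
     *
     * @param copies       how many times the font set is duplicated
     * @param fontsPerCopy distinct fonts in each copy
     * @param bytesPerFont average bytes per Font + FontDescriptor
     *                     (plus the embedded font program, if any)
     */
    static long duplicatedBytes(long copies, long fontsPerCopy, long bytesPerFont) {
        if (copies <= 1) {
            return 0; // one copy is needed anyway; nothing is duplicated
        }
        return (copies - 1) * fontsPerCopy * bytesPerFont;
    }

    public static void main(String[] args) {
        // Hypothetical: 30,000 statements, 4 fonts each, 1,000 bytes per font.
        long extra = duplicatedBytes(30_000, 4, 1_000);
        System.out.printf("about %.1f MB of duplicated font data%n", extra / 1e6);
    }
}
```

The estimate is linear in the number of copies, so measuring one statement is enough to see whether the total lands in the KB, MB or GB range.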
 
But that's the least of your worries. A 100,000-page file can easily top 500
MB in size. Once you get up there, it can take in excess of 16 GB of RAM to
process in memory. That means running on a 32-bit box is out of the
question. Try doing this using disk-based access and you could be waiting
several days for the job to finish. iText does not support files > 2 GB, so
if you get into that territory you are essentially on your own. You can, of
course, add it yourself if you are handy and have a week or two to spare. And
don't forget that with the new licensing model you are required to make such
changes public unless you buy a license saying otherwise.
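Here is the arithmetic behind those figures as a small Java sketch. The 500 MB and 100,000-page inputs come from the paragraph above. Treating the 2 GB ceiling as a signed 32-bit offset limit (Integer.MAX_VALUE) is my assumption about where it comes from, not something confirmed here.

```java
// Arithmetic behind the size figures above. The 2 GB check assumes the
// limit comes from byte offsets stored in a signed 32-bit int.
final class SizeLimits {

    /** Average bytes per page for a file of the given size. */
    static long bytesPerPage(long fileBytes, long pages) {
        return fileBytes / pages;
    }

    /** True if every byte offset (0 .. fileBytes - 1) fits in a Java int. */
    static boolean fitsInIntOffsets(long fileBytes) {
        return fileBytes - 1 <= Integer.MAX_VALUE;
    }

    /** Bytes addressable by a 32-bit process: 2^32. */
    static long addressSpace32Bit() {
        return 1L << 32;
    }

    public static void main(String[] args) {
        long fileBytes = 500L * 1000 * 1000; // ~500 MB
        long pages = 100_000;
        System.out.println("bytes per page: " + bytesPerPage(fileBytes, pages)); // 5000
        System.out.println("offsets fit in an int: " + fitsInIntOffsets(fileBytes)); // true

        long workingSet = 16L * 1024 * 1024 * 1024; // 16 GB in memory
        System.out.println("beyond 32-bit address space: "
                + (workingSet > addressSpace32Bit())); // true
    }
}
```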
 
Then there is the question of what you are going to do with your print file.
Will it be converted to PostScript? If that is the case, the duplication of
Font and FontDescriptor objects may become a killer as you can end up with
multiply embedded fonts making the PostScript file monstrous in size if it
can be generated at all, and the printer may crap out trying to process it
or run at much less than rated speed. Again, if you are handy, you can
change the iText code to cache those objects, but unless you understand the
innards, it may not be your cup of tea. Some RIP engines don't like the
XObject Forms that imported pages are encapsulated into, and die while
attempting to cache each and every one of them, so you might want to consider
making sure your print vendor is happy with your concatenated file before
you invest too much time. ISO 16612-2 (PDF/VT) deals with this, but you will
need to make changes to the iText code to support it, and that will only work
if you are lucky enough to find a print vendor that supports it as well.
 
There are other nuances, such as containers starting to act funny when
object count gets into the stratosphere, stack space becomes an issue,
recursive code used for tree balancing starts to take forever, and on and
on. It should be fairly trivial for you to try this out using your own test
data, and I would highly recommend it, as your data is going to be the best
indicator of what will happen.
 
Best of luck,
 
Gylfi

  _____  

From: Jason Berk [mailto:jb...@purdueefcu.com] 
Sent: Monday, March 22, 2010 2:46 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] design pattern



Yeah, I’m crafting one now... but an answer to the question would tell me whether
my test “followed suit”. Memory isn’t an issue on my hardware... so I’d
sacrifice it for speed/resulting file size... and in the end, a repeatable and
well-understood process trumps everything.

 

Jason

 

  _____  

From: Paulo Soares [mailto:psoa...@glintt.com] 
Sent: Monday, March 22, 2010 1:55 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] design pattern

 

In PdfSmartCopy, if the images are the same, only one instance will be used,
but subset fonts won't be merged and you'll end up with an instance per file
inserted. It will also use a lot of memory for the generation. iText does a
lot of things, but it's not particularly efficient memory-wise and will always
lose to a custom app written in C. The best way to evaluate this is to
create test PDFs, do the process and see the result.
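Here is a toy model of why images merge but subset fonts don't, in plain Java. It assumes duplicates are detected by comparing the serialized bytes of each object through a content digest, which is roughly how I understand PdfSmartCopy to work. DedupWriter and its methods are made up for illustration and are not iText API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Toy model of content-based deduplication: each distinct byte sequence is
// written once, and later identical objects reuse its object number.
// Illustration only; this is not iText's implementation.
final class DedupWriter {
    private final Map<String, Integer> byDigest = new HashMap<>();
    private int nextObjectNumber = 1;
    private long bytesWritten = 0;

    /** Returns the object number used for this content. */
    int add(byte[] content) {
        String key = digest(content);
        Integer existing = byDigest.get(key);
        if (existing != null) {
            return existing; // identical bytes: share the earlier object
        }
        int number = nextObjectNumber++;
        byDigest.put(key, number);
        bytesWritten += content.length; // only new content costs space
        return number;
    }

    long bytesWritten() {
        return bytesWritten;
    }

    private static String digest(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }

    public static void main(String[] args) {
        DedupWriter w = new DedupWriter();
        byte[] logo = "same logo bytes".getBytes(StandardCharsets.US_ASCII);
        // Two statements embedding the identical logo share one object.
        int a = w.add(logo);
        int b = w.add(logo.clone());
        // Subsets of one font hold different glyphs, so their bytes differ.
        int f1 = w.add("font subset: A B C".getBytes(StandardCharsets.US_ASCII));
        int f2 = w.add("font subset: A B D".getBytes(StandardCharsets.US_ASCII));
        System.out.println("logo shared: " + (a == b));      // true
        System.out.println("subsets shared: " + (f1 == f2)); // false
    }
}
```

Each subset contains only the glyphs its own statement used, so two statements rarely produce byte-identical font data, and under this model nothing gets shared.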

 

Paulo

 


  _____  


From: Jason Berk [mailto:jb...@purdueefcu.com] 
Sent: Monday, March 22, 2010 5:30 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] design pattern

I understand... in the meantime, could you answer this question:

 

if I create one PDF and add 100,000 pages, what will its file size be
relative to creating 100,000 PDFs and using PdfSmartCopy to concat them
together?  If the difference is a few KB/MB I'll go that route.  If it's 10s
or 100s of MB, I'll explore other solutions.

I’m curious….

 

jason

 


  _____  


From: Leonard Rosenthol [mailto:lrose...@adobe.com] 
Sent: Sunday, March 21, 2010 5:15 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] design pattern

 

If you’re doing that type of volume – I would strongly recommend that you
invest in commercial-grade solutions for document merging, many of which
also support PDF optimization options.

 

iText is a great product, but there are some things it simply doesn’t do
well – large volume document assembly is one of them (see the archives of
the mailing list for discussions on this in the past).

 

Leonard

 

From: Jason Berk [mailto:jb...@purdueefcu.com] 
Sent: Sunday, March 21, 2010 5:06 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] design pattern

 

I think you misunderstood.  The images are TIF or PNG files.  The pages are
NOT images.

I need to create 60,000 PDFs and concatenate 30,000 of them together into a
single "print" file.

priority 1 is to avoid creating the pdf content for a given account twice.
priority 2 is to create the smallest possible print file (size wise)

if I create one PDF and add 100,000 pages, what will its file size be
relative to creating 100,000 PDFs and using PdfSmartCopy to concat them
together?  If the difference is a few KB/MB I'll go that route.  If it's 10s
or 100s of MB, I'll explore other solutions.

Thanks,

Jason


-----Original Message-----
From: Leonard Rosenthol [mailto:lrose...@adobe.com]
Sent: Sun 3/21/2010 1:05 PM
To: Post all your questions about iText here
Subject: Re: [iText-questions] design pattern

Why are the pages images and not real text and vector objects?  If you want
small files, DON'T use raster images!

From: Jason Berk [mailto:jb...@purdueefcu.com]
Sent: Sunday, March 21, 2010 12:47 PM
To: iText-questions@lists.sourceforge.net
Subject: [iText-questions] design pattern


hello all.

looking for advice...

my credit union has 60K members.  I need to produce a single PDF for each
member that uses low-res images (96 DPI).  This is known as the
"E-Statement".  For about half of the members, I also need to produce a
"print statement" version which uses 300 DPI images.  The print statement
needs to be just one single PDF (with thousands of pages).  I would like
this print file to be as small as possible.
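To show what the 300 DPI requirement costs, here is the pixel arithmetic for one image drawn at the same physical size at both resolutions. The 2 x 1 inch size is a hypothetical example. Uncompressed image data scales with pixel count. Compressed TIF/PNG data usually grows as well, but not in exact proportion.

```java
// Pixel counts for an image drawn at a fixed physical size on the page.
// The 2 x 1 inch size is a hypothetical example.
final class DpiMath {

    /** Pixels along one edge of the given length in inches. */
    static long pixels(double inches, int dpi) {
        return Math.round(inches * dpi);
    }

    /** Total pixels for a width x height area, both in inches. */
    static long pixelCount(double widthIn, double heightIn, int dpi) {
        return pixels(widthIn, dpi) * pixels(heightIn, dpi);
    }

    public static void main(String[] args) {
        long low = pixelCount(2, 1, 96);   // 192 x 96  = 18,432 pixels
        long high = pixelCount(2, 1, 300); // 600 x 300 = 180,000 pixels
        // (300 / 96)^2 = 9.765625, so roughly ten times the pixels.
        System.out.printf("ratio: %.2f%n", (double) high / low);
    }
}
```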

Is it possible to:

1) create a statement with a button as an image placeholder
2) save that statement as a standalone PDF after adding low-res images to the
buttons
3) concat that statement to an open PDF stream after adding high-res images
to the buttons (so the PDF-specific stuff isn't repeated with each
statement)?
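The idea in steps 1 to 3 (build the layout once, then bind different images for each output) can be sketched independently of iText. Statement, ImageSlot, bind() and the file names below are hypothetical. In a real job, the bound image list would feed whatever iText code fills the button placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of "lay out once, render twice": the statement layout is built a
// single time with named image slots, and each output variant binds those
// slots to its own image set. All names here are hypothetical.
final class TwoVariantPipeline {

    /** A placeholder in the layout, like the button in step 1. */
    static final class ImageSlot {
        final String name;

        ImageSlot(String name) {
            this.name = name;
        }
    }

    /** Layout produced once per account. */
    static final class Statement {
        final String account;
        final List<ImageSlot> slots = new ArrayList<>();

        Statement(String account) {
            this.account = account;
        }
    }

    /** Resolve every slot to an image path for one variant. */
    static List<String> bind(Statement s, Map<String, String> images) {
        List<String> bound = new ArrayList<>();
        for (ImageSlot slot : s.slots) {
            String path = images.get(slot.name);
            if (path == null) {
                throw new IllegalArgumentException("no image for slot " + slot.name);
            }
            bound.add(path);
        }
        return bound;
    }

    public static void main(String[] args) {
        Statement s = new Statement("12345"); // built once
        s.slots.add(new ImageSlot("logo"));
        s.slots.add(new ImageSlot("check-1"));

        Map<String, String> low = Map.of("logo", "logo-96.png", "check-1", "chk1-96.tif");
        Map<String, String> high = Map.of("logo", "logo-300.png", "check-1", "chk1-300.tif");

        // Same layout, two bindings: the e-statement and the print copy.
        System.out.println("e-statement: " + bind(s, low));
        System.out.println("print:       " + bind(s, high));
    }
}
```

In this sketch both variants come from the same Statement object, so the e-statement and the print copy can only differ in the images they bind.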

I've read this: http://1t3xt.info/examples/browse/?page=example&id=347 which shows
how to set button images, and I've read about using PdfSmartCopy to concat
PDFs, but from what I read, the PDF innards will still be repeated in my
resulting file, eating up file size (please correct me if this is not the
case).

I've also thought about creating two different processes... one to create the
e-versions and one to create the print file, but that seems inefficient, as I'd
be creating the (hopefully) same statement twice for basically half of my
accounts, and it could potentially result in the statement looking slightly
different between the e and print versions.

The end goal is for a member to get his print statement in the mail,
print the e-statement from online banking, and (basically) not be able to tell
which is which.

how would you all handle this with iText / Java?

(FYI: this is a server side java process...no web container)

Thanks for any opinions / help.

Jason
