You could do a Save as Optimized PDF in Acrobat to get the size back down.

There isn't anything in iText that makes it easy to do what you want, though it
would be possible (with a detailed understanding of PDF constructs).

Leonard

-----Original Message-----
From: Arthur Murray [mailto:[email protected]] 
Sent: Thursday, January 12, 2012 6:07 PM
To: [email protected]
Subject: [iText-questions] Copying OCR'd hidden text from one PDF to another 
while retaining original images?

Is there an example snippet that can help with this or a pointer on how to 
approach this?

I have a scanned book as a PDF, for example this Google one:
http://ia600307.us.archive.org/21/items/lightsandshadow00whipgoog/lightsandshadow00whipgoog.pdf

When I OCR this in Acrobat X the file size grows from 12 MB to 54 MB (the
images get bigger even though I use Searchable Image "Exact").
I'd like to open the original non-OCR'd PDF and copy the OCR'd hidden text from
the second, larger OCR'd PDF into it, hopefully retaining the smaller image
file size while gaining the ability to search and highlight the PDF.

Thanks.

------------------------------------------------------------------------------
RSA(R) Conference 2012
Mar 27 - Feb 2
Save $400 by Jan. 27
Register now!
http://p.sf.net/sfu/rsa-sfdev2dev2
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/ Please check the keywords list 
before you ask for examples: http://itextpdf.com/themes/keywords.php

