I haven't seen the actual PDF, but judging from that Differences array, the 
answer is NO: there is no (easy) way to extract the text. Glyph names such as 
g48 or g55 are not standard names that can be mapped back to characters. Your 
only option might be rasterization + OCR.
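
To make the problem concrete, here is a minimal sketch in plain Java (no iText
calls; the class and method names are made up for illustration) that expands
the Differences array quoted below into a code-to-glyph-name table. It assumes
the usual PDF rule that a number sets the starting character code and each
name that follows takes the next code. Every name it finds has the generic
form g<number>, which carries no information about the character it draws.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DifferencesSketch {

    /** Expands the body of a /Differences array into a code -> glyph name map. */
    static Map<Integer, String> parseDifferences(String body) {
        Map<Integer, String> map = new LinkedHashMap<>();
        int code = 0;
        // Put a space before each '/' so numbers and names split on whitespace.
        for (String token : body.replace("/", " /").trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.startsWith("/")) {
                // A name is assigned to the current code, then the code advances.
                map.put(code++, token.substring(1));
            } else {
                // A number resets the code for the names that follow it.
                code = Integer.parseInt(token);
            }
        }
        return map;
    }

    /** True for generic names such as g48 that do not identify a character. */
    static boolean isOpaqueName(String glyphName) {
        return glyphName.matches("g\\d+");
    }

    public static void main(String[] args) {
        // Array body copied from the original message.
        String body = "1/g48/g55/g54/g3/g44/g81/g70/g82/g80/g76/g74/g86/g17/g79/g29/g36"
                + "/g87/g72/g47/g53/g73/g85/g49/g18/g51/g68/g91/g89/g88/g39/g41/g16"
                + "/g56/g38/g75";
        Map<Integer, String> map = parseDifferences(body);
        long opaque = map.values().stream()
                .filter(DifferencesSketch::isOpaqueName)
                .count();
        System.out.println(map.size() + " codes remapped, "
                + opaque + " with opaque names");
    }
}
```

A text extractor sees only those codes and names, so unless the font also
carries a usable ToUnicode map (which the symptoms suggest it does not), it
has nothing to translate them with.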

-----Original Message-----
From: Pakhu [mailto:[email protected]] 
Sent: Wednesday, March 09, 2011 9:38 PM
To: [email protected]
Subject: [iText-questions] Unreadable Pdf with PdfTextExtractor

I have received a set of PDF files that cannot be parsed using iText's
PdfTextExtractor.

All extracted characters are meaningless. I attach a sample if you want to
verify it.

If I copy part of the file and paste it into a text editor, I get the same
meaningless result.

All fonts are TrueType embedded subsets.

The Differences array looks like this:

<</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[
1/g48/g55/g54/g3/g44/g81/g70/g82/g80/g76/g74/g86/g17/g79/g29/g36
/g87/g72/g47/g53/g73/g85/g49/g18/g51/g68/g91/g89/g88/g39/g41/g16
/g56/g38/g75]>>

Is there any way I could render this file? Is there any transformation of the
document that could help?

I'm interested in just the text, not in its formatting.


Thanks

--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Unreadable-Pdf-with-PdfTextExtractor-tp3345219p3345219.html
Sent from the iText - General mailing list archive at Nabble.com.

------------------------------------------------------------------------------
Colocation vs. Managed Hosting
A question and answer guide to determining the best fit
for your organization - today and in the future.
http://p.sf.net/sfu/internap-sfd2d
_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php
