Spaces between words ignored in scanned pdf files
-------------------------------------------------
Key: PDFBOX-349
URL: https://issues.apache.org/jira/browse/PDFBOX-349
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Reporter: Jukka Zitting
[Issue from SourceForge]
http://sourceforge.net/tracker/index.php?func=detail&aid=1922502&group_id=78314&atid=552832
I am using PDF-Box-0.7.3.dll with C# and have tested text extraction on two
searchable PDFs that I scanned in from paper. For both files, the spaces
between words are missing from the extracted text. I also tested another
PDF file (which I downloaded from the internet), and it was parsed
correctly. Unfortunately, the scanned file is 1.2MB and the upload was
blocked. Please send me an email ([EMAIL PROTECTED]) and I will reply with
the file.
Thanks for looking into this.
Greg
[Comment on SourceForge]
Date: 2008-03-23 21:24
Sender: gkobzeff
Logged In: YES
user_id=2042611
Originator: YES
I have rescanned the file at a smaller file size and attached it.
Thanks
File Added: Advanced Pain Mgmt BW.pdf
http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&file_id=271548&aid=1922502