[
https://issues.apache.org/jira/browse/PDFBOX-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr closed PDFBOX-4992.
-----------------------------------
Resolution: Not A Bug
Closing because it isn't a bug. Yes, it's inconvenient, but that's the dark
secret of PDF: many files don't allow proper text extraction.
If anyone wants to see how the type 3 glyphs look, open Type3Font.java in the
source code, search for {{// for debug you can save the PDF here}} and then add
code like {{doc.save("SOMEDIRECTORY" + index + ".pdf");}}
Alternatively, change {{renderImage(0)}} to {{renderImage(0, 4)}} and in
FontEncodingView.java change {{table.setRowHeight(40);}} to
{{table.setRowHeight(200);}}. See the screenshot I just added.
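
One way to picture why extraction fails for such files: a Type 3 font only
describes how each glyph is drawn, so text extraction depends on the font also
carrying a usable code-to-Unicode mapping. The following is a toy Java sketch
of that idea, not PDFBox code; the class name, the extract helper, the sample
codes, and the fallback of treating the raw code as a character are all
assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class Type3ExtractionSketch {

    /**
     * Decodes character codes through a code-to-Unicode table.
     * Codes without an entry fall back to the raw code as a char
     * (an assumed fallback, chosen only to illustrate the effect).
     */
    static String extract(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            String mapped = toUnicode.get(code);
            text.append(mapped != null ? mapped : String.valueOf((char) code));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical subset font: codes 1..7 assigned in order of first use,
        // so the page draws the word "Invoice" with these codes.
        String word = "Invoice";
        int[] codes = {1, 2, 3, 4, 5, 6, 7};

        // Case 1: the font carries a mapping, so extraction recovers the word.
        Map<Integer, String> withMapping = new HashMap<>();
        for (int i = 0; i < word.length(); i++) {
            withMapping.put(i + 1, String.valueOf(word.charAt(i)));
        }
        System.out.println(extract(codes, withMapping));

        // Case 2: no mapping at all; the glyphs still render correctly in a
        // viewer, but the extracted string is a run of control characters.
        String garbled = extract(codes, new HashMap<>());
        System.out.println(garbled.equals(word));
    }
}
```

In the second case nothing is wrong with rendering; the information needed to
turn codes back into text is simply not in the file.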
> PDF created by Bullzip PDF Printer / www.bullzip.com / Freeware Edition shows
> weird characters
> ----------------------------------------------------------------------------------------------
>
> Key: PDFBOX-4992
> URL: https://issues.apache.org/jira/browse/PDFBOX-4992
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.21
> Environment: windows
> Reporter: Peter van der Weerd
> Priority: Major
> Labels: type3
> Attachments: 2brightsparks.onfastspring.com - invoice.pdf,
> Clipboard01.png
>
>
> I copied the text from the original bug (PDFBOX-1107). I am experiencing the
> same issue.
> I have quite a few of these documents, but most are classified. I attached a
> non-classified one.
> I was hoping that the recent version had solved this issue, but it hasn't.
>
> Original text from 1107:
> Opening the PDF via PDFReader 1.6 + 1.7 SNAPSHOT results in an unreadable
> page. All other pdf viewers I tried have correctly displayed the file.
> The only related log message shown was
> 25.08.2011 11:59:41 org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: EI
> which is probably unrelated. My guess is that it's the font they used (see
> screenshot). However, if the font is unknown or problematic, shouldn't
> pdfreader use a default font or something? Maybe I am wrong anyway :)