[ 
https://issues.apache.org/jira/browse/PDFBOX-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr closed PDFBOX-4992.
-----------------------------------
    Resolution: Not A Bug

Closing because this isn't a bug. Yes, it's inconvenient, but that's the dark 
secret of PDF: many files don't support proper text extraction.
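
To illustrate the general problem, here is a simplified sketch. It is not 
PDFBox code, and the class and method names are made up for this example. It 
assumes the usual cause: the page content only stores character codes, a 
viewer draws each code with the glyph the font defines for it, and text 
extraction additionally needs a code-to-Unicode mapping that the file may not 
provide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not how PDFBox is implemented.
final class Type3ExtractionSketch {

    // Turns character codes into text. A code without a Unicode mapping
    // cannot be resolved, so the raw code is emitted as a char instead,
    // which is the kind of output that shows up as "weird characters".
    static String extract(int[] codes, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            String text = toUnicode.get(code);
            sb.append(text != null ? text : String.valueOf((char) code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codes = {1, 2, 3}; // codes as they would appear in the page content

        // A file that ships a usable mapping extracts cleanly.
        Map<Integer, String> mapping = new HashMap<>();
        mapping.put(1, "P");
        mapping.put(2, "D");
        mapping.put(3, "F");
        System.out.println(extract(codes, mapping));

        // The same codes without a mapping yield control characters,
        // even though a viewer would still draw the glyphs correctly.
        String garbled = extract(codes, new HashMap<>());
        System.out.println(garbled.equals("PDF"));
    }
}
```

Rendering never consults the mapping, which is why such a file can look fine 
in every viewer and still extract as garbage.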

If anyone wants to see what the type 3 glyphs look like, open Type3Font.java in 
the source code, search for {{// for debug you can save the PDF here}} and then 
add code like {{doc.save("SOMEDIRECTORY" + index + ".pdf");}}

Alternatively, change {{renderImage(0)}} to {{renderImage(0, 4)}} and in 
FontEncodingView.java change {{table.setRowHeight(40);}} to 
{{table.setRowHeight(200);}}. See the screenshot I just added.

> PDF created by Bullzip PDF Printer / www.bullzip.com / Freeware Edition shows 
> weird characters
> ----------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4992
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4992
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.21
>         Environment: windows
>            Reporter: Peter van der Weerd
>            Priority: Major
>              Labels: type3
>         Attachments: 2brightsparks.onfastspring.com - invoice.pdf, 
> Clipboard01.png
>
>
> I copied the text from the original bug (PDFBOX-1107). I experience the same 
> issue.
> I have quite a few of these documents, but most are classified. I attached a 
> non-classified one.
> I was hoping that the recent version had solved this issue, but it hasn't.
>  
> Original text from 1107:
> Opening the PDF via PDFReader 1.6 + 1.7 SNAPSHOT results in an unreadable 
> page. All other pdf viewers I tried have correctly displayed the file.
> The only related log message shown was
> 25.08.2011 11:59:41 org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: EI
> which is probably unrelated. My guess is that it's the font they used (see 
> screenshot). However, if the font is unknown or problematic, shouldn't 
> pdfreader use a default font or something? Maybe I am wrong anyway :)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
