[ https://issues.apache.org/jira/browse/PDFBOX-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709582#comment-13709582 ]
Andreas Lehmkühler commented on PDFBOX-1661:
--------------------------------------------
You didn't mention any text extraction issue before. Did you perform the
"Adobe test" to check whether the text can be extracted at all? [1]
[1] http://pdfbox.apache.org/userguide/faq.html#notext
> Fix font subtype automatically
> ------------------------------
>
> Key: PDFBOX-1661
> URL: https://issues.apache.org/jira/browse/PDFBOX-1661
> Project: PDFBox
> Issue Type: Improvement
> Components: PDModel
> Affects Versions: 1.8.1
> Environment: PDFBox: PDFBox 1.8.1
> Reader: Adobe Reader 11.0.0
> Generator: TCPDF 4.5.041
> PDF Content:
> <</Type /Font
> /BaseFont /AdobeSongStd-Light,Bold-UniGB-UTF16-H
> /Subtype /Type0
> /Encoding /UniGB-UTF16-H
> /DescendantFonts [27 0 R]
> Reporter: Raymond Wu
> Labels: encoding, font
>
> Subtype is parsed as "Type0" by PDFBox, but parsed as "Type1" by Adobe Reader.
> This is not a bug in PDFBox.
> The cause is that TCPDF 4.5.041 generates the font AdobeSongStd-Light with the
> wrong subtype "Type0".
> It should be "Type1".
> I have tested the following code and it works.
> File: org/apache/pdfbox/pdmodel/font/PDFontFactory.java
> Method: public static PDFont createFont( COSDictionary dic ) throws IOException
> Original:
> else if( subType.equals( COSName.TYPE0 ) )
> {
> retval = new PDType0Font( dic );
> }
> Fixed:
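> else if( subType.equals( COSName.TYPE0 ) )
> {
> // Treat a "Type0" font that has an /Encoding entry as Type1 (reporter's heuristic).
> // Read /Encoding as COSBase rather than casting to COSName: for Type0 fonts it may
> // be an embedded CMap stream, and a (COSName) cast would throw ClassCastException.
> COSBase encoding = dic.getDictionaryObject( COSName.ENCODING );
> retval = ( encoding != null ) ? new PDType1Font( dic ) : new PDType0Font( dic );
> }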
> With this patch, PDFBox behaves like Adobe Reader.
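To make the proposed change concrete, here is a small self-contained Java model of the subtype dispatch in PDFontFactory.createFont, before and after the patch. It is only a sketch under stated assumptions: a plain Map stands in for COSDictionary, strings stand in for PDType0Font and PDType1Font, and the class and method names (FontSubtypeDispatch, createFontOriginal, createFontPatched) are hypothetical, not PDFBox API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the subtype dispatch discussed in PDFBOX-1661.
 * A Map stands in for COSDictionary and strings stand in for the font
 * classes; none of these names are PDFBox API.
 */
public class FontSubtypeDispatch {

    // Original behaviour: trust the /Subtype entry as written in the file.
    // Other subtypes (TrueType, Type3, ...) are collapsed into "other" here.
    public static String createFontOriginal(Map<String, Object> dic) {
        Object subType = dic.get("Subtype");
        if ("Type0".equals(subType)) {
            return "PDType0Font";
        } else if ("Type1".equals(subType)) {
            return "PDType1Font";
        }
        return "other";
    }

    // Reporter's heuristic: a "Type0" font carrying any /Encoding entry
    // (a name here; in real files possibly a CMap stream) becomes Type1.
    public static String createFontPatched(Map<String, Object> dic) {
        Object subType = dic.get("Subtype");
        if ("Type0".equals(subType)) {
            Object encoding = dic.get("Encoding");
            return (encoding != null) ? "PDType1Font" : "PDType0Font";
        }
        return createFontOriginal(dic);
    }

    public static void main(String[] args) {
        // Font dictionary from the issue report; the indirect reference is kept as text.
        Map<String, Object> tcpdfFont = new HashMap<>();
        tcpdfFont.put("Type", "Font");
        tcpdfFont.put("BaseFont", "AdobeSongStd-Light,Bold-UniGB-UTF16-H");
        tcpdfFont.put("Subtype", "Type0");
        tcpdfFont.put("Encoding", "UniGB-UTF16-H");
        tcpdfFont.put("DescendantFonts", "[27 0 R]");

        System.out.println("original: " + createFontOriginal(tcpdfFont));
        System.out.println("patched:  " + createFontPatched(tcpdfFont));
    }
}
```

On the dictionary from the report, the original branch yields a Type0 font and the patched branch a Type1 font. Note that the PDF specification lists /Encoding as a required entry of a Type0 font dictionary, so under this heuristic a conforming Type0 font would take the Type1 branch as well.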