Hello team,

I recently ran into a problem where PDFBox cannot render Chinese text. The 
problem is very similar to https://issues.apache.org/jira/browse/PDFBOX-5704.

In this case, the attached PDF file embeds a CFF font file. The correct font 
type/subtype should be /CIDFontType0 and /CIDFontType0C, and the font should 
declare the /FontFile3 property. Instead, the file wrongly declares the subfont 
as TrueType, which makes PDFBox use its TTF parser on the font file stream, 
based on the declared type.

According to the spec, PDFBox does the right thing, but from a user's 
perspective this looks more like a "bug", because the file displays correctly in 
the other widely used PDF readers (Adobe, Foxit, pdf.js, etc.).

I have many years of working experience in PDF generation (iText, PDFBox, 
etc.), and I know that once a PDF is generated, it is considered correct as 
long as it displays correctly in Adobe Reader. If another program cannot 
display it correctly, that is considered a bug in the other program. It's not 
fair, but it's the reality. Many low-quality PDF generation tools/libraries are 
still widely used.

pdf.js parses the font file first and prefers the font type found in the font 
file over the type declared in the font dictionary:
https://github.com/mozilla/pdf.js/blob/1cdbcfef821c7f6e81ea22fe68a8b815bca01c4e/src/core/fonts.js#L1052
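
To illustrate the idea, here is a minimal sketch of sniffing the font type from 
the first bytes of the embedded font stream, so that the declared type is only 
used as a fallback. This is not PDFBox or pdf.js code: the class FontSniffer, 
the enum Kind and the method sniff are hypothetical names, and the checks are 
simplified heuristics based on the well-known file signatures (TrueType 
0x00010000/"true"/"ttcf", OpenType "OTTO", Type 1 "%!" or 0x80 0x01, and a bare 
CFF header with major version 1 and an offSize between 1 and 4).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: guesses the font program type
// from the leading bytes of an embedded font stream.
final class FontSniffer {

    enum Kind { TRUETYPE, OPENTYPE_CFF, BARE_CFF, TYPE1, UNKNOWN }

    static Kind sniff(byte[] data) {
        if (data == null || data.length < 4) {
            return Kind.UNKNOWN;
        }
        int b0 = data[0] & 0xFF;
        int b1 = data[1] & 0xFF;
        int b2 = data[2] & 0xFF;
        int b3 = data[3] & 0xFF;
        String tag = new String(data, 0, 4, StandardCharsets.ISO_8859_1);

        // TrueType: sfnt version 0x00010000, "true", or a "ttcf" collection.
        boolean sfntVersion1 = b0 == 0 && b1 == 1 && b2 == 0 && b3 == 0;
        if (sfntVersion1 || tag.equals("true") || tag.equals("ttcf")) {
            return Kind.TRUETYPE;
        }
        // OpenType wrapper around CFF outlines.
        if (tag.equals("OTTO")) {
            return Kind.OPENTYPE_CFF;
        }
        // Type 1: ASCII "%!" or a PFB segment marker (0x80 0x01).
        if (tag.startsWith("%!") || (b0 == 0x80 && b1 == 0x01)) {
            return Kind.TYPE1;
        }
        // Bare CFF header: major version 1, ..., offSize in 1..4.
        // This is a weak heuristic, so it is checked last.
        if (b0 == 1 && b3 >= 1 && b3 <= 4) {
            return Kind.BARE_CFF;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // A typical bare CFF header: major=1, minor=0, hdrSize=4, offSize=2.
        byte[] cffHeader = {1, 0, 4, 2};
        System.out.println(sniff(cffHeader));
    }
}
```

With such a check, a reader could pick the CFF parser when the stream looks 
like CFF even though the font dictionary says TrueType, and fall back to the 
declared type when the result is UNKNOWN.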

So my question is: "Is it possible for PDFBox to provide some font-processing 
workaround logic to handle such cases?"

Thanks
Mike
