[ 
https://issues.apache.org/jira/browse/PDFBOX-4793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051977#comment-17051977
 ] 

Tilman Hausherr commented on PDFBOX-4793:
-----------------------------------------

You could try deleting the cache file but I expect this to stay as it is... the 
weird thing is I have MalgunGothic Semilight too on my machine and it isn't 
chosen.

But wait, it gets weirder...

I looked into my development code of {{getFontMatches()}} and found this:
{code}
// Idea: exclusion mapping
//System.out.println("fontDescriptor.getFontName(): " + fontDescriptor.getFontName() + ", info.getPostScriptName(): " + info.getPostScriptName());
// https://github.com/mozilla/pdf.js/issues/10699
//  fontDescriptor.getFontName(): AdobeSongStd-Light, info.getPostScriptName(): MalgunGothic-Semilight
//if (info.getPostScriptName().startsWith("MalgunGothic"))
//{
//    continue;
//}
{code}
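A minimal sketch of the exclusion idea from the commented-out lines above,
written as standalone Java so it runs without PDFBox. The class name
{{ExclusionSketch}} and the helper {{withoutExcluded()}} are hypothetical and
not part of PDFBox; the candidate names are taken from the listings in this
comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExclusionSketch {

    // Hypothetical helper (not a PDFBox API): keeps only the candidates whose
    // PostScript name does not start with one of the excluded prefixes.
    public static List<String> withoutExcluded(List<String> postScriptNames,
                                               List<String> excludedPrefixes) {
        List<String> kept = new ArrayList<>();
        for (String name : postScriptNames) {
            boolean excluded = false;
            for (String prefix : excludedPrefixes) {
                if (name.startsWith(prefix)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "ArialUnicodeMS", "MalgunGothic-Semilight", "MicrosoftYaHei", "SimSun");
        // Same prefix test as the commented-out startsWith("MalgunGothic") check.
        System.out.println(withoutExcluded(candidates, Arrays.asList("MalgunGothic")));
        // prints [ArialUnicodeMS, MicrosoftYaHei, SimSun]
    }
}
```

A hard-coded exclusion like this only hides one unsuitable font; it says
nothing about whether the remaining candidates actually contain the glyphs.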
I also had a look at the score values. For your file all scores are 0. So it's 
just bad luck that Malgun was chosen:

getFontMatches: AdobeSongStd-Light
ArialUnicodeMS (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\ARIALUNI.TTF 0.0
MalgunGothic-Semilight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\malgunsl.ttf 0.0
MalgunGothic-Semilight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\malgunsl.ttf 0.0
MicrosoftYaHei (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyh.ttc 0.0
MicrosoftYaHeiUI (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyh.ttc 0.0
MicrosoftYaHei-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc 0.0
MicrosoftYaHei-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc 0.0
MicrosoftYaHeiUI-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc 0.0
MicrosoftYaHeiUI-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc 0.0
MicrosoftYaHeiLight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhl.ttc 0.0
MicrosoftYaHeiUILight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhl.ttc 0.0
SimSun (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsun.ttc 0.0
NSimSun (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsun.ttc 0.0
SimSun-ExtB (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsunb.ttf 0.0
SimSun-ExtB (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsunb.ttf 0.0
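
To illustrate why an all-zero score list makes the winner arbitrary, here is a
small standalone sketch. It assumes candidates are ranked by score alone, so
equal scores leave the order undefined, and it adds a caller-supplied
preference list as a tie-breaker. The {{Candidate}} class, {{best()}} and the
preference list are hypothetical illustrations, not PDFBox code.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {

    // Hypothetical stand-in for a scored font match.
    public static final class Candidate {
        public final String postScriptName;
        public final double score;

        public Candidate(String postScriptName, double score) {
            this.postScriptName = postScriptName;
            this.score = score;
        }
    }

    // Position in the preference list; names not listed rank last.
    static int rank(String name, List<String> preferred) {
        int index = preferred.indexOf(name);
        return index < 0 ? Integer.MAX_VALUE : index;
    }

    // Highest score wins; the preference list only decides between equal scores.
    public static Candidate best(List<Candidate> candidates, List<String> preferred) {
        Comparator<Candidate> byScoreDescending =
                Comparator.comparingDouble((Candidate c) -> c.score).reversed();
        Comparator<Candidate> byPreference =
                Comparator.comparingInt((Candidate c) -> rank(c.postScriptName, preferred));
        return candidates.stream()
                .min(byScoreDescending.thenComparing(byPreference))
                .orElse(null);
    }

    public static void main(String[] args) {
        // Assumed preference order, chosen only for this example.
        List<String> preferred = Arrays.asList("SimSun", "MicrosoftYaHei");

        // All scores 0.0, as in the listing above: the tie-breaker decides.
        List<Candidate> tied = Arrays.asList(
                new Candidate("ArialUnicodeMS", 0.0),
                new Candidate("MalgunGothic-Semilight", 0.0),
                new Candidate("MicrosoftYaHei", 0.0),
                new Candidate("SimSun", 0.0));
        System.out.println(best(tied, preferred).postScriptName); // prints SimSun

        // If one candidate scores strictly higher, the tie-breaker is never consulted.
        List<Candidate> scored = Arrays.asList(
                new Candidate("MalgunGothic-Semilight", 1.0),
                new Candidate("SimSun", 0.5));
        System.out.println(best(scored, preferred).postScriptName); // prints MalgunGothic-Semilight
    }
}
```

The second case shows the limit of a tie-breaker: it cannot help when an
unsuitable font receives the strictly highest score.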


Then I did the same for the file from PDF.js mentioned in the comment. There it 
looks like this:

getFontMatches: AdobeSongStd-Light
MalgunGothic-Semilight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\malgunsl.ttf 1.0
MicrosoftYaHeiLight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhl.ttc 0.9499999992549419
MalgunGothic-Semilight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\malgunsl.ttf 1.0
MicrosoftYaHei (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyh.ttc 0.5
MicrosoftYaHeiUILight (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhl.ttc 0.9499999992549419
SimSun (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsun.ttc 0.5
SimSun-ExtB (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsunb.ttf 0.5
MicrosoftYaHeiUI-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc -1.0
MicrosoftYaHeiUI-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc -1.0
MicrosoftYaHeiUI (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyh.ttc 0.5
ArialUnicodeMS (TTF, mac: 0x0, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\ARIALUNI.TTF 0.5
MicrosoftYaHei-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc -1.0
NSimSun (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsun.ttc 0.5
MicrosoftYaHei-Bold (TTF, mac: 0x1, os/2: 0x805, cid: null) C:\WINDOWS\FONTS\msyhbd.ttc -1.0
SimSun-ExtB (TTF, mac: 0x0, os/2: 0x0, cid: null) C:\WINDOWS\FONTS\simsunb.ttf 0.5


> Questionable fallback font for some embedded chinese fonts
> ----------------------------------------------------------
>
>                 Key: PDFBOX-4793
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4793
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.18
>            Reporter: Christian Appl
>            Priority: Major
>         Attachments: image-2020-03-04-09-49-42-323.png, 
> image-2020-03-04-09-58-01-055.png, image-2020-03-04-10-09-25-343.png, 
> image-2020-03-04-10-31-03-065.png, pdf_font-zhcn.pdf, screenshot-2.png, 
> screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png
>
>
> *Issue:*
> I tried to render PDFs that contain embedded Chinese fonts. Neither the PDF 
> Debugger, nor printouts of the document (PDFPrintable), nor the PDFRenderer 
> can display/render the Chinese glyphs correctly; each renders placeholders 
> instead.
> *Assumptions:*
> I assume that said embedded fonts are incomplete and don't contain all the 
> glyphs that would be required to render the text properly, and that PDFBox 
> therefore attempts to use the previously determined fallback font. (!?)
>  !image-2020-03-04-09-49-42-323.png! 
>  !image-2020-03-04-09-58-01-055.png! 
> It then fails to find the glyphs in said fallback font.
> This is not surprising, as the fallback font "MalgunGothic-Semilight" 
> (Windows standard font) does not contain Chinese characters.
>  !image-2020-03-04-10-09-25-343.png! 
> *Debugging:*
> I tried to understand how the fallback font is determined and what could be 
> done to solve this problem on my end. But I was unable to find a satisfying 
> solution.
> My best guess so far is that the CIDFontMapping (FontMapperImpl) is to blame 
> for determining an unfit fallback font.
> Although it seems to check whether the required codepages are contained in a 
> fallback font, it still ranks the Malgun font as the top scorer and best 
> substitute font, even though it clearly does not contain all required 
> codepages.
> *My opinion:*
> This is troubling, as better-fitting fonts exist and could have been selected 
> (e.g. Adobe Song Std). They are indeed included in the CIDFontMapping, but 
> seemingly score lower for some reason.
> *Further information:*
> I cannot disclose the document in question; however, I found a document 
> (pdf_font-zhcn.pdf) in another issue (PDFBOX-3132) that can be used to 
> reproduce the issue (e.g. by dropping it into the PDF Debugger).
>  !image-2020-03-04-10-31-03-065.png! 


