[jira] [Comment Edited] (PDFBOX-2509) Korean Text wrong

John Hewson (JIRA) Thu, 27 Nov 2014 11:00:25 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227896#comment-14227896
 ]


John Hewson edited comment on PDFBOX-2509 at 11/27/14 6:59 PM:
---------------------------------------------------------------

That code is a direct copy of the algorithm given in the PDF spec, so we won't 
take a patch which just adds extra hacks at the end. Your mechanism is 
correct: we need to use the descendant font's CIDSystemInfo to find a fallback 
CMap. But the fix belongs in readEncoding() and fetchCMapUCS2(), which should 
populate cMap, cMapUCS2, and isCMapPredefined correctly, rather than in 
changes to the standard algorithm in toUnicode().

The spec does mention this, when it says:

{quote}
if the font is composite and uses a predefined cmap (excluding Identity-H/V) 
*or if its descendant font* uses Adobe-GB1/CNS1/Japan1/Korea1 then ...
{quote}

PDFBox was missing the part in bold. 
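
To make the condition concrete, here is a minimal standalone sketch of the 
decision described in the quote. It is not the PDFBox code: the class name, 
the method name ucs2CMapName, and its parameters are hypothetical, and it 
assumes the caller has already read the registry and ordering from the 
relevant CIDSystemInfo dictionary. The "registry-ordering-UCS2" name format 
follows the spec's description of how the second CMap name is built.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of PDFBox: decides whether a UCS2 CMap
// should be looked up for a composite font, and under which name.
final class FallbackCMapSketch {

    // Orderings of the Adobe character collections named in the spec quote.
    private static final Set<String> CJK_ORDERINGS =
            new HashSet<>(Arrays.asList("GB1", "CNS1", "Japan1", "Korea1"));

    private FallbackCMapSketch() {
    }

    /**
     * @param encodingName name of the font's Encoding CMap, e.g. "Identity-H"
     * @param isPredefined whether that CMap is one of the predefined CMaps
     * @param registry     Registry from CIDSystemInfo, e.g. "Adobe"
     * @param ordering     Ordering from CIDSystemInfo, e.g. "Korea1"
     * @return the UCS2 CMap name to fetch, or empty if the rule does not apply
     */
    static Optional<String> ucs2CMapName(String encodingName, boolean isPredefined,
                                         String registry, String ordering) {
        if (registry == null || ordering == null) {
            return Optional.empty();
        }
        boolean identity = "Identity-H".equals(encodingName)
                || "Identity-V".equals(encodingName);

        // First half of the rule: predefined CMap, excluding Identity-H/V.
        boolean predefinedNonIdentity = isPredefined && !identity;

        // Second half (the part in bold): the descendant font uses one of
        // the Adobe CJK character collections.
        boolean descendantIsCjk = "Adobe".equals(registry)
                && CJK_ORDERINGS.contains(ordering);

        if (predefinedNonIdentity || descendantIsCjk) {
            return Optional.of(registry + "-" + ordering + "-UCS2");
        }
        return Optional.empty();
    }
}
```

With this sketch, a font using Identity-H whose descendant font declares 
Adobe/Korea1 still resolves to a UCS2 CMap name, which is the case that the 
first half of the rule alone would skip.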


was (Author: jahewson):
That code is a direct copy of the algorithm given in the PDF spec, so we won't 
take a patch which just shoves in extra hacks at the end. Your mechanism is 
correct: we need to use the descendent font's CIDSystemInfo as a fallback CMap. 
But we need to be populating cMap, cMapUCS2, and isCMapPredefined correctly in 
readEncoding() and fetchCMapUCS2() instead of messing with the standard 
algorithm in toUnicode().

The spec does mention this, when it says:

{quote}
if the font is composite and uses a predefined cmap (excluding Identity-H/V) 
*or if its descendant font* uses Adobe-GB1/CNS1/Japan1/Korea1 then ...
{quote}

PDFBox was missing the part in bold. 

> Korean Text wrong
> -----------------
>
>                 Key: PDFBOX-2509
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2509
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.0
>            Reporter: simon steiner
>            Assignee: John Hewson
>             Fix For: 2.1.0
>
>         Attachments: japan.patch, pdfbox147.png, pdfbox238.png, 
> pdfbox238_2.png, pdfbox328.png
>
>
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Korean/nonembedded/K4SystemFontsNotEmbeded218.PDF
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Korean/nonembedded/KGulimcheNotembeded218.PDF
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Korean/nonembedded/VariousKFontsNotembeded218.PDF
> and
> http://acroeng.adobe.com/Test_Files/fonts//EmbeddedCmap.pdf
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Japanese/nonembedded/Jun101.pdf
> and
> http://acroeng.adobe.com/Test_Files/fonts/asian%20font%20files/Japanese/nonembedded/ACPTJ_WIN_MSGothic.DOC.pdf
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar PDFToImage 
> K4SystemFontsNotEmbeded218.PDF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
