Shigeru Okada created PDFBOX-5140:
-------------------------------------

             Summary: Can't change PDF including some Chinese font to JPG correctly
                 Key: PDFBOX-5140
                 URL: https://issues.apache.org/jira/browse/PDFBOX-5140
             Project: PDFBox
          Issue Type: Bug
          Components: Rendering
    Affects Versions: 2.0.22
         Environment: Windows 10

            Reporter: Shigeru Okada
         Attachments: TC_DFKaiShuSB.pdf, TC_DFKaiShuSB_001.jpg, TC_MingLiu.pdf, 
TC_MingLiu_001.jpg, TC_PMingLiU.pdf, TC_PMingLiU_001.jpg

I tried to convert a PDF file that uses Chinese fonts to JPG files, one per page.
The source code is below.

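        import java.awt.image.BufferedImage;
        import java.io.File;
        import java.util.ArrayList;
        import java.util.List;
        import org.apache.pdfbox.pdmodel.PDDocument;
        import org.apache.pdfbox.rendering.ImageType;
        import org.apache.pdfbox.rendering.PDFRenderer;
        import org.apache.pdfbox.tools.imageio.ImageIOUtil;

        private List<String> convertPdf2Jpg(File pdfFile) throws TextImageExtractorException {

                List<String> jpgList = new ArrayList<String>();

                // try-with-resources closes the document exactly once, even on failure
                try (PDDocument document = PDDocument.load(pdfFile)) {
                        PDFRenderer pdfRenderer = new PDFRenderer(document);
                        // Strip the literal ".pdf" suffix; split(".pdf") would treat "." as a regex wildcard
                        String basePath = pdfFile.getPath().replaceFirst("\\.pdf$", "");
                        for (int i = 0; i < document.getNumberOfPages(); i++) {
                                try {
                                        // Render page i (0-based) at 300 DPI and write it as <base>_NNN.jpg
                                        BufferedImage image = pdfRenderer.renderImageWithDPI(i, 300, ImageType.RGB);
                                        String jpgName = basePath + "_" + String.format("%03d", i + 1) + ".jpg";
                                        ImageIOUtil.writeImage(image, jpgName, 300);
                                        jpgList.add(jpgName);
                                }
                                catch (Exception e) {
                                        LOG.error(pdfFile + "(" + i + " page) " + " can't convert pdf to jpg file (convertPdf2Jpg())." + e.toString());
                                        throwPdfBoxException("convertPdf2Jpg():" + pdfFile + "(" + i + " page) " + " can't convert pdf to jpg file." + e.toString());
                                }
                        }
                }
                catch (Exception e) {
                        // Also reached by whatever throwPdfBoxException throws from inside the loop
                        LOG.error(pdfFile + " Can't load PDF file (convertPdf2Jpg()).");
                }
                return jpgList;
        }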
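
A side note on the output file name, separate from the rendering problem: the name is built by dropping the ".pdf" suffix and appending a zero-padded, 1-based page number. String.split takes a regular expression, so a pattern of ".pdf" matches any character followed by "pdf" and can cut the path too early. The sketch below isolates that step using only the JDK; the class OutputNames and the method pageImageName are hypothetical names used for illustration, not part of PDFBox or of the code above.

```java
class OutputNames {
    // Hypothetical helper: builds "<base>_<NNN>.jpg" for a 1-based page number.
    static String pageImageName(String pdfPath, int pageNumber) {
        String suffix = ".pdf";
        String base = pdfPath;
        int start = pdfPath.length() - suffix.length();
        // Remove the suffix only when the path really ends with ".pdf" (case-insensitive).
        // regionMatches returns false for a negative start, so short paths are safe.
        if (pdfPath.regionMatches(true, start, suffix, 0, suffix.length())) {
            base = pdfPath.substring(0, start);
        }
        return base + "_" + String.format("%03d", pageNumber) + ".jpg";
    }

    public static void main(String[] args) {
        // A path where the regex-based split cuts at "_pdf" instead of the suffix.
        String path = "my_pdf_files/TC_MingLiu.pdf";
        System.out.println(path.split(".pdf")[0]);   // prints "my"
        System.out.println(pageImageName(path, 1));  // prints "my_pdf_files/TC_MingLiu_001.jpg"
    }
}
```

For the attached files, whose paths contain "pdf" only in the suffix, both approaches give the same name, so this does not explain the broken characters.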

I attached example PDF files and the JPG files generated from them. The Chinese characters come out broken in the JPGs.
The problem seems to depend on the font.
If you need more information, please let me know.

Thanks

//Okada



