[
https://issues.apache.org/jira/browse/PDFBOX-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shigeru Okada updated PDFBOX-5140:
----------------------------------
Description:
I tried to convert a PDF file that uses Chinese fonts to JPG files.
The source code is below.
private List<String> convertPdf2Jpg(File pdfFile) {
    List<String> jpgList = new ArrayList<String>();
    try {
        PDDocument document = PDDocument.load(pdfFile);
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            BufferedImage image = null;
            try {
                image = pdfRenderer.renderImageWithDPI(i, 300, ImageType.RGB);
                String jpgName = pdfFile.getPath().split(".pdf")[0]
                        + "_" + String.format("%03d", i + 1) + ".jpg";
                ImageIOUtil.writeImage(image, jpgName, 300);
                jpgList.add(jpgName);
            } catch (Exception e) {
                document.close();
                LOG.error(pdfFile + "(" + i + " page) "
                        + " can't convert pdf to jpg file (convertPdf2Jpg())." + e.toString());
                throwPdfBoxException("convertPdf2Jpg():" + pdfFile + "(" + i + " page) "
                        + " can't convert pdf to jpg file." + e.toString());
            }
        }
        document.close();
    } catch (Exception e) {
        LOG.error(pdfFile + "Can't load PDF file (convertPdf2Jpg()).");
    }
    return jpgList;
}
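A side note on the naming step in the code above, unrelated to the rendering problem itself: String.split takes a regular expression, so the unescaped dot in split(".pdf") matches any character followed by "pdf". The sketch below isolates that step. The class JpgNames, the helper jpgNameFor, and the directory name "docs_pdf" are hypothetical and are not part of PDFBox or of the reported code. It assumes the intent is to strip a trailing ".pdf" extension.

```java
import java.util.Locale;

final class JpgNames {

    // Hypothetical helper: builds "<path without .pdf>_<3-digit page number>.jpg".
    static String jpgNameFor(String pdfPath, int pageIndex) {
        // The dot is escaped so it matches literally; "$" anchors the match
        // to the end of the path, and "(?i)" also accepts ".PDF".
        String base = pdfPath.replaceFirst("(?i)\\.pdf$", "");
        return base + "_" + String.format(Locale.ROOT, "%03d", pageIndex + 1) + ".jpg";
    }

    public static void main(String[] args) {
        String path = "docs_pdf/TC_MingLiu.pdf"; // made-up directory name

        // Unescaped pattern: "_pdf" in the directory name already matches.
        System.out.println(path.split(".pdf")[0]); // prints "docs"

        // Escaped and anchored pattern keeps the full base path.
        System.out.println(jpgNameFor(path, 0));   // prints "docs_pdf/TC_MingLiu_001.jpg"
    }
}
```

For paths that contain "pdf" only in the extension, both variants give the same name, so this does not explain the broken Chinese characters.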
I have attached example PDF and JPG files. The Chinese characters in the JPG output are broken.
The problem seems to depend on the font.
If you need more information, please let me know.
Thanks
//Okada
was:
I tried to convert a PDF file that uses Chinese fonts to JPG files.
The source code is below.
private List<String> convertPdf2Jpg(File pdfFile) throws TextImageExtractorException {
    List<String> jpgList = new ArrayList<String>();
    try {
        PDDocument document = PDDocument.load(pdfFile);
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            BufferedImage image = null;
            try {
                image = pdfRenderer.renderImageWithDPI(i, 300, ImageType.RGB);
                String jpgName = pdfFile.getPath().split(".pdf")[0]
                        + "_" + String.format("%03d", i + 1) + ".jpg";
                ImageIOUtil.writeImage(image, jpgName, 300);
                jpgList.add(jpgName);
            } catch (Exception e) {
                document.close();
                LOG.error(pdfFile + "(" + i + " page) "
                        + " can't convert pdf to jpg file (convertPdf2Jpg())." + e.toString());
                throwPdfBoxException("convertPdf2Jpg():" + pdfFile + "(" + i + " page) "
                        + " can't convert pdf to jpg file." + e.toString());
            }
        }
        document.close();
    } catch (Exception e) {
        LOG.error(pdfFile + "Can't load PDF file (convertPdf2Jpg()).");
    }
    return jpgList;
}
I have attached example PDF and JPG files. The Chinese characters in the JPG output are broken.
The problem seems to depend on the font.
If you need more information, please let me know.
Thanks
//Okada
> Can't change PDF including some Chinese font to JPG correctly
> -------------------------------------------------------------
>
> Key: PDFBOX-5140
> URL: https://issues.apache.org/jira/browse/PDFBOX-5140
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.22
> Environment: Windows 10
> Reporter: Shigeru Okada
> Priority: Major
> Attachments: TC_DFKaiShuSB.pdf, TC_DFKaiShuSB_001.jpg,
> TC_MingLiu.pdf, TC_MingLiu_001.jpg, TC_PMingLiU.pdf, TC_PMingLiU_001.jpg
>
>