Antonio Pozo created PDFBOX-6050:
------------------------------------
Summary: Japanese text not rendered when converting editable PDFs to images
Key: PDFBOX-6050
URL: https://issues.apache.org/jira/browse/PDFBOX-6050
Project: PDFBox
Issue Type: Bug
Reporter: Antonio Pozo
We are using the PDFBox library to convert PDF documents to images. We have
found that, for many _editable_ PDFs (PDF forms) containing Japanese text, the
generated images do not display the Japanese characters; PDFBox appears to be
unable to extract or render them. Text in other languages within the same
document is rendered correctly.
This issue does *not* occur with non-editable PDFs; in those cases, Japanese
text is rendered without problems.
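For anyone reproducing the report, here is a minimal sketch of a typical PDFBox page-to-image conversion. It is not the reporter's actual code. It assumes PDFBox 3.x, where documents are opened with `Loader.loadPDF` (2.x uses `PDDocument.load`). The class name `PdfToImages`, the 150 DPI setting and the `page-N.png` output names are illustrative only. The sketch also prints the text layer via `PDFTextStripper`, so you can check whether the missing Japanese text is absent from extraction as well as from rendering.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class, for illustration only (assumes PDFBox 3.x).
public class PdfToImages {

    // Render every page of the document to an RGB image at the given DPI.
    public static List<BufferedImage> renderPages(PDDocument document, float dpi) throws IOException {
        PDFRenderer renderer = new PDFRenderer(document);
        List<BufferedImage> images = new ArrayList<>();
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            images.add(renderer.renderImageWithDPI(i, dpi, ImageType.RGB));
        }
        return images;
    }

    // Extract the text layer, to compare what PDFBox extracts with what it renders.
    public static String extractText(PDDocument document) throws IOException {
        return new PDFTextStripper().getText(document);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfToImages input.pdf  (writes page-1.png, page-2.png, ...)
        try (PDDocument document = Loader.loadPDF(new File(args[0]))) {
            List<BufferedImage> images = renderPages(document, 150f);
            for (int i = 0; i < images.size(); i++) {
                ImageIO.write(images.get(i), "png", new File("page-" + (i + 1) + ".png"));
            }
            System.out.println(extractText(document));
        }
    }
}
```

Running this on an affected form, and then on the same form after the open-and-save step, should show whether the difference lies in extraction, rendering, or both.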
We have observed that if we open these editable PDFs in a PDF editor and simply
save them again (without making any changes), PDFBox is then able to generate
the images with the Japanese text rendered correctly. We have also tried
setting a [font that supports Japanese characters|https://fonts.google.com/noto/specimen/Noto+Sans+JP]
in the library before converting the PDF to an image, without success.
Other tools (such as Adobe Reader) are able to display and extract the Japanese
text from these editable PDFs without requiring the open-and-save step.
*Expected behavior:*
When converting editable PDFs containing Japanese text to images, the Japanese
characters should be rendered correctly without requiring the document to be
re-saved.
*Actual behavior:*
When converting editable PDFs containing Japanese text to images, the Japanese
characters are missing in the generated images, while text in other languages
is rendered correctly.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)