[ https://issues.apache.org/jira/browse/PDFBOX-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067007#comment-18067007 ]
ASF subversion and git services commented on PDFBOX-4076:
---------------------------------------------------------
Commit 1932403 from Maruan Sahyoun in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1932403 ]
PDFBOX-6178, PDFBOX-4076: add tests; partially created with Claude Haiku 4.5
> PDFBox cannot properly handle PDF Name objects containing bytes with values outside the US_ASCII range
> ------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-4076
> URL: https://issues.apache.org/jira/browse/PDFBOX-4076
> Project: PDFBox
> Issue Type: Bug
> Affects Versions: 2.0.8
> Reporter: Tilman Hausherr
> Assignee: Maruan Sahyoun
> Priority: Major
> Fix For: 4.0.0
>
>
> As reported by [~mkl] in his Stack Overflow answer:
> {quote}The first error in PDF Name handling is that PDFBox internally
> represents them as strings after a mixed UTF-8 / CP-1252 decoding strategy.
> This is wrong, according to the PDF specification a name object is an atomic
> symbol uniquely defined by a sequence of any characters (8-bit values) except
> null (character code 0).
> (...)
> The second error is, though, that while serializing the PDF it only properly
> encodes the characters in the strings representing names which are from
> US_ASCII, all else are replaced by '?'
> {quote}
> Sample code:
> {code:java}
> PDDocument document = new PDDocument();
> PDPage page = new PDPage();
> document.addPage(page);
> document.getDocumentCatalog().getCOSObject().setString(COSName.getPDFName("äöüß"), "äöüß");
> ByteArrayOutputStream baos = new ByteArrayOutputStream();
> document.save(baos);
> document.close();
> document = PDDocument.load(baos.toByteArray());
> System.out.println(document.getDocumentCatalog().getCOSObject().keySet());
> document.close();
> {code}
> Output:
> {noformat}
> [COSName{Type}, COSName{Version}, COSName{Pages}, COSName{????}]
> {noformat}
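
The two effects described in the report can be reproduced with the JDK alone. The sketch below is only an illustration and is not taken from the PDFBox sources or from the commit: the class name NameEncodingSketch and both helper methods are hypothetical. It assumes the '?' characters in the output above come from encoding the name string as US_ASCII, and it shows one way a writer could preserve arbitrary name bytes with the #xx escape form that the PDF specification defines for name objects. The non-ASCII characters are written as \u escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; none of this is PDFBox API.
class NameEncodingSketch {

    // Lossy path: encoding to US_ASCII replaces every unmappable character with '?'.
    public static String asciiRoundTrip(String name) {
        byte[] ascii = name.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Byte-preserving path: write printable regular characters as themselves and
    // everything else (including '#' and the PDF delimiters) as #xx.
    public static String escapeName(byte[] nameBytes) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : nameBytes) {
            int c = b & 0xFF;
            if (c == 0) {
                // A name must not contain the null character.
                throw new IllegalArgumentException("null byte in name");
            }
            boolean printableRegular =
                    c >= 0x21 && c <= 0x7E && "#/()<>[]{}%".indexOf(c) < 0;
            if (printableRegular) {
                sb.append((char) c);
            } else {
                sb.append('#').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u00e4\u00f6\u00fc\u00df"; // the name used in the sample code
        System.out.println(asciiRoundTrip(name));
        System.out.println(escapeName(name.getBytes(StandardCharsets.UTF_8)));
        System.out.println(escapeName(name.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

The two escaped forms differ because the same four characters map to different byte sequences in UTF-8 and ISO-8859-1. This is why the report argues that a name should be treated as a byte sequence rather than as a decoded string.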