[ https://issues.apache.org/jira/browse/PDFBOX-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278670#comment-15278670 ]
John Hewson commented on PDFBOX-3347:
-------------------------------------

Ok, so the UTF-8 sequence for å (U+00E5) is *two bytes*, {{C3 A5}}, so this PDF file is bad. This PDF appears to have its names encoded with ISO-8859-1, which is wrong.

> COSName parsing/writing interprets byte sequences as UTF-8 when parsing
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-3347
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3347
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing, Writing
>    Affects Versions: 1.8.12, 2.0.1, 2.0.2
>            Reporter: Maruan Sahyoun
>            Priority: Minor
>
> As discussed here
> http://stackoverflow.com/questions/36964496/pdfbox-2-0-overcoming-dictionary-key-encoding/
> a byte sequence making up a COSName is interpreted during parsing and
> writing where it shouldn't be. Details are given in mkl's excellent analysis.
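
For reference, the byte-level difference mentioned in the comment can be checked with plain JDK classes. The following is only a minimal sketch: the class name {{NameBytesDemo}} and the helpers {{hex}} and {{isValidUtf8}} are made up for this illustration and are not part of PDFBox, and the assumption that the name in the file carries the single ISO-8859-1 byte {{E5}} is inferred from the comment, not checked against the actual PDF.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
public class NameBytesDemo {

    // Formats bytes as space-separated uppercase hex, e.g. "C3 A5".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns true only if the bytes are a well-formed UTF-8 sequence.
    // A strict decoder is used so that malformed input is reported
    // instead of being silently replaced with U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String aRing = "\u00E5"; // å, written as an escape to avoid source-encoding issues

        byte[] utf8 = aRing.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = aRing.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8:      " + hex(utf8));   // C3 A5
        System.out.println("ISO-8859-1: " + hex(latin1)); // E5

        // Assumption: the name in the file contains the lone byte E5.
        System.out.println("C3 A5 valid UTF-8? " + isValidUtf8(utf8));   // true
        System.out.println("E5 valid UTF-8?    " + isValidUtf8(latin1)); // false
    }
}
```

A lone {{E5}} byte is the lead byte of a three-byte UTF-8 sequence with nothing after it, so a strict decoder rejects it. A lenient decode such as {{new String(bytes, StandardCharsets.UTF_8)}} replaces it with U+FFFD instead, so the original byte cannot be recovered once the name has been turned into a string this way.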