[ https://issues.apache.org/jira/browse/PDFBOX-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neil McErlean updated PDFBOX-904:
---------------------------------

    Attachment: PDFBOX-904.patch

Here's a patch with a test case that constructs various java.lang.Strings, constructs COSStrings from them and reads back the Strings, the byte[] and the PDF String that is written at the back end.

> Potential issue with COSString and UTF-16-encoded Strings.
> ----------------------------------------------------------
>
>                 Key: PDFBOX-904
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-904
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.4.0
>            Reporter: Neil McErlean
>         Attachments: PDFBOX-904.patch
>
>
> I've been looking into PDFBOX-903 and I came across a potential issue with
> the COSString class.
> The issue occurs when you construct an instance of COSString and pass a
> UTF-16-encoded String.
> The current code (trunk) checks the passed String parameter in the
> constructor to see if it is UTF-16. It does this by looking for char values
> above 255.
> Whilst a String that contains char values greater than 255 is likely to be
> UTF-16, it is possible to have UTF-16-encoded Strings whose characters do not
> exceed this limit.
> These Strings would be incorrectly marked as being not unicode16. An example
> (from the upcoming patch)
> /**してく */
> String textHighBits = "\u3057\u3066\u304f";
> Furthermore, if you construct a COSString using the COSString(byte[])
> constructor, then the COSString class cannot know what the encoding is.
> I will attach a patch in a moment which includes a test case to reproduce the
> issue and a fix for the product code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
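
For readers following the issue, the check it describes (scanning for char values above 255) can be sketched roughly as below. This is an illustration only, not the COSString source and not the attached patch: the class name Utf16HeuristicSketch and both helper methods are hypothetical. The second helper illustrates the byte[] point, namely that the same bytes read as different text depending on which charset is assumed.

```java
import java.nio.charset.StandardCharsets;

public class Utf16HeuristicSketch {

    // Hypothetical helper mirroring the check described in the issue:
    // report "unicode16" only if some char value is above 255.
    public static boolean hasCharAbove255(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: decode the same bytes under two assumed charsets
    // and report whether the resulting text differs. A bare byte[] carries
    // no record of which charset produced it.
    public static boolean decodesDifferently(byte[] data) {
        String asLatin1 = new String(data, StandardCharsets.ISO_8859_1);
        String asUtf16 = new String(data, StandardCharsets.UTF_16BE);
        return !asLatin1.equals(asUtf16);
    }

    public static void main(String[] args) {
        String ascii = "abc";
        // UTF-16BE encoding of ASCII text: every char stays below 256.
        byte[] utf16Bytes = ascii.getBytes(StandardCharsets.UTF_16BE);

        System.out.println("char above 255 found: " + hasCharAbove255(ascii));
        System.out.println("UTF-16BE byte count:  " + utf16Bytes.length);
        System.out.println("decodes differently:  " + decodesDifferently(utf16Bytes));
    }
}
```

The sketch deliberately takes no position on how COSString should resolve the ambiguity; it only shows that neither the char-value scan nor the raw bytes pin down the intended encoding.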