Potential issue with COSString and UTF-16-encoded Strings.
----------------------------------------------------------

                 Key: PDFBOX-904
                 URL: https://issues.apache.org/jira/browse/PDFBOX-904
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.3.1
            Reporter: Neil McErlean

I've been looking into PDFBOX-903 and I came across a potential issue with the COSString class. The issue occurs when you construct an instance of COSString and pass a UTF-16-encoded String.

The current code (trunk) checks the passed String parameter in the constructor to see if it is UTF-16. It does this by looking for char values above 255.
Whilst a String that contains char values greater than 255 is likely to be UTF-16, it is possible to have UTF-16-encoded Strings whose characters do not exceed this limit. These Strings would be incorrectly marked as being not unicode16.

An example (from the upcoming patch):

    /** してく */
    String textHighBits = "\u3057\u3066\u304f";

Furthermore, if you construct a COSString using the COSString(byte[]) constructor, then the COSString class cannot know what the encoding is.

I will attach a patch in a moment which includes a test case to reproduce the issue and a fix for the product code.
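
To make the two points above concrete, here is a minimal sketch of a "any char above 255" check and of what happens when only raw bytes are available. It is an illustration written for this report, not the COSString code from trunk; the class name CharRangeHeuristicSketch and both method names are hypothetical, and only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; this is not PDFBox code.
final class CharRangeHeuristicSketch {

    // Returns true if any char in the text has a value above 255,
    // which is the kind of check the report describes.
    static boolean hasCharAbove255(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    // Encodes the text as UTF-16BE, then reads the raw bytes back one byte
    // per char, as code with no encoding information might do.
    static String reinterpretUtf16BytesAsLatin1(String text) {
        byte[] utf16Bytes = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16Bytes, StandardCharsets.ISO_8859_1);
    }
}
```

With this sketch, hasCharAbove255 returns true for the Japanese sample and false for a plain ASCII String such as "abc", even though "abc" can just as well be stored as UTF-16. Once the text has been reduced to bytes, every value fits in 0..255, so the same check returns false for the reinterpreted Japanese sample too. That is the situation a byte[] constructor is in: without a byte order mark or an explicit encoding argument, the range of the values alone cannot say which encoding was used.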