Potential issue with COSString and UTF-16-encoded Strings.
----------------------------------------------------------

                 Key: PDFBOX-904
                 URL: https://issues.apache.org/jira/browse/PDFBOX-904
             Project: PDFBox
          Issue Type: Bug
          Components: PDModel
    Affects Versions: 1.3.1
            Reporter: Neil McErlean


I've been looking into PDFBOX-903 and I came across a potential issue with the 
COSString class.

The issue occurs when you construct an instance of COSString and pass a 
UTF-16-encoded String.
The current code (trunk) checks the passed String parameter in the constructor 
to see if it is UTF-16. It does this by looking for char values above 255.
Whilst a String that contains char values greater than 255 is likely to be 
UTF-16, it is possible to have UTF-16-encoded Strings whose characters do not 
exceed this limit.
These Strings would be incorrectly marked as not being unicode16. An example 
(from the upcoming patch):

    /** してく */
    String textHighBits = "\u3057\u3066\u304f";
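
For illustration, the check described above can be sketched roughly as 
follows. This is a minimal sketch of the heuristic as this report describes 
it, not the actual COSString source; the class and method names 
(Utf16HeuristicSketch, hasCharAbove255) are hypothetical.

```java
// Hypothetical sketch of the "any char above 255" heuristic described in
// this report. It is NOT the real COSString code.
final class Utf16HeuristicSketch {

    // Returns true if at least one char in the String has a value above 255,
    // which is the condition the report says is used to detect UTF-16.
    static boolean hasCharAbove255(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String textHighBits = "\u3057\u3066\u304f";
        String asciiOnly = "abc";
        System.out.println(hasCharAbove255(textHighBits));
        System.out.println(hasCharAbove255(asciiOnly));
    }
}
```

The check returns false for any String whose chars all fit in 0..255, 
whatever encoding the caller intended, so the char values alone cannot 
settle the question.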

Furthermore, if you construct a COSString using the COSString(byte[]) 
constructor, then the COSString class cannot know what the encoding is.
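
To show why a bare byte[] is ambiguous, here is a small sketch that decodes 
the same bytes in two ways. It assumes the PDF convention that a UTF-16BE 
text string starts with the byte order mark 0xFE 0xFF, and it uses 
ISO-8859-1 as a simplified stand-in for the single-byte case. The class and 
method names are hypothetical and are not part of PDFBox.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: guessing the encoding of raw bytes from a leading
// UTF-16BE byte order mark. It is NOT the real COSString code.
final class ByteArrayEncodingSketch {

    // True if the data begins with the UTF-16BE byte order mark FE FF.
    static boolean startsWithUtf16BeBom(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFE
                && (data[1] & 0xFF) == 0xFF;
    }

    // Decodes as UTF-16BE when a BOM is present; otherwise falls back to
    // ISO-8859-1 (assumption: a stand-in for the single-byte encoding).
    static String decode(byte[] data) {
        if (startsWithUtf16BeBom(data)) {
            return new String(data, 2, data.length - 2, StandardCharsets.UTF_16BE);
        }
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Prepends the FE FF byte order mark to UTF-16BE encoded text.
    static byte[] encodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) {
        String text = "\u3057\u3066\u304f";
        byte[] withoutBom = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] withBom = encodeWithBom(text);
        // Without the BOM the bytes are read as six single-byte chars.
        System.out.println(decode(withoutBom).equals(text));
        // With the BOM the original text is recovered.
        System.out.println(decode(withBom).equals(text));
    }
}
```

Without a byte order mark or some other explicit indication from the 
caller, the two interpretations cannot be told apart from the bytes alone.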

I will attach a patch in a moment which includes a test case to reproduce the 
issue and a fix for the product code.



