[ https://issues.apache.org/jira/browse/PDFBOX-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neil McErlean updated PDFBOX-904:
---------------------------------

    Attachment: PDFBOX-904.patch

Here's a patch with a test case that builds several java.lang.Strings, 
constructs COSStrings from them, and reads back the String, the byte[] and the 
PDF string that is written out.


> Potential issue with COSString and UTF-16-encoded Strings.
> ----------------------------------------------------------
>
>                 Key: PDFBOX-904
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-904
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.4.0
>            Reporter: Neil McErlean
>         Attachments: PDFBOX-904.patch
>
>
> I've been looking into PDFBOX-903 and I came across a potential issue with 
> the COSString class.
> The issue occurs when you construct an instance of COSString and pass a 
> UTF-16-encoded String.
> The current code (trunk) checks the passed String parameter in the 
> constructor to see if it is UTF-16. It does this by looking for char values 
> above 255.
> Whilst a String that contains char values greater than 255 is likely to be 
> UTF-16, it is possible to have UTF-16-encoded Strings whose characters do not 
> exceed this limit.
> These Strings would be incorrectly marked as not being unicode16. An example 
> (from the upcoming patch):
> /**してく */
> String textHighBits =  "\u3057\u3066\u304f";
> Furthermore, if you construct a COSString using the COSString(byte[]) 
> constructor, then the COSString class cannot know what the encoding is.
> I will attach a patch in a moment which includes a test case to reproduce the 
> issue and a fix for the product code.
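
For illustration only, here is a minimal, self-contained sketch of the two points in the description: a "char value above 255" check, and the ambiguity of raw bytes when the encoding is not known. It does not use PDFBox. The class name Utf16HeuristicSketch, the helper hasCharAbove255 and the sample String latinOnly are hypothetical names made up for this sketch, and the helper only approximates the check as described above, not the actual trunk code.

```java
import java.nio.charset.StandardCharsets;

public class Utf16HeuristicSketch {

    // Hypothetical helper approximating the check described in the issue:
    // report true only if at least one char value exceeds 255.
    static boolean hasCharAbove255(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 255) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Example String from the issue description.
        String textHighBits = "\u3057\u3066\u304f";
        // Assumed sample: every char value is 255 or below.
        String latinOnly = "caf\u00e9";

        System.out.println("textHighBits flagged: " + hasCharAbove255(textHighBits));
        System.out.println("latinOnly flagged:    " + hasCharAbove255(latinOnly));

        // A byte[] carries no encoding information of its own: the same
        // bytes decode to different Strings under different charsets.
        byte[] raw = textHighBits.getBytes(StandardCharsets.UTF_16BE);
        String asUtf16 = new String(raw, StandardCharsets.UTF_16BE);
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println("round trip via UTF-16BE ok: " + asUtf16.equals(textHighBits));
        System.out.println("same bytes as ISO-8859-1:   " + asLatin1);
    }
}
```

A check of this kind cannot flag latinOnly, whatever encoding the caller intended for it, and a consumer that is handed only the raw bytes has to be told the charset separately.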

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
