[ https://issues.apache.org/jira/browse/DERBY-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488197 ]

Kristian Waagan commented on DERBY-2346:
----------------------------------------

Regarding the UTF-8 char -> byte -> char conversion using String methods, I 
don't think it is a bug. Unmappable "chars" are represented by '?' (0x3f / 63).
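In the snippet above, (char)56249 (0xdbb9) happens to be a high surrogate from 
the private-use range (0xdb80 - 0xdbff). Paired with a low surrogate it would 
encode a private-use codepoint, for which the Unicode standard does not define 
any character; on its own it is not a valid character at all, so it cannot be 
encoded as UTF-8.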
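
The snippet referred to above is not reproduced in this comment. As a minimal 
sketch of the same kind of round trip, assuming only the String methods are 
used, something like the following shows the '?' substitution. The class name 
LoneSurrogateRoundTrip and the helper roundTrip are made up for illustration, 
and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class LoneSurrogateRoundTrip {

    /** Encodes s to UTF-8 bytes and decodes it again, using only String methods. */
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 56249 == 0xdbb9, a high surrogate with no low surrogate following it.
        String original = String.valueOf((char) 56249);
        String copy = roundTrip(original);

        System.out.println(original.equals(copy));               // prints false
        System.out.println(Integer.toHexString(copy.charAt(0))); // prints 3f ('?')
    }
}
```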

You could use DataOutput/DataInput and writeUTF/readUTF, but I don't know how 
efficient this would be. These methods write the strings in the modified UTF-8 
format, and with them the equals in the example above returns true. I think 
writing your own method would be acceptable, but it would be interesting if 
anyone took the time to investigate the CPU/space differences (i.e. what kind 
of stream can we use underneath? ByteArrayOutputStream? A subclass of it that 
returns a reference to the byte array?)
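
To make the writeUTF/readUTF idea concrete, here is a rough sketch, not a 
proposal for the actual patch. It assumes a ByteArrayOutputStream subclass that 
hands out a reference to its internal buffer; the names ModifiedUtf8RoundTrip, 
ExposedByteArrayOutputStream, buffer() and length() are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8RoundTrip {

    /** Hypothetical subclass: exposes the internal buffer instead of copying it. */
    static final class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
        byte[] buffer() { return buf; }   // protected field of ByteArrayOutputStream
        int length()    { return count; } // number of valid bytes in buf
    }

    /** Writes s with writeUTF and reads it back with readUTF. */
    static String roundTrip(String s) {
        try {
            ExposedByteArrayOutputStream bytes = new ExposedByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s); // 2-byte length prefix followed by modified UTF-8
            out.flush();

            DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.buffer(), 0, bytes.length()));
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = String.valueOf((char) 56249); // 0xdbb9
        System.out.println(original.equals(roundTrip(original))); // prints true
    }
}
```

One thing to keep in mind is that writeUTF refuses strings whose encoded form 
is longer than 65535 bytes, so larger values would have to be written in 
chunks or with a hand-written encoder.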

Even though the example uses a "very special codepoint", the database should 
handle it. An application could potentially use it for its own custom character 
(not quite sure how though). Further, it seems the "UTF-8" encoding (as used in 
String.getBytes()) does not promise to encode all unsigned 16-bit values, but 
only valid Unicode characters.
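
One way to check which 16-bit values the UTF-8 encoder accepts is to ask it 
directly with CharsetEncoder.canEncode(char). The sketch below assumes Java 7 
or later for StandardCharsets; the class name Utf8Encodability and the helper 
encodable are made up for illustration.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Encodability {

    /** Returns true if the single char c can be encoded as UTF-8 on its own. */
    static boolean encodable(char c) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        System.out.println(encodable('A'));           // prints true
        System.out.println(encodable((char) 0xe000)); // prints true (private-use character)
        System.out.println(encodable((char) 0xdbb9)); // prints false (unpaired surrogate)
    }
}
```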

I'm not very good with the Unicode terminology, so there might be errors in my 
comment, and important points may be missing. Feel free to correct me.

> Provide set methods for clob for embedded driver
> ------------------------------------------------
>
>                 Key: DERBY-2346
>                 URL: https://issues.apache.org/jira/browse/DERBY-2346
>             Project: Derby
>          Issue Type: Sub-task
>          Components: JDBC
>    Affects Versions: 10.3.0.0
>            Reporter: Anurag Shekhar
>         Assigned To: Anurag Shekhar
>         Attachments: derby-2346-only_for_review.diff, derby-2346.v1.diff
>
>


