[ https://issues.apache.org/jira/browse/DERBY-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488197 ]
Kristian Waagan commented on DERBY-2346:
----------------------------------------

Regarding the UTF-8 char -> byte -> char conversion using String methods, I don't think it is a bug. Unmappable "chars" are represented by '?' (0x3f / 63). In the snippet above, (char)56249 (0xdbb9) is a high surrogate from the High Private Use Surrogates block (U+DB80 to U+DBFF). On its own it is not a valid Unicode character, and paired with a low surrogate it would only address a code point in a supplementary private-use plane. Those code points are reserved for private use, and the Unicode standard does not define any characters for them.

You could use DataOutput/DataInput and writeUTF/readUTF, but I don't know how efficient this would be. These methods write the strings in the modified UTF-8 format, and with them the equals in the example above returns true.

I think writing your own method would be acceptable, but it would be interesting if anyone took the time to investigate the CPU/space differences (i.e. what kind of stream can we use underneath? ByteArrayOutputStream? A subclass of it that returns a reference to the byte array?)

Even though the example uses a "very special codepoint", the database should handle it. An application could potentially use it for its own custom character (not quite sure how though). Further, it seems the "UTF-8" encoding (as used in String.getBytes()) does not promise to encode all unsigned 16-bit values, but only valid Unicode characters.

I'm not very good with the Unicode terminology, so there might be errors in my comment, and important points may be missing. Feel free to correct me.

> Provide set methods for clob for embedded driver
> ------------------------------------------------
>
>                 Key: DERBY-2346
>                 URL: https://issues.apache.org/jira/browse/DERBY-2346
>             Project: Derby
>          Issue Type: Sub-task
>          Components: JDBC
>    Affects Versions: 10.3.0.0
>            Reporter: Anurag Shekhar
>         Assigned To: Anurag Shekhar
>         Attachments: derby-2346-only_for_review.diff, derby-2346.v1.diff
>
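
For anyone who wants to reproduce the two behaviours described in the comment, here is a minimal sketch, not the code from the patch. The class and method names (SurrogateRoundTrip, viaGetBytes, viaModifiedUtf8) are made up for illustration, and StandardCharsets assumes Java 7 or later (on older JVMs, getBytes("UTF-8") would be the equivalent call). It round-trips (char)56249 once through String.getBytes and once through writeUTF/readUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class SurrogateRoundTrip {

    /** char -> byte -> char using String methods and standard UTF-8. */
    public static String viaGetBytes(String s) {
        // Unmappable chars (such as an unpaired surrogate) are replaced by '?'.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** char -> byte -> char using writeUTF/readUTF (modified UTF-8). */
    public static String viaModifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                // Each 16-bit char is encoded on its own, surrogates included.
                out.writeUTF(s);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            // writeUTF throws UTFDataFormatException for over-long strings.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = String.valueOf((char) 56249); // 0xdbb9, unpaired high surrogate

        // Prints false: the surrogate comes back as "?".
        System.out.println("getBytes round trip equal: "
                + original.equals(viaGetBytes(original)));

        // Prints true: modified UTF-8 preserves the raw 16-bit value.
        System.out.println("writeUTF round trip equal: "
                + original.equals(viaModifiedUtf8(original)));
    }
}
```

One thing to keep in mind if this route is considered for Clob data: writeUTF prefixes the data with a two-byte length, so a single call can encode at most 65535 bytes. Longer values would have to be split into chunks or handled by a custom encoder, which is one more reason the "write your own method" option may be worth measuring.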