[ https://issues.apache.org/jira/browse/CASSANDRA-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168199#comment-14168199 ]
Aleksey Yeschenko commented on CASSANDRA-8101:
----------------------------------------------

LGTM. Nit - imports in CBUtil.

> Invalid ASCII and UTF-8 chars not rejected in CQL string literals
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-8101
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8101
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Critical
>             Fix For: 2.0.11, 2.1.1
>
>         Attachments: 8101.txt
>
>
> When processing CQL string literals, we ultimately use
> {{String.getBytes(Charset)}}, which has the following note:
> {quote}
> This method always replaces malformed-input and unmappable-character
> sequences with this charset's default replacement byte array. The
> CharsetEncoder class should be used when more control over the encoding
> process is required.
> {quote}
> So, if we insert a non-ASCII character into an ascii string literal, it will
> be replaced with a {{?}} char. Something similar happens for UTF-8.
> For example:
> {noformat}
> cqlsh:ks1> create table badstrings (a int primary key, b ascii);
> cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
> cqlsh:ks1> select * from badstrings;
>  a | b
> ---+------
>  0 | ????
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
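The quoted Javadoc names {{CharsetEncoder}} as the way to get more control over encoding. Below is a minimal Java sketch of that idea. It is not the contents of the attached 8101.txt patch: the class {{StrictEncoding}} and its methods {{encodeStrict}} and {{isValid}} are hypothetical names used only for illustration. The sketch contrasts the lenient {{String.getBytes(Charset)}} path, which substitutes {{?}} bytes, with an encoder configured with {{CodingErrorAction.REPORT}}, which throws a {{CharacterCodingException}} instead. The Greek literal from the cqlsh example is written with Unicode escapes, and a lone surrogate stands in for the UTF-8 case; treating that as the shape of "invalid UTF-8" input on the Java side is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncoding {

    /**
     * Hypothetical helper: encodes s with the given charset, throwing instead of
     * silently substituting replacement bytes for unmappable or malformed input.
     */
    static ByteBuffer encodeStrict(String s, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                // REPORT is already the default for a fresh encoder; it is set
                // explicitly here to make the intent obvious.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // encode(CharBuffer) treats the input as complete and throws on any error.
        return encoder.encode(CharBuffer.wrap(s));
    }

    /** Hypothetical helper: true if s encodes in charset without any substitution. */
    static boolean isValid(String s, Charset charset) {
        try {
            encodeStrict(s, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String greek = "\u038E\u0394\u03B4\u03E0"; // the Greek literal from the cqlsh example
        String loneSurrogate = "\uD800";           // high surrogate with no low surrogate

        // Lenient path: getBytes(Charset) silently substitutes '?' bytes.
        byte[] ascii = greek.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ????
        byte[] utf8 = loneSurrogate.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8));     // ?

        // Strict path: the same inputs are rejected instead of rewritten.
        System.out.println(isValid(greek, StandardCharsets.US_ASCII));      // false
        System.out.println(isValid(greek, StandardCharsets.UTF_8));         // true
        System.out.println(isValid(loneSurrogate, StandardCharsets.UTF_8)); // false
    }
}
```

Reporting an error on bad input, rather than replacing it, is the behavior the issue title asks for: the client would get a validation error instead of a stored {{????}}.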