Tyler Hobbs created CASSANDRA-8101:
--------------------------------------

             Summary: Invalid ASCII and UTF-8 chars not rejected in CQL string literals
                 Key: CASSANDRA-8101
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8101
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Tyler Hobbs
            Assignee: Tyler Hobbs
            Priority: Critical
             Fix For: 2.0.11, 2.1.1

When processing CQL string literals, we ultimately use {{String.getBytes(Charset)}}, which has the following note:

{quote}
This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.  The CharsetEncoder class should be used when more control over the encoding process is required.
{quote}

So, if we insert a non-ASCII character into an ascii string literal, it will be replaced with a {{?}} char.  Something similar happens for UTF-8.

For example:
{noformat}
cqlsh:ks1> create table badstrings (a int primary key, b ascii);
cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
cqlsh:ks1> select * from badstrings;

 a | b
---+------
 0 | ????
{noformat}
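
To illustrate the difference the javadoc note describes, here is a minimal standalone sketch (not the actual Cassandra code or the eventual patch). The class name {{StrictEncoding}} and the helpers {{lenient}}, {{strict}} and {{encodable}} are hypothetical names made up for this example; only the JDK calls ({{String.getBytes(Charset)}}, {{Charset.newEncoder()}}, {{CodingErrorAction.REPORT}}) are real. It assumes that rejecting a literal would amount to running it through a {{CharsetEncoder}} configured to report errors instead of replacing them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncoding {
    // Lenient path: String.getBytes(Charset) silently substitutes the
    // charset's replacement bytes for anything it cannot encode.
    public static byte[] lenient(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Strict path (hypothetical helper): a CharsetEncoder set to REPORT
    // throws CharacterCodingException instead of substituting.
    public static byte[] strict(String s, Charset cs) throws CharacterCodingException {
        ByteBuffer buf = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Convenience wrapper: true if the string encodes cleanly in the charset.
    public static boolean encodable(String s, Charset cs) {
        try {
            strict(s, cs);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The four Greek characters from the cqlsh example, written as escapes.
        String greek = "\u038E\u0394\u03B4\u03E0";
        byte[] replaced = lenient(greek, StandardCharsets.US_ASCII);
        System.out.println(new String(replaced, StandardCharsets.US_ASCII)); // ????
        System.out.println(encodable(greek, StandardCharsets.US_ASCII));     // false
        System.out.println(encodable(greek, StandardCharsets.UTF_8));        // true
        // An unpaired surrogate is malformed input for the UTF-8 encoder.
        System.out.println(encodable("\uD800", StandardCharsets.UTF_8));     // false
    }
}
```

With the lenient path the Greek literal comes back as {{????}}, matching the cqlsh output above, while the strict path fails for the ascii case and for an unpaired surrogate in the UTF-8 case, so the statement could be rejected with an error.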