Tyler Hobbs created CASSANDRA-8101:
--------------------------------------

             Summary: Invalid ASCII and UTF-8 chars not rejected in CQL string literals
                 Key: CASSANDRA-8101
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8101
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Tyler Hobbs
            Assignee: Tyler Hobbs
            Priority: Critical
             Fix For: 2.0.11, 2.1.1


When processing CQL string literals, we ultimately use 
{{String.getBytes(Charset)}}, which has the following note:

{quote}
This method always replaces malformed-input and unmappable-character sequences 
with this charset's default replacement byte array. The CharsetEncoder class 
should be used when more control over the encoding process is required.
{quote}

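So, if we insert a string literal containing non-ASCII characters into an {{ascii}} column, each such character is silently replaced with a {{?}} char instead of the literal being rejected. Something similar happens for UTF-8 ({{text}}/{{varchar}}) literals.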
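The lossy behaviour can be reproduced outside Cassandra with plain JDK calls. The sketch below (class and method names are illustrative, not Cassandra code) encodes the same four Greek characters used in the cqlsh example, written as Unicode escapes, with {{String.getBytes(Charset)}}. For the UTF-8 case it assumes a lone surrogate as the kind of input that "something similar" refers to, since that is input UTF-8 cannot represent.

```java
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {
    // String.getBytes(Charset) never fails: unmappable or malformed chars are
    // replaced with the charset's replacement bytes ('?' for US-ASCII and UTF-8).
    static byte[] lossyAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static byte[] lossyUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same characters as the cqlsh example: U+038E U+0394 U+03B4 U+03E0.
        // None of them exists in US-ASCII, so each becomes '?'.
        byte[] ascii = lossyAscii("\u038E\u0394\u03B4\u03E0");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints "????"

        // A lone high surrogate is malformed UTF-16 and has no UTF-8 encoding;
        // getBytes substitutes '?' instead of reporting an error.
        byte[] utf8 = lossyUtf8("a\uD800b");
        System.out.println(new String(utf8, StandardCharsets.UTF_8)); // prints "a?b"
    }
}
```

In both cases the caller gets back a perfectly valid byte array, so nothing downstream can tell that data was lost.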

For example:
{noformat}
cqlsh:ks1> create table badstrings (a int primary key, b ascii);
cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
cqlsh:ks1> select * from badstrings;

 a | b
---+------
 0 | ????
{noformat}
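Following the Javadoc's pointer to {{CharsetEncoder}}, one way to reject such literals is to encode with an encoder configured to report errors rather than replace them. This is a minimal sketch under that assumption; {{encodeStrict}} and {{isEncodable}} are hypothetical helper names, not Cassandra APIs, and how the resulting {{CharacterCodingException}} would be turned into a CQL request error is left open.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictLiteralEncoding {
    // Hypothetical helper: encode a literal, failing instead of substituting '?'.
    static ByteBuffer encodeStrict(String literal, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Throws MalformedInputException or UnmappableCharacterException
        // (both subclasses of CharacterCodingException) on bad input.
        return encoder.encode(CharBuffer.wrap(literal));
    }

    // Hypothetical convenience wrapper used for validation.
    static boolean isEncodable(String literal, Charset charset) {
        try {
            encodeStrict(literal, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String greek = "\u038E\u0394\u03B4\u03E0"; // same value as in the cqlsh example
        System.out.println(isEncodable("hello", StandardCharsets.US_ASCII)); // true
        System.out.println(isEncodable(greek, StandardCharsets.US_ASCII));   // false: unmappable
        System.out.println(isEncodable(greek, StandardCharsets.UTF_8));      // true
        System.out.println(isEncodable("a\uD800b", StandardCharsets.UTF_8)); // false: malformed
    }
}
```

With this approach the {{ascii}} insert above would fail with an error instead of storing {{????}}, while valid UTF-8 text is still accepted unchanged.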



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
