    [ https://issues.apache.org/jira/browse/CASSANDRA-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168199#comment-14168199 ]

Aleksey Yeschenko commented on CASSANDRA-8101:
----------------------------------------------

LGTM. Nit - imports in CBUtil.

> Invalid ASCII and UTF-8 chars not rejected in CQL string literals
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-8101
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8101
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Critical
>             Fix For: 2.0.11, 2.1.1
>
>         Attachments: 8101.txt
>
>
> When processing CQL string literals, we ultimately use 
> {{String.getBytes(Charset)}}, which has the following note:
> {quote}
> This method always replaces malformed-input and unmappable-character 
> sequences with this charset's default replacement byte array. The 
> CharsetEncoder class should be used when more control over the encoding 
> process is required.
> {quote}
> So, if we insert a non-ASCII character into an ascii string literal, it is
> silently replaced with a {{?}} char instead of being rejected.  Something
> similar happens for UTF-8.
> For example:
> {noformat}
> cqlsh:ks1> create table badstrings (a int primary key, b ascii);
> cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
> cqlsh:ks1> select * from badstrings;
>  a | b
> ---+------
>  0 | ????
> {noformat}
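
The behaviour described above can be reproduced outside Cassandra. The following is only a minimal sketch, not the attached patch: the class name StrictEncoding and the helpers encodeStrict and isEncodable are hypothetical names chosen for illustration. It contrasts String.getBytes(Charset), which substitutes the replacement byte, with a CharsetEncoder configured to report errors, as the quoted Javadoc note suggests.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncoding {
    // Hypothetical helper: encode strictly, throwing instead of substituting '?'.
    static ByteBuffer encodeStrict(String s, Charset charset) throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return encoder.encode(CharBuffer.wrap(s));
    }

    // Hypothetical helper: true if the whole string encodes without loss.
    static boolean isEncodable(String s, Charset charset) {
        try {
            encodeStrict(s, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The four Greek characters from the report, written as escapes.
        String bad = "\u038E\u0394\u03B4\u03E0";

        // Lossy path: each unmappable character becomes '?'.
        byte[] lossy = bad.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(lossy, StandardCharsets.US_ASCII)); // prints "????"

        // Strict path: the same input is rejected.
        System.out.println(isEncodable(bad, StandardCharsets.US_ASCII));     // prints "false"
        System.out.println(isEncodable("plain", StandardCharsets.US_ASCII)); // prints "true"

        // An unpaired surrogate is malformed input for UTF-8.
        System.out.println(isEncodable("\uD800", StandardCharsets.UTF_8));   // prints "false"
    }
}
```

With the strict path, a caller can turn the CharacterCodingException into a validation error for the client rather than storing altered data.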



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
