[ 
https://issues.apache.org/jira/browse/CASSANDRA-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-8101:
-----------------------------------
    Attachment: 8101.txt

The ascii problem was exactly as the description says, and the changes in 
AsciiType fix that.
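
To illustrate the ascii case, here is a minimal standalone sketch of lenient versus strict encoding. It is not the code from 8101.txt; the class {{StrictAsciiExample}} and the method {{encodeStrict}} are hypothetical names, and it only assumes the standard {{java.nio.charset}} API:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not the AsciiType change in the patch.
class StrictAsciiExample
{
    // Encodes s as US-ASCII and throws instead of substituting '?'.
    static ByteBuffer encodeStrict(String s) throws CharacterCodingException
    {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return encoder.encode(CharBuffer.wrap(s));
    }

    public static void main(String[] args)
    {
        // The four Greek characters from the issue description, written as escapes.
        String literal = "\u038E\u0394\u03B4\u03E0";

        // Lenient path: String.getBytes(Charset) silently replaces each character.
        byte[] lenient = literal.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(lenient, StandardCharsets.US_ASCII)); // prints ????

        // Strict path: the encoder reports the first unmappable character.
        try
        {
            encodeStrict(literal);
            System.out.println("accepted");
        }
        catch (CharacterCodingException e)
        {
            System.out.println("rejected: " + e);
        }
    }
}
```

With {{CodingErrorAction.REPORT}}, a caller can turn the exception into an invalid-request error rather than storing replacement characters.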

When it comes to UTF-8, the issue runs deeper.  There turned out to be a Netty 
bug (which I will open a ticket for shortly) that caused characters outside of 
the specified charset to be replaced (by \uFFFD).  Since the native protocol 
specifies that all strings must be UTF-8, the validation happens in 
CBUtil.readString().
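
As a rough illustration of validating while decoding, the sketch below contrasts a lenient decode, which yields \uFFFD, with a strict one that throws. It uses plain {{java.nio}} rather than Netty buffers and is not the actual {{CBUtil.readString()}} implementation; {{StrictUtf8Example}} and {{decodeStrict}} are hypothetical names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not the code in CBUtil.
class StrictUtf8Example
{
    // Decodes bytes as UTF-8 and throws on malformed input instead of emitting \uFFFD.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException
    {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args)
    {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] invalid = { 'a', (byte) 0xFF };

        // Lenient path: the bad byte is silently replaced by U+FFFD.
        String lenient = new String(invalid, StandardCharsets.UTF_8);
        System.out.println(lenient.indexOf('\uFFFD') >= 0); // prints true

        // Strict path: the decoder reports the malformed byte.
        try
        {
            decodeStrict(invalid);
            System.out.println("accepted");
        }
        catch (CharacterCodingException e)
        {
            System.out.println("rejected: " + e);
        }
    }
}
```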

I've pushed a 
[dtest|https://github.com/thobbs/cassandra-dtest/tree/CASSANDRA-8101] to cover 
both cases.  In addition to the attached patch, there's also a 
[branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-8101].

> Invalid ASCII and UTF-8 chars not rejected in CQL string literals
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-8101
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8101
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>            Priority: Critical
>             Fix For: 2.0.11, 2.1.1
>
>         Attachments: 8101.txt
>
>
> When processing CQL string literals, we ultimately use 
> {{String.getBytes(Charset)}}, which has the following note:
> {quote}
> This method always replaces malformed-input and unmappable-character 
> sequences with this charset's default replacement byte array. The 
> CharsetEncoder class should be used when more control over the encoding 
> process is required.
> {quote}
> So, if we insert a non-ASCII character into an ascii string literal, it will 
> be replaced with a {{?}} char.  Something similar happens for UTF-8.
> For example:
> {noformat}
> cqlsh:ks1> create table badstrings (a int primary key, b ascii);
> cqlsh:ks1> insert into badstrings (a, b) VALUES ( 0, 'ΎΔδϠ');
> cqlsh:ks1> select * from badstrings;
>  a | b
> ---+------
>  0 | ????
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
