[ 
https://issues.apache.org/jira/browse/CASSANDRA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127558#comment-15127558
 ] 

Stefania edited comment on CASSANDRA-11030 at 2/2/16 3:10 AM:
--------------------------------------------------------------

You are correct, it finally works. I think I initially inserted the data by 
copy and paste in a git bash terminal (launched via ConEmu), the only terminal 
where I could paste a Unicode character. The default encoding for that terminal 
was cp1252, since I only worked out today how to change it to cp65001, so even 
inserting the data with --encoding=UTF-8 would probably have caused problems. 
From other terminals (command prompt, PowerShell) I could not paste the 
character into cqlsh, and trying to insert something like u'\uXXXX' gave a 
syntax error. 
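
The mismatch described above can be illustrated outside cqlsh. The following Python sketch is not cqlsh code; it only assumes that the terminal hands over cp1252 bytes while the consumer decodes them as UTF-8, and all variable names are made up for illustration:

```python
# Illustrative only: simulates a cp1252 terminal feeding a UTF-8 consumer.
text = u'n\u00e3o'  # the string 'não'

# Bytes a cp1252 terminal would produce for the pasted text: b'n\xe3o'.
typed_bytes = text.encode('cp1252')

# 0xE3 followed by 'o' is not a valid UTF-8 sequence, so strict decoding fails.
try:
    typed_bytes.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding substitutes the replacement character for the bad byte.
lossy = typed_bytes.decode('utf-8', errors='replace')

# When both sides agree on UTF-8, the text survives the round trip.
round_trip = text.encode('utf-8').decode('utf-8')
```

Whether a real terminal shows an error, a question mark, or a replacement glyph depends on the terminal and on how the consumer handles decoding errors; the sketch only shows that the bytes themselves are incompatible.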

The following works, however (unicode.cql is encoded in UTF-8):

{code}
chcp 65001
C:\Users\stefania\git\cstar\cassandra>type unicode.cql
INSERT INTO test.test (val) VALUES ('não');
C:\Users\stefania\git\cstar\cassandra>bin\cqlsh.bat --encoding=UTF-8 --file=unicode.cql
C:\Users\stefania\git\cstar\cassandra>bin\cqlsh.bat --encoding=UTF-8
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.5-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from test.test;

 val
-----
 não
{code}

The source command also works *provided the encoding specified on the command 
line is the same as the file encoding*; otherwise we get a missing-character 
glyph (a square). 
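
As a rough illustration of why the two encodings must agree, the sketch below writes a UTF-8 file like unicode.cql and reads it back with a matching and a mismatched encoding. It is plain Python, not the cqlsh implementation of SOURCE, and the helper name `read_with` is hypothetical:

```python
import io
import os
import tempfile


def read_with(path, encoding):
    """Read a whole file, decoding its bytes with the given encoding."""
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


statement = u"INSERT INTO test.test (val) VALUES ('n\u00e3o');"

# Write the statement as UTF-8, like unicode.cql in the comment above.
fd, path = tempfile.mkstemp(suffix='.cql')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(statement)
    matching = read_with(path, 'utf-8')     # same encoding as the file
    mismatched = read_with(path, 'cp1252')  # wrong encoding for the file
finally:
    os.remove(path)
```

With the matching encoding the statement is recovered unchanged. With cp1252 the two UTF-8 bytes of the accented character are decoded as two separate characters, so the statement that would be executed is no longer the one in the file. The exact symptom on screen (a square, a question mark, or two stray characters) depends on the terminal and font.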

Inserting the character directly from git bash also works now, but only because 
I changed its code page to 65001; without that change it causes the original 
problem.

You are probably right about changing the default encoding; I'm +1 on changing 
it to 'utf-8' if you want. Also, shouldn't {{do_source}} use the same encoding 
as the file encoding? I think we should also stress that whichever terminal 
people use on Windows should have the same encoding as the one used by cqlsh.
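
One possible way to compare a terminal encoding with the one passed to cqlsh is to normalize both names through Python's codec registry, since the same encoding can be spelled in several ways. This is only a sketch; `same_encoding` is a hypothetical helper and does not exist in cqlsh:

```python
import codecs


def same_encoding(a, b):
    """Return True if two encoding names resolve to the same Python codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name


# Different spellings of the same encoding compare equal.
utf8_spellings_match = same_encoding('UTF-8', 'utf8')
# A cp1252 terminal does not match a UTF-8 cqlsh session.
cp1252_vs_utf8 = same_encoding('cp1252', 'utf-8')
```

A caller could pass the terminal's reported encoding (for example `sys.stdout.encoding`, which may be unset when output is redirected) together with the value given to --encoding, and warn when the two differ.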

We can commit this ticket as is and open a new ticket for the default 
encoding, or change it here; up to you.



> utf-8 characters incorrectly displayed/inserted on cqlsh on Windows
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-11030
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11030
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Paulo Motta
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: cqlsh, windows
>
> {noformat}
> C:\Users\Paulo\Repositories\cassandra [2.2-10948 +6 ~1 -0 !]> .\bin\cqlsh.bat --encoding utf-8
> Connected to test at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
> Use HELP for help.
> cqlsh> INSERT INTO bla.test (bla ) VALUES  ('não') ;
> cqlsh> select * from bla.test;
>  bla
> -----
>  n?o
> (1 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
