[ https://issues.apache.org/jira/browse/CASSANDRA-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224545#comment-13224545 ]
paul cannon commented on CASSANDRA-4003:
----------------------------------------

Sure. Since the CQL driver deserializes column names before the client software (cqlsh) can see them, and does not expose the Cassandra data type of the column names, it was not always possible to determine from the returned column names how they were meant to be interpreted. For example, it was sometimes impossible to tell TimeUUIDType from UUIDType, to tell the various integer and counter types apart, or even to tell BytesType from AsciiType.

Cqlsh makes an effort to display data in its most meaningful form and, secondarily, to use colors to visually distinguish data that would otherwise be ambiguous. So it needs to know the original column name type. The CQL driver does not expose that, so this code uses driver internals to get it.

Clearly it would make more sense to expose the info from the driver side, and I plan to do that, but it takes some extra process and testing. This hack is backwards compatible with older CQL driver versions, but possibly not forwards compatible. Maybe it would be best to do a runtime check against the driver to see if it supports exposing column types before making this call.
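A rough sketch of the kind of runtime check suggested in the comment, for illustration only. The attribute names `column_name_types` (a public attribute a future driver might offer) and `_name_types` (a driver internal) are hypothetical and are not taken from the actual CQL driver; the stand-in cursor classes exist only to exercise the function.

```python
def lookup_name_types(cursor,
                      public_attr="column_name_types",  # hypothetical public API
                      internal_attr="_name_types"):     # hypothetical internal
    """Return (types, source_attr), preferring the public attribute.

    Falls back to the internal attribute for older drivers, and returns
    (None, None) when neither is available so the caller can degrade
    gracefully instead of raising AttributeError.
    """
    for attr in (public_attr, internal_attr):
        value = getattr(cursor, attr, None)
        if value is not None:
            return list(value), attr
    return None, None


class OldCursor:
    """Stand-in for an older driver that only has the internal attribute."""
    _name_types = ["TimeUUIDType", "UUIDType"]


class NewCursor:
    """Stand-in for a newer driver that exposes the types publicly."""
    column_name_types = ["TimeUUIDType", "UUIDType"]
```

With a check like this, cqlsh could keep working against older drivers while picking up a public interface as soon as one exists, and fall back to untyped display when neither is present.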
> cqlsh still failing to handle decode errors in some column names
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-4003
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4003
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>    Affects Versions: 1.0.8
>            Reporter: paul cannon
>            Assignee: paul cannon
>            Priority: Minor
>              Labels: cqlsh
>             Fix For: 1.0.9
>
>
> Columns which are expected to be text, but which are not valid utf8, cause
> cqlsh to display an error and not show any output:
> {noformat}
> cqlsh:ks> CREATE COLUMNFAMILY test (a text PRIMARY KEY) WITH comparator = timestamp;
> cqlsh:ks> INSERT INTO test (a, '2012-03-05') VALUES ('val1', 'val2');
> cqlsh:ks> ASSUME test NAMES ARE text;
> cqlsh:ks> select * from test;
> 'utf8' codec can't decode byte 0xe1 in position 4: invalid continuation byte
> {noformat}
> the traceback with cqlsh --debug:
> {noformat}
> Traceback (most recent call last):
>   File "bin/cqlsh", line 581, in onecmd
>     self.handle_statement(st)
>   File "bin/cqlsh", line 606, in handle_statement
>     return custom_handler(parsed)
>   File "bin/cqlsh", line 663, in do_select
>     self.perform_statement_as_tokens(parsed.matched, decoder=decoder)
>   File "bin/cqlsh", line 666, in perform_statement_as_tokens
>     return self.perform_statement(cqlhandling.cql_detokenize(tokens), decoder=decoder)
>   File "bin/cqlsh", line 693, in perform_statement
>     self.print_result(self.cursor)
>   File "bin/cqlsh", line 728, in print_result
>     self.print_static_result(cursor)
>   File "bin/cqlsh", line 742, in print_static_result
>     formatted_names = map(self.myformat_colname, colnames)
>   File "bin/cqlsh", line 413, in myformat_colname
>     wcwidth.wcswidth(name.decode(self.output_codec.name)))
>   File "/usr/local/Cellar/python/2.7.2/lib/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 4: invalid continuation byte
> {noformat}
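For illustration, the failure in the traceback above comes from a strict decode of a column name that is not valid UTF-8. A minimal sketch of a more tolerant formatter follows, written for Python 3 rather than the Python 2.7 shown in the traceback; the function name `format_colname` is hypothetical and this is not necessarily how the actual fix for this issue works.

```python
def format_colname(raw, encoding="utf-8"):
    """Return (text, decoded_cleanly) for a raw column name given as bytes.

    A strict decode is tried first. If the bytes are not valid in the
    requested encoding, the valid parts are kept and each undecodable
    byte is shown as a \\xNN escape, so the row can still be displayed.
    """
    try:
        return raw.decode(encoding), True
    except UnicodeDecodeError:
        # "backslashreplace" is accepted for decoding since Python 3.5.
        return raw.decode(encoding, errors="backslashreplace"), False
```

The boolean in the result lets the caller render names that did not decode cleanly in a different color, in line with the goal of visually distinguishing ambiguous data.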