[ https://issues.apache.org/jira/browse/CASSANDRA-8790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317902#comment-14317902 ]
Johnny Miller edited comment on CASSANDRA-8790 at 2/12/15 9:48 AM: ------------------------------------------------------------------- [~brandon.williams] I appreciate your point, but [~thobbs] example isn't the actual problem. Its specifically non-printable unicode characters. This isn't a problem, when your using the drivers - it is only when using CQLSH. The problem is that if I start writing data like this via the drivers and then for some reason I ever need to query it via CQLSH I will get the wrong answer when doing a select. i.e. SELECT * from testunicode where id = 'state\u001Ccard' will not return any results via CQLSH when the data does actually exist in my table and the same query via the drivers (java) would actually return a result. The workaround to use blobs is not great (IMHO) - its a shame to have model your data round this specific CQLSH limitation We should either provide some functionality to handle this this or alternatively error if someone enters a non-printable unicode character in CQLSH as the answer we get back is incorrect and likely to mislead people. If you have an example of handling this via CQLSH on any OS, please share it was (Author: johnny15676): [~brandon.williams] I appreciate your point, but [~thobbs] example isn't the actual problem. Its specifically non-printable unicode characters. This isn't a problem, when your using the drivers - it is only when using CQLSH. The problem is that if I start writing data like this via the drivers and then for some reason I ever need to query it via CQLSH I will get the wrong answer when doing a select. i.e. SELECT * from testunicode where id = 'state\u001Ccard' will not return any results via CQLSH when the data does actually exist in my table and the same query via the drivers (java) would actually return a result. The workaround to use blobs is not great (IMHO) - its a shame to have model your data round this specific CQLSH limitation We should either provide some functionality to handle this this or alternatively error if someone enters a non-printable unicode character in CQLSH as the answer we get back is incorrect and likely to mislead people. > Improve handling of non-printable unicode characters in text fields and CQLSH > ----------------------------------------------------------------------------- > > Key: CASSANDRA-8790 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8790 > Project: Cassandra > Issue Type: Improvement > Reporter: Johnny Miller > Priority: Minor > > Currently to store a string/text value that contains a non-printable unicode > character and than subsequently be able to query it CQLSH I need to store the > field as a blob via the blobAsText and textAsBlob functions. > This is not really optimal - it would be better if CQLSH handled this rather > than having to model data around this limitation. > For example: > {code:title=Example Code|borderStyle=solid} > String createTableCql = "CREATE TABLE IF NOT EXISTS test_ks.testunicode (id > blob PRIMARY KEY, inserted_on timestamp, lorem text)"; > session.execute(createTableCql); > System.out.println("Table created."); > > String dimension1 = "state"; > String dimension2 = "card"; > String key = dimension1 + '\u001C' + dimension2; > Date now = new Date(); > String lorem = "Lorem ipsum dolor sit amet."; > > String insertcql = "INSERT INTO testunicode (id, inserted_on, lorem) VALUES > (textAsBlob(?), ?, ?)"; > PreparedStatement ps = session.prepare(insertcql); > BoundStatement bs = new BoundStatement(ps); > bs.bind(key, now, lorem); > session.execute(bs); > System.out.println("Row inserted with key "+key); > > String selectcql = "SELECT blobAsText(id) AS id, inserted_on, lorem FROM > testunicode WHERE id = textAsBlob(?)"; > PreparedStatement ps2 = session.prepare(selectcql); > BoundStatement bs2 = new BoundStatement(ps2); > bs2.bind(key); > ResultSet results = session.execute(bs2); > > System.out.println("Got results..."); > > for (Row row : results) { > System.out.println(String.format("%-30s\t%-20s\t%-20s", > row.getString("id"), row.getDate("inserted_on"), row.getString("lorem"))); > } > {code} > And to query via CQLSH: > {code} > select * from testunicode where id = 0x73746174651c63617264 ; > id | inserted_on | lorem > ------------------------+--------------------------+----------------------------- > 0x73746174651c63617264 | 2015-02-11 20:32:20+0000 | Lorem ipsum dolor sit > amet. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)