[ https://issues.apache.org/jira/browse/CASSANDRA-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-5198:
----------------------------------------

    Attachment: 0003-Respect-partitioner-type-for-Token-function.txt
                0002-Improve-printing-of-type-in-error-message.txt
                0001-Respect-CQL3-constant-types.txt

Attached 3 patches related to the proposed changes above:
# The first one adds proper type validation. In other words, it rejects a string value when the column is an int, or rejects an int value when the column is a blob (instead of interpreting it as a hex value, which I'm pretty sure is counter-intuitive). This does, however, also reject a string value when the column is a blob, because I'm far from convinced that interpreting the content of the string as a hex value is particularly intuitive. But to allow inserting blobs, it adds a new type of hex constant (which must start with '0x'). In other words, if b is a blob column:
{noformat}
UPDATE ... SET b = '00ff' ...
{noformat}
is not valid anymore, but
{noformat}
UPDATE ... SET b = 0x00ff ...
{noformat}
is. I note that the patch isn't tiny, because it required a few refactorings here and there to be done properly, but overall I think those refactorings actually improve the code.
# The second patch is mainly cosmetic and makes sure we use CQL3 types in CQL3 error messages, i.e. 'map<text, int>' rather than 'org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.Int32Type)'.
# The third patch makes sure we take the partitioner's token type into account. So if your partitioner is M3P you should provide a bigint value, if it's RP a varint one, and if it's OPP a blob one.

These patches don't yet add support for the token function in the select clause that I talked about above. I also want to add conversion functions that allow, say, converting a string or a uuid to a blob, but I want to refactor the (currently ugly) handling of functions a bit first, so that will follow later (and it can be done in another ticket).
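A minimal Python sketch of the validation rules described in the first and third patches, to make the behavior concrete. It assumes a simplified model in which a CQL3 constant is classified purely by its syntax (quoted string, integer, or '0x' hex) and checked against the expected type. The names classify_literal, check_literal, check_raw_token, ACCEPTED_KINDS and TOKEN_TYPE_BY_PARTITIONER are hypothetical and are not taken from the patches. It also assumes that the partitioner rule applies to a raw token value compared against token(...), as in 'token(username) > 0'.

```python
import re

# Hypothetical table restating the third patch: the CQL3 type expected for a
# raw token value depends on the partitioner (M3P -> bigint, RP -> varint, OPP -> blob).
TOKEN_TYPE_BY_PARTITIONER = {"M3P": "bigint", "RP": "varint", "OPP": "blob"}

# Simplified view of the first patch: which literal kinds each type accepts.
# Note that blob no longer accepts quoted strings, only 0x-prefixed hex.
ACCEPTED_KINDS = {
    "text": {"string"},
    "int": {"integer"},
    "bigint": {"integer"},
    "varint": {"integer"},
    "blob": {"hex"},
}

# Bit widths for the fixed-size integer types; varint is unbounded.
INT_BITS = {"int": 32, "bigint": 64}

_HEX = re.compile(r"0[xX][0-9a-fA-F]*")
_INT = re.compile(r"-?[0-9]+")


def classify_literal(literal: str) -> str:
    """Classify a constant by its syntax alone (hypothetical helper)."""
    if len(literal) >= 2 and literal[0] == "'" and literal[-1] == "'":
        return "string"
    if _HEX.fullmatch(literal):
        return "hex"
    if _INT.fullmatch(literal):
        return "integer"
    raise ValueError(f"unrecognized constant: {literal}")


def check_literal(literal: str, cql_type: str):
    """Reject a constant whose kind does not match cql_type, else return its value."""
    kind = classify_literal(literal)
    if kind not in ACCEPTED_KINDS[cql_type]:
        # Illustrative wording, not Cassandra's actual error message.
        raise ValueError(f"invalid {kind} constant ({literal}) for type {cql_type}")
    if kind == "hex":
        return bytes.fromhex(literal[2:])  # "0x00ff" -> b"\x00\xff"
    if kind == "integer":
        value = int(literal)
        bits = INT_BITS.get(cql_type)
        if bits is not None and not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
            raise ValueError(f"{literal} is out of range for type {cql_type}")
        return value
    return literal[1:-1]  # strip the surrounding quotes


def check_raw_token(literal: str, partitioner: str):
    """Assumed rule: a value compared against token(...) must use the partitioner's token type."""
    return check_literal(literal, TOKEN_TYPE_BY_PARTITIONER[partitioner])
```

Under this sketch, check_literal("0x00ff", "blob") returns b'\x00\xff', while check_literal("'00ff'", "blob") and check_literal("0", "text") both raise ValueError; the latter corresponds to passing an integer to token() for a text key such as username.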
> token () function automatically coerces types leading to confusing output
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5198
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5198
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Edward Capriolo
>            Priority: Minor
>         Attachments: 0001-Respect-CQL3-constant-types.txt,
>                      0002-Improve-printing-of-type-in-error-message.txt,
>                      0003-Respect-partitioner-type-for-Token-function.txt
>
>
> This works as it should.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token('') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  bsmith    | null         | null  | bob       | smith    | null
>  scapriolo | null         | null  | stacey    | capriolo | null
>  ecapriolo | null         | null  | edward    | capriolo | null
> cqlsh:movies> select * from users where token (username) > token('bsmith') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo | null         | null  | stacey    | capriolo | null
>  ecapriolo | null         | null  | edward    | capriolo | null
> cqlsh:movies> select * from users where token (username) > token('scapriolo') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo | null         | null  | edward    | capriolo | null
> {noformat}
> But look what happens when you supply numbers to the token function.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token(0) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo | null         | null  | edward    | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(1134314) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  bsmith    | null         | null  | bob       | smith    | null
>  scapriolo | null         | null  | stacey    | capriolo | null
>  ecapriolo | null         | null  | edward    | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(113431431) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo | null         | null  | stacey    | capriolo | null
>  ecapriolo | null         | null  | edward    | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(1134) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo | null         | null  | edward    | capriolo | null
> cqlsh:movies> select * from users where token (username) > token(1134434) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo | null         | null  | stacey    | capriolo | null
> {noformat}
> This does not make sense to me. The token function is apparently converting
> integers to strings, leading to seemingly unpredictable results.
> However, I find this syntax odd: I feel like I should be able to say
> 'token(username) > 0 and token(username) < 10', because from the Thrift side I
> can page tokens or I can page keys. In this case, I guess, I am only able to
> page keys because the token is not returned to the user.
> Is token 0 = ''?
> How do I arrive at the minimal token for an int column?
> Should the token() function at least be smart enough to reject integers for
> string columns?
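To illustrate why the integer results above look unpredictable, here is a small Python sketch. The ticket does not say which partitioner the cluster used, so it assumes RandomPartitioner-style hashing, where a key's token is the absolute value of its MD5 digest read as a signed 128-bit integer, and it assumes the integer literal is coerced to its decimal text before hashing, as the report suggests. The sentinel MINIMUM_TOKEN = -1 for the empty key is also an assumption, chosen to match the first session, where token('') sorts before every row. The names rp_token and coerced_token are hypothetical.

```python
import hashlib

# Assumption: the empty key maps to a minimum-token sentinel that sorts before
# every hashed token (consistent with token('') returning all rows above).
MINIMUM_TOKEN = -1


def rp_token(key: bytes) -> int:
    """RandomPartitioner-style token: |MD5(key)| read as a signed big-endian integer."""
    if not key:
        return MINIMUM_TOKEN
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def coerced_token(n: int) -> int:
    """Token for token(n) if the integer literal is first coerced to its decimal text."""
    return rp_token(str(n).encode("utf-8"))


if __name__ == "__main__":
    literals = [0, 1134, 1134314, 1134434, 113431431]
    for n in literals:
        print(f"token({n}) -> {coerced_token(n)}")
    # Ordering by token is generally unrelated to numeric ordering.
    print("sorted by token:", sorted(literals, key=coerced_token))
    print("token(0) == token(''):", coerced_token(0) == rp_token(b""))
```

Under these assumptions, token(1134) is the hash of the text '1134', an essentially arbitrary point on the ring, so the rows returned by 'token(username) > token(n)' do not grow or shrink with n. Likewise, token(0) is the hash of '0' rather than the minimum token, so it is not equal to token('').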