[ https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050397#comment-16050397 ]
Aleksey Yeschenko edited comment on CASSANDRA-13004 at 6/15/17 5:07 PM: ------------------------------------------------------------------------ Changes look good to me, though, 1. CHANGES.txt could be a bit more verbose here: "After the whole cluster is upgraded to 3.0.14, do a rolling restart of the cluster" - specify that this rolling restart shouldn't set the flag. 2. Boolean.parseBoolean(System.getProperty("cassandra.force_3_0_protocol_version", "false")) can usually be replaced with Boolean.getBoolean("cassandra.force_3_0_protocol_version"), though you might have a reason not to? 3. Since we are adapting the flag version, there is no reason not to make MS version conditional on the 3.11 branch as well, to allow for a rolling upgrade with no interruptions for those on 3.X. It's cheap to do so might as well go for it. Everything else LGTM. was (Author: iamaleksey): Changes look good to me, though, 1. CHANGES.txt could be a bit more verbose here: "After the whole cluster is upgraded to 3.0.14, do a rolling restart of the cluster" - specify that this rolling restart shouldn't set the flag. 2. Boolean.parseBoolean(System.getProperty("cassandra.force_3_0_protocol_version", "false")) can usually be replaced with Boolean.getBoolean("cassandra.force_3_0_protocol_version"), though you might have a reason not to? 3. Since we are adapting the flag version, there is no reason to make MS version conditional on the 3.11 branch as well, to allow for a rolling upgrade with no interruptions for those on 3.X. It's cheap to do so might as well go for it. Everything else LGTM. > Corruption while adding/removing a column to/from the table > ----------------------------------------------------------- > > Key: CASSANDRA-13004 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13004 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata > Reporter: Stanislav Vishnevskiy > Assignee: Alex Petrov > Priority: Blocker > Fix For: 3.0.x, 3.11.x, 4.x > > > We had the following schema in production. > {code:none} > CREATE TYPE IF NOT EXISTS discord_channels.channel_recipient ( > nick text > ); > CREATE TYPE IF NOT EXISTS discord_channels.channel_permission_overwrite ( > id bigint, > type int, > allow_ int, > deny int > ); > CREATE TABLE IF NOT EXISTS discord_channels.channels ( > id bigint, > guild_id bigint, > type tinyint, > name text, > topic text, > position int, > owner_id bigint, > icon_hash text, > recipients map<bigint, frozen<channel_recipient>>, > permission_overwrites map<bigint, frozen<channel_permission_overwrite>>, > bitrate int, > user_limit int, > last_pin_timestamp timestamp, > last_message_id bigint, > PRIMARY KEY (id) > ); > {code} > And then we executed the following alter. > {code:none} > ALTER TABLE discord_channels.channels ADD application_id bigint; > {code} > And one row (that we can tell) got corrupted at the same time and could no > longer be read from the Python driver. > {code:none} > [E 161206 01:56:58 geventreactor:141] Error decoding response from Cassandra. > ver(4); flags(0000); stream(27); op(8); offset(9); len(887); buffer: > '\x84\x00\x00\x1b\x08\x00\x00\x03w\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x0f\x00\x10discord_channels\x00\x08channels\x00\x02id\x00\x02\x00\x0eapplication_id\x00\x02\x00\x07bitrate\x00\t\x00\x08guild_id\x00\x02\x00\ticon_hash\x00\r\x00\x0flast_message_id\x00\x02\x00\x12last_pin_timestamp\x00\x0b\x00\x04name\x00\r\x00\x08owner_id\x00\x02\x00\x15permission_overwrites\x00!\x00\x02\x000\x00\x10discord_channels\x00\x1cchannel_permission_overwrite\x00\x04\x00\x02id\x00\x02\x00\x04type\x00\t\x00\x06allow_\x00\t\x00\x04deny\x00\t\x00\x08position\x00\t\x00\nrecipients\x00!\x00\x02\x000\x00\x10discord_channels\x00\x11channel_recipient\x00\x01\x00\x04nick\x00\r\x00\x05topic\x00\r\x00\x04type\x00\x14\x00\nuser_limit\x00\t\x00\x00\x00\x01\x00\x00\x00\x08\x03\x8a\x19\x8e\xf8\x82\x00\x01\xff\xff\xff\xff\x00\x00\x00\x04\x00\x00\xfa\x00\x00\x00\x00\x08\x00\x00\xfa\x00\x00\xf8G\xc5\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8b\xc0\xb5nB\x00\x02\x00\x00\x00\x08G\xc5\xffI\x98\xc4\xb4(\x00\x00\x00\x03\x8b\xc0\xa8\xff\xff\xff\xff\x00\x00\x01<\x00\x00\x00\x06\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00\x04\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x040\x07\xf8Q\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x07\xf8Q\x00\x00\x00\x04\x10\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x05\xe8A\x00\x00\x00\x04\x10\x02\x00\x00\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00 > > \x08\x01\x00\x00\x00\x04\xc4\xb4(\x00\xff\xff\xff\xff\x00\x00\x00O[f\x80Q\x07general\x05\xf8G\xc5\xffI\x98\xc4\xb4(\x00\xf8O[f\x80Q\x00\x00\x00\x02\x04\xf8O[f\x80Q\x00\xf8G\xc5\xffI\x98\x01\x00\x00\xf8O[f\x80Q\x00\x00\x00\x00\xf8G\xc5\xffI\x97\xc4\xb4(\x06\x00\xf8O\x7fe\x1fm\x08\x03\x00\x00\x00\x01\x00\x00\x00\x00\x04\x00\x00\x00\x00' > {code} > And then in cqlsh when trying to read the row we got this. > {code:none} > /usr/bin/cqlsh.py:632: DateOverFlowWarning: Some timestamps are larger than > Python datetime can represent. Timestamps are displayed in milliseconds from > epoch. > Traceback (most recent call last): > File "/usr/bin/cqlsh.py", line 1301, in perform_simple_statement > result = future.result() > File > "/usr/share/cassandra/lib/cassandra-driver-internal-only-3.5.0.post0-d8d0456.zip/cassandra-driver-3.5.0.post0-d8d0456/cassandra/cluster.py", > line 3650, in result > raise self._final_exception > UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 2: > invalid start byte > {code} > We tried to read the data and it would refuse to read the name column (the > UTF8 error) and the last_pin_timestamp column had an absurdly large value. > We ended up rewriting the whole row as we had the data in another place and > it fixed the problem. However there is clearly a race condition in the schema > change sub-system. > Any ideas? -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org