[jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
[ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532183#comment-14532183 ]

Sylvain Lebresne commented on CASSANDRA-9136:
---------------------------------------------

+1

Improve error handling when table is queried before the schema has fully propagated
-----------------------------------------------------------------------------------

Key: CASSANDRA-9136
URL: https://issues.apache.org/jira/browse/CASSANDRA-9136
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: 3 Nodes GCE, N1-Standard-2, Ubuntu 12, 1 Node on 2.1.4, 2 on 2.0.14
Reporter: Russell Alexander Spitzer
Assignee: Tyler Hobbs
Fix For: 2.1.x, 2.0.x
Attachments: 9136-2.0-v2.txt, 9136-2.0.txt, 9136-2.1-v2.txt, 9136-2.1.txt

This error occurs during a rolling upgrade between 2.0.14 and 2.1.4.

h3. Repro

With all the nodes on 2.0.14, make the following tables:
{code}
CREATE KEYSPACE test WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': '2'
};
USE test;
CREATE TABLE compact (
    k int,
    c int,
    d int,
    PRIMARY KEY ((k), c)
) WITH COMPACT STORAGE;
CREATE TABLE norm (
    k int,
    c int,
    d int,
    PRIMARY KEY ((k), c)
);
{code}

Then load some data into these tables. I used the python driver:
{code}
from cassandra.cluster import Cluster

s = Cluster().connect()
for x in range(1000):
    for y in range(1000):
        # Asynchronous inserts: one row per (k, c) pair in each table.
        s.execute_async("INSERT INTO test.compact (k,c,d) VALUES (%d,%d,%d)" % (x, y, y))
        s.execute_async("INSERT INTO test.norm (k,c,d) VALUES (%d,%d,%d)" % (x, y, y))
{code}

Upgrade one node from 2.0.14 to 2.1.4.
From the 2.1.4 node, create a new table.
Query that table.

On the 2.0.14 nodes you get these exceptions because the schema didn't propagate there. This exception kills the TCP connection between the nodes.
{code}
ERROR [Thread-19] 2015-04-08 18:48:45,337 CassandraDaemon.java (line 258) Exception in thread Thread[Thread-19,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:247)
    at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:156)
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:149)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:131)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
{code}

Run cqlsh on the upgraded node and queries will fail until the TCP connection is established again. This is easiest to reproduce with CL = ALL.
{code}
cqlsh> SELECT count(*) FROM test.norm where k = 22 ;
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 1 responses. info={'received_responses': 1, 'required_responses': 2, 'consistency': 'ALL'}
cqlsh> SELECT count(*) FROM test.norm where k = 21 ;
ReadTimeout: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message=Operation timed out - received only 1 responses. info={'received_responses': 1, 'required_responses': 2, 'consistency': 'ALL'}
{code}

So connection made:
{code}
DEBUG [Thread-227] 2015-04-09 05:09:02,718 IncomingTcpConnection.java (line 107) Set version for /10.240.14.115 to 8 (will use 7)
{code}

Connection broken by query of table before schema propagated:
{code}
ERROR [Thread-227] 2015-04-09 05:10:24,015 CassandraDaemon.java (line 258) Exception in thread Thread[Thread-227,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:247)
    at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:156)
    at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:149)
    at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:131)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
{code}

All queries to that node will now fail with timeouts until...

Connection re-established:
{code}
DEBUG [Thread-228] 2015-04-09 05:11:00,323 IncomingTcpConnection.java (line 107) Set version for /10.240.14.115 to 8 (will use 7)
{code}

Now queries work again.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
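As a rough illustration of how long the outage in the report lasted, the three log timestamps quoted in the description can be subtracted from each other. This is only a sketch: the timestamps are copied from the log excerpts in the issue, while the helper name `parse_log_ts` and the variable names are made up for the example.

```python
from datetime import datetime

# Timestamp layout used in the quoted log lines, e.g. "2015-04-09 05:10:24,015".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(text):
    """Parse one timestamp in the layout above (hypothetical helper)."""
    return datetime.strptime(text, LOG_TS_FORMAT)


# Values copied from the DEBUG/ERROR/DEBUG lines in the issue description.
connection_made = parse_log_ts("2015-04-09 05:09:02,718")
connection_broken = parse_log_ts("2015-04-09 05:10:24,015")
connection_restored = parse_log_ts("2015-04-09 05:11:00,323")

healthy_seconds = (connection_broken - connection_made).total_seconds()
outage_seconds = (connection_restored - connection_broken).total_seconds()

print("connection was up for %.3f s before the NPE" % healthy_seconds)
print("queries timed out for about %.3f s" % outage_seconds)
```

In this particular run the gap between the exception and the reconnect is a little over 36 seconds, which is the window during which every query through that connection timed out.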
[jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
[ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530074#comment-14530074 ]

Sylvain Lebresne commented on CASSANDRA-9136:
---------------------------------------------

The patches look good, though I would make the error messages clearer about what the likely issue is, adding something along the lines of "If the table was just created, this is likely due to its schema not having fully propagated yet; please wait for schema agreement on table creation."

Fix For: 2.1.x, 2.0.x
Attachments: 9136-2.0.txt, 9136-2.1.txt
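On the client side, "waiting for schema agreement" can be sketched as a polling loop that only proceeds once every node reports the same schema version. The code below is a minimal sketch under stated assumptions, not the driver's own mechanism: `wait_for_schema_agreement` and `fetch_schema_versions` are hypothetical names, and a real `fetch_schema_versions` might, for example, read `schema_version` from `system.local` and `system.peers`. A stub stands in for the cluster so the example runs on its own.

```python
import time


def wait_for_schema_agreement(fetch_schema_versions, timeout=10.0, interval=0.2,
                              clock=time.monotonic, sleep=time.sleep):
    """Poll until all nodes report one schema version.

    fetch_schema_versions is a hypothetical callable returning one schema
    version per live node. Returns True on agreement, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        versions = set(fetch_schema_versions())
        if len(versions) == 1:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Stub standing in for a real cluster: the nodes disagree twice, then converge.
_responses = iter([
    ["v1", "v2", "v2"],
    ["v1", "v2", "v2"],
    ["v2", "v2", "v2"],
])
agreed = wait_for_schema_agreement(lambda: next(_responses), timeout=5.0, interval=0.0)
print("schema agreed:", agreed)
```

A caller would run this after `CREATE TABLE` and only issue the first query against the new table once it returns `True`; on `False` it is safer to retry or fail loudly than to query anyway.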
[jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
[ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524258#comment-14524258 ]

Tyler Hobbs commented on CASSANDRA-9136:
----------------------------------------

Since we haven't decided what the best solution for recovery is and we all agree on the error message part, I've opened CASSANDRA-9289 to deal with recovery separately. I'll have a patch and test for the error message shortly.

Fix For: 2.1.x
[jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
[ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521761#comment-14521761 ]

Jeremiah Jordan commented on CASSANDRA-9136:
--------------------------------------------

I think it would be good to be able to recover. That would be my preference. Yes, you shouldn't query before the schema has settled, but if you do, I don't think it should break all your other queries. But a better error message would at least give people a clue as to what broke them.

Fix For: 2.1.x
[jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
[ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502902#comment-14502902 ]

Sylvain Lebresne commented on CASSANDRA-9136:
---------------------------------------------

It's not unreasonable per se, but the fact that you have to manually pass how many bytes you've deserialized when throwing the exception makes this a bit error prone in general imo, even though it's arguably easy enough to double-check in this particular case (it would also make it slightly more annoying to add support for {{EncodedDataInputStream}} if we wanted to, for instance, though that's a minor point). The initial idea I had was to use something like {{BytesReadTracker}} to make the counting automatic, but I'm not married to that idea either, since it adds a small overhead in general which I don't like.

Overall, I respect wanting to improve this, but I think I'm of the opinion that simply making the error message a lot clearer should be good enough and that it's not worth trying to be too smart in recovering. Not a strong opinion though, just a data point.

Fix For: 2.1.5
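The "automatic counting" idea can be illustrated with a small stream wrapper: if every read goes through a counter, the receive loop knows how much of a length-prefixed message was consumed when deserialization fails and can skip the remainder, so the connection stays aligned. This is a Python sketch of the general technique, not Cassandra's Java code; the wire layout (4-byte length, 4-byte table id, 4-byte key), `CountingReader`, `UnknownTableError`, and `KNOWN_TABLES` are all assumptions made up for the example.

```python
import io
import struct


class CountingReader:
    """Wraps a binary stream and counts the bytes consumed through it."""

    def __init__(self, stream):
        self._stream = stream
        self.bytes_read = 0

    def read(self, n):
        data = self._stream.read(n)
        if len(data) != n:
            raise EOFError("stream ended mid-message")
        self.bytes_read += n
        return data


class UnknownTableError(Exception):
    def __init__(self, table_id):
        super().__init__("unknown table id %d" % table_id)
        self.table_id = table_id


KNOWN_TABLES = {1: "test.norm"}  # hypothetical schema known to this node


def deserialize_command(reader):
    """Hypothetical payload: 4-byte table id followed by a 4-byte key."""
    (table_id,) = struct.unpack(">i", reader.read(4))
    if table_id not in KNOWN_TABLES:
        raise UnknownTableError(table_id)  # fails with part of the payload unread
    (key,) = struct.unpack(">i", reader.read(4))
    return KNOWN_TABLES[table_id], key


def read_messages(stream):
    """Read length-prefixed messages, skipping the ones that cannot be decoded."""
    results = []
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break
        (size,) = struct.unpack(">i", header)
        reader = CountingReader(stream)
        try:
            results.append(deserialize_command(reader))
        except UnknownTableError as exc:
            results.append(("skipped", exc.table_id))
        # The counter says how much was consumed; drop whatever is left so the
        # next header is read from the right offset.
        stream.read(size - reader.bytes_read)
    return results


def encode(table_id, key):
    payload = struct.pack(">ii", table_id, key)
    return struct.pack(">i", len(payload)) + payload


wire = io.BytesIO(encode(1, 22) + encode(99, 5) + encode(1, 21))
decoded = read_messages(wire)
print(decoded)
```

The message for the unknown table is dropped, but the two messages around it still decode, which is the behaviour the recovery discussion is after. The trade-off raised in the comment also shows up here: every read pays for the extra wrapper call, even on the common path where nothing goes wrong.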
[jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
[ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497108#comment-14497108 ]

Tyler Hobbs commented on CASSANDRA-9136:
----------------------------------------

[~slebresne] I've pushed a couple of commits to a [branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-9136] as a basic example of what I would do (not complete or tested). The first just throws {{UnknownColumnFamilyException}}; the second recovers from that error during deserialization. Does that seem reasonable?

Fix For: 2.1.5
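The difference between the original failure and a dedicated exception can be shown in miniature. The sketch below is Python, not the Java in the branch, and every name in it (`SCHEMA`, `TableMetadata`, `UnknownTableError`, the two lookup functions) is hypothetical; the message text borrows the wording suggested in the review comment earlier in this thread. The unchecked lookup dereferences a missing entry, which is the Python analogue of the NullPointerException in the stack trace, while the checked lookup raises an error that names the likely cause.

```python
class TableMetadata:
    def __init__(self, name):
        self.name = name


# Hypothetical schema on a node that has not yet learned about table id 42.
SCHEMA = {7: TableMetadata("test.norm")}


class UnknownTableError(Exception):
    def __init__(self, table_id):
        super().__init__(
            "Couldn't find table with id %d. If the table was just created, this is "
            "likely due to its schema not having fully propagated yet; please wait "
            "for schema agreement on table creation." % table_id)
        self.table_id = table_id


def table_name_unchecked(table_id):
    # The lookup yields None for an unknown id, and the attribute access
    # then fails with an error that says nothing about schemas.
    return SCHEMA.get(table_id).name


def table_name_checked(table_id):
    metadata = SCHEMA.get(table_id)
    if metadata is None:
        raise UnknownTableError(table_id)
    return metadata.name


for lookup in (table_name_unchecked, table_name_checked):
    try:
        lookup(42)
    except Exception as exc:
        print("%s -> %s: %s" % (lookup.__name__, type(exc).__name__, exc))
```

Both paths still fail for the unknown id, but only the second tells an operator reading the log what probably went wrong and what to do about it.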
[jira] [Commented] (CASSANDRA-9136) Improve error handling when table is queried before the schema has fully propagated
[ https://issues.apache.org/jira/browse/CASSANDRA-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497030#comment-14497030 ]

Tyler Hobbs commented on CASSANDRA-9136:
----------------------------------------

Linking to CASSANDRA-8996 because this occasionally causes dtest failures when the default role setup triggers the NPE.

Fix For: 2.1.5