Russell Alexander Spitzer created CASSANDRA-7240:
----------------------------------------------------

Summary: Altering Keyspace Replication On Large Cluster With vnodes Leads to Warns on All nodes
Key: CASSANDRA-7240
URL: https://issues.apache.org/jira/browse/CASSANDRA-7240
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: 1000 Nodes M1.large ubuntu 12.04
Reporter: Russell Alexander Spitzer

A 1000-node cluster was started with vnodes (256) enabled. 25 separate nodes began an all-write workload against the first 1000 nodes. During the test I attempted to alter the keyspace from SimpleStrategy to NetworkTopologyStrategy.

{code}
cqlsh> ALTER KEYSPACE "Keyspace1" WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2':'3'} AND durable_writes = true;
errors={}, last_host=127.0.0.1
cqlsh> ALTER KEYSPACE "Keyspace1" WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2':'3'} AND durable_writes = true;
('Unable to complete the operation against any hosts', {<Host: 127.0.0.1 DC1>: ConnectionShutdown('Connection to 127.0.0.1 is defunct',)})
{code}

All one thousand nodes then began to repeat the following in their respective logs:

{code}
WARN  [Thread-50131] 2014-05-14 23:34:07,631 IncomingTcpConnection.java:91 - UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=46b7b090-dbaf-11e3-8413-fffd4403e7d2
	at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:164) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:97) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:318) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:298) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:326) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:268) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:165) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:147) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:82) ~[apache-cassandra-2.1.0-beta2.jar:2.1.0-beta2]
{code}

Stress continued, but at a decreased speed:

{code}
Excerpt from one of the 25 Stress Nodes
83222847  ,   14602,   14602,     6.7,     2.1,    23.1,   132.1,   292.3,   531.3, 5216.5,  0.00188
83239512  ,   13888,   13888,     7.3,     2.1,    31.3,   129.9,   267.9,   555.8, 5217.7,  0.00188
83258520  ,   14301,   14301,     7.0,     2.1,    28.8,   125.4,   297.2,   758.1, 5219.0,  0.00188
83277750  ,   14023,   14023,     7.1,     2.1,    28.4,   132.8,   292.3,   703.6, 5220.4,  0.00188
83301413  ,   14410,   14410,     6.9,     2.1,    24.5,   124.8,   391.4,  1010.1, 5222.0,  0.00188
83316846  ,   12313,   12313,     8.1,     2.1,    35.1,   168.2,   275.3,   467.9, 5223.3,  0.00188
83332883  ,   13753,   13753,     6.9,     2.1,    28.1,   132.2,   276.1,   498.9, 5224.4,  0.00188
#ALTER REQUEST HERE
83351413  ,    9981,    9981,     9.9,     2.1,    46.7,   172.0,   447.8,  1327.9, 5226.3,  0.00188
83358381  ,    4464,    4464,    22.7,     2.2,   125.9,   257.8,   594.6,  1650.6, 5227.8,  0.00188
83363153  ,    3186,    3186,    31.7,     2.5,   153.0,   300.3,   477.0,   566.1, 5229.3,  0.00189
83367341  ,    2967,    2967,    33.7,     2.4,   173.9,   311.5,   465.8,   761.9, 5230.7,  0.00190
83370738  ,    2392,    2392,    41.4,     2.9,   208.0,   308.1,   434.8,   839.6, 5232.2,  0.00191
83373651  ,    2283,    2283,    43.0,     2.5,   213.9,   310.5,   409.3,   503.3, 5233.4,  0.00192
{code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)
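
For anyone triaging, the slowdown in the stress excerpt can be quantified with a short script. This is only a sketch: it assumes the second column of each stress row is the interval operation rate (ops/s) and the fourth is the mean latency in ms, which the report does not state. The names `BEFORE_ALTER`, `AFTER_ALTER`, `parse_rows` and `summarize` are made up for this illustration.

```python
from statistics import mean

# Rows copied from the stress excerpt in this report, split at the
# "#ALTER REQUEST HERE" marker.
BEFORE_ALTER = """\
83222847 , 14602, 14602, 6.7, 2.1, 23.1, 132.1, 292.3, 531.3, 5216.5, 0.00188
83239512 , 13888, 13888, 7.3, 2.1, 31.3, 129.9, 267.9, 555.8, 5217.7, 0.00188
83258520 , 14301, 14301, 7.0, 2.1, 28.8, 125.4, 297.2, 758.1, 5219.0, 0.00188
83277750 , 14023, 14023, 7.1, 2.1, 28.4, 132.8, 292.3, 703.6, 5220.4, 0.00188
83301413 , 14410, 14410, 6.9, 2.1, 24.5, 124.8, 391.4, 1010.1, 5222.0, 0.00188
83316846 , 12313, 12313, 8.1, 2.1, 35.1, 168.2, 275.3, 467.9, 5223.3, 0.00188
83332883 , 13753, 13753, 6.9, 2.1, 28.1, 132.2, 276.1, 498.9, 5224.4, 0.00188
"""

AFTER_ALTER = """\
83351413 , 9981, 9981, 9.9, 2.1, 46.7, 172.0, 447.8, 1327.9, 5226.3, 0.00188
83358381 , 4464, 4464, 22.7, 2.2, 125.9, 257.8, 594.6, 1650.6, 5227.8, 0.00188
83363153 , 3186, 3186, 31.7, 2.5, 153.0, 300.3, 477.0, 566.1, 5229.3, 0.00189
83367341 , 2967, 2967, 33.7, 2.4, 173.9, 311.5, 465.8, 761.9, 5230.7, 0.00190
83370738 , 2392, 2392, 41.4, 2.9, 208.0, 308.1, 434.8, 839.6, 5232.2, 0.00191
83373651 , 2283, 2283, 43.0, 2.5, 213.9, 310.5, 409.3, 503.3, 5233.4, 0.00192
"""


def parse_rows(block):
    """Return (rate, mean_latency) pairs for each comma-separated row.

    Assumption: field 1 is the operation rate and field 3 is the mean
    latency; the report itself does not label the columns.
    """
    rows = []
    for line in block.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        rows.append((float(fields[1]), float(fields[3])))
    return rows


def summarize(block):
    """Average the assumed rate and mean-latency columns over a block."""
    rows = parse_rows(block)
    return mean(r[0] for r in rows), mean(r[1] for r in rows)


if __name__ == "__main__":
    rate_before, lat_before = summarize(BEFORE_ALTER)
    rate_after, lat_after = summarize(AFTER_ALTER)
    print(f"rate before: {rate_before:.0f}, after: {rate_after:.0f}")
    print(f"mean latency before: {lat_before:.1f}, after: {lat_after:.1f}")
    print(f"rate retained: {rate_after / rate_before:.0%}")
```

The averages cover only the thirteen rows quoted here from a single stress node, so they indicate the size of the drop rather than measure the whole cluster.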