[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079070#comment-14079070 ]

Cyril Scetbon commented on CASSANDRA-6614:

As I said before, I only hit this when I was upgrading my cluster from 1.2.2 to 1.2.13. Now that the upgrade is done, I'm no longer working on it. I'll be able to gather information during the next upgrade, from 1.2.13 to 2.0.9+.

2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading

Key: CASSANDRA-6614
URL: https://issues.apache.org/jira/browse/CASSANDRA-6614
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: ubuntu 12.04
Reporter: Cyril Scetbon

It happens when we upgrade one node to 1.2.13 on a 1.2.2 cluster; see http://pastebin.com/YZKUQLXz

If I grep for only the InternalResponseStage logs I get http://pastebin.com/htnXZCiT, which always shows the same count of ops and serialized/live bytes per column family.

When I upgrade one node from 1.2.2 to 1.2.13, I get the previous messages for 2 hours, with a rise in CPU load on all nodes (as they flush and compact continually): http://picpaste.com/pics/Screen_Shot_2014-01-24_at_09.18.50-ggcCDVqd.1390587562.png

After that, everything is fine and I can upgrade the other nodes without any rise in CPU load. When I start the upgrade, the more nodes I upgrade at the same time (at the beginning), the higher the CPU load is: http://picpaste.com/pics/Screen_Shot_2014-01-23_at_17.45.56-I3fdEQ2T.1390587597.png

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078410#comment-14078410 ]

Michael Shuler commented on CASSANDRA-6614:

[~cscetbon] and [~jasobrown], are you still seeing this behavior in the latest releases? I seem to recall some fixes around migration events.
[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891436#comment-13891436 ]

Jason Brown commented on CASSANDRA-6614:

[~cscetbon] I've been told we are 100% doing a drain, but we are still running into this. What we do notice is that our clusters are multi-DC, and the problem only seems to occur within the region we are upgrading (the other DCs remain unaffected with respect to read/write latency). Further, we are using the Ec2MultiRegionSnitch.

[~cscetbon] Are you running a multi-DC cluster, and/or are you using a 'reconnectable' snitch, like Ec2MultiRegionSnitch?
[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891532#comment-13891532 ]

Cyril Scetbon commented on CASSANDRA-6614:

bq. I've been told 100% we're doing drain
:)

bq. Are you running a multi-DC cluster, and/or are you using a 'reconnectable snitch, like Ec2MultiRegionSnitch?
I'm using a multi-DC configuration on AWS with a PropertyFileSnitch, because I was testing the same configuration as our private one (which is not hosted in AWS).
[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886947#comment-13886947 ]

Cyril Scetbon commented on CASSANDRA-6614:

I found that using nodetool drain prevents this issue. However, nodetool drain should only prevent overcounts of counter data and make the restart faster. And why does it happen only with the first node and not with the others (maybe the system column families had already been flushed on the others?)

[~jasobrown], did you use nodetool drain?
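For readers who want to try the drain-first restart described above, here is a minimal sketch of the command order for one node. It only builds the command lists and does not run anything. `nodetool drain` is the command named in the comment; the `service cassandra stop/start` commands and the `restart_steps` helper are assumptions for illustration and will differ per installation.

```python
def restart_steps(host,
                  stop_cmd=("sudo", "service", "cassandra", "stop"),
                  start_cmd=("sudo", "service", "cassandra", "start")):
    """Return the commands for a drain-first restart of one node, in order.

    stop_cmd and start_cmd are assumed defaults (hypothetical for your setup);
    the binary upgrade itself would happen between stop and start.
    """
    return [
        # Flush memtables and stop accepting writes before shutting down.
        ["nodetool", "-h", host, "drain"],
        list(stop_cmd),
        # (upgrade the Cassandra package here)
        list(start_cmd),
    ]


if __name__ == "__main__":
    for step in restart_steps("127.0.0.1"):
        print(" ".join(step))
```

Returning the commands instead of executing them keeps the sketch safe to run anywhere; each list could be passed to `subprocess.run` on a real node.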
[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881182#comment-13881182 ]

Jason Brown commented on CASSANDRA-6614:

We just started our upgrade from 1.1.7 to 1.2.12, and are seeing this as well. It seems related to MigrationManager/MigrationTask, where a node endlessly loops retrieving the following system column families from peers: schema_columnfamilies, schema_columns, schema_keyspaces
[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881196#comment-13881196 ]

Jason Brown commented on CASSANDRA-6614:

I also see this in the log:

{code}
 INFO [GossipTasks:1] 2014-01-23 20:39:14,899 Gossiper.java (line 822) InetAddress /54.xx.yy.zz is now DOWN
ERROR [MigrationStage:1] 2014-01-23 20:39:15,823 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:17,842 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:19,862 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:20,862 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:21,091 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:23,708 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:24,597 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:25,874 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:27,879 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:28,231 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:30,784 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:31,656 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
 INFO [HANDSHAKE-ec2-54.xx.yy.zz/54.xx.yy.zz] 2014-01-23 20:41:59,793 OutboundTcpConnection.java (line 399) Handshaking version with ec2-54.xx.yy.zz/54.xx.yy.zz
 INFO [GossipStage:1] 2014-01-23 20:42:00,348 Gossiper.java (line 840) Node /54.xx.yy.zz has restarted, now UP
{code}
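To gauge how often a node retries migration requests against a down peer, the error lines in the log above can be tallied per peer. The following is a rough sketch, assuming the log lines have exactly the layout quoted in this comment; the `count_migration_errors` helper and the `SAMPLE` excerpt are illustrative names, not part of Cassandra.

```python
import re
from collections import Counter

# Matches the MigrationTask error lines as they appear in the quoted log.
MIGRATION_ERROR = re.compile(
    r"ERROR \[MigrationStage:\d+\] (\S+ \S+) MigrationTask\.java \(line \d+\) "
    r"Can't send migration request: node (\S+) is down\."
)


def count_migration_errors(log_text):
    """Return a Counter mapping peer address -> number of failed migration requests."""
    counts = Counter()
    for match in MIGRATION_ERROR.finditer(log_text):
        counts[match.group(2)] += 1
    return counts


# Short excerpt copied from the log quoted in this thread.
SAMPLE = """\
 INFO [GossipTasks:1] 2014-01-23 20:39:14,899 Gossiper.java (line 822) InetAddress /54.xx.yy.zz is now DOWN
ERROR [MigrationStage:1] 2014-01-23 20:39:15,823 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
ERROR [MigrationStage:1] 2014-01-23 20:39:17,842 MigrationTask.java (line 55) Can't send migration request: node /54.xx.yy.zz is down.
"""

if __name__ == "__main__":
    for node, n in count_migration_errors(SAMPLE).items():
        print(node, n)
```

Running the same function over a full system.log (read into a string) would show whether the retries are concentrated on the node being upgraded.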
[jira] [Commented] (CASSANDRA-6614) 2 hours loop flushing+compacting system/{schema_keyspaces,schema_columnfamilies,schema_columns} when upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-6614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881372#comment-13881372 ]

Ravi Prasad commented on CASSANDRA-6614:

Seeing this too while upgrading from 1.2.9 to 2.0.4. As Jason mentioned, it subsides once all the nodes in the cluster are upgraded or on the same schema.