Re: Three questions about cassandra
Thanks! Hadmut
Re: Three questions about cassandra
There is a window after a node goes down during which the changes that node should have received are kept. If the node is down LONGER than that window, it will serve stale data when it comes back. If the consistency level is greater than two, its data will be ignored; at consistency one, its data could be the first returned; at consistency two, the application needs to be able to handle such a situation. Nodetool repair needs to be run in this case to make the data consistent. Cleanup does more than make things pretty, but it will do that.

The comment about disabling the thrift listener is about preventing the node from serving old data when the window mentioned above has expired. The risk lasts from the time the node comes online until the repair is completed.

One of the advantages of using e.g. Ansible is that it can be configured to whack an errant node's thrift listener BEFORE it starts the node's Cass instance. Agent-based tools like Puppet and Chef can be made to perform this magic as well. The question of automatically starting Cass vs. NOT automatically starting the service sometimes makes for interesting religious wars. And obviously, if the node didn't stop but just lost its network connections, there are advantages to agent-based tools.

...

"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming 'Wow! What a Ride!'" - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Nov 27, 2015 at 3:51 AM, Hadmut Danisch wrote:
> Thanks!
>
> Hadmut
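The decision described in the reply above can be sketched in Python. This is only an illustration under stated assumptions: it assumes the "window" is Cassandra's hinted-handoff window (`max_hint_window_in_ms` in cassandra.yaml, 3 hours by default), and the function names `needs_repair` and `recovery_steps` are hypothetical. The command list uses only the nodetool subcommands `disablethrift`, `repair` and `enablethrift`. Keeping thrift off from the moment the process starts would additionally need something like `start_rpc: false` in cassandra.yaml or a step in the orchestration tool, which is not shown.

```python
# Assumption: the "window" in the reply is the hinted-handoff window,
# whose cassandra.yaml default (max_hint_window_in_ms) is 3 hours.
DEFAULT_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def needs_repair(downtime_seconds, hint_window_ms=DEFAULT_HINT_WINDOW_MS):
    """True if the node was down longer than the window in which missed writes are kept."""
    return downtime_seconds * 1000 > hint_window_ms


def recovery_steps(downtime_seconds, hint_window_ms=DEFAULT_HINT_WINDOW_MS):
    """Return the nodetool commands to run on the restarted node, in order (hypothetical helper)."""
    if not needs_repair(downtime_seconds, hint_window_ms):
        return []  # the node came back inside the window; nothing extra to do
    return [
        "nodetool disablethrift",  # stop serving thrift clients while data may be stale
        "nodetool repair",         # bring the node's data back in sync
        "nodetool enablethrift",   # resume serving once the repair has finished
    ]


if __name__ == "__main__":
    # Example: a node that was down for four hours.
    for cmd in recovery_steps(downtime_seconds=4 * 3600):
        print(cmd)
```

The sketch only builds the command list; whether to run it from Ansible, Puppet, Chef or by hand is the choice discussed in the reply.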
Issues on upgrading from 2.2.3 to 3.0
Hello all,

I had 2 of my systems upgraded to 3.0 from the same previous version. The first cluster seems to be fine. But on the second, each node starts and then fails. In the log I have the following on all of them:

INFO [main] 2015-11-27 19:40:21,168 ColumnFamilyStore.java:381 - Initializing system_schema.keyspaces
INFO [main] 2015-11-27 19:40:21,177 ColumnFamilyStore.java:381 - Initializing system_schema.tables
INFO [main] 2015-11-27 19:40:21,185 ColumnFamilyStore.java:381 - Initializing system_schema.columns
INFO [main] 2015-11-27 19:40:21,192 ColumnFamilyStore.java:381 - Initializing system_schema.triggers
INFO [main] 2015-11-27 19:40:21,198 ColumnFamilyStore.java:381 - Initializing system_schema.dropped_columns
INFO [main] 2015-11-27 19:40:21,203 ColumnFamilyStore.java:381 - Initializing system_schema.views
INFO [main] 2015-11-27 19:40:21,208 ColumnFamilyStore.java:381 - Initializing system_schema.types
INFO [main] 2015-11-27 19:40:21,215 ColumnFamilyStore.java:381 - Initializing system_schema.functions
INFO [main] 2015-11-27 19:40:21,220 ColumnFamilyStore.java:381 - Initializing system_schema.aggregates
INFO [main] 2015-11-27 19:40:21,225 ColumnFamilyStore.java:381 - Initializing system_schema.indexes
ERROR [main] 2015-11-27 19:40:21,831 CassandraDaemon.java:250 - Cannot start node if snitch's rack differs from previous rack. Please fix the snitch or decommission and rebootstrap this node.

It asks to "Please fix the snitch or decommission and rebootstrap this node". If none of the nodes can come up, how can I decommission all of them? That doesn't make sense.

Any suggestions?

Thanks,
C.
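The error above says the rack reported by the snitch at startup differs from the rack the node had before. As a hedged illustration only, and assuming GossipingPropertyFileSnitch (the post does not say which snitch is in use), the Python sketch below parses the `dc=` and `rack=` lines of a cassandra-rackdc.properties file and compares the rack with a previously known value. The function names and the sample values are hypothetical.

```python
def parse_rackdc(text):
    """Parse key=value lines from a cassandra-rackdc.properties style file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def rack_mismatch(properties_text, previous_rack):
    """True if the configured rack differs from the rack the node used before."""
    return parse_rackdc(properties_text).get("rack") != previous_rack


if __name__ == "__main__":
    # Hypothetical file content and previous rack name, for illustration only.
    sample = "# hypothetical example\ndc=DC1\nrack=RAC1\n"
    print(rack_mismatch(sample, previous_rack="rack1"))
```

The comparison is case-sensitive on purpose, so a rack that differs only in capitalisation is reported as a mismatch.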
Huge ReadStage Pending tasks during startup
Hello!

We are having some strange trouble with Cassandra startup. The cluster consists of 4 nodes, each with 32 GB of RAM, 8 CPUs and about 30 GB of data.

root@vega010:~# nodetool version
ReleaseVersion: 2.2.1

Before the stop (using disablethrift, drain), nodetool tpstats shows:

Read Stage 0 0 3093579 0 0

Just after start, in the logs:

INFO [main] 2015-11-25 13:22:04 GMT+3 YamlConfigurationLoader.java:92 - Loading settings from file:/etc/cassandra/cassandra.yaml
. . . skipped . . .
INFO [main] 2015-11-25 13:22:21 GMT+3 CommitLog.java:168 - Replaying /var/lib/cassandra/commitlog/CommitLog-5-1448388020045.log, /var/lib/cassandra/commitlog/CommitLog-5-1448388020046.log, /var/lib/cassand . . .skipped. . .
INFO [main] 2015-11-25 13:23:44 GMT+3 CommitLog.java:170 - Log replay complete, 1047857 replayed mutations
. . . skipped .. .
INFO [CompactionExecutor:4] 2015-11-25 13:23:45 GMT+3 CompactionTask.java:142 - Compacting (cf08d1d0-93ba-11e5-b9f0-7be7ca1986fb) [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la-3479-big-Data.db:level=0, /var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la-3474-big-Data.db:level=0, /var/lib/cassandra/data/system/compaction_history-b4db . . . skipped. . .
INFO [HANDSHAKE-/10.50.2.60] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.60
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:1003 - Node /10.50.2.66 has restarted, now UP
WARN [GossipTasks:1] 2015-11-25 13:23:46 GMT+3 FailureDetector.java:243 - Not marking nodes down due to local pause of 101075806441 > 50
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.66 state jump to normal
INFO [HANDSHAKE-/10.50.2.60] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.60
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:1003 - Node /10.50.2.60 has restarted, now UP
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.60 state jump to normal
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:1003 - Node /10.50.2.57 has restarted, now UP
INFO [HANDSHAKE-/10.50.2.66] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.66
INFO [HANDSHAKE-/10.50.2.57] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.57
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.57 state jump to normal
INFO [SharedPool-Worker-20] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.60 is now UP
INFO [main] 2015-11-25 13:23:46 GMT+3 ColumnFamilyStore.java:743 - Completed loading (557 ms; 7022 shards) counter cache for SourcesAggregatedEventsV2.StoryReadingTimeSumPerDay_UTC_P_7
INFO [HANDSHAKE-/10.50.2.66] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.66
INFO [main] 2015-11-25 13:23:46 GMT+3 AutoSavingCache.java:146 - reading saved cache /var/lib/cassandra/saved_caches/SourcesAggregatedEventsV2-StoryReadingTimeSumPerDay_UTC_N_2-f318e310735f11e5b9599b83dc51d0b0-CounterCache-c.db
INFO [SharedPool-Worker-13] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker-3] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker-16] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.60 state jump to normal
INFO [SharedPool-Worker-4] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP
INFO [SharedPool-Worker-20] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker-1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP
INFO [SharedPool-Worker-5] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP
INFO [SharedPool-Worker-2] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress