Re: Three questions about cassandra

2015-11-27 Thread Hadmut Danisch
Thanks! 

Hadmut


Re: Three questions about cassandra

2015-11-27 Thread daemeon reiydelle
There is a window after a node goes down during which the changes that node
should have received are kept (as hints) by the other nodes. If the node is
down LONGER than that, it will serve stale data. If the consistency level is
greater than two, its data will be ignored (at consistency ONE its data could
be the first returned; at consistency TWO the application needs to be able to
handle such a situation). Nodetool repair needs to be run in this case to get
the data consistent. Cleanup does more than make things pretty, but it will do
that.
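The window described above is the hinted-handoff window, controlled by `max_hint_window_in_ms` in cassandra.yaml (10800000 ms, i.e. 3 hours, by default). A minimal Python sketch of the resulting rule of thumb, assuming that default; `needs_repair` is a hypothetical helper, not part of any Cassandra tool:

```python
from datetime import timedelta

# Assumption: the default max_hint_window_in_ms from cassandra.yaml
# (10800000 ms = 3 hours). Use your own value if you changed it.
MAX_HINT_WINDOW = timedelta(milliseconds=10_800_000)


def needs_repair(downtime: timedelta, hint_window: timedelta = MAX_HINT_WINDOW) -> bool:
    """Hypothetical helper: True if the node was down longer than the hint window.

    Once a replica has been down longer than the window, the other nodes stop
    storing hints for it, so hinted handoff alone cannot bring it up to date
    and `nodetool repair` is needed.
    """
    return downtime > hint_window


if __name__ == "__main__":
    print(needs_repair(timedelta(hours=1)))  # hints cover a 1-hour outage
    print(needs_repair(timedelta(hours=5)))  # a 5-hour outage needs a repair
```

With the default window, a one-hour outage returns False and a five-hour outage returns True, meaning that node should be repaired before it is trusted to serve reads.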

The comment about disabling the Thrift listener is about preventing the node
from serving old data when the timeout I mention above has expired, between
the time the node comes online and the time the repair is completed.
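The ordering described above (keep client listeners off until repair is done) can be sketched in Python. This is a minimal sketch, not a tool from the Cassandra project: the helper names are hypothetical, it assumes `nodetool` is on the PATH of the node being repaired, and the `disablebinary`/`enablebinary` steps are an addition for clusters whose clients use CQL over the native protocol rather than Thrift. It runs after Cassandra is already up, so it does not close the short gap before the first command.

```python
import subprocess
from typing import Callable, List, Sequence

# Order of nodetool calls for a node that was down longer than the hint
# window: client listeners stay off until repair has finished.
REJOIN_STEPS: List[List[str]] = [
    ["nodetool", "disablethrift"],  # stop serving Thrift clients
    ["nodetool", "disablebinary"],  # assumption: also stop CQL native-protocol clients
    ["nodetool", "repair"],         # bring this replica's data up to date
    ["nodetool", "enablebinary"],
    ["nodetool", "enablethrift"],
]


def run_checked(cmd: Sequence[str]) -> None:
    # check=True raises CalledProcessError on a non-zero exit code, so a
    # failed repair stops the sequence and the listeners stay disabled.
    subprocess.run(list(cmd), check=True)


def rejoin_after_long_outage(run: Callable[[Sequence[str]], None] = run_checked) -> None:
    """Hypothetical helper: run each step in order on the local node."""
    for cmd in REJOIN_STEPS:
        run(cmd)
```

Passing a different `run` callable, for example `print`, is a cheap way to dry-run the sequence. Failing loudly was chosen deliberately: a node whose repair did not finish is exactly the node that should not start answering clients again.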

One of the advantages of using e.g. Ansible is that it can be configured to
whack an errant node's Thrift listener BEFORE it starts the node's Cass
instance. Agent-based tools like Puppet and Chef can have this magic performed
too. Whether to start Cass automatically or NOT sometimes makes for
interesting religious wars. And obviously, if the node didn't stop but just
lost its network connections, there are advantages to agent-based tools.
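On the Ansible point: one way to keep the Thrift listener from starting is to set `start_rpc: false` in cassandra.yaml before the service is launched. The Python function below is a rough, hypothetical stand-in for that playbook step (in a real playbook, Ansible's `lineinfile` module would do the same job). It only rewrites the file's text and assumes a Cassandra version whose cassandra.yaml still has the `start_rpc` setting, as the 2.x releases discussed here do.

```python
import re

# Matches an uncommented "start_rpc:" line in cassandra.yaml.
_START_RPC = re.compile(r"^start_rpc:.*$", re.MULTILINE)


def disable_thrift_at_startup(yaml_text: str) -> str:
    """Hypothetical helper: return cassandra.yaml text with start_rpc set to false.

    With start_rpc: false the Thrift listener is not started when Cassandra
    boots; `nodetool enablethrift` can turn it on once repair has finished.
    """
    if _START_RPC.search(yaml_text):
        return _START_RPC.sub("start_rpc: false", yaml_text, count=1)
    # No existing setting: append one, keeping the file newline-terminated.
    if yaml_text and not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return yaml_text + "start_rpc: false\n"
```

For example, `disable_thrift_at_startup("start_rpc: true\n")` returns `"start_rpc: false\n"`; commented-out lines such as `# start_rpc: true` are left alone and a live setting is appended instead.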

*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!”” - Hunter Thompson*

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Nov 27, 2015 at 3:51 AM, Hadmut Danisch  wrote:

> Thanks!
>
> Hadmut
>


Issues on upgrading from 2.2.3 to 3.0

2015-11-27 Thread Carlos A
Hello all,

I upgraded two of my clusters to 3.0 from the same previous version (2.2.3).

The first cluster seems to be fine.

But on the second one, every node starts and then fails.

In the log I have the following on all of them:

INFO  [main] 2015-11-27 19:40:21,168 ColumnFamilyStore.java:381 - Initializing system_schema.keyspaces
INFO  [main] 2015-11-27 19:40:21,177 ColumnFamilyStore.java:381 - Initializing system_schema.tables
INFO  [main] 2015-11-27 19:40:21,185 ColumnFamilyStore.java:381 - Initializing system_schema.columns
INFO  [main] 2015-11-27 19:40:21,192 ColumnFamilyStore.java:381 - Initializing system_schema.triggers
INFO  [main] 2015-11-27 19:40:21,198 ColumnFamilyStore.java:381 - Initializing system_schema.dropped_columns
INFO  [main] 2015-11-27 19:40:21,203 ColumnFamilyStore.java:381 - Initializing system_schema.views
INFO  [main] 2015-11-27 19:40:21,208 ColumnFamilyStore.java:381 - Initializing system_schema.types
INFO  [main] 2015-11-27 19:40:21,215 ColumnFamilyStore.java:381 - Initializing system_schema.functions
INFO  [main] 2015-11-27 19:40:21,220 ColumnFamilyStore.java:381 - Initializing system_schema.aggregates
INFO  [main] 2015-11-27 19:40:21,225 ColumnFamilyStore.java:381 - Initializing system_schema.indexes
ERROR [main] 2015-11-27 19:40:21,831 CassandraDaemon.java:250 - Cannot start node if snitch's rack differs from previous rack. Please fix the snitch or decommission and rebootstrap this node.

It says: "Please fix the snitch or decommission and rebootstrap this node".

If none of the nodes can come up, how can I decommission all of them?

That doesn't make sense.

Any suggestions?

Thanks,

C.


Huge ReadStage Pending tasks during startup

2015-11-27 Thread Vasiliy I Ozerov
Hello!

We have some strange trouble with Cassandra startup. The cluster consists of 4
nodes, each with 32 GB of RAM, 8 CPUs and about 30 GB of data.

root@vega010:~# nodetool version
ReleaseVersion: 2.2.1

Before stopping the node (using nodetool disablethrift and drain):

nodetool tpstats:
Pool Name    Active   Pending   Completed   Blocked  All time blocked
ReadStage         0         0     3093579         0                 0
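`nodetool tpstats` prints one row per thread pool, with the counters Active, Pending, Completed, Blocked and All time blocked in that order; the read pool is named ReadStage. A small Python sketch for pulling those counters out of the output, e.g. to compare the Pending count before the stop and after startup. It assumes the 2.2 column layout, and `parse_pool_row` is a hypothetical helper, not a Cassandra tool.

```python
from typing import Dict

# Assumption: the column order of `nodetool tpstats` in Cassandra 2.2.
COLUMNS = ("Active", "Pending", "Completed", "Blocked", "All time blocked")


def parse_pool_row(tpstats_output: str, pool: str = "ReadStage") -> Dict[str, int]:
    """Hypothetical helper: return the counters for one thread pool."""
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Pool rows are the pool name followed by five integer counters;
        # the header and the dropped-message section do not match this shape.
        if len(fields) == 6 and fields[0] == pool:
            return dict(zip(COLUMNS, (int(f) for f in fields[1:])))
    raise KeyError(f"pool {pool!r} not found in tpstats output")
```

Feeding it the captured output of `nodetool tpstats` (for instance via `subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True).stdout`) gives a dict such as `{"Pending": 0, ...}` that is easy to log or alert on.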

Right after startup, the logs show:

INFO [main] 2015-11-25 13:22:04 GMT+3 YamlConfigurationLoader.java:92 - Loading settings from file:/etc/cassandra/cassandra.yaml
. . . skipped . . .
INFO [main] 2015-11-25 13:22:21 GMT+3 CommitLog.java:168 - Replaying /var/lib/cassandra/commitlog/CommitLog-5-1448388020045.log, /var/lib/cassandra/commitlog/CommitLog-5-1448388020046.log, /var/lib/cassand
. . . skipped . . .
INFO [main] 2015-11-25 13:23:44 GMT+3 CommitLog.java:170 - Log replay complete, 1047857 replayed mutations
. . . skipped . . .
INFO [CompactionExecutor:4] 2015-11-25 13:23:45 GMT+3 CompactionTask.java:142 - Compacting (cf08d1d0-93ba-11e5-b9f0-7be7ca1986fb) [/var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la-3479-big-Data.db:level=0, /var/lib/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/la-3474-big-Data.db:level=0, /var/lib/cassandra/data/system/compaction_history-b4db
. . . skipped . . .
INFO [HANDSHAKE-/10.50.2.60] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.60
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:1003 - Node /10.50.2.66 has restarted, now UP
WARN [GossipTasks:1] 2015-11-25 13:23:46 GMT+3 FailureDetector.java:243 - Not marking nodes down due to local pause of 101075806441 > 50
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.66 state jump to normal
INFO [HANDSHAKE-/10.50.2.60] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.60
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:1003 - Node /10.50.2.60 has restarted, now UP
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.60 state jump to normal
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:1003 - Node /10.50.2.57 has restarted, now UP
INFO [HANDSHAKE-/10.50.2.66] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.66
INFO [HANDSHAKE-/10.50.2.57] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.57
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.57 state jump to normal
INFO [SharedPool-Worker-20] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.60 is now UP
INFO [main] 2015-11-25 13:23:46 GMT+3 ColumnFamilyStore.java:743 - Completed loading (557 ms; 7022 shards) counter cache for SourcesAggregatedEventsV2.StoryReadingTimeSumPerDay_UTC_P_7
INFO [HANDSHAKE-/10.50.2.66] 2015-11-25 13:23:46 GMT+3 OutboundTcpConnection.java:494 - Handshaking version with /10.50.2.66
INFO [main] 2015-11-25 13:23:46 GMT+3 AutoSavingCache.java:146 - reading saved cache /var/lib/cassandra/saved_caches/SourcesAggregatedEventsV2-StoryReadingTimeSumPerDay_UTC_N_2-f318e310735f11e5b9599b83dc51d0b0-CounterCache-c.db
INFO [SharedPool-Worker-13] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker-3] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker-16] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [GossipStage:1] 2015-11-25 13:23:46 GMT+3 StorageService.java:1869 - Node /10.50.2.60 state jump to normal
INFO [SharedPool-Worker-4] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP
INFO [SharedPool-Worker-20] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.57 is now UP
INFO [SharedPool-Worker-1] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP
INFO [SharedPool-Worker-5] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress /10.50.2.66 is now UP
INFO [SharedPool-Worker-2] 2015-11-25 13:23:46 GMT+3 Gossiper.java:970 - InetAddress