Re: Three questions about cassandra
There is a window after a node goes down during which the changes that node should have received are kept for it. If the node is down LONGER than that, it will serve stale data. If the read consistency is greater than two, its data will be ignored; at consistency ONE, its data could be the first returned; at consistency TWO, the application needs to be able to handle such a situation. nodetool repair needs to be run in this case to get the data consistent.

Cleanup does more than make things pretty, but it will do that.

The comment about disabling the thrift listener is about preventing the node from serving old data when the window I mention above has expired, during the time between the node coming online and the repair completing.

One of the advantages of using e.g. Ansible is that it can be configured to whack an errant node's thrift listener BEFORE it starts the node's Cass instance. Agent-based tools like Puppet and Chef can perform this magic too. Whether to start Cass automatically or NOT start the service automatically sometimes makes for interesting religious wars. And obviously, if the node didn't stop but just lost its network connections, there are advantages to agent-based tools.

Daemeon C.M. Reiydelle

On Fri, Nov 27, 2015 at 3:51 AM, Hadmut Danisch wrote:
> Thanks!
>
> Hadmut
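To make the rejoin procedure above concrete, here is a minimal sketch in Python. It only builds the list of commands an operator (or an Ansible/Puppet/Chef wrapper) might run after starting the daemon with its client listeners disabled; it does not execute anything. The function names and the 3-hour default are assumptions for illustration (3 hours mirrors the stock `max_hint_window_in_ms` value, so check your own cassandra.yaml), and `nodetool enablethrift` only applies to versions that still ship the Thrift interface.

```python
from typing import List

# Assumption: 3 hours mirrors the stock max_hint_window_in_ms setting.
DEFAULT_HINT_WINDOW_S = 3 * 60 * 60


def needs_repair(downtime_s: float,
                 hint_window_s: float = DEFAULT_HINT_WINDOW_S) -> bool:
    """True if the node was down longer than the window in which
    missed changes are kept for it."""
    return downtime_s > hint_window_s


def rejoin_plan(downtime_s: float,
                hint_window_s: float = DEFAULT_HINT_WINDOW_S) -> List[List[str]]:
    """Commands to run after the daemon is up with listeners disabled.

    Hypothetical helper: it returns argv lists and leaves execution
    (e.g. via subprocess.run) to the caller.
    """
    plan: List[List[str]] = []
    if needs_repair(downtime_s, hint_window_s):
        # Bring the node's data up to date before it serves clients.
        plan.append(["nodetool", "repair"])
    # Only now let clients connect.
    plan.append(["nodetool", "enablethrift"])
    plan.append(["nodetool", "enablebinary"])
    return plan


if __name__ == "__main__":
    for argv in rejoin_plan(downtime_s=2 * 24 * 60 * 60):  # down for two days
        print(" ".join(argv))
```

Keeping the plan as data rather than running it directly makes it easy to log or review the steps before a configuration tool executes them.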
Re: Three questions about cassandra
Thanks! Hadmut
Re: Three questions about cassandra
1) It comes online in its former state. The operator is responsible for consistency beyond that point. The common solution is `nodetool repair`. (If you get really smart, you can start the daemon with the thrift/native listeners disabled, run repair, and then enable the listeners, so that when it DOES serve requests, they’re not out of date.)

2) The consistency level tells cassandra how many replicas it will wait on to acknowledge the write. It doesn’t necessarily tell us how many replicas will/won’t get the write (even writing at QUORUM, it’s likely that all replicas will get the write). Those that do not may get the write later via read repair or explicit repair (`nodetool repair`).

3) Yes, joining nodes acquire a part of the token range, and data will be streamed to the joining node.

On 11/26/15, 7:10 AM, "Hadmut Danisch" wrote:
> [...]
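As a worked example for point 2, the sketch below computes how many replica acknowledgements each consistency level waits for, and checks the usual rule of thumb that a read is guaranteed to overlap the latest acknowledged write when the write and read acknowledgement counts together exceed the replication factor. The function names are hypothetical, only a handful of levels are modelled, and a single data center is assumed.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def acks_required(level: str, replication_factor: int) -> int:
    """Number of replicas the coordinator waits for at a given level.

    This is only the number of acknowledgements awaited; the write is
    still sent to every replica.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    level = level.upper()
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def read_overlaps_write(write_level: str, read_level: str,
                        replication_factor: int) -> bool:
    """True if every read must touch at least one replica that
    acknowledged the write (W + R > RF)."""
    w = acks_required(write_level, replication_factor)
    r = acks_required(read_level, replication_factor)
    return w + r > replication_factor


if __name__ == "__main__":
    rf = 3
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(w, r, read_overlaps_write(w, r, rf))
```

With a replication factor of 3, QUORUM writes plus QUORUM reads overlap (2 + 2 > 3), while ONE plus ONE does not, which is the case where a replica that missed a write can answer a read until read repair or `nodetool repair` catches it up.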
Three questions about cassandra
Hi,

I'm currently reading through heaps of docs and web pages to learn cassandra, but there are still three questions I could not find answers for. Maybe someone could help:

1. What happens if a node is down for some time (hours, days, weeks,...) for whatever reason (hardware, power, or network failure, maintenance...) and gets back online?

   Does the node remain in its former state and thus become inconsistent, with outdated data, or does it pick up the changes that occurred during its downtime from other nodes?

   Can nodes simply be offline for some time, then return and proceed, or do they have to be added as a fresh node replacement (of their own) to start from scratch?

2. cassandra allows choosing from several data consistency levels, especially allowing write access that does not update all nodes (e.g. QUORUM, ONE, TWO, THREE).

   What happens with those nodes that did not get an update? Will they synchronize with the updated nodes automatically, or will they remain in their old state (forever or until the next explicit write access)?

3. What exactly happens when a new node is added to a cluster? Will all records now belonging to the new node be automatically shifted from the others?

   The web page

   http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html

   describes a "streaming process", which sounds as if a new node was busy collecting its belongings from others, but it also says to perform a

   nodetool cleanup

   on all the old nodes, which would "remove the keys no longer belonging to those nodes", which rather sounds like a simple drop, i.e. having those records lost.

   So does cassandra safely fill new nodes, or do they start as empty ones and their data is lost?

Thank you!

regards
Hadmut
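The worry in question 3, that cleanup on the old nodes might drop records that exist nowhere else, can be checked against a toy model of a token ring. The sketch below is a deliberate simplification and not how Cassandra is implemented: it assumes one replica per key, uses small integers directly as tokens, and all names are hypothetical. It shows the order of events: the joining node first receives copies of the keys it now owns, and only afterwards does cleanup remove the leftover copies from the previous owner.

```python
from bisect import bisect_left
from typing import Dict, Set


def owner(token: int, ring: Dict[int, str]) -> str:
    """Node owning `token`: the first node token >= token, wrapping around."""
    tokens = sorted(ring)
    idx = bisect_left(tokens, token)
    return ring[tokens[idx % len(tokens)]]


def add_node(ring: Dict[int, str], data: Dict[str, Set[int]],
             new_token: int, new_name: str) -> None:
    """Join a node: it is streamed COPIES of the keys it now owns.

    The previous owners keep their copies until cleanup is run.
    """
    ring[new_token] = new_name
    streamed: Set[int] = set()
    for keys in data.values():
        streamed |= {k for k in keys if owner(k, ring) == new_name}
    data[new_name] = streamed


def cleanup(ring: Dict[int, str], data: Dict[str, Set[int]], name: str) -> None:
    """Drop keys that `name` no longer owns (toy analogue of cleanup)."""
    data[name] = {k for k in data[name] if owner(k, ring) == name}


if __name__ == "__main__":
    ring = {100: "a", 200: "b"}
    data = {"a": {10, 90, 250}, "b": {120, 150, 180}}

    add_node(ring, data, new_token=150, new_name="c")
    print(data)  # "b" still holds 120 and 150; "c" holds copies of them

    cleanup(ring, data, "b")
    print(data)  # "b" keeps only 180; nothing is lost overall
```

In this model the union of all keys is the same before the join, after streaming, and after cleanup; cleanup only reclaims space for data that already lives on its new owner.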