Re: UnavailableException with 1 node down and RF=2?
> Thank you for your explanations. Even with a RF=1 and one node down I don't
> understand why I can't at least read the data in the nodes that are still
> up?

You will be able to read data for row keys that do not live on the node that is down. But for any request to a row that lives on the node that is down, Unavailable is the expected result. If the data simply does not exist anywhere other than on that single node, and that node is down, there's nothing Cassandra, or any other system, can do ;)

> Also, why can't I at least perform writes with consistency level ANY and
> failover policy ON_FAIL_TRY_ALL_AVAILABLE...shouldn't the nodes that are up
> be able to take in the writes destined for the node that is down and perform
> hinted handoffs when it comes back again?

You seem to be mixing Hector stuff and Cassandra concepts here. So to be clear: You can use CL.ANY in order to make writes be accepted even if the one and only node that owns the data in question is down. However, that data won't be *readable* until (1) that node comes back up, and (2) hints are delivered to it. This is all in Cassandra.

The failover policy stuff applies to Hector and how it chooses to select nodes, and should be orthogonal to whether or not data is readable as such. Basically, don't try to use that to get around lack of data due to nodes being down.

(Also, note that while I don't know/remember offhand, I don't think an Unavailable response is going to be retried on all available nodes, as it indicates that the node responded correctly and that nodes are in fact down. I would expect the policy to apply to cases where communication with the coordinator node fails. But I am speculating here and this might be wrong.)

> Unless by construction Cassandra
> behaves in the way you describe (which is perfectly fine and I will use it
> that way from now on) it would be logical for the RF=1 to not affect the
> behaviour I expect from just reading the top level descriptions of Cassandra
> behaviour I found in the documentation.

If you mean that rows that are NOT on the node that is down should be readable, then that is indeed the case. If you are unable to read data from other rows, that is definitely unexpected.

In *that* case, the failover policy that you mention might be at play. I.e., you want the Hector client not to fail a request just because a single node happens to be down. But since you're getting an "unavailable" exception, that indicates that Hector was able to talk to the selected Cassandra node, and that the node in question gave an Unavailable exception back, indicating that the read or write could not be serviced at the given consistency level due to nodes being down.

I would start by double checking exactly which row key(s) are being written to/read from, and whether they are truly not on the node(s) that are down.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
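To make the CL.ANY point concrete, here is a toy Python model of a single row with RF=1. It is only a sketch under simplifying assumptions, not Cassandra or Hector code: the names `ToyCluster`, `Unavailable`, `write`, `read` and `node_back_up` are invented for illustration, and hint delivery is modelled as happening immediately when the node returns.

```python
class Unavailable(Exception):
    """Raised when no live replica can serve the request."""


class ToyCluster:
    """Toy model of one row with RF=1 (hypothetical, not real Cassandra)."""

    def __init__(self):
        self.replica_up = True   # the single node that owns the row
        self.replica_data = {}   # data stored on the owning node
        self.hints = []          # writes parked on some other live node

    def write(self, key, value, cl):
        if self.replica_up:
            self.replica_data[key] = value
        elif cl == "ANY":
            # Accepted, but only as a hint held by a live node.
            self.hints.append((key, value))
        else:
            raise Unavailable("write at %s needs a live replica" % cl)

    def read(self, key):
        # Reads need a live replica; hints are never consulted.
        if not self.replica_up:
            raise Unavailable("no live replica for %r" % key)
        return self.replica_data.get(key)

    def node_back_up(self):
        # (1) the node comes back up, (2) hints are delivered to it.
        self.replica_up = True
        for key, value in self.hints:
            self.replica_data[key] = value
        self.hints.clear()
```

In this model a write at ANY succeeds while the owning node is down, a read of the same key raises `Unavailable`, and the value only becomes readable after `node_back_up()` has replayed the hints.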
Re: UnavailableException with 1 node down and RF=2?
Hi Peter,

Thank you for your explanations. Even with a RF=1 and one node down I don't understand why I can't at least read the data in the nodes that are still up?

Also, why can't I at least perform writes with consistency level ANY and failover policy ON_FAIL_TRY_ALL_AVAILABLE...shouldn't the nodes that are up be able to take in the writes destined for the node that is down and perform hinted handoffs when it comes back again?

Unless by construction Cassandra behaves in the way you describe (which is perfectly fine and I will use it that way from now on) it would be logical for the RF=1 to not affect the behaviour I expect from just reading the top level descriptions of Cassandra behaviour I found in the documentation.

Cheers,
Alex

On Fri, Oct 28, 2011 at 10:58 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > If you want to survive node failures, use an RF above 1. And then make
> > sure to use an appropriate consistency level.
>
> To elaborate a bit: RF, or replication factor, is the *total* number
> of copies of any piece of data in the cluster. So with only one copy,
> the data will not be available when a single node is down.
>
> Consistency levels control how many nodes are required to respond to
> requests before it is considered successful, and this has implications
> on availability. For example, if you want to survive a single node
> going down and you use RF=2, you must use ConsistencyLevel.ONE. If you
> used QUORUM or ALL, any read or write would fail (QUORUM of 2 is 2).
>
> Probably a common setup is to use RF=3 because it allows you to
> survive a node going down, while also allowing you to use QUORUM. But,
> whether that matters will be up to your use-case.

--
Alexandru Dan Sicoe
MEng, CERN Marie Curie ACEOLE Fellow
Re: UnavailableException with 1 node down and RF=2?
> If you want to survive node failures, use an RF above 1. And then make
> sure to use an appropriate consistency level.

To elaborate a bit: RF, or replication factor, is the *total* number of copies of any piece of data in the cluster. So with only one copy, the data will not be available when a single node is down.

Consistency levels control how many nodes are required to respond to requests before it is considered successful, and this has implications on availability. For example, if you want to survive a single node going down and you use RF=2, you must use ConsistencyLevel.ONE. If you used QUORUM or ALL, any read or write would fail (QUORUM of 2 is 2).

Probably a common setup is to use RF=3 because it allows you to survive a node going down, while also allowing you to use QUORUM. But, whether that matters will be up to your use-case.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
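The arithmetic behind "QUORUM of 2 is 2" can be sketched in a few lines of Python. This is an illustrative calculation only: the function names `required_replicas` and `survives` are made up here, and only ONE, QUORUM and ALL are modelled. It assumes the usual definition of a quorum as a majority of the RF replicas, i.e. floor(RF / 2) + 1.

```python
def required_replicas(cl, rf):
    """Live replicas needed for a request at consistency level `cl`."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1  # majority of the RF replicas
    if cl == "ALL":
        return rf
    raise ValueError("unsupported consistency level: %s" % cl)


def survives(cl, rf, replicas_down):
    """True if a row with `replicas_down` replicas offline is still servable."""
    return rf - replicas_down >= required_replicas(cl, rf)


# Print what happens to a row when one of its replicas is down.
for rf in (1, 2, 3):
    for cl in ("ONE", "QUORUM", "ALL"):
        outcome = "ok" if survives(cl, rf, 1) else "Unavailable"
        print("RF=%d CL=%-6s one replica down -> %s" % (rf, cl, outcome))
```

With RF=2, QUORUM and ALL both need two live replicas, so only ONE tolerates a replica being down; with RF=3, QUORUM needs two of three and still works with one replica down.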
Re: UnavailableException with 1 node down and RF=2?
> took a node down to see how it behaves. All of a sudden I couldn't write or
[snip]
> me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be
[snip]
> Default replication factor = 1

So you have an RF=1 cluster (only one copy of data) and you bring a node down. This fundamentally and necessarily means that the data on the node you brought down will be unavailable.

If you want to survive node failures, use an RF above 1. And then make sure to use an appropriate consistency level.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
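A small Python sketch can show why, with RF=1, only some row keys become unavailable when a node goes down. Everything here is a simplification for illustration: the node names are hypothetical, and the placement rule (a CRC32 hash of the key picks a starting node, and the next `rf - 1` nodes hold the other copies) is a stand-in for Cassandra's real partitioner and replication strategy, not a copy of them.

```python
import zlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def replicas_for(key, rf, nodes=NODES):
    """Pick `rf` consecutive nodes, starting from a hash of the key."""
    start = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def readable_at_one(key, rf, down, nodes=NODES):
    """A CL.ONE read works if at least one replica of the key is up."""
    return any(node not in down for node in replicas_for(key, rf, nodes))


# With RF=1, a key is readable only if its single owner is still up.
down = {"node2"}
for key in ("row-a", "row-b", "row-c", "row-d", "row-e"):
    owners = replicas_for(key, rf=1)
    status = "ok" if readable_at_one(key, 1, down) else "Unavailable"
    print(key, owners, status)
```

In the same model with `rf=2`, every key keeps one live replica when a single node is down, which is why an Unavailable response to a CL.ONE read in the 4-node, RF=2 setup from the original question in this thread was unexpected.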
Re: UnavailableException with 1 node down and RF=2?
That's correct. It was a read consistency problem, not so smart of me ;-)

Thank you anyway.

2011/10/27 Jonathan Ellis
> (I see that you did start a new thread and solved it with Jake's help.)
>
> On Thu, Oct 27, 2011 at 11:23 AM, Jonathan Ellis wrote:
> > Ha. On the one hand, good on you for searching the list archives for
> > similar problems. On the other hand, after over a year it's probably
> > worth starting a new thread. :)
> >
> > Standard questions:
> >
> > - What Cassandra version are you running?
> > - Are there exceptions in the log for the machine still running?
> > - What does "not responding anymore" mean? Reporting timeouts,
> > reporting unavailable, refusing client connections, ... ?
> >
> > On Thu, Oct 27, 2011 at 10:22 AM, RobinUs2 wrote:
> >> I'm currently having a similar problem with a 2-node cluster. When I shut down
> >> one of the nodes, the other isn't responding any more.
> >>
> >> Did you find a solution for your problem?
> >>
> >> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html
Re: UnavailableException with 1 node down and RF=2?
(I see that you did start a new thread and solved it with Jake's help.)

On Thu, Oct 27, 2011 at 11:23 AM, Jonathan Ellis wrote:
> Ha. On the one hand, good on you for searching the list archives for
> similar problems. On the other hand, after over a year it's probably
> worth starting a new thread. :)
>
> Standard questions:
>
> - What Cassandra version are you running?
> - Are there exceptions in the log for the machine still running?
> - What does "not responding anymore" mean? Reporting timeouts,
> reporting unavailable, refusing client connections, ... ?
>
> On Thu, Oct 27, 2011 at 10:22 AM, RobinUs2 wrote:
>> I'm currently having a similar problem with a 2-node cluster. When I shut down
>> one of the nodes, the other isn't responding any more.
>>
>> Did you find a solution for your problem?
>>
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: UnavailableException with 1 node down and RF=2?
The problem might be that you are setting the consistency level to something higher than ONE. In that case, Cassandra will respond with an UnavailableException, since it can't achieve the consistency level you are asking for. Remember that with RF=2, the consistency levels ALL and QUORUM are the same.

Regards,
Javier.

On Thu, Oct 27, 2011 at 1:23 PM, Jonathan Ellis wrote:
> Ha. On the one hand, good on you for searching the list archives for
> similar problems. On the other hand, after over a year it's probably
> worth starting a new thread. :)
>
> Standard questions:
>
> - What Cassandra version are you running?
> - Are there exceptions in the log for the machine still running?
> - What does "not responding anymore" mean? Reporting timeouts,
> reporting unavailable, refusing client connections, ... ?
>
> On Thu, Oct 27, 2011 at 10:22 AM, RobinUs2 wrote:
> > I'm currently having a similar problem with a 2-node cluster. When I shut down
> > one of the nodes, the other isn't responding any more.
> >
> > Did you find a solution for your problem?
> >
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html
Re: UnavailableException with 1 node down and RF=2?
Ha. On the one hand, good on you for searching the list archives for similar problems. On the other hand, after over a year it's probably worth starting a new thread. :)

Standard questions:

- What Cassandra version are you running?
- Are there exceptions in the log for the machine still running?
- What does "not responding anymore" mean? Reporting timeouts, reporting unavailable, refusing client connections, ... ?

On Thu, Oct 27, 2011 at 10:22 AM, RobinUs2 wrote:
> I'm currently having a similar problem with a 2-node cluster. When I shut down
> one of the nodes, the other isn't responding any more.
>
> Did you find a solution for your problem?
>
> /I'm new to mailing lists, if it's inappropriate to reply here, please let
> me know../
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: UnavailableException with 1 node down and RF=2?
I'm currently having a similar problem with a 2-node cluster. When I shut down one of the nodes, the other isn't responding any more.

Did you find a solution for your problem?

/I'm new to mailing lists, if it's inappropriate to reply here, please let me know../
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/2-node-cluster-1-node-down-overall-failure-td6936722.html

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/UnavailableException-with-1-node-down-and-RF-2-tp5242055p6936767.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: UnavailableException with 1 node down and RF=2?
Then either you have at least one machine that thinks RF=1 or you found a bug.

On Thu, Jul 1, 2010 at 7:08 AM, James Golick wrote:
> It's happening consistently when I take any node out of rotation.
>
> On Thu, Jul 1, 2010 at 2:24 AM, Jonathan Ellis wrote:
>> Presumably the failure detector generated a false positive for a
>> second node temporarily
>>
>> On Wed, Jun 30, 2010 at 10:55 PM, James Golick wrote:
>> > Oops. I meant to say that I'm reading with CL.ONE.
>> >
>> > On 2010-07-01, at 1:39 AM, Benjamin Black wrote:
>> >> .QUORUM or .ALL (they are the same with RF=2).
>> >>
>> >> On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote:
>> >>> 4 nodes, RF=2, 1 node down.
>> >>> How can I get an UnavailableException in that scenario?
>> >>> - J.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: UnavailableException with 1 node down and RF=2?
It's happening consistently when I take any node out of rotation.

On Thu, Jul 1, 2010 at 2:24 AM, Jonathan Ellis wrote:
> Presumably the failure detector generated a false positive for a
> second node temporarily
>
> On Wed, Jun 30, 2010 at 10:55 PM, James Golick wrote:
> > Oops. I meant to say that I'm reading with CL.ONE.
> >
> > On 2010-07-01, at 1:39 AM, Benjamin Black wrote:
> >> .QUORUM or .ALL (they are the same with RF=2).
> >>
> >> On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote:
> >>> 4 nodes, RF=2, 1 node down.
> >>> How can I get an UnavailableException in that scenario?
> >>> - J.
Re: UnavailableException with 1 node down and RF=2?
Presumably the failure detector generated a false positive for a second node temporarily.

On Wed, Jun 30, 2010 at 10:55 PM, James Golick wrote:
> Oops. I meant to say that I'm reading with CL.ONE.
>
> On 2010-07-01, at 1:39 AM, Benjamin Black wrote:
>> .QUORUM or .ALL (they are the same with RF=2).
>>
>> On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote:
>>> 4 nodes, RF=2, 1 node down.
>>> How can I get an UnavailableException in that scenario?
>>> - J.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: UnavailableException with 1 node down and RF=2?
Oops. I meant to say that I'm reading with CL.ONE.

J.

On 2010-07-01, at 1:39 AM, Benjamin Black wrote:
> .QUORUM or .ALL (they are the same with RF=2).
>
> On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote:
>> 4 nodes, RF=2, 1 node down.
>> How can I get an UnavailableException in that scenario?
>> - J.
Re: UnavailableException with 1 node down and RF=2?
.QUORUM or .ALL (they are the same with RF=2).

On Wed, Jun 30, 2010 at 10:22 PM, James Golick wrote:
> 4 nodes, RF=2, 1 node down.
> How can I get an UnavailableException in that scenario?
> - J.
UnavailableException with 1 node down and RF=2?
4 nodes, RF=2, 1 node down.

How can I get an UnavailableException in that scenario?

- J.