Re: Best Practice to add a node in a Cluster

2015-04-28 Thread Neha Trivedi
Interesting, Eric!
I'm not sure whether this would be allowed: alter the keyspace to RF=3 and
then add a node.



Re: Best Practice to add a node in a Cluster

2015-04-28 Thread Eric Stevens
I would double-check in a test cluster (or use a tool like CCM to set up a
local throwaway cluster and confirm), but for this *specific* use case
(going from RF==NodeCount to RF==NodeCount with a higher number) you should
be able to have a simpler path.  Set RF=3 before you add your new node,
then add the new node.  It will bootstrap all data from the other two
nodes, then your job is done.

You shouldn't have to run repair (which you normally have to do after
increasing RF in order to make sure all nodes have their data - the nodes
already have all their data), and you shouldn't have to run cleanup (which
you normally have to do after increasing node count to instruct the old
nodes to forget data for which they are no longer responsible).  The data
responsibility hasn't changed for any node; all nodes are still responsible
for all data.



Re: Best Practice to add a node in a Cluster

2015-04-27 Thread Neha Trivedi
Thanks, Arun!



Re: Best Practice to add a node in a Cluster

2015-04-27 Thread arun sirimalla
Hi Neha,


After you add the node to the cluster, run nodetool cleanup on all nodes.
Next, run repair on each node to replicate the data. Make sure you run
the repair on one node at a time, because repair is an expensive process
(it uses a lot of CPU).






-- 
Arun
Senior Hadoop/Cassandra Engineer
Cloudwick

Champion of Big Data (Cloudera)
http://www.cloudera.com/content/dev-center/en/home/champions-of-big-data.html

2014 Data Impact Award Winner (Cloudera)
http://www.cloudera.com/content/cloudera/en/campaign/data-impact-awards.html


Re: Best Practice to add a node in a Cluster

2015-04-27 Thread Neha Trivedi
Thanks Eric and Matt :) !!

Yes, the purpose is to improve reliability.
Right now, we query from our driver using degradePolicy for
reliability.



*To change the keyspace to RF=3, the procedure is as follows:*
1. Add a new node to the cluster (the new node is not in the seed list).

2. ALTER KEYSPACE system_auth WITH REPLICATION =
  {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};

3. On each affected node, run nodetool repair.

4. Wait until repair completes on a node, then move to the next node.
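
The "one node at a time" repair steps above could be scripted roughly as follows. This is only a sketch under assumptions: the host addresses are hypothetical, the helper names are made up for illustration, and it relies on nodetool being on the PATH and on `nodetool repair` not returning until the repair has finished. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def repair_command(host: str) -> List[str]:
    """Build the nodetool invocation that repairs a single node."""
    return ["nodetool", "-h", host, "repair"]


def rolling_repair(hosts: List[str], dry_run: bool = True) -> List[List[str]]:
    """Repair hosts strictly one after another; return the commands in order."""
    commands = [repair_command(h) for h in hosts]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True waits for this repair to exit and raises on failure,
            # so the next node is never started early.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical addresses; replace with the real nodes.
plan = rolling_repair(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```

Pass `dry_run=False` only after checking the printed plan against the actual cluster.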


Is there anything else to take care of?

Thanks
Regards
neha




RE: Best Practice to add a node in a Cluster

2015-04-27 Thread Matthew Johnson
Hi Neha,



I guess it depends on why you are adding a new node – do you need more storage
capacity, do you want better resilience, or are you trying to increase
performance?



If you add a new node with the same amount of storage as the previous two,
but you increase the RF, you will use up all of the storage you have added
by replicating the existing data onto the new node. If you keep it at RF=2,
once you have done all the bootstrapping and cleanup then your usage on the
existing two should decrease by about a third (each node then holds roughly
two-thirds of the data instead of all of it).
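
The storage arithmetic above can be checked with a short sketch. It assumes token ownership is spread evenly across nodes, which is an idealisation; the function name is made up for this example.

```python
def data_fraction_per_node(rf: int, nodes: int) -> float:
    """Fraction of the full data set each node stores, assuming even ownership."""
    # A node can hold at most one copy of each row, hence the min().
    return min(rf, nodes) / nodes


before = data_fraction_per_node(rf=2, nodes=2)     # 1.0: both nodes hold everything
keep_rf2 = data_fraction_per_node(rf=2, nodes=3)   # 2/3 of the data per node
raise_rf3 = data_fraction_per_node(rf=3, nodes=3)  # 1.0: every node holds everything

print(f"RF=2 on 3 nodes: each node holds {keep_rf2:.0%} of the data, "
      f"a drop of {1 - keep_rf2 / before:.0%}")
print(f"RF=3 on 3 nodes: each node holds {raise_rf3:.0%} of the data")
```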



However, if it is resilience you are after (being able to take down nodes
without losing availability) then increasing the RF will give you this, at
the expense of using more storage.



Hope that helps.



Cheers,

Matt







Re: Best Practice to add a node in a Cluster

2015-04-27 Thread Eric Stevens
It depends on why you're adding a new node.  If you're running out of disk
space or IO capacity in your 2 node cluster, then changing RF to 3 will not
improve either condition - you'd still be writing all data to all three
nodes.

However if you're looking to improve reliability, a 2 node RF=2 cluster
cannot have either node offline without losing quorum, while a 3 node RF=3
cluster can have one node offline and still be able to achieve quorum.
RF=3 is a common replication factor because of this characteristic.
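
The quorum arithmetic behind this can be sketched in a few lines of Python. It is only an illustration, assuming the usual majority rule of quorum = floor(RF / 2) + 1; the function names are made up for the example.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM read or write (a majority of RF)."""
    return rf // 2 + 1


def tolerable_failures(rf: int) -> int:
    """Replicas that can be offline while QUORUM is still reachable."""
    return rf - quorum(rf)


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"can lose {tolerable_failures(rf)} replica(s)")
```

With RF=2 the quorum is 2, so no replica can be down; with RF=3 the quorum is still 2, which leaves room for one.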

Make sure your new node is not in its own seeds list, or it will not
bootstrap (it will come online immediately and start serving requests).



Best Practice to add a node in a Cluster

2015-04-27 Thread Neha Trivedi
Hi
We have a 2-node cluster with RF=2. We are planning to add a new node.

Should we change RF to 3 in the schema?
Or just add a new node with the same RF=2?

Are there any other best practices we need to take care of?

Thanks
regards
Neha