[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2016-07-26 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394003#comment-15394003
 ] 

Jonathan Ellis commented on CASSANDRA-10243:


I edited cassandra.yaml as follows:

{code}
# CASSANDRA WILL NOT ALLOW YOU TO SWITCH TO AN INCOMPATIBLE SNITCH
# ONCE DATA IS INSERTED INTO THE CLUSTER.  This would cause data loss.
# This means that if you start with the default SimpleSnitch, which
# locates every node on "rack1" in "datacenter1", your only options
# if you need to add another datacenter are GossipingPropertyFileSnitch
# (and the older PFS).  From there, if you want to migrate to an
# incompatible snitch like Ec2Snitch you can do it by adding new nodes
# under Ec2Snitch (which will locate them in a new "datacenter") and
# decommissioning the old ones.
{code}

> Warn or fail when changing cluster topology live
> 
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.12, 2.2.4, 3.0.1, 3.1, 3.2
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-12-02 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035730#comment-15035730
 ] 

Paulo Motta commented on CASSANDRA-10243:
-

+1



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-12-01 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035009#comment-15035009
 ] 

Stefania commented on CASSANDRA-10243:
--

+1 with a couple of nits:

* introduced documentation so people are less tempted to remove this method in 
future
* rename {{liveEndpoints}} to {{liveMembers}}
* re-introduced {{getLiveTokenOwners()}} as well, this too was public and in 
the same class

Patches and CI here:

||2.1||2.2||3.0||3.1||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/10243-getLiveMembers-2.1]|[patch|https://github.com/stef1927/cassandra/commits/10243-getLiveMembers-2.2]|[patch|https://github.com/stef1927/cassandra/commits/10243-getLiveMembers-3.0]|[patch|https://github.com/stef1927/cassandra/commits/10243-getLiveMembers-3.1]|[patch|https://github.com/stef1927/cassandra/commits/10243-getLiveMembers]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-3.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-3.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-getLiveMembers-dtest/]|




[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-12-01 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035369#comment-15035369
 ] 

Stefania commented on CASSANDRA-10243:
--

CI looks OK to me.



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-12-01 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034385#comment-15034385
 ] 

Jeremiah Jordan commented on CASSANDRA-10243:
-

+1. getLiveMembers is used by snitches, which are a common extension point, so 
the original change will probably break existing external snitches.



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-12-01 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034357#comment-15034357
 ] 

Paulo Motta commented on CASSANDRA-10243:
-

Since {{Gossiper.getLiveMembers()}} might be used by other tools built on top 
of Cassandra, renaming it to {{Gossiper.getLiveEndpoints}} may break 
compatibility. I'm attaching a patch that keeps the 
{{Gossiper.getLiveMembers}} method while renaming the new 
{{StorageService.getLiveMembers}} to {{StorageService.getLiveRingMembers}} to 
differentiate it from the {{Gossiper}} method. The 2.2 patch applies cleanly 
upwards.
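
The compatibility approach described here might look roughly like the following Java sketch. It is illustrative only and is not the attached patch: {{GossiperSketch}} and {{StorageServiceSketch}} are hypothetical stand-ins, endpoints are plain strings, and the idea that the ring-level method intersects the gossip view with ring membership is an assumption made for this example.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the gossip layer; endpoints are plain strings here. */
final class GossiperSketch
{
    private final Set<String> live = new HashSet<>();

    void markAlive(String endpoint)
    {
        live.add(endpoint);
    }

    /** Existing public method: its name stays untouched so external callers keep compiling. */
    Set<String> getLiveMembers()
    {
        return Collections.unmodifiableSet(new HashSet<>(live));
    }
}

/** Hypothetical stand-in for the storage layer, which also knows ring membership. */
final class StorageServiceSketch
{
    private final GossiperSketch gossiper;
    private final Set<String> ringMembers;

    StorageServiceSketch(GossiperSketch gossiper, Set<String> ringMembers)
    {
        this.gossiper = gossiper;
        this.ringMembers = new HashSet<>(ringMembers);
    }

    /** New method under a distinct name, so it cannot be confused with the gossip one. */
    Set<String> getLiveRingMembers()
    {
        Set<String> result = new HashSet<>(gossiper.getLiveMembers());
        // Assumed semantics: keep only live endpoints that are also ring members.
        result.retainAll(ringMembers);
        return result;
    }
}
```

Keeping the old name means an external snitch compiled against the previous release would still link, while the new method gets a name that says which view of the cluster it returns.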


||2.1||2.2||3.0||3.1||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:2.1-10243-getLiveMembers]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-10243-getLiveMembers]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10243-getLiveMembers]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.1...pauloricardomg:3.1-10243-getLiveMembers]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10243-getLiveMembers]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10243-getLiveMembers-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10243-getLiveMembers-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10243-getLiveMembers-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.1-10243-getLiveMembers-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10243-getLiveMembers-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.1-10243-getLiveMembers-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-10243-getLiveMembers-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10243-getLiveMembers-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.1-10243-getLiveMembers-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10243-getLiveMembers-dtest/lastCompletedBuild/testReport/]|




[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-29 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031271#comment-15031271
 ] 

Stefania commented on CASSANDRA-10243:
--

Actually we cannot hold the P.R. because now that this ticket is committed the 
existing {{SnitchConfigurationUpdateTest}} tests in _replication_test.py_ fail. 
So I've created the pull request 
[here|https://github.com/riptano/cassandra-dtest/pull/688]. We'll have to check 
{{test_cannot_restart_with_different_rack}} after CASSANDRA-9474 is committed.



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030772#comment-15030772
 ] 

Stefania commented on CASSANDRA-10243:
--

I've updated the test, thanks. I'll hold the P.R. until 9474 is committed since 
I want to test it locally again first.



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-27 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15030266#comment-15030266
 ] 

Paulo Motta commented on CASSANDRA-10243:
-

[~Stefania] After CASSANDRA-9474 is merged, the rack startup failure message 
will change to {{Cannot start node if snitch's rack (rack2) differs from 
previous rack (rack1). Please fix the snitch configuration, decommission and 
rebootstrap this node or use the flag -Dcassandra.ignore_rack=true}}, can you 
please update that on 
{{replication_test.py:SnitchConfigurationUpdateTest.test_cannot_restart_with_different_rack}}
 before submitting the PR? Thanks!
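
A startup check producing this message could be sketched roughly as follows. This is not the code from the CASSANDRA-9474 patch: the class {{RackStartupCheck}} and its method are hypothetical, and only the message text and the {{cassandra.ignore_rack}} property name are taken from the comment above.

```java
/** Hypothetical sketch: compare the snitch's rack with the rack saved on a previous start. */
final class RackStartupCheck
{
    /**
     * @param savedRack  rack recorded on a previous start, or null on the first start
     * @param snitchRack rack currently reported by the snitch
     * @throws IllegalStateException if the racks differ and the override flag is not set
     */
    static void check(String savedRack, String snitchRack)
    {
        // Boolean.getBoolean reads a JVM system property, e.g. -Dcassandra.ignore_rack=true
        boolean ignoreRack = Boolean.getBoolean("cassandra.ignore_rack");
        if (savedRack == null || ignoreRack || savedRack.equals(snitchRack))
            return;

        throw new IllegalStateException(String.format(
            "Cannot start node if snitch's rack (%s) differs from previous rack (%s). " +
            "Please fix the snitch configuration, decommission and rebootstrap this node " +
            "or use the flag -Dcassandra.ignore_rack=true", snitchRack, savedRack));
    }
}
```

A dtest that matches on the log line would need to expect exactly this wording, which is why the message change matters for {{test_cannot_restart_with_different_rack}}.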



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-24 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023965#comment-15023965
 ] 

Stefania commented on CASSANDRA-10243:
--

CI is complete on all branches. There are failures but they do not seem related.

> Warn or fail when changing cluster topology live
> 
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-24 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15024630#comment-15024630
 ] 

Paulo Motta commented on CASSANDRA-10243:
-

Awesome, now it should be ready to go. Thanks!



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-23 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023553#comment-15023553
 ] 

Stefania commented on CASSANDRA-10243:
--

Thanks for your review. Here is the next update:

bq. It's not very common to update a snitch property of a live node (especially 
with GPFS), so I don't think the warning is necessary. Should we maybe add a 
note to NEWS.txt instead? Or maybe print the warning on 2.1/2.2, and remove it 
on 3.0?

Added a note to NEWS.txt and removed the warning.

bq. On CASSANDRA-9474 we changed the property name from override_rackdc to the 
original ignore_rack in addition to ignore_dc, could you please fix dtests?

I've updated the tests and removed the @require on 9474 (I've added ignore_rack 
to 2.2 so we can run the full tests on Jenkins via DTEST_REPO and DTEST_BRANCH).

bq. Instead of manipulating gossip EndpointState in 
StorageProxy.truncateBlocking(), what do you think about adding an 
excludeNodesInDeadState option to StorageService.getLiveMembers and 
Gossiper.getLiveMembers (which you could also rename to 
Gossiper.getLiveEndpoints)?

Done
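
An option of this kind might be sketched in Java as below. This is a hedged illustration rather than the committed change: {{LiveMembersSketch}} is a hypothetical class, endpoints and gossip statuses are plain strings, and the status values listed as dead are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a live-members lookup with an "exclude dead states" switch. */
final class LiveMembersSketch
{
    /** Illustrative status values treated as dead for the purposes of this sketch. */
    static final Set<String> DEAD_STATES =
        new HashSet<>(Arrays.asList("removing", "removed", "LEFT", "hibernate"));

    /**
     * @param liveEndpointStatus gossip status keyed by endpoint, for endpoints gossip considers live
     * @param excludeDeadStates  when true, drop endpoints whose status is in DEAD_STATES
     * @return the selected endpoints, sorted for deterministic output
     */
    static Set<String> getLiveMembers(Map<String, String> liveEndpointStatus, boolean excludeDeadStates)
    {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> entry : liveEndpointStatus.entrySet())
        {
            if (excludeDeadStates && DEAD_STATES.contains(entry.getValue()))
                continue;
            result.add(entry.getKey());
        }
        return result;
    }
}
```

With a flag like this the caller (truncate, in the case discussed) asks for the stricter view directly instead of inspecting gossip state itself.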

bq. Please fix nodetool_test.TestNodetool.test_correct_dc_rack_in_nodetool_info 
dtest

Now that the GPFS error is gone this test is fine, thanks for checking this.

bq. It seems 3.1 dtest did not run, mind resubmitting please?

Submitted together with the other branches just now. The reason I did not 
submit it yesterday is because the 3.1 branch is identical to the 3.0 branch 
(checked with git diff).

I'll post another update once CI completes.



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-23 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023072#comment-15023072
 ] 

Paulo Motta commented on CASSANDRA-10243:
-

Looking good, a few nits left:
* It's not very common to update a snitch property of a live node (especially 
with GPFS), so I don't think the warning is necessary. Should we maybe add a 
note to NEWS.txt instead? Or maybe print the warning on 2.1/2.2, and remove it 
on 3.0?
* On CASSANDRA-9474 we changed the property name from {{override_rackdc}} to 
the original {{ignore_rack}} in addition to {{ignore_dc}}, could you please fix 
the dtests?
* Instead of manipulating gossip {{EndpointState}} in 
{{StorageProxy.truncateBlocking()}}, what do you think about adding an 
excludeNodesInDeadState option to {{StorageService.getLiveMembers}} and 
{{Gossiper.getLiveMembers}} (which you could also rename to 
{{Gossiper.getLiveEndpoints}})?
* Please fix 
[nodetool_test.TestNodetool.test_correct_dc_rack_in_nodetool_info|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-2.2-dtest/lastCompletedBuild/testReport/nodetool_test/TestNodetool/test_correct_dc_rack_in_nodetool_info/]
 dtest
* It seems [3.1 
dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-3.1-dtest/]
 did not run, mind resubmitting please?

Thanks!



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-22 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021671#comment-15021671
 ] 

Stefania commented on CASSANDRA-10243:
--

I've removed config reload from GPFS but I've decided to emit an error message 
to warn users this is no longer supported in case we detect an updated 
configuration file.

I've simplified the liveliness check in the other two remaining snitches as 
discussed. I've cleaned up the code in Gossiper and SS a bit, but the legacy 
functionality for getting live token owners in SS truncate should be 
functionally identical.

I've fixed the [new 
dtests|https://github.com/stef1927/cassandra-dtest/commits/10243] according to 
the new GPFS error message and noticed that in 2.2 the tests do not pass 
because of the missing {{-Dcassandra.ignore_rack}} property. I've therefore 
updated them to use the new property name that will be introduced by 
CASSANDRA-9474 and marked the tests depending on it. For this reason, I haven't 
created a pull request yet.

I've also up-merged, CI is still running:

||2.1||2.2||3.0||3.1||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/10243-2.1]|[patch|https://github.com/stef1927/cassandra/commits/10243-2.2]|[patch|https://github.com/stef1927/cassandra/commits/10243-3.0]|[patch|https://github.com/stef1927/cassandra/commits/10243-3.1]|[patch|https://github.com/stef1927/cassandra/commits/10243]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-3.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-testall/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-3.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-dtest/]|




[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-20 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018849#comment-15018849
 ] 

Paulo Motta commented on CASSANDRA-10243:
-

Finished second part of review and don't have much to add besides the previous 
comments. Very nice and comprehensive dtest and unit test suite, 
congratulations!

I wasn't very familiar with PropertyFileSnitch and YamlPropertyFileSnitch so 
it took a bit longer to review those, especially the default rack/dc thing. I 
don't see much point in keeping PropertyFileSnitch around (and having to 
maintain it), given you can achieve the same, and even more, in a much simpler 
way with GossipingPropertyFileSnitch, so I created CASSANDRA-10745 to deprecate 
PropertyFileSnitch.

While this is very well tested, and CASSANDRA-10242 and CASSANDRA-9474 don't 
make much sense without this patch, this is quite a bit of code to add at the 
end of 2.1, so I'll leave it to the committer to decide if this should go into 
2.1, but I guess it should be OK.

Addressing your previous comments:

bq. my preference would be to leave existing code unchanged, especially if this 
goes to 2.1, but I am not opposed to simplifying the new liveliness check for 
the snitch to what you suggested

+1

bq. I don't see why wait for up to 60 seconds before reloading a config file, 5 
seconds is a pretty long time and it should not have any adverse impact.

this file is rarely ever changed, and now even less, so 60 seconds is more than 
enough, but if lowering makes testing easier I guess it should be fine

bq. maybe we should never allow changing dc/rack for GPFS, or remove the config 
reload altogether as suggested in CASSANDRA-9474.

+1, we should keep GPFS as simple as possible, and I don't see much sense in 
reloading only prefer_local. You can maybe just reuse the 
[patch|https://issues.apache.org/jira/secure/attachment/12738530/cassandra-2.1-9474.patch]
 from CASSANDRA-9474 which is ready.

bq. Should we add a JVM property to override the liveliness checks, just as a 
safety measure in case someone has a legitimate reason to change rack/dc of a 
live node?

I don't see a legitimate reason to change the rack/dc of a live node and 
restarting the node in this case shouldn't be a big deal, so better avoid 
adding new properties IMO.

Good job!



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-19 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014921#comment-15014921
 ] 

Paulo Motta commented on CASSANDRA-10243:
-

Nice work! Some preliminary comments in case you want to address before the 
timezone flip: 




[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-19 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015353#comment-15015353
 ] 

Stefania commented on CASSANDRA-10243:
--

Thanks for the review! I plan on working early next week on your comments, see 
answers below, as well as any other comments you may have.

bq. Is it necessary to check if a node is in dead state for the purpose of this 
snitch check? In my understanding, if a node is on a dead state, it's neither 
live nor member of the ring, so I didn't get why that check was done previously 
on getLiveTokenOwners() in the first place, do you know? Maybe historical 
reasons? I'd prefer to have a simpler isLiveMember() check on StorageService 
(since it checks both gossip and tokenmetadata), and this method would 
basically return Gossiper.isLiveEndpoint(endpoint) && 
tokenMetadata.isMember(ep), but this is a personal thing so it's up to you to 
take this suggestion.

From a quick code analysis I think leaving nodes are still members but their 
state is dead? In any case, my preference would be to leave existing code 
unchanged, especially if this goes to 2.1, but I am not opposed to simplifying 
the new liveliness check for the snitch to what you suggested, 
{{Gossiper.isLiveEndpoint(endpoint) && tokenMetadata.isMember(ep)}}, since 
this would mean leaving nodes are also live, which is safer I believe.
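
As a rough illustration of the simplified check, the following Java sketch combines the two views with a plain conjunction. {{LivenessView}} is a hypothetical stand-in, not a Cassandra class: the two sets play the roles of the gossip live set and the token metadata membership, and endpoints are plain strings.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in holding the gossip view and the ring (token metadata) view. */
final class LivenessView
{
    private final Set<String> gossipLive;
    private final Set<String> ringMembers;

    LivenessView(Set<String> gossipLive, Set<String> ringMembers)
    {
        this.gossipLive = new HashSet<>(gossipLive);
        this.ringMembers = new HashSet<>(ringMembers);
    }

    /** An endpoint counts as a live member only if gossip sees it alive AND it is in the ring. */
    boolean isLiveMember(String endpoint)
    {
        return gossipLive.contains(endpoint) && ringMembers.contains(endpoint);
    }
}
```

Under this definition a leaving node that is still alive in gossip and still in the ring is treated as live, which matches the safer behaviour argued for above.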

bq. Did you intend to decrease the default snitch configuration refresh period 
from 60 to 5 seconds?

Yes I did. I reduced it so that the dtests could complete in a reasonable 
amount of time. I don't see why wait for up to 60 seconds before reloading a 
config file, 5 seconds is a pretty long time and it should not have any adverse 
impact.

bq. On GossipingPropertyFileSnitch I think it's only necessary to check if the 
dc/rack changed, or do you see a situation where one would want to live change 
the rack/dc of a non-ring member?

Maybe start the node with {{-Djoin_ring=false}}, change the rack and then join 
the ring? If the node is not live I'd say let them change the rack/dc even 
though I agree it doesn't make much sense. I'm really undecided to be honest; 
basically the only reason to reload the GPFS config should be to change 
{{preferLocal}}, so maybe we should never allow changing dc/rack for GPFS, or 
remove the config reload altogether as suggested in CASSANDRA-9474.
 
bq. Also on the GossipingPropertyFileSnitch maybe it's not necessary to 
updateTopology/invalidateCachedRing, since topology change is not allowed 
anymore?

{{updateTopology}} definitely no longer makes sense but 
{{invalidateCachedRing}} is probably safer to keep, at least on startup. 
However see my next question, in which case we would need to keep 
{{updateTopology}}.

Should we add a JVM property to override the liveliness checks, just as a 
safety measure in case someone has a legitimate reason to change rack/dc of a 
live node?



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-18 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012894#comment-15012894
 ] 

Stefania commented on CASSANDRA-10243:
--

The [2.1 patch|https://github.com/stef1927/cassandra/commits/10243-2.1] is 
ready for review. There are also some 
[dtests|https://github.com/stef1927/cassandra-dtest/commits/10243]. 

CI is here:

http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-2.1-testall/
http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-10243-2.1-dtest/

I will prepare the 2.2+ up-merges and corresponding CI once review is done.



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-18 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15013059#comment-15013059
 ] 

Stefania commented on CASSANDRA-10243:
--

Note: {{replication_test.SnitchConfigurationUpdateTest}} tests are expected to 
fail until the new dtests are committed.



[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-12 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001851#comment-15001851
 ] 

Stefania commented on CASSANDRA-10243:
--

Actually the startup checks have already been implemented by CASSANDRA-10242.

> Warn or fail when changing cluster topology live
> 
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.





[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-10 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14998328#comment-14998328
 ] 

Stefania commented on CASSANDRA-10243:
--

So far I've done {{GossipingPropertyFileSnitch}} and {{PropertyFileSnitch}} in 
2.1. Here is what the error looks like:

{code}
ERROR 09:51:55 Cannot update data center or rack from [DC1, RAC1] to [DC2, 
RAC2] for live host /127.0.0.1, property file NOT RELOADED
{code}

cc [~jbellis] and [~tjake] regarding the two questions above, summarized 
here:

* Should we enhance the deprecated snitch {{YamlFileNetworkTopologySnitch}}, 
which is available in 2.1 only?
* Should we allow a node to join the cluster when the rack changes on startup?

> Warn or fail when changing cluster topology live
> 
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.





[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-11-09 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996237#comment-14996237
 ] 

Stefania commented on CASSANDRA-10243:
--

Before I start work, can I confirm the fix version (2.1) and the snitches to 
cover: {{YamlFileNetworkTopologySnitch}}, {{PropertyFileSnitch}} and 
{{GossipingPropertyFileSnitch}}? Note that {{YamlFileNetworkTopologySnitch}} 
was deleted in 2.2.

For tracking rack and dc, everything should already be there, either in system 
peers, the gossip application states, or the snitches themselves. If a 
snitch configuration update would result in a rack or dc change for a live node 
(a node that belongs to {{Gossiper.liveEndpoints}}), then we log a loud error 
message and do not update the snitch at all. Correct?
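
To make the proposed check concrete, here is a minimal sketch under stated 
assumptions. It is not the actual patch: {{TopologyReloadCheck}}, 
{{Location}}, {{liveLocationChanges}} and {{reload}} are hypothetical names, 
endpoints are plain strings, and the maps stand in for whatever the snitch 
currently holds and whatever the reloaded property file would produce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper for illustration only; none of these names exist in Cassandra.
final class TopologyReloadCheck {
    /** Data center and rack of one endpoint, as a snitch would report them. */
    static final class Location {
        final String dc;
        final String rack;

        Location(String dc, String rack) {
            this.dc = dc;
            this.rack = rack;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Location
                && dc.equals(((Location) o).dc)
                && rack.equals(((Location) o).rack);
        }

        @Override
        public int hashCode() {
            return Objects.hash(dc, rack);
        }

        @Override
        public String toString() {
            return "[" + dc + ", " + rack + "]";
        }
    }

    /**
     * Returns one error message per live endpoint whose location differs between
     * the current and the reloaded topology. An empty list means no live node moves.
     */
    static List<String> liveLocationChanges(Map<String, Location> current,
                                            Map<String, Location> reloaded,
                                            Set<String> liveEndpoints) {
        List<String> errors = new ArrayList<>();
        for (String endpoint : liveEndpoints) {
            Location before = current.get(endpoint);
            Location after = reloaded.get(endpoint);
            // Endpoints missing from either view are skipped; only a changed
            // location for a known live endpoint is treated as an error.
            if (before != null && after != null && !before.equals(after)) {
                errors.add("Cannot update data center or rack from " + before
                           + " to " + after + " for live host " + endpoint);
            }
        }
        return errors;
    }

    /** Keeps the current topology unless the reload moves no live endpoint. */
    static Map<String, Location> reload(Map<String, Location> current,
                                        Map<String, Location> reloaded,
                                        Set<String> liveEndpoints) {
        return liveLocationChanges(current, reloaded, liveEndpoints).isEmpty()
             ? reloaded
             : current;
    }
}
```

The sketch rejects the whole reload rather than applying it partially, which 
matches the "do not update the snitch at all" behaviour described above. 
Note that the outcome depends on the caller's own view of which endpoints 
are live.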

bq. I'm not even sure if we should allow a node to join the cluster when the 
rack changes on startup without an explicit operator intervention.

We can look at system peers and refuse to join unless the operator sets a 
specific system property. Would this be OK?
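
A rough sketch of what such a startup check could look like, assuming the 
rack recorded on a previous run is available as an optional value. The class 
name {{StartupRackCheck}} and the property name {{example.ignore_rack_change}} 
are made up for illustration and are not real Cassandra settings.

```java
import java.util.Optional;

// Hypothetical startup check for illustration only; not Cassandra code.
final class StartupRackCheck {
    // Made-up property name; an operator would pass -Dexample.ignore_rack_change=true.
    static final String OVERRIDE_PROPERTY = "example.ignore_rack_change";

    /**
     * Throws if the rack recorded on a previous run differs from the rack the
     * snitch reports now, unless the operator has set the override property.
     */
    static void check(Optional<String> savedRack, String configuredRack) {
        // Boolean.getBoolean reads the named system property and is true only for "true".
        boolean override = Boolean.getBoolean(OVERRIDE_PROPERTY);
        if (savedRack.isPresent() && !savedRack.get().equals(configuredRack) && !override) {
            throw new IllegalStateException(
                "Rack changed from " + savedRack.get() + " to " + configuredRack
                + "; refusing to start without -D" + OVERRIDE_PROPERTY + "=true");
        }
    }
}
```

A node with no saved rack (a first start) passes the check, so only an 
existing node whose rack changed needs explicit operator intervention.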

> Warn or fail when changing cluster topology live
> 
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.





[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-09-02 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727388#comment-14727388
 ] 

T Jake Luciani commented on CASSANDRA-10243:


bq. Are we confident that this is always the wrong thing?

Yes. We should track the rack name on startup and use it while the node is up.
I'm not even sure if we should allow a node to join the cluster on startup 
without an explicit operator intervention.

We should apply the same logic to DC movement.

> Warn or fail when changing cluster topology live
> 
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
> Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.





[jira] [Commented] (CASSANDRA-10243) Warn or fail when changing cluster topology live

2015-09-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726159#comment-14726159
 ] 

Jonathan Ellis commented on CASSANDRA-10243:


Are we confident that this is *always* the wrong thing?  If so, we could reject 
a snitch topology change whenever it would change a live node's location.

But this is dangerous since different nodes could have different views of 
cluster states, resulting in some loading the new config and others rejecting 
it.  Nor does it help when a node is legitimately down temporarily but comes 
back up to find its rack has been changed out from under it.

Perhaps it is simplest to just warn loudly in all these cases.

> Warn or fail when changing cluster topology live
> 
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jonathan Ellis
> Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.


