[jira] [Commented] (CASSANDRA-15878) Ec2Snitch fails on upgrade in legacy mode

2020-06-28 Thread Alexander Dejanovski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147566#comment-17147566
 ] 

Alexander Dejanovski commented on CASSANDRA-15878:
--

Thanks for the feedback [~jolynch].

I've added back the DC name check and adjusted it as suggested. I provided 
accurate informations on which case we're actually covering now with this check.

[~mck], I've reintroduced the unit tests that I had deleted and changed the 
assertions where needed.

You can check the changes here: 
[https://github.com/apache/cassandra/compare/trunk...thelastpickle:CASSANDRA-15878]

Let me know what you think.

> Ec2Snitch fails on upgrade in legacy mode
> -
>
> Key: CASSANDRA-15878
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15878
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Distributed Metadata
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the 
> Ec2Snitch to match AWS conventions.
> The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and 
> keep the same naming as before (while the "standard" mode uses the new naming 
> convention).
> When performing an upgrade in the us-west-2 region, the second node failed to 
> start with the following exception:
>  
> {code:java}
> ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled 
> snitch appears to be using the legacy naming scheme for regions, but existing 
> nodes in cluster are using the opposite: region(s) = [us-west-2], 
> availability zone(s) = [2a]. Please check the ec2_naming_scheme property in 
> the cassandra-rackdc.properties configuration file for more details.
> ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception 
> encountered during startup
> java.lang.IllegalStateException: null
>   at 
> org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
> {code}
>  
> The exception leads back to [this piece of 
> code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185].
> After adding some logging, it turned out the DC name of the first upgraded 
> node was considered invalid as a legacy one:
> {code:java}
> INFO  [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC 
> us-west-2
> INFO  [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - 
> dcUsesLegacyFormat=false / usingLegacyNaming=true
> ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name 
> us-west-2
> {code}
>  
> The problem is that the regex that's used to identify legacy dc names will 
> match both old and new names : 
> {code:java}
> boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
> {code}
> Knowing that some dc names didn't change between the two modes (us-west-2 for 
> example), I don't see how we can use the dc names to detect if the legacy 
> mode is being used by other nodes in the cluster.
>   
>  The rack names on the other hand are totally different in the legacy and 
> standard modes and can be used to detect mismatching settings.
>   
>  My go to fix would be to drop the check on datacenters by removing the 
> following lines: 
> [https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15878) Ec2Snitch fails on upgrade in legacy mode

2020-06-28 Thread Joey Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147473#comment-17147473
 ] 

Joey Lynch commented on CASSANDRA-15878:


[~adejanovski] I agree with you that the rack check is sufficient to determine 
that we have an accidental mixed mode cluster, but I think we can salvage the 
datacenter check as well which imo provides a nice second layer of defense 
against a very easy to make mistake. Specifically, if we see a datacenter with 
no number (e.g. just "us-east" or just "eu-west", and usingLegacyNaming is set 
to False then we could reject it as invalid and raise an exception. Yes, if 
users had custom suffixes that included numbers we would miss it, but then the 
rack verification would catch it.

I'm fine with the proposed patch as I agree the rack checks are sufficient, but 
I'd also be fine with fixing the datacenter check to be:
{noformat}
boolean dcUsesLegacyFormat = dc.matches("[a-z]+-[a-z]+$");
if (dcUsesLegacyFormat != usingLegacyNaming)
valid = false;
{noformat}
So for example if someone started up with usingLegacyNaming=false, and the node 
sees another node with datacenter "us-east" or "eu-west", then we fail with the 
error to prevent a split brained cluster. If someone had a custom datacenter 
suffix of -1, we would be foiled, and the rack check would have to catch the 
operator error.

> Ec2Snitch fails on upgrade in legacy mode
> -
>
> Key: CASSANDRA-15878
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15878
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Distributed Metadata
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the 
> Ec2Snitch to match AWS conventions.
> The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and 
> keep the same naming as before (while the "standard" mode uses the new naming 
> convention).
> When performing an upgrade in the us-west-2 region, the second node failed to 
> start with the following exception:
>  
> {code:java}
> ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled 
> snitch appears to be using the legacy naming scheme for regions, but existing 
> nodes in cluster are using the opposite: region(s) = [us-west-2], 
> availability zone(s) = [2a]. Please check the ec2_naming_scheme property in 
> the cassandra-rackdc.properties configuration file for more details.
> ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception 
> encountered during startup
> java.lang.IllegalStateException: null
>   at 
> org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
> {code}
>  
> The exception leads back to [this piece of 
> code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185].
> After adding some logging, it turned out the DC name of the first upgraded 
> node was considered invalid as a legacy one:
> {code:java}
> INFO  [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC 
> us-west-2
> INFO  [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - 
> dcUsesLegacyFormat=false / usingLegacyNaming=true
> ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name 
> us-west-2
> {code}
>  
> The problem is that the regex that's used to identify legacy dc names will 
> match both old and new names : 
> {code:java}
> boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
> {code}
> Knowing that some dc names didn't change between the two modes (us-west-2 for 
> example), I don't see how we can use the dc names to detect if the legacy 
> mode is being used by other nodes in the cluster.
>   
>  The rack names on the other hand are totally different in the legacy and 
> standard modes and can be used to detect mismatching settings.
>   
>  My go to fix would be to drop the check on datacenters by removing the 
> following lines: 
> [https://github.com/apache/cassandra

[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-28 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147437#comment-17147437
 ] 

Benedict Elliott Smith commented on CASSANDRA-15234:


> it is scheduled to be removed by 5.0

Do we need to remove the legacy parameters if they continue to mean the same 
thing?  I think we should generally defer breaking deprecated public APIs until 
it's useful or necessary.  We're doing this to cleanup for ourselves and for 
future users, not to make existing users have a bad time.


> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yams and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-28 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147386#comment-17147386
 ] 

Michael Semb Wever edited comment on CASSANDRA-15234 at 6/28/20, 6:33 PM:
--

I've added some (very minor) comments on the 
[commit|https://github.com/ekaterinadimitrova2/cassandra/commit/f7ed1e00db5cc5810779a3d71dc065c750ba8e6b].
 

Higher level comments (while testing)…
 - during startup warnings about legacy yaml names are logged multiple times…
 - {{"it is scheduled to be removed by 5.0"}} would be more precise as {{"the 
old name is scheduled to be removed by 5.0"}} (as opposed to the setting+ value 
being removed),
 - if I define setting  to a blank value, I get a error message like: 
{{"Invalid yaml. Those properties [max_hint_window] are not valid"}}.  This 
falsely tells me thtat the setting name is wrong, rather than the value. 
(Invalid non-blank values provide useful error messages.)
 - integer overflow could have a better error message, currently 
{{"NumberFormatException: For input string: "9223372036854775808""}}
 - the under/over flow protection between units is nice. 
{{Duration.toMilliseconds()}}  apidoc also describe the overflow to  MAX_VALUE. 
(same goes for similar default-unit methods in BitsRate and DataStorage)
 - awesome that you can use {{µs}} :-)


was (Author: michaelsembwever):
I've added some (very minor) comments on the 
[commit|https://github.com/ekaterinadimitrova2/cassandra/commit/f7ed1e00db5cc5810779a3d71dc065c750ba8e6b].
 

Higher level comments (while testing)…
 - during startup warnings about legacy yaml names are logged multiple times…
 - {{"it is scheduled to be removed by 5.0"}} would be more precise as "the old 
name is scheduled to be removed by 5.0" (as opposed to the setting+ value being 
removed),
 - if I define setting  to a blank value, I get a error message like: 
{{"Invalid yaml. Those properties [max_hint_window] are not valid"}}.  This 
falsely tells me thtat the setting name is wrong, rather than the value. 
(Invalid non-blank values provide useful error messages.)
 - integer overflow could have a better error message, currently 
{{"NumberFormatException: For input string: "9223372036854775808""}}
 - the under/over flow protection between units is nice. 
{{Duration.toMilliseconds()}}  apidoc also describe the overflow to  MAX_VALUE. 
(same goes for similar default-unit methods in BitsRate and DataStorage)
 - awesome that you can use {{µs}} :-)

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yams and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-28 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147386#comment-17147386
 ] 

Michael Semb Wever edited comment on CASSANDRA-15234 at 6/28/20, 6:32 PM:
--

I've added some (very minor) comments on the 
[commit|https://github.com/ekaterinadimitrova2/cassandra/commit/f7ed1e00db5cc5810779a3d71dc065c750ba8e6b].
 

Higher level comments (while testing)…
 - during startup warnings about legacy yaml names are logged multiple times…
 - {{"it is scheduled to be removed by 5.0"}} would be more precise as "the old 
name is scheduled to be removed by 5.0" (as opposed to the setting+ value being 
removed),
 - if I define setting  to a blank value, I get a error message like: 
{{"Invalid yaml. Those properties [max_hint_window] are not valid"}}.  This 
falsely tells me thtat the setting name is wrong, rather than the value. 
(Invalid non-blank values provide useful error messages.)
 - integer overflow could have a better error message, currently 
{{"NumberFormatException: For input string: "9223372036854775808""}}
 - the under/over flow protection between units is nice. 
{{Duration.toMilliseconds()}}  apidoc also describe the overflow to  MAX_VALUE. 
(same goes for similar default-unit methods in BitsRate and DataStorage)
 - awesome that you can use {{µs}} :-)


was (Author: michaelsembwever):
I've added some (very minor) comments on the 
[commit|https://github.com/ekaterinadimitrova2/cassandra/commit/f7ed1e00db5cc5810779a3d71dc065c750ba8e6b].
 

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yams and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-28 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147386#comment-17147386
 ] 

Michael Semb Wever commented on CASSANDRA-15234:


I've added some (very minor) comments on the 
[commit|]https://github.com/ekaterinadimitrova2/cassandra/commit/f7ed1e00db5cc5810779a3d71dc065c750ba8e6b.
 

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yams and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15234) Standardise config and JVM parameters

2020-06-28 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147386#comment-17147386
 ] 

Michael Semb Wever edited comment on CASSANDRA-15234 at 6/28/20, 4:35 PM:
--

I've added some (very minor) comments on the 
[commit|https://github.com/ekaterinadimitrova2/cassandra/commit/f7ed1e00db5cc5810779a3d71dc065c750ba8e6b].
 


was (Author: michaelsembwever):
I've added some (very minor) comments on the 
[commit|]https://github.com/ekaterinadimitrova2/cassandra/commit/f7ed1e00db5cc5810779a3d71dc065c750ba8e6b.
 

> Standardise config and JVM parameters
> -
>
> Key: CASSANDRA-15234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15234
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-15234-3-DTests-JAVA8.txt
>
>
> We have a bunch of inconsistent names and config patterns in the codebase, 
> both from the yams and JVM properties.  It would be nice to standardise the 
> naming (such as otc_ vs internode_) as well as the provision of values with 
> units - while maintaining perpetual backwards compatibility with the old 
> parameter names, of course.
> For temporal units, I would propose parsing strings with suffixes of:
> {{code}}
> u|micros(econds?)?
> ms|millis(econds?)?
> s(econds?)?
> m(inutes?)?
> h(ours?)?
> d(ays?)?
> mo(nths?)?
> {{code}}
> For rate units, I would propose parsing any of the standard {{B/s, KiB/s, 
> MiB/s, GiB/s, TiB/s}}.
> Perhaps for avoiding ambiguity we could not accept bauds {{bs, Mbps}} or 
> powers of 1000 such as {{KB/s}}, given these are regularly used for either 
> their old or new definition e.g. {{KiB/s}}, or we could support them and 
> simply log the value in bytes/s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15878) Ec2Snitch fails on upgrade in legacy mode

2020-06-28 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147297#comment-17147297
 ] 

Michael Semb Wever commented on CASSANDRA-15878:


bq.  feel free to put me as a reviewer (although I see mck already got to it  ).

Best with your input on this [~jolynch], as you've got a better idea on the 
original work.
my two cents, is it would help if the unit tests could cover all scenarios, 
including those where we're dependent on racks to identify legacy, e.g. 
"us-west-2". 

> Ec2Snitch fails on upgrade in legacy mode
> -
>
> Key: CASSANDRA-15878
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15878
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Distributed Metadata
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the 
> Ec2Snitch to match AWS conventions.
> The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and 
> keep the same naming as before (while the "standard" mode uses the new naming 
> convention).
> When performing an upgrade in the us-west-2 region, the second node failed to 
> start with the following exception:
>  
> {code:java}
> ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled 
> snitch appears to be using the legacy naming scheme for regions, but existing 
> nodes in cluster are using the opposite: region(s) = [us-west-2], 
> availability zone(s) = [2a]. Please check the ec2_naming_scheme property in 
> the cassandra-rackdc.properties configuration file for more details.
> ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception 
> encountered during startup
> java.lang.IllegalStateException: null
>   at 
> org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
>   at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
>   at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
>   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
>   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
>   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
>   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
> {code}
>  
> The exception leads back to [this piece of 
> code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185].
> After adding some logging, it turned out the DC name of the first upgraded 
> node was considered invalid as a legacy one:
> {code:java}
> INFO  [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC 
> us-west-2
> INFO  [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - 
> dcUsesLegacyFormat=false / usingLegacyNaming=true
> ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name 
> us-west-2
> {code}
>  
> The problem is that the regex that's used to identify legacy dc names will 
> match both old and new names : 
> {code:java}
> boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
> {code}
> Knowing that some dc names didn't change between the two modes (us-west-2 for 
> example), I don't see how we can use the dc names to detect if the legacy 
> mode is being used by other nodes in the cluster.
>   
>  The rack names on the other hand are totally different in the legacy and 
> standard modes and can be used to detect mismatching settings.
>   
>  My go to fix would be to drop the check on datacenters by removing the 
> following lines: 
> [https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org