[ https://issues.apache.org/jira/browse/CASSANDRA-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147902#comment-17147902 ]
Michael Semb Wever commented on CASSANDRA-15878:
------------------------------------------------

Patch looks good to me. Move the ticket into 'patch submitted' if you're ready for me to commit it.

> Ec2Snitch fails on upgrade in legacy mode
> -----------------------------------------
>
> Key: CASSANDRA-15878
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15878
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Distributed Metadata
> Reporter: Alexander Dejanovski
> Assignee: Alexander Dejanovski
> Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the
> Ec2Snitch to match AWS conventions.
> The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and
> keep the same naming as before (while the "standard" mode uses the new naming
> convention).
> When performing an upgrade in the us-west-2 region, the second node failed to
> start with the following exception:
>
> {code:java}
> ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled snitch appears to be using the legacy naming scheme for regions, but existing nodes in cluster are using the opposite: region(s) = [us-west-2], availability zone(s) = [2a]. Please check the ec2_naming_scheme property in the cassandra-rackdc.properties configuration file for more details.
> ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception encountered during startup
> java.lang.IllegalStateException: null
>     at org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
> {code}
>
> The exception leads back to [this piece of code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185].
> After adding some logging, it turned out the DC name of the first upgraded
> node was considered invalid as a legacy one:
> {code:java}
> INFO [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC us-west-2
> INFO [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - dcUsesLegacyFormat=false / usingLegacyNaming=true
> ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name us-west-2
> {code}
>
> The problem is that the regex that's used to identify legacy dc names will
> match both old and new names:
> {code:java}
> boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
> {code}
> Knowing that some dc names didn't change between the two modes (us-west-2 for
> example), I don't see how we can use the dc names to detect if the legacy
> mode is being used by other nodes in the cluster.
>
> The rack names, on the other hand, are totally different in the legacy and
> standard modes and can be used to detect mismatching settings.
>
> My go-to fix would be to drop the check on datacenters by removing the
> following lines:
> [https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186]
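
The regex behaviour described in the issue can be reproduced in isolation. The following is a minimal sketch, not the Ec2Snitch code itself: the class name LegacyDcRegexDemo and the helper method are hypothetical, only the regular expression is copied from the description, and the sample name "us-east" assumes that legacy mode shortens the us-east-1 region that way (the issue only confirms that us-west-2 is the same in both modes).

```java
import java.util.List;

// Hypothetical demo class; only the regex comes from the issue description.
final class LegacyDcRegexDemo {

    // Same expression as quoted in the issue: a DC name is treated as
    // "legacy" when it does NOT match the standard-looking pattern.
    static boolean dcUsesLegacyFormat(String dc) {
        return !dc.matches("[a-z]+-[a-z].+-[\\d].*");
    }

    public static void main(String[] args) {
        // "us-east" is an assumed legacy-only name; "us-west-2" is the name
        // the issue reports as identical in legacy and standard modes.
        for (String dc : List.of("us-east", "us-east-1", "us-west-2")) {
            System.out.println(dc + " -> dcUsesLegacyFormat=" + dcUsesLegacyFormat(dc));
        }
    }
}
```

Under these assumptions the check returns true only for "us-east" and false for both "us-east-1" and "us-west-2", which agrees with the dcUsesLegacyFormat=false line logged for us-west-2 above: a name shared by both modes cannot tell the check which mode the other nodes use.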