[ https://issues.apache.org/jira/browse/CASSANDRA-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Baker updated CASSANDRA-15355:
------------------------------------
    Component/s: Cluster/Gossip
        Impacts:   (was: None)
  Since Version: 2.2.18

> Schema push/pull race on continuous schema changes
> --------------------------------------------------
>
>                 Key: CASSANDRA-15355
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15355
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: James Baker
>            Priority: Normal
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In https://issues.apache.org/jira/browse/CASSANDRA-5025, pull-based schema
> updates were scheduled 1 minute after the schema change was first visible, so
> as to prefer the push codepath as much as possible.
> Unfortunately, this does not handle the case where many schema changes are
> happening - imagine a scenario where we create a table every 5 seconds for
> 2 minutes - the first update tasks execute 60 seconds in, and the schemas may
> well be out of sync between nodes at that point.
> In this case, there is some chance that when the task runs, the schemas are
> out of sync because a subsequent schema update has occurred, and so the same
> push/pull race has happened.
> A fix is to modify the codepath such that the scheduled task only runs if
> the other node's schema version is the same as when the task was scheduled. A
> different (later scheduled) task should run otherwise.
> For us, what we see is that when we have a reasonably large number of
> changes, a few schema changes can have the unfortunate outcome of causing our
> nodes to run out of memory and crash - if we have a 30 node cluster, create a
> table every second for 2 minutes, and for some reason we pause for 10 seconds
> after 60 seconds with no progress, we can easily end up concurrently running
> 300 schema pulls for a single node. These can pile up further and cause
> cascading failures. This change stops that.
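The proposed fix can be illustrated with a minimal sketch. This is not the
actual patch and does not use Cassandra's classes: SchemaPullGuard,
onVersionAdvertised, pullTaskFor and pullsIssued are hypothetical names, and
the "pull" is a counter standing in for a real schema request. The idea shown
is only that a delayed task remembers the version it was scheduled for and does
nothing if the other node has since advertised a different one.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; these names are illustrative, not Cassandra's real API.
final class SchemaPullGuard {
    // Latest schema version each endpoint has advertised (assumed to come from gossip).
    private final Map<String, UUID> advertisedVersions = new ConcurrentHashMap<>();
    // Counts pulls that were actually issued; stands in for the real pull request.
    private final AtomicInteger pullsIssued = new AtomicInteger();

    // Record the schema version an endpoint currently advertises.
    void onVersionAdvertised(String endpoint, UUID version) {
        advertisedVersions.put(endpoint, version);
    }

    // Build the delayed task, capturing the version seen at scheduling time.
    Runnable pullTaskFor(String endpoint, UUID versionAtScheduling) {
        return () -> {
            UUID current = advertisedVersions.get(endpoint);
            if (!versionAtScheduling.equals(current)) {
                // The endpoint has moved on (or is unknown): this task is stale,
                // and a later-scheduled task is responsible for the newer version.
                return;
            }
            pullsIssued.incrementAndGet();
        };
    }

    int pullsIssued() {
        return pullsIssued.get();
    }
}
```

In a real node the Runnable would presumably be handed to something like
ScheduledExecutorService.schedule with the 1 minute delay; here it can simply
be invoked directly. With this guard, a burst of N schema changes seen from one
peer results in at most one pull for the final version rather than N.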