[ 
https://issues.apache.org/jira/browse/CASSANDRA-18791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-18791:
----------------------------------------
    Change Category: Operability
         Complexity: Normal
          Reviewers: Alex Petrov, Sam Tunnicliffe
             Status: Open  (was: Triage Needed)

> CEP-21 - Multiple TCM fixes for issues discovered by unit, integration and 
> simulation testing
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18791
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18791
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Membership
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>
> Full branch: [https://github.com/krummas/cassandra/commits/marcuse/cep-21-tcm]
> Tests: 
> [cci|https://app.circleci.com/pipelines/github/krummas/cassandra/885/workflows/7cc3e1a1-45b9-4069-bb02-2a46855b4dbe]
>  - current status:
>    unit tests: 35/12052 failures
>    jvm dtests: 23/1459 failures
>    python dtests: 110/1018 failures
> We will spend the next few weeks getting all the test targets down to 0 
> failures.
> Summary of changes:
> [CEP-21] Python dtest fixes (maybe fix hintedhandoff test)
>  - [https://github.com/krummas/cassandra/commit/3da91c26fb]
>  - [https://github.com/krummas/cassandra/commit/98e444adbb]
> [CEP-21] In-JVM DTest fixes
>  - [https://github.com/krummas/cassandra/compare/6491a70041...c9126a8024]
> [CEP-21] Unit test fixes
>  - [https://github.com/krummas/cassandra/compare/02be7aa71c...6491a70041]
> [CEP-21] Escape infinite local log loop on replica mis-configuration
>  - [https://github.com/krummas/cassandra/commit/02be7aa71c]
> Currently, different replicas can have different configurations (guardrails, 
> for example). If a transformation was not applied on a replica, that node got 
> stuck in an infinite loop. For now, escape that loop until we have a better 
> solution.
> [CEP-21] Fix batchlog consistency errors during epoch bumps
>  - [https://github.com/krummas/cassandra/commit/c853bf864e]
> [CEP-21] Avoid using batches in distributed metadata log keyspace
>  - [https://github.com/krummas/cassandra/commit/d4b4766e0b]
> [CEP-21] Fix table metadata serialization
>  - [https://github.com/krummas/cassandra/commit/4056bab669]
> [CEP-21] add more metrics
>  - [https://github.com/krummas/cassandra/commit/b6ccb559f5]
> [CEP-21] getHostIdForEndpoint return null if unknown endpoint
>  - [https://github.com/krummas/cassandra/commit/b9243df05b]
> [CEP-21] CMS handling
>  - [https://github.com/krummas/cassandra/commit/15eea30d43]
>  - [https://github.com/krummas/cassandra/commit/d64da5f5e4]
>  - [https://github.com/krummas/cassandra/commit/61deb52811]
> [CEP-21] Upgrade fixes
>  - [https://github.com/krummas/cassandra/commit/33d186b4ce]
> Properly set system.local host id on upgrade.
>  - [https://github.com/krummas/cassandra/commit/b96bdc83e1]
> If a replica misses the migration message, mark the migration as successful 
> when it sees the first epoch bump.
>  - [https://github.com/krummas/cassandra/commit/712828bc82]
> Handle hints on upgrade: we change the host id when enabling CMS, so hints 
> should be delivered before that.
> [CEP-21] Catchup/log fetching improvements
>  - [https://github.com/krummas/cassandra/commit/31a183e236]
> When an instance sees a message from a peer with a newer epoch, try to catch 
> up from that peer instead of the CMS, to reduce load on the CMS nodes and to 
> allow the cluster to quiesce if the CMS is down.
>  - [https://github.com/krummas/cassandra/commit/8c6a4b35db]
> We can get a snapshot when catching up; in this case the pending log should 
> first apply the snapshot and skip any previous entries.
>  - [https://github.com/krummas/cassandra/commit/387853487f]
> When deserializing a partition update, allow it if current epoch >= 
> serialized epoch.
>  - [https://github.com/krummas/cassandra/commit/626d224716]
> When we replay from a snapshot we might see a node as LEFT for the first time 
> (it was bootstrapped and left while we were down)
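
The snapshot handling described above can be illustrated with a minimal Java sketch. It is an assumption-laden illustration, not the code from the linked commits: `PendingLog`, `append`, `applySnapshot`, and `drain` are hypothetical names, and transformations are reduced to strings. It shows only the idea that applying a snapshot for epoch N drops buffered entries with epoch <= N, after which the remaining entries are applied while they are consecutive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual Cassandra implementation.
final class PendingLog {
    // Buffered log entries keyed by epoch, kept in epoch order.
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private final List<String> applied = new ArrayList<>();
    private long currentEpoch;

    PendingLog(long startEpoch) {
        this.currentEpoch = startEpoch;
    }

    // Buffer an entry unless the current epoch already covers it.
    void append(long epoch, String transformation) {
        if (epoch > currentEpoch) {
            pending.put(epoch, transformation);
        }
    }

    // Jump to the snapshot's epoch and drop every buffered entry it covers.
    void applySnapshot(long snapshotEpoch) {
        if (snapshotEpoch <= currentEpoch) {
            return; // stale snapshot, nothing to do
        }
        currentEpoch = snapshotEpoch;
        pending.headMap(snapshotEpoch, true).clear();
        applied.add("snapshot@" + snapshotEpoch);
    }

    // Apply buffered entries for as long as they directly follow the current epoch.
    void drain() {
        while (!pending.isEmpty() && pending.firstKey() == currentEpoch + 1) {
            applied.add(pending.pollFirstEntry().getValue());
            currentEpoch++;
        }
    }

    long currentEpoch() {
        return currentEpoch;
    }

    List<String> applied() {
        return applied;
    }
}
```

In this sketch, a node at epoch 1 that has buffered entries 2, 3, 6 and 7 and then receives a snapshot at epoch 5 discards entries 2 and 3 and applies 6 and 7 on top of the snapshot.
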
> [CEP-21] Require Paxos V2 for cluster metadata log operations
>  - [https://github.com/krummas/cassandra/commit/2217f551a6]
> TCM is required to use Paxos V2 because of the way the legacy paxos path 
> uses a keyspace’s RF to assert whether there are enough available replicas to 
> perform the read before a CAS. It doesn’t work properly with meta strategy 
> when adding CMS members.
> [CEP-21] Disaster recovery
>  - [https://github.com/krummas/cassandra/commit/9011233604]
> Allow an instance to dump its current cluster metadata, and force-boot from 
> it. Basically we need a way to force an instance to become the CMS, in case 
> the original CMS goes down.
> [CEP-21] Switch nodeId from uuid to int
>  - [https://github.com/krummas/cassandra/commit/aea5500ae0]
> [CEP-21] Make CQLSSTableWriter exclusively a client utility
>  - [https://github.com/krummas/cassandra/commit/0693b22297]
> [CEP-21] Support nodetool assassinate
>  - [https://github.com/krummas/cassandra/commit/312a1c1b0e]
> [CEP-21] In progress sequence updates
>  - [https://github.com/krummas/cassandra/commit/7f56e0e5b3]
> Protection against out-of-order and repeated execution, sequence rediscovery 
> and reliability improvements.
>  - [https://github.com/krummas/cassandra/commit/7ddb941d80]
> DC and RF aware acks for multistep operations. Make progress barrier 
> consistency level configurable.
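
As a rough illustration of what protection against out-of-order and repeated execution can look like for a multistep operation, here is a minimal Java sketch. It is not the code from the linked commits: `MultiStepSequence`, `Result`, and the step names used with it are hypothetical. The sequence remembers the index of the next step and refuses to run anything else.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual Cassandra implementation.
final class MultiStepSequence {
    enum Result { EXECUTED, ALREADY_EXECUTED, OUT_OF_ORDER }

    private final List<String> steps;
    private final List<String> executed = new ArrayList<>();
    private int next = 0; // index of the only step allowed to run now

    MultiStepSequence(List<String> steps) {
        this.steps = new ArrayList<>(steps);
    }

    // Run the step only if it is exactly the next one in the sequence.
    Result execute(int stepIndex) {
        if (stepIndex < 0 || stepIndex >= steps.size()) {
            throw new IllegalArgumentException("Unknown step " + stepIndex);
        }
        if (stepIndex < next) {
            return Result.ALREADY_EXECUTED; // repeated execution is a no-op
        }
        if (stepIndex > next) {
            return Result.OUT_OF_ORDER; // an earlier step has not run yet
        }
        executed.add(steps.get(stepIndex));
        next++;
        return Result.EXECUTED;
    }

    boolean isComplete() {
        return next == steps.size();
    }

    List<String> executed() {
        return executed;
    }
}
```

With three hypothetical steps, calling `execute(1)` first returns `OUT_OF_ORDER`, and calling `execute(0)` twice returns `EXECUTED` and then `ALREADY_EXECUTED`, so a retried step never runs twice.
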
> [CEP-21] Enforce data ownership checks
>  - [https://github.com/krummas/cassandra/commit/5c42fd098c]
> Never accept operations for ranges we don't own.
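
The ownership rule above can be sketched as a simple guard. This is a minimal Java illustration under stated assumptions, not the code from the linked commit: `OwnershipCheck`, `TokenRange`, `owns`, and `checkOwned` are hypothetical names, tokens are plain `long` values, and a range is taken to be (left, right], wrapping around the ring when left >= right.

```java
import java.util.List;

// Hypothetical illustration only; not the actual Cassandra implementation.
final class OwnershipCheck {
    // A token range (left, right]; left >= right means it wraps around the ring.
    static final class TokenRange {
        final long left;
        final long right;

        TokenRange(long left, long right) {
            this.left = left;
            this.right = right;
        }

        boolean contains(long token) {
            if (left < right) {
                return token > left && token <= right;
            }
            return token > left || token <= right; // wrapping range
        }
    }

    // True if any of the ranges owned by this node covers the token.
    static boolean owns(List<TokenRange> owned, long token) {
        for (TokenRange range : owned) {
            if (range.contains(token)) {
                return true;
            }
        }
        return false;
    }

    // Reject the operation outright instead of serving a range we don't own.
    static void checkOwned(List<TokenRange> owned, long token) {
        if (!owns(owned, token)) {
            throw new IllegalStateException("Token " + token + " is not owned by this node");
        }
    }
}
```

A caller would invoke `checkOwned` before serving a read or write, so a request for a token outside the node's ranges fails fast rather than being accepted.
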
> [CEP-21] Gossip fixes
>  - [https://github.com/krummas/cassandra/commit/8572735e28]
> Several gossip issues found during upgrade and load testing
>  - [https://github.com/krummas/cassandra/commit/ed785cb414]
> Avoid gossip deadlock when merging CM nodes to gossip
>  - [https://github.com/krummas/cassandra/commit/e40c3a4ea]
> Replaced endpoints should be evicted from gossip like in previous versions.
> [CEP-21] Re-enable startup checks on non-test initialization
>  - [https://github.com/krummas/cassandra/commit/63013ad366]
> [CEP-21] Unify streaming: make all operations use explicit ranges for 
> streaming
>  - [https://github.com/krummas/cassandra/commit/ccba2e84de]
> All streaming operations now use a movement map describing what should be 
> streamed where.
> [CEP-21] Add vtable for metadata log
>  - [https://github.com/krummas/cassandra/commit/9fa4d61e5a]
> [CEP-21] Add exception code to commit result if rejected
>  - [https://github.com/krummas/cassandra/commit/7331e0842b]
> [CEP-21] Make cleanup safe to run during range movements
>  - [https://github.com/krummas/cassandra/commit/6f990c118f]
> [CEP-21] ReplicaPlan recomputation and stillAppliesTo implementation for Paxos
>  - [https://github.com/krummas/cassandra/commit/0e5cc6a4fd]
> [CEP-21] Update index status fixes post-rebase
>  - [https://github.com/krummas/cassandra/commit/06fba6bbc0]
> [CEP-21] Create new auth tables, remove cidr constants for column names
>  - [https://github.com/krummas/cassandra/commit/fef280dda6]
> [CEP-21] Schema fixes
>  - [https://github.com/krummas/cassandra/commit/dd9a7e9752]
> Schema cleanups, remove old schema pulling
>  - [https://github.com/krummas/cassandra/commit/9f0538c4b3]
> Don't include system_distributed in initial schema.
>  - [https://github.com/krummas/cassandra/commit/4d5fce6884]
> Simplify check for whether DROP COMPACT STORAGE is permitted
>  - [https://github.com/krummas/cassandra/commit/0bb8efb8f0]
> Don't invalidate prepared stmt cache on every schema change
>  - [https://github.com/krummas/cassandra/commit/93517d9ee4]
> Allow Schema.instance to be initialized empty for client apps
>  - [https://github.com/krummas/cassandra/commit/f612d2cd3d]
> Simplistic schema metadata diff
>  - [https://github.com/krummas/cassandra/commit/42bc2dd5ee]
> Don't warn about new system tables in StartupCheck
>  - [https://github.com/krummas/cassandra/commit/b570e74bf3]
> Exclude meta keyspace from TableMetrics::totalNonSystemTablesSize
> [CEP-21] Simulator updates
>  - [https://github.com/krummas/cassandra/commit/7e368cfc3e]
> Simulate NTS
>  - [https://github.com/krummas/cassandra/commit/00a34d88c3]
> Multi cms simulation, Deadlines for local processor, reworked retries for 
> local and remote processor
>  - [https://github.com/krummas/cassandra/commit/e6dce927da]
> Simulator harry integration
>  - [https://github.com/krummas/cassandra/commit/f30bf25060]
> Eclipse warn
> [CEP-21] Bootstrap fixes
>  - [https://github.com/krummas/cassandra/commit/6eea8aad69]
> ClusterMetadata::writePlacementAllSettled handles bootstrapping nodes 
> correctly
>  - [https://github.com/krummas/cassandra/commit/ef9e9c6074]
> Reenable write survey mode
> [CEP-21] Minor cleanups
>  - [https://github.com/krummas/cassandra/commit/335d10c9d6]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
