[ https://issues.apache.org/jira/browse/IGNITE-13358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Scherbakov updated IGNITE-13358:
---------------------------------------
    Description:
We have several issues related to partition clearing that are worth fixing.

1. PartitionsEvictManager doesn't provide obvious correctness guarantees when a node or a cache group is stopped while partitions are concurrently being cleared.

2. GridDhtLocalPartition#awaitDestroy is called while holding the topology write lock, which is deadlock-prone because destroying a partition currently requires the write lock.

3. GridDhtLocalPartition contains a lot of messy code related to partition clearing, most notably ClearFuture, even though the clearing is done by PartitionsEvictManager. We should get rid of the clearing code in GridDhtLocalPartition. This should also improve code readability and make it easier to understand what happens during clearing.

4. Currently, moving partitions are cleared before rebalancing in an order different from rebalanceOrder, which breaks the contract.

5. The clearing logic for moving partitions (before rebalancing) seems incorrect: it's possible to lose updates received during clearing.

6. To clear partitions before full rebalancing we use the same threads as for partition eviction. This can slow rebalancing down even if we have spare resources. It is better to clear partitions in the rebalance pool (explicitly dedicated by the user).

7. It's possible to reserve a renting partition, which makes no sense. All operations on a renting partition (except clearing) are a waste of resources.

8. Partition eviction causes system pool starvation if the number of threads in the system pool is < 8. This can break crucial functionality.

  was:
We have several issues related to partition clearing that are worth fixing.

1. PartitionsEvictManager doesn't provide obvious correctness guarantees when a node or a cache group is stopped while partitions are concurrently being cleared.

2. GridDhtLocalPartition#awaitDestroy is called while holding the topology write lock, which is deadlock-prone because destroying a partition currently requires the write lock.

3. GridDhtLocalPartition contains a lot of messy code related to partition clearing, most notably ClearFuture, even though the clearing is done by PartitionsEvictManager. We should get rid of the clearing code in GridDhtLocalPartition. This should also improve code readability and make it easier to understand what happens during clearing.

4. Currently, moving partitions are cleared before rebalancing in an order different from rebalanceOrder, which breaks the contract.

5. The clearing logic for moving partitions (before rebalancing) seems incorrect: it's possible to lose updates received during clearing.

6. To clear partitions before full rebalancing we use the same threads as for partition eviction. This can slow rebalancing down even if we have spare resources. It is better to clear partitions in the rebalance pool (explicitly dedicated by the user).

7. It's possible to reserve a renting partition, which makes no sense. All operations on a renting partition (except clearing) are a waste of resources.


> Improvements for partition clearing related parts
> -------------------------------------------------
>
>                 Key: IGNITE-13358
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13358
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexey Scherbakov
>            Assignee: Alexey Scherbakov
>            Priority: Major
>
> We have several issues related to partition clearing that are worth fixing.
> 1. PartitionsEvictManager doesn't provide obvious correctness guarantees
> when a node or a cache group is stopped while partitions are concurrently
> being cleared.
> 2. GridDhtLocalPartition#awaitDestroy is called while holding the topology
> write lock, which is deadlock-prone because destroying a partition currently
> requires the write lock.
> 3. GridDhtLocalPartition contains a lot of messy code related to partition
> clearing, most notably ClearFuture, even though the clearing is done by
> PartitionsEvictManager. We should get rid of the clearing code in
> GridDhtLocalPartition. This should also improve code readability and make it
> easier to understand what happens during clearing.
> 4. Currently, moving partitions are cleared before rebalancing in an order
> different from rebalanceOrder, which breaks the contract.
> 5. The clearing logic for moving partitions (before rebalancing) seems
> incorrect: it's possible to lose updates received during clearing.
> 6. To clear partitions before full rebalancing we use the same threads as
> for partition eviction. This can slow rebalancing down even if we have spare
> resources. It is better to clear partitions in the rebalance pool
> (explicitly dedicated by the user).
> 7. It's possible to reserve a renting partition, which makes no sense. All
> operations on a renting partition (except clearing) are a waste of
> resources.
> 8. Partition eviction causes system pool starvation if the number of threads
> in the system pool is < 8. This can break crucial functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)