[ https://issues.apache.org/jira/browse/IGNITE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844970#comment-16844970 ]
Ignite TC Bot commented on IGNITE-10078:
----------------------------------------

{panel:title=--> Run :: All: Possible Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Cache 6{color} [[tests 0 TIMEOUT, Exit Code|https://ci.ignite.apache.org/viewLog.html?buildId=3894733]]

{color:#d04437}MVCC Cache 7{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=3894770]]
* IgniteCacheMvccTestSuite7: GridCacheRebalancingWithAsyncClearingTest.testCorrectRebalancingCurrentlyRentingPartitions

{color:#d04437}[Check Code Style]{color} [[tests 0 Exit Code|https://ci.ignite.apache.org/viewLog.html?buildId=3894778]]

{color:#d04437}PDS (Indexing){color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=3894744]]
* IgnitePdsWithIndexingCoreTestSuite: IgniteLogicalRecoveryTest.testRecoveryOnJoinToActiveCluster
{panel}

[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3894779&buildTypeId=IgniteTests24Java8_RunAll]

> Node failure during concurrent partition updates may cause partition desync between primary and backup.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-10078
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10078
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>            Priority: Major
>             Fix For: 2.8
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is possible if some updates are not written to the WAL before a node failure. They will not be applied by rebalancing, because the partition counters are the same, in the following scenario:
> 1. Start a grid with 3 nodes and 2 backups.
> 2. Preload some data to partition P.
> 3. Start two concurrent transactions, each writing a single key to the same partition P; the keys are different:
> {noformat}
> try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) {
>     client.cache(DEFAULT_CACHE_NAME).put(k, v);
>     tx.commit();
> }
> {noformat}
> 4. Order the updates on the backup such that the update with the higher partition counter is written to the WAL, while the update with the lower partition counter fails because the failure handler (FH) is triggered before it is added to the WAL.
> 5. Return the failed node to the grid and observe that no rebalancing occurs, because the partition counters are the same.
> Possible solution: detect gaps in update counters on recovery and, if a gap is detected, force rebalancing from a node without gaps (see the sketch below).
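As a rough illustration of the proposed fix, here is a minimal sketch of update-counter gap detection, assuming counters start at 1 and increase by 1 per applied update. The class and method names are hypothetical and are not part of Ignite's actual recovery code:

{noformat}
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of update-counter gap detection; not Ignite's actual recovery code. */
public class UpdateCounterGapDetector {
    /**
     * Scans the update counters recovered from the WAL for one partition.
     *
     * @param appliedCounters Counters of the updates that reached the WAL, in ascending order.
     * @return The first missing counter value, or -1 if the sequence is contiguous.
     */
    public static long firstGap(SortedSet<Long> appliedCounters) {
        long expected = 1; // Assumption: counters start at 1 and grow by 1 per update.

        for (long cntr : appliedCounters) {
            if (cntr != expected)
                return expected; // Update 'expected' was lost before reaching the WAL.

            expected++;
        }

        return -1; // Contiguous sequence: no lost updates up to the highest counter.
    }

    public static void main(String[] args) {
        // Simulate the scenario above: the update with counter 2 reached the WAL,
        // the concurrent update with counter 1 was lost when the node failed.
        SortedSet<Long> applied = new TreeSet<>();
        applied.add(2L);

        long gap = firstGap(applied);

        if (gap >= 0)
            System.out.println("Gap at counter " + gap + ": force rebalance from a gap-free owner.");
    }
}
{noformat}

With a check like this, recovery could mark a partition that has a gap as needing a full rebalance from an owner whose counter sequence is contiguous, instead of comparing only the highest counters, which are equal in the scenario above and therefore suppress rebalancing.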