[ https://issues.apache.org/jira/browse/IGNITE-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844970#comment-16844970 ]
Ignite TC Bot commented on IGNITE-10078:
----------------------------------------

{panel:title=--> Run :: All: Possible Blockers|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}
{color:#d04437}Cache 6{color} [[tests 0 TIMEOUT, Exit Code|https://ci.ignite.apache.org/viewLog.html?buildId=3894733]]

{color:#d04437}MVCC Cache 7{color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=3894770]]
* IgniteCacheMvccTestSuite7: GridCacheRebalancingWithAsyncClearingTest.testCorrectRebalancingCurrentlyRentingPartitions

{color:#d04437}[Check Code Style]{color} [[tests 0 Exit Code|https://ci.ignite.apache.org/viewLog.html?buildId=3894778]]

{color:#d04437}PDS (Indexing){color} [[tests 2|https://ci.ignite.apache.org/viewLog.html?buildId=3894744]]
* IgnitePdsWithIndexingCoreTestSuite: IgniteLogicalRecoveryTest.testRecoveryOnJoinToActiveCluster
{panel}

[TeamCity *--> Run :: All* Results|https://ci.ignite.apache.org/viewLog.html?buildId=3894779&buildTypeId=IgniteTests24Java8_RunAll]

> Node failure during concurrent partition updates may cause partition desync between primary and backup.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-10078
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10078
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>            Priority: Major
>             Fix For: 2.8
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is possible if some updates are not written to the WAL before a node failure. They will not be applied by rebalancing, because the partition counters are the same, in the following scenario:
> 1. Start a grid with 3 nodes and 2 backups.
> 2. Preload some data to partition P.
> 3. Start two concurrent transactions, each writing a single key to the same partition P; the keys are different:
> {noformat}
> try (Transaction tx = client.transactions().txStart(PESSIMISTIC, REPEATABLE_READ, 0, 1)) {
>     client.cache(DEFAULT_CACHE_NAME).put(k, v);
>     tx.commit();
> }
> {noformat}
> 4. Order the updates on the backup such that the update with the higher partition counter is written to the WAL, while the update with the lower partition counter fails because the failure handler (FH) is triggered before it is added to the WAL.
> 5. Return the failed node to the grid and observe that no rebalancing occurs, because the partition counters are the same.
> Possible solution: detect gaps in update counters on recovery and, if a gap is detected, force rebalancing from a node without gaps (see the sketch below).
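As a rough illustration of the proposed fix, here is a minimal sketch of update-counter gap detection, assuming counters start at 1 and increase by 1 per applied update. The class and method names are hypothetical and are not part of Ignite's actual recovery code:

{noformat}
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of update-counter gap detection; not Ignite's actual recovery code. */
public class UpdateCounterGapDetector {
    /**
     * Scans the update counters recovered from the WAL for one partition.
     *
     * @param appliedCounters Counters of the updates that reached the WAL, in ascending order.
     * @return The first missing counter value, or -1 if the sequence is contiguous.
     */
    public static long firstGap(SortedSet<Long> appliedCounters) {
        long expected = 1; // Assumption: counters start at 1 and grow by 1 per update.

        for (long cntr : appliedCounters) {
            if (cntr != expected)
                return expected; // Update 'expected' was lost before reaching the WAL.

            expected++;
        }

        return -1; // Contiguous sequence: no lost updates up to the highest counter.
    }

    public static void main(String[] args) {
        // Simulate the scenario above: the update with counter 2 reached the WAL,
        // the concurrent update with counter 1 was lost when the node failed.
        SortedSet<Long> applied = new TreeSet<>();
        applied.add(2L);

        long gap = firstGap(applied);

        if (gap >= 0)
            System.out.println("Gap at counter " + gap + ": force rebalance from a gap-free owner.");
    }
}
{noformat}

With a check like this, recovery could mark a partition that has a gap as needing a full rebalance from an owner whose counter sequence is contiguous, instead of comparing only the highest counters, which are equal in the scenario above and therefore suppress rebalancing.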