[ 
https://issues.apache.org/jira/browse/IGNITE-20124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-20124:
----------------------------------
    Description: 
h3. Motivation

To preserve the guarantee that the primary replica is always up to date, it is 
required to:
 * In case of a common RW transaction - insert a writeIntent into the storage 
on the primary before replication.
 * In case of one-phase-commit (1PC) - insert a commitedWrite after the 
replication.

Both have already been done. However, that means that if the primary is part of 
the replication group, which is true in almost all cases, the update is applied 
twice:
 * In case of a common RW transaction - a second time through the replication.
 * In case of one-phase-commit - either through the replication or through the 
post-replication update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent a double insert is to skip an update if the local 
safe time is greater than or equal to that of the candidate update. There are 
3 places where we update partition storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. The data may already be 
updated if the replication was already processed locally. This is expected to 
be already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on the post-replication update and skip the 
update if the safe time is already adjusted.
 # Insert through replication. In the non-1PC case there will be a double 
insert on every primary (see 1). In the 1PC case it depends, so we should check 
the safe time on the primary to know whether the update should be done (see 2).

In every case, the storage indexes should still be adjusted on replication, as 
is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken. 
We may have two non-consistent storage updates on the primary which may fall 
into different fsyncs, so we should probably benchmark this optimization to 
find out how useful it is. Transactional correctness isn't violated by these 
non-consistent storage updates: the only possibility is that some writes or 
write intents will go ahead of the indexes and will therefore be included in 
snapshots - however, we can still process such writes and resolve write 
intents.

Also, the safe time needs to be updated on the primary replica now.
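
The notes above can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not the actual Ignite code: the class 
and method names are hypothetical, timestamps are modelled as plain long 
values, updates are assumed to arrive in timestamp order, and a single lock 
stands in for whatever synchronization the real storage uses. The sketch shows 
the safe time check shared by all three update paths, the safe time being 
advanced on the primary, and the applied index being advanced on replication 
even when the storage write is skipped.

```java
/** Hypothetical model of a primary replica's partition storage; not Ignite's API. */
final class PrimaryPartitionSketch {
    /** Local safe time, modelled as a long (assumption). */
    private long safeTime = Long.MIN_VALUE;

    /** Index of the last replicated command applied locally. */
    private long lastAppliedIndex;

    /** Number of writes that actually reached the storage. */
    private int storageWrites;

    /** Direct write on the primary: pre-replication (common RW) or post-replication (1PC). */
    synchronized boolean writeOnPrimary(long candidateTs) {
        return writeIfNotCovered(candidateTs);
    }

    /** Write arriving through replication; the applied index always progresses. */
    synchronized boolean writeOnReplication(long commandIndex, long candidateTs) {
        boolean written = writeIfNotCovered(candidateTs);

        // Advance the index even if the write was skipped, so that
        // snapshot-based rebalance still sees consistent index progress.
        lastAppliedIndex = Math.max(lastAppliedIndex, commandIndex);

        return written;
    }

    /** Skips the write when the local safe time is >= the candidate's timestamp. */
    private boolean writeIfNotCovered(long candidateTs) {
        if (safeTime >= candidateTs) {
            return false;
        }

        storageWrites++;
        safeTime = candidateTs; // Safe time is adjusted on the primary as well.

        return true;
    }

    synchronized long lastAppliedIndex() {
        return lastAppliedIndex;
    }

    synchronized int storageWrites() {
        return storageWrites;
    }
}
```

With this model, a common RW write at timestamp 10 followed by its replicated 
command results in a single storage write, while in the 1PC case whichever of 
the two paths runs first performs the write and the other one is skipped.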

  was:
h3. Motivation

In order to preserve the guarantee that the primary replica is always 
up-to-date it's required to:
 * In case of common RW transaction - insert writeIntent to the storage within 
primary before replication.
 * In case of one-phase-commit - insert commitedWrite after the replication.

Both have already been done. However, that means that if primary is part of the 
replication group, and it's true in almost all cases, we will double the update:
 * In case of common RW transaction - through the replication.
 * In case of one-phase-commit - either through the replication, or though post 
update, if replication was fast enough.

h3. Definition of Done
 * Prevent double storage updates within primary.

h3. Implementation Notes

The easiest way to prevent double insert is to skip one if local safe time is 
greater or equal to candidates. There are 3 places where we update partition 
storage:
 # Primary pre-replication update. In that case, the second update on 
replication should be excluded.
 # Primary post-replication update in case of 1PC. It's possible to see already 
updated data if replication was already processed locally. It is expected to be 
already covered in https://issues.apache.org/jira/browse/IGNITE-15927 . We 
should check the primary safe time on post-replication update and don't do 
update if the safe time is already adjusted.
 # Insert through replication. In case of !1PC on every primary there will be 
double insert (see 1). In case of 1PC it depends, so we should check the safe 
time on primary to know whether the update should be done (see 2).

In every case, the storage indexes still should be adjusted on replication, as 
it is done now, because the progress of indexes on FSM write operations should 
not be violated - otherwise, a Raft snapshot-based rebalance would be broken. 
We may have two non-consistent storage updates on primary which may affect 
different fsyncs, so maybe we should benchmark this optimization to find out 
how useful it is. The transactional correctness isn't violated by these 

 


> Prevent double storage updates within primary
> ---------------------------------------------
>
>                 Key: IGNITE-20124
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20124
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3, transactions



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
