[ 
https://issues.apache.org/jira/browse/IGNITE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis Chudov updated IGNITE-21213:
----------------------------------
    Description: 
h3. Motivation

In the replica listener, two mechanisms determine the primary replica, and they 
are not coordinated with each other. The first one is based on the placement 
driver API (it is used in {_}PartitionReplicaListener#ensureReplicaIsPrimary{_}) 
and the other one is based on the placement driver events (the events are 
handled by two methods: {_}ReplicaManager#onPrimaryReplicaElected{_} and 
{_}ReplicaManager#onPrimaryReplicaExpired{_}).

Because the replica messages and events are handled in different threads, any 
interleaving of their processing is possible. For example, the replica can 
release all transaction locks (on the PRIMARY_REPLICA_EXPIRED event) and then 
handle a message for this transaction (because the ensureReplicaIsPrimary check 
had already passed), assuming that all the locks are still held.
h3. Definition of done

Transactional requests and the PRIMARY_REPLICA_EXPIRED event can never be 
processed simultaneously.

 

*Implementation notes*

We must take possible deadlocks into account and prevent them, for example:
 * a transactional request is trying to acquire the lock on key A
 * the processing of PRIMARY_REPLICA_EXPIRED cannot start because the 
aforementioned request processing is not finished
 * the lock on key A can't be acquired because it would only be released by the 
listener of PRIMARY_REPLICA_EXPIRED, as part of the replica expiration.

The replica expiration event should probably invalidate the ongoing 
transactional requests and complete them.
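
One possible shape of this idea is sketched below. It is only an illustration under assumptions, not the actual fix: {{PrimaryReplicaGate}}, {{register}} and {{PrimaryReplicaExpiredException}} are hypothetical names that do not exist in Ignite, and the sketch assumes that each transactional request is represented by a future whose exceptional completion aborts any pending lock wait. The expiration handler first closes the gate and fails the in-flight requests, and only then releases the locks, so it never waits for a request that is itself waiting for a lock.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch, not an Ignite class. */
final class PrimaryReplicaGate {
    /** Used to fail requests once the replica is no longer primary. */
    static final class PrimaryReplicaExpiredException extends RuntimeException {
        PrimaryReplicaExpiredException() {
            super("Replica is no longer primary");
        }
    }

    private final Object mux = new Object();
    private boolean primary = true;
    private final Set<CompletableFuture<?>> inFlight = new HashSet<>();

    /**
     * Registers a transactional request. The returned future is already failed
     * if the replica has expired; otherwise it is tracked until it completes.
     */
    <T> CompletableFuture<T> register() {
        CompletableFuture<T> fut = new CompletableFuture<>();
        synchronized (mux) {
            if (!primary) {
                fut.completeExceptionally(new PrimaryReplicaExpiredException());
                return fut;
            }
            inFlight.add(fut);
        }
        // Stop tracking the request as soon as it finishes, for any reason.
        fut.whenComplete((res, err) -> {
            synchronized (mux) {
                inFlight.remove(fut);
            }
        });
        return fut;
    }

    /**
     * Handler for the expiration event: closes the gate, fails the in-flight
     * requests without waiting for them, and only then releases the locks.
     */
    void onPrimaryReplicaExpired(Runnable releaseLocks) {
        List<CompletableFuture<?>> toFail;
        synchronized (mux) {
            primary = false;
            toFail = new ArrayList<>(inFlight);
            inFlight.clear();
        }
        // Completed outside the monitor so that callbacks cannot block the gate.
        for (CompletableFuture<?> fut : toFail) {
            fut.completeExceptionally(new PrimaryReplicaExpiredException());
        }
        releaseLocks.run();
    }
}
```

With such a gate, the primary check and the registration of the request happen under the same monitor as the expiration, so a request either observes the expiration at registration time or is invalidated by it later.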

  was:
h3. Motivation
In the replica listener, two mechanisms determine the primary replica, and they 
are not coordinated with each other. The first one is based on the placement 
driver API (it is used in _PartitionReplicaListener#ensureReplicaIsPrimary_) 
and the other one is based on the placement driver events (the events are 
handled by two methods: _ReplicaManager#onPrimaryReplicaElected_ and 
_ReplicaManager#onPrimaryReplicaExpired_).

Because the replica messages and events are handled in different threads, any 
interleaving of their processing is possible. For example, the replica can 
release all transaction locks (on the PRIMARY_REPLICA_EXPIRED event) and then 
handle a message for this transaction (because the ensureReplicaIsPrimary check 
had already passed), assuming that all the locks are still held.

h3. Definition of done
The two mechanisms work in coordination.


> Coordination of mechanisms of determination for primary on replica side
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-21213
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21213
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
