[ https://issues.apache.org/jira/browse/IGNITE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Chudov updated IGNITE-21213:
----------------------------------
    Description:
h3. Motivation

In the replica listener, we have two uncoordinated mechanisms for determining the primary replica. The first one is based on the placement driver API (it is used in {_}PartitionReplicaListener#ensureReplicaIsPrimary{_}) and the other one is based on the placement driver events (the events are handled by two methods: {_}ReplicaManager#onPrimaryReplicaElected{_}, {_}ReplicaManager#onPrimaryReplicaExpired{_}).
Because the replica messages and events are handled in different threads, any interleaving of processing is possible. For example, the replica can release all transaction locks (on the PRIMARY_REPLICA_EXPIRED event) and then handle a message for this transaction (because the ensureReplicaIsPrimary check was done before), assuming that all the locks are still held.

h3. Definition of done

The simultaneous processing of transactional requests and PRIMARY_REPLICA_EXPIRED is impossible.

*Implementation notes*

We must take into account and prevent the possible deadlocks, such as:
* the transactional request is trying to acquire the lock on the key A
* the processing of the PRIMARY_REPLICA_EXPIRED event cannot start because the aforementioned request processing is not finished
* the lock on the key A can't be acquired because it should be released by the listener of PRIMARY_REPLICA_EXPIRED due to the replica expiration.

Probably the replica expiration event should invalidate the ongoing transactional requests and complete them.

  was:
h3. Motivation

In the replica listener, we have two uncoordinated mechanisms for determining the primary replica. The first one is based on the placement driver API (it is used in _PartitionReplicaListener#ensureReplicaIsPrimary_) and the other one is based on the placement driver events (the events are handled by two methods: _ReplicaManager#onPrimaryReplicaElected_, _ReplicaManager#onPrimaryReplicaExpired_).
Because the replica messages and events are handled in different threads, any interleaving of processing is possible. For example, the replica can release all transaction locks (on the PRIMARY_REPLICA_EXPIRED event) and then handle a message for this transaction (because the ensureReplicaIsPrimary check was done before), assuming that all the locks are still held.

h3. Definition of done

The two mechanisms work in coordination.


> Coordination of mechanisms of determination for primary on replica side
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-21213
>                 URL: https://issues.apache.org/jira/browse/IGNITE-21213
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Vladislav Pyatkov
>            Priority: Major
>              Labels: ignite-3
>
> h3. Motivation
> In the replica listener, we have two uncoordinated mechanisms for
> determining the primary replica. The first one is based on the placement driver
> API (it is used in {_}PartitionReplicaListener#ensureReplicaIsPrimary{_}) and
> the other one is based on the placement driver events (the events are handled
> by two methods: {_}ReplicaManager#onPrimaryReplicaElected{_},
> {_}ReplicaManager#onPrimaryReplicaExpired{_}).
> Because the replica messages and events are handled in different threads, any
> interleaving of processing is possible. For example, the replica can release all
> transaction locks (on the PRIMARY_REPLICA_EXPIRED event) and then handle a
> message for this transaction (because the ensureReplicaIsPrimary check was done
> before), assuming that all the locks are still held.
> h3. Definition of done
> The simultaneous processing of transactional requests and
> PRIMARY_REPLICA_EXPIRED is impossible.
>
> *Implementation notes*
> We must take into account and prevent the possible deadlocks, such as:
> * the transactional request is trying to acquire the lock on the key A
> * the processing of the PRIMARY_REPLICA_EXPIRED event cannot start because the
> aforementioned request processing is not finished
> * the lock on the key A can't be acquired because it should be released by
> the listener of PRIMARY_REPLICA_EXPIRED due to the replica expiration.
> Probably the replica expiration event should invalidate the ongoing
> transactional requests and complete them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
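
The idea from the implementation notes (the expiration event invalidates in-flight transactional requests instead of waiting for them) could be sketched roughly as follows. This is only an illustration under assumptions: {{PrimaryExpirationGate}}, {{register}}, {{inFlightCount}} and the {{releaseLocks}} callback are hypothetical names, not Ignite API, and each pending request (including one blocked on the lock for key A) is assumed to be represented by a future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch, not Ignite code: coordinates requests with the expiration event. */
final class PrimaryExpirationGate {
    /** Futures of transactional requests that are currently being processed. */
    private final Set<CompletableFuture<?>> inFlight = ConcurrentHashMap.newKeySet();

    /** Set once the replica is no longer primary; guarded by {@code this}. */
    private boolean expired;

    /** Registers a request; it fails immediately if the replica has already expired. */
    <T> CompletableFuture<T> register() {
        CompletableFuture<T> fut = new CompletableFuture<>();

        synchronized (this) {
            if (expired) {
                fut.completeExceptionally(new IllegalStateException("Primary replica expired"));
                return fut;
            }
            // The check and the registration are atomic with respect to the expiration event.
            inFlight.add(fut);
        }

        // Drop the request from the in-flight set as soon as it completes, normally or not.
        fut.whenComplete((res, err) -> inFlight.remove(fut));
        return fut;
    }

    /** Handles expiration without waiting for requests, so it cannot deadlock on them. */
    void onPrimaryReplicaExpired(Runnable releaseLocks) {
        List<CompletableFuture<?>> toFail;

        synchronized (this) {
            expired = true; // No new request can be registered after this point.
            toFail = new ArrayList<>(inFlight);
        }

        // Invalidate and complete the ongoing requests first ...
        for (CompletableFuture<?> fut : toFail) {
            fut.completeExceptionally(new IllegalStateException("Primary replica expired"));
        }

        // ... and only then release the transaction locks.
        releaseLocks.run();
    }

    /** Number of requests still in flight (useful for checks). */
    int inFlightCount() {
        return inFlight.size();
    }
}
```

In this sketch the event handler never blocks on request processing: a request waiting for a lock is completed exceptionally, and any request arriving after the event is rejected at registration, so locks are not released underneath a request that still believes they are held.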