[ 
https://issues.apache.org/jira/browse/IGNITE-22202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-22202:
-------------------------------------
    Description: 
IGNITE-22191 describes the problem and includes a temporary solution that needs 
to be somehow corrected in the current one.
h3. upd #1
h3. Motivation

Long story short - it's possible to get null instead of a storage because the 
rebalance engine stops the primary replica if it's no longer in the 
assignments. This behaviour is incorrect and should be fixed: a replica that 
is primary should not be stopped until the primary replica expires, even if 
it's no longer in the assignments. The corresponding raft endpoint, however, 
should be stopped.

AsIs:
 # Initial assignments [A,B,C].
 # "A" selected as primary.
 # Replica factor is set to 2 ->  assignments [B,C] // Or, more generally, the 
primary replica is excluded from the assignments.
 # As a reaction to the new assignments.stable change (that was updated on 
assignments switch) replica on "A" is stopped.

h3. Definition of Done
 * Do not stop primary replica on assignment.stable change even if it's not in 
new assignments. Corresponding raft server endpoint should be stopped though.
 * Stop replica on primary replica expiration if it's not in assignments. 
(Union of the stable and pending should be checked.)
 * Do not start new replica (but start raft endpoint) if it was not stopped 
previously. (Union of the stable and pending should be checked.)

ToBe:
 # Initial assignments [A,B,C].
 # "A" selected as primary.
 # Replica factor is set to 2 ->  assignments [B,C] 
 # As a reaction to the new assignments.stable change (that was updated on 
assignments switch) replica on "A" is *not* stopped. Corresponding raft server 
endpoint *is* stopped. 
 # Replica factor is set to 3 -> assignments [A,B,C]
 # As a reaction to the new assignments.pending change we should *not* start 
"A" replica because it wasn't stopped previously (obviously we should check 
whether it was expired or not), but should start corresponding raft endpoint.
 # Replica factor is set to 2 again ->  assignments [B,C] 
 # Same as 4.
 # Primary replica on "A" expires -> stop the replica because it's no longer in 
assignments. 
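
The ToBe steps above can be read as two small predicates. The following is only a minimal Java sketch under stated assumptions: the class, method and parameter names (ReplicaLifecycleSketch, shouldStopReplica, shouldStartReplica, isUnexpiredPrimary, replicaRunning) are hypothetical and are not taken from the Ignite code base, and node names are plain strings. Raft endpoint handling is deliberately left out.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of the ToBe rules; not the actual Ignite API. */
final class ReplicaLifecycleSketch {
    /** Union of the stable and pending assignments, as the Definition of Done asks. */
    static Set<String> union(Set<String> stable, Set<String> pending) {
        Set<String> all = new HashSet<>(stable);
        all.addAll(pending);
        return all;
    }

    /**
     * Stop the replica only if the node is absent from the union of assignments
     * and it is not a primary replica that has not expired yet.
     */
    static boolean shouldStopReplica(
            String node, Set<String> stable, Set<String> pending, boolean isUnexpiredPrimary) {
        return !union(stable, pending).contains(node) && !isUnexpiredPrimary;
    }

    /**
     * Start a new replica only if the node is in the union of assignments
     * and no replica is still running there (it was stopped previously or never started).
     */
    static boolean shouldStartReplica(
            String node, Set<String> stable, Set<String> pending, boolean replicaRunning) {
        return union(stable, pending).contains(node) && !replicaRunning;
    }
}
```

With these predicates, step 4 (stable [B,C], "A" still primary) yields "do not stop", step 6 (pending [A,B,C], replica on "A" still running) yields "do not start", and step 9 (primary on "A" expired, stable [B,C]) yields "stop".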

h3. Implementation Notes
 # In order to ease the logic, I believe that we may skip dedicated raft 
starting/stopping. Let's stop it on replica stop and start on replica start and 
that's it.
 # The given logic clashes with the changes in "Refactor TableManager and move 
all RAFT related pieces to Replica".
 # Please be careful with the [(assignments check + PR.isExpired) -> action] 
happens-before (HB) relations in order to prevent races like the one below. I 
don't think that we need any extra synchronisation here, just the same order 
of the assignments check and the PR.isExpired() check within all triggers.
 ## PR "A" expires, not in assignments -> is going to stop -> hangs.
 ## "A" is added to the assignments, consider it as running -> do not start "A" 
replica.
 ## Step 1 comes back to life -> stops the "A" replica.
 # Races related to the primary replica publishing and assignment updates will 
be covered in a separate ticket.
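
One possible way to keep "the same order of checks within all triggers" is to route every trigger through a single helper that always reads the assignments first and the expiration flag second. The Java sketch below only illustrates that idea; TriggerOrderSketch, Action and decide are hypothetical names, the two suppliers stand in for whatever the real triggers read, and whether this ordering alone rules out the race above is not something the sketch proves.

```java
import java.util.Set;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical helper shared by all triggers; not the actual Ignite API. */
final class TriggerOrderSketch {
    enum Action { STOP, KEEP }

    /**
     * Reads the assignments first and the primary replica expiration second,
     * so every caller observes the two values in the same order.
     */
    static Action decide(
            String node, Supplier<Set<String>> assignments, BooleanSupplier primaryExpired) {
        boolean inAssignments = assignments.get().contains(node); // always read first
        boolean expired = primaryExpired.getAsBoolean();          // always read second
        return (!inAssignments && expired) ? Action.STOP : Action.KEEP;
    }
}
```

Both the assignments-change trigger and the expiration trigger would call decide(...) instead of performing the two checks themselves, which keeps the order in one place.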

  was:
IGNITE-22191 describes the problem and includes a temporary solution that needs 
to be somehow corrected in the current one.
h3. upd #1
h3. Motivation

Long story short - it's possible to get null instead of storage because 
rebalance engine will stop primary replica if it's no longer in assignments. 
Aforementioned behaviour is incorrect and should be fixed, meaning that we 
should not stop replica if it's primary even if it's not in assignments until 
primary replica expiration, corresponding raft endpoint, however, should be 
stopped.

AsIs:
 # Initial assignments [A,B,C].
 # "A" selected as primary.
 # Replica factor is set to 2 ->  assignments [B,C] // Or generally Primary 
replica is excluded from the assignments.
 # As a reaction to the new assignments.stable change (that was updated on 
assignments switch) replica on "A" is stopped.

h3. Definition of Done
 * Do not stop primary replica on assignment.stable change even if it's not in 
new assignments. Corresponding raft server endpoint should be stopped though.
 * Stop replica on primary replica expiration if it's not in assignments. 
(Union of the stable and pending should be checked.)
 * Do not start new replica (but start raft endpoint) if it was not stopped 
previously. (Union of the stable and pending should be checked.)

ToBe:
 # Initial assignments [A,B,C].
 # "A" selected as primary.
 # Replica factor is set to 2 ->  assignments [B,C] 
 # As a reaction to the new assignments.stable change (that was updated on 
assignments switch) replica on "A" is *not* stopped. Corresponding raft server 
endpoint *is* stopped. 
 # Replica factor is set to 3 -> assignments [A,B,C]
 # As a reaction to the new assignments.pending change we should *not* start 
"A" replica because it wasn't stopped previously (obviously we should check 
whether it was expired or not), but should start corresponding raft endpoint.
 # Replica factor is set to 2 again ->  assignments [B,C] 
 # Same as 4.
 # Primary replica on "A" expires -> stop the replica because it's no longer in 
assignments. 


> Deal with null when getting storages in IndexBuildController
> ------------------------------------------------------------
>
>                 Key: IGNITE-22202
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22202
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Tkalenko
>            Assignee: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>
> IGNITE-22191 describes the problem and includes a temporary solution that 
> needs to be somehow corrected in the current one.
> h3. upd #1
> h3. Motivation
> Long story short - it's possible to get null instead of a storage because 
> the rebalance engine stops the primary replica if it's no longer in the 
> assignments. This behaviour is incorrect and should be fixed: a replica that 
> is primary should not be stopped until the primary replica expires, even if 
> it's no longer in the assignments. The corresponding raft endpoint, however, 
> should be stopped.
> AsIs:
>  # Initial assignments [A,B,C].
>  # "A" selected as primary.
>  # Replica factor is set to 2 ->  assignments [B,C] // Or, more generally, 
> the primary replica is excluded from the assignments.
>  # As a reaction to the new assignments.stable change (that was updated on 
> assignments switch) replica on "A" is stopped.
> h3. Definition of Done
>  * Do not stop primary replica on assignment.stable change even if it's not 
> in new assignments. Corresponding raft server endpoint should be stopped 
> though.
>  * Stop replica on primary replica expiration if it's not in assignments. 
> (Union of the stable and pending should be checked.)
>  * Do not start new replica (but start raft endpoint) if it was not stopped 
> previously. (Union of the stable and pending should be checked.)
> ToBe:
>  # Initial assignments [A,B,C].
>  # "A" selected as primary.
>  # Replica factor is set to 2 ->  assignments [B,C] 
>  # As a reaction to the new assignments.stable change (that was updated on 
> assignments switch) replica on "A" is *not* stopped. Corresponding raft 
> server endpoint *is* stopped. 
>  # Replica factor is set to 3 -> assignments [A,B,C]
>  # As a reaction to the new assignments.pending change we should *not* start 
> "A" replica because it wasn't stopped previously (obviously we should check 
> whether it was expired or not), but should start corresponding raft endpoint.
>  # Replica factor is set to 2 again ->  assignments [B,C] 
>  # Same as 4.
>  # Primary replica on "A" expires -> stop the replica because it's no longer 
> in assignments. 
> h3. Implementation Notes
>  # In order to ease the logic, I believe that we may skip dedicated raft 
> starting/stopping. Let's stop it on replica stop and start on replica start 
> and that's it.
>  # The given logic clashes with the changes in "Refactor TableManager and 
> move all RAFT related pieces to Replica".
>  # Please be careful with the [(assignments check + PR.isExpired) -> action] 
> happens-before (HB) relations in order to prevent races like the one below. 
> I don't think that we need any extra synchronisation here, just the same 
> order of the assignments check and the PR.isExpired() check within all 
> triggers.
>  ## PR "A" expires, not in assignments -> is going to stop -> hangs.
>  ## "A" is added to the assignments, consider it as running -> do not start 
> "A" replica.
>  ## Step 1 comes back to life -> stops the "A" replica.
>  # Races related to the primary replica publishing and assignment updates 
> will be covered in a separate ticket.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
