[ 
https://issues.apache.org/jira/browse/IGNITE-22915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893827#comment-17893827
 ] 

Roman Puchkovskiy commented on IGNITE-22915:
--------------------------------------------

In the original design, there was no way to determine whether a node witnessed 
an MG repair (that is, whether it participated in it or was later migrated and 
had its Metastorage validated for divergence), so the plan was to do the 
validation on each join. Such a validation requires an MG leader. If we only 
allow a node to join the cluster once its Metastorage is validated (against a 
leader) and we only have the current VALIDATED and JOINED join stages, we get a 
chicken-and-egg problem: to be validated, a node requires a leader, but to 
take part in electing a leader, it first has to be validated. To solve that 
problem, the plan was to introduce an additional join step.

But in IGNITE-22904, a way to effectively determine whether a node witnessed an 
MG repair was added. This means that now we only need to do the Metastorage 
divergence validation when a node that did not participate in the repair joins 
the cluster for the first time after the repair. When it does so, it is not 
included in the voting set yet, so the vicious cycle is broken (a leader is 
elected independently of the re-entering node). Hence we don't need the new 
join step.

That's why I'm closing the issue.
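
To make the reasoning above concrete, here is a minimal sketch of the decision 
it describes. It is not taken from the Ignite codebase: the class, the record 
types and the idea of comparing a "last witnessed repair ID" are all 
assumptions made for illustration, and the actual mechanism added in 
IGNITE-22904 may look quite different.

```java
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch; none of these names exist in Ignite. */
final class MetastorageJoinCheck {
    /** What a joining node is assumed to report about itself. */
    record JoiningNode(String name, UUID lastWitnessedRepairId) {}

    /** Assumed cluster-side view: latest MG repair (null if none) and MG voting set. */
    record ClusterState(UUID latestRepairId, Set<String> votingSet) {}

    /** A node witnessed the repair if no repair happened or it saw the latest one. */
    static boolean witnessedLatestRepair(JoiningNode node, ClusterState cluster) {
        return cluster.latestRepairId() == null
                || cluster.latestRepairId().equals(node.lastWitnessedRepairId());
    }

    /** Divergence validation is only needed for nodes that missed the repair. */
    static boolean needsDivergenceValidation(JoiningNode node, ClusterState cluster) {
        return !witnessedLatestRepair(node, cluster);
    }

    /** The cycle exists only if the MG leader election depends on the joining node. */
    static boolean leaderElectionDependsOn(JoiningNode node, ClusterState cluster) {
        return cluster.votingSet().contains(node.name());
    }
}
```

Under these assumptions, a node that missed the repair needs validation but is 
outside the voting set, so a leader can be elected without it and the existing 
two join stages suffice.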

> Separate join validation to basic and metastorage validation
> ------------------------------------------------------------
>
>                 Key: IGNITE-22915
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22915
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: iep-128, ignite-3
>
> Currently, during its startup, a node does the following:
>  # Validate itself against the CMG leader by sending a JoinRequestCommand to 
> the CMG (upon successful validation it gets added to the ‘validated’ set)
>  # Start remaining components
>  # Complete join via the CMG leader (after this the node gets added to the 
> logical topology)
> We need to validate a joining node's Metastorage against the ‘cluster’ 
> Metastorage state. Join happens via interaction with the CMG, but an MG 
> leader is needed for validation. This creates a cyclic dependency between 
> validation and the startup of the MG. To avoid this, we split the validation 
> into 2 phases: basic validation and Metastorage validation.
> The part of the startup related to join and Metastorage startup now looks 
> like this:
>  # Validate itself against the CMG leader only doing basic validations, that 
> is, everything except Metastorage (upon successful validation the node gets 
> added to the ‘basically_validated’ set) by sending a JoinRequestCommand to 
> the CMG
>  # Start Metastorage (and wait for it to catch up with the leader and apply 
> everything that is committed [this is what is already done during node 
> startup])
>  # Validate the Metastorage against the MG leader (in this issue, this is a 
> no-op; the actual validation will be done in IGNITE-22916)
>  # Record the fact of Metastorage validity in the CMG by sending a 
> ValidMetastorageCommand (the node gets added to the ‘fully_validated’ set)
>  # Start remaining components
>  # Complete join via the CMG leader (after this the node gets added to the 
> logical topology)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
