[ 
https://issues.apache.org/jira/browse/IGNITE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Kovalenko updated IGNITE-10485:
-------------------------------------
    Description: 
Currently there is no good way to learn more about the cluster before PME 
starts on node join.

It might be useful to do some pre-work (activate components if the cluster is 
active, calculate baseline affinity, clean up PDS if the baseline changed, etc.) 
before the actual NODE_JOIN event is triggered cluster-wide and PME is started.
Such pre-work will significantly speed up PME in case of node join.
Currently the only place where it can be done is while processing the NodeAdded 
message on the local joining node. 
But that's not a good idea, because it will freeze processing of new discovery 
messages cluster-wide.

I see 2 ways to implement it:

1) Introduce a new intermediate node state in which the node is discovered, but 
the discovery event on node join is not triggered yet. This is the right change, 
but a complicated one, because it requires revisiting the joining process in 
both the Tcp and Zk discovery protocols, with extra failover scenarios.

2) Try to get this information and do the pre-work before the discovery manager 
starts, using e.g. GridRestProcessor. This looks much simpler, but there can be 
races when the cluster state changes during the pre-work (deactivation, 
baseline change). In this case we should roll the pre-work back or just 
stop/restart the node to avoid cluster instability. However, these are rare 
scenarios in the real world (e.g. starting a baseline node and starting the 
deactivation process right after node recovery is finished).
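
To make the race handling in option 2 more concrete, here is a minimal sketch 
in Java. It is only an illustration under assumptions: PreJoinWork, 
ClusterSnapshot and Outcome are hypothetical names that do not exist in Ignite, 
and the state reader stands in for whatever call would fetch the cluster state 
and baseline (e.g. over REST). The idea is to read the state, do the pre-work 
against it, read the state again, and roll back if anything changed in between.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Illustration only: none of these types are part of Ignite. */
public class PreJoinWork {
    /** Hypothetical view of what a state endpoint might report. */
    public static final class ClusterSnapshot {
        public final boolean active;
        public final Set<String> baselineNodeIds;

        public ClusterSnapshot(boolean active, Set<String> baselineNodeIds) {
            this.active = active;
            // Defensive copy so later changes by the caller cannot alter the snapshot.
            this.baselineNodeIds = Collections.unmodifiableSet(new HashSet<>(baselineNodeIds));
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof ClusterSnapshot))
                return false;

            ClusterSnapshot other = (ClusterSnapshot)o;

            return active == other.active && baselineNodeIds.equals(other.baselineNodeIds);
        }

        @Override public int hashCode() {
            return Objects.hash(active, baselineNodeIds);
        }
    }

    public enum Outcome { PREPARED, ROLLED_BACK }

    /**
     * Runs the pre-work against the state read before it started, then reads the
     * state again and runs the rollback if the two reads differ.
     */
    public static Outcome prepare(
        Supplier<ClusterSnapshot> stateReader,
        Consumer<ClusterSnapshot> preWork,
        Runnable rollback
    ) {
        ClusterSnapshot before = stateReader.get();

        preWork.accept(before);

        ClusterSnapshot after = stateReader.get();

        if (!before.equals(after)) {
            // Cluster was (de)activated or baseline changed during pre-work.
            rollback.run();

            return Outcome.ROLLED_BACK;
        }

        return Outcome.PREPARED;
    }
}
```

Note that this only narrows the race window: the state can still change after 
the second read, so the regular join path would still have to validate the 
node. On ROLLED_BACK the caller could retry the pre-work or stop/restart the 
node, as described above.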

For starters we can expose baseline and cluster state in our REST endpoint and 
try to move the pre-work mentioned above out of PME. 

  was:
Currently there are no good possibilities to get more knowledge about cluster 
before PME on node join start.

It might be useful to do some pre-work (activate components if cluster is 
active, calculate baseline affinity, cleanup pds if baseline changed, etc.) 
before actual NODE_JOIN event is triggered cluster-wide and PME is started.
Such pre-work will significantly speed-up PME in case of node join.
Currently the only place where it can be done is during processing NodeAdded 
message on local joining node. 
But it's not a good idea, because it will freeze processing new discovery 
messages cluster-wide.

I see 2 ways how to implement it:

1) Introduce new intermediate state of node when it's discovered, but discovery 
event on node join is not triggered yet. This is right, but complicated change, 
because it requires revisiting joining process both in Tcp and Zk discovery 
protocols with extra failover scenarios.

2) Try to get this information and do pre-work before discovery manager start, 
using e.g. GridRestProcessor. This looks much simpler, but we can have some 
races there, when during pre-work cluster state has been changed (deactivation, 
baseline change). In this case we should rollback it or just stop/restart the 
node to avoid cluster instability. However these are rare scenarios in real 
world (e.g. start baseline node and start deactivation process right after node 
recovery is finished).

For starters we can expose baseline and cluster state in our rest endpoint and 
try to move out mentioned above pre-work things from PME. 


> Ability to get know more about cluster state before NODE_JOINED event is 
> fired cluster-wide
> -------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-10485
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10485
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>            Reporter: Pavel Kovalenko
>            Priority: Major
>             Fix For: 2.8
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
