[GitHub] [incubator-uniffle] jerqi commented on issue #234: Introduce rejection mechanism when coordinator server is starting

GitBox Wed, 21 Sep 2022 20:37:34 -0700


jerqi commented on issue #234:
URL: https://github.com/apache/incubator-uniffle/issues/234#issuecomment-1254479362


   > Got your point.
   > 
   > > How does the YARN ResourceManager handle this problem?
   > 
   > With HA ResourceManagers there is no such problem, because ZooKeeper
   > fails over to the standby RM, which then becomes active. So let's
   > consider a single ResourceManager, or the Hadoop NameNode. As far as I
   > know, the NameNode enters safe mode on startup and does not exit it until
   > enough block reports from the DataNodes have been received. Refer to:
   > https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
   > 
   > > I suggest that we hold requests pending instead of rejecting them while
   > > the coordinator is starting.
   > 
   > Pending will slow down the apps. I think the requests should fall back
   > to another coordinator instead. Waiting for one heartbeat interval on
   > startup may be a good tradeoff; it would serve as the signal for the
   > coordinator to exit safe mode.
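A minimal sketch of the quoted proposal, assuming a purely time-based safe mode: the coordinator rejects assignment requests until one heartbeat interval has passed since startup, and the client falls back to the next coordinator when it is rejected. `Coordinator`, `SafeModeRejected`, `assign_servers`, `on_heartbeat` and `assign_with_fallback` are hypothetical names, not Uniffle's actual API; the injectable `clock` only exists to make the timing testable.

```python
import time
from typing import Callable, List, Sequence


class SafeModeRejected(Exception):
    """Hypothetical error: the coordinator is still in startup safe mode."""


class Coordinator:
    """Hypothetical coordinator that rejects assignments for one heartbeat
    interval after startup, so shuffle servers can re-register first."""

    def __init__(self, name: str, heartbeat_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.heartbeat_interval_s = heartbeat_interval_s
        self.clock = clock
        self.started_at = clock()
        self.servers: List[str] = []

    def on_heartbeat(self, server: str) -> None:
        # Register a shuffle server the first time it sends a heartbeat.
        if server not in self.servers:
            self.servers.append(server)

    def in_safe_mode(self) -> bool:
        # Safe mode lasts exactly one heartbeat interval after startup.
        return self.clock() - self.started_at < self.heartbeat_interval_s

    def assign_servers(self) -> List[str]:
        if self.in_safe_mode():
            # Reject instead of pending, so the client can try elsewhere.
            raise SafeModeRejected(self.name)
        return list(self.servers)


def assign_with_fallback(coordinators: Sequence[Coordinator]) -> List[str]:
    """Client side: try each coordinator in order, skipping rejections."""
    for coordinator in coordinators:
        try:
            return coordinator.assign_servers()
        except SafeModeRejected:
            continue
    raise RuntimeError("all coordinators are in safe mode")
```

Rejecting costs the app one extra round trip rather than a full heartbeat interval of waiting, but only while at least one other coordinator is already out of safe mode; if every coordinator is starting at once, the sketch raises an error.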
   
   That means we can't restart both coordinators within a short time of each
   other. It's a little difficult for a K8S controller to choose a proper
   interval between restarting them.
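To make the restart constraint concrete, here is a small sketch under assumed numbers: each coordinator in the cluster is restarted exactly once, is unavailable for `downtime_s` while its pod comes back, and then spends `safe_mode_s` (for example, one heartbeat interval) rejecting requests. `keeps_one_serving` and `plan_rolling_restart` are hypothetical helpers, not part of Uniffle or Kubernetes.

```python
from typing import Dict, Sequence, Tuple


def unavailable_window(restart_at: float, downtime_s: float,
                       safe_mode_s: float) -> Tuple[float, float]:
    # The coordinator serves nothing while its pod is down, and then keeps
    # rejecting requests for the whole safe-mode period after it starts.
    return restart_at, restart_at + downtime_s + safe_mode_s


def keeps_one_serving(restart_times: Sequence[float], downtime_s: float,
                      safe_mode_s: float) -> bool:
    """Return True if some coordinator can serve at every instant.

    restart_times holds one restart time per coordinator in the cluster
    (assumption: each coordinator is restarted exactly once)."""
    if not restart_times:
        raise ValueError("need at least one coordinator")
    windows = [unavailable_window(t, downtime_s, safe_mode_s)
               for t in restart_times]
    # Half-open intervals [start, end) on a line share a common instant
    # exactly when the latest start comes before the earliest end.
    latest_start = max(start for start, _ in windows)
    earliest_end = min(end for _, end in windows)
    return latest_start >= earliest_end


def plan_rolling_restart(names: Sequence[str], first_restart_at: float,
                         downtime_s: float, safe_mode_s: float,
                         margin_s: float = 0.0) -> Dict[str, float]:
    """Restart coordinators one by one, each only after the previous one
    is expected to have left safe mode (plus an optional margin)."""
    gap = downtime_s + safe_mode_s + margin_s
    return {name: first_restart_at + i * gap for i, name in enumerate(names)}
```

The fragile input is `downtime_s`: a plan computed with 20 s of pod downtime stops being safe if a pod actually takes 40 s to come back, which is why a fixed interval is hard to pick. One alternative, assuming the coordinators run as a StatefulSet, is to fail the readiness probe while in safe mode, since a StatefulSet rolling update waits for each updated Pod to be Running and Ready before updating the next one.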


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
