[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-539983705

1. Use groupManagement to add and remove the pipeline group, instead of a new raft client.
2. Remove the CreatePipelineCommandStatus ACK; use triggerHeartBeat instead.
3. Separate new pipelines and old pipelines in OneReplicaPipelineSafeModeRule.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-539294113

> > I think the purpose of safemode is to guarantee that the Ozone cluster is ready to provide service to Ozone clients once safemode is exited.
>
> @ChenSammi I agree with that. I think the problem occurs with OneReplicaPipelineSafeModeRule. This rule makes sure that at least one datanode in the old pipeline is reported so that reads for OPEN containers can go through. Here I think that old pipelines need to be tracked separately.

OK, I will try to separate the old pipelines from the new ones.
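The tracking discussed in this message could look roughly like the following Java sketch. It is not the actual OneReplicaPipelineSafeModeRule: the class name `OneReplicaSketch`, its methods, and the string pipeline IDs are hypothetical, and it assumes the rule passes only once every old pipeline has been reported by at least one datanode (the real rule may apply a different threshold).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: track only pipelines that existed before the SCM restart. */
final class OneReplicaSketch {
  // Pipelines loaded at startup ("old" pipelines); new ones are never added here.
  private final Set<String> oldPipelines;
  // Old pipelines for which at least one datanode has sent a report.
  private final Set<String> reported = new HashSet<>();

  OneReplicaSketch(Set<String> pipelinesLoadedAtStartup) {
    this.oldPipelines = new HashSet<>(pipelinesLoadedAtStartup);
  }

  /** Handle a pipeline report from any datanode; IDs of new pipelines are ignored. */
  void onPipelineReport(List<String> pipelineIds) {
    for (String id : pipelineIds) {
      if (oldPipelines.contains(id)) {
        reported.add(id);
      }
    }
  }

  /** Assumption: every old pipeline needs at least one reporting datanode. */
  boolean satisfied() {
    return reported.containsAll(oldPipelines);
  }

  public static void main(String[] args) {
    OneReplicaSketch rule = new OneReplicaSketch(new HashSet<>(Arrays.asList("p1", "p2")));
    rule.onPipelineReport(Arrays.asList("p1", "new-1"));
    System.out.println("after first report: " + rule.satisfied());
    rule.onPipelineReport(Arrays.asList("p2"));
    System.out.println("after second report: " + rule.satisfied());
  }
}
```

Because the set of old pipelines is fixed at construction time, pipelines created during safemode cannot influence the outcome.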
[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-536527360

> > > @ChenSammi Thanks a lot for working on this! Please find my comments below.
> > > 1. For the SafeModeRules, if we allow pipeline creation during safe mode we need to modify the rules so that newly created pipelines are not counted in the rule.
> > > 2. Can we just trigger PipelineReport from the datanodes after creation of pipeline instead of CreatePipelineACK? That would greatly simplify the OPEN pipeline code.
> >
> > Thanks @lokeshj1703 for reviewing the patch. 1. For SafeModeRules, I modified the healthy pipeline rule a little bit. Add a
> >
> > For SafeModeRules, I modified the HealthyPipelineSafeModeRule a bit. I added two properties. One is "hdds.scm.safemode.pipeline.creation", which controls whether pipelines are created in safemode.
> > The other is "hdds.scm.safemode.min.pipeline", which controls the minimum number of pipelines required to exit safe mode when pipeline creation in safemode is enabled.
>
> @ChenSammi The problem is safe mode rules are only tracking the old pipelines. But since they are listening to the OPEN_PIPELINE event, any newly created pipeline is counted in the rule. So if we are waiting for 50 old pipelines and 20 new ones are created, the rule would pass if just 30 old pipelines are reported. Therefore I think we need a way to separate the old pipelines from new ones in the rules.

Hi @lokeshj1703, I understand your point. Let me explain my thinking from a different point of view. I think the purpose of safemode is to guarantee that the Ozone cluster is ready to serve Ozone clients once safemode is exited. As long as there are enough open pipelines to serve read/write requests, Ozone can exit safemode. If we want 50 open pipelines in a cluster before exiting safemode, we may not care much whether they are new or old pipelines.

Datanodes go up and down during SCM start. What if some old pipelines are dead and lost forever? New pipelines can replace these dead pipelines. Currently each datanode can join only one factor-THREE RATIS pipeline, so very few new pipelines will be created after an SCM restart. When the multi-raft feature is enabled, there is also an upper limit on how many pipelines each datanode can join. So basically, if no new datanodes join after an SCM restart, the majority will be old pipelines, with only a few new pipelines, if any.
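The counting concern quoted in this message (50 old pipelines expected, 20 new ones created, only 30 old ones reported) can be reproduced with a small Java sketch. This is an illustration only, not the real HealthyPipelineSafeModeRule: the class `PipelineCountSketch`, the `old-`/`new-` ID scheme, and both rule methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch comparing a naive event count with old-pipeline tracking. */
final class PipelineCountSketch {
  private final Set<String> oldPipelines;                 // known before the restart
  private final Set<String> reportedOld = new HashSet<>(); // old pipelines seen so far
  private final int threshold;                             // pipelines required to exit
  private int naiveCount = 0;                              // every open-pipeline event

  PipelineCountSketch(Set<String> oldPipelines, int threshold) {
    this.oldPipelines = new HashSet<>(oldPipelines);
    this.threshold = threshold;
  }

  /** Called for each open-pipeline event, whether the pipeline is old or new. */
  void onOpenPipeline(String pipelineId) {
    naiveCount++;
    if (oldPipelines.contains(pipelineId)) {
      reportedOld.add(pipelineId);
    }
  }

  /** Counts all events, so newly created pipelines help reach the threshold. */
  boolean naiveRuleSatisfied() {
    return naiveCount >= threshold;
  }

  /** Counts only old pipelines, as the reviewer suggests. */
  boolean separatedRuleSatisfied() {
    return reportedOld.size() >= threshold;
  }

  /** Builds the scenario: oldTotal expected, oldReported seen, newCreated added. */
  static PipelineCountSketch scenario(int oldTotal, int oldReported, int newCreated) {
    Set<String> old = new HashSet<>();
    for (int i = 0; i < oldTotal; i++) {
      old.add("old-" + i);
    }
    PipelineCountSketch rule = new PipelineCountSketch(old, oldTotal);
    for (int i = 0; i < oldReported; i++) {
      rule.onOpenPipeline("old-" + i);
    }
    for (int i = 0; i < newCreated; i++) {
      rule.onOpenPipeline("new-" + i);
    }
    return rule;
  }

  public static void main(String[] args) {
    PipelineCountSketch rule = scenario(50, 30, 20);
    System.out.println("naive rule satisfied: " + rule.naiveRuleSatisfied());
    System.out.println("separated rule satisfied: " + rule.separatedRuleSatisfied());
  }
}
```

Under the reply's argument, the naive behaviour may be acceptable when the goal is simply "enough open pipelines"; the separated count matters when a rule is meant to protect reads on containers that live in old pipelines.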
[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-536242054

> @ChenSammi Thanks a lot for working on this! Please find my comments below.
>
> 1. For the SafeModeRules, if we allow pipeline creation during safe mode we need to modify the rules so that newly created pipelines are not counted in the rule.
>
> 2. Can we just trigger PipelineReport from the datanodes after creation of pipeline instead of CreatePipelineACK? That would greatly simplify the OPEN pipeline code.

Thanks @lokeshj1703 for reviewing the patch.

1. For SafeModeRules, I modified the HealthyPipelineSafeModeRule a bit. I added two properties. One is "hdds.scm.safemode.pipeline.creation", which controls whether pipelines are created in safemode. The other is "hdds.scm.safemode.min.pipeline", which controls the minimum number of pipelines required to exit safe mode when pipeline creation in safemode is enabled.
2. That's a good point. I will check the code to see whether I can use a triggered pipelineReport to replace the CreatePipelineACK.
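How the two properties named in this message might interact is sketched below in Java. Only the two property keys come from the comment. Everything else is an assumption made for illustration: the class `SafeModeConfigSketch`, the use of `java.util.Properties` in place of Hadoop's configuration classes, the default values, and the choice to treat the minimum as "not applicable" when creation in safemode is disabled.

```java
import java.util.Properties;

/** Hypothetical sketch of reading the two safemode properties from the patch. */
final class SafeModeConfigSketch {
  static final String CREATION_KEY = "hdds.scm.safemode.pipeline.creation";
  static final String MIN_PIPELINE_KEY = "hdds.scm.safemode.min.pipeline";

  final boolean createInSafeMode;
  final int minPipelines;

  SafeModeConfigSketch(Properties conf) {
    // Defaults ("false" and "1") are assumptions, not values taken from the patch.
    this.createInSafeMode =
        Boolean.parseBoolean(conf.getProperty(CREATION_KEY, "false"));
    this.minPipelines =
        Integer.parseInt(conf.getProperty(MIN_PIPELINE_KEY, "1"));
  }

  /**
   * The minimum only applies when pipeline creation in safemode is enabled.
   * When it is disabled this check does not constrain anything; whatever
   * other rules exist are not modeled here.
   */
  boolean minPipelineCheckPasses(int openPipelines) {
    return !createInSafeMode || openPipelines >= minPipelines;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(CREATION_KEY, "true");
    conf.setProperty(MIN_PIPELINE_KEY, "3");
    SafeModeConfigSketch cfg = new SafeModeConfigSketch(conf);
    System.out.println("2 open pipelines: " + cfg.minPipelineCheckPasses(2));
    System.out.println("3 open pipelines: " + cfg.minPipelineCheckPasses(3));
  }
}
```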
[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-535777599

@anuengineer and @xiaoyuyao, should I provide a new patch on trunk now, or wait until the whole communication channel design comes out next week?
[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-534425722

Unit and integration output is not available.
[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-532945330

/retest
[GitHub] [hadoop] ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
ChenSammi commented on issue #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#issuecomment-532936333

The majority of the failed unit tests report "Could not initialize class org.apache.hadoop.ozone.util.OzoneVersionInfo". This class is not touched by the patch.