[ https://issues.apache.org/jira/browse/HDDS-2034?focusedWorklogId=320102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320102 ]
ASF GitHub Bot logged work on HDDS-2034:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Sep/19 03:10
            Start Date: 29/Sep/19 03:10
    Worklog Time Spent: 10m
      Work Description: ChenSammi commented on pull request #1469: HDDS-2034. Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#discussion_r329335596

 ##########
 File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/BlockManagerImpl.java
 ##########
 @@ -188,6 +208,15 @@ public AllocatedBlock allocateBlock(final long size, ReplicationType type,
           // TODO: #CLUTIL Remove creation logic when all replication types and
           // factors are handled by pipeline creator
           pipeline = pipelineManager.createPipeline(type, factor);
 +          // wait until pipeline is ready
 +          long current = System.currentTimeMillis();
 +          while (!pipeline.isOpen() && System.currentTimeMillis() <
 +              (current + pipelineCreateWaitTimeout)) {
 +            try {
 +              Thread.sleep(1000);
 +            } catch (InterruptedException e) {
 +            }
 +          }

 Review comment:
 Creating a pipeline in the block allocation path is debatable. An existing comment says "TODO: #CLUTIL Remove creation logic when all replication types and factors are handled by pipeline creator".
 In detail, the "ALLOCATED" state will be handled in task HDDS-2177, "Add a srubber thread to detect creation failure pipelines in ALLOCATED state".
 Currently pipelineCreateWaitTimeout is calculated from "hdds.command.status.report.interval" and "hdds.heartbeat.interval", on the assumption that the connection between the Datanode and SCM is healthy. What if the pipeline is created successfully, but the connection to SCM breaks and is restored only after a while? Should we wait a little longer before deciding whether pipeline creation succeeded or failed?
 So in HDDS-2177, I plan to add a configurable property for the pipeline creation timeout.
 Every ALLOCATED pipeline that exceeds the creation timeout will be declared failed and garbage collected.
 Whether we use CompletableFuture or a while loop, we need a timeout. This is on the block allocation path; how much latency can a synchronous API tolerate?
 Maybe the best approach is not to create a pipeline in this case at all, if we can make sure there are enough pipelines available after exiting safe mode.

----------------------------------------------------------------

Issue Time Tracking
-------------------

    Worklog Id:     (was: 320102)
    Time Spent: 8.5h  (was: 8h 20m)

> Async RATIS pipeline creation and destroy through heartbeat commands
> --------------------------------------------------------------------
>
>                 Key: HDDS-2034
>                 URL: https://issues.apache.org/jira/browse/HDDS-2034
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently, pipeline creation and destruction are synchronous operations. SCM
> connects directly to each datanode of the pipeline through a gRPC channel to
> create or destroy the pipeline.
> This task is to remove the gRPC channel and send pipeline creation and destroy
> actions to each datanode through heartbeat commands.
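
The review comment above notes that both a while loop and a CompletableFuture need a timeout when waiting for a pipeline to open. The sketch below illustrates the two bounded-wait styles side by side. It is not code from the pull request or from Ozone: the class PipelineWaitSketch and the methods pollUntilOpen and awaitOpen are hypothetical names, the BooleanSupplier stands in for a check such as pipeline.isOpen(), and the future is assumed to be completed by whichever component observes the pipeline opening. Unlike the loop in the diff, the sketch restores the interrupt flag instead of swallowing InterruptedException.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names exist in Ozone.
class PipelineWaitSketch {

  /** Polls isOpen until it returns true or timeoutMs elapses. */
  public static boolean pollUntilOpen(BooleanSupplier isOpen, long timeoutMs,
      long pollIntervalMs) {
    // nanoTime is monotonic, so wall-clock changes cannot stretch the wait.
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!isOpen.getAsBoolean()) {
      long remainingNs = deadline - System.nanoTime();
      if (remainingNs <= 0) {
        return false; // timed out; the caller decides how to fail
      }
      long remainingMs = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNs));
      try {
        // Never sleep past the deadline.
        Thread.sleep(Math.min(pollIntervalMs, remainingMs));
      } catch (InterruptedException e) {
        // Restore the flag so callers can still see the interruption.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  /** Waits on a future that some other component completes on open. */
  public static boolean awaitOpen(CompletableFuture<Void> opened,
      long timeoutMs) {
    try {
      opened.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException | ExecutionException e) {
      return false; // not open in time, or creation failed
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    final long start = System.nanoTime();
    // Simulated pipeline that reports open roughly 50 ms after start.
    BooleanSupplier opensLater = () ->
        System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(50);

    System.out.println("poll, opens later: "
        + pollUntilOpen(opensLater, 2000, 10));
    System.out.println("poll, never opens: "
        + pollUntilOpen(() -> false, 100, 10));
    System.out.println("future, already open: "
        + awaitOpen(CompletableFuture.completedFuture(null), 100));
    System.out.println("future, never completed: "
        + awaitOpen(new CompletableFuture<>(), 100));
  }
}
```

Either style still blocks the calling thread for up to the full timeout, so neither one settles the latency question the comment raises about a synchronous block allocation API.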