[
https://issues.apache.org/jira/browse/HDDS-2034?focusedWorklogId=320102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-320102
]
ASF GitHub Bot logged work on HDDS-2034:
----------------------------------------
Author: ASF GitHub Bot
Created on: 29/Sep/19 03:10
Start Date: 29/Sep/19 03:10
Worklog Time Spent: 10m
Work Description: ChenSammi commented on pull request #1469: HDDS-2034.
Async RATIS pipeline creation and destroy through heartbea…
URL: https://github.com/apache/hadoop/pull/1469#discussion_r329335596
##########
File path:
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/BlockManagerImpl.java
##########
@@ -188,6 +208,15 @@ public AllocatedBlock allocateBlock(final long size, ReplicationType type,
         // TODO: #CLUTIL Remove creation logic when all replication types and
         // factors are handled by pipeline creator
         pipeline = pipelineManager.createPipeline(type, factor);
+        // wait until pipeline is ready
+        long current = System.currentTimeMillis();
+        while (!pipeline.isOpen() && System.currentTimeMillis() <
+            (current + pipelineCreateWaitTimeout)) {
+          try {
+            Thread.sleep(1000);
+          } catch (InterruptedException e) {
+          }
+        }
Review comment:
Creating a pipeline in the block allocation path is debatable. The existing
comment already says "TODO: #CLUTIL Remove creation logic when all replication
types and factors are handled by pipeline creator".
In detail, the "ALLOCATED" state will be handled in task HDDS-2177, "Add a
scrubber thread to detect creation failure pipelines in ALLOCATED state".
Currently pipelineCreateWaitTimeout is calculated from
"hdds.command.status.report.interval" and "hdds.heartbeat.interval", on the
assumption that the connection between the Datanode and SCM is healthy. What
if the pipeline is created successfully, but the connection to SCM breaks and
is restored only after a while? Should we wait a little longer before deciding
whether pipeline creation succeeded or failed? So in HDDS-2177, I plan to add a
configurable property for the pipeline creation timeout. Every ALLOCATED
pipeline that exceeds the creation timeout will be declared failed and
garbage collected.
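To make the scrubber idea concrete, here is a minimal sketch of the expiry
check only. It is not the HDDS-2177 implementation: the class, the
PipelineInfo record, the State enum and findExpired are all hypothetical names
made up for illustration, and the timeout is passed in rather than read from
any real configuration key.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in the Hadoop/Ozone code base.
public class AllocatedPipelineScrubberSketch {

  enum State { ALLOCATED, OPEN, CLOSED }

  // Minimal stand-in for the pipeline metadata the scrubber would inspect.
  static final class PipelineInfo {
    final String id;
    final State state;
    final Instant createdAt;

    PipelineInfo(String id, State state, Instant createdAt) {
      this.id = id;
      this.state = state;
      this.createdAt = createdAt;
    }
  }

  // Returns the ids of pipelines that are still ALLOCATED after the timeout.
  static List<String> findExpired(List<PipelineInfo> pipelines,
      Duration creationTimeout, Instant now) {
    List<String> expired = new ArrayList<>();
    for (PipelineInfo p : pipelines) {
      boolean tooOld =
          Duration.between(p.createdAt, now).compareTo(creationTimeout) > 0;
      if (p.state == State.ALLOCATED && tooOld) {
        expired.add(p.id);
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    List<PipelineInfo> pipelines = new ArrayList<>();
    pipelines.add(new PipelineInfo("p1", State.ALLOCATED, now.minusSeconds(600)));
    pipelines.add(new PipelineInfo("p2", State.ALLOCATED, now.minusSeconds(10)));
    pipelines.add(new PipelineInfo("p3", State.OPEN, now.minusSeconds(600)));
    // With an assumed 5 minute timeout only p1 is expired: prints [p1]
    System.out.println(findExpired(pipelines, Duration.ofMinutes(5), now));
  }
}
```

A real scrubber would run this check periodically and then fail and clean up
the returned pipelines; that part is left out here.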
Whether we use CompletableFuture or a while loop, we need a timeout either
way. This is on the block allocation path; how much latency can a synchronous
API tolerate? Maybe the best approach is not to create a pipeline in this case
at all, if we can make sure there are enough pipelines to use after exiting
safe mode.
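For reference, a bounded wait along these lines could look like the sketch
below. It is only an illustration, not code from the PR: waitUntilOpen, the
BooleanSupplier standing in for pipeline.isOpen(), and the poll interval
parameter are assumptions. Unlike the loop in the diff, it uses a monotonic
clock and restores the interrupt flag instead of swallowing
InterruptedException.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper; the names here are illustrative, not part of SCM.
public class PipelineWaitSketch {

  // Polls isOpen until it returns true or timeoutMillis has elapsed.
  // Returns true only if the condition became true before the deadline.
  static boolean waitUntilOpen(BooleanSupplier isOpen, long timeoutMillis,
      long pollMillis) {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!isOpen.getAsBoolean()) {
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        return false; // timed out, caller decides how to fail the allocation
      }
      try {
        // Never sleep past the deadline.
        TimeUnit.NANOSECONDS.sleep(
            Math.min(remaining, TimeUnit.MILLISECONDS.toNanos(pollMillis)));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Stand-in condition that turns true after roughly 50 ms.
    boolean opened = waitUntilOpen(
        () -> System.currentTimeMillis() - start >= 50, 1000, 10);
    System.out.println("opened within timeout: " + opened);
  }
}
```

Even with a helper like this the caller still blocks for up to the full
timeout, so the latency question above remains open.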
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 320102)
Time Spent: 8.5h (was: 8h 20m)
> Async RATIS pipeline creation and destroy through heartbeat commands
> --------------------------------------------------------------------
>
> Key: HDDS-2034
> URL: https://issues.apache.org/jira/browse/HDDS-2034
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Reporter: Sammi Chen
> Assignee: Sammi Chen
> Priority: Major
> Labels: pull-request-available
> Time Spent: 8.5h
> Remaining Estimate: 0h
>
> Currently, pipeline creation and destroy are synchronous operations. SCM
> directly connect to each datanode of the pipeline through gRPC channel to
> create the pipeline to destroy the pipeline.
> This task is to remove the gRPC channel, send pipeline creation and destroy
> action through heartbeat command to each datanode.