[
https://issues.apache.org/jira/browse/HDDS-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067325#comment-17067325
]
runzhiwang commented on HDDS-3240:
----------------------------------
I'm working on it
> Improve write efficiency by creating container in parallel.
> -----------------------------------------------------------
>
> Key: HDDS-3240
> URL: https://issues.apache.org/jira/browse/HDDS-3240
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: runzhiwang
> Assignee: runzhiwang
> Priority: Major
> Attachments: screenshot-1.png
>
>
> Currently a follower cannot create a container until the leader has finished
> creating it. However, the follower and the leader could create the container
> in parallel rather than sequentially.
> 1. From the code, the [future
> thread|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L672]
> that runs `getCachedStateMachineData` in `readStateMachineData` and the [future
> thread|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L459]
> that runs `createContainer` in `writeStateMachineData` are the same
> [thread|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L505].
> Because `writeStateMachineData` is called before `readStateMachineData`, the
> leader must wait for `createContainer` to finish before it can run
> `getCachedStateMachineData` and append logs to the follower. The leader and
> follower are therefore not independent in `createContainer`: the follower
> must wait for the leader to finish `createContainer`.
> 2. The Jaeger UI also shows that the follower currently creates the
> container only after the leader has finished creating it.
> How to improve it:
> I think this ordering can be improved by using different threads for
> `getCachedStateMachineData` and `createContainer`, with only [data =
> readStateMachineData(requestProto, term,
> logIndex)|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L619]
> running on the same thread as `createContainer`. If
> [stateMachineDataCache.get(logIndex)|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L617]
> does not return null, the leader can get the stateMachineData from the cache
> and need not wait for `createContainer` to finish, so the leader and follower
> can be independent. But if it returns null, the leader must finish
> `createContainer` and then append logs to the follower. So I think [data =
> readStateMachineData(requestProto, term,
> logIndex)|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L619]
> should use the same thread as `createContainer`, rather than the whole
> [getCachedStateMachineData|https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/ContainerStateMachine.java#L614].
>
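The proposed threading change can be illustrated with a small standalone sketch. This is not the actual ContainerStateMachine code. The class name `ParallelCreateSketch`, the `createGate` latch that simulates a slow `createContainer`, the `String` payloads, and the simplified method signatures are all assumptions made for illustration. It only shows the idea: a cache hit is answered on the caller's thread without waiting, while a cache miss is queued on the same single thread that runs `createContainer`.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch only; names and signatures are hypothetical.
class ParallelCreateSketch {
    // Stand-in for the single thread that serializes work for one container.
    private final ExecutorService containerExecutor =
        Executors.newSingleThreadExecutor();
    // Stand-in for stateMachineDataCache, keyed by log index.
    private final Map<Long, String> stateMachineDataCache =
        new ConcurrentHashMap<>();

    // Caches the data, then runs a simulated createContainer on the
    // container thread. createGate blocks the task to mimic a slow create.
    CompletableFuture<Void> writeStateMachineData(
            long logIndex, String data, CountDownLatch createGate) {
        stateMachineDataCache.put(logIndex, data);
        return CompletableFuture.runAsync(() -> {
            try {
                createGate.await(); // simulated createContainer work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, containerExecutor);
    }

    // Cache hit: completes immediately on the caller's thread, so it does
    // not queue behind createContainer. Cache miss: the read is submitted
    // to the container thread and therefore runs after createContainer.
    CompletableFuture<String> getCachedStateMachineData(long logIndex) {
        String cached = stateMachineDataCache.get(logIndex);
        if (cached != null) {
            return CompletableFuture.completedFuture(cached);
        }
        return CompletableFuture.supplyAsync(
            () -> readStateMachineData(logIndex), containerExecutor);
    }

    // Placeholder for reading the data back from the container.
    String readStateMachineData(long logIndex) {
        return "data-from-disk-" + logIndex;
    }

    void shutdown() {
        containerExecutor.shutdown();
    }

    public static void main(String[] args) {
        ParallelCreateSketch sketch = new ParallelCreateSketch();
        CountDownLatch gate = new CountDownLatch(1);
        CompletableFuture<Void> create =
            sketch.writeStateMachineData(1L, "chunk", gate);

        // Cache hit: available while createContainer is still running.
        CompletableFuture<String> hit = sketch.getCachedStateMachineData(1L);
        System.out.println("hit done before create finishes: " + hit.isDone());

        // Cache miss: queued behind createContainer on the same thread.
        CompletableFuture<String> miss = sketch.getCachedStateMachineData(2L);
        System.out.println("miss done before create finishes: " + miss.isDone());

        gate.countDown(); // let the simulated createContainer finish
        create.join();
        System.out.println("miss result: " + miss.join());
        sketch.shutdown();
    }
}
```

In this sketch the leader's cache-hit path never touches the container thread, which is what would let it append logs to the follower while its own `createContainer` is still in progress.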