[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495328#comment-17495328 ]

Xiaoqiao He commented on HDFS-15382:

Thanks [~yuanbo] for your feedback. [~Aiphag0] is now working on pushing this feature forward. The latest PR is https://github.com/apache/hadoop/pull/3941. Before that, we merged https://issues.apache.org/jira/browse/HDFS-16429. Suggestions and collaboration are welcome here.

{quote}It seems not a compatible feature{quote}

We did not create a new feature branch for this improvement. IMO there are no incompatible changes for end users at this point.

> Split FsDatasetImpl from blockpool lock to blockpool volume lock
>
> Key: HDFS-15382
> URL: https://issues.apache.org/jira/browse/HDFS-15382
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Mingxiang Li
> Assignee: Mingxiang Li
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-15382-sample.patch, image-2020-06-02-1.png, image-2020-06-03-1.png
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In HDFS-15180 we split the lock down to block pool granularity. But when one volume is under heavy load, it blocks other requests that are in the same block pool but on a different volume. So we split the lock into two levels to avoid this and to improve DataNode performance.

--
This message was sent by Atlassian Jira (v8.20.1#820001)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495308#comment-17495308 ]

Yuanbo Liu commented on HDFS-15382:

It seems not a compatible feature. How about merging all sub-task patches into a feature branch, and then merging that branch into trunk?
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476764#comment-17476764 ]

Mingxiang Li commented on HDFS-15382:

Updated the prerequisite (dependency) PR: [https://github.com/apache/hadoop/pull/3889]
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444371#comment-17444371 ]

Yuanbo Liu commented on HDFS-15382:

The design of splitting the data lock by volume seems promising, as volumes keep getting bigger and bigger nowadays. We're looking forward to this patch being applied to branch-3. Is anyone working on this JIRA?
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17204434#comment-17204434 ]

Xiaoqiao He commented on HDFS-15382:

Thanks [~sodonnell], [~Jiang Xin] for your comments.

HDFS-15150 and HDFS-15160 are very interesting improvements for the DataNode, and I think the results are impressive. But that solution does not solve the coupling between different BlockPools and different Volumes when Federation is enabled. In particular, when the load on one BlockPool/Volume is very high, read/write operations on other BlockPools/Volumes are blocked, because some IO operations still run inside the lock and can hold it for a long time, such as #updateReplicaUnderRecovery. In our internal branch, this issue is critical. Please refer to https://drive.google.com/file/d/1eaE8vSEhIli0H3j2eDiPJNYuKAC0MFgu/view?usp=sharing if you are interested in the details.

IMO, the key to this improvement is decoupling BlockPools and Volumes in order to improve performance further. Combined with HDFS-15150 and HDFS-15160, it should give an even better result.

As for the demo patch, if we reach agreement, we will split it into subtasks to push this feature forward. cc [~Aiphag0]

Thanks [~sodonnell] and [~LiJinglun] again. More discussion and suggestions are welcome.
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203990#comment-17203990 ]

Jinglun commented on HDFS-15382:

Thanks very much [~sodonnell] for your suggestions! I'd like to give the read/write lock a try!
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203837#comment-17203837 ]

Stephen O'Donnell commented on HDFS-15382:

[~LiJinglun] There is a similar change already committed on trunk in HDFS-15150 and HDFS-15160. These changes do not go as far as the changes suggested here, but they are simpler and hence easier to backport / review. Some people who tried these patches out reported good results in reducing DN pauses.

[~hexiaoqiao] The approach here does seem like a good one and worth exploring. The concern I have is around its complexity and the amount of change it introduces. It would be great to also benchmark the simple read/write lock along with the change here to see how they compare.

I also think it would be worth exploring where the lock is held during IO operations (i.e. potentially held for a long time) and trying to avoid holding the lock during disk IO. If we could do this on the common code paths (create/write block, read block), then I think it would make most problems go away.

There are also the recent changes to remove the locking in DirectoryScanner, which we are seeing cause a lot of problems on the 3.x branches - HDFS-15415
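The suggestion above, keeping disk IO outside the lock, can be illustrated with a minimal sketch. This is not FsDatasetImpl code: the class ShortCriticalSectionSketch, its methods, and the "FINALIZED" state string are all hypothetical names chosen for illustration. The slow work runs first with no lock held, and the write lock is taken only for the in-memory update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names are illustrative and not taken from Hadoop.
class ShortCriticalSectionSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicaState = new HashMap<>();

  // Runs the slow disk work first, then takes the write lock only for the
  // short in-memory state change.
  void finalizeBlock(long blockId, Runnable slowDiskIo) {
    slowDiskIo.run(); // e.g. a rename or sync; no lock is held here
    lock.writeLock().lock();
    try {
      replicaState.put(blockId, "FINALIZED");
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Readers only need the read lock for the in-memory lookup.
  String stateOf(long blockId) {
    lock.readLock().lock();
    try {
      return replicaState.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Helper for checking that the IO step really runs without the write lock.
  boolean writeLockHeldByCaller() {
    return lock.isWriteLockedByCurrentThread();
  }
}
```

A real change would also have to re-validate the replica state after acquiring the lock, since another thread may have modified or removed the replica while the IO was in progress; the sketch leaves that out.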
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203790#comment-17203790 ]

Jinglun commented on HDFS-15382:

Great work [~Aiphag0] [~hexiaoqiao]!!! I haven't gone into the details (it is a very big patch), but the design makes sense to me. Our data streaming service occasionally suffers from slow reads/writes. I'd like to test this patch on my cluster. A patch based on 3.x would be very helpful.
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200157#comment-17200157 ]

Xiaoqiao He commented on HDFS-15382:

[~weichiu], [~sodonnell], [~linyiqun] do you have time to review this solution? It works well in our internal cluster. I believe this is a useful feature if we can push it forward. Any suggestions and comments are welcome. We will prepare a new patch based on trunk if we come to an agreement on this solution.
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196824#comment-17196824 ]

Aiphago commented on HDFS-15382:

This is a sample based on our 2.7 version. The main logic is similar, but applying it to trunk still needs some work.
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195317#comment-17195317 ]

Jiang Xin commented on HDFS-15382:

Thanks [~Aiphag0] for your proposal. It seems to help a lot on IO-heavy DNs, and we have recently been planning to do the same. Would you submit a sample patch or push it forward?
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124767#comment-17124767 ]

Aiphago commented on HDFS-15382:

!image-2020-06-03-1.png|width=628,height=378!

The simple lock model looks like this. Parts of it are implemented as follows:
# finalizeReplica(), append(), createRbw(): first take the BlockPoolLock read lock, then take the BlockPoolLock-volume-lock write lock.
# getStoredBlock(), getMetaDataInputStream(): first take the BlockPoolLock read lock, then take the BlockPoolLock-volume-lock read lock.
# deepCopyReplica(), getBlockReports(): take the BlockPoolLock read lock.
# delete: hold the BlockPoolLock write lock.
# replicaMap's GSet is changed to a synchronized one to make it thread safe. replicaMap itself is the same as in HDFS-15180 and is only controlled by the block pool lock.
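The lock ordering in the list above can be sketched roughly as follows. This is only an illustration of the described model, not the code from the attached patch: the class TwoLevelLockSketch, its method names, and the string key used for the per-volume locks are all assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of the two-level lock model; names are illustrative.
class TwoLevelLockSketch {
  // Level 1: one read/write lock per block pool.
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> poolLocks =
      new ConcurrentHashMap<>();
  // Level 2: one read/write lock per (block pool, volume) pair.
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> volumeLocks =
      new ConcurrentHashMap<>();

  private ReentrantReadWriteLock poolLock(String bpid) {
    return poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
  }

  private ReentrantReadWriteLock volumeLock(String bpid, String volume) {
    // Assumes '|' does not occur in block pool ids.
    return volumeLocks.computeIfAbsent(bpid + "|" + volume,
        k -> new ReentrantReadWriteLock());
  }

  // Always takes the pool read lock first, then the chosen volume lock.
  private <T> T withVolume(String bpid, String volume, boolean write, Supplier<T> op) {
    ReentrantReadWriteLock pool = poolLock(bpid);
    ReentrantReadWriteLock vol = volumeLock(bpid, volume);
    pool.readLock().lock();
    try {
      Lock inner = write ? vol.writeLock() : vol.readLock();
      inner.lock();
      try {
        return op.get();
      } finally {
        inner.unlock();
      }
    } finally {
      pool.readLock().unlock();
    }
  }

  // Item 1: e.g. finalizeReplica(), append(), createRbw().
  <T> T volumeWrite(String bpid, String volume, Supplier<T> op) {
    return withVolume(bpid, volume, true, op);
  }

  // Item 2: e.g. getStoredBlock(), getMetaDataInputStream().
  <T> T volumeRead(String bpid, String volume, Supplier<T> op) {
    return withVolume(bpid, volume, false, op);
  }

  // Item 3: e.g. deepCopyReplica(), getBlockReports().
  <T> T poolRead(String bpid, Supplier<T> op) {
    Lock l = poolLock(bpid).readLock();
    l.lock();
    try { return op.get(); } finally { l.unlock(); }
  }

  // Item 4: delete; excludes every volume-level operation in the pool.
  <T> T poolWrite(String bpid, Supplier<T> op) {
    Lock l = poolLock(bpid).writeLock();
    l.lock();
    try { return op.get(); } finally { l.unlock(); }
  }

  boolean isVolumeWriteLocked(String bpid, String volume) {
    return volumeLock(bpid, volume).isWriteLocked();
  }
}
```

Because volume-level operations share only the pool read lock, a slow write on one volume does not block operations on other volumes of the same block pool, while the pool write lock still excludes all of them. In this sketch a thread that already holds a pool read lock must not call poolWrite, since ReentrantReadWriteLock does not support upgrading a read lock to a write lock.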
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124736#comment-17124736 ]

Aiphago commented on HDFS-15382:

{quote}I am confused about {{ReplicaCachingGetSpaceUsed}}. IIUC, {{ReplicaCachingGetSpaceUsed}} is calculated in memory directly rather than syncing info from disk, right? So why is it related to this change? Also, the log output is based on our internal version rather than trunk, so some notes would help.{quote}

When {{ReplicaCachingGetSpaceUsed}} copies replicas from {{FsDatasetImpl}}, most of the time is spent waiting for the {{FsDatasetImpl}} lock, so the 'Copy replica infos' time reflects the time spent waiting for the lock.
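To make the argument above concrete, here is a small sketch that separates the time spent waiting for a lock from the time spent copying under it. It is not the ReplicaCachingGetSpaceUsed implementation; the class LockWaitTimingSketch and its fields are hypothetical. When the in-memory copy itself is cheap, the total elapsed time is dominated by the wait portion, which is why the 'Copy replica infos' duration can serve as a proxy for lock contention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names are illustrative and not taken from Hadoop.
class LockWaitTimingSketch {
  private final ReentrantLock datasetLock = new ReentrantLock();
  private final List<String> replicas = new ArrayList<>();

  long lastWaitNanos; // time spent blocked on the lock in the last copy
  long lastCopyNanos; // time spent copying while holding the lock

  void addReplica(String name) {
    datasetLock.lock();
    try {
      replicas.add(name);
    } finally {
      datasetLock.unlock();
    }
  }

  // Copies the replica list and records wait time and copy time separately.
  List<String> timedCopy() {
    long start = System.nanoTime();
    datasetLock.lock();
    long acquired = System.nanoTime();
    try {
      List<String> copy = new ArrayList<>(replicas);
      lastWaitNanos = acquired - start;
      lastCopyNanos = System.nanoTime() - acquired;
      return copy;
    } finally {
      datasetLock.unlock();
    }
  }
}
```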
[jira] [Commented] (HDFS-15382) Split FsDatasetImpl from blockpool lock to blockpool volume lock
[ https://issues.apache.org/jira/browse/HDFS-15382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123392#comment-17123392 ]

Xiaoqiao He commented on HDFS-15382:

Thanks [~Aiphag0] for your proposal here. The performance improvement is impressive, especially when the load on the DataNode is very high. A few nits:

a. The monitor index is an internal item of ours; it would be more helpful to explain what it means.

b. I am confused about {{ReplicaCachingGetSpaceUsed}}. IIUC, {{ReplicaCachingGetSpaceUsed}} is calculated in memory directly rather than syncing info from disk, right? So why is it related to this change? Also, the log output is based on our internal version rather than trunk, so some notes would help.

c. Could we offer a simple design document (maybe including one design chart with a short design/refactor description)? IMO it would be useful to anyone who is interested in this improvement.