[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193806#comment-16193806 ] Andrew Wang commented on HDFS-7964: --- Thanks Kihwal, filed HDFS-12603 to make this happen. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193784#comment-16193784 ] Kihwal Lee commented on HDFS-7964: -- bq. Should we consider turning this on by default in newer versions of Hadoop? +1. It was off by default due to concerns about correctness. We have been running it in production for quite a while with no issues so far. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193773#comment-16193773 ] Andrew Wang commented on HDFS-7964: --- Should we consider turning this on by default in newer versions of Hadoop? It'd be nice to improve out-of-the-box performance. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185888#comment-16185888 ] Kihwal Lee commented on HDFS-7964: -- bq. How many handler threads are used for the 285k QPS? 100. More handlers are not necessarily better. More handlers increase lock contention and the typical parallelism in NN is not very high anyway. Look at how many cores a super busy NN can utilize. It's not very impressive. So in general increasing number of handlers too much is not a good idea. One of the exceptions would be the handlers serving long {{getContentSummary}} calls. If called against a big tree, it can occupy a handler for many seconds. If you have too many of these, the utilization and throughput will drop. Some say it was a mistake to add {{getContentSummary}} in the first place, but it's already there, so we need to deal with it. If it becomes a major issue, we can think about async handling them and freeing up handlers, ala async IBR processing. But the response here will be variable, so it gets a bit more complicated. If we improve the parallelism in NN, more handlers will give us better throughput. The fine-grained locking was one of the efforts put in to achieve this, but the introduction of the snapshot feature made it very complicated, if not impossible. The performance of the prototype was impressive. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185037#comment-16185037 ] Zhe Zhang commented on HDFS-7964: - Very impressive numbers [~daryn]. How many handler threads are used for the 285k QPS? > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184659#comment-16184659 ] Daryn Sharp commented on HDFS-7964: --- Wow that looks super broken. I hope you don't get double edits or out of order edits during a log roll... bq. I found if editPendingQ in FSEditLogAsync.java blocked, all the IPC handlers will slow down and performance degraded. Yes, of course, and edit logging used to frequently block with huge reduction in throughput. Did you actually profile a problem or attempt to micro-optimize? For the 4k queue to block, sync time would have to be on the order of at least hundreds of ms under very heavy write load or multiple seconds for a moderate write load. Glanced at the metrics for the past hour, I see up to 10k write ops/sec + 70k read ops/sec. We've seen ~285k ops/sec under read load abuse so I think throughput is ok. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16096107#comment-16096107 ] photogamerun commented on HDFS-7964: I found if editPendingQ in FSEditLogAsync.java blocked, all the IPC handlers will slow down and performance degraded. so when I backport the code to 2.5.0-cdh5.3.2 I did some change in logEdit method void logEdit(final FSEditLogOp op) { Server.Call rpcCall = Server.getCurCall().get(); if (rpcCall != null) { if (editPendingQ.size() > 32768) { super.logEdit(op); } enqueueEdit(op); } else { super.logEdit(op); } } > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037810#comment-16037810 ] Zhe Zhang commented on HDFS-7964: - I took some time trying to fix HDFS-11737 unit test failures but they seem quite tricky. I recommend we remove this JIRA from the release-blocker list of 2.7.4. [~shv] Lemme know if it sounds good to you. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Labels: release-blocker > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991842#comment-15991842 ] Zhe Zhang commented on HDFS-7964: - Sorry for keeping this JIRA open for long time. I created HDFS-11737 to track the backport effort. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805898#comment-15805898 ] Junping Du commented on HDFS-7964: -- move to 2.9 as 2.8 RC is in progress. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628513#comment-15628513 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 20s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 11 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 51s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 36s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 52s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} branch-2.7 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 18s{color} | {color:green} branch-2.7 passed with JDK v1.7.0_111 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 44s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 12 new + 1219 unchanged - 38 fixed = 1231 total (was 1257) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 3712 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 1m 37s{color} | {color:red} The patch 197 line(s) with tabs. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 22s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 9s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 25s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_111. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 19s{color} | {color:red} bkjournal in the patch failed with JDK v1.7.0_111. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}255m 1s{color} |
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627922#comment-15627922 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 3m 48s{color} | {color:red} Docker failed to build yetus/hadoop:c420dfe. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-7964 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12836510/HDFS-7964-branch-2.7.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17378/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627893#comment-15627893 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-7964 does not apply to branch-2.8. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-7964 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12836221/HDFS-7964-branch-2.8.0.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17376/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.8.0.patch, HDFS-7964-rebase.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627888#comment-15627888 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-7964 does not apply to branch-2.7.0. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-7964 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12836509/HDFS-7964-branch-2.7.0.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/17375/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.7.0.patch, > HDFS-7964-branch-2.8.0.patch, HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623774#comment-15623774 ] Zhe Zhang commented on HDFS-7964: - The findbug warning is addressed by HDFS-10485. Reported {{TestEditLog}} failure addressed by HDFS-10722, both of which I'm backporting to branch-2.8 and branch-2.7. I just pushed this change to branch-2.8. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1 > > Attachments: HDFS-7964-branch-2.8.0.patch, HDFS-7964-rebase.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15623710#comment-15623710 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 11 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 54s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 25s{color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_111 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 31s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 5 new + 1144 unchanged - 36 fixed = 1149 total (was 1180) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 10s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s{color} | {color:green} the patch passed with JDK v1.7.0_111 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 33s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_111. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 20s{color} | {color:green} bkjournal in the patch passed with JDK v1.7.0_111. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}142m 58s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs | | | Inconsistent synchronization of
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172922#comment-15172922 ] Hudson commented on HDFS-7964: -- FAILURE: Integrated in Hadoop-trunk-Commit #9395 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9395/]) HDFS-7964. Add support for async edit logging. Contributed by Daryn (jing9: rev 2151716832ad14932dd65b1a4e47e64d8d6cd767) * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/resources/log4j.properties * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRecovery.java * hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/test/java/org/apache/hadoop/contrib/bkjournal/TestBookKeeperAsHASharedDir.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogJournalFailures.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogAutoroll.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestAuditLogs.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.9.0 > > Attachments: HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172637#comment-15172637 ] Daryn Sharp commented on HDFS-7964: --- Feel funny saying +1 to my own patch. Only difference I think I see is changes to debug logging and test imports. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150970#comment-15150970 ] Jing Zhao commented on HDFS-7964: - just merge conflicts. you can compare the diff between the rebased patch and the last patch you uploaded to check > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150918#comment-15150918 ] Daryn Sharp commented on HDFS-7964: --- Thanks for the rebase. I'm currently really busy but will get to this. To help me more quickly check the patch, are there any functional changes or just trivial merge conflicts? > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964-rebase.patch, HDFS-7964.patch, > HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149395#comment-15149395 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 11 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 34s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 37s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 4 new + 1133 unchanged - 34 fixed = 1137 total (was 1167) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 1s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 1s {color} | {color:green} bkjournal in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 46s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 14s {color} | {color:green} bkjournal in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 190m 4s {color} | {color:black} {color} | \\ \\ ||
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145469#comment-15145469 ] Dave Latham commented on HDFS-7964: --- Would love to see this get committed. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145511#comment-15145511 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} HDFS-7964 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12767604/HDFS-7964.patch | | JIRA Issue | HDFS-7964 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14486/console | | Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145715#comment-15145715 ] Hadoop QA commented on HDFS-7964: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 11 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 7s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 33s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 5 new + 1132 unchanged - 34 fixed = 1137 total (was 1166) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 54m 46s {color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 7s {color} | {color:green} bkjournal in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 51m 59s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 7s {color} | {color:green} bkjournal in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 144m 22s {color} | {color:black} {color} | \\
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060950#comment-15060950 ] Daryn Sharp commented on HDFS-7964: --- I'm trying to whittle down my backlog of internal features before I go on year end vacation. The patch still applies, may I please have a final review? > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060976#comment-15060976 ] Jing Zhao commented on HDFS-7964: - Sorry for the late response, Daryn. The current patch looks good to me. I will take a final review later today. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061297#comment-15061297 ] Jing Zhao commented on HDFS-7964: - The patch looks good to me. One nit is that two places in FSEditLogAsync have some commented code: {code} //if (LOG.isDebugEnabled()) { LOG.info("logSync "+edit); //} {code} +1 after addressing this. Besides, the current postponeResponse/sendResponse code uses a counter to delay the response. This looks a little hacky to me. But I understand this may be the only way to add the new functionality without changing the original code. Maybe we can do some extra refactoring in the future. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058219#comment-15058219 ] Daryn Sharp commented on HDFS-7964: --- Comments? > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981323#comment-14981323 ] Daryn Sharp commented on HDFS-7964: --- I think special casing a few ops is fragile even if we add an assert. I feel safer knowing a production deadlock isn't going to occur if/when someone doesn't add tests to stress all possible code paths or more importantly doesn't test against the async logging. A runtime check will require halting the NN because data structures are already updated - definitely not good when a simple sync check will prevent any issues. Exploring latency concerns: Today, logEdit has to serialize and compute the crc for ops while holding the write lock. With this feature, logEdit only queues which releases the write lock sooner. The serialization cost is shifted to the background thread, however there's not 100 threads contending. In the meantime other calls are being processed. Sadly, the slow steady stream concern isn't even possible. Even with 20-30k ops/sec, the average batch size is 1.6. I've seen as high as 7. Now once fine-grain locking is done... We may be able to have that debate. ;) > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967625#comment-14967625 ] Jing Zhao commented on HDFS-7964: - Thanks for updating the patch, Daryn. bq. It's ensuring correctness by preventing a deadlock with the background thread. IIRC, there is also a call synchronized on the edit log call that must know the current txid (rolling?) which isn't possible when async. Currently I only find startSegment and endSegment holding the lock and thus SyncEdit will be created for them. These two calls also requires the sync semantically. Thus I'm thinking if we can change the condition {{!Thread.holdsLock(this)}} to {{if op is not start/endSegment}}. We can still add an assertion to make sure the lock is held when creating SyncEdits and not held when creating AsyncEdits. Did I miss something here? bq. Do you mean drain only as many edits from the pending queue as were present at the beginning of the cycle? My original concern was mainly about the latency for a single request when there is not a lot of traffic to the NN. {{doSync = edit.logEdit()}} means the sync only happens when the buffer is full, thus if the request keeps coming into the pending queue (thus editPendingQ is always non-empty) the first request needs to wait for several extra iterations until the buffer is filled. But actually this extra latency is small so the current code should be fine. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965419#comment-14965419 ] Hadoop QA commented on HDFS-7964: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 50s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 11 new or modified test files. | | {color:green}+1{color} | javac | 10m 16s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 2s | The applied patch generated 6 new checkstyle issues (total was 1191, now 1162). | | {color:green}+1{color} | whitespace | 0m 21s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 29s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 54m 38s | Tests failed in hadoop-hdfs. | | {color:red}-1{color} | hdfs tests | 0m 16s | Tests failed in bkjournal. | | | | 110m 1s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestParallelUnixDomainRead | | | hadoop.hdfs.TestWriteReadStripedFile | | | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.TestHDFSFileSystemContract | | | hadoop.hdfs.server.blockmanagement.TestNodeCount | | Failed build | bkjournal | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12767604/HDFS-7964.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 9cb5d35 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13082/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13082/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13082/artifact/patchprocess/testrun_hadoop-hdfs.txt | | bkjournal test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13082/artifact/patchprocess/testrun_bkjournal.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13082/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13082/console | This message was automatically generated. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966300#comment-14966300 ] Walter Su commented on HDFS-7964: - {code} if (call.connection.responseQueue.size() == 1) { //Server.java#doRespond(Call) processResponse(call.connection.responseQueue, true); {code} {{Handlers}} will help {{Responder}} to call {{processResponse}} sometimes. FSEditLogAsync.run() --> RpcEdit.logSyncNotify(..) --> call.sendResponse() --> connection.sendResponse() --> responder.doRespond(call) --> processResponse(..) ( possibly --> closeConnection(..) ) It could slow down {{FSEditLogAsync}} thread, then [~jingzhao]'s concern #2 seems valid. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965106#comment-14965106 ] Daryn Sharp commented on HDFS-7964: --- # It's ensuring correctness by preventing a deadlock with the background thread. IIRC, there is also a call synchronized on the edit log call that must know the current txid (rolling?) which isn't possible when async. # Do you mean drain only as many edits from the pending queue as were present at the beginning of the cycle? I considered that but if the NN is falling that far behind on edits, bigger batches and fewer fsyncs are the only way to catch up. Unless there is a disk issue, the thread has never fallen behind for us. It actually batches less than we'd like because ops can't log fast enough. # I'd rather not remove code unrelated to the change, can it be a separate jira? # Was minimizing change but ok. # Mostly yes, because if I had to touch all the code calling it I would have been inclined to remove all the static methods which turns into a much larger patch. # Ok. In short, it does what it did before which is the number batched is the calls that piggybacked on another sync (hence the -1). > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962767#comment-14962767 ] Yi Liu commented on HDFS-7964: -- 1. That's right. 2. You can keep it. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962829#comment-14962829 ] Hadoop QA commented on HDFS-7964: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 28s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 11 new or modified test files. | | {color:green}+1{color} | javac | 9m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 13m 19s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 29s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 53s | The applied patch generated 10 new checkstyle issues (total was 1190, now 1165). | | {color:red}-1{color} | whitespace | 0m 20s | The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 50s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 42s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 12s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 4m 27s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 71m 37s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 6m 58s | Tests passed in bkjournal. | | | | 137m 8s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | Timed out tests | org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | org.apache.hadoop.hdfs.TestFileCreationClient | | | org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults | | | org.apache.hadoop.hdfs.TestEncryptionZonesWithKMS | | | org.apache.hadoop.hdfs.TestLeaseRecovery | | | org.apache.hadoop.hdfs.server.namenode.TestFileTruncate | | | org.apache.hadoop.hdfs.server.mover.TestStorageMover | | | org.apache.hadoop.hdfs.server.mover.TestMover | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12766854/HDFS-7964.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 476a251 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13046/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/13046/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13046/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13046/artifact/patchprocess/testrun_hadoop-hdfs.txt | | bkjournal test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13046/artifact/patchprocess/testrun_bkjournal.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13046/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13046/console | This message was automatically generated. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960721#comment-14960721 ] Daryn Sharp commented on HDFS-7964: --- # The thread is the only one calling the real logEdit so the lastest txid is the one it last logged. # I changed it because the bookkeeper tests emitted nothing at all which makes it really hard to debug. I can undo it or change to INFO (what I intended) if you like. # It depends on the latest changes in HADOOP-12483. I renamed a new method in HADOOP-10300 to be more explicit about it's purpose. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961593#comment-14961593 ] Jing Zhao commented on HDFS-7964: - Thanks for rebasing the patch, Daryn. The patch looks good to me. Some minor comments: # The following code uses whether the current thread holds the monitor to decide whether the edit should be async/sync. This way may be not direct to follow, also make it hard to guarantee the correctness of future code. Can we simply make the decision based on the op itself? {code} // only rpc calls not explicitly sync'ed on the log will be async. if (rpcCall != null && !Thread.holdsLock(this)) { edit = new AsyncEdit(this, op, rpcCall); } else { edit = new SyncEdit(this, op); } {code} # If requests keeps coming but the traffic is slow, the sync will happen only when the buffer is full, which means the response may be delayed? This may be a rare case in practice but maybe we should avoid it here. Can we make each iteration of the loop either fill the buffer or drain the pending queue? {code} if (edit != null) { // sync if requested by edit log. doSync = edit.logEdit(); syncWaitQ.add(edit); } else { // sync when editq runs dry, but have edits pending a sync. doSync = !syncWaitQ.isEmpty(); } {code} # The class InvalidOp has not been used. We can either remove it or use it in {{OP_INVALID}}. # Maybe we can do some further cleanup for {{RollingUpgradeOp}}. E.g., after adding classes like {{RollingUpgradeStartOp}} and {{RollingUpgradeFinalizeOp}}, we can put {{getInstance}} methods there and remove {{getStartInstance}} and {{getFinalizeInstance}}. # Is the main reason of having {{OpInstanceCache#get}} to minimize the code change? # It will be helpful to add a comment to explain the calculation logic of {{editsBatchedInSync}}. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960114#comment-14960114 ] Yi Liu commented on HDFS-7964: -- Thanks [~daryn] for the work. Further comments: *1.* In FSEditLogAsync#run {code} @Override public void run() { try { while (true) { if (doSync) { ... logSync(getLastWrittenTxId()); ... {code} I think it's better to pass the txid of current edit to {{logSync}}, not need to wait for all txid written. Then it's more efficient and client can get more faster response? *2.* {code} + editsBatchedInSync = txid - synctxid - 1; {code} Isn't it "txid - synctxid"? The txid is the max txid written, and synctxid is the max txid already synced, suppose txid = 20, synctxid = 10, then the editsBatchedInSync should be (txid - synctxid) = (20 - 10) = 10. Also you can get it from the existing log message: {code} final String msg = "Could not sync enough journals to persistent storage " + "due to " + e.getMessage() + ". " + "Unsynced transactions: " + (txid - synctxid); {code} *3.* {code} -log4j.rootLogger=OFF, CONSOLE +log4j.rootLogger=DEBUG, CONSOLE {code} Any reason to change it? *4.* {code} call.abortResponse(syncEx); {code} Seems this code is not available? > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960144#comment-14960144 ] Hadoop QA commented on HDFS-7964: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 28s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 11 new or modified test files. | | {color:red}-1{color} | javac | 1m 55s | The patch appears to cause the build to fail. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12766854/HDFS-7964.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / cf23f2c | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13022/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13022/console | This message was automatically generated. > Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 2.0.2-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write log, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns to provide the client with a durability guarantee for the > response. > Write heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701823#comment-14701823 ] Daryn Sharp commented on HDFS-7964: --- [~hitliuyi], I'm sure I'll need to rebase. The batched metric is the count of edits being implicitly sync'ed by another edit, not the sum of all edits in the sync. That was the pre-existing semantic I didn't want to alter. Technically AsyncEdit#logSyncNotify doesn't need sync, but findbugs gets grumbly about access to {{done}}. Re: conf. At the time, we didn't have much run time on it so I considered it experimental. All 2.6 clusters have been running with the feature enabled for 6+ months so I'll add to the default conf. Add support for async edit logging -- Key: HDFS-7964 URL: https://issues.apache.org/jira/browse/HDFS-7964 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.2-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7964.patch Edit logging is a major source of contention within the NN. LogEdit is called within the namespace write log, while logSync is called outside of the lock to allow greater concurrency. The handler thread remains busy until logSync returns to provide the client with a durability guarantee for the response. Write heavy RPC load and/or slow IO causes handlers to stall in logSync. Although the write lock is not held, readers are limited/starved and the call queue fills. Combining an edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability guarantee but immediately free up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702369#comment-14702369 ] Yi Liu commented on HDFS-7964: -- {quote} All 2.6 clusters have been running with the feature enabled for 6+ months {quote} Glad to see this. Yes, please rebase it, this is a good improvement. Add support for async edit logging -- Key: HDFS-7964 URL: https://issues.apache.org/jira/browse/HDFS-7964 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.2-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7964.patch Edit logging is a major source of contention within the NN. LogEdit is called within the namespace write log, while logSync is called outside of the lock to allow greater concurrency. The handler thread remains busy until logSync returns to provide the client with a durability guarantee for the response. Write heavy RPC load and/or slow IO causes handlers to stall in logSync. Although the write lock is not held, readers are limited/starved and the call queue fills. Combining an edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability guarantee but immediately free up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386638#comment-14386638 ] Yi Liu commented on HDFS-7964: -- [~daryn], this is a cool and important improvement. Besides the benefits you stated in the JIRA description, another benefit is: in current HDFS, if force sync happens in logEdit for HDFS ops, it is protected by FSN write lock, even we increase handler threads, HDFS read ops which don't require logEdit are still blocked (require read lock), this improvement also resolves this. {quote} editsBatchedInSync = txid - synctxid - 1; {quote} Should be {{txid - synctxid}}. Don't need {{synchronized}} for AsyncEdit#logSyncNotify ? add the configuration to hdfs-default.xml? Add support for async edit logging -- Key: HDFS-7964 URL: https://issues.apache.org/jira/browse/HDFS-7964 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.2-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7964.patch Edit logging is a major source of contention within the NN. LogEdit is called within the namespace write log, while logSync is called outside of the lock to allow greater concurrency. The handler thread remains busy until logSync returns to provide the client with a durability guarantee for the response. Write heavy RPC load and/or slow IO causes handlers to stall in logSync. Although the write lock is not held, readers are limited/starved and the call queue fills. Combining an edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability guarantee but immediately free up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372334#comment-14372334 ] Daryn Sharp commented on HDFS-7964: --- As background, the problem was tackled after recurring slow IO issues caused some handlers to block with a small batch of edits. Remaining handlers filled the other side of the edit log double-buffer. In the worst case scenario, an auto-sync was triggered by logEdit while the write lock was held. The call queue overflowed, further exacerbated by the resulting tcp listen queue overflows, tcp syn cookies, and client timeouts. When the ipc machinery recovered, the process would repeat in an oscillating manner until the IO issues dissipated. Even w/o an auto-sync, the high rate of read operations caused small batching of writes. Add support for async edit logging -- Key: HDFS-7964 URL: https://issues.apache.org/jira/browse/HDFS-7964 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.2-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7964.patch Edit logging is a major source of contention within the NN. LogEdit is called within the namespace write log, while logSync is called outside of the lock to allow greater concurrency. The handler thread remains busy until logSync returns to provide the client with a durability guarantee for the response. Write heavy RPC load and/or slow IO causes handlers to stall in logSync. Although the write lock is not held, readers are limited/starved and the call queue fills. Combining an edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability guarantee but immediately free up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372226#comment-14372226 ] Daryn Sharp commented on HDFS-7964: --- Patch won't compile w/o HADOOP-10300 so not submitting yet. Add support for async edit logging -- Key: HDFS-7964 URL: https://issues.apache.org/jira/browse/HDFS-7964 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.2-alpha Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-7964.patch Edit logging is a major source of contention within the NN. LogEdit is called within the namespace write log, while logSync is called outside of the lock to allow greater concurrency. The handler thread remains busy until logSync returns to provide the client with a durability guarantee for the response. Write heavy RPC load and/or slow IO causes handlers to stall in logSync. Although the write lock is not held, readers are limited/starved and the call queue fills. Combining an edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability guarantee but immediately free up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)