[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HDFS-6110:

Resolution: Fixed
Fix Version/s: 2.5.0
Release Note: Log slow i/o. Set the log thresholds in the dfsclient and the datanode via the new configs below:
  dfs.client.slow.io.warning.threshold.ms (default 30 seconds)
  dfs.datanode.slow.io.warning.threshold.ms (default 300ms)
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to trunk and branch-2. Thanks for the patch, Liang Xie.

adding more slow action log in critical write path

Key: HDFS-6110
URL: https://issues.apache.org/jira/browse/HDFS-6110
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
Fix For: 2.5.0
Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, HDFS-6110v4.txt, HDFS-6110v5.txt, HDFS-6110v6.txt

After digging into an HBase write spike issue caused by slow buffer io in our cluster, I realized we should add more abnormal-latency warning logs in the write flow, so that if others hit an HLog sync spike, they can get more detail from the HDFS side at the same time. Patch will be uploaded soon.

--
This message was sent by Atlassian JIRA (v6.2#6252)
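To make the release note above concrete, here is a minimal sketch of how the two thresholds and their stated defaults might be resolved. The key names and default values come from the release note; the SlowIoThresholds class, the thresholdMs helper, and the use of a plain Map in place of Hadoop's configuration object are assumptions made for illustration, not code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Hadoop: resolves thresholds from a plain key/value map. */
class SlowIoThresholds {
    // Key names as given in the release note.
    static final String CLIENT_KEY = "dfs.client.slow.io.warning.threshold.ms";
    static final String DATANODE_KEY = "dfs.datanode.slow.io.warning.threshold.ms";

    // Defaults as given in the release note: 30 seconds and 300 ms.
    static final long CLIENT_DEFAULT_MS = 30_000L;
    static final long DATANODE_DEFAULT_MS = 300L;

    /** Returns the configured value in milliseconds, or defaultMs when the key is unset. */
    static long thresholdMs(Map<String, String> conf, String key, long defaultMs) {
        String raw = conf.get(key);
        return raw == null ? defaultMs : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DATANODE_KEY, "500"); // example override; the client key is left unset

        System.out.println("client threshold ms: "
            + thresholdMs(conf, CLIENT_KEY, CLIENT_DEFAULT_MS));
        System.out.println("datanode threshold ms: "
            + thresholdMs(conf, DATANODE_KEY, DATANODE_DEFAULT_MS));
    }
}
```

In a real deployment these keys would presumably be set in the HDFS configuration files rather than in code.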
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Attachment: HDFS-6110v5.txt
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Attachment: (was: HDFS-6110v5.txt)
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HDFS-6110:

Attachment: HDFS-6110v6.txt

[~xieliang007]'s latest patch, which adds in offline review feedback I got from our Todd (see below): one threshold for the dfsclient (a higher one, so folks MR'ing don't get annoyed by all the WARNings about slow i/o), and another for the datanode side, which is much lower so we can see bad i/os.

{code}
16:38 todd stack: just looked at 6110. had one more thought after commenting on the JIRA
16:38 todd you think we should add a separate config for client vs server?
16:38 todd I'm afraid that the 300ms default may be a little aggressive for the client - people using hadoop fs -put to upload files may get kind of nervous the next time they upgrade if they start seeing warnings
16:38 todd MR jobs too
16:39 todd may be better to have the client default be 10sec or something really long, and then HBase could tune it down for WAL files
16:39 stack todd: thanks boss
16:39 todd you think i'm crazy?
16:39 stack no
16:39 stack Testing it, it is illuminating to see how long stuff takes
16:39 todd k. yea
16:39 todd I had a patch like that once on the server side
16:39 stack Was worried though that it'd freak folks out.
16:40 stack Or, rather, they'd ignore what is being said and just consider it 'noise'.
16:40 todd yea
16:40 todd for a throughput app it is kind of noise
16:40 todd but hbase could definitely tune the default inside the RS down
16:40 stack Let me do as you suggest.
16:40 todd k
16:40 stack Thanks for review.
16:40 todd feel free to paste this convo into the jira so it makes sense :)
16:40 todd didn't want to post yet another comment and pollute everyone's mailboxes
16:41 * stack nod
{code}
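The split discussed in the chat above can be illustrated with a small sketch: the same operation duration is judged against a low datanode threshold, a long client threshold, and a tuned-down client threshold. The SlowIoSides class, the isSlow helper, the strict greater-than comparison, the 986 ms duration, and the 500 ms tuned value are all assumptions for illustration; only the 300 ms figure and the "10sec or something really long" suggestion come from the conversation.

```java
/** Illustrative only; class and method names are hypothetical, not Hadoop APIs. */
class SlowIoSides {
    /** Assumption: an operation counts as slow when it strictly exceeds the threshold. */
    static boolean isSlow(long elapsedMs, long thresholdMs) {
        return elapsedMs > thresholdMs;
    }

    public static void main(String[] args) {
        long elapsedMs = 986L;             // arbitrary example duration, reused for every case
        long datanodeThresholdMs = 300L;   // low, so bad i/os show up on the datanode side
        long clientThresholdMs = 10_000L;  // the long client default floated in the chat
        long tunedClientMs = 500L;         // hypothetical value a latency-sensitive client might pick

        System.out.println("datanode warns:       " + isSlow(elapsedMs, datanodeThresholdMs));
        System.out.println("default client warns: " + isSlow(elapsedMs, clientThresholdMs));
        System.out.println("tuned client warns:   " + isSlow(elapsedMs, tunedClientMs));
    }
}
```

With these example numbers, the datanode and the tuned-down client would warn while a client on the long default would stay quiet, which is the behavior the chat is aiming for with throughput-oriented users.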
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Attachment: HDFS-6110v4.txt

The attached v4 should address the last comment from Todd.
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HDFS-6110:

Attachment: HDFS-6110v3.txt

I tried it out. Looks good. Minor formatting changes to the logs (they all have a 'Slow' prefix...). Here is an example:

{code}
2014-03-27 22:46:19,975 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Slow BlockReceiver write packet to mirror took 986ms (threshold=300ms)
{code}

I was going to commit with the conservative 300ms threshold unless there is an objection.
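As a rough sketch of how a message in the format shown above could be produced, the snippet below times an action and builds the warning text only when the threshold is exceeded. The SlowIoWarning class and its check method are hypothetical names, the strict greater-than comparison is an assumption, and the actual patch may structure this differently; only the message layout is taken from the example log line.

```java
/** Hypothetical sketch, not the patch's code: builds a 'Slow ...' warning string. */
class SlowIoWarning {
    /**
     * Returns the warning text when elapsedMs exceeds thresholdMs, otherwise null.
     * The layout mirrors the example log line: "Slow <action> took <n>ms (threshold=<t>ms)".
     */
    static String check(String action, long elapsedMs, long thresholdMs) {
        if (elapsedMs <= thresholdMs) {
            return null; // fast enough, nothing to log
        }
        return "Slow " + action + " took " + elapsedMs + "ms (threshold=" + thresholdMs + "ms)";
    }

    public static void main(String[] args) throws InterruptedException {
        // Time a stand-in operation; a real caller would wrap the actual write or flush.
        long begin = System.nanoTime();
        Thread.sleep(5);
        long elapsedMs = (System.nanoTime() - begin) / 1_000_000L;

        String warning = check("simulated write", elapsedMs, 300L);
        System.out.println(warning == null ? "no warning" : warning);

        // Reproduces the text of the example line from fixed values.
        System.out.println(check("BlockReceiver write packet to mirror", 986L, 300L));
    }
}
```

Returning null for the fast path keeps the decision and the formatting in one place, so a caller only needs to log when a message comes back.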
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Attachment: HDFS-6110.txt

Here is the patch, extracted from my code. It's pretty simple, but it has been extremely useful for my investigation of HBase write outliers these days :)

[~saint@gmail.com] DFSOutputStream was modified as well, so that HBase ops can be alerted more easily by the warning log.
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Attachment: HDFS-6110-v2.txt

Making the threshold configurable in patch v2.
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Attachment: (was: HDFS-6110-v2.txt)
[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path
[ https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Xie updated HDFS-6110:

Attachment: HDFS-6110-v2.txt