[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-05-26 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-6110:


   Resolution: Fixed
Fix Version/s: 2.5.0
 Release Note: 
Log slow i/o.  Set the log thresholds in dfsclient and datanode via the new 
configs below:

dfs.client.slow.io.warning.threshold.ms (Default 30 seconds)
dfs.datanode.slow.io.warning.threshold.ms (Default 300ms)
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.  Thanks for the patch Liang Xie.
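
For illustration, here is a minimal Java sketch of how the two thresholds from the release note might be looked up with their defaults. java.util.Properties stands in for Hadoop's Configuration so the snippet runs on its own. The class name SlowIoThresholds, the helper thresholdMs, and the 1000ms override are assumptions made for this example, not part of the patch.

```java
import java.util.Properties;

/**
 * Sketch only: java.util.Properties stands in for Hadoop's Configuration
 * so the example runs without Hadoop on the classpath.
 */
final class SlowIoThresholds {
    // Key names and defaults are taken from the release note.
    static final String CLIENT_KEY = "dfs.client.slow.io.warning.threshold.ms";
    static final String DATANODE_KEY = "dfs.datanode.slow.io.warning.threshold.ms";
    static final long CLIENT_DEFAULT_MS = 30_000L; // 30 seconds
    static final long DATANODE_DEFAULT_MS = 300L;  // 300ms

    private SlowIoThresholds() {}

    /** Returns the configured value for key, or defaultMs when the key is unset. */
    static long thresholdMs(Properties conf, String key, long defaultMs) {
        String raw = conf.getProperty(key);
        return raw == null ? defaultMs : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Hypothetical override: a latency-sensitive client lowers its own threshold.
        conf.setProperty(CLIENT_KEY, "1000");

        System.out.println(CLIENT_KEY + "="
            + thresholdMs(conf, CLIENT_KEY, CLIENT_DEFAULT_MS));
        System.out.println(DATANODE_KEY + "="
            + thresholdMs(conf, DATANODE_KEY, DATANODE_DEFAULT_MS));
    }
}
```

In a real deployment the same keys would normally be set in the Hadoop configuration (for example hdfs-site.xml) rather than in code.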

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 2.5.0

 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, 
 HDFS-6110v4.txt, HDFS-6110v5.txt, HDFS-6110v6.txt


 After digging into an HBase write spike caused by slow buffer io in our 
 cluster, we realized we should add more abnormal-latency warning logs to the 
 write flow, so that if others hit an HLog sync spike, they can get more 
 detail from the HDFS side at the same time.
 Patch will be uploaded soon.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-04-25 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: HDFS-6110v5.txt

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, 
 HDFS-6110v4.txt, HDFS-6110v5.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-04-25 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: HDFS-6110v5.txt

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, 
 HDFS-6110v4.txt, HDFS-6110v5.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-04-25 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: (was: HDFS-6110v5.txt)

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, 
 HDFS-6110v4.txt, HDFS-6110v5.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-04-25 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-6110:


Attachment: HDFS-6110v6.txt

[~xieliang007]'s latest patch, with offline review feedback I got from our 
Todd added in (see below): i.e. one threshold for dfsclient (a higher one so 
folks MR'ing don't get annoyed by all the WARNings about slow i/o), and 
another for the datanode side, which is much lower so we can see bad i/os.

{code}
16:38  todd stack: just looked at 6110. had one more thought after commenting 
on the JIRA
16:38  todd you think we should add a separate config for client vs server?
16:38  todd I'm afraid that the 300ms default may be a little aggressive for 
the client - people using hadoop fs -put to upload files may get kind of 
nervous the next time they upgrade if they start seeing warnings
16:38  todd MR jobs too
16:39  todd may be better to have the client default be 10sec or something 
really long, and then HBase could tune it down for WAL files
16:39  stack todd: thanks boss
16:39  todd you think i'm crazy?
16:39  stack no
16:39  stack Testing it, it is illuminating to see how long stuff takes
16:39  todd k. yea
16:39  todd I had a patch like that once on the server side
16:39  stack Was worried though that it'd freak folks out.
16:40  stack Or, rather, they'd ignore what is being said and just consider 
it 'noise'.
16:40  todd yea
16:40  todd for a throughput app it is kind of noise
16:40  todd but hbase could definitely tune the default inside the RS down
16:40  stack Let me do as you suggest.
16:40  todd k
16:40  stack Thanks for review.
16:40  todd feel free to paste this convo into the jira so it makes sense :)
16:40  todd didn't want to post yet another comment and pollute everyone's 
mailboxes
16:41  * stack nod
{code}
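
The reasoning in the chat above can be sketched in Java: the same latency is checked against a low datanode threshold and a much higher client threshold. The 300ms and 10sec values are the ones mentioned in the conversation. The 800ms latency, the class name, and the strict greater-than comparison are assumptions for illustration only.

```java
final class ClientVsDatanodeThreshold {
    private ClientVsDatanodeThreshold() {}

    /** Assumed rule: warn only when the elapsed time exceeds the threshold. */
    static boolean shouldWarn(long elapsedMs, long thresholdMs) {
        return elapsedMs > thresholdMs;
    }

    public static void main(String[] args) {
        long datanodeThresholdMs = 300L;  // low, so bad i/os are visible on the server side
        long clientThresholdMs = 10_000L; // "10sec or something really long" from the chat
        long elapsedMs = 800L;            // hypothetical latency of one write

        System.out.println("datanode warns: " + shouldWarn(elapsedMs, datanodeThresholdMs));
        System.out.println("client warns:   " + shouldWarn(elapsedMs, clientThresholdMs));
    }
}
```

With these numbers only the datanode side would warn, which is the point of the split: throughput-oriented clients stay quiet, while a latency-sensitive client such as HBase can lower its own threshold.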

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, 
 HDFS-6110v4.txt, HDFS-6110v5.txt, HDFS-6110v6.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-04-01 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: HDFS-6110v4.txt

The attached v4 should address the last comment from Todd.

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt, 
 HDFS-6110v4.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-03-27 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-6110:


Attachment: HDFS-6110v3.txt

I tried it out.  Looks good.  v3 makes minor changes to the log formatting 
(the messages all have a 'Slow' prefix...).  Here is an example:

{code}
2014-03-27 22:46:19,975 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Slow BlockReceiver write packet to mirror took 986ms (threshold=300ms)
{code}

I plan to commit with the conservative 300ms threshold unless there are objections.
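
A rough Java sketch of the pattern behind the example log line above: time an operation, compare the elapsed time against the threshold, and build a 'Slow ...' message only when the threshold is exceeded. The message layout is copied from the log line above. The class name SlowIoWarning, the helpers timeMs and slowMessage, and the use of System.out in place of a logger are assumptions for this sketch, not the code in the patch.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;

final class SlowIoWarning {
    private SlowIoWarning() {}

    /** Runs op and returns how long it took, in milliseconds. */
    static long timeMs(Runnable op) {
        long start = System.nanoTime();
        op.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Builds the warning text when elapsedMs exceeds thresholdMs, else empty. */
    static Optional<String> slowMessage(String action, long elapsedMs, long thresholdMs) {
        if (elapsedMs <= thresholdMs) {
            return Optional.empty();
        }
        return Optional.of("Slow " + action + " took " + elapsedMs
            + "ms (threshold=" + thresholdMs + "ms)");
    }

    public static void main(String[] args) {
        long thresholdMs = 300L;

        // Stand-in for a real write; a no-op normally finishes well under the threshold.
        long elapsedMs = timeMs(() -> { });
        slowMessage("BlockReceiver write packet to mirror", elapsedMs, thresholdMs)
            .ifPresent(System.out::println);

        // Fixed numbers taken from the example log line.
        slowMessage("BlockReceiver write packet to mirror", 986L, thresholdMs)
            .ifPresent(System.out::println);
    }
}
```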


 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt, HDFS-6110v3.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-03-18 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: HDFS-6110.txt

Here is the patch, extracted from my code. It's pretty simple, but it has been 
extremely useful for my investigation of HBase write outliers these days :) 
[~saint@gmail.com]

DFSOutputStream was modified as well, so that HBase ops can be alerted more 
easily by the warning log.

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-03-18 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Status: Patch Available  (was: Open)

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.3.0, 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-03-18 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: HDFS-6110-v2.txt

Making the threshold configurable in patch v2.

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-03-18 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: (was: HDFS-6110-v2.txt)

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110.txt




[jira] [Updated] (HDFS-6110) adding more slow action log in critical write path

2014-03-18 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6110:


Attachment: HDFS-6110-v2.txt

 adding more slow action log in critical write path
 --

 Key: HDFS-6110
 URL: https://issues.apache.org/jira/browse/HDFS-6110
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.0.0, 2.3.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6110-v2.txt, HDFS-6110.txt

