[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-29 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935932#comment-14935932
 ] 

Xiaobing Zhou commented on HDFS-8696:
-

I checked the test failures; they are not related to the patch.

> Reduce the variances of latency of WebHDFS
> --
>
> Key: HDFS-8696
> URL: https://issues.apache.org/jira/browse/HDFS-8696
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HDFS-8696.004.patch, HDFS-8696.005.patch, 
> HDFS-8696.006.patch, HDFS-8696.007.patch, HDFS-8696.008.patch, 
> HDFS-8696.009.patch, HDFS-8696.010.patch, HDFS-8696.1.patch, 
> HDFS-8696.2.patch, HDFS-8696.3.patch
>
>
> There is an issue that appears related to the webhdfs server. When making two 
> concurrent requests, the DN will sometimes pause for extended periods (I've 
> seen 1-300 seconds), killing performance and dropping connections. 
> To reproduce: 
> 1. Set up an HDFS cluster.
> 2. Upload a large file (I was using 10GB). Perform 1-byte reads, writing
> the elapsed time of each to /tmp/times.txt:
> {noformat}
> i=1
> while (true); do 
> echo $i
> let i++
> /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null "http://:50070/webhdfs/v1/tmp/bigfile?op=OPEN=root=1";
> done
> {noformat}
> 3. Watch for 1-byte requests that take more than one second:
> tail -F /tmp/times.txt | grep -E "^[^0]"
> 4. After it has had a chance to warm up, start doing large transfers from
> another shell:
> {noformat}
> i=1
> while (true); do 
> echo $i
> let i++
> /usr/bin/time -f %e curl -s -L -o /dev/null "http://:50070/webhdfs/v1/tmp/bigfile?op=OPEN=root";
> done
> {noformat}
> After a minute or two, it is easy to see that small reads will sometimes
> pause for 1-300 seconds. In some extreme cases, it appears that the
> transfers time out and the DN drops the connection.
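The reproduction above only flags individual slow requests. To put a number on the latency variance itself, the recorded samples could be summarized offline. The following is a minimal sketch, not part of any attached patch: the class name `LatencySummary` is hypothetical, and it assumes /tmp/times.txt holds one elapsed-seconds value per line as written by `/usr/bin/time -f %e`. Non-numeric lines (for example the status line that `time` writes when curl exits with an error) are skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for summarizing the samples recorded in step 2.
class LatencySummary {

    /** Parse one elapsed-seconds value per line, skipping non-numeric lines. */
    static List<Double> parse(List<String> lines) {
        List<Double> samples = new ArrayList<>();
        for (String line : lines) {
            try {
                samples.add(Double.parseDouble(line.trim()));
            } catch (NumberFormatException e) {
                // Not a timing sample (blank line or a status message); ignore it.
            }
        }
        return samples;
    }

    /** Count samples of one second or more, the same threshold step 3 watches for. */
    static long countSlow(List<Double> seconds) {
        return seconds.stream().filter(s -> s >= 1.0).count();
    }

    static double mean(List<Double> seconds) {
        return seconds.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Population variance of the samples, in seconds squared. */
    static double variance(List<Double> seconds) {
        double m = mean(seconds);
        return seconds.stream().mapToDouble(s -> (s - m) * (s - m)).average().orElse(0.0);
    }

    static double max(List<Double> seconds) {
        return seconds.stream().mapToDouble(Double::doubleValue).max().orElse(0.0);
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from the reproduction steps; override with the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "/tmp/times.txt");
        if (!Files.exists(file)) {
            System.out.println("No samples found at " + file);
            return;
        }
        List<Double> samples = parse(Files.readAllLines(file));
        System.out.printf("samples=%d slow=%d mean=%.3f variance=%.3f max=%.3f%n",
                samples.size(), countSlow(samples), mean(samples),
                variance(samples), max(samples));
    }
}
```

Comparing the variance and maximum before and after applying a patch gives a rough check on whether the pauses have actually gone away.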



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935911#comment-14935911
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 39s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 48s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  5s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 24s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 163m 47s | Tests failed in hadoop-hdfs. |
| | | 211m  2s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestAuditLogs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764283/HDFS-8696.010.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 80d33b5 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12742/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12742/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12742/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12742/console |


This message was automatically generated.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-29 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935472#comment-14935472
 ] 

Haohui Mai commented on HDFS-8696:
--

The patch no longer applies to trunk. Can you please rebase?



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-29 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935523#comment-14935523
 ] 

Xiaobing Zhou commented on HDFS-8696:
-

New patch HDFS-8696.010.patch uploaded, thanks!



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934314#comment-14934314
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 17s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 25s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 16s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 168m 26s | Tests failed in hadoop-hdfs. |
| | | 214m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.hdfs.security.TestDelegationToken |
|   | hadoop.hdfs.TestDistributedFileSystem |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764070/HDFS-8696.009.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 4c9497c |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12718/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12718/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12718/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12718/console |


This message was automatically generated.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-28 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934360#comment-14934360
 ] 

Xiaobing Zhou commented on HDFS-8696:
-

I checked the test failures; they are not related to the patch.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-24 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906597#comment-14906597
 ] 

Haohui Mai commented on HDFS-8696:
--

{code}
+
+  this.httpServer.childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK,
+  conf.getInt(DFSConfigKeys.DFS_WEBHDFS_NETTY_CHANNEL_LOW_WATERMARK,
+  DFSConfigKeys.DFS_WEBHDFS_NETTY_CHANNEL_LOW_WATERMARK_DEFAULT));
+  this.httpServer.childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK,
+  conf.getInt(DFSConfigKeys.DFS_WEBHDFS_NETTY_CHANNEL_HIGH_WATERMARK,
+  DFSConfigKeys.DFS_WEBHDFS_NETTY_CHANNEL_HIGH_WATERMARK_DEFAULT));
+
   if (externalHttpChannel == null) {
{code}

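I assume there are copy-and-paste errors here: {{WRITE_BUFFER_HIGH_WATER_MARK}} is being read from the {{LOW_WATERMARK}} key, and {{WRITE_BUFFER_LOW_WATER_MARK}} from the {{HIGH_WATERMARK}} key.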
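One way to make a swap like this fail fast is to validate the pair before handing it to Netty. The sketch below is illustrative only and is not taken from any of the attached patches: `WatermarkPair` is a hypothetical helper, and the Netty calls appear only in a comment so that the example compiles without Netty on the classpath.

```java
// Hypothetical value holder that rejects an inverted low/high pair.
class WatermarkPair {
    final int low;
    final int high;

    WatermarkPair(int low, int high) {
        if (low < 0) {
            throw new IllegalArgumentException("low watermark must be >= 0: " + low);
        }
        if (high < low) {
            // A swapped pair (high read from the low key and vice versa)
            // ends up here whenever the two configured values differ.
            throw new IllegalArgumentException(
                "high watermark " + high + " is below low watermark " + low);
        }
        this.low = low;
        this.high = high;
    }

    // With Netty 4 available, the validated values would be applied roughly as:
    //   httpServer.childOption(ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK, pair.high);
    //   httpServer.childOption(ChannelOption.WRITE_BUFFER_LOW_WATER_MARK, pair.low);
}
```

Constructing the pair from the value read under the low key and the value read under the high key, in that order, keeps each channel option tied to the configuration key of the same name.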



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907076#comment-14907076
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m  4s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 49s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 22s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m  9s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 163m 21s | Tests failed in hadoop-hdfs. |
| | | 208m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFSNamesystem |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762196/HDFS-8696.008.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 71a81b6 |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12665/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12665/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12665/console |


This message was automatically generated.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-24 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907214#comment-14907214
 ] 

Xiaobing Zhou commented on HDFS-8696:
-

The test failure is unrelated to the patch; the test passes in my local setup with the patch applied.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905863#comment-14905863
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m  9s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 24s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 23s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 31s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 11s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |   0m 21s | Tests failed in hadoop-hdfs. |
| | |  46m 34s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-hdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12762052/HDFS-8696.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06d1c90 |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12653/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12653/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12653/console |


This message was automatically generated.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905186#comment-14905186
 ] 

Haohui Mai commented on HDFS-8696:
--

The patch looks pretty good to me.

Hi [~bobhansen], can you please clean up the checkstyle errors? +1 after 
addressing them.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905715#comment-14905715
 ] 

Haohui Mai commented on HDFS-8696:
--

{code}
+  public static final String  DFS_WEBHDFS_CHANNEL_BUF_LOW_WATERMARK =
+  "dfs.webhdfs.channel.buf.low.watermark";
+  public static final int  DFS_WEBHDFS_CHANNEL_BUF_LOW_WATERMARK_DEFAULT =
+  65535;
+  public static final String  DFS_WEBHDFS_CHANNEL_BUF_HIGH_WATERMARK =
+  "dfs.webhdfs.channel.buf.high.watermark";
+  public static final int  DFS_WEBHDFS_CHANNEL_BUF_HIGH_WATERMARK_DEFAULT =
+  131070;
+
{code}

Please reset the values to Netty's defaults. It might also make more sense to 
rename the configuration keys to {{dfs.webhdfs.netty.high/low.watermark}} to 
reflect the fact that they are implementation-specific.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905683#comment-14905683
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 12s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 14s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 22s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 35s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 160m 43s | Tests failed in hadoop-hdfs. |
| | | 206m 49s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.mover.TestStorageMover |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761981/HDFS-8696.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1f707ec |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12640/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12640/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12640/console |


This message was automatically generated.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903432#comment-14903432
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 38s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m  7s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 21s | The applied patch generated  4 new checkstyle issues (total was 405, now 409). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 12s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 163m 32s | Tests failed in hadoop-hdfs. |
| | | 208m 42s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.web.TestWebHDFSOAuth2 |
|   | hadoop.hdfs.TestSafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761687/HDFS-8696.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / cc2b473 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12603/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12603/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12603/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12603/console |


This message was automatically generated.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-19 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14877265#comment-14877265
 ] 

Haohui Mai commented on HDFS-8696:
--

Thanks everybody.

I talked to a bunch of folks offline. Here is the recap:

* After HDFS-7279, the DNs are capable of handling many more requests 
concurrently than in the days when the DN served WebHDFS traffic using Jetty.
* The requests have high variance in latency, as observed in this jira.
* After poking at multiple parameters in Netty, it looks like tweaking the low 
/ high water marks from (32k / 64k) to (256k / 1M) has resolved the issues.

It looks to me that the results of the experiments imply that there is high 
variance in the latency of {{DFSInputStream#read()}}, so having a larger buffer 
by setting the water marks definitely helps. On the other hand, a higher water 
mark also means significantly more buffer consumption when there are many 
(1000+) concurrent connections. I don't think these tweaks should be turned 
on by default. They should only be tuned for specific use cases.

For this patch I would suggest (1) making only the high / low water marks 
configurable, and (2) leaving the low / high water marks at their defaults when 
unspecified.
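
To put the buffer trade-off above into rough numbers, here is a back-of-the-envelope sketch in shell. It assumes, as a simplification, that every connection buffers roughly up to its high water mark (in Netty the mark is a writability threshold, not a hard cap), and that "64k" and "1M" mean 65536 and 1048576 bytes. The helper name {{buffer_mib}} is hypothetical and not part of Hadoop or the patch.

```shell
# Rough estimate only: assumes each connection buffers about up to its
# high water mark. buffer_mib is a hypothetical helper for illustration.
buffer_mib() {
  # $1 = number of concurrent connections, $2 = high water mark in bytes
  awk -v c="$1" -v h="$2" 'BEGIN { printf "%.1f\n", c * h / 1048576 }'
}

buffer_mib 1000 65536     # default-sized high water mark (64k)
buffer_mib 1000 1048576   # tuned high water mark (1M)
```

Under these assumptions, 1000 connections come to about 62.5 MiB with the 64k mark and about 1000 MiB with the 1M mark, which is why the larger values look better suited to specific deployments than to a default.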



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876796#comment-14876796
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 40s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m  5s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 13s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  6s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   4m 24s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  22m 53s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests |  93m 23s | Tests failed in hadoop-hdfs. |
| | | 163m 19s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.ipc.TestRPC |
| Timed out tests | org.apache.hadoop.hdfs.server.mover.TestStorageMover |
|   | org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12761207/HDFS-8696.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 94dec5a |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12553/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12553/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12553/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12553/console |


This message was automatically generated.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-18 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876576#comment-14876576
 ] 

Xiaobing Zhou commented on HDFS-8696:
-

Thanks [~bobhansen] for the review! Patch V4 fixes the default number of threads 
and the checkstyle issues. The QA report is too old to show the test failures. 
Let's see the next one.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-17 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804558#comment-14804558
 ] 

Bob Hansen commented on HDFS-8696:
--

Testing the latest patch as part of a full Hadoop build (rather than just a set 
of patched jars over an older Hadoop build) shows much less variance.  After a 
warm-up period, we had >500k short requests < 1000ms and 0 at >= 1000ms.

Let's call this a tentative success while we continue testing.

I've reviewed the code.  We can probably drop the default nio thread count down 
from 100 threads to the number of CPUs at a maximum.  Other than that, +1.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-09-01 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726293#comment-14726293
 ] 

Bob Hansen commented on HDFS-8696:
--

With the patched hdfs, we're still seeing slowdowns.

For the 1-byte reads:
99.7% of requests have an average latency of 4.5ms
0.3% of requests have an average latency of 870 ms (selected for latency > 
500ms)

So we're spending 34% of our wall-clock time waiting for the slow reads.  This 
is better than the unpatched 2.2.6, but still a bit rough.

Running the same experiment against 2.2.0.0 gave 0 reads at more than 500ms.
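
For anyone repeating this measurement, the split between fast and slow reads can be computed from the times file that the reproduction loop writes. The sketch below assumes the file holds one elapsed-seconds value per line (as produced by {{/usr/bin/time -f %e}}) and uses 0.5 s as the cut-off, matching the "latency > 500ms" selection above. The helper name {{slow_share}} is hypothetical.

```shell
# Summarize a file of per-request elapsed times (seconds, one per line).
# slow_share is a hypothetical helper; non-numeric lines are skipped.
slow_share() {
  # $1 = times file, $2 = threshold in seconds
  awk -v t="$2" '
    /^[0-9]+(\.[0-9]+)?$/ {
      n++; sum += $1
      if ($1 + 0 > t + 0) { slow++; slow_sum += $1 }
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      pct = (sum > 0) ? 100 * slow_sum / sum : 0
      printf "slow=%d total=%d slow_time_pct=%.1f\n", slow, n, pct
    }
  ' "$1"
}

# Example (not run here): slow_share /tmp/times.txt 0.5
```

The last field is the share of total elapsed time spent in requests slower than the threshold, which is the wall-clock figure discussed above.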




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-08-05 Thread Jun Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658875#comment-14658875
 ] 

Jun Yin commented on HDFS-8696:
---

On the unpatched cluster, try 
$ curl -s -L "http://NN:50070/webhdfs/v1/tmp/catalog_sales_38_50.dat?op=OPEN&user.name=release&length=1"
which returns data of length 1.
On the patched cluster, the same request returns the whole file, just as if running
$ curl -s -L "http://NN:50070/webhdfs/v1/tmp/catalog_sales_38_50.dat?op=OPEN&user.name=release"
Please check whether you can also reproduce this; maybe I missed something.
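
One way to check this behaviour without inspecting the output by eye is to count the bytes a request returns. The sketch below is a small filter that reads a response on stdin and compares its size with the expected length; the helper name {{expect_bytes}} is hypothetical, and the host {{NN}} in the usage comment is the placeholder used above.

```shell
# Compare the number of bytes read from stdin with an expected count.
# expect_bytes is a hypothetical helper for illustration.
expect_bytes() {
  # $1 = expected number of bytes
  actual=$(wc -c | tr -d ' ')
  if [ "$actual" -eq "$1" ]; then
    echo "ok: $actual bytes"
  else
    echo "mismatch: expected $1, got $actual bytes"
    return 1
  fi
}

# Example (not run here):
# curl -s -L "http://NN:50070/webhdfs/v1/tmp/catalog_sales_38_50.dat?op=OPEN&user.name=release&length=1" | expect_bytes 1
```

If the length parameter is honoured, the check prints "ok: 1 bytes"; if the whole file comes back, it reports a mismatch and returns a non-zero status.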




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-08-04 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654210#comment-14654210
 ] 

Bob Hansen commented on HDFS-8696:
--

Still seeing some periodic slowdowns.  Could be related to HDFS-8855; the 
periodic drop in established connections in that bug may correlate with the 
periodic jumps in latency we're seeing in this bug.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-08-04 Thread Jun Yin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654263#comment-14654263
 ] 

Jun Yin commented on HDFS-8696:
---

Good catch. Actually we use V3, which Xiaobing sent by email. 
The dfs.webhdfs.server.worker.threads setting was something that the old patch 
left in the config file; I think it has no effect with the latest patch.



[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-08-04 Thread Jun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654152#comment-14654152
 ] 

Jun commented on HDFS-8696:
---

Hi Xiaobing,

From my test, we got some unexpected results when reading a larger file (18 GB).

case #1 - unpatched, hdfs-site.xml has the following parameters:
dfs.webhdfs.server.worker.threads = 100;
dfs.webhdfs.server.max.connection.queue.length = 1024;
dfs.webhdfs.net.send.buf.size = 65535;
dfs.webhdfs.net.receive.buf.size = 65535;
dfs.webhdfs.channel.write.buf.low.watermark = 65535;
dfs.webhdfs.channel.write.buf.high.watermark = 131070;
large read test:
$ while (true); do /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null \
  "http://NN:50070/webhdfs/v1/tmp/catalog_sales_38_50.dat?op=OPEN&user.name=release"; done
$ while (true); do /usr/bin/time -f %e -o /tmp/times.txt -a curl -s -L -o /dev/null \
  "http://NN:50070/webhdfs/v1/tmp/catalog_sales_38_50.dat?op=OPEN&user.name=release&length=1"; done
$ tail -F /tmp/times.txt | grep -E "^[^0]"
result: 
according to the /tmp/times.txt, delays are in the range 30-60s

case #2 - patched, with the required parameters also set in the config file 
(hdfs-site.xml)
large read test same as case #1, result:
delays are in the range of 40-90s, with 2 extremely slow requests: 155s and 174s

I will update with some percentiles later.
Thanks
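
For the percentiles mentioned above, one possible way to get them out of /tmp/times.txt is sketched below. This is only an illustration, not something that was run for this issue: the function name percentile is hypothetical, it assumes the file holds one elapsed time in seconds per line (as written by /usr/bin/time -f %e), and it uses the simple nearest-rank definition with an integer percentile.

```shell
# percentile FILE P
# Hypothetical helper: prints the P-th percentile (integer P, 1..100) of the
# elapsed times in FILE using the nearest-rank method.
percentile() {
  file=$1
  p=$2
  # Keep only purely numeric lines, sort ascending, then pick the value at
  # rank ceil(P/100 * N).
  grep -E '^[0-9]+(\.[0-9]+)?$' "$file" | sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "no samples"; exit 1 }
      rank = int((p * NR + 99) / 100)   # integer ceiling of p*NR/100
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

For example, `percentile /tmp/times.txt 50` and `percentile /tmp/times.txt 99` would give the median and the tail latency for one run. Note that in the test above both the large and the 1-byte reads append to the same file, so separate output files per loop would be needed to report them separately.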


 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch, HDFS-8696.3.patch




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-08-04 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654168#comment-14654168
 ] 

Bob Hansen commented on HDFS-8696:
--

Jun - thanks for posting those.

Can you break down what percentage of requests were under 50ms, 50-1000ms, and 
over 1000ms for the patched and unpatched sets?

The patched test was with patch v1 (which used the 
dfs.webhdfs.server.worker.threads setting), correct?
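
A breakdown like the one requested could be produced from the times file with a short awk script. The sketch below is an assumption about how one might do it, not part of any patch: the function name latency_buckets and the bucket labels are hypothetical, and it assumes one elapsed time in seconds per line. Since /usr/bin/time -f %e is understood to report only hundredths of a second, the 50ms boundary is approximate.

```shell
# latency_buckets FILE
# Hypothetical helper: prints the percentage of requests that took under 50ms,
# 50-1000ms, and over 1000ms, given one elapsed time (seconds) per line.
latency_buckets() {
  awk '
    /^[0-9]+(\.[0-9]+)?$/ {
      n++
      if ($1 < 0.05)      fast++
      else if ($1 <= 1.0) mid++
      else                slow++
    }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "under_50ms: %.1f%%\n",   100 * fast / n
      printf "50_to_1000ms: %.1f%%\n", 100 * mid / n
      printf "over_1000ms: %.1f%%\n",  100 * slow / n
    }' "$1"
}
```

Running it once on the times file from the unpatched run and once on the file from the patched run would give the two sets side by side.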

 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch, HDFS-8696.3.patch




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-07-06 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615403#comment-14615403
 ] 

Bob Hansen commented on HDFS-8696:
--

With the addition of dfs.webhdfs.server.worker.threads, does the 
hadoop.http.max.threads setting have any effect, or are they different thread 
pools?  Should we be using dfs.webhdfs.server.worker.threads instead of  
dfs.webhdfs.server.worker.threads?

 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-07-06 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615751#comment-14615751
 ] 

Xiaobing Zhou commented on HDFS-8696:
-

[~bobhansen] hadoop.http.max.threads only applies to HttpServer2, which is an 
embedded Jetty server, and has no effect on Netty. I agree to use it so as not 
to cause confusion.


 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615772#comment-14615772
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 43s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   7m 26s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 13s | The applied patch generated 13 new checkstyle issues (total was 431, now 444). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 15s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 15s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests | 162m 23s | Tests passed in hadoop-hdfs. |
| | | 208m 28s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12743418/HDFS-8696.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / fc92d3e |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11586/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11586/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11586/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11586/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11586/console |


This message was automatically generated.

 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-07-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616199#comment-14616199
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m  8s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m 19s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 16s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 21s | The applied patch generated 1 new checkstyle issues (total was 235, now 236). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 48s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |  22m 52s | Tests failed in hadoop-common. |
| {color:red}-1{color} | hdfs tests | 165m 44s | Tests failed in hadoop-hdfs. |
| | | 239m 26s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.TestLocalFsFCStatistics |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12743849/HDFS-8696.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 81f3644 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11593/artifact/patchprocess/diffcheckstylehadoop-common.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11593/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11593/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11593/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11593/console |


This message was automatically generated.

 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch, HDFS-8696.2.patch, HDFS-8696.3.patch




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-07-02 Thread Bob Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612013#comment-14612013
 ] 

Bob Hansen commented on HDFS-8696:
--

Be sure to set the high water mark before the low water mark. See 
https://github.com/netty/netty/issues/3806



 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch




[jira] [Commented] (HDFS-8696) Reduce the variances of latency of WebHDFS

2015-07-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611470#comment-14611470
 ] 

Hadoop QA commented on HDFS-8696:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m  5s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear to include any new or modified tests.  Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac |   8m 26s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 22s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 26s | The applied patch generated 14 new checkstyle issues (total was 431, now 445). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m  9s | Post-patch findbugs hadoop-hdfs-project/hadoop-hdfs compilation is broken. |
| {color:green}+1{color} | findbugs |   3m  9s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   0m 45s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 166m 32s | Tests failed in hadoop-hdfs. |
| | | 213m 30s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.fs.TestHdfsNativeCodeLoader |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12743182/HDFS-8696.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a78d507 |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11567/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11567/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11567/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11567/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11567/console |


This message was automatically generated.

 Reduce the variances of latency of WebHDFS
 --

 Key: HDFS-8696
 URL: https://issues.apache.org/jira/browse/HDFS-8696
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HDFS-8696.1.patch

