[jira] [Commented] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363723#comment-15363723
 ] 

Hadoop QA commented on HDFS-10169:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  7m 19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 61m  4s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 56s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816363/HDFS-10169-01.patch |
| JIRA Issue | HDFS-10169 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux c0c6d53fd355 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d792a90 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15989/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15989/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
> Attachments: HDFS-10169-00.patch, HDFS-10169-01.patch
>
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
>   Time 

[jira] [Resolved] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable

2016-07-05 Thread Yuanbo Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu resolved HDFS-10593.
---
Resolution: Not A Problem

> MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable 
> ---
>
> Key: HDFS-10593
> URL: https://issues.apache.org/jira/browse/HDFS-10593
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuanbo Liu
>
> In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in 
> HDFS-6102 to restrict max items of single directory, and the value of it can 
> not be larger than the value of MAX_DIR_ITEMS. Since 
> "ipc.maximum.data.length" was added in HADOOP-9676 and documented in 
> HADOOP-13039 to make maximum RPC buffer size configurable, it's not proper to 
> hard code the value of MAX_DIR_ITEMS in {{FSDirectory}}.
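
For illustration, a minimal sketch of the kind of hard-coded ceiling the description refers to; the constant value, message, and method shape below are assumptions for this example, not the exact {{FSDirectory}} code:

{code}
import com.google.common.base.Preconditions;

// Sketch only: illustrates a hard-coded ceiling on max-directory-items.
// The constant value and message are assumptions, not the actual FSDirectory code.
final class MaxDirItemsSketch {
  static final int MAX_DIR_ITEMS = 64 * 100 * 1000; // assumed hard-coded ceiling

  static void checkMaxDirItems(int configured) {
    // Rejects any configured value outside (0, MAX_DIR_ITEMS].
    Preconditions.checkArgument(
        configured > 0 && configured <= MAX_DIR_ITEMS,
        "dfs.namenode.fs-limits.max-directory-items must be between 1 and %s",
        MAX_DIR_ITEMS);
  }
}
{code}

Making the ceiling track the configured RPC buffer size, rather than a compile-time constant, is the improvement the description asks for.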



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable

2016-07-05 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363718#comment-15363718
 ] 

Yuanbo Liu commented on HDFS-10593:
---

[~andrew.wang] Thank you for your response.
You're right: "ipc.maximum.data.length" sets the maximum RPC message size, not the protobuf size limit. I'm sorry for not investigating this more thoroughly. I found HDFS-10312 and assumed that "ipc.maximum.data.length" was a general property for the protobuf size limit; it turns out Chris did not want to introduce a new property and reused "ipc.maximum.data.length" to set the protobuf limit for block reports.
I searched for {{setSizeLimit}} in the Hadoop project and did not find anything related to fsimage protobuf serde, so I don't think we have made the fsimage serde limit configurable.
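
For context, a minimal sketch of how a protobuf parse limit is raised with {{setSizeLimit}}, the method mentioned above; the wrapper class name and the byte figure are assumptions for the example, not Hadoop defaults:

{code}
import com.google.protobuf.CodedInputStream;

import java.io.InputStream;

// Sketch only: shows the protobuf-java API for raising the parse size limit.
final class PbSizeLimitSketch {
  static CodedInputStream newLimitedStream(InputStream in, int maxBytes) {
    CodedInputStream cis = CodedInputStream.newInstance(in);
    cis.setSizeLimit(maxBytes); // e.g. 256 * 1024 * 1024 bytes
    return cis;
  }
}
{code}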

> MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable 
> ---
>
> Key: HDFS-10593
> URL: https://issues.apache.org/jira/browse/HDFS-10593
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuanbo Liu
>
> In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in 
> HDFS-6102 to restrict max items of single directory, and the value of it can 
> not be larger than the value of MAX_DIR_ITEMS. Since 
> "ipc.maximum.data.length" was added in HADOOP-9676 and documented in 
> HADOOP-13039 to make maximum RPC buffer size configurable, it's not proper to 
> hard code the value of MAX_DIR_ITEMS in {{FSDirectory}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-07-05 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363678#comment-15363678
 ] 

Kai Zheng commented on HDFS-10548:
--

bq. Is there a JIRA for removing BRLocalLegacy too? IIRC Windows used to need 
it since they didn't support passing the fd via domain socket, but maybe that's 
changed.
Ping: [~cnauroth] and [~ste...@apache.org]. I hope this can be clarified or 
confirmed; if it sounds good, I can take care of it as well.

bq. How do we feel about removing it?
Sure [~andrew.wang], I will find a chance to do it.

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, 
> HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove the 
> {{BlockReaderRemote}} class, which was deprecated a very long time ago. 
> From {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could do the rename the 
> comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363645#comment-15363645
 ] 

Hadoop QA commented on HDFS-10600:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  4m 14s{color} | {color:red} Docker failed to build yetus/hadoop:9560f25. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816360/HDFS-10600.001.patch |
| JIRA Issue | HDFS-10600 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15988/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> PlanCommand#getThrsholdPercentage should not use throughput value.
> --
>
> Key: HDFS-10600
> URL: https://issues.apache.org/jira/browse/HDFS-10600
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Yiqun Lin
> Fix For: 2.9.0
>
> Attachments: HDFS-10600.001.patch
>
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>  private double getThresholdPercentage(CommandLine cmd) {
> 
> if ((value <= 0.0) || (value > 100.0)) {
>   value = getConf().getDouble(
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
> }
> return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return 
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails

2016-07-05 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10169:

Attachment: HDFS-10169-01.patch

> TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
> Attachments: HDFS-10169-00.patch, HDFS-10169-01.patch
>
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
>   Time elapsed: 0.377 sec  <<< FAILURE!
> java.lang.AssertionError: logging edit without syncing should do not affect 
> txid expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.

2016-07-05 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10600:
-
Attachment: HDFS-10600.001.patch

> PlanCommand#getThrsholdPercentage should not use throughput value.
> --
>
> Key: HDFS-10600
> URL: https://issues.apache.org/jira/browse/HDFS-10600
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Yiqun Lin
> Fix For: 2.9.0
>
> Attachments: HDFS-10600.001.patch
>
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>  private double getThresholdPercentage(CommandLine cmd) {
> 
> if ((value <= 0.0) || (value > 100.0)) {
>   value = getConf().getDouble(
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
> }
> return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return 
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.

2016-07-05 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363627#comment-15363627
 ] 

Yiqun Lin commented on HDFS-10600:
--

Post a simple patch fot this.

> PlanCommand#getThrsholdPercentage should not use throughput value.
> --
>
> Key: HDFS-10600
> URL: https://issues.apache.org/jira/browse/HDFS-10600
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Yiqun Lin
> Fix For: 2.9.0
>
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>  private double getThresholdPercentage(CommandLine cmd) {
> 
> if ((value <= 0.0) || (value > 100.0)) {
>   value = getConf().getDouble(
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
> }
> return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return 
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.

2016-07-05 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10600:
-
Status: Patch Available  (was: Open)

> PlanCommand#getThrsholdPercentage should not use throughput value.
> --
>
> Key: HDFS-10600
> URL: https://issues.apache.org/jira/browse/HDFS-10600
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Yiqun Lin
> Fix For: 2.9.0
>
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>  private double getThresholdPercentage(CommandLine cmd) {
> 
> if ((value <= 0.0) || (value > 100.0)) {
>   value = getConf().getDouble(
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
> }
> return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return 
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.

2016-07-05 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363627#comment-15363627
 ] 

Yiqun Lin edited comment on HDFS-10600 at 7/6/16 2:13 AM:
--

Post a simple patch for this.


was (Author: linyiqun):
Post a simple patch fot this.

> PlanCommand#getThrsholdPercentage should not use throughput value.
> --
>
> Key: HDFS-10600
> URL: https://issues.apache.org/jira/browse/HDFS-10600
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Yiqun Lin
> Fix For: 2.9.0
>
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>  private double getThresholdPercentage(CommandLine cmd) {
> 
> if ((value <= 0.0) || (value > 100.0)) {
>   value = getConf().getDouble(
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
> }
> return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return 
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.

2016-07-05 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin reassigned HDFS-10600:


Assignee: Yiqun Lin

> PlanCommand#getThrsholdPercentage should not use throughput value.
> --
>
> Key: HDFS-10600
> URL: https://issues.apache.org/jira/browse/HDFS-10600
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Yiqun Lin
> Fix For: 2.9.0
>
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>  private double getThresholdPercentage(CommandLine cmd) {
> 
> if ((value <= 0.0) || (value > 100.0)) {
>   value = getConf().getDouble(
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>   DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
> }
> return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return 
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-07-05 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10548:
---
Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)
Release Note: This removes the configuration property 
{{dfs.client.use.legacy.blockreader}}, since the legacy remote block reader 
class has been removed from the codebase.  (was: This will obsoletes this 
configuration property, since the legacy block reader is removed from the code 
base. {{dfs.client.use.legacy.blockreader}})

I also realized on second inspection that the LEGACY_BLOCKREADER config key is 
still present in HdfsClientConfigKeys and DFSConfigKeys. How do we feel about 
removing it?

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, 
> HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove the 
> {{BlockReaderRemote}} class, which was deprecated a very long time ago. 
> From {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could do the rename the 
> comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-07-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363584#comment-15363584
 ] 

Andrew Wang commented on HDFS-10548:


I think what Colin meant is that (with the rename) if someone changes BRR in 
trunk, that change needs to be reapplied to BRR2 for the branch-2 backport. So 
the backports won't be clean.

Recommend that we not backport this to branch-2 for compatibility reasons.

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, 
> HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove the 
> {{BlockReaderRemote}} class, which was deprecated a very long time ago. 
> From {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could do the rename the 
> comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.

2016-07-05 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-10600:


 Summary: PlanCommand#getThrsholdPercentage should not use 
throughput value.
 Key: HDFS-10600
 URL: https://issues.apache.org/jira/browse/HDFS-10600
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: diskbalancer
Affects Versions: 2.9.0, 3.0.0-beta1
Reporter: Lei (Eddy) Xu


In {{PlanCommand#getThresholdPercentage}}

{code}
 private double getThresholdPercentage(CommandLine cmd) {

if ((value <= 0.0) || (value > 100.0)) {
  value = getConf().getDouble(
  DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
  DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
}
return value;
  }
{code}

{{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return 
{{throughput}} as a percentage value.

Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
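
A minimal sketch of the direction this suggests: fall back to a threshold default rather than the disk-throughput setting when the CLI value is missing or out of range. The {{PLAN_THRESHOLD_DEFAULT}} constant and option-name parameter below are hypothetical, not necessarily what an eventual patch would use:

{code}
import org.apache.commons.cli.CommandLine;

// Sketch only: fall back to a percentage default instead of the
// DISK_THRUPUT (MB) setting when the CLI value is missing or out of range.
final class ThresholdSketch {
  static final double PLAN_THRESHOLD_DEFAULT = 10.0; // percent; assumed value

  static double getThresholdPercentage(CommandLine cmd, String thresholdOpt) {
    double value = 0.0;
    if (cmd.hasOption(thresholdOpt)) {
      value = Double.parseDouble(cmd.getOptionValue(thresholdOpt));
    }
    if ((value <= 0.0) || (value > 100.0)) {
      value = PLAN_THRESHOLD_DEFAULT; // a percentage, not a throughput in MB
    }
    return value;
  }
}
{code}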



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10567) Improve plan command help message

2016-07-05 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363562#comment-15363562
 ] 

Anu Engineer commented on HDFS-10567:
-

[~xiaobingo] Thank you very much for the improvements to the help messages. 

There is one change that does not feel quite right.
{noformat}
withDescription("Describes how many errors in integer " + "can be tolerated 
while copying between a pair of disks.")
{noformat}

We seem to have added "in integer" as a unit. Doesn't "how many errors" convey 
the same meaning?

All other changes look good.
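
A minimal sketch of the simplified wording, assuming the option is declared with Commons CLI's {{OptionBuilder}} as the quoted snippet implies:

{code}
import org.apache.commons.cli.Option;
import org.apache.commons.cli.OptionBuilder;

// Sketch only: drops the "in integer" wording from the help text.
final class MaxErrorOptionSketch {
  static Option maxErrorOption() {
    return OptionBuilder
        .hasArg()
        .withArgName("errors")
        .withDescription("Number of errors that can be tolerated while "
            + "copying between a pair of disks.")
        .create("maxerror");
  }
}
{code}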


> Improve plan command help message
> -
>
> Key: HDFS-10567
> URL: https://issues.apache.org/jira/browse/HDFS-10567
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Lei (Eddy) Xu
>Assignee: Xiaobing Zhou
> Attachments: HDFS-10567-HDFS-10576.001.patch, 
> HDFS-10567-HDFS-1312.000.patch
>
>
> {code}
> --bandwidth  Maximum disk bandwidth to be consumed by
>   diskBalancer. e.g. 10
> --maxerror   Describes how many errors can be
>   tolerated while copying between a pair
>   of disks.
> --outFile to write output to, if not
>   specified defaults will be used.
> --plan   creates a plan for datanode.
> --thresholdPercentagePercentage skew that wetolerate before
>   diskbalancer starts working e.g. 10
> --v   Print out the summary of the plan on
>   console
> {code}
> We should 
> * Put the unit into {{--bandwidth}}, or its help message. Is it an integer or 
> float / double number? Not clear in CLI message.
> * Give more details about {{--plan}}. It is not clear what the {{}} is 
> for.
> * {{--thresholdPercentage}} has a typo, {{wetolerate}}, in the error message. It 
> also needs to indicate that it is the difference in space utilization between 
> two disks / volumes. Is it an integer or a float / double number?
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10599) DiskBalancer: Execute CLI via Shell

2016-07-05 Thread Anu Engineer (JIRA)
Anu Engineer created HDFS-10599:
---

 Summary: DiskBalancer: Execute CLI via Shell 
 Key: HDFS-10599
 URL: https://issues.apache.org/jira/browse/HDFS-10599
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Affects Versions: 3.0.0-alpha1
Reporter: Anu Engineer
Assignee: Anu Engineer
 Fix For: 3.0.0-alpha1


The DiskBalancer CLI is currently exercised by calling its CLI functions directly 
instead of going through the shell. This is not representative of how end users run 
it. To provide good unit test coverage, we need tests where the DiskBalancer CLI is 
invoked via the shell.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-10598) DiskBalancer does not execute multi-steps plan.

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer reassigned HDFS-10598:
---

Assignee: Anu Engineer

> DiskBalancer does not execute multi-steps plan.
> ---
>
> Key: HDFS-10598
> URL: https://issues.apache.org/jira/browse/HDFS-10598
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: diskbalancer
>Affects Versions: 2.8.0, 3.0.0-beta1
>Reporter: Lei (Eddy) Xu
>Assignee: Anu Engineer
>Priority: Critical
> Fix For: 2.9.0
>
>
> I set up a 3-DataNode cluster, each node with 2 small disks. After creating 
> some files to fill HDFS, I added two more small disks to one DN and ran the 
> diskbalancer on that DataNode.
> The disk usage before running diskbalancer:
> {code}
> /dev/loop0  3.9G  2.1G  1.6G 58%  /mnt/data1
> /dev/loop1  3.9G  2.6G  1.1G 71%  /mnt/data2
> /dev/loop2  3.9G  17M  3.6G 1%  /mnt/data3
> /dev/loop3  3.9G  17M  3.6G 1%  /mnt/data4
> {code}
> However, after running diskbalancer (i.e., {{-query}} shows {{PLAN_DONE}})
> {code}
> /dev/loop0  3.9G  1.2G  2.5G 32%  /mnt/data1
> /dev/loop1  3.9G  2.6G  1.1G 71%  /mnt/data2
> /dev/loop2  3.9G  953M  2.7G 26%  /mnt/data3
> /dev/loop3  3.9G  17M  3.6G 1%   /mnt/data4
> {code}
> It is suspicious that in {{DiskBalancerMover#copyBlocks}}, every return path calls 
> {{this.setExitFlag}}, which prevents {{copyBlocks()}} from being called multiple times 
> from {{DiskBalancer#executePlan}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-07-05 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363555#comment-15363555
 ] 

Kai Zheng commented on HDFS-10548:
--

This was done targeting 3.0. If we need it for branch-2 as well, I can check and 
post a separate patch for that branch.

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, 
> HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove the 
> {{BlockReaderRemote}} class, which was deprecated a very long time ago. 
> From {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could do the rename the 
> comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-07-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363549#comment-15363549
 ] 

Andrew Wang commented on HDFS-10548:


I don't think there's much activity on BlockReaderRemote2 these days, so now is 
as good a time as any to pull the trigger.

Is there a JIRA for removing BRLocalLegacy too? IIRC Windows used to need it 
since they didn't support passing the fd via domain socket, but maybe that's 
changed.

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, 
> HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove the 
> {{BlockReaderRemote}} class, which was deprecated a very long time ago. 
> From {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could do the rename the 
> comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9809) Abstract implementation-specific details from the datanode

2016-07-05 Thread Virajith Jalaparti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virajith Jalaparti updated HDFS-9809:
-
Attachment: HDFS-9809.003.patch

Posting a new patch based on porting the previous changes to the most recent 
version of trunk. 

> Abstract implementation-specific details from the datanode
> --
>
> Key: HDFS-9809
> URL: https://issues.apache.org/jira/browse/HDFS-9809
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode, fs
>Reporter: Virajith Jalaparti
>Assignee: Virajith Jalaparti
> Attachments: HDFS-9809.001.patch, HDFS-9809.002.patch, 
> HDFS-9809.003.patch
>
>
> Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) 
> implicitly assume that blocks are stored in java.io.File(s) and that volumes 
> are divided into directories. We propose to abstract these details, which 
> would help in supporting other storages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, let's say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
5. [+3000+6000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
6. [+6000ms+9000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
the read that succeeded at [50ms] is returned successfully, except +27000ms 
late (worst case, expected value would be half given RNG).

This is only one scenario (a happy one). Supposing that the first read eventually 
fails, the DFSClient will still retry inside 
{{DFSInputStream#hedgedFetchBlockByteRange}} for the same number of retries before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

In {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit when there's only a single available node avoids the needless 
retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads 
enabled; stopping a single datanode brings the cluster to a grinding halt.

You can observe this behavior yourself by editing 
{{TestPread#testMaxOutHedgedReadPool}}'s MiniDFSCluster to have a single 
datanode.
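
A minimal sketch of the amended check, using a plain {{String[]}} in place of {{DatanodeInfo[]}} to keep it self-contained; this is an illustration of the proposed short-circuit, not the actual {{DFSInputStream}} code:

{code}
import java.util.Collection;

// Sketch only: when exactly one replica exists, ignore the ignoredNodes list
// so the lone datanode can still be returned instead of retrying indefinitely.
final class SingleReplicaSketch {
  static String chooseNode(String[] nodes, Collection<String> ignoredNodes) {
    for (int i = 0; i < nodes.length; i++) {
      if (nodes.length == 1
          || ignoredNodes == null
          || !ignoredNodes.contains(nodes[i])) {
        return nodes[i];
      }
    }
    return null; // caller would refetch block locations and retry
  }
}
{code}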

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, let's say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we throw, re-query the NameNode for block locations 
and sleep, trying again.
5. [+3000+6000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we throw, re-query the NameNode for block locations 
and sleep, trying again.
6. [+6000ms+9000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we throw, re-query the NameNode for block locations 
and sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
the read that succeeded at [50ms] is returned successfully, except +27000ms 
late (worst case, expected value would be half given RNG).

This is only one scenario (a happy one). Supposing that the first read eventually 
fails, the DFSClient will still retry inside 
{{DFSInputStream#hedgedFetchBlockByteRange}} for the same number of retries before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

In {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit when there's only a single available node avoids the needless 
retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads 
enabled; stopping a single datanode brings the cluster to a grinding halt.

You can observe this behavior yourself by editing 
{{TestPread#testMaxOutHedgedReadPool}}'s MiniDFSCluster to have a single 
datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
5. [+3000+6000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
6. [+6000ms+9000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As 
ignoredNodes includes DN1, we re-query the 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, let's say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
the read that succeeded at [50ms] is returned successfully, except +27000ms 
late (worst case, expected value would be half given RNG).

This is only one scenario (a happy one). Supposing that the first read eventually 
fails, the DFSClient will still retry inside 
{{DFSInputStream#hedgedFetchBlockByteRange}} for the same number of retries before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

In {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit when there's only a single available node avoids the needless 
retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads 
enabled; stopping a single datanode brings the cluster to a grinding halt.

You can observe this behavior yourself by editing 
{{TestPread#testMaxOutHedgedReadPool}}'s MiniDFSCluster to have a single 
datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] 

[jira] [Created] (HDFS-10598) DiskBalancer does not execute multi-steps plan.

2016-07-05 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-10598:


 Summary: DiskBalancer does not execute multi-steps plan.
 Key: HDFS-10598
 URL: https://issues.apache.org/jira/browse/HDFS-10598
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: diskbalancer
Affects Versions: 2.8.0, 3.0.0-beta1
Reporter: Lei (Eddy) Xu
Priority: Critical


I set up a 3-DataNode cluster, each node with 2 small disks. After creating some 
files to fill HDFS, I added two more small disks to one DN and ran the 
diskbalancer on that DataNode.

The disk usage before running diskbalancer:

{code}
/dev/loop0  3.9G  2.1G  1.6G 58%  /mnt/data1
/dev/loop1  3.9G  2.6G  1.1G 71%  /mnt/data2
/dev/loop2  3.9G  17M  3.6G 1%  /mnt/data3
/dev/loop3  3.9G  17M  3.6G 1%  /mnt/data4
{code}

However, after running diskbalancer (i.e., {{-query}} shows {{PLAN_DONE}})

{code}
/dev/loop0  3.9G  1.2G  2.5G 32%  /mnt/data1
/dev/loop1  3.9G  2.6G  1.1G 71%  /mnt/data2
/dev/loop2  3.9G  953M  2.7G 26%  /mnt/data3
/dev/loop3  3.9G  17M  3.6G 1%   /mnt/data4
{code}

It is suspicious that in {{DiskBalancerMover#copyBlocks}}, every return path calls 
{{this.setExitFlag}}, which prevents {{copyBlocks()}} from being called multiple times 
from {{DiskBalancer#executePlan}}. 
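
The suspected pattern, sketched with illustrative names rather than the actual {{DiskBalancerMover}} code: a shared exit flag that every {{copyBlocks()}} return path clears will stop the executor loop after the first step.

{code}
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: shows why a shared exit flag set on every return path of
// copyBlocks() would prevent later plan steps from ever running.
final class ExitFlagSketch {
  private final AtomicBoolean shouldRun = new AtomicBoolean(true);

  void copyBlocks(String step) {
    // ... move blocks for this step ...
    shouldRun.set(false); // every return path clears the flag, as suspected
  }

  void executePlan(List<String> steps) {
    for (String step : steps) {
      if (!shouldRun.get()) {
        break; // steps after the first are skipped
      }
      copyBlocks(step);
    }
  }
}
{code}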



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10579:

Fix Version/s: (was: 3.0.0-alpha1)
   2.9.0

> HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 2.9.0
>
>
> This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
> configuration value that enables or disables that option. This JIRA will also 
> add an ability to pick the right x-frame-option, since right now it looks 
> like we have hardcoded that to SAMEORIGIN.
> This allows HDFS to remain backward compatible as required by the branch-2.
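
For illustration, a sketch of what a configurable setting could look like from the client side; the property names {{dfs.xframe.enabled}} and {{dfs.xframe.value}} are assumptions about the eventual patch, not confirmed keys.

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: hypothetical property names for enabling X-FRAME-OPTIONS and
// choosing its value, rather than hardcoding SAMEORIGIN.
final class XFrameConfigSketch {
  static Configuration withXFrame(Configuration conf, String value) {
    conf.setBoolean("dfs.xframe.enabled", true); // hypothetical key
    conf.set("dfs.xframe.value", value);         // e.g. SAMEORIGIN, DENY, ALLOW-FROM
    return conf;
  }
}
{code}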



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10579:

Target Version/s: 2.9.0  (was: 3.0.0-alpha1)

> HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
>
> This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
> configuration value that enables or disables that option. This JIRA will also 
> add an ability to pick the right x-frame-option, since right now it looks 
> like we have hardcoded that to SAMEORIGIN.
> This allows HDFS to remain backward compatible as required by the branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection

2016-07-05 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363348#comment-15363348
 ] 

Anu Engineer commented on HDFS-10579:
-

[~rkanter] [~haibochen] Tagging both of you to make sure this JIRA is on your 
radar. I will post a patch soon and would appreciate any feedback you might have.

> HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
>
> This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
> configuration value that enables or disables that option. This JIRA will also 
> add an ability to pick the right x-frame-option, since right now it looks 
> like we have hardcoded that to SAMEORIGIN.
> This allows HDFS to remain backward compatible as required by the branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10579:

Description: 
This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
configuration value that enables or disables that option. This JIRA will also 
add an ability to pick the right x-frame-option, since right now it looks like 
we have hardcoded that to SAMEORIGIN.

This allows HDFS to remain backward compatible as required by the branch-2.

  was:This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
configuration value that enables or disables that option. This JIRA will also 
add an ability to pick the right x-fram


> HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
>
> This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
> configuration value that enables or disables that option. This JIRA will also 
> add an ability to pick the right x-frame-option, since right now it looks 
> like we have hardcoded that to SAMEORIGIN.
> This allows HDFS to remain backward compatible as required by the branch-2.






[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for XFS protection

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10579:

Description: This JIRA proposes to extend the work done in HADOOP-12964 and 
enable a configuration value that enables or disables that option. This JIRA 
will also add an ability to pick the right x-fram

> HDFS web interfaces lack configs for XFS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
>
> This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
> configuration value that enables or disables that option. This JIRA will also 
> add an ability to pick the right x-fram






[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10579:

Summary: HDFS web interfaces lack configs for X-FRAME-OPTIONS protection  
(was: HDFS web interfaces lack configs for XFS protection)

> HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
>
> This JIRA proposes to extend the work done in HADOOP-12964 and enable a 
> configuration value that enables or disables that option. This JIRA will also 
> add an ability to pick the right x-fram






[jira] [Commented] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363341#comment-15363341
 ] 

Hadoop QA commented on HDFS-10596:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 7s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
17s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  
5s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  
7s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  
2s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
31s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with 
JDK v1.7.0_101. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed CTEST tests | 
test_libhdfs_threaded_hdfspp_test_shim_static |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12816305/HDFS-10596.HDFS-8707.000.patch
 |
| JIRA Issue | HDFS-10596 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux b12be425e4ac 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / d643d8c |
| Default Java | 1.7.0_101 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_91 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 |
| CTEST | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15987/artifact/patchprocess/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_91-ctest.txt
 |
| JDK v1.7.0_101  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15987/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15987/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++: Implement hdfsFileIsEncrypted
> 
>
> Key: HDFS-10596
> 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Affects Version/s: 2.4.0
   2.5.0

> DFSClient hangs if using hedged reads and all but one eligible replica is 
> down 
> ---
>
> Key: HDFS-10597
> URL: https://issues.apache.org/jira/browse/HDFS-10597
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0
>Reporter: Michael Rose
>
> If hedged reads are enabled, even if there is only a single datanode 
> available, the hedged read loop will respect the ignored nodes list and never 
> send more than one request, but retry for quite some time choosing a datanode.
> This is unfortunate, as the ignored nodes list is only ever added to and 
> never removed from in the scope of a single request, therefore a single 
> failed read fails the entire request *or* delays responses.
> There's actually a secondary undesirable behavior here too. If a hedged read 
> can't find a datanode, it will delay a successful response considerably. To 
> set the stage, let's say 10ms is the hedged read timeout and we only have a 
> single replica available, that is, nodes=[DN1]. 
> 1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
> is sent to DN1. In the future, the read takes 50ms to succeed. 
> ignoredNodes=[DN1]
> 2. [10ms] Poll timeout. Send hedged request
> 3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
> hedged request. As ignoredNodes includes DN1, there are no nodes available 
> and we re-query the NameNode for block locations and sleep, trying again.
> 4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
> includes DN1, we re-query the NameNode for block locations and sleep, trying 
> again.
> 5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As 
> ignoredNodes includes DN1, we re-query the NameNode for block locations and 
> sleep, trying again.
> 6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
> ignoredNodes includes DN1, we re-query the NameNode for block locations and 
> sleep, trying again.
> 7. [27010ms] Control flow restored to 
> {{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled 
> and read that succeeded at [50ms] returned successfully, except +27000ms 
> extra (worst case, expected value would be half).
> This is only one scenario (a happy scenario). Supposing that the first read 
> eventually fails, the DFSClient will still retry inside of 
> {{DFSInputStream#hedgedFetchBlockByteRange}} for the same retries before 
> failing.
> I've identified one way to fix the behavior, but I'd be interested in 
> thoughts:
> {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
> in the ignored list before allowing it to be returned. Amending this check to 
> short-circuit if there's only a single available node avoids the regrettably 
> useless retries, that is:
> {{nodes.length == 1 || ignoredNodes == null || 
> !ignoredNodes.contains(nodes[i])}}
> However, with this change, if there's only one DN available, it'll send the 
> hedged request to it as well. Better behavior would be to fail hedged 
> requests quickly *or* push the waiting work into the hedge pool so that 
> successful, fast reads aren't blocked by this issue.
> In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads 
> enabled; stopping a single datanode brings the cluster to a grinding halt.
> You can observe this behavior yourself by editing 
> {{TestPread#testMaxOutHedgedReadPool}}'s MiniDFSCluster to have a single 
> datanode.
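
To make the amended check above concrete, here is a minimal, self-contained sketch of 
the eligibility test with the proposed single-replica short-circuit. The class and 
method names are invented for illustration; this is not the actual 
DFSInputStream#getBestNodeDNAddrPair code:

  import java.util.Collection;

  // Illustrative stand-in for the eligibility check discussed above.
  class HedgedReadEligibility {

    // Returns the index of the first node eligible for a (hedged) read, or -1 if
    // none is eligible. The "nodes.length == 1" clause is the proposed amendment:
    // with a single replica, the ignored-nodes list no longer starves the request.
    static int chooseEligibleNode(String[] nodes, Collection<String> ignoredNodes) {
      for (int i = 0; i < nodes.length; i++) {
        if (nodes.length == 1
            || ignoredNodes == null
            || !ignoredNodes.contains(nodes[i])) {
          return i;
        }
      }
      return -1; // caller would refetch block locations, sleep, and retry
    }
  }

As the description notes, this alone still sends the hedged read to the same datanode 
as the original read, so failing the hedge quickly or parking the wait in the hedge 
pool would be the better long-term behaviour.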






[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for XFS protection

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10579:

Summary: HDFS web interfaces lack configs for XFS protection  (was: HDFS 
web interfaces lack XFS protection)

> HDFS web interfaces lack configs for XFS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
>







[jira] [Updated] (HDFS-10579) HDFS web interfaces lack XFS protection

2016-07-05 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-10579:

Description: (was: The web interfaces of Namenode and Datanode do not 
protect against XFS attacks. A filter was added in Hadoop Common (HADOOP-13008) 
to prevent XFS attacks. This JIRA proposes to use that filter to protect the 
namenode and datanode web UIs.)

> HDFS web interfaces lack XFS protection
> ---
>
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0-alpha1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
>







[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, let's say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
{{DFSInputStream#hedgedFetchBlockByteRange}} for the same retries before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads 
enabled; stopping a single datanode brings the cluster to a grinding halt.

You can observe this behavior yourself by editing 
{{TestPread#testMaxOutHedgedReadPool}}'s MiniDFSCluster to have a single 
datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, we re-query the NameNode for 
block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we have no nodes available and re-query the NameNode for block 
locations and sleep, trying again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
{{DFSInputStream#hedgedFetchBlockByteRange}} for the same retries before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run a HBase cluster with HDFS RF=2 and hedged reads 
enabled, stopping a single datanode leads to the cluster coming to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, there are no nodes available and 
we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
{{DFSInputStream#hedgedFetchBlockByteRange}} for the same retries before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run a HBase cluster with HDFS RF=2 and hedged reads 
enabled, stopping a single datanode leads to the cluster coming to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the 
hedged request. As ignoredNodes includes DN1, we re-query the NameNode for 
block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we have no nodes available and re-query the NameNode for block 
locations and sleep, trying again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
{{DFSInputStream#hedgedFetchBlockByteRange}} for the same retries before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run a HBase cluster with HDFS RF=2 and hedged reads 
enabled, stopping a single datanode leads to the cluster coming to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
`DFSInputStream#hedgedFetchBlockByteRange` for the same retries before failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is in 
the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run a HBase cluster with HDFS RF=2 and hedged reads 
enabled, stopping a single datanode leads to the cluster coming to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
`DFSInputStream#hedgedFetchBlockByteRange` for the same retries before failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is 
in the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || 
!ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run a HBase cluster with HDFS RF=2 and hedged reads 
enabled, stopping a single datanode leads to the cluster coming to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. If a hedged read 
can't find a datanode, it will delay a successful response considerably. To set 
the stage, lets say 10ms is the hedged read timeout and we only have a single 
replica available, that is, nodes=[DN1]. 

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1. In the future, the read takes 50ms to succeed. 
ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
7. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
`DFSInputStream#hedgedFetchBlockByteRange` for the same retries before failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is in 
the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run a HBase cluster with HDFS RF=2 and hedged reads 
enabled, stopping a single datanode leads to the cluster coming to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. To set the stage, 
lets say 10ms is the hedged read timeout and we only have a single replica 
available. If a hedged read can't find a datanode, it will delay a successful 
response considerably.

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1, read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
3. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
3. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
4. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
5. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a 

[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Rose updated HDFS-10597:

Description: 
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. To set the stage, 
lets say 10ms is the hedged read timeout and we only have a single replica 
available. If a hedged read can't find a datanode, it will delay a successful 
response considerably.

1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read 
is sent to DN1, read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
3. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
3. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
4. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As 
ignoredNodes includes DN1, we re-query the NameNode for block locations and 
sleep, trying again.
5. [27010ms] Control flow restored to 
{{DFSInputStream#hedgedFetchBlockByteRange}}, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry inside of 
`DFSInputStream#hedgedFetchBlockByteRange` for the same retries before failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts:

{{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is in 
the ignored list before allowing it to be returned. Amending this check to 
short-circuit if there's only a single available node avoids the regrettably 
useless retries, that is:

{{nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])}}

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run a HBase cluster with HDFS RF=2 and hedged reads 
enabled, stopping a single datanode leads to the cluster coming to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.

  was:
If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. To set the stage, 
lets say 10ms is the hedged read timeout and we only have a single replica 
available. If a hedged read can't find a datanode, it will delay a successful 
response considerably.

1. [0ms] `DFSInputStream#hedgedFetchBlockByteRange` First (not-hedged) read is 
sent to DN1, read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
3. [+3000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
3. [+3000+6000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
4. [+6000ms+9000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [27010ms] Control flow restored to 
`DFSInputStream#hedgedFetchBlockByteRange`, completion service is polled and 
read that succeeded at [50ms] returned successfully, except +27000ms extra 
(worst case, expected value would be half).

This is only one scenario (a happy scenario). Supposing that the first read 
eventually fails, the DFSClient will still retry 

[jira] [Created] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down

2016-07-05 Thread Michael Rose (JIRA)
Michael Rose created HDFS-10597:
---

 Summary: DFSClient hangs if using hedged reads and all but one 
eligible replica is down 
 Key: HDFS-10597
 URL: https://issues.apache.org/jira/browse/HDFS-10597
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.7.0, 2.6.0
Reporter: Michael Rose


If hedged reads are enabled, even if there is only a single datanode available, 
the hedged read loop will respect the ignored nodes list and never send more 
than one request, but retry for quite some time choosing a datanode.

This is unfortunate, as the ignored nodes list is only ever added to and never 
removed from in the scope of a single request, therefore a single failed read 
fails the entire request *or* delays responses.

There's actually a secondary undesirable behavior here too. To set the stage, 
let's say 10ms is the hedged read timeout and we only have a single replica 
available. If a hedged read can't find a datanode, it will delay a successful 
response considerably.

1. [0ms] `DFSInputStream#hedgedFetchBlockByteRange` The first (non-hedged) read is 
sent to DN1; the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes 
DN1, we re-query the NameNode for block locations and sleep, trying again.
3. [+3000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
4. [+3000+6000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
5. [+6000ms+9000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes 
includes DN1, we re-query the NameNode for block locations and sleep, trying 
again.
6. [27010ms] Control flow is restored to 
`DFSInputStream#hedgedFetchBlockByteRange`, the completion service is polled, and 
the read that succeeded at [50ms] is finally returned, roughly 27,000ms late in 
the worst case (the expected delay would be about half that).
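
As a sanity check on the arithmetic, here is a tiny stand-alone snippet that just 
adds up the bracketed waits from the steps above; it does not touch any HDFS code.

{code}
public class HedgedDelayArithmetic {
  public static void main(String[] args) {
    // Waits read directly off the bracketed increments in the timeline above:
    // the 10ms hedge timeout, then 3000ms, 3000+6000ms, and 6000+9000ms of
    // refetch-and-sleep inside chooseDataNode.
    long[] waits = {10, 3000, 3000 + 6000, 6000 + 9000};
    long total = 0;
    for (long w : waits) {
      total += w;
    }
    // The underlying read finished at 50ms, but the caller only sees it now.
    System.out.println("result surfaced after ~" + total + "ms"); // 27010
  }
}
{code}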

This is only one scenario (a happy one). Supposing that the first read 
eventually fails, the DFSClient will still retry inside 
`DFSInputStream#hedgedFetchBlockByteRange` through the same retry sequence before 
failing.

I've identified one way to fix the behavior, but I'd be interested in thoughts: 
in `DFSInputStream#getBestNodeDNAddrPair`, there's a check that a node is not in 
the ignored list before it may be returned. Amending this check to short-circuit 
when there's only a single available node avoids the uselessly repeated retries, 
that is:

`nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])`
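
As a concrete illustration, here is a small stand-alone sketch of the effect of 
that amended condition; DatanodeInfo is replaced by plain strings and everything 
else about DFSInputStream is elided, so this only models the check itself, not 
the actual client code.

{code}
import java.util.Collections;
import java.util.Set;

public class SingleReplicaCheck {

  // Current behavior: a node in ignoredNodes can never be returned.
  static String pickOld(String[] nodes, Set<String> ignored) {
    for (String n : nodes) {
      if (ignored == null || !ignored.contains(n)) {
        return n;
      }
    }
    return null; // forces the refetch-and-sleep loop described above
  }

  // Amended behavior: with exactly one replica, skip the ignoredNodes check.
  static String pickNew(String[] nodes, Set<String> ignored) {
    for (String n : nodes) {
      if (nodes.length == 1 || ignored == null || !ignored.contains(n)) {
        return n;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String[] nodes = {"DN1"};
    Set<String> ignored = Collections.singleton("DN1");
    System.out.println(pickOld(nodes, ignored)); // null -> keep retrying for ~27s
    System.out.println(pickNew(nodes, ignored)); // DN1  -> the hedged read proceeds
  }
}
{code}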

However, with this change, if there's only one DN available, it'll send the 
hedged request to it as well. Better behavior would be to fail hedged requests 
quickly *or* push the waiting work into the hedge pool so that successful, fast 
reads aren't blocked by this issue.

In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads 
enabled; stopping a single datanode brings the cluster to a grinding 
halt.

You can observe this behavior yourself by editing 
TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted

2016-07-05 Thread Anatoli Shein (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363263#comment-15363263
 ] 

Anatoli Shein commented on HDFS-10596:
--

In order to test this function we need an encryption zone in HDFS, and to set 
one up we need a key provider service (KMS) running.

To get the KMS server to run, I made the following modifications to the config 
files:

/etc/hadoop/kms-site.xml:

  <property>
    <name>hadoop.kms.key.provider.uri</name>
    <value>jceks://file@/${user.home}/kms.keystore</value>
    <description>
      URI of the backing KeyProvider for the KMS.
    </description>
  </property>

  <property>
    <name>hadoop.security.keystore.java-keystore-provider.password-file</name>
    <value>kms.keystore.password</value>
    <description>
      If using the JavaKeyStoreProvider, the password for the keystore file.
    </description>
  </property>

/etc/hadoop/core-site.xml:

  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
    <description>
      Path to KeyProvider for the KMS.
    </description>
  </property>



Then I needed to create a password file like this:
touch 
.../hadoop-2.6.0/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/classes/kms.keystore.password

After that I was able to start/stop KMS service from .../hadoop-2.6.0/sbin  
directory like this:
./kms.sh start
./kms.sh stop

Then I created a new encryption key:
hadoop key create myKey

And was able to list it:
hadoop key list -provider jceks://file@/home/anatoli/kms.keystore -metadata

Created a new directory:
hadoop fs -mkdir hdfs://localhost.localdomain:9433/zone

However, I cannot create the zone. This is the command I am trying:
hdfs crypto -createZone -keyName myKey -path 
hdfs://localhost.localdomain:9433/zone

And I get this error:
16/07/05 17:12:27 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
RemoteException: Can't create an encryption zone for /zone since no key 
provider is available.

Not sure how to get around this. Does anyone have any ideas?

> libhdfs++: Implement hdfsFileIsEncrypted
> 
>
> Key: HDFS-10596
> URL: https://issues.apache.org/jira/browse/HDFS-10596
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
> Attachments: HDFS-10596.HDFS-8707.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted

2016-07-05 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-10596:
-
Attachment: HDFS-10596.HDFS-8707.000.patch

Initial patch that adds file encryption fields to the statinfo struct and 
populates these fields in namenode_operations.

> libhdfs++: Implement hdfsFileIsEncrypted
> 
>
> Key: HDFS-10596
> URL: https://issues.apache.org/jira/browse/HDFS-10596
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
> Attachments: HDFS-10596.HDFS-8707.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted

2016-07-05 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-10596:
-
Status: Patch Available  (was: Open)

> libhdfs++: Implement hdfsFileIsEncrypted
> 
>
> Key: HDFS-10596
> URL: https://issues.apache.org/jira/browse/HDFS-10596
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Anatoli Shein
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9890) libhdfs++: Add test suite to simulate network issues

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363232#comment-15363232
 ] 

Hadoop QA commented on HDFS-9890:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
21s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
20s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
22s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
16s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
14s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  5m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
18s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  5m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
57s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with 
JDK v1.7.0_101. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 47m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed CTEST tests | 
test_libhdfs_threaded_hdfspp_test_shim_static |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12816287/HDFS-9890.HDFS-8707.014.patch
 |
| JIRA Issue | HDFS-9890 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 3ef6412d6929 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / d643d8c |
| Default Java | 1.7.0_101 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_91 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 |
| CTEST | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15986/artifact/patchprocess/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_91-ctest.txt
 |
| JDK v1.7.0_101  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15986/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15986/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++: Add test suite to simulate network issues
> 
>
> Key: HDFS-9890

[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363214#comment-15363214
 ] 

Colin Patrick McCabe commented on HDFS-10543:
-

One approach would be to try checking the behavior of the Java client and 
seeing if you can do something similar.  It is not incorrect to avoid short 
reads, just potentially inefficient.

> hdfsRead read stops at block boundary
> -
>
> Key: HDFS-10543
> URL: https://issues.apache.org/jira/browse/HDFS-10543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Xiaowei Zhu
> Fix For: HDFS-8707
>
> Attachments: HDFS-10543.HDFS-8707.000.patch, 
> HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, 
> HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
>   memset(buf2, 0, (size_t)file_info->mSize);
>   int ret = hdfsRead(fs, file, buf2, file_info->mSize);
>   delete [] buf2;
>   if(ret != file_info->mSize) {
> std::stringstream ss;
> ss << "tried to read " << file_info->mSize << " bytes. but read " << 
> ret << " bytes";
> ReportError(ss.str());
> hdfsCloseFile(fs, file);
> continue;
>   }
> When it runs with a file ~1.4GB large, it will return an error like "tried to 
> read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs 
> against has a block size of 134217728 bytes. So it seems hdfsRead will stop 
> at a block boundary. Looks like a regression. We should add a retry to continue 
> reading across blocks in the case of files with multiple blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9271) Implement basic NN operations

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363178#comment-15363178
 ] 

Hadoop QA commented on HDFS-9271:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
36s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
48s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
17s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
25s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  5m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
21s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  5m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  6m 
51s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with 
JDK v1.7.0_101. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12816278/HDFS-9271.HDFS-8707.002.patch
 |
| JIRA Issue | HDFS-9271 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 02dc3c2befae 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / d643d8c |
| Default Java | 1.7.0_101 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_91 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 |
| JDK v1.7.0_101  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15985/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15985/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Implement basic NN operations
> -
>
> Key: HDFS-9271
> URL: https://issues.apache.org/jira/browse/HDFS-9271
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Anatoli Shein
> Attachments: HDFS-9271.HDFS-8707.000.patch, 
> 

[jira] [Created] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted

2016-07-05 Thread Anatoli Shein (JIRA)
Anatoli Shein created HDFS-10596:


 Summary: libhdfs++: Implement hdfsFileIsEncrypted
 Key: HDFS-10596
 URL: https://issues.apache.org/jira/browse/HDFS-10596
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anatoli Shein






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9890) libhdfs++: Add test suite to simulate network issues

2016-07-05 Thread Xiaowei Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Zhu updated HDFS-9890:
--
Attachment: HDFS-9890.HDFS-8707.014.patch

HDFS-9890.HDFS-8707.014.patch removes the debug build flag in pom.xml. It also 
fixes the whitespace issue reported in the previous patch.

> libhdfs++: Add test suite to simulate network issues
> 
>
> Key: HDFS-9890
> URL: https://issues.apache.org/jira/browse/HDFS-9890
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Xiaowei Zhu
> Attachments: HDFS-9890.HDFS-8707.000.patch, 
> HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch, 
> HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch, 
> HDFS-9890.HDFS-8707.005.patch, HDFS-9890.HDFS-8707.006.patch, 
> HDFS-9890.HDFS-8707.007.patch, HDFS-9890.HDFS-8707.008.patch, 
> HDFS-9890.HDFS-8707.009.patch, HDFS-9890.HDFS-8707.010.patch, 
> HDFS-9890.HDFS-8707.011.patch, HDFS-9890.HDFS-8707.012.patch, 
> HDFS-9890.HDFS-8707.012.patch, HDFS-9890.HDFS-8707.013.patch, 
> HDFS-9890.HDFS-8707.013.patch, HDFS-9890.HDFS-8707.014.patch, 
> hs_err_pid26832.log, hs_err_pid4944.log
>
>
> I propose adding a test suite to simulate various network issues/failures in 
> order to get good test coverage on some of the retry paths that aren't easy 
> to hit in mock unit tests.
> At the moment the only things that hit the retry paths are the gmock unit 
> tests.  The gmock tests are only as good as their mock implementations, which do a 
> great job of simulating protocol correctness but not more complex 
> interactions.  They also can't really simulate the types of lock contention 
> and subtle memory stomps that show up while doing hundreds or thousands of 
> concurrent reads.   We should add a new minidfscluster test that focuses on 
> heavy read/seek load and then randomly convert error codes returned by 
> network functions into errors.
> List of things to simulate(while heavily loaded), roughly in order of how 
> badly I think they need to be tested at the moment:
> -Rpc connection disconnect
> -Rpc connection slowed down enough to cause a timeout and trigger retry
> -DN connection disconnect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9271) Implement basic NN operations

2016-07-05 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-9271:

Attachment: HDFS-9271.HDFS-8707.002.patch

Patch attached, please review. In this patch I implemented hdfsAvailable, 
hdfsFileIsOpenForWrite, hdfsExists, hdfsGetDefaultBlockSizeAtPath, 
hdfsSetReplication, hdfsGetWorkingDirectory, hdfsSetWorkingDirectory, 
hdfsGetHosts, hdfsFreeHosts, hdfsUtime, hdfsFileGetReadStatistics, 
hdfsFileClearReadStatistics, hdfsFileFreeReadStatistics, and 
hdfsReadStatisticsGetRemoteBytesRead, and made small consistency fixes in 
hdfsCreateDirectory and hdfsGetBlockLocations.

> Implement basic NN operations
> -
>
> Key: HDFS-9271
> URL: https://issues.apache.org/jira/browse/HDFS-9271
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Anatoli Shein
> Attachments: HDFS-9271.HDFS-8707.000.patch, 
> HDFS-9271.HDFS-8707.001.patch, HDFS-9271.HDFS-8707.002.patch
>
>
> Expose via C and C++ API:
> * mkdirs
> * rename
> * delete
> * stat
> * chmod
> * chown
> * getListing
> * setOwner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9271) Implement basic NN operations

2016-07-05 Thread Anatoli Shein (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anatoli Shein updated HDFS-9271:

Status: Patch Available  (was: Open)

> Implement basic NN operations
> -
>
> Key: HDFS-9271
> URL: https://issues.apache.org/jira/browse/HDFS-9271
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Bob Hansen
>Assignee: Anatoli Shein
> Attachments: HDFS-9271.HDFS-8707.000.patch, 
> HDFS-9271.HDFS-8707.001.patch
>
>
> Expose via C and C++ API:
> * mkdirs
> * rename
> * delete
> * stat
> * chmod
> * chown
> * getListing
> * setOwner



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary

2016-07-05 Thread Xiaowei Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363033#comment-15363033
 ] 

Xiaowei Zhu commented on HDFS-10543:


Thanks, Colin, for the review. I believe this patch still supports short reads; 
it just won't automatically stop and return at a block boundary. Obviously the 
test should not log something that looks like an error, which should be fixed.

So is stopping a read at a block boundary by design? In that case we should also 
revert this commit.

> hdfsRead read stops at block boundary
> -
>
> Key: HDFS-10543
> URL: https://issues.apache.org/jira/browse/HDFS-10543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Xiaowei Zhu
> Fix For: HDFS-8707
>
> Attachments: HDFS-10543.HDFS-8707.000.patch, 
> HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, 
> HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
>   memset(buf2, 0, (size_t)file_info->mSize);
>   int ret = hdfsRead(fs, file, buf2, file_info->mSize);
>   delete [] buf2;
>   if(ret != file_info->mSize) {
> std::stringstream ss;
> ss << "tried to read " << file_info->mSize << " bytes. but read " << 
> ret << " bytes";
> ReportError(ss.str());
> hdfsCloseFile(fs, file);
> continue;
>   }
> When it runs with a file ~1.4GB large, it will return an error like "tried to 
> read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs 
> against has a block size of 134217728 bytes. So it seems hdfsRead will stop 
> at a block boundary. Looks like a regression. We should add a retry to continue 
> reading across blocks in the case of files with multiple blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9890) libhdfs++: Add test suite to simulate network issues

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363030#comment-15363030
 ] 

Hadoop QA commented on HDFS-9890:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
29s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
28s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
21s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
39s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  4m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
47s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  4m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m  
7s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  6m  3s{color} 
| {color:red} hadoop-hdfs-native-client in the patch failed with JDK 
v1.7.0_101. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_91 Failed CTEST tests | 
test_libhdfs_threaded_hdfspp_test_shim_static |
|   | test_libhdfs_mini_stress_hdfspp_test_shim_static |
| JDK v1.7.0_101 Failed CTEST tests | 
test_libhdfs_mini_stress_hdfspp_test_shim_static |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12816253/HDFS-9890.HDFS-8707.013.patch
 |
| JIRA Issue | HDFS-9890 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  cc  |
| uname | Linux 50584004b9cb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HDFS-10555) Unable to loadFSEdits due to a failure in readCachePoolInfo

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363005#comment-15363005
 ] 

Colin Patrick McCabe commented on HDFS-10555:
-

Thanks, [~umamaheswararao], [~jingzhao], and [~kihwal].

> Unable to loadFSEdits due to a failure in readCachePoolInfo
> ---
>
> Key: HDFS-10555
> URL: https://issues.apache.org/jira/browse/HDFS-10555
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, namenode
>Affects Versions: 2.9.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Critical
> Fix For: 2.9.0
>
> Attachments: HDFS-10555-00.patch
>
>
> Recently some tests are failing and unable to loadFSEdits due to a failure in 
> readCachePoolInfo.
> Here in below code
> FSImageSerialization.java
> {code}
>   }
> if ((flags & ~0x2F) != 0) {
>   throw new IOException("Unknown flag in CachePoolInfo: " + flags);
> }
> {code}
> When all values of the CachePool variables are set to true, flags & ~0x2F turns 
> out to be a non-zero value. So this condition fails due to the addition of 0x20 
> and the change of the mask from ~0x1F to ~0x2F.
> Maybe to fix this issue we can change the mask value to ~0x3F.
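
A quick stand-alone check of the flag arithmetic described above; the assumption 
that six boolean flags are packed into the low six bits is for illustration only.

{code}
public class CachePoolFlagCheck {
  public static void main(String[] args) {
    // If six flags occupy the low six bits, setting them all gives 0x3F.
    int flags = 0x3F;
    // The mask 0x2F (= 0b101111) is missing bit 0x10, so the "unknown flag"
    // check trips even though every bit set here is a legitimate flag:
    System.out.println(Integer.toHexString(flags & ~0x2F)); // "10" -> IOException path
    // Widening the mask to 0x3F lets the same value pass cleanly:
    System.out.println(Integer.toHexString(flags & ~0x3F)); // "0"  -> accepted
  }
}
{code}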



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363004#comment-15363004
 ] 

Hadoop QA commented on HDFS-10169:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 
11s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 44s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:85209cc |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12816232/HDFS-10169-00.patch |
| JIRA Issue | HDFS-10169 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 4b6277e2607f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8b4b525 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15983/artifact/patchprocess/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15983/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15983/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
> Attachments: HDFS-10169-00.patch
>
>
> This failure has been seen multiple 

[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363003#comment-15363003
 ] 

Colin Patrick McCabe commented on HDFS-10548:
-

Thanks for tackling this, guys.  It is good to see this code duplication 
finally go away.  Next target: {{BlockReaderLocalLegacy}}?

I do think renaming {{BlockReaderRemote2}} will make merging code back to 
branch-2 more difficult; you might want to reconsider that.

> Remove the long deprecated BlockReaderRemote
> 
>
> Key: HDFS-10548
> URL: https://issues.apache.org/jira/browse/HDFS-10548
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Kai Zheng
>Assignee: Kai Zheng
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, 
> HDFS-10548-v3.patch
>
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove 
> the {{BlockReaderRemote}} class that was deprecated a very long time ago. 
> From the {{BlockReaderRemote}} header:
> {quote}
>  * @deprecated this is an old implementation that is being left around
>  * in case any issues spring up with the new {@link BlockReaderRemote2}
>  * implementation.
>  * It will be removed in the next release.
> {quote}
> From {{BlockReaderRemote2}} class header:
> {quote}
>  * This is a new implementation introduced in Hadoop 0.23 which
>  * is more efficient and simpler than the older BlockReader
>  * implementation. It should be renamed to BlockReaderRemote
>  * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could rename it as 
> the comment suggests: BlockReaderRemote2 => BlockReaderRemote.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362997#comment-15362997
 ] 

Colin Patrick McCabe commented on HDFS-10543:
-

Just to be clear, the existing HDFS Java client can return "short reads" that 
are less than what was requested, even when there is more remaining in the 
file.  This is traditional in POSIX and nearly all filesystems I'm aware of 
have these semantics.  The justification is that applications may not want to 
wait a long time to fetch more bytes, if there are some bytes available already 
that they can process.  Applications that do want the full buffer can just call 
read() again.  APIs like {{readFully}} exist to provide these semantics.
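
A minimal sketch of the "call read() again" pattern described above, written 
against plain java.io rather than the HDFS client API specifically:

{code}
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {
  // Loop until the buffer is full or EOF; a short read just means "ask again".
  static int readFully(InputStream in, byte[] buf) throws IOException {
    int total = 0;
    while (total < buf.length) {
      int n = in.read(buf, total, buf.length - total);
      if (n < 0) {
        break; // end of stream before the buffer was filled
      }
      total += n;
    }
    return total;
  }
}
{code}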

> hdfsRead read stops at block boundary
> -
>
> Key: HDFS-10543
> URL: https://issues.apache.org/jira/browse/HDFS-10543
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: Xiaowei Zhu
> Fix For: HDFS-8707
>
> Attachments: HDFS-10543.HDFS-8707.000.patch, 
> HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, 
> HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch
>
>
> Reproducer:
> char *buf2 = new char[file_info->mSize];
>   memset(buf2, 0, (size_t)file_info->mSize);
>   int ret = hdfsRead(fs, file, buf2, file_info->mSize);
>   delete [] buf2;
>   if(ret != file_info->mSize) {
> std::stringstream ss;
> ss << "tried to read " << file_info->mSize << " bytes. but read " << 
> ret << " bytes";
> ReportError(ss.str());
> hdfsCloseFile(fs, file);
> continue;
>   }
> When it runs with a file ~1.4GB large, it will return an error like "tried to 
> read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs 
> against has a block size of 134217728 bytes. So it seems hdfsRead will stop 
> at a block boundary. Looks like a regression. We should add a retry to continue 
> reading across blocks in the case of files with multiple blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline

2016-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9805:
---
   Resolution: Fixed
Fix Version/s: 3.0.0-alpha1
   Status: Resolved  (was: Patch Available)

> TCP_NODELAY not set before SASL handshake in data transfer pipeline
> ---
>
> Key: HDFS-9805
> URL: https://issues.apache.org/jira/browse/HDFS-9805
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, 
> HDFS-9805.004.patch, HDFS-9805.005.patch
>
>
> There are a few places in the DN -> DN block transfer pipeline where 
> TCP_NODELAY is not set before doing a SASL handshake:
> * in {{DataNode.DataTransfer::run()}}
> * in {{DataXceiver::replaceBlock()}}
> * in {{DataXceiver::writeBlock()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline

2016-07-05 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362983#comment-15362983
 ] 

Colin Patrick McCabe commented on HDFS-9805:


Thanks for the reminder, [~jzhuge].  I committed the patch last week, but JIRA 
went down before I could mark the ticket as resolved.

I have committed this to trunk only for the moment.  The backport to branch-2 
looks like it might be a little tricky, and our next release will be 3.0 
anyway.  If anyone is interested in backporting to branch-2, please do and 
update the ticket. Cheers.

> TCP_NODELAY not set before SASL handshake in data transfer pipeline
> ---
>
> Key: HDFS-9805
> URL: https://issues.apache.org/jira/browse/HDFS-9805
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, 
> HDFS-9805.004.patch, HDFS-9805.005.patch
>
>
> There are a few places in the DN -> DN block transfer pipeline where 
> TCP_NODELAY is not set before doing a SASL handshake:
> * in {{DataNode.DataTransfer::run()}}
> * in {{DataXceiver::replaceBlock()}}
> * in {{DataXceiver::writeBlock()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10594) HDFS-4949 should support recursive cache directives

2016-07-05 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-10594:

Summary: HDFS-4949 should support recursive cache directives  (was: 
CacheReplicationMonitor should recursively rescan the path when the inode of 
the path is directory)

> HDFS-4949 should support recursive cache directives
> ---
>
> Key: HDFS-10594
> URL: https://issues.apache.org/jira/browse/HDFS-10594
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10594.001.patch
>
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively 
> rescan the path when the inode of the path is a directory. In these code:
> {code}
> } else if (node.isDirectory()) {
> INodeDirectory dir = node.asDirectory();
> ReadOnlyList<INode> children = dir
> .getChildrenList(Snapshot.CURRENT_STATE_ID);
> for (INode child : children) {
>   if (child.isFile()) {
> rescanFile(directive, child.asFile());
>   }
> }
>}
> {code}
> With this logic, some inode files will be ignored when a child inode is itself 
> a directory that contains other inode files. As a result, the child's child files, 
> which also belong to this path, will not be 
> cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory

2016-07-05 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-10594:
---
Issue Type: Improvement  (was: Bug)

Agree with Chris, marking this as an Improvement rather than Bug.

> CacheReplicationMonitor should recursively rescan the path when the inode of 
> the path is directory
> --
>
> Key: HDFS-10594
> URL: https://issues.apache.org/jira/browse/HDFS-10594
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10594.001.patch
>
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively 
> rescan the path when the inode of the path is a directory. In these code:
> {code}
> } else if (node.isDirectory()) {
> INodeDirectory dir = node.asDirectory();
> ReadOnlyList<INode> children = dir
> .getChildrenList(Snapshot.CURRENT_STATE_ID);
> for (INode child : children) {
>   if (child.isFile()) {
> rescanFile(directive, child.asFile());
>   }
> }
>}
> {code}
> With this logic, some inode files will be ignored when a child inode is itself 
> a directory that contains other inode files. As a result, the child's child files, 
> which also belong to this path, will not be 
> cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline

2016-07-05 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362916#comment-15362916
 ] 

John Zhuge commented on HDFS-9805:
--

[~cmccabe]: You committed the patch into trunk on 6/29. Do you plan to resolve 
the jira?

> TCP_NODELAY not set before SASL handshake in data transfer pipeline
> ---
>
> Key: HDFS-9805
> URL: https://issues.apache.org/jira/browse/HDFS-9805
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, 
> HDFS-9805.004.patch, HDFS-9805.005.patch
>
>
> There are a few places in the DN -> DN block transfer pipeline where 
> TCP_NODELAY is not set before doing a SASL handshake:
> * in {{DataNode.DataTransfer::run()}}
> * in {{DataXceiver::replaceBlock()}}
> * in {{DataXceiver::writeBlock()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication

2016-07-05 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362879#comment-15362879
 ] 

Elliott Clark commented on HDFS-10564:
--

Yeah, sorry. Draining means decommissioning.

> UNDER MIN REPL'D BLOCKS should be prioritized for replication
> -
>
> Key: HDFS-10564
> URL: https://issues.apache.org/jira/browse/HDFS-10564
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Elliott Clark
>
> When datanodes get drained, they are probably being drained because the 
> hardware is bad or suspect. The blocks that have no other live replicas should be 
> prioritized; however, that appears not to be the case at all.
> Draining full nodes with lots of blocks but only a handful of under-min-replicated 
> blocks takes about the full drain time before fsck reports clean again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9890) libhdfs++: Add test suite to simulate network issues

2016-07-05 Thread Xiaowei Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Zhu updated HDFS-9890:
--
Attachment: HDFS-9890.HDFS-8707.013.patch

> libhdfs++: Add test suite to simulate network issues
> 
>
> Key: HDFS-9890
> URL: https://issues.apache.org/jira/browse/HDFS-9890
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: Xiaowei Zhu
> Attachments: HDFS-9890.HDFS-8707.000.patch, 
> HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch, 
> HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch, 
> HDFS-9890.HDFS-8707.005.patch, HDFS-9890.HDFS-8707.006.patch, 
> HDFS-9890.HDFS-8707.007.patch, HDFS-9890.HDFS-8707.008.patch, 
> HDFS-9890.HDFS-8707.009.patch, HDFS-9890.HDFS-8707.010.patch, 
> HDFS-9890.HDFS-8707.011.patch, HDFS-9890.HDFS-8707.012.patch, 
> HDFS-9890.HDFS-8707.012.patch, HDFS-9890.HDFS-8707.013.patch, 
> HDFS-9890.HDFS-8707.013.patch, hs_err_pid26832.log, hs_err_pid4944.log
>
>
> I propose adding a test suite to simulate various network issues/failures in 
> order to get good test coverage on some of the retry paths that aren't easy 
> to hit in mock unit tests.
> At the moment the only things that hit the retry paths are the gmock unit 
> tests.  The gmock tests are only as good as their mock implementations, which do a 
> great job of simulating protocol correctness but not more complex 
> interactions.  They also can't really simulate the types of lock contention 
> and subtle memory stomps that show up while doing hundreds or thousands of 
> concurrent reads.   We should add a new minidfscluster test that focuses on 
> heavy read/seek load and then randomly convert error codes returned by 
> network functions into errors.
> List of things to simulate(while heavily loaded), roughly in order of how 
> badly I think they need to be tested at the moment:
> -Rpc connection disconnect
> -Rpc connection slowed down enough to cause a timeout and trigger retry
> -DN connection disconnect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable

2016-07-05 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362863#comment-15362863
 ] 

Andrew Wang commented on HDFS-10593:


I recall we introduced this limit because it broke fsimage PB serde. The 
configs you mention refer to ipc; did we make fsimage serde similarly 
configurable?

> MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable 
> ---
>
> Key: HDFS-10593
> URL: https://issues.apache.org/jira/browse/HDFS-10593
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuanbo Liu
>
> In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in 
> HDFS-6102 to restrict max items of single directory, and the value of it can 
> not be larger than the value of MAX_DIR_ITEMS. Since 
> "ipc.maximum.data.length" was added in HADOOP-9676 and documented in 
> HADOOP-13039 to make maximum RPC buffer size configurable, it's not proper to 
> hard code the value of MAX_DIR_ITEMS in {{FSDirectory}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10595) libhdfs++: Client Name Protobuf Error

2016-07-05 Thread Anatoli Shein (JIRA)
Anatoli Shein created HDFS-10595:


 Summary: libhdfs++: Client Name Protobuf Error
 Key: HDFS-10595
 URL: https://issues.apache.org/jira/browse/HDFS-10595
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Anatoli Shein


When running a cat tool 
(/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cat/c/cat.c) I 
get the following error:

[libprotobuf ERROR google/protobuf/wire_format.cc:1053] String field contains 
invalid UTF-8 data when serializing a protocol buffer. Use the 'bytes' type if 
you intend to send raw bytes.

However, it executes correctly. It looks like this error happens when trying to 
serialize the client name in ClientOperationHeaderProto::SerializeWithCachedSizes 
(/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/datatransfer.pb.cc).

Possibly the problem is caused by generating the client name as a UUID in 
GetRandomClientName 
(/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.cc).

In the Java client it looks like there are two different unique client identifiers, 
ClientName and ClientId:

Client name is generated as:
clientName = "DFSClient_" + dfsClientConf.getTaskId() + "_" + 
ThreadLocalRandom.current().nextInt()  + "_" + Thread.currentThread().getId(); 
(/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java)

ClientId is generated as a UUID in 
(/hadoop-common/src/main/java/org/apache/hadoop/ipc/ClientId.java)

In libhdfs++ we possibly also need two unique client identifiers, or we need to 
fix the current client name so that it works without protobuf warnings/errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory

2016-07-05 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362829#comment-15362829
 ] 

Chris Nauroth commented on HDFS-10594:
--

During initial implementation, we made an intentional choice that a cache 
directive on a directory applies to its direct children only, not all 
descendants recursively.  This behavior is documented here:

http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html#Cache_directive

I'm not in favor of changing this behavior, because it would be an unexpected 
change for users after an upgrade.  It's possible that it would cause the 
DataNode to {{mlock}} a lot more files than pre-upgrade.  This would cause 
either unpredictable caching if the new files exceed 
{{dfs.datanode.max.locked.memory}}, possibly caching files that are not useful 
to cache, or even worse, blowing out the memory budget and causing insufficient 
memory for services and YARN containers running on the host.

If there is a desire for this behavior, then a more graceful way to support it 
would be to introduce a notion of a recursive cache directive.  This would 
preserve the existing default behavior of applying only to direct children.  
Users who want the recursive behavior could opt in by passing a new flag while 
creating the cache directive.
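
As a rough sketch only (not the attached patch), an opt-in recursive walk might 
look like the following; {{directive.isRecursive()}} is a hypothetical accessor 
for such a new flag, while the other calls mirror the existing code quoted 
below:

{code}
// Sketch: descend into subdirectories only when the directive explicitly
// opted in, preserving the documented direct-children-only default.
private void rescanDir(CacheDirective directive, INodeDirectory dir) {
  for (INode child : dir.getChildrenList(Snapshot.CURRENT_STATE_ID)) {
    if (child.isFile()) {
      rescanFile(directive, child.asFile());
    } else if (child.isDirectory() && directive.isRecursive()) {
      rescanDir(directive, child.asDirectory());
    }
  }
}
{code}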

> CacheReplicationMonitor should recursively rescan the path when the inode of 
> the path is directory
> --
>
> Key: HDFS-10594
> URL: https://issues.apache.org/jira/browse/HDFS-10594
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10594.001.patch
>
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, the path should be 
> rescanned recursively when its inode is a directory. In this code:
> {code}
> } else if (node.isDirectory()) {
>   INodeDirectory dir = node.asDirectory();
>   ReadOnlyList<INode> children = dir
>       .getChildrenList(Snapshot.CURRENT_STATE_ID);
>   for (INode child : children) {
>     if (child.isFile()) {
>       rescanFile(directive, child.asFile());
>     }
>   }
> }
> {code}
> If this logic is kept as-is, inode files are ignored whenever a child inode is 
> itself a directory containing further files, so the grandchild files under 
> this path will never be cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails

2016-07-05 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10169:

Summary: TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog 
sometimes fails  (was: TestEditLog.testBatchedSyncWithClosedLogs sometimes 
fails.)

> TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
> Attachments: HDFS-10169-00.patch
>
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
>   Time elapsed: 0.377 sec  <<< FAILURE!
> java.lang.AssertionError: logging edit without syncing should do not affect 
> txid expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10592) Fix intermittent test failure of TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning

2016-07-05 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362807#comment-15362807
 ] 

Rakesh R commented on HDFS-10592:
-

Please ignore the test case failures; they are unrelated to my patch. I can see 
that HDFS-10169 is handling the {{TestEditLog}} failure, and I've commented in 
that jira. Could someone help me by reviewing the proposed patch/fix? Thanks!

> Fix intermittent test failure of 
> TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning
> --
>
> Key: HDFS-10592
> URL: https://issues.apache.org/jira/browse/HDFS-10592
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Rakesh R
>Assignee: Rakesh R
> Fix For: 2.8.0
>
> Attachments: HDFS-10592-00.patch, HDFS-10592-01.patch
>
>
> This jira is to fix the 
> {{TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning}} 
> test case failure.
> Reference 
> [Build_15973|https://builds.apache.org/job/PreCommit-HDFS-Build/15973/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestNameNodeResourceChecker/testCheckThatNameNodeResourceMonitorIsRunning/]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.

2016-07-05 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10169:

Target Version/s: 2.8.0
  Status: Patch Available  (was: Open)

> TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
> Attachments: HDFS-10169-00.patch
>
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
>   Time elapsed: 0.377 sec  <<< FAILURE!
> java.lang.AssertionError: logging edit without syncing should do not affect 
> txid expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.

2016-07-05 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362800#comment-15362800
 ] 

Rakesh R commented on HDFS-10169:
-

Hi [~cnauroth], I've come across this failure while analyzing HDFS-10592 QA 
report. Following is my analysis:

With {{useAsyncEditLog=true}}, the {{FSEditLogAsync#syncThread}} thread runs in 
the background and invokes {{logSync(getLastWrittenTxId());}}. This call 
advances the synced transaction id. In the failed test scenario, the next 
assertion does not expect the {{logSync()}} call to have happened, but the 
timing is not deterministic because of the asynchronous {{logSync()}} call.

{code}
  // Log an edit from thread A
  doLogEdit(threadA, editLog, "thread-a 1");
  assertEquals("logging edit without syncing should do not affect txid",
      1, editLog.getSyncTxId());
{code}

I think the issue is different from HDFS-10183. Am I missing anything?

A simple way would be to skip the test case, similar to the 
[TestEditLog_testSyncBatching|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java#L511]
approach. Instead, I've attempted to fix this in another way, by stopping the 
syncThread and restarting it later so that the async {{logSync()}} call cannot 
fire during the assertion. I've attached a draft patch to show this approach.

Please review the analysis and the patch. Thanks!
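
A minimal sketch of the idea, assuming hypothetical {{stopSyncThread()}} / 
{{startSyncThread()}} hooks on {{FSEditLogAsync}} (the attached patch may 
differ in detail):

{code}
// Sketch only: pause the async sync thread so no background logSync() can
// advance the sync txid while the assertion runs, then resume it afterwards.
// stopSyncThread()/startSyncThread() are hypothetical names for this sketch.
if (editLog instanceof FSEditLogAsync) {
  ((FSEditLogAsync) editLog).stopSyncThread();
}
try {
  // Log an edit from thread A (same steps as the existing test).
  doLogEdit(threadA, editLog, "thread-a 1");
  assertEquals("logging edit without syncing should do not affect txid",
      1, editLog.getSyncTxId());
} finally {
  if (editLog instanceof FSEditLogAsync) {
    ((FSEditLogAsync) editLog).startSyncThread();
  }
}
{code}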

> TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
> Attachments: HDFS-10169-00.patch
>
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
>   Time elapsed: 0.377 sec  <<< FAILURE!
> java.lang.AssertionError: logging edit without syncing should do not affect 
> txid expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.

2016-07-05 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10169:

Attachment: HDFS-10169-00.patch

> TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
> Attachments: HDFS-10169-00.patch
>
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
>   Time elapsed: 0.377 sec  <<< FAILURE!
> java.lang.AssertionError: logging edit without syncing should do not affect 
> txid expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.

2016-07-05 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R reassigned HDFS-10169:
---

Assignee: Rakesh R

> TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
> --
>
> Key: HDFS-10169
> URL: https://issues.apache.org/jira/browse/HDFS-10169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Rakesh R
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
>   Time elapsed: 0.377 sec  <<< FAILURE!
> java.lang.AssertionError: logging edit without syncing should do not affect 
> txid expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory

2016-07-05 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10594:
-
Attachment: HDFS-10594.001.patch

> CacheReplicationMonitor should recursively rescan the path when the inode of 
> the path is directory
> --
>
> Key: HDFS-10594
> URL: https://issues.apache.org/jira/browse/HDFS-10594
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
> Attachments: HDFS-10594.001.patch
>
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, the path should be 
> rescanned recursively when its inode is a directory. In this code:
> {code}
> } else if (node.isDirectory()) {
>   INodeDirectory dir = node.asDirectory();
>   ReadOnlyList<INode> children = dir
>       .getChildrenList(Snapshot.CURRENT_STATE_ID);
>   for (INode child : children) {
>     if (child.isFile()) {
>       rescanFile(directive, child.asFile());
>     }
>   }
> }
> {code}
> If this logic is kept as-is, inode files are ignored whenever a child inode is 
> itself a directory containing further files, so the grandchild files under 
> this path will never be cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory

2016-07-05 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10594:
-
Status: Patch Available  (was: Open)

> CacheReplicationMonitor should recursively rescan the path when the inode of 
> the path is directory
> --
>
> Key: HDFS-10594
> URL: https://issues.apache.org/jira/browse/HDFS-10594
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, the path should be 
> rescanned recursively when its inode is a directory. In this code:
> {code}
> } else if (node.isDirectory()) {
>   INodeDirectory dir = node.asDirectory();
>   ReadOnlyList<INode> children = dir
>       .getChildrenList(Snapshot.CURRENT_STATE_ID);
>   for (INode child : children) {
>     if (child.isFile()) {
>       rescanFile(directive, child.asFile());
>     }
>   }
> }
> {code}
> If this logic is kept as-is, inode files are ignored whenever a child inode is 
> itself a directory containing further files, so the grandchild files under 
> this path will never be cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory

2016-07-05 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362497#comment-15362497
 ] 

Yiqun Lin commented on HDFS-10594:
--

Attached an initial patch.

> CacheReplicationMonitor should recursively rescan the path when the inode of 
> the path is directory
> --
>
> Key: HDFS-10594
> URL: https://issues.apache.org/jira/browse/HDFS-10594
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching
>Affects Versions: 2.7.1
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>
> In {{CacheReplicationMonitor#rescanCacheDirectives}}, the path should be 
> rescanned recursively when its inode is a directory. In this code:
> {code}
> } else if (node.isDirectory()) {
>   INodeDirectory dir = node.asDirectory();
>   ReadOnlyList<INode> children = dir
>       .getChildrenList(Snapshot.CURRENT_STATE_ID);
>   for (INode child : children) {
>     if (child.isFile()) {
>       rescanFile(directive, child.asFile());
>     }
>   }
> }
> {code}
> If this logic is kept as-is, inode files are ignored whenever a child inode is 
> itself a directory containing further files, so the grandchild files under 
> this path will never be cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory

2016-07-05 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-10594:


 Summary: CacheReplicationMonitor should recursively rescan the 
path when the inode of the path is directory
 Key: HDFS-10594
 URL: https://issues.apache.org/jira/browse/HDFS-10594
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: caching
Affects Versions: 2.7.1
Reporter: Yiqun Lin
Assignee: Yiqun Lin


In {{CacheReplicationMonitor#rescanCacheDirectives}}, the path should be 
rescanned recursively when its inode is a directory. In this code:
{code}
} else if (node.isDirectory()) {
  INodeDirectory dir = node.asDirectory();
  ReadOnlyList<INode> children = dir
      .getChildrenList(Snapshot.CURRENT_STATE_ID);
  for (INode child : children) {
    if (child.isFile()) {
      rescanFile(directive, child.asFile());
    }
  }
}
{code}
If this logic is kept as-is, inode files are ignored whenever a child inode is 
itself a directory containing further files, so the grandchild files under this 
path will never be cached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362354#comment-15362354
 ] 

Hadoop QA commented on HDFS-6962:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
31s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  7m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
5s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 35s{color} | {color:orange} root: The patch generated 1 new + 1152 unchanged 
- 0 fixed = 1153 total (was 1152) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
2s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
58s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 
18s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}135m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:85209cc |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12816153/HDFS-6962.006.patch |
| JIRA Issue | HDFS-6962 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  cc  xml  |
| uname | Linux 3d49e69f46bd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8b4b525 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | 

[jira] [Commented] (HDFS-8956) Not able to start Datanode

2016-07-05 Thread SHIVADEEP GUNDOJU (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362333#comment-15362333
 ] 

SHIVADEEP GUNDOJU commented on HDFS-8956:
-

Hello 

I got exactly the same problem. For me, the datanode started after 
uncommenting the following line in /etc/hosts:
127.0.0.1   localhost
I don't know why or how, but it started.

Thanks

> Not able to start Datanode
> --
>
> Key: HDFS-8956
> URL: https://issues.apache.org/jira/browse/HDFS-8956
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.7.0
> Environment: Centos
>Reporter: sreelakshmi
>
> The DataNode service does not start on one of the data nodes; a 
> "java.net.BindException" is thrown. 
> Verified that ports 50010,50070 and 50075 are not in use by any other 
> application.
> 15/08/26 01:50:15 INFO http.HttpServer2: HttpServer.start() threw a non Bind 
> IOException
> java.net.BindException: Port in use: localhost:0
> at 
> org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919)
> at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:779)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:434)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2404)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2291)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2338)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2515)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2539)
> Caused by: java.net.BindException: Cannot assign requested address



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-7957) Truncate should verify quota before making changes

2016-07-05 Thread Shen Yinjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated HDFS-7957:
--
Assignee: Jing Zhao  (was: Shen Yinjie)

> Truncate should verify quota before making changes
> --
>
> Key: HDFS-7957
> URL: https://issues.apache.org/jira/browse/HDFS-7957
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7957.000.patch, HDFS-7957.001.patch, 
> HDFS-7957.002.patch
>
>
> This is a similar issue with HDFS-7587: for truncate we should also verify 
> quota in the beginning and update quota in the end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-7957) Truncate should verify quota before making changes

2016-07-05 Thread Shen Yinjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie reassigned HDFS-7957:
-

Assignee: Shen Yinjie  (was: Jing Zhao)

> Truncate should verify quota before making changes
> --
>
> Key: HDFS-7957
> URL: https://issues.apache.org/jira/browse/HDFS-7957
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.0
>Reporter: Jing Zhao
>Assignee: Shen Yinjie
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: HDFS-7957.000.patch, HDFS-7957.001.patch, 
> HDFS-7957.002.patch
>
>
> This is a similar issue with HDFS-7587: for truncate we should also verify 
> quota in the beginning and update quota in the end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-6962) ACLs inheritance conflict with umaskmode

2016-07-05 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-6962:
-
Attachment: HDFS-6962.006.patch

Patch 006 over 005:
* Remove the ugly calls of "instanceof FsCreateModes"

> ACLs inheritance conflict with umaskmode
> 
>
> Key: HDFS-6962
> URL: https://issues.apache.org/jira/browse/HDFS-6962
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
> Environment: CentOS release 6.5 (Final)
>Reporter: LINTE
>Assignee: John Zhuge
>Priority: Critical
>  Labels: hadoop, security
> Attachments: HDFS-6962.001.patch, HDFS-6962.002.patch, 
> HDFS-6962.003.patch, HDFS-6962.004.patch, HDFS-6962.005.patch, 
> HDFS-6962.006.patch, HDFS-6962.1.patch, disabled_new_client.log, 
> disabled_old_client.log, enabled_new_client.log, enabled_old_client.log, run
>
>
> In hdfs-site.xml:
> <property>
>   <name>dfs.umaskmode</name>
>   <value>027</value>
> </property>
> 1/ Create a directory as superuser
> bash# hdfs dfs -mkdir  /tmp/ACLS
> 2/ Set default ACLs on this directory: rwx access for group readwrite and user 
> toto
> bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS
> bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS
> 3/ check ACLs /tmp/ACLS/
> bash# hdfs dfs -getfacl /tmp/ACLS/
> # file: /tmp/ACLS
> # owner: hdfs
> # group: hadoop
> user::rwx
> group::r-x
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> user::rwx | group::r-x | other::--- matches the umaskmode defined in 
> hdfs-site.xml, everything is OK!
> default:group:readwrite:rwx allows the readwrite group rwx access for 
> inheritance.
> default:user:toto:rwx allows the toto user rwx access for inheritance.
> default:mask::rwx the inheritance mask is rwx, so effectively no mask
> 4/ Create a subdir to test inheritance of ACL
> bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs
> 5/ check ACLs /tmp/ACLS/hdfs
> bash# hdfs dfs -getfacl /tmp/ACLS/hdfs
> # file: /tmp/ACLS/hdfs
> # owner: hdfs
> # group: hadoop
> user::rwx
> user:toto:rwx   #effective:r-x
> group::r-x
> group:readwrite:rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> Here we can see that the readwrite group has an rwx ACL but only r-x is 
> effective, because the mask is r-x (mask::r-x) even though the default mask 
> for inheritance is set to default:mask::rwx on /tmp/ACLS/.
> 6/ Modify hdfs-site.xml and restart the namenode:
> <property>
>   <name>dfs.umaskmode</name>
>   <value>010</value>
> </property>
> 7/ Create a subdir to test inheritance of ACL with new parameter umaskmode
> bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs2
> 8/ Check ACL on /tmp/ACLS/hdfs2
> bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2
> # file: /tmp/ACLS/hdfs2
> # owner: hdfs
> # group: hadoop
> user::rwx
> user:toto:rwx   #effective:rw-
> group::r-x  #effective:r--
> group:readwrite:rwx #effective:rw-
> mask::rw-
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> So HDFS masks the ACL values (user, group, and other -- except the POSIX 
> owner -- ) with the group mask of the dfs.umaskmode property when creating a 
> directory with inherited ACLs.
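
For reference, a minimal Java sketch of the masking step the report above 
demonstrates ({{FsPermission#applyUMask}} is existing Hadoop API; the mode 
values are taken from the scenario above):

{code}
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskVsDefaultAcl {
  public static void main(String[] args) {
    FsPermission requested = new FsPermission((short) 0777);
    FsPermission umask = new FsPermission((short) 0027);   // dfs.umaskmode
    // 0777 masked by 027 gives rwxr-x---, which is why the inherited ACL
    // entries end up with an effective mask of r-x even though the parent's
    // default:mask is rwx.
    System.out.println(requested.applyUMask(umask));
  }
}
{code}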



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10591) Using webhdfs unable to download pdf,doc files in ubuntu os.

2016-07-05 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362178#comment-15362178
 ] 

Mingliang Liu commented on HDFS-10591:
--

Please note that JIRA tickets are not for usage discussions; use the 
u...@hadoop.apache.org mailing list instead. In this case, I see no obvious 
relation between the pdf/doc file formats and webhdfs access. If you think this 
is a real bug, please add a detailed description to support debugging. Thanks.

> Using webhdfs unable to download pdf,doc files in ubuntu os.
> 
>
> Key: HDFS-10591
> URL: https://issues.apache.org/jira/browse/HDFS-10591
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Reporter: bharghavi
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10582) Change deprecated configuration fs.checkpoint.dir to dfs.namenode.checkpoint.dir in HDFS Commands Doc

2016-07-05 Thread Pan Yuxuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362112#comment-15362112
 ] 

Pan Yuxuan commented on HDFS-10582:
---

Fix for review!

> Change deprecated configuration fs.checkpoint.dir to 
> dfs.namenode.checkpoint.dir in HDFS Commands Doc
> -
>
> Key: HDFS-10582
> URL: https://issues.apache.org/jira/browse/HDFS-10582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.2
>Reporter: Pan Yuxuan
>Priority: Minor
> Attachments: HDFS-10582.patch
>
>
> The HDFS Commands documentation for -importCheckpoint uses the deprecated 
> configuration string {code}fs.checkpoint.dir{code}; we can use 
> {noformat}dfs.namenode.checkpoint.dir{noformat} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable

2016-07-05 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362065#comment-15362065
 ] 

Yuanbo Liu commented on HDFS-10593:
---

[~andrew.wang][~cnauroth] I'm also tagging you two in this loop, hoping to get 
your thoughts.

> MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable 
> ---
>
> Key: HDFS-10593
> URL: https://issues.apache.org/jira/browse/HDFS-10593
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuanbo Liu
>
> In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in 
> HDFS-6102 to restrict max items of single directory, and the value of it can 
> not be larger than the value of MAX_DIR_ITEMS. Since 
> "ipc.maximum.data.length" was added in HADOOP-9676 and documented in 
> HADOOP-13039 to make maximum RPC buffer size configurable, it's not proper to 
> hard code the value of MAX_DIR_ITEMS in {{FSDirectory}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable

2016-07-05 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362065#comment-15362065
 ] 

Yuanbo Liu edited comment on HDFS-10593 at 7/5/16 6:06 AM:
---

[~andrew.wang]/[~cnauroth] I'm also tagging you two in this loop, hoping to get 
your thoughts.


was (Author: yuanbo):
[~andrew.wang][~cnauroth] I also tag you two in this loop, and hope to get your 
thoughts.

> MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable 
> ---
>
> Key: HDFS-10593
> URL: https://issues.apache.org/jira/browse/HDFS-10593
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuanbo Liu
>
> In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in 
> HDFS-6102 to restrict max items of single directory, and the value of it can 
> not be larger than the value of MAX_DIR_ITEMS. Since 
> "ipc.maximum.data.length" was added in HADOOP-9676 and documented in 
> HADOOP-13039 to make maximum RPC buffer size configurable, it's not proper to 
> hard code the value of MAX_DIR_ITEMS in {{FSDirectory}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10592) Fix intermittent test failure of TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning

2016-07-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362064#comment-15362064
 ] 

Hadoop QA commented on HDFS-10592:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 
0 new + 19 unchanged - 4 fixed = 19 total (was 23) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 19s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 13s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.hdfs.server.namenode.TestEditLog |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:85209cc |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12816122/HDFS-10592-01.patch |
| JIRA Issue | HDFS-10592 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 7f3144a609c3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8b4b525 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15980/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15980/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/15980/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fix intermittent test failure of 
> TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning
> --
>
> Key: HDFS-10592
> 

[jira] [Created] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable

2016-07-05 Thread Yuanbo Liu (JIRA)
Yuanbo Liu created HDFS-10593:
-

 Summary: MAX_DIR_ITEMS should not be hard coded since RPC buff 
size is configurable 
 Key: HDFS-10593
 URL: https://issues.apache.org/jira/browse/HDFS-10593
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yuanbo Liu


In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in 
HDFS-6102 to restrict max items of single directory, and the value of it can 
not be larger than the value of MAX_DIR_ITEMS. Since "ipc.maximum.data.length" 
was added in HADOOP-9676 and documented in HADOOP-13039 to make maximum RPC 
buffer size configurable, it's not proper to hard code the value of 
MAX_DIR_ITEMS in {{FSDirectory}}.
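
As a rough sketch of one possible direction (not a concrete proposal), the cap 
could be derived from the configurable RPC buffer size instead of a hard-coded 
constant; the bytes-per-entry estimate below is purely an assumption for 
illustration:

{code}
// Sketch only: derive the directory-items cap from ipc.maximum.data.length
// rather than a fixed MAX_DIR_ITEMS. The 256-byte average entry size is an
// assumed number for illustration, not a measured value.
int maxDataLength = conf.getInt(
    CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH,
    CommonConfigurationKeys.IPC_MAXIMUM_DATA_LENGTH_DEFAULT);
int estimatedBytesPerEntry = 256;
int maxDirItemsCap = maxDataLength / estimatedBytesPerEntry;

int maxDirItems = conf.getInt(
    DFSConfigKeys.DFS_NAMENODE_MAX_DIRECTORY_ITEMS_KEY,
    DFSConfigKeys.DFS_NAMENODE_MAX_DIRECTORY_ITEMS_DEFAULT);
Preconditions.checkArgument(maxDirItems > 0 && maxDirItems <= maxDirItemsCap,
    "Cannot set " + DFSConfigKeys.DFS_NAMENODE_MAX_DIRECTORY_ITEMS_KEY
        + " to a value less than 1 or greater than " + maxDirItemsCap);
{code}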



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org