[jira] [Commented] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363723#comment-15363723 ]

Hadoop QA commented on HDFS-10169:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 7m 19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 61m 4s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m 56s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816363/HDFS-10169-01.patch |
| JIRA Issue | HDFS-10169 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux c0c6d53fd355 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d792a90 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15989/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15989/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-10169
>                 URL: https://issues.apache.org/jira/browse/HDFS-10169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Rakesh R
>         Attachments: HDFS-10169-00.patch, HDFS-10169-01.patch
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
> Time
[jira] [Resolved] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
[ https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuanbo Liu resolved HDFS-10593.
-------------------------------
    Resolution: Not A Problem

> MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10593
>                 URL: https://issues.apache.org/jira/browse/HDFS-10593
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Yuanbo Liu
>
> In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in
> HDFS-6102 to restrict the maximum number of items in a single directory, and
> its value cannot be larger than the value of MAX_DIR_ITEMS. Since
> "ipc.maximum.data.length" was added in HADOOP-9676 and documented in
> HADOOP-13039 to make the maximum RPC buffer size configurable, it is not
> proper to hard-code the value of MAX_DIR_ITEMS in {{FSDirectory}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
[ https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363718#comment-15363718 ]

Yuanbo Liu commented on HDFS-10593:
-----------------------------------

[~andrew.wang] Thank you for your response. You're right, "ipc.maximum.data.length"
is used to set the maximum RPC message size, not the PB size. I'm sorry for not
investigating this thoroughly. I found JIRA HDFS-10312 and thought that
"ipc.maximum.data.length" was a general property for the PB size. It turns out
Chris did not want to introduce a new property, and reused
"ipc.maximum.data.length" to set the PB size for block reporting. I searched
for {{setSizeLimit}} in the Hadoop project and did not find anything about
fsimage PB serde, so I don't think we have made fsimage serde configurable.

> MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
> --------------------------------------------------------------------------
>
>                 Key: HDFS-10593
>                 URL: https://issues.apache.org/jira/browse/HDFS-10593
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Yuanbo Liu
>
> In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in
> HDFS-6102 to restrict the maximum number of items in a single directory, and
> its value cannot be larger than the value of MAX_DIR_ITEMS. Since
> "ipc.maximum.data.length" was added in HADOOP-9676 and documented in
> HADOOP-13039 to make the maximum RPC buffer size configurable, it is not
> proper to hard-code the value of MAX_DIR_ITEMS in {{FSDirectory}}.
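The idea in this thread can be illustrated with a small sketch. This is a hypothetical example, not the actual {{FSDirectory}} code; the class name, the per-entry size estimate, and the 64 MB figure are all assumptions for illustration. It shows how a directory-item ceiling could be derived from a configurable maximum RPC/PB message size (the role "ipc.maximum.data.length" plays) instead of being hard-coded.

```java
// Hypothetical sketch only -- not the actual FSDirectory code.
// If the maximum RPC/PB message size is configurable, a directory-item
// ceiling could be derived from that limit rather than hard-coded.
public class MaxDirItemsSketch {

    // Assumed numbers for illustration; the real constants live elsewhere.
    static final int ASSUMED_IPC_MAX_DATA_LENGTH = 64 * 1024 * 1024; // 64 MB
    static final int APPROX_BYTES_PER_DIR_ENTRY = 1024;              // rough estimate

    /** Cap directory items so a serialized listing stays under the RPC limit. */
    static int maxDirItems(int ipcMaxDataLength) {
        return Math.max(1, ipcMaxDataLength / APPROX_BYTES_PER_DIR_ENTRY);
    }

    public static void main(String[] args) {
        // With the assumed numbers: 64 MB / 1 KB per entry = 65536 items.
        System.out.println(maxDirItems(ASSUMED_IPC_MAX_DATA_LENGTH)); // 65536
    }
}
```

As the comment above notes, this only makes sense if the fsimage/RPC serde paths actually honor the configured limit, which is exactly what the thread concluded is not the case today.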
[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363678#comment-15363678 ]

Kai Zheng commented on HDFS-10548:
----------------------------------

bq. Is there a JIRA for removing BRLocalLegacy too? IIRC Windows used to need it since they didn't support passing the fd via domain socket, but maybe that's changed.

Ping: [~cnauroth] and [~ste...@apache.org]. I hope this can be clarified or confirmed, and if it sounds good I can do it as well.

bq. How do we feel about removing it?

Sure [~andrew.wang], I will find a chance to do it.

> Remove the long deprecated BlockReaderRemote
> --------------------------------------------
>
>                 Key: HDFS-10548
>                 URL: https://issues.apache.org/jira/browse/HDFS-10548
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch,
> HDFS-10548-v3.patch
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove
> the {{BlockReaderRemote}} class that was deprecated a very long time ago.
> From the {{BlockReaderRemote}} header:
> {quote}
> * @deprecated this is an old implementation that is being left around
> * in case any issues spring up with the new {@link BlockReaderRemote2}
> * implementation.
> * It will be removed in the next release.
> {quote}
> From the {{BlockReaderRemote2}} class header:
> {quote}
> * This is a new implementation introduced in Hadoop 0.23 which
> * is more efficient and simpler than the older BlockReader
> * implementation. It should be renamed to BlockReaderRemote
> * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could rename it as
> the comment suggests: BlockReaderRemote2 => BlockReaderRemote.
[jira] [Commented] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363645#comment-15363645 ]

Hadoop QA commented on HDFS-10600:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 4m 14s{color} | {color:red} Docker failed to build yetus/hadoop:9560f25. {color} |
\\ \\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816360/HDFS-10600.001.patch |
| JIRA Issue | HDFS-10600 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15988/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> PlanCommand#getThrsholdPercentage should not use throughput value.
> ------------------------------------------------------------------
>
>                 Key: HDFS-10600
>                 URL: https://issues.apache.org/jira/browse/HDFS-10600
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: diskbalancer
>    Affects Versions: 2.9.0, 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Yiqun Lin
>             Fix For: 2.9.0
>
>         Attachments: HDFS-10600.001.patch
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>   private double getThresholdPercentage(CommandLine cmd) {
>
>     if ((value <= 0.0) || (value > 100.0)) {
>       value = getConf().getDouble(
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
>     }
>     return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated HDFS-10169:
----------------------------
    Attachment: HDFS-10169-01.patch

> TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-10169
>                 URL: https://issues.apache.org/jira/browse/HDFS-10169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Rakesh R
>         Attachments: HDFS-10169-00.patch, HDFS-10169-01.patch
>
> This failure has been seen in multiple precommit builds recently.
> {noformat}
> testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog)
> Time elapsed: 0.377 sec <<< FAILURE!
> java.lang.AssertionError: logging edit without syncing should do not affect
> txid expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594)
> {noformat}
[jira] [Updated] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-10600:
-----------------------------
    Attachment: HDFS-10600.001.patch

> PlanCommand#getThrsholdPercentage should not use throughput value.
> ------------------------------------------------------------------
>
>                 Key: HDFS-10600
>                 URL: https://issues.apache.org/jira/browse/HDFS-10600
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: diskbalancer
>    Affects Versions: 2.9.0, 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Yiqun Lin
>             Fix For: 2.9.0
>
>         Attachments: HDFS-10600.001.patch
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>   private double getThresholdPercentage(CommandLine cmd) {
>
>     if ((value <= 0.0) || (value > 100.0)) {
>       value = getConf().getDouble(
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
>     }
>     return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
[jira] [Commented] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363627#comment-15363627 ]

Yiqun Lin commented on HDFS-10600:
----------------------------------

Post a simple patch for this.

> PlanCommand#getThrsholdPercentage should not use throughput value.
> ------------------------------------------------------------------
>
>                 Key: HDFS-10600
>                 URL: https://issues.apache.org/jira/browse/HDFS-10600
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: diskbalancer
>    Affects Versions: 2.9.0, 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Yiqun Lin
>             Fix For: 2.9.0
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>   private double getThresholdPercentage(CommandLine cmd) {
>
>     if ((value <= 0.0) || (value > 100.0)) {
>       value = getConf().getDouble(
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
>     }
>     return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
[jira] [Updated] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-10600:
-----------------------------
    Status: Patch Available  (was: Open)

> PlanCommand#getThrsholdPercentage should not use throughput value.
> ------------------------------------------------------------------
>
>                 Key: HDFS-10600
>                 URL: https://issues.apache.org/jira/browse/HDFS-10600
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: diskbalancer
>    Affects Versions: 2.9.0, 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Yiqun Lin
>             Fix For: 2.9.0
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>   private double getThresholdPercentage(CommandLine cmd) {
>
>     if ((value <= 0.0) || (value > 100.0)) {
>       value = getConf().getDouble(
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
>     }
>     return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
[jira] [Comment Edited] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363627#comment-15363627 ]

Yiqun Lin edited comment on HDFS-10600 at 7/6/16 2:13 AM:
----------------------------------------------------------

Post a simple patch for this.

was (Author: linyiqun):
Post a simple patch fot this.

> PlanCommand#getThrsholdPercentage should not use throughput value.
> ------------------------------------------------------------------
>
>                 Key: HDFS-10600
>                 URL: https://issues.apache.org/jira/browse/HDFS-10600
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: diskbalancer
>    Affects Versions: 2.9.0, 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Yiqun Lin
>             Fix For: 2.9.0
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>   private double getThresholdPercentage(CommandLine cmd) {
>
>     if ((value <= 0.0) || (value > 100.0)) {
>       value = getConf().getDouble(
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
>     }
>     return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
[jira] [Assigned] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
[ https://issues.apache.org/jira/browse/HDFS-10600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin reassigned HDFS-10600:
--------------------------------
    Assignee: Yiqun Lin

> PlanCommand#getThrsholdPercentage should not use throughput value.
> ------------------------------------------------------------------
>
>                 Key: HDFS-10600
>                 URL: https://issues.apache.org/jira/browse/HDFS-10600
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: diskbalancer
>    Affects Versions: 2.9.0, 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Yiqun Lin
>             Fix For: 2.9.0
>
> In {{PlanCommand#getThresholdPercentage}}
> {code}
>   private double getThresholdPercentage(CommandLine cmd) {
>
>     if ((value <= 0.0) || (value > 100.0)) {
>       value = getConf().getDouble(
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
>           DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
>     }
>     return value;
>   }
> {code}
> {{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return
> {{throughput}} as a percentage value.
> Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
[jira] [Updated] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-10548:
-------------------------------
    Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)
    Release Note: This removes the configuration property
{{dfs.client.use.legacy.blockreader}}, since the legacy remote block reader
class has been removed from the codebase.
  (was: This will obsoletes this configuration property, since the legacy
block reader is removed from the code base. {{dfs.client.use.legacy.blockreader}})

I also realized on second inspection that the LEGACY_BLOCKREADER config key is
still present in HdfsClientConfigKeys and DFSConfigKeys. How do we feel about
removing it?

> Remove the long deprecated BlockReaderRemote
> --------------------------------------------
>
>                 Key: HDFS-10548
>                 URL: https://issues.apache.org/jira/browse/HDFS-10548
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch,
> HDFS-10548-v3.patch
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove
> the {{BlockReaderRemote}} class that was deprecated a very long time ago.
> From the {{BlockReaderRemote}} header:
> {quote}
> * @deprecated this is an old implementation that is being left around
> * in case any issues spring up with the new {@link BlockReaderRemote2}
> * implementation.
> * It will be removed in the next release.
> {quote}
> From the {{BlockReaderRemote2}} class header:
> {quote}
> * This is a new implementation introduced in Hadoop 0.23 which
> * is more efficient and simpler than the older BlockReader
> * implementation. It should be renamed to BlockReaderRemote
> * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could rename it as
> the comment suggests: BlockReaderRemote2 => BlockReaderRemote.
[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363584#comment-15363584 ]

Andrew Wang commented on HDFS-10548:
------------------------------------

I think what Colin meant is that (with the rename) if someone changes BRR in
trunk, that change needs to be reapplied to BRR2 for the branch-2 backport, so
the backports won't be clean. I recommend that we not backport this to
branch-2 for compatibility reasons.

> Remove the long deprecated BlockReaderRemote
> --------------------------------------------
>
>                 Key: HDFS-10548
>                 URL: https://issues.apache.org/jira/browse/HDFS-10548
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch,
> HDFS-10548-v3.patch
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove
> the {{BlockReaderRemote}} class that was deprecated a very long time ago.
> From the {{BlockReaderRemote}} header:
> {quote}
> * @deprecated this is an old implementation that is being left around
> * in case any issues spring up with the new {@link BlockReaderRemote2}
> * implementation.
> * It will be removed in the next release.
> {quote}
> From the {{BlockReaderRemote2}} class header:
> {quote}
> * This is a new implementation introduced in Hadoop 0.23 which
> * is more efficient and simpler than the older BlockReader
> * implementation. It should be renamed to BlockReaderRemote
> * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could rename it as
> the comment suggests: BlockReaderRemote2 => BlockReaderRemote.
[jira] [Created] (HDFS-10600) PlanCommand#getThrsholdPercentage should not use throughput value.
Lei (Eddy) Xu created HDFS-10600:
------------------------------------

             Summary: PlanCommand#getThrsholdPercentage should not use throughput value.
                 Key: HDFS-10600
                 URL: https://issues.apache.org/jira/browse/HDFS-10600
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: diskbalancer
    Affects Versions: 2.9.0, 3.0.0-beta1
            Reporter: Lei (Eddy) Xu


In {{PlanCommand#getThresholdPercentage}}
{code}
  private double getThresholdPercentage(CommandLine cmd) {

    if ((value <= 0.0) || (value > 100.0)) {
      value = getConf().getDouble(
          DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT,
          DFSConfigKeys.DFS_DISK_BALANCER_MAX_DISK_THRUPUT_DEFAULT);
    }
    return value;
  }
{code}
{{DISK_THROUGHPUT}} has the unit of "MB", so it does not make sense to return
{{throughput}} as a percentage value.

Btw, we should use {{THROUGHPUT}} instead of {{THRUPUT}}.
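A minimal sketch of the kind of fix this issue asks for. The class name, method shape, and the 10.0 default are assumptions for illustration, not the actual Hadoop patch: the point is simply that an out-of-range threshold should fall back to a *percentage* default, never to the throughput setting, since MB and % are different units.

```java
// Hypothetical sketch -- not the actual PlanCommand code or patch.
// An out-of-range threshold falls back to a percentage default, not to
// the unrelated disk-throughput setting (which is measured in MB).
public class ThresholdSketch {

    // Assumed default; a real patch would read a dedicated threshold key
    // from the configuration instead of this literal.
    static final double DEFAULT_THRESHOLD_PERCENTAGE = 10.0;

    static double getThresholdPercentage(double value) {
        if (value <= 0.0 || value > 100.0) {
            return DEFAULT_THRESHOLD_PERCENTAGE;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(getThresholdPercentage(-5.0)); // falls back to the default
        System.out.println(getThresholdPercentage(42.0)); // valid value kept as-is
    }
}
```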
[jira] [Commented] (HDFS-10567) Improve plan command help message
[ https://issues.apache.org/jira/browse/HDFS-10567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363562#comment-15363562 ]

Anu Engineer commented on HDFS-10567:
-------------------------------------

[~xiaobingo] Thank you very much for the improvements to the help messages.
There is one improvement that somehow does not feel right.
{noformat}
withDescription("Describes how many errors in integer " +
    "can be tolerated while copying between a pair of disks.")
{noformat}
We seem to have added "in integer" as a unit. Doesn't "how many errors" convey
the same meaning? All other changes look good.

> Improve plan command help message
> ---------------------------------
>
>                 Key: HDFS-10567
>                 URL: https://issues.apache.org/jira/browse/HDFS-10567
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Lei (Eddy) Xu
>            Assignee: Xiaobing Zhou
>         Attachments: HDFS-10567-HDFS-10576.001.patch,
> HDFS-10567-HDFS-1312.000.patch
>
> {code}
> --bandwidth              Maximum disk bandwidth to be consumed by
>                          diskBalancer. e.g. 10
> --maxerror               Describes how many errors can be
>                          tolerated while copying between a pair
>                          of disks.
> --outFile                to write output to, if not
>                          specified defaults will be used.
> --plan                   creates a plan for datanode.
> --thresholdPercentage    Percentage skew that wetolerate before
>                          diskbalancer starts working e.g. 10
> --v                      Print out the summary of the plan on
>                          console
> {code}
> We should
> * Put the unit into {{--bandwidth}}, or its help message. Is it an integer or
> float / double number? Not clear in CLI message.
> * Give more details about {{--plan}}. It is not clear what the {{}} is
> for.
> * {{--thresholdPercentage}}, has typo {{wetolerate}} in the error message.
> Also it needs to indicate that it is the difference between space
> utilization between two disks / volumes. Is it an integer or float / double
> number?
> Thanks.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10599) DiskBalancer: Execute CLI via Shell
Anu Engineer created HDFS-10599:
-----------------------------------

             Summary: DiskBalancer: Execute CLI via Shell
                 Key: HDFS-10599
                 URL: https://issues.apache.org/jira/browse/HDFS-10599
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: balancer & mover
    Affects Versions: 3.0.0-alpha1
            Reporter: Anu Engineer
            Assignee: Anu Engineer
             Fix For: 3.0.0-alpha1


The DiskBalancer CLI tests invoke CLI functions directly instead of going
through the shell. This is not representative of how end users use it. To
provide good unit test coverage, we need to have tests where the DiskBalancer
CLI is invoked via the shell.
[jira] [Assigned] (HDFS-10598) DiskBalancer does not execute multi-steps plan.
[ https://issues.apache.org/jira/browse/HDFS-10598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer reassigned HDFS-10598:
-----------------------------------
    Assignee: Anu Engineer

> DiskBalancer does not execute multi-steps plan.
> -----------------------------------------------
>
>                 Key: HDFS-10598
>                 URL: https://issues.apache.org/jira/browse/HDFS-10598
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: diskbalancer
>    Affects Versions: 2.8.0, 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Anu Engineer
>            Priority: Critical
>             Fix For: 2.9.0
>
> I set up a 3 DN node cluster, each one with 2 small disks. After creating
> some files to fill HDFS, I added two more small disks to one DN and ran the
> diskbalancer on this DataNode.
> The disk usage before running diskbalancer:
> {code}
> /dev/loop0      3.9G  2.1G  1.6G  58% /mnt/data1
> /dev/loop1      3.9G  2.6G  1.1G  71% /mnt/data2
> /dev/loop2      3.9G   17M  3.6G   1% /mnt/data3
> /dev/loop3      3.9G   17M  3.6G   1% /mnt/data4
> {code}
> However, after running diskbalancer (i.e., {{-query}} shows {{PLAN_DONE}}):
> {code}
> /dev/loop0      3.9G  1.2G  2.5G  32% /mnt/data1
> /dev/loop1      3.9G  2.6G  1.1G  71% /mnt/data2
> /dev/loop2      3.9G  953M  2.7G  26% /mnt/data3
> /dev/loop3      3.9G   17M  3.6G   1% /mnt/data4
> {code}
> It is suspicious that in {{DiskBalancerMover#copyBlocks}}, every return does
> {{this.setExitFlag}}, which prevents {{copyBlocks()}} from being called
> multiple times from {{DiskBalancer#executePlan}}.
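The suspected bug pattern in the description above can be sketched in a few lines. This is a simplified, hypothetical illustration (the class and method names mimic but are not the actual DiskBalancerMover/DiskBalancer code): if the worker sets a shared exit flag on every return, even a normal one, a driver that loops over the plan's steps stops after the first step, which matches the observed "only one disk moved" behavior.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration of the suspected bug -- not the actual
// DiskBalancerMover code.
public class ExitFlagSketch {
    private boolean shouldExit = false;
    private int stepsExecuted = 0;

    void copyBlocks(String step) {
        stepsExecuted++;
        // BUG: signalling exit unconditionally on return prevents the
        // remaining plan steps from ever running.
        shouldExit = true;
    }

    int executePlan(List<String> steps) {
        for (String step : steps) {
            if (shouldExit) {
                break;
            }
            copyBlocks(step);
        }
        return stepsExecuted;
    }

    public static void main(String[] args) {
        ExitFlagSketch mover = new ExitFlagSketch();
        List<String> plan = Arrays.asList("data1->data3", "data2->data3", "data2->data4");
        // Only the first of the three steps executes because of the exit flag.
        System.out.println(mover.executePlan(plan)); // prints 1
    }
}
```

The fix direction would be to signal exit only on error or shutdown paths, so the driver can invoke the worker once per step.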
[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363555#comment-15363555 ]

Kai Zheng commented on HDFS-10548:
----------------------------------

This was done targeting 3.0, but if we need this for branch-2 as well, I can
check and post a patch for the branch separately if necessary.

> Remove the long deprecated BlockReaderRemote
> --------------------------------------------
>
>                 Key: HDFS-10548
>                 URL: https://issues.apache.org/jira/browse/HDFS-10548
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch,
> HDFS-10548-v3.patch
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove
> the {{BlockReaderRemote}} class that was deprecated a very long time ago.
> From the {{BlockReaderRemote}} header:
> {quote}
> * @deprecated this is an old implementation that is being left around
> * in case any issues spring up with the new {@link BlockReaderRemote2}
> * implementation.
> * It will be removed in the next release.
> {quote}
> From the {{BlockReaderRemote2}} class header:
> {quote}
> * This is a new implementation introduced in Hadoop 0.23 which
> * is more efficient and simpler than the older BlockReader
> * implementation. It should be renamed to BlockReaderRemote
> * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could rename it as
> the comment suggests: BlockReaderRemote2 => BlockReaderRemote.
[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363549#comment-15363549 ]

Andrew Wang commented on HDFS-10548:
------------------------------------

I don't think there's too much activity on BlockReaderRemote2 these days, so I
think now is as good a time as any to pull the trigger.

Is there a JIRA for removing BRLocalLegacy too? IIRC Windows used to need it
since they didn't support passing the fd via domain socket, but maybe that's
changed.

> Remove the long deprecated BlockReaderRemote
> --------------------------------------------
>
>                 Key: HDFS-10548
>                 URL: https://issues.apache.org/jira/browse/HDFS-10548
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>             Fix For: 3.0.0-alpha1
>
>         Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch,
> HDFS-10548-v3.patch
>
> To lessen the maintenance burden, as raised in HDFS-8901, I suggest we remove
> the {{BlockReaderRemote}} class that was deprecated a very long time ago.
> From the {{BlockReaderRemote}} header:
> {quote}
> * @deprecated this is an old implementation that is being left around
> * in case any issues spring up with the new {@link BlockReaderRemote2}
> * implementation.
> * It will be removed in the next release.
> {quote}
> From the {{BlockReaderRemote2}} class header:
> {quote}
> * This is a new implementation introduced in Hadoop 0.23 which
> * is more efficient and simpler than the older BlockReader
> * implementation. It should be renamed to BlockReaderRemote
> * once we are confident in it.
> {quote}
> Going even further, after getting rid of the old class, we could rename it as
> the comment suggests: BlockReaderRemote2 => BlockReaderRemote.
[jira] [Updated] (HDFS-9809) Abstract implementation-specific details from the datanode
[ https://issues.apache.org/jira/browse/HDFS-9809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virajith Jalaparti updated HDFS-9809: - Attachment: HDFS-9809.003.patch Posting a new patch based on porting the previous changes to the most recent version of trunk. > Abstract implementation-specific details from the datanode > -- > > Key: HDFS-9809 > URL: https://issues.apache.org/jira/browse/HDFS-9809 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode, fs >Reporter: Virajith Jalaparti >Assignee: Virajith Jalaparti > Attachments: HDFS-9809.001.patch, HDFS-9809.002.patch, > HDFS-9809.003.patch > > > Multiple parts of the Datanode (FsVolumeSpi, ReplicaInfo, FSVolumeImpl etc.) > implicitly assume that blocks are stored in java.io.File(s) and that volumes > are divided into directories. We propose to abstract these details, which > would help in supporting other storages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
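The abstraction proposed above can be illustrated with a hedged sketch. The interface and class names below are invented for illustration and are not the names used in the attached patches; the real work refactors FsVolumeSpi, ReplicaInfo, FsVolumeImpl, and friends.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a toy interface showing the idea of hiding
// java.io.File behind a storage-neutral contract, so volumes need not
// be directories and replicas need not be files.
public class StorageAbstraction {

    /** A volume-like container of blocks; nothing here assumes files. */
    interface BlockContainer {
        InputStream openBlock(long blockId) throws IOException;
        long getNumBlocks();
    }

    /** In-memory stand-in for a non-file backend (e.g. an object store). */
    static class InMemoryContainer implements BlockContainer {
        private final Map<Long, byte[]> blocks = new HashMap<>();

        void putBlock(long blockId, byte[] data) {
            blocks.put(blockId, data);
        }

        @Override
        public InputStream openBlock(long blockId) {
            return new ByteArrayInputStream(blocks.get(blockId));
        }

        @Override
        public long getNumBlocks() {
            return blocks.size();
        }
    }

    public static void main(String[] args) throws IOException {
        InMemoryContainer container = new InMemoryContainer();
        container.putBlock(1L, new byte[] {42});
        System.out.println(container.getNumBlocks());        // 1
        System.out.println(container.openBlock(1L).read());  // 42
    }
}
```

A file-based implementation of the same contract would wrap java.io.File exactly as the datanode does today, which is what keeps such a refactoring behavior-preserving.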
[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down
[ https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Rose updated HDFS-10597:
Description:
If hedged reads are enabled, even if there is only a single datanode available, the hedged read loop will respect the ignored-nodes list and never send more than one request, yet it will retry for quite some time choosing a datanode. This is unfortunate: the ignored-nodes list is only ever added to, and never removed from, in the scope of a single request, so a single failed read fails the entire request *or* delays responses.
There's actually a secondary undesirable behavior here too. If a hedged read can't find a datanode, it will delay a successful response considerably. To set the stage, let's say 10ms is the hedged read timeout and we only have a single replica available, that is, nodes=[DN1].
1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}}: the first (non-hedged) read is sent to DN1. In the future, the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send hedged request.
3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the hedged request. As ignoredNodes includes DN1, there are no nodes available, so we re-query the NameNode for block locations and sleep, trying again.
4. [+3000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations and sleep, trying again.
5. [+3000ms+6000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations and sleep, trying again.
6. [+6000ms+9000ms] {{DFSInputStream#getBestNodeDNAddrPair}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations and sleep, trying again.
7.
[27010ms] Control flow is restored to {{DFSInputStream#hedgedFetchBlockByteRange}}, the completion service is polled, and the read that succeeded at [50ms] is returned successfully, except +27000ms late (worst case; the expected value would be half that, given RNG).
This is only one scenario (a happy one). Supposing the first read eventually fails, the DFSClient will still retry inside {{DFSInputStream#hedgedFetchBlockByteRange}} through the same retries before failing.
I've identified one way to fix the behavior, but I'd be interested in thoughts: in {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see whether a node is in the ignored list before allowing it to be returned. Amending this check to short-circuit when there's only a single available node avoids the regrettably useless retries, that is:
{{nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])}}
However, with this change, if there's only one DN available, the hedged request will be sent to it as well. Better behavior would be to fail hedged requests quickly *or* push the waiting work into the hedge pool so that successful, fast reads aren't blocked by this issue.
In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads enabled; stopping a single datanode brings the cluster to a grinding halt.
You can observe this behavior yourself by editing {{TestPread#testMaxOutHedgedReadPool}}'s MiniDFSCluster to have a single datanode.
was: If hedged reads are enabled, even if there is only a single datanode available, the hedged read loop will respect the ignored nodes list and never send more than one request, but retry for quite some time choosing a datanode. This is unfortunate, as the ignored nodes list is only ever added to and never removed from in the scope of a single request, therefore a single failed read fails the entire request *or* delays responses. There's actually a secondary undesirable behavior here too. 
If a hedged read can't find a datanode, it will delay a successful response considerably. To set the stage, lets say 10ms is the hedged read timeout and we only have a single replica available, that is, nodes=[DN1]. 1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read is sent to DN1. In the future, the read takes 50ms to succeed. ignoredNodes=[DN1] 2. [10ms] Poll timeout. Send hedged request 3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the hedged request. As ignoredNodes includes DN1, there are no nodes available and we re-query the NameNode for block locations and sleep, trying again. 4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations and sleep, trying again. 5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations and sleep, trying again. 6. [+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations and sleep, trying
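The proposed short-circuit condition can be sketched in isolation. The method below is a simplified, hypothetical stand-in for {{DFSInputStream#getBestNodeDNAddrPair}}: datanodes are stubbed as plain strings, whereas the real method operates on DatanodeInfo objects and returns a DNAddrPair.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for DFSInputStream#getBestNodeDNAddrPair, showing
// the amended ignored-nodes check from the issue description.
public class HedgedReadCheck {
    /**
     * Returns the first eligible node. The proposed fix short-circuits the
     * ignored-nodes check when only a single replica exists: with one
     * replica there is nothing else to hedge against, so spinning through
     * NameNode refetch/sleep retries buys nothing.
     */
    static String getBestNode(String[] nodes, List<String> ignoredNodes) {
        for (String node : nodes) {
            if (nodes.length == 1 || ignoredNodes == null
                || !ignoredNodes.contains(node)) {
                return node;
            }
        }
        return null; // caller refetches block locations, sleeps, retries
    }

    public static void main(String[] args) {
        // Single replica already in the ignored list: without the
        // short-circuit this returns null and the caller retries for ~27s;
        // with it, DN1 is returned immediately.
        System.out.println(getBestNode(new String[] {"DN1"},
                                       Arrays.asList("DN1")));   // DN1
        // With two replicas the ignored list still works as before.
        System.out.println(getBestNode(new String[] {"DN1", "DN2"},
                                       Arrays.asList("DN1")));   // DN2
    }
}
```

Note the trade-off the reporter points out: with this short-circuit, the hedged request goes to the same datanode the original read is already using.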
[jira] [Created] (HDFS-10598) DiskBalancer does not execute multi-steps plan.
Lei (Eddy) Xu created HDFS-10598:
Summary: DiskBalancer does not execute multi-steps plan.
Key: HDFS-10598
URL: https://issues.apache.org/jira/browse/HDFS-10598
Project: Hadoop HDFS
Issue Type: Sub-task
Components: diskbalancer
Affects Versions: 2.8.0, 3.0.0-beta1
Reporter: Lei (Eddy) Xu
Priority: Critical
I set up a 3-DN cluster, each node with 2 small disks. After creating some files to fill HDFS, I added two more small disks to one DN and ran the diskbalancer on that DataNode. The disk usage before running the diskbalancer:
{code}
/dev/loop0 3.9G 2.1G 1.6G 58% /mnt/data1
/dev/loop1 3.9G 2.6G 1.1G 71% /mnt/data2
/dev/loop2 3.9G 17M 3.6G 1% /mnt/data3
/dev/loop3 3.9G 17M 3.6G 1% /mnt/data4
{code}
However, after running the diskbalancer (i.e., {{-query}} shows {{PLAN_DONE}}):
{code}
/dev/loop0 3.9G 1.2G 2.5G 32% /mnt/data1
/dev/loop1 3.9G 2.6G 1.1G 71% /mnt/data2
/dev/loop2 3.9G 953M 2.7G 26% /mnt/data3
/dev/loop3 3.9G 17M 3.6G 1% /mnt/data4
{code}
Only one step of the plan ran: data3 was balanced but data4 was left untouched. It is suspicious that in {{DiskBalancerMover#copyBlocks}}, every return path calls {{this.setExitFlag}}, which prevents {{copyBlocks()}} from being called multiple times by {{DiskBalancer#executePlan}}.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
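The suspected control-flow problem can be reduced to a toy example. All names below are invented to mirror the description, not copied from the DiskBalancer source; see {{DiskBalancerMover#copyBlocks}} and {{DiskBalancer#executePlan}} for the real code paths.

```java
// Toy reduction of the suspected bug: a per-step copy routine that sets a
// shared exit flag on every return will stop the surrounding plan loop
// after the first step.
public class ExitFlagSketch {
    private static boolean shouldRun;

    private static void copyBlocksStep(int step) {
        // ... move blocks for this step ...
        // Suspicious pattern: the flag is set on *every* return, not only
        // on error paths, so it also fires after a successful step.
        shouldRun = false;
    }

    /** Drives an n-step plan; returns how many steps actually executed. */
    static int runPlan(int steps) {
        shouldRun = true;
        int executed = 0;
        for (int step = 0; step < steps; step++) {
            if (!shouldRun) {
                break; // remaining steps are silently skipped
            }
            copyBlocksStep(step);
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(runPlan(3)); // 1 -- only the first step runs
    }
}
```

This matches the reported symptom: one disk (data3) received blocks while the other new disk (data4) was never touched, even though {{-query}} reported {{PLAN_DONE}}.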
[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10579: Fix Version/s: (was: 3.0.0-alpha1) 2.9.0 > HDFS web interfaces lack configs for X-FRAME-OPTIONS protection > --- > > Key: HDFS-10579 > URL: https://issues.apache.org/jira/browse/HDFS-10579 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 2.9.0 > > > This JIRA proposes to extend the work done in HADOOP-12964 and enable a > configuration value that enables or disables that option. This JIRA will also > add an ability to pick the right x-frame-option, since right now it looks > like we have hardcoded that to SAMEORIGIN. > This allows HDFS to remain backward compatible as required by the branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10579: Target Version/s: 2.9.0 (was: 3.0.0-alpha1) > HDFS web interfaces lack configs for X-FRAME-OPTIONS protection > --- > > Key: HDFS-10579 > URL: https://issues.apache.org/jira/browse/HDFS-10579 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 3.0.0-alpha1 > > > This JIRA proposes to extend the work done in HADOOP-12964 and enable a > configuration value that enables or disables that option. This JIRA will also > add an ability to pick the right x-frame-option, since right now it looks > like we have hardcoded that to SAMEORIGIN. > This allows HDFS to remain backward compatible as required by the branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363348#comment-15363348 ] Anu Engineer commented on HDFS-10579:
- [~rkanter] [~haibochen] Tagging both of you to make sure this JIRA gets noticed. I will post a patch soon and would appreciate any feedback you might have.
> HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
> Key: HDFS-10579
> URL: https://issues.apache.org/jira/browse/HDFS-10579
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 3.0.0-alpha1
> Reporter: Anu Engineer
> Assignee: Anu Engineer
> Fix For: 3.0.0-alpha1
>
> This JIRA proposes to extend the work done in HADOOP-12964 and enable a configuration value that enables or disables that option. This JIRA will also add an ability to pick the right x-frame-option, since right now it looks like we have hardcoded that to SAMEORIGIN.
> This allows HDFS to remain backward compatible as required by branch-2.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
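A sketch of the configurable behavior being proposed: an on/off switch plus a selectable header value instead of a hardcoded SAMEORIGIN. The property names "dfs.xframe.enabled" and "dfs.xframe.value" below are illustrative placeholders, not a statement of what the patch will ship.

```java
import java.util.Map;

// Hypothetical sketch of picking the X-FRAME-OPTIONS header value from
// configuration, instead of hardcoding SAMEORIGIN as HADOOP-12964 does.
public class XFrameOption {
    /** Returns the X-FRAME-OPTIONS value to emit, or null to omit the header. */
    static String headerValue(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(
            conf.getOrDefault("dfs.xframe.enabled", "true"));
        if (!enabled) {
            return null; // backward-compatible: no header at all
        }
        String value = conf.getOrDefault("dfs.xframe.value", "SAMEORIGIN");
        // RFC 7034 defines exactly three forms of the header value.
        if (value.equals("DENY") || value.equals("SAMEORIGIN")
            || value.startsWith("ALLOW-FROM ")) {
            return value;
        }
        throw new IllegalArgumentException(
            "Invalid X-FRAME-OPTIONS value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(headerValue(Map.of()));                              // SAMEORIGIN
        System.out.println(headerValue(Map.of("dfs.xframe.value", "DENY")));    // DENY
        System.out.println(headerValue(Map.of("dfs.xframe.enabled", "false"))); // null
    }
}
```

Keeping the disabled case header-free is what preserves backward compatibility for branch-2, as the description requires.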
[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10579: Description: This JIRA proposes to extend the work done in HADOOP-12964 and enable a configuration value that enables or disables that option. This JIRA will also add an ability to pick the right x-frame-option, since right now it looks like we have hardcoded that to SAMEORIGIN. This allows HDFS to remain backward compatible as required by the branch-2. was:This JIRA proposes to extend the work done in HADOOP-12964 and enable a configuration value that enables or disables that option. This JIRA will also add an ability to pick the right x-fram > HDFS web interfaces lack configs for X-FRAME-OPTIONS protection > --- > > Key: HDFS-10579 > URL: https://issues.apache.org/jira/browse/HDFS-10579 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 3.0.0-alpha1 > > > This JIRA proposes to extend the work done in HADOOP-12964 and enable a > configuration value that enables or disables that option. This JIRA will also > add an ability to pick the right x-frame-option, since right now it looks > like we have hardcoded that to SAMEORIGIN. > This allows HDFS to remain backward compatible as required by the branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for XFS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10579: Description: This JIRA proposes to extend the work done in HADOOP-12964 and enable a configuration value that enables or disables that option. This JIRA will also add an ability to pick the right x-fram > HDFS web interfaces lack configs for XFS protection > --- > > Key: HDFS-10579 > URL: https://issues.apache.org/jira/browse/HDFS-10579 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 3.0.0-alpha1 > > > This JIRA proposes to extend the work done in HADOOP-12964 and enable a > configuration value that enables or disables that option. This JIRA will also > add an ability to pick the right x-fram -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for X-FRAME-OPTIONS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10579: Summary: HDFS web interfaces lack configs for X-FRAME-OPTIONS protection (was: HDFS web interfaces lack configs for XFS protection) > HDFS web interfaces lack configs for X-FRAME-OPTIONS protection > --- > > Key: HDFS-10579 > URL: https://issues.apache.org/jira/browse/HDFS-10579 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 3.0.0-alpha1 > > > This JIRA proposes to extend the work done in HADOOP-12964 and enable a > configuration value that enables or disables that option. This JIRA will also > add an ability to pick the right x-fram -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted
[ https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363341#comment-15363341 ] Hadoop QA commented on HDFS-10596: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 7s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 17s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 5s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 7s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} javac {color} | {color:green} 6m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 2s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 31s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 17s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed CTEST tests | test_libhdfs_threaded_hdfspp_test_shim_static | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816305/HDFS-10596.HDFS-8707.000.patch | | JIRA Issue | HDFS-10596 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux b12be425e4ac 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / d643d8c | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | CTEST | https://builds.apache.org/job/PreCommit-HDFS-Build/15987/artifact/patchprocess/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_91-ctest.txt | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15987/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15987/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++: Implement hdfsFileIsEncrypted > > > Key: HDFS-10596 >
[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down
[ https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Rose updated HDFS-10597: Affects Version/s: 2.4.0 2.5.0 > DFSClient hangs if using hedged reads and all but one eligible replica is > down > --- > > Key: HDFS-10597 > URL: https://issues.apache.org/jira/browse/HDFS-10597 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0 >Reporter: Michael Rose > > If hedged reads are enabled, even if there is only a single datanode > available, the hedged read loop will respect the ignored nodes list and never > send more than one request, but retry for quite some time choosing a datanode. > This is unfortunate, as the ignored nodes list is only ever added to and > never removed from in the scope of a single request, therefore a single > failed read fails the entire request *or* delays responses. > There's actually a secondary undesirable behavior here too. If a hedged read > can't find a datanode, it will delay a successful response considerably. To > set the stage, lets say 10ms is the hedged read timeout and we only have a > single replica available, that is, nodes=[DN1]. > 1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}} First (not-hedged) read > is sent to DN1. In the future, the read takes 50ms to succeed. > ignoredNodes=[DN1] > 2. [10ms] Poll timeout. Send hedged request > 3. [10ms] {{DFSInputStream#chooseDataNode}} is called to find a node for the > hedged request. As ignoredNodes includes DN1, there are no nodes available > and we re-query the NameNode for block locations and sleep, trying again. > 4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes > includes DN1, we re-query the NameNode for block locations and sleep, trying > again. > 5. [+3000+6000ms] {{DFSInputStream#chooseDataNode}} is called. As > ignoredNodes includes DN1, we re-query the NameNode for block locations and > sleep, trying again. > 6. 
[+6000ms+9000ms] {{DFSInputStream#chooseDataNode}} is called. As > ignoredNodes includes DN1, we re-query the NameNode for block locations and > sleep, trying again. > 7. [27010ms] Control flow is restored to > {{DFSInputStream#hedgedFetchBlockByteRange}}, the completion service is polled, > and the read that succeeded at [50ms] is returned successfully, but +27000ms > late (worst case; the expected extra delay would be about half that). > This is only one scenario (a happy scenario). Supposing that the first read > eventually fails, the DFSClient will still retry inside > {{DFSInputStream#hedgedFetchBlockByteRange}} through the same retry sequence before > failing. > I've identified one way to fix the behavior, but I'd be interested in > thoughts: in {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is > in the ignored list before allowing it to be returned. Amending this check to > short-circuit if there's only a single available node avoids the regrettably > useless retries, that is: > {{nodes.length == 1 || ignoredNodes == null || > !ignoredNodes.contains(nodes[i])}} > However, with this change, if there's only one DN available, it'll send the > hedged request to it as well. Better behavior would be to fail hedged > requests quickly *or* push the waiting work into the hedge pool so that > successful, fast reads aren't blocked by this issue. > In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads > enabled; stopping a single datanode leads to the cluster coming to a grinding > halt. > You can observe this behavior yourself by editing > {{TestPread#testMaxOutHedgedReadPool}}'s MiniDFSCluster to have a single > datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
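The poll-then-hedge pattern from the timeline above (first read sent, poll with the hedge timeout, hedged request only on timeout) can be sketched with the JDK's {{ExecutorCompletionService}}. This is a hypothetical simplification, not the actual DFSInputStream code: in the real client the hedged branch also calls {{chooseDataNode}}, which is where the ~27s of retries described here accumulate.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch of a hedged read: submit the primary read, wait up to the hedge
// timeout, then submit a second (hedged) read and take whichever finishes
// first. Callable bodies stand in for real block reads.
public class HedgedPoll {
    static String read(ExecutorService pool, Callable<String> first,
                       Callable<String> hedge, long hedgeTimeoutMs)
            throws Exception {
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        cs.submit(first);                               // step 1: primary read
        Future<String> done = cs.poll(hedgeTimeoutMs, TimeUnit.MILLISECONDS);
        if (done == null) {                             // step 2: poll timed out
            cs.submit(hedge);                           // send the hedged request
            done = cs.take();                           // first completion wins
        }
        return done.get();
    }
}
```

If {{chooseDataNode}} blocks inside this loop (as in the issue), the hedged submit never happens and even a successful primary read is only observed after the retries finish, which is the delay the timeline documents.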
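The amended ignore-list check proposed above can be sketched in isolation. This is a hypothetical simplification (plain strings instead of {{DatanodeInfo}}, a bare loop instead of the real {{getBestNodeDNAddrPair}} body); only the condition itself comes from the proposal:

```java
import java.util.Set;

// Sketch of the proposed short-circuit: when only one replica exists,
// honoring the ignored-nodes list can only spin through useless retries,
// so a single node is returned even if it is in the ignored list.
public class BestNodeCheck {
    /** Returns the first node passing the amended check, or null if none. */
    static String chooseNode(String[] nodes, Set<String> ignoredNodes) {
        for (String node : nodes) {
            if (nodes.length == 1 || ignoredNodes == null
                    || !ignoredNodes.contains(node)) {
                return node;
            }
        }
        return null;
    }
}
```

As the issue notes, this trades the retry loop for a second request to the same DN; failing the hedged request fast (or parking the wait in the hedge pool) would avoid both problems.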
[jira] [Updated] (HDFS-10579) HDFS web interfaces lack configs for XFS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10579: Summary: HDFS web interfaces lack configs for XFS protection (was: HDFS web interfaces lack XFS protection) > HDFS web interfaces lack configs for XFS protection > --- > > Key: HDFS-10579 > URL: https://issues.apache.org/jira/browse/HDFS-10579 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 3.0.0-alpha1
[jira] [Updated] (HDFS-10579) HDFS web interfaces lack XFS protection
[ https://issues.apache.org/jira/browse/HDFS-10579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-10579: Description: (was: The web interfaces of Namenode and Datanode do not protect against XFS attacks. A filter was added in hadoop common (HADOOP-13008) to prevent XFS attacks. This JIRA proposes to use that filter to protect namenode and datanode web UI.) > HDFS web interfaces lack XFS protection > --- > > Key: HDFS-10579 > URL: https://issues.apache.org/jira/browse/HDFS-10579 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: 3.0.0-alpha1
[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down
[ https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Rose updated HDFS-10597: Description: If hedged reads are enabled, even if there is only a single datanode available, the hedged read loop will respect the ignored nodes list and never send more than one request, but it will retry for quite some time while choosing a datanode. This is unfortunate, as the ignored nodes list is only ever added to and never removed from within the scope of a single request; therefore a single failed read fails the entire request *or* delays responses. There's actually a secondary undesirable behavior here too. If a hedged read can't find a datanode, it will delay a successful response considerably. To set the stage, let's say 10ms is the hedged read timeout and we only have a single replica available, that is, nodes=[DN1].
1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}}: the first (non-hedged) read is sent to DN1. In the future, the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [10ms] Poll timeout. Send a hedged request.
3. [10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
4. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
5. [+3000 to 6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
6. [+6000 to 9000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
7. [27010ms] Control flow is restored to {{DFSInputStream#hedgedFetchBlockByteRange}}; the completion service is polled and the read that succeeded at [50ms] is returned successfully, except ~27000ms late (worst case; the expected value would be about half that). This is only one scenario (a happy scenario).
Supposing that the first read eventually fails, the DFSClient will still retry inside {{DFSInputStream#hedgedFetchBlockByteRange}} for the same number of retries before failing. I've identified one way to fix the behavior, but I'd be interested in thoughts: in {{DFSInputStream#getBestNodeDNAddrPair}}, there's a check to see if a node is in the ignored list before allowing it to be returned. Amending this check to short-circuit when there's only a single available node avoids the regrettably useless retries, that is: {{nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])}}. However, with this change, if there's only one DN available, the hedged request will be sent to it as well. Better behavior would be to fail hedged requests quickly *or* push the waiting work into the hedge pool so that successful, fast reads aren't blocked by this issue. In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads enabled; stopping a single datanode leads to the cluster coming to a grinding halt. You can observe this behavior yourself by editing TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.
was: If hedged reads are enabled, even if there is only a single datanode available, the hedged read loop will respect the ignored nodes list and never send more than one request, but it will retry for quite some time while choosing a datanode. This is unfortunate, as the ignored nodes list is only ever added to and never removed from within the scope of a single request; therefore a single failed read fails the entire request *or* delays responses. There's actually a secondary undesirable behavior here too. To set the stage, let's say 10ms is the hedged read timeout and we only have a single replica available. If a hedged read can't find a datanode, it will delay a successful response considerably.
1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}}: the first (non-hedged) read is sent to DN1; the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
3. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
4. [+3000 to 6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
5. [+6000 to 9000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
6. [27010ms] Control flow is restored to {{DFSInputStream#hedgedFetchBlockByteRange}}; the completion service is polled and the read that succeeded at [50ms] is returned successfully, except ~27000ms late (worst case; the expected value would be about half that). This is only one scenario (a happy scenario).
[jira] [Updated] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down
[ https://issues.apache.org/jira/browse/HDFS-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Rose updated HDFS-10597: Description: If hedged reads are enabled, even if there is only a single datanode available, the hedged read loop will respect the ignored nodes list and never send more than one request, but it will retry for quite some time while choosing a datanode. This is unfortunate, as the ignored nodes list is only ever added to and never removed from within the scope of a single request; therefore a single failed read fails the entire request *or* delays responses. There's actually a secondary undesirable behavior here too. To set the stage, let's say 10ms is the hedged read timeout and we only have a single replica available. If a hedged read can't find a datanode, it will delay a successful response considerably.
1. [0ms] {{DFSInputStream#hedgedFetchBlockByteRange}}: the first (non-hedged) read is sent to DN1; the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
3. [+3000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
4. [+3000 to 6000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
5. [+6000 to 9000ms] {{DFSInputStream#chooseDataNode}} is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
6. [27010ms] Control flow is restored to {{DFSInputStream#hedgedFetchBlockByteRange}}; the completion service is polled and the read that succeeded at [50ms] is returned successfully, except ~27000ms late (worst case; the expected value would be about half that). This is only one scenario (a happy scenario).
Supposing that the first read eventually fails, the DFSClient will still retry inside `DFSInputStream#hedgedFetchBlockByteRange` for the same number of retries before failing. I've identified one way to fix the behavior, but I'd be interested in thoughts: in `DFSInputStream#getBestNodeDNAddrPair`, there's a check to see if a node is in the ignored list before allowing it to be returned. Amending this check to short-circuit when there's only a single available node avoids the regrettably useless retries, that is: `nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])`. However, with this change, if there's only one DN available, the hedged request will be sent to it as well. Better behavior would be to fail hedged requests quickly *or* push the waiting work into the hedge pool so that successful, fast reads aren't blocked by this issue. In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads enabled; stopping a single datanode leads to the cluster coming to a grinding halt. You can observe this behavior yourself by editing TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode.
was: If hedged reads are enabled, even if there is only a single datanode available, the hedged read loop will respect the ignored nodes list and never send more than one request, but it will retry for quite some time while choosing a datanode. This is unfortunate, as the ignored nodes list is only ever added to and never removed from within the scope of a single request; therefore a single failed read fails the entire request *or* delays responses. There's actually a secondary undesirable behavior here too. To set the stage, let's say 10ms is the hedged read timeout and we only have a single replica available. If a hedged read can't find a datanode, it will delay a successful response considerably.
1. [0ms] `DFSInputStream#hedgedFetchBlockByteRange`: the first (non-hedged) read is sent to DN1; the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
3. [+3000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
4. [+3000 to 6000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
5. [+6000 to 9000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
6. [27010ms] Control flow is restored to `DFSInputStream#hedgedFetchBlockByteRange`; the completion service is polled and the read that succeeded at [50ms] is returned successfully, except ~27000ms late (worst case; the expected value would be about half that). This is only one scenario (a happy scenario). Supposing that the first read eventually fails, the DFSClient will still retry inside `DFSInputStream#hedgedFetchBlockByteRange` for the same number of retries before failing.
[jira] [Created] (HDFS-10597) DFSClient hangs if using hedged reads and all but one eligible replica is down
Michael Rose created HDFS-10597:
---
Summary: DFSClient hangs if using hedged reads and all but one eligible replica is down
Key: HDFS-10597
URL: https://issues.apache.org/jira/browse/HDFS-10597
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.7.0, 2.6.0
Reporter: Michael Rose

If hedged reads are enabled, even if there is only a single datanode available, the hedged read loop will respect the ignored nodes list and never send more than one request, but it will retry for quite some time while choosing a datanode. This is unfortunate, as the ignored nodes list is only ever added to and never removed from within the scope of a single request; therefore a single failed read fails the entire request *or* delays responses. There's actually a secondary undesirable behavior here too. To set the stage, let's say 10ms is the hedged read timeout and we only have a single replica available. If a hedged read can't find a datanode, it will delay a successful response considerably.
1. [0ms] `DFSInputStream#hedgedFetchBlockByteRange`: the first (non-hedged) read is sent to DN1; the read takes 50ms to succeed. ignoredNodes=[DN1]
2. [+10ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
3. [+3000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
4. [+3000 to 6000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
5. [+6000 to 9000ms] `DFSInputStream#chooseDataNode` is called. As ignoredNodes includes DN1, we re-query the NameNode for block locations, sleep, and try again.
6. [27010ms] Control flow is restored to `DFSInputStream#hedgedFetchBlockByteRange`; the completion service is polled and the read that succeeded at [50ms] is returned successfully, except ~27000ms late (worst case; the expected value would be about half that). This is only one scenario (a happy scenario). Supposing that the first read eventually fails, the DFSClient will still retry inside of `DFSInputStream#hedgedFetchBlockByteRange` for the same number of retries before failing. I've identified one way to fix the behavior, but I'd be interested in thoughts: in `DFSInputStream#getBestNodeDNAddrPair`, there's a check to see if a node is in the ignored list before allowing it to be returned. Amending this check to short-circuit when there's only a single available node avoids the regrettably useless retries, that is: `nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(nodes[i])`. However, with this change, if there's only one DN available, the hedged request will be sent to it as well. Better behavior would be to fail hedged requests quickly *or* push the waiting work into the hedge pool so that successful, fast reads aren't blocked by this issue. In our situation, we run an HBase cluster with HDFS RF=2 and hedged reads enabled; stopping a single datanode leads to the cluster coming to a grinding halt. You can observe this behavior yourself by editing TestPread#testMaxOutHedgedReadPool's MiniDFSCluster to have a single datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
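The short-circuit condition proposed above can be illustrated with a standalone sketch. This is a hypothetical, simplified form of the node-selection check; the real logic lives inside `DFSInputStream#getBestNodeDNAddrPair` and operates on `DatanodeInfo` objects rather than strings:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/**
 * Simplified sketch of the amended node-selection check: with a single
 * replica, bypass the ignored list instead of spinning on NameNode
 * re-queries. Names here are illustrative, not from the HDFS codebase.
 */
public class NodeSelectionSketch {
    static String chooseNode(String[] nodes, Collection<String> ignoredNodes) {
        for (String node : nodes) {
            // Proposed condition: nodes.length == 1 short-circuits the
            // ignored-list lookup when there is nowhere else to go.
            if (nodes.length == 1 || ignoredNodes == null || !ignoredNodes.contains(node)) {
                return node;
            }
        }
        return null; // no eligible node found
    }

    public static void main(String[] args) {
        List<String> ignored = Arrays.asList("DN1");
        // Single replica: DN1 is returned even though it is on the ignored list.
        System.out.println(chooseNode(new String[] {"DN1"}, ignored));
        // Two replicas: the ignored list still filters DN1 out, so DN2 wins.
        System.out.println(chooseNode(new String[] {"DN1", "DN2"}, ignored));
    }
}
```

As the discussion notes, this trades the useless retries for a hedged request aimed at the same datanode as the original read.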
[jira] [Commented] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted
[ https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363263#comment-15363263 ] Anatoli Shein commented on HDFS-10596: -- In order to test this function we need an encryption zone in HDFS, and to set one up we need a key provider service (KMS) running. To get the KMS server to run, I made the following modifications to the config files:
/etc/hadoop/kms-site.xml:
  <property>
    <name>hadoop.kms.key.provider.uri</name>
    <value>jceks://file@/${user.home}/kms.keystore</value>
    <description>URI of the backing KeyProvider for the KMS.</description>
  </property>
  <property>
    <name>hadoop.security.keystore.java-keystore-provider.password-file</name>
    <value>kms.keystore.password</value>
    <description>If using the JavaKeyStoreProvider, the password for the keystore file.</description>
  </property>
/etc/hadoop/core-site.xml:
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
    <description>Path to KeyProvider for the KMS.</description>
  </property>
Then I needed to create a password file like this:
  touch .../hadoop-2.6.0/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/classes/kms.keystore.password
After that I was able to start/stop the KMS service from the .../hadoop-2.6.0/sbin directory like this:
  ./kms.sh start
  ./kms.sh stop
Then I created a new encryption key:
  hadoop key create myKey
And was able to list it:
  hadoop key list -provider jceks://file@/home/anatoli/kms.keystore -metadata
I created a new directory:
  hadoop fs -mkdir hdfs://localhost.localdomain:9433/zone
However, I cannot create the zone. This is the command I am trying:
  hdfs crypto -createZone -keyName myKey -path hdfs://localhost.localdomain:9433/zone
And I get this error:
  16/07/05 17:12:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  RemoteException: Can't create an encryption zone for /zone since no key provider is available.
Not sure how to work around this. Does anyone have any ideas?
> libhdfs++: Implement hdfsFileIsEncrypted > > > Key: HDFS-10596 > URL: https://issues.apache.org/jira/browse/HDFS-10596 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein > Attachments: HDFS-10596.HDFS-8707.000.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted
[ https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-10596: - Attachment: HDFS-10596.HDFS-8707.000.patch Initial patch that adds file encryption fields to the statinfo struct, and population of these fields in namenode_operations. > libhdfs++: Implement hdfsFileIsEncrypted > > > Key: HDFS-10596 > URL: https://issues.apache.org/jira/browse/HDFS-10596 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein > Attachments: HDFS-10596.HDFS-8707.000.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted
[ https://issues.apache.org/jira/browse/HDFS-10596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-10596: - Status: Patch Available (was: Open) > libhdfs++: Implement hdfsFileIsEncrypted > > > Key: HDFS-10596 > URL: https://issues.apache.org/jira/browse/HDFS-10596 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Anatoli Shein > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9890) libhdfs++: Add test suite to simulate network issues
[ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363232#comment-15363232 ] Hadoop QA commented on HDFS-9890: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 21s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 20s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 22s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 16s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 14s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} javac {color} | {color:green} 5m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 18s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 57s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed CTEST tests | test_libhdfs_threaded_hdfspp_test_shim_static | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816287/HDFS-9890.HDFS-8707.014.patch | | JIRA Issue | HDFS-9890 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 3ef6412d6929 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / d643d8c | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | CTEST | https://builds.apache.org/job/PreCommit-HDFS-Build/15986/artifact/patchprocess/patch-hadoop-hdfs-project_hadoop-hdfs-native-client-jdk1.8.0_91-ctest.txt | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15986/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15986/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++: Add test suite to simulate network issues > > > Key: HDFS-9890
[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary
[ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363214#comment-15363214 ] Colin Patrick McCabe commented on HDFS-10543: - One approach would be to try checking the behavior of the Java client and seeing if you can do something similar. It is not incorrect to avoid short reads, just potentially inefficient. > hdfsRead read stops at block boundary > - > > Key: HDFS-10543 > URL: https://issues.apache.org/jira/browse/HDFS-10543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu > Fix For: HDFS-8707 > > Attachments: HDFS-10543.HDFS-8707.000.patch, > HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, > HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch > > > Reproducer: > char *buf2 = new char[file_info->mSize]; > memset(buf2, 0, (size_t)file_info->mSize); > int ret = hdfsRead(fs, file, buf2, file_info->mSize); > delete [] buf2; > if(ret != file_info->mSize) { > std::stringstream ss; > ss << "tried to read " << file_info->mSize << " bytes. but read " << > ret << " bytes"; > ReportError(ss.str()); > hdfsCloseFile(fs, file); > continue; > } > When it runs with a file ~1.4GB large, it will return an error like "tried to > read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs > against has a block size of 134217728 bytes. So it seems hdfsRead will stop > at a block boundary. Looks like a regression. We should add retry to continue > reading cross blocks in case of files w/ multiple blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
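The suggestion to mirror the Java client amounts to wrapping the short-returning read in a read-fully loop that keeps reading until the buffer is full or EOF. A minimal sketch under that assumption, with a `ByteArrayInputStream` standing in for an `hdfsRead`-style source that may return fewer bytes than requested (the artificial 4-byte cap plays the role of the 134217728-byte block boundary):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Read-fully loop: keep issuing reads until the buffer is full or EOF,
 * which is how the Java DFSClient continues across block boundaries.
 * The capped read size simulates hdfsRead stopping at a block boundary.
 */
public class ReadFullySketch {
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            // Each call may stop early; cap at 4 bytes to force short reads.
            int n = in.read(buf, total, Math.min(4, buf.length - total));
            if (n < 0) break; // EOF: return what we have
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "spans-two-blocks".getBytes();
        byte[] buf = new byte[data.length];
        int n = readFully(new ByteArrayInputStream(data), buf);
        System.out.println(n + " " + new String(buf, 0, n));
    }
}
```

Avoiding short reads this way is, as noted in the comment, a correctness-neutral choice; it just trades extra calls for a simpler caller contract.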
[jira] [Commented] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363178#comment-15363178 ] Hadoop QA commented on HDFS-9271: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 36s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 48s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 17s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 25s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} javac {color} | {color:green} 5m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 21s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 51s{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 31s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816278/HDFS-9271.HDFS-8707.002.patch | | JIRA Issue | HDFS-9271 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux 02dc3c2befae 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / d643d8c | | Default Java | 1.7.0_101 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_91 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_101 | | JDK v1.7.0_101 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15985/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15985/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Implement basic NN operations > - > > Key: HDFS-9271 > URL: https://issues.apache.org/jira/browse/HDFS-9271 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Anatoli Shein > Attachments: HDFS-9271.HDFS-8707.000.patch, >
[jira] [Created] (HDFS-10596) libhdfs++: Implement hdfsFileIsEncrypted
Anatoli Shein created HDFS-10596:
Summary: libhdfs++: Implement hdfsFileIsEncrypted
Key: HDFS-10596
URL: https://issues.apache.org/jira/browse/HDFS-10596
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Anatoli Shein
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9890) libhdfs++: Add test suite to simulate network issues
[ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaowei Zhu updated HDFS-9890: -- Attachment: HDFS-9890.HDFS-8707.014.patch HDFS-9890.HDFS-8707.014.patch removes the debug build flag in pom.xml. It also fixes the whitespace issue reported in the previous patch. > libhdfs++: Add test suite to simulate network issues > > > Key: HDFS-9890 > URL: https://issues.apache.org/jira/browse/HDFS-9890 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Xiaowei Zhu > Attachments: HDFS-9890.HDFS-8707.000.patch, > HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch, > HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch, > HDFS-9890.HDFS-8707.005.patch, HDFS-9890.HDFS-8707.006.patch, > HDFS-9890.HDFS-8707.007.patch, HDFS-9890.HDFS-8707.008.patch, > HDFS-9890.HDFS-8707.009.patch, HDFS-9890.HDFS-8707.010.patch, > HDFS-9890.HDFS-8707.011.patch, HDFS-9890.HDFS-8707.012.patch, > HDFS-9890.HDFS-8707.012.patch, HDFS-9890.HDFS-8707.013.patch, > HDFS-9890.HDFS-8707.013.patch, HDFS-9890.HDFS-8707.014.patch, > hs_err_pid26832.log, hs_err_pid4944.log > > > I propose adding a test suite to simulate various network issues/failures in > order to get good test coverage on some of the retry paths that aren't easy > to hit in mock unit tests. > At the moment the only things that hit the retry paths are the gmock unit > tests. The gmock are only as good as their mock implementations which do a > great job of simulating protocol correctness but not more complex > interactions. They also can't really simulate the types of lock contention > and subtle memory stomps that show up while doing hundreds or thousands of > concurrent reads. We should add a new minidfscluster test that focuses on > heavy read/seek load and then randomly convert error codes returned by > network functions into errors. 
> List of things to simulate (while heavily loaded), roughly in order of how > badly I think they need to be tested at the moment: > -Rpc connection disconnect > -Rpc connection slowed down enough to cause a timeout and trigger retry > -DN connection disconnect -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
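To make the proposal concrete, here is a minimal, self-contained sketch (in Java for brevity; it is not the actual libhdfs++ patch, and the class and method names are illustrative) of the kind of fault injection described above: a stream wrapper that randomly converts successful reads into errors, seeded so failures are reproducible.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;

// Illustrative only: wraps any InputStream and injects IOExceptions at a
// configurable rate, the way the proposed suite would turn network return
// codes into errors while the cluster is under heavy read/seek load.
class FaultyInputStream extends InputStream {
    private final InputStream delegate;
    private final Random rng;         // seeded for reproducible failures
    private final double failureRate; // probability that a read fails

    FaultyInputStream(InputStream delegate, long seed, double failureRate) {
        this.delegate = delegate;
        this.rng = new Random(seed);
        this.failureRate = failureRate;
    }

    @Override
    public int read() throws IOException {
        if (rng.nextDouble() < failureRate) {
            throw new IOException("injected network failure");
        }
        return delegate.read();
    }
}
```

With failureRate 0.0 the wrapper is transparent; at 1.0 every read fails, which exercises the retry path deterministically.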
[jira] [Updated] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-9271: Attachment: HDFS-9271.HDFS-8707.002.patch Patch attached, please review. This patch implements hdfsAvailable, hdfsFileIsOpenForWrite, hdfsExists, hdfsGetDefaultBlockSizeAtPath, hdfsSetReplication, hdfsGetWorkingDirectory, hdfsSetWorkingDirectory, hdfsGetHosts, hdfsFreeHosts, hdfsUtime, hdfsFileGetReadStatistics, hdfsFileClearReadStatistics, hdfsFileFreeReadStatistics, and hdfsReadStatisticsGetRemoteBytesRead, along with small consistency fixes in hdfsCreateDirectory and hdfsGetBlockLocations. > Implement basic NN operations > - > > Key: HDFS-9271 > URL: https://issues.apache.org/jira/browse/HDFS-9271 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Anatoli Shein > Attachments: HDFS-9271.HDFS-8707.000.patch, > HDFS-9271.HDFS-8707.001.patch, HDFS-9271.HDFS-8707.002.patch > > > Expose via C and C++ API: > * mkdirs > * rename > * delete > * stat > * chmod > * chown > * getListing > * setOwner
[jira] [Updated] (HDFS-9271) Implement basic NN operations
[ https://issues.apache.org/jira/browse/HDFS-9271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-9271: Status: Patch Available (was: Open) > Implement basic NN operations > - > > Key: HDFS-9271 > URL: https://issues.apache.org/jira/browse/HDFS-9271 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Bob Hansen >Assignee: Anatoli Shein > Attachments: HDFS-9271.HDFS-8707.000.patch, > HDFS-9271.HDFS-8707.001.patch > > > Expose via C and C++ API: > * mkdirs > * rename > * delete > * stat > * chmod > * chown > * getListing > * setOwner -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary
[ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363033#comment-15363033 ] Xiaowei Zhu commented on HDFS-10543: Thanks, Colin, for the review. I believe this patch still supports short reads; it just won't automatically stop and return at a block boundary. Obviously the test should not log something that looks like an error, which should be fixed. So is stopping the read at a block boundary by design? In that case we should also revert this commit. > hdfsRead read stops at block boundary > - > > Key: HDFS-10543 > URL: https://issues.apache.org/jira/browse/HDFS-10543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu > Fix For: HDFS-8707 > > Attachments: HDFS-10543.HDFS-8707.000.patch, > HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, > HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch > > > Reproducer: > char *buf2 = new char[file_info->mSize]; > memset(buf2, 0, (size_t)file_info->mSize); > int ret = hdfsRead(fs, file, buf2, file_info->mSize); > delete [] buf2; > if(ret != file_info->mSize) { > std::stringstream ss; > ss << "tried to read " << file_info->mSize << " bytes. but read " << > ret << " bytes"; > ReportError(ss.str()); > hdfsCloseFile(fs, file); > continue; > } > When it runs with a ~1.4 GB file, it will return an error like "tried to > read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs > against has a block size of 134217728 bytes. So it seems hdfsRead will stop > at a block boundary, which looks like a regression. We should add a retry to > continue reading across blocks for files with multiple blocks.
[jira] [Commented] (HDFS-9890) libhdfs++: Add test suite to simulate network issues
[ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363030#comment-15363030 ] Hadoop QA commented on HDFS-9890: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 29s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 28s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 21s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 17s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 39s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 47s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 7s{color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6m 3s{color} | {color:red} hadoop-hdfs-native-client in the patch failed with JDK v1.7.0_101. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed CTEST tests | test_libhdfs_threaded_hdfspp_test_shim_static | | | test_libhdfs_mini_stress_hdfspp_test_shim_static | | JDK v1.7.0_101 Failed CTEST tests | test_libhdfs_mini_stress_hdfspp_test_shim_static | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816253/HDFS-9890.HDFS-8707.013.patch | | JIRA Issue | HDFS-9890 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit xml cc | | uname | Linux 50584004b9cb 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (HDFS-10555) Unable to loadFSEdits due to a failure in readCachePoolInfo
[ https://issues.apache.org/jira/browse/HDFS-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363005#comment-15363005 ] Colin Patrick McCabe commented on HDFS-10555: - Thanks, [~umamaheswararao], [~jingzhao], and [~kihwal]. > Unable to loadFSEdits due to a failure in readCachePoolInfo > --- > > Key: HDFS-10555 > URL: https://issues.apache.org/jira/browse/HDFS-10555 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, namenode >Affects Versions: 2.9.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Critical > Fix For: 2.9.0 > > Attachments: HDFS-10555-00.patch > > > Recently some tests have been failing, unable to loadFSEdits due to a failure in > readCachePoolInfo. Here is the code in question, from > FSImageSerialization.java: > {code} > } > if ((flags & ~0x2F) != 0) { > throw new IOException("Unknown flag in CachePoolInfo: " + flags); > } > {code} > When all of the CachePoolInfo flag bits are set, flags & ~0x2F is non-zero, so > this check fails. The failure comes from adding the 0x20 flag while the mask > only changed from ~0x1F to ~0x2F. To fix this issue, we can change the mask > to ~0x3F.
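The arithmetic behind the fix can be checked in isolation. A hedged sketch (the flag constant is a stand-in for the six serialized CachePoolInfo fields, not Hadoop's actual constants):

```java
// Six flag bits, mirroring the six serialized CachePoolInfo fields
// (illustrative values, not Hadoop's actual constants).
class CachePoolFlagCheck {
    static final int ALL_SIX_FLAGS = 0x3F; // 0x01|0x02|0x04|0x08|0x10|0x20

    // The readCachePoolInfo-style guard: reject any bit outside knownMask.
    static boolean hasUnknownFlags(int flags, int knownMask) {
        return (flags & ~knownMask) != 0;
    }
}
```

With the old mask 0x2F, setting all six flags leaves 0x3F & ~0x2F = 0x10, so the guard spuriously fires; with mask 0x3F it passes.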
[jira] [Commented] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363004#comment-15363004 ] Hadoop QA commented on HDFS-10169: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the 
patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 11s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 93m 44s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:85209cc | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816232/HDFS-10169-00.patch | | JIRA Issue | HDFS-10169 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 4b6277e2607f 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8b4b525 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/15983/artifact/patchprocess/whitespace-eol.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15983/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15983/console | | Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails > -- > > Key: HDFS-10169 > URL: https://issues.apache.org/jira/browse/HDFS-10169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Rakesh R > Attachments: HDFS-10169-00.patch > > > This failure has been seen multiple
[jira] [Commented] (HDFS-10548) Remove the long deprecated BlockReaderRemote
[ https://issues.apache.org/jira/browse/HDFS-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363003#comment-15363003 ] Colin Patrick McCabe commented on HDFS-10548: - Thanks for tackling this, guys. It is good to see this code duplication finally go away. Next target: {{BlockReaderLocalLegacy}}? I do think renaming {{BlockReaderRemote2}} will make merging code back to branch-2 more difficult-- you might want to reconsider that. > Remove the long deprecated BlockReaderRemote > > > Key: HDFS-10548 > URL: https://issues.apache.org/jira/browse/HDFS-10548 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-10548-v1.patch, HDFS-10548-v2.patch, > HDFS-10548-v3.patch > > > To lessen the maintain burden like raised in HDFS-8901, suggest we remove > {{BlockReaderRemote}} class that's deprecated very long time ago. > From {{BlockReaderRemote}} header: > {quote} > * @deprecated this is an old implementation that is being left around > * in case any issues spring up with the new {@link BlockReaderRemote2} > * implementation. > * It will be removed in the next release. > {quote} > From {{BlockReaderRemote2}} class header: > {quote} > * This is a new implementation introduced in Hadoop 0.23 which > * is more efficient and simpler than the older BlockReader > * implementation. It should be renamed to BlockReaderRemote > * once we are confident in it. > {quote} > So even further, after getting rid of the old class, we could rename as the > comment suggested: BlockReaderRemote2 => BlockReaderRemote. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10543) hdfsRead read stops at block boundary
[ https://issues.apache.org/jira/browse/HDFS-10543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362997#comment-15362997 ] Colin Patrick McCabe commented on HDFS-10543: - Just to be clear, the existing HDFS Java client can return "short reads" that are less than what was requested, even when there is more remaining in the file. This is traditional in POSIX, and nearly all filesystems I'm aware of have these semantics. The justification is that applications may not want to wait a long time to fetch more bytes if there are some bytes available already that they can process. Applications that do want the full buffer can just call read() again. APIs like {{readFully}} exist to provide these semantics. > hdfsRead read stops at block boundary > - > > Key: HDFS-10543 > URL: https://issues.apache.org/jira/browse/HDFS-10543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Xiaowei Zhu > Fix For: HDFS-8707 > > Attachments: HDFS-10543.HDFS-8707.000.patch, > HDFS-10543.HDFS-8707.001.patch, HDFS-10543.HDFS-8707.002.patch, > HDFS-10543.HDFS-8707.003.patch, HDFS-10543.HDFS-8707.004.patch > > > Reproducer: > char *buf2 = new char[file_info->mSize]; > memset(buf2, 0, (size_t)file_info->mSize); > int ret = hdfsRead(fs, file, buf2, file_info->mSize); > delete [] buf2; > if(ret != file_info->mSize) { > std::stringstream ss; > ss << "tried to read " << file_info->mSize << " bytes. but read " << > ret << " bytes"; > ReportError(ss.str()); > hdfsCloseFile(fs, file); > continue; > } > When it runs with a ~1.4 GB file, it will return an error like "tried to > read 146890 bytes. but read 134217728 bytes". The HDFS cluster it runs > against has a block size of 134217728 bytes. So it seems hdfsRead will stop > at a block boundary, which looks like a regression. We should add a retry to > continue reading across blocks for files with multiple blocks. 
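Callers that need the whole buffer can loop, as the comment above suggests. A minimal generic sketch of that loop over a plain InputStream (the same pattern would apply to hdfsRead; the class name is illustrative):

```java
import java.io.IOException;
import java.io.InputStream;

class ReadLoop {
    // Keep reading until the buffer is full or EOF is reached: the
    // caller-side answer to short reads, mirroring readFully-style APIs.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // EOF before the buffer was filled
            }
            total += n;
        }
        return total;
    }
}
```

The return value is buf.length unless EOF was hit first, so a caller can distinguish a genuinely short file from a short read.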
[jira] [Updated] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline
[ https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-9805: --- Resolution: Fixed Fix Version/s: 3.0.0-alpha1 Status: Resolved (was: Patch Available) > TCP_NODELAY not set before SASL handshake in data transfer pipeline > --- > > Key: HDFS-9805 > URL: https://issues.apache.org/jira/browse/HDFS-9805 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Gary Helmling >Assignee: Gary Helmling > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, > HDFS-9805.004.patch, HDFS-9805.005.patch > > > There are a few places in the DN -> DN block transfer pipeline where > TCP_NODELAY is not set before doing a SASL handshake: > * in {{DataNode.DataTransfer::run()}} > * in {{DataXceiver::replaceBlock()}} > * in {{DataXceiver::writeBlock()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline
[ https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362983#comment-15362983 ] Colin Patrick McCabe commented on HDFS-9805: Thanks for the reminder, [~jzhuge]. I committed the patch last week, but JIRA went down before I could mark the ticket as resolved. I have committed this to trunk only for the moment. The backport to branch-2 looks like it might be a little tricky, and our next release will be 3.0 anyway. If anyone is interested in backporting to branch-2, please do and update the ticket. Cheers. > TCP_NODELAY not set before SASL handshake in data transfer pipeline > --- > > Key: HDFS-9805 > URL: https://issues.apache.org/jira/browse/HDFS-9805 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Gary Helmling >Assignee: Gary Helmling > Fix For: 3.0.0-alpha1 > > Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, > HDFS-9805.004.patch, HDFS-9805.005.patch > > > There are a few places in the DN -> DN block transfer pipeline where > TCP_NODELAY is not set before doing a SASL handshake: > * in {{DataNode.DataTransfer::run()}} > * in {{DataXceiver::replaceBlock()}} > * in {{DataXceiver::writeBlock()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
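For reference, setting the option in Java is a one-liner on the socket before any handshake bytes are written. A hedged sketch (not the committed patch, which touches the DataNode paths listed in the description):

```java
import java.io.IOException;
import java.net.Socket;

class NoDelaySocketFactory {
    // Disable Nagle's algorithm before any handshake traffic, so the small
    // SASL negotiation messages are not delayed waiting to be coalesced.
    static Socket newNoDelaySocket() throws IOException {
        Socket s = new Socket();
        s.setTcpNoDelay(true);
        return s;
    }
}
```

Setting TCP_NODELAY before connect/handshake matters because the SASL exchange consists of small request/response messages that Nagle's algorithm would otherwise hold back.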
[jira] [Updated] (HDFS-10594) HDFS-4949 should support recursive cache directives
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-10594: Summary: HDFS-4949 should support recursive cache directives (was: CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory) > HDFS-4949 should support recursive cache directives > --- > > Key: HDFS-10594 > URL: https://issues.apache.org/jira/browse/HDFS-10594 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10594.001.patch > > > In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively > rescan the path when the inode of the path is a directory. In this code: > {code} > } else if (node.isDirectory()) { > INodeDirectory dir = node.asDirectory(); > ReadOnlyList<INode> children = dir > .getChildrenList(Snapshot.CURRENT_STATE_ID); > for (INode child : children) { > if (child.isFile()) { > rescanFile(directive, child.asFile()); > } > } >} > {code} > With this logic, some inode files will be ignored when a child inode is > itself a directory containing other child inode files. As a result, the > child's child files under this path will not be cached.
[jira] [Updated] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-10594: --- Issue Type: Improvement (was: Bug) Agree with Chris, marking this as an Improvement rather than a Bug. > CacheReplicationMonitor should recursively rescan the path when the inode of > the path is directory > -- > > Key: HDFS-10594 > URL: https://issues.apache.org/jira/browse/HDFS-10594 > Project: Hadoop HDFS > Issue Type: Improvement > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10594.001.patch > > > In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively > rescan the path when the inode of the path is a directory. In this code: > {code} > } else if (node.isDirectory()) { > INodeDirectory dir = node.asDirectory(); > ReadOnlyList<INode> children = dir > .getChildrenList(Snapshot.CURRENT_STATE_ID); > for (INode child : children) { > if (child.isFile()) { > rescanFile(directive, child.asFile()); > } > } >} > {code} > With this logic, some inode files will be ignored when a child inode is > itself a directory containing other child inode files. As a result, the > child's child files under this path will not be cached.
[jira] [Commented] (HDFS-9805) TCP_NODELAY not set before SASL handshake in data transfer pipeline
[ https://issues.apache.org/jira/browse/HDFS-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362916#comment-15362916 ] John Zhuge commented on HDFS-9805: -- [~cmccabe]: You committed the patch into trunk on 6/29. Do you plan to resolve the jira? > TCP_NODELAY not set before SASL handshake in data transfer pipeline > --- > > Key: HDFS-9805 > URL: https://issues.apache.org/jira/browse/HDFS-9805 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Gary Helmling >Assignee: Gary Helmling > Attachments: HDFS-9805.002.patch, HDFS-9805.003.patch, > HDFS-9805.004.patch, HDFS-9805.005.patch > > > There are a few places in the DN -> DN block transfer pipeline where > TCP_NODELAY is not set before doing a SASL handshake: > * in {{DataNode.DataTransfer::run()}} > * in {{DataXceiver::replaceBlock()}} > * in {{DataXceiver::writeBlock()}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10564) UNDER MIN REPL'D BLOCKS should be prioritized for replication
[ https://issues.apache.org/jira/browse/HDFS-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362879#comment-15362879 ] Elliott Clark commented on HDFS-10564: -- Yeah, sorry. Draining means decommissioning. > UNDER MIN REPL'D BLOCKS should be prioritized for replication > - > > Key: HDFS-10564 > URL: https://issues.apache.org/jira/browse/HDFS-10564 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Elliott Clark > > When datanodes get drained, they are probably being drained because the > hardware is bad or suspect. The blocks that have no other live replicas should > be prioritized. However, that appears not to be the case at all. > Draining full nodes with lots of blocks but only a handful of > under-min-replicated blocks takes about the full time before fsck reports > clean again.
[jira] [Updated] (HDFS-9890) libhdfs++: Add test suite to simulate network issues
[ https://issues.apache.org/jira/browse/HDFS-9890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaowei Zhu updated HDFS-9890: -- Attachment: HDFS-9890.HDFS-8707.013.patch > libhdfs++: Add test suite to simulate network issues > > > Key: HDFS-9890 > URL: https://issues.apache.org/jira/browse/HDFS-9890 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Xiaowei Zhu > Attachments: HDFS-9890.HDFS-8707.000.patch, > HDFS-9890.HDFS-8707.001.patch, HDFS-9890.HDFS-8707.002.patch, > HDFS-9890.HDFS-8707.003.patch, HDFS-9890.HDFS-8707.004.patch, > HDFS-9890.HDFS-8707.005.patch, HDFS-9890.HDFS-8707.006.patch, > HDFS-9890.HDFS-8707.007.patch, HDFS-9890.HDFS-8707.008.patch, > HDFS-9890.HDFS-8707.009.patch, HDFS-9890.HDFS-8707.010.patch, > HDFS-9890.HDFS-8707.011.patch, HDFS-9890.HDFS-8707.012.patch, > HDFS-9890.HDFS-8707.012.patch, HDFS-9890.HDFS-8707.013.patch, > HDFS-9890.HDFS-8707.013.patch, hs_err_pid26832.log, hs_err_pid4944.log > > > I propose adding a test suite to simulate various network issues/failures in > order to get good test coverage on some of the retry paths that aren't easy > to hit in mock unit tests. > At the moment the only things that hit the retry paths are the gmock unit > tests. The gmock are only as good as their mock implementations which do a > great job of simulating protocol correctness but not more complex > interactions. They also can't really simulate the types of lock contention > and subtle memory stomps that show up while doing hundreds or thousands of > concurrent reads. We should add a new minidfscluster test that focuses on > heavy read/seek load and then randomly convert error codes returned by > network functions into errors. 
> List of things to simulate (while heavily loaded), roughly in order of how > badly I think they need to be tested at the moment: > -Rpc connection disconnect > -Rpc connection slowed down enough to cause a timeout and trigger retry > -DN connection disconnect
[jira] [Commented] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
[ https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362863#comment-15362863 ] Andrew Wang commented on HDFS-10593: I recall we introduced this limit because it broke fsimage PB serde. The configs you mention refer to ipc; did we make fsimage serde similarly configurable? > MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable > --- > > Key: HDFS-10593 > URL: https://issues.apache.org/jira/browse/HDFS-10593 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yuanbo Liu > > In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in > HDFS-6102 to restrict max items of single directory, and the value of it can > not be larger than the value of MAX_DIR_ITEMS. Since > "ipc.maximum.data.length" was added in HADOOP-9676 and documented in > HADOOP-13039 to make maximum RPC buffer size configurable, it's not proper to > hard code the value of MAX_DIR_ITEMS in {{FSDirectory}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
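For context, the knobs discussed above look like this in configuration. An illustrative fragment only (values are examples, not recommendations):

```xml
<!-- core-site.xml: the RPC buffer cap from HADOOP-9676 is configurable. -->
<property>
  <name>ipc.maximum.data.length</name>
  <value>134217728</value> <!-- e.g. 128 MB; the default is 64 MB -->
</property>

<!-- hdfs-site.xml: the per-directory limit from HDFS-6102, which remains
     capped by the hard-coded MAX_DIR_ITEMS this issue is about. -->
<property>
  <name>dfs.namenode.fs-limits.max-directory-items</name>
  <value>1048576</value>
</property>
```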
[jira] [Created] (HDFS-10595) libhdfs++: Client Name Protobuf Error
Anatoli Shein created HDFS-10595: Summary: libhdfs++: Client Name Protobuf Error Key: HDFS-10595 URL: https://issues.apache.org/jira/browse/HDFS-10595 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anatoli Shein When running the cat tool (/hadoop-hdfs-native-client/src/main/native/libhdfspp/examples/cat/c/cat.c), I get the following error: [libprotobuf ERROR google/protobuf/wire_format.cc:1053] String field contains invalid UTF-8 data when serializing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes. However, it executes correctly. It looks like this error happens when trying to serialize the client name in ClientOperationHeaderProto::SerializeWithCachedSizes (/hadoop-hdfs-native-client/target/main/native/libhdfspp/lib/proto/datatransfer.pb.cc). Possibly the problem is caused by generating the client name as a UUID in GetRandomClientName (/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.cc). In the Java client it looks like there are two different unique client identifiers: ClientName and ClientId. The client name is generated as: clientName = "DFSClient_" + dfsClientConf.getTaskId() + "_" + ThreadLocalRandom.current().nextInt() + "_" + Thread.currentThread().getId(); (/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java) The ClientId is generated as a UUID in (/hadoop-common/src/main/java/org/apache/hadoop/ipc/ClientId.java). In libhdfs++ we may also need two unique client identifiers, or we should fix the current client name to work without protobuf warnings/errors.
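One possible fix, sketched in Java for brevity (the real change would be in libhdfs++'s C++ GetRandomClientName, and the class name here is illustrative): keep the protobuf string field ASCII-clean by using the UUID's text form for the client name, and reserve raw UUID bytes for a separate client id carried in a bytes field.

```java
import java.util.UUID;

class ClientNames {
    // A UUID's canonical text form is plain ASCII (hex digits and hyphens),
    // so serializing it into a protobuf 'string' field can never trip the
    // UTF-8 validity check that raw UUID bytes trigger.
    static String randomClientName() {
        return "libhdfspp_client_" + UUID.randomUUID();
    }
}
```

This mirrors the Java client's split: a human-readable ClientName string and a separate binary ClientId.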
[jira] [Commented] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362829#comment-15362829 ] Chris Nauroth commented on HDFS-10594: -- During initial implementation, we made an intentional choice that a cache directive on a directory applies to its direct children only, not all descendants recursively. This behavior is documented here: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html#Cache_directive I'm not in favor of changing this behavior, because it would be an unexpected change for users after an upgrade. It's possible that it would cause the DataNode to {{mlock}} a lot more files than pre-upgrade. This would cause either unpredictable caching if the new files exceed {{dfs.datanode.max.locked.memory}}, possibly caching files that are not useful to cache, or even worse, blowing out memory budget and causing insufficient memory for services and YARN containers running on the host. If there is a desire for this behavior, then a more graceful way to support it would be to introduce a notion of a recursive cache directive. This would preserve the existing default behavior of applying only to direct children. Users who want the recursive behavior could opt in by passing a new flag while creating the cache directive. > CacheReplicationMonitor should recursively rescan the path when the inode of > the path is directory > -- > > Key: HDFS-10594 > URL: https://issues.apache.org/jira/browse/HDFS-10594 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10594.001.patch > > > In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively > rescan the path when the inode of the path is a directory. 
In this code: > {code} > } else if (node.isDirectory()) { > INodeDirectory dir = node.asDirectory(); > ReadOnlyList<INode> children = dir > .getChildrenList(Snapshot.CURRENT_STATE_ID); > for (INode child : children) { > if (child.isFile()) { > rescanFile(directive, child.asFile()); > } > } >} > {code} > With this logic, some inode files will be ignored when > the child inode is also a directory that contains other child inode files. > Finally, the child's child files which belong to this path will not be > cached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
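The scan logic under discussion, plus the opt-in recursive flag suggested in the preceding comment, can be sketched in a self-contained way. This uses a toy inode tree rather than Hadoop's INode classes, so all names here are illustrative assumptions, not CacheReplicationMonitor's API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the directive scan: the default mirrors the documented
// direct-children-only behavior; a 'recursive' flag lets callers opt in
// to descending into subdirectories, as proposed in the discussion.
public class RescanSketch {
    static class Node {
        final String name;
        final boolean isFile;
        final List<Node> children = new ArrayList<>();
        Node(String name, boolean isFile) { this.name = name; this.isFile = isFile; }
        Node add(Node c) { children.add(c); return this; }
    }

    /** Collect file names under dir; descends into subdirectories only if recursive. */
    static void rescanDirectory(Node dir, boolean recursive, List<String> cached) {
        for (Node child : dir.children) {
            if (child.isFile) {
                cached.add(child.name);               // stand-in for rescanFile(...)
            } else if (recursive) {
                rescanDirectory(child, true, cached); // opt-in recursive descent
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node("/pool", false)
                .add(new Node("a.txt", true))
                .add(new Node("sub", false).add(new Node("b.txt", true)));
        List<String> flat = new ArrayList<>(), deep = new ArrayList<>();
        rescanDirectory(root, false, flat); // current behavior: skips sub/b.txt
        rescanDirectory(root, true, deep);  // opt-in behavior: includes it
        System.out.println(flat); // [a.txt]
        System.out.println(deep); // [a.txt, b.txt]
    }
}
```

Making the flag explicit, rather than changing the default, preserves the upgrade-safety property argued for in the comment above.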
[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10169: Summary: TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails (was: TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.) > TestEditLog.testBatchedSyncWithClosedLogs with useAsyncEditLog sometimes fails > -- > > Key: HDFS-10169 > URL: https://issues.apache.org/jira/browse/HDFS-10169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Rakesh R > Attachments: HDFS-10169-00.patch > > > This failure has been seen in multiple precommit builds recently. > {noformat} > testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog) > Time elapsed: 0.377 sec <<< FAILURE! > java.lang.AssertionError: logging edit without syncing should do not affect > txid expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10592) Fix intermittent test failure of TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning
[ https://issues.apache.org/jira/browse/HDFS-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362807#comment-15362807 ] Rakesh R commented on HDFS-10592: - Please ignore the test case failures; they are unrelated to my patch. I see that HDFS-10169 is handling the {{TestEditLog}} failure, and I've commented in that jira. Could someone help by reviewing the proposed patch/fix? Thanks! > Fix intermittent test failure of > TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning > -- > > Key: HDFS-10592 > URL: https://issues.apache.org/jira/browse/HDFS-10592 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Rakesh R >Assignee: Rakesh R > Fix For: 2.8.0 > > Attachments: HDFS-10592-00.patch, HDFS-10592-01.patch > > > This jira is to fix the > {{TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning}} > test case failure. > Reference > [Build_15973|https://builds.apache.org/job/PreCommit-HDFS-Build/15973/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestNameNodeResourceChecker/testCheckThatNameNodeResourceMonitorIsRunning/] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10169: Target Version/s: 2.8.0 Status: Patch Available (was: Open) > TestEditLog.testBatchedSyncWithClosedLogs sometimes fails. > -- > > Key: HDFS-10169 > URL: https://issues.apache.org/jira/browse/HDFS-10169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Rakesh R > Attachments: HDFS-10169-00.patch > > > This failure has been seen in multiple precommit builds recently. > {noformat} > testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog) > Time elapsed: 0.377 sec <<< FAILURE! > java.lang.AssertionError: logging edit without syncing should do not affect > txid expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362800#comment-15362800 ] Rakesh R commented on HDFS-10169: - Hi [~cnauroth], I've come across this failure while analyzing the HDFS-10592 QA report. The following is my analysis: in the case of {{useAsyncEditLog=true}}, the {{FSEditLogAsync#syncThread}} thread runs in the background and invokes {{logSync(getLastWrittenTxId());}}. This call increases the transaction id. In the failed test scenario, the next assertion statement does not expect the logSync() call to happen, but this is not deterministic due to the async {{logSync()}} call. {code} // Log an edit from thread A doLogEdit(threadA, editLog, "thread-a 1"); assertEquals("logging edit without syncing should do not affect txid", 1, editLog.getSyncTxId()); {code} I think the issue is different from HDFS-10183. Am I missing anything? A simple way is to skip the test case, similar to the [TestEditLog_testSyncBatching|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java#L511] approach. Instead, I've attempted to fix this another way, by stopping the syncThread and later restarting it to avoid the async {{logSync()}} call. I've attached a draft patch to show this approach. Please review the analysis and the patch. Thanks! > TestEditLog.testBatchedSyncWithClosedLogs sometimes fails. > -- > > Key: HDFS-10169 > URL: https://issues.apache.org/jira/browse/HDFS-10169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Rakesh R > Attachments: HDFS-10169-00.patch > > > This failure has been seen in multiple precommit builds recently. > {noformat} > testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog) > Time elapsed: 0.377 sec <<< FAILURE! 
> java.lang.AssertionError: logging edit without syncing should do not affect > txid expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
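The race described in the analysis, and the pause/restart idea behind the draft patch, can be reproduced in miniature without any Hadoop classes. This is an illustrative sketch only — the thread, flag, and method names are assumptions, not FSEditLogAsync's real API:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Toy reproduction: a background "sync" thread bumps the synced txid
// asynchronously, so an assertion that no sync has happened is only
// deterministic while that thread is paused — mirroring the stop/restart
// approach proposed for the test.
public class SyncPauseSketch {
    /**
     * Returns true iff the txid stayed put while the sync thread was paused
     * and then advanced once it was resumed.
     */
    static boolean demo() throws InterruptedException {
        AtomicLong syncTxId = new AtomicLong(0);
        AtomicBoolean paused = new AtomicBoolean(true);
        AtomicBoolean running = new AtomicBoolean(true);

        Thread syncThread = new Thread(() -> {
            while (running.get()) {
                if (!paused.get()) {
                    syncTxId.incrementAndGet(); // stand-in for the async logSync()
                }
                try { Thread.sleep(1); } catch (InterruptedException e) { return; }
            }
        });
        syncThread.start();

        Thread.sleep(20);                         // give the thread time to (not) sync
        boolean stableWhilePaused = syncTxId.get() == 0;

        paused.set(false);                        // resume syncing, as the patch restarts it
        long deadline = System.currentTimeMillis() + 5000;
        while (syncTxId.get() == 0 && System.currentTimeMillis() < deadline) {
            Thread.sleep(1);
        }
        running.set(false);
        syncThread.join();
        return stableWhilePaused && syncTxId.get() > 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo()); // true
    }
}
```

The same structure explains why the real assertion on {{editLog.getSyncTxId()}} is flaky: without the pause, the background sync may or may not have fired by the time the assertion runs.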
[jira] [Updated] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10169: Attachment: HDFS-10169-00.patch > TestEditLog.testBatchedSyncWithClosedLogs sometimes fails. > -- > > Key: HDFS-10169 > URL: https://issues.apache.org/jira/browse/HDFS-10169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Rakesh R > Attachments: HDFS-10169-00.patch > > > This failure has been seen in multiple precommit builds recently. > {noformat} > testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog) > Time elapsed: 0.377 sec <<< FAILURE! > java.lang.AssertionError: logging edit without syncing should do not affect > txid expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-10169) TestEditLog.testBatchedSyncWithClosedLogs sometimes fails.
[ https://issues.apache.org/jira/browse/HDFS-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R reassigned HDFS-10169: --- Assignee: Rakesh R > TestEditLog.testBatchedSyncWithClosedLogs sometimes fails. > -- > > Key: HDFS-10169 > URL: https://issues.apache.org/jira/browse/HDFS-10169 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Rakesh R > > This failure has been seen in multiple precommit builds recently. > {noformat} > testBatchedSyncWithClosedLogs[1](org.apache.hadoop.hdfs.server.namenode.TestEditLog) > Time elapsed: 0.377 sec <<< FAILURE! > java.lang.AssertionError: logging edit without syncing should do not affect > txid expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at > org.apache.hadoop.hdfs.server.namenode.TestEditLog.testBatchedSyncWithClosedLogs(TestEditLog.java:594) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10594: - Attachment: HDFS-10594.001.patch > CacheReplicationMonitor should recursively rescan the path when the inode of > the path is directory > -- > > Key: HDFS-10594 > URL: https://issues.apache.org/jira/browse/HDFS-10594 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Attachments: HDFS-10594.001.patch > > > In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively > rescan the path when the inode of the path is a directory. In this code: > {code} > } else if (node.isDirectory()) { > INodeDirectory dir = node.asDirectory(); > ReadOnlyList<INode> children = dir > .getChildrenList(Snapshot.CURRENT_STATE_ID); > for (INode child : children) { > if (child.isFile()) { > rescanFile(directive, child.asFile()); > } > } >} > {code} > With this logic, some inode files will be ignored when > the child inode is also a directory that contains other child inode files. > Finally, the child's child files which belong to this path will not be > cached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10594: - Status: Patch Available (was: Open) > CacheReplicationMonitor should recursively rescan the path when the inode of > the path is directory > -- > > Key: HDFS-10594 > URL: https://issues.apache.org/jira/browse/HDFS-10594 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > > In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively > rescan the path when the inode of the path is a directory. In this code: > {code} > } else if (node.isDirectory()) { > INodeDirectory dir = node.asDirectory(); > ReadOnlyList<INode> children = dir > .getChildrenList(Snapshot.CURRENT_STATE_ID); > for (INode child : children) { > if (child.isFile()) { > rescanFile(directive, child.asFile()); > } > } >} > {code} > With this logic, some inode files will be ignored when > the child inode is also a directory that contains other child inode files. > Finally, the child's child files which belong to this path will not be > cached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
[ https://issues.apache.org/jira/browse/HDFS-10594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362497#comment-15362497 ] Yiqun Lin commented on HDFS-10594: -- Attached an initial patch. > CacheReplicationMonitor should recursively rescan the path when the inode of > the path is directory > -- > > Key: HDFS-10594 > URL: https://issues.apache.org/jira/browse/HDFS-10594 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching >Affects Versions: 2.7.1 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > > In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively > rescan the path when the inode of the path is a directory. In this code: > {code} > } else if (node.isDirectory()) { > INodeDirectory dir = node.asDirectory(); > ReadOnlyList<INode> children = dir > .getChildrenList(Snapshot.CURRENT_STATE_ID); > for (INode child : children) { > if (child.isFile()) { > rescanFile(directive, child.asFile()); > } > } >} > {code} > With this logic, some inode files will be ignored when > the child inode is also a directory that contains other child inode files. > Finally, the child's child files which belong to this path will not be > cached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-10594) CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory
Yiqun Lin created HDFS-10594: Summary: CacheReplicationMonitor should recursively rescan the path when the inode of the path is directory Key: HDFS-10594 URL: https://issues.apache.org/jira/browse/HDFS-10594 Project: Hadoop HDFS Issue Type: Bug Components: caching Affects Versions: 2.7.1 Reporter: Yiqun Lin Assignee: Yiqun Lin In {{CacheReplicationMonitor#rescanCacheDirectives}}, it should recursively rescan the path when the inode of the path is a directory. In this code: {code} } else if (node.isDirectory()) { INodeDirectory dir = node.asDirectory(); ReadOnlyList<INode> children = dir .getChildrenList(Snapshot.CURRENT_STATE_ID); for (INode child : children) { if (child.isFile()) { rescanFile(directive, child.asFile()); } } } {code} With this logic, some inode files will be ignored when the child inode is also a directory that contains other child inode files. Finally, the child's child files which belong to this path will not be cached. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362354#comment-15362354 ] Hadoop QA commented on HDFS-6962: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 7m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 5s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 35s{color} | {color:orange} root: The patch generated 1 new + 1152 unchanged - 0 fixed = 1153 total (was 1152) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 74m 18s{color} | {color:green} hadoop-hdfs in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}135m 3s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:85209cc | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816153/HDFS-6962.006.patch | | JIRA Issue | HDFS-6962 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle cc xml | | uname | Linux 3d49e69f46bd 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8b4b525 | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | checkstyle |
[jira] [Commented] (HDFS-8956) Not able to start Datanode
[ https://issues.apache.org/jira/browse/HDFS-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362333#comment-15362333 ] SHIVADEEP GUNDOJU commented on HDFS-8956: - Hello, I had exactly the same problem. After uncommenting the line below in /etc/hosts, the datanode started: 127.0.0.1 localhost I don't know why or how, but it started. Thanks > Not able to start Datanode > -- > > Key: HDFS-8956 > URL: https://issues.apache.org/jira/browse/HDFS-8956 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.0 > Environment: Centos >Reporter: sreelakshmi > > Data node service is not started on one of the data nodes, "java.net.bind > exception" is thrown. > Verified that ports 50010,50070 and 50075 are not in use by any other > application. > 15/08/26 01:50:15 INFO http.HttpServer2: HttpServer.start() threw a non Bind > IOException > java.net.BindException: Port in use: localhost:0 > at > org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:919) > at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:856) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:779) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1134) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:434) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2404) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2291) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2338) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2515) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2539) > Caused by: java.net.BindException: Cannot assign requested address -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional 
commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-7957) Truncate should verify quota before making changes
[ https://issues.apache.org/jira/browse/HDFS-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated HDFS-7957: -- Assignee: Jing Zhao (was: Shen Yinjie) > Truncate should verify quota before making changes > -- > > Key: HDFS-7957 > URL: https://issues.apache.org/jira/browse/HDFS-7957 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7957.000.patch, HDFS-7957.001.patch, > HDFS-7957.002.patch > > > This is a similar issue with HDFS-7587: for truncate we should also verify > quota in the beginning and update quota in the end. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-7957) Truncate should verify quota before making changes
[ https://issues.apache.org/jira/browse/HDFS-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie reassigned HDFS-7957: - Assignee: Shen Yinjie (was: Jing Zhao) > Truncate should verify quota before making changes > -- > > Key: HDFS-7957 > URL: https://issues.apache.org/jira/browse/HDFS-7957 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.0 >Reporter: Jing Zhao >Assignee: Shen Yinjie >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7957.000.patch, HDFS-7957.001.patch, > HDFS-7957.002.patch > > > This is a similar issue with HDFS-7587: for truncate we should also verify > quota in the beginning and update quota in the end. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-6962: - Attachment: HDFS-6962.006.patch Patch 006 over 005: * Remove the ugly calls of "instanceof FsCreateModes" > ACLs inheritance conflict with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) >Reporter: LINTE >Assignee: John Zhuge >Priority: Critical > Labels: hadoop, security > Attachments: HDFS-6962.001.patch, HDFS-6962.002.patch, > HDFS-6962.003.patch, HDFS-6962.004.patch, HDFS-6962.005.patch, > HDFS-6962.006.patch, HDFS-6962.1.patch, disabled_new_client.log, > disabled_old_client.log, enabled_new_client.log, enabled_old_client.log, run > > > In hdfs-site.xml > <property> > <name>dfs.umaskmode</name> > <value>027</value> > </property> > 1/ Create a directory as superuser > bash# hdfs dfs -mkdir /tmp/ACLS > 2/ set default ACLs on this directory rwx access for group readwrite and user > toto > bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS > bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS > 3/ check ACLs /tmp/ACLS/ > bash# hdfs dfs -getfacl /tmp/ACLS/ > # file: /tmp/ACLS > # owner: hdfs > # group: hadoop > user::rwx > group::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > user::rwx | group::r-x | other::--- matches the umaskmode defined in > hdfs-site.xml, everything ok! > default:group:readwrite:rwx allows the readwrite group rwx access for > inheritance. > default:user:toto:rwx allows the toto user rwx access for inheritance. 
> default:mask::rwx the inheritance mask is rwx, so no mask > 4/ Create a subdir to test inheritance of ACL > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs > 5/ check ACLs /tmp/ACLS/hdfs > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs > # file: /tmp/ACLS/hdfs > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:r-x > group::r-x > group:readwrite:rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > Here we can see that the readwrite group has an rwx ACL but only r-x is effective > because the mask is r-x (mask::r-x), even though the default mask for inheritance > is set to default:mask::rwx on /tmp/ACLS/ > 6/ Modify hdfs-site.xml and restart the namenode > <property> > <name>dfs.umaskmode</name> > <value>010</value> > </property> > 7/ Create a subdir to test inheritance of ACL with the new umaskmode parameter > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs2 > 8/ Check ACL on /tmp/ACLS/hdfs2 > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2 > # file: /tmp/ACLS/hdfs2 > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:rw- > group::r-x #effective:r-- > group:readwrite:rwx #effective:rw- > mask::rw- > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > So HDFS masks the ACL value (user, group and other -- except the POSIX > owner --) with the group bits of the dfs.umaskmode property when creating a > directory with inherited ACLs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
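The "#effective:" lines in the report follow from simple bit arithmetic: a named-user or named-group ACL entry is filtered through the mask entry (effective = entry AND mask), and in the reported behavior the child's mask is the parent's default mask with the umask's group bits applied. A minimal sketch of that arithmetic (all names are illustrative):

```java
// Sketch of ACL effective-permission arithmetic: named entries are ANDed
// with the mask entry. With umaskmode 027 the child's mask becomes r-x (5),
// turning inherited rwx (7) entries into effective r-x; with umaskmode 010
// it becomes rw- (6), giving effective rw- — matching the report above.
public class AclMaskSketch {
    /** Render a 3-bit permission value as rwx notation. */
    static String rwx(int bits) {
        return "" + ((bits & 4) != 0 ? 'r' : '-')
                  + ((bits & 2) != 0 ? 'w' : '-')
                  + ((bits & 1) != 0 ? 'x' : '-');
    }

    /** Effective rights of a named ACL entry after applying the mask. */
    static int effective(int entryBits, int maskBits) {
        return entryBits & maskBits;
    }

    public static void main(String[] args) {
        int inherited = 7;  // default:group:readwrite:rwx from the parent
        int mask027 = 5;    // child mask r-x: umask 027's group digit strips 'w'
        int mask010 = 6;    // child mask rw-: umask 010's group digit strips 'x'
        System.out.println(rwx(effective(inherited, mask027))); // r-x, as in step 5
        System.out.println(rwx(effective(inherited, mask010))); // rw-, as in step 8
    }
}
```

This is exactly the conflict the issue describes: the umask, not the parent's default mask, ends up deciding the child's mask and therefore the effective rights of every inherited named entry.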
[jira] [Commented] (HDFS-10591) Using webhdfs unable to download pdf,doc files in ubuntu os.
[ https://issues.apache.org/jira/browse/HDFS-10591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362178#comment-15362178 ] Mingliang Liu commented on HDFS-10591: -- Please note that JIRA tickets are not for usage discussions; use the u...@hadoop.apache.org mailing list instead. In this case, I see no obvious relation between the pdf/doc file formats and webhdfs access. If you think this is a real bug, please add a detailed description to support debugging. Thanks. > Using webhdfs unable to download pdf,doc files in ubuntu os. > > > Key: HDFS-10591 > URL: https://issues.apache.org/jira/browse/HDFS-10591 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Reporter: bharghavi > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10582) Change deprecated configuration fs.checkpoint.dir to dfs.namenode.checkpoint.dir in HDFS Commands Doc
[ https://issues.apache.org/jira/browse/HDFS-10582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362112#comment-15362112 ] Pan Yuxuan commented on HDFS-10582: --- Fix for review! > Change deprecated configuration fs.checkpoint.dir to > dfs.namenode.checkpoint.dir in HDFS Commands Doc > - > > Key: HDFS-10582 > URL: https://issues.apache.org/jira/browse/HDFS-10582 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.2 >Reporter: Pan Yuxuan >Priority: Minor > Attachments: HDFS-10582.patch > > > HDFS Commands Documentation -importCheckpoint uses the deprecated > configuration string {code}fs.checkpoint.dir{code} we can use > {noformat}dfs.namenode.checkpoint.dir{noformat} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
[ https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362065#comment-15362065 ] Yuanbo Liu commented on HDFS-10593: --- [~andrew.wang][~cnauroth] I'm also tagging you two in this loop and hope to get your thoughts. > MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable > --- > > Key: HDFS-10593 > URL: https://issues.apache.org/jira/browse/HDFS-10593 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yuanbo Liu > > In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in > HDFS-6102 to restrict the maximum number of items in a single directory, and its value > cannot be larger than MAX_DIR_ITEMS. Since > "ipc.maximum.data.length" was added in HADOOP-9676 and documented in > HADOOP-13039 to make the maximum RPC buffer size configurable, it is not proper to > hard-code the value of MAX_DIR_ITEMS in {{FSDirectory}}.
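To make the interplay concrete, here is a minimal, hypothetical Java sketch (not the actual FSDirectory code) contrasting a hard-coded cap with one derived from the configured RPC buffer size. The constant value and the bytes-per-entry scaling factor are assumptions for illustration only:

```java
// Hypothetical sketch of the limit interplay described above; not HDFS code.
public class DirItemLimitSketch {

    // A hard-coded upper bound, as the issue describes for FSDirectory's
    // MAX_DIR_ITEMS; the exact value here is illustrative.
    static final int HARD_CODED_MAX_DIR_ITEMS = 6_400_000;

    // An alternative: derive the cap from the configured RPC buffer size
    // (ipc.maximum.data.length), assuming roughly 1 KB of listing data per entry.
    static int maxDirItemsFor(int rpcMaxDataLength) {
        return rpcMaxDataLength / 1024;
    }

    public static void main(String[] args) {
        int defaultRpc = 64 * 1024 * 1024;  // 64 MB
        int raisedRpc = 128 * 1024 * 1024;  // an operator-raised limit

        // With a hard-coded constant, raising the RPC limit changes nothing:
        System.out.println(HARD_CODED_MAX_DIR_ITEMS);

        // A derived cap grows with the configured buffer:
        System.out.println(maxDirItemsFor(defaultRpc));  // 65536
        System.out.println(maxDirItemsFor(raisedRpc));   // 131072
    }
}
```

The point of the sketch is only that a limit derived from the configured buffer tracks operator changes, whereas a compile-time constant silently ignores them.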
[jira] [Comment Edited] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
[ https://issues.apache.org/jira/browse/HDFS-10593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362065#comment-15362065 ] Yuanbo Liu edited comment on HDFS-10593 at 7/5/16 6:06 AM: --- [~andrew.wang]/[~cnauroth] I also tag you two in this loop, and hope to get your thoughts. was (Author: yuanbo): [~andrew.wang][~cnauroth] I also tag you two in this loop, and hope to get your thoughts.
[jira] [Commented] (HDFS-10592) Fix intermittent test failure of TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning
[ https://issues.apache.org/jira/browse/HDFS-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362064#comment-15362064 ] Hadoop QA commented on HDFS-10592: --
| (x) *{color:red}-1 overall{color}* |
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 19 unchanged - 4 fixed = 19 total (was 23) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 19s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 97m 13s{color} | {color:black} {color} |
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.hdfs.server.namenode.TestEditLog |
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:85209cc |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12816122/HDFS-10592-01.patch |
| JIRA Issue | HDFS-10592 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 7f3144a609c3 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8b4b525 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15980/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15980/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15980/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |
This message was automatically generated.
> Fix intermittent test failure of
> TestNameNodeResourceChecker#testCheckThatNameNodeResourceMonitorIsRunning
> --
>
> Key: HDFS-10592 >
[jira] [Created] (HDFS-10593) MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable
Yuanbo Liu created HDFS-10593: - Summary: MAX_DIR_ITEMS should not be hard coded since RPC buff size is configurable Key: HDFS-10593 URL: https://issues.apache.org/jira/browse/HDFS-10593 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yuanbo Liu In HDFS, "dfs.namenode.fs-limits.max-directory-items" was introduced in HDFS-6102 to restrict the maximum number of items in a single directory, and its value cannot be larger than MAX_DIR_ITEMS. Since "ipc.maximum.data.length" was added in HADOOP-9676 and documented in HADOOP-13039 to make the maximum RPC buffer size configurable, it is not proper to hard-code the value of MAX_DIR_ITEMS in {{FSDirectory}}.