[jira] [Created] (HDFS-13480) RBF: separate namenodeHeartbeat and routerHeartbeat to different config key.
maobaolong created HDFS-13480:
---------------------------------

    Summary: RBF: separate namenodeHeartbeat and routerHeartbeat to different config key.
        Key: HDFS-13480
        URL: https://issues.apache.org/jira/browse/HDFS-13480
    Project: Hadoop HDFS
 Issue Type: Bug
   Reporter: maobaolong
   Assignee: maobaolong

Currently, if I enable heartbeat.enable but do not want to monitor any namenode, I get an ERROR log like:
{code:java}
[2018-04-19T14:00:03.057+08:00] [ERROR] federation.router.Router.serviceInit(Router.java 214) [main] : Heartbeat is enabled but there are no namenodes to monitor
{code}
And if I disable heartbeat.enable, we cannot get any mount table update, because the following logic in Router.java guards both heartbeat services with the same key:
{code:java}
if (conf.getBoolean(
    RBFConfigKeys.DFS_ROUTER_HEARTBEAT_ENABLE,
    RBFConfigKeys.DFS_ROUTER_HEARTBEAT_ENABLE_DEFAULT)) {

  // Create status updater for each monitored Namenode
  this.namenodeHeartbeatServices = createNamenodeHeartbeatServices();
  for (NamenodeHeartbeatService hearbeatService :
      this.namenodeHeartbeatServices) {
    addService(hearbeatService);
  }

  if (this.namenodeHeartbeatServices.isEmpty()) {
    LOG.error("Heartbeat is enabled but there are no namenodes to monitor");
  }

  // Periodically update the router state
  this.routerHeartbeatService = new RouterHeartbeatService(this);
  addService(this.routerHeartbeatService);
}
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
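One way the separation requested in HDFS-13480 could look is sketched below. This is only an illustration with a plain map standing in for the Hadoop configuration: the class, the key strings, and the choice to default the namenode flag to the router flag are all assumptions, not the committed patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for the Router service setup; not the real Router class. */
final class HeartbeatConfigSketch {
  // Illustrative key strings (assumptions), one per heartbeat kind.
  static final String ROUTER_HEARTBEAT_ENABLE = "router.heartbeat.enable";
  static final String NAMENODE_HEARTBEAT_ENABLE = "namenode.heartbeat.enable";

  /** Returns the names of the services that would be started. */
  static List<String> servicesToStart(Map<String, String> conf, int monitoredNamenodes) {
    List<String> services = new ArrayList<>();
    boolean routerHeartbeat = getBoolean(conf, ROUTER_HEARTBEAT_ENABLE, true);
    // Assumption: fall back to the router flag so old configs behave as before.
    boolean namenodeHeartbeat = getBoolean(conf, NAMENODE_HEARTBEAT_ENABLE, routerHeartbeat);

    if (namenodeHeartbeat) {
      // One status updater per monitored Namenode.
      for (int i = 0; i < monitoredNamenodes; i++) {
        services.add("NamenodeHeartbeatService-" + i);
      }
    }
    if (routerHeartbeat) {
      // Router state updates no longer depend on the namenode flag.
      services.add("RouterHeartbeatService");
    }
    return services;
  }

  static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(ROUTER_HEARTBEAT_ENABLE, "true");
    conf.put(NAMENODE_HEARTBEAT_ENABLE, "false");
    System.out.println(servicesToStart(conf, 0));
  }
}
```

With the two flags split like this, a Router that monitors no namenodes can still run its own heartbeat, which is the case the report describes.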
[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443602#comment-16443602 ] genericqa commented on HDFS-13448: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 47s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 33m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 50s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 38s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 19s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 35s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 42s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}239m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestCrcCorruption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | HDFS-13448 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919738/HDFS-13448.5.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9318a5b2b29c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /tes
[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443572#comment-16443572 ] genericqa commented on HDFS-13478: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-rbf generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 6s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 24s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf | | | Unchecked/unconfirmed cast from org.apache.hadoop.hdfs.server.federation.store.protocol.GetDisabledNamespacesRequest to org.apache.hadoop.hdfs.server.federation.store.protocol.impl.pb.GetDisabledNamespacesRequestPBImpl in org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getDisabledNamespaces(GetDisabledNamespacesRequest) At RouterAdminProtocolTranslatorPB.java:org.apache.hadoop.hdfs.server.federation.store.protocol.impl.pb.GetDisabledNamespacesRequestPBImpl in org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getDisabledNamespaces(GetDisabledNamespacesRequest) At RouterAdminProtocolTranslatorPB.java:[line 261] | | | Useless object stored in variable ret of method org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces() At MembershipNamenodeResolver.java:ret of method org.apache.hadoo
[jira] [Comment Edited] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table
[ https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443568#comment-16443568 ]

Íñigo Goiri edited comment on HDFS-13453 at 4/19/18 4:47 AM:
-------------------------------------------------------------

The only thing is to extract the mod time get.
Other than that, this is good to go.

was (Author: elgoiri):
The only thing is to extract the mod time get.
Other than that, this is good to go.

On Wed, Apr 18, 2018, 20:58 Dibyendu Karmakar (JIRA)

> RBF: getMountPointDates should fetch latest subdir time/date when parent dir
> is not present but /parent/child dirs are present in mount table
> ----------------------------------------------------------------------------
>
>         Key: HDFS-13453
>         URL: https://issues.apache.org/jira/browse/HDFS-13453
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Dibyendu Karmakar
>    Assignee: Dibyendu Karmakar
>    Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling
> the case when /parent is not present in mount table but /parent/subdir is in
> mount table.
> In this case getMountPointDates is not able to fetch the latest time for
> /parent as /parent is not present in mount table.
> For this scenario we will display latest modified subdir date/time as /parent
> modified time.
[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table
[ https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443568#comment-16443568 ]

Íñigo Goiri commented on HDFS-13453:
------------------------------------

The only thing is to extract the mod time get.
Other than that, this is good to go.

On Wed, Apr 18, 2018, 20:58 Dibyendu Karmakar (JIRA)

> RBF: getMountPointDates should fetch latest subdir time/date when parent dir
> is not present but /parent/child dirs are present in mount table
> ----------------------------------------------------------------------------
>
>         Key: HDFS-13453
>         URL: https://issues.apache.org/jira/browse/HDFS-13453
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Dibyendu Karmakar
>    Assignee: Dibyendu Karmakar
>    Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling
> the case when /parent is not present in mount table but /parent/subdir is in
> mount table.
> In this case getMountPointDates is not able to fetch the latest time for
> /parent as /parent is not present in mount table.
> For this scenario we will display latest modified subdir date/time as /parent
> modified time.
[jira] [Comment Edited] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
[ https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443557#comment-16443557 ]

Rushabh S Shah edited comment on HDFS-13430 at 4/19/18 4:10 AM:
----------------------------------------------------------------

Someone needs to properly manage the "Fix Version" fields.
2.9.1 is still showing as unreleased.
Same for other branches also.

was (Author: shahrs87):
Someone needs to fix the "Fix Version" fields.
2.9.1 is still showing as unreleased.
Same for other branches also.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> ----------------------------------------------------------
>
>         Key: HDFS-13430
>         URL: https://issues.apache.org/jira/browse/HDFS-13430
>     Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Xiao Chen
>    Assignee: Xiao Chen
>    Priority: Major
>     Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be
> the only failure.
[jira] [Commented] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
[ https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443557#comment-16443557 ]

Rushabh S Shah commented on HDFS-13430:
---------------------------------------

Someone needs to fix the "Fix Version" fields.
2.9.1 is still showing as unreleased.
Same for other branches also.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> ----------------------------------------------------------
>
>         Key: HDFS-13430
>         URL: https://issues.apache.org/jira/browse/HDFS-13430
>     Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Xiao Chen
>    Assignee: Xiao Chen
>    Priority: Major
>     Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be
> the only failure.
[jira] [Commented] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
[ https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443556#comment-16443556 ]

Xiao Chen commented on HDFS-13430:
----------------------------------

Thanks Rushabh, branch enum LGTM.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> ----------------------------------------------------------
>
>         Key: HDFS-13430
>         URL: https://issues.apache.org/jira/browse/HDFS-13430
>     Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Xiao Chen
>    Assignee: Xiao Chen
>    Priority: Major
>     Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be
> the only failure.
[jira] [Updated] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
[ https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-13430:
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.3
                   2.9.2
                   3.1.1
                   2.8.4
                   2.10.0
           Status: Resolved  (was: Patch Available)

Cherry-picked to branch-3.1, branch-3.0, branch-2, branch-2.9, branch-2.8.
Thanks [~xiaochen] for the patch.
Hope I didn't mess up anything. Let me know if otherwise.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> ----------------------------------------------------------
>
>         Key: HDFS-13430
>         URL: https://issues.apache.org/jira/browse/HDFS-13430
>     Project: Hadoop HDFS
>  Issue Type: Bug
>    Reporter: Xiao Chen
>    Assignee: Xiao Chen
>    Priority: Major
>     Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be
> the only failure.
[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table
[ https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443552#comment-16443552 ]

Dibyendu Karmakar commented on HDFS-13453:
------------------------------------------

[^HDFS-13453-001.patch] contains the unit test for this scenario.

> RBF: getMountPointDates should fetch latest subdir time/date when parent dir
> is not present but /parent/child dirs are present in mount table
> ----------------------------------------------------------------------------
>
>         Key: HDFS-13453
>         URL: https://issues.apache.org/jira/browse/HDFS-13453
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Dibyendu Karmakar
>    Assignee: Dibyendu Karmakar
>    Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling
> the case when /parent is not present in mount table but /parent/subdir is in
> mount table.
> In this case getMountPointDates is not able to fetch the latest time for
> /parent as /parent is not present in mount table.
> For this scenario we will display latest modified subdir date/time as /parent
> modified time.
[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router
[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443544#comment-16443544 ]

Xiao Chen commented on HDFS-13469:
----------------------------------

{quote}
Does anybody have a pointer to an end 2 end use of inodes?
{quote}
The only use case I'm aware of is that you can, for example, run {{hdfs dfs -ls /.reserved/.inodes/123}} to access the file (replace the CLI with API calls, and ls with other operations). See -HDFS-4434.-

> RBF: Support InodeID in the Router
> ----------------------------------
>
>         Key: HDFS-13469
>         URL: https://issues.apache.org/jira/browse/HDFS-13469
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Íñigo Goiri
>    Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this
> functionality.
[jira] [Updated] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri updated HDFS-13478:
-------------------------------
    Status: Patch Available  (was: Open)

> RBF: Decommission subclusters from the federation
> -------------------------------------------------
>
>         Key: HDFS-13478
>         URL: https://issues.apache.org/jira/browse/HDFS-13478
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Íñigo Goiri
>    Assignee: Íñigo Goiri
>    Priority: Major
> Attachments: HDFS-13478.000.patch, HDFS-13478.001.patch
>
>
> We have a subcluster in our federation that is for testing and is
> misbehaving. This has a negative impact on the performance with operations
> that go to every subcluster (e.g., renewLease() or setSafeMode()).
[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router
[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443536#comment-16443536 ]

Íñigo Goiri commented on HDFS-13469:
------------------------------------

Does anybody have a pointer to an end 2 end use of inodes?
For now, it might be good to throw an unsupported exception if we get accesses to an inode path.
Tracking the locations might be too involved.

> RBF: Support InodeID in the Router
> ----------------------------------
>
>         Key: HDFS-13469
>         URL: https://issues.apache.org/jira/browse/HDFS-13469
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Íñigo Goiri
>    Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this
> functionality.
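The interim behaviour suggested in this comment, rejecting inode paths with an unsupported exception, could be sketched roughly as follows. The class and method names are hypothetical and no Router code is used; the only detail taken from the thread is the {{/.reserved/.inodes/}} prefix from the {{hdfs dfs -ls}} example.

```java
/** Hypothetical pre-check a router could run before resolving a path. */
final class InodePathCheckSketch {
  // Prefix of inode-based paths, as in "hdfs dfs -ls /.reserved/.inodes/123".
  static final String INODE_PREFIX = "/.reserved/.inodes/";

  /** True if the path addresses a file by inode ID rather than by name. */
  static boolean isInodePath(String path) {
    return path != null && path.startsWith(INODE_PREFIX);
  }

  /** Returns the path unchanged, or throws if it is an inode path. */
  static String checkPath(String path) {
    if (isInodePath(path)) {
      throw new UnsupportedOperationException(
          "Inode paths are not supported through the Router: " + path);
    }
    return path;
  }

  public static void main(String[] args) {
    System.out.println(checkPath("/user/data/file.txt"));
    try {
      checkPath("/.reserved/.inodes/123");
    } catch (UnsupportedOperationException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Failing fast this way avoids having to track which subcluster owns a given inode, which the comment notes might be too involved for now.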
[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table
[ https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443533#comment-16443533 ]

Íñigo Goiri commented on HDFS-13453:
------------------------------------

[~dibyendu_hadoop], that makes sense, if the parent is there, let's return that date.
Let's just make sure that's in the unit test.

> RBF: getMountPointDates should fetch latest subdir time/date when parent dir
> is not present but /parent/child dirs are present in mount table
> ----------------------------------------------------------------------------
>
>         Key: HDFS-13453
>         URL: https://issues.apache.org/jira/browse/HDFS-13453
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Dibyendu Karmakar
>    Assignee: Dibyendu Karmakar
>    Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling
> the case when /parent is not present in mount table but /parent/subdir is in
> mount table.
> In this case getMountPointDates is not able to fetch the latest time for
> /parent as /parent is not present in mount table.
> For this scenario we will display latest modified subdir date/time as /parent
> modified time.
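The rule discussed in this thread (use the parent's own date if it is in the mount table, otherwise the latest date among its mounted subdirectories) might be sketched like this. The map of path to modification time is a stand-in for the mount table, and the class and method names are made up for illustration; this is not the code from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of picking the date to display for a mount point. */
final class MountPointDateSketch {
  /**
   * Returns the entry's own time if the path is in the mount table,
   * otherwise the latest time among mount entries below it (0 if none).
   */
  static long dateFor(String path, Map<String, Long> mountTable) {
    Long own = mountTable.get(path);
    if (own != null) {
      return own;
    }
    // Match only real children: "/parent/" must not match "/parentX/...".
    String prefix = path.endsWith("/") ? path : path + "/";
    long latest = 0;
    for (Map.Entry<String, Long> entry : mountTable.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        latest = Math.max(latest, entry.getValue());
      }
    }
    return latest;
  }

  public static void main(String[] args) {
    Map<String, Long> mountTable = new HashMap<>();
    mountTable.put("/parent/a", 100L);
    mountTable.put("/parent/b", 200L);
    System.out.println(dateFor("/parent", mountTable));
  }
}
```

A unit test along the lines requested here would cover both branches: once with /parent absent from the table and once with it present.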
[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443532#comment-16443532 ]

Íñigo Goiri commented on HDFS-13478:
------------------------------------

Thanks [~linyiqun] for the comments.
After checking, we cannot reuse the membership table as it is updated per Router and the way the heartbeat is done would overwrite everything.

For the length, I agree.
We can go with [^HDFS-13478.001.patch] which just does the basic wiring and then we can do the RPC checks in another one.
We can remove the change for MembershipNamenodeResolver from [^HDFS-13478.001.patch].

> RBF: Decommission subclusters from the federation
> -------------------------------------------------
>
>         Key: HDFS-13478
>         URL: https://issues.apache.org/jira/browse/HDFS-13478
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Íñigo Goiri
>    Assignee: Íñigo Goiri
>    Priority: Major
> Attachments: HDFS-13478.000.patch, HDFS-13478.001.patch
>
>
> We have a subcluster in our federation that is for testing and is
> misbehaving. This has a negative impact on the performance with operations
> that go to every subcluster (e.g., renewLease() or setSafeMode()).
[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router
[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443531#comment-16443531 ]

Yiqun Lin commented on HDFS-13469:
----------------------------------

{quote}I'm not sure how to figure which subcluster has that inode.
{quote}
[~elgoiri], currently I think the Router cannot directly know which subcluster has the inode. One way would be to get the HdfsFileStatus from the subclusters for the given source path, check the fileId in the HdfsFileStatus, and so identify the desired subcluster (namespace). I am not sure this is the approach [~daryn] expects.

> RBF: Support InodeID in the Router
> ----------------------------------
>
>         Key: HDFS-13469
>         URL: https://issues.apache.org/jira/browse/HDFS-13469
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Íñigo Goiri
>    Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this
> functionality.
[jira] [Updated] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri updated HDFS-13478:
-------------------------------
    Attachment: HDFS-13478.001.patch

> RBF: Decommission subclusters from the federation
> -------------------------------------------------
>
>         Key: HDFS-13478
>         URL: https://issues.apache.org/jira/browse/HDFS-13478
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Íñigo Goiri
>    Assignee: Íñigo Goiri
>    Priority: Major
> Attachments: HDFS-13478.000.patch, HDFS-13478.001.patch
>
>
> We have a subcluster in our federation that is for testing and is
> misbehaving. This has a negative impact on the performance with operations
> that go to every subcluster (e.g., renewLease() or setSafeMode()).
[jira] [Created] (HDFS-13479) Simplify find StorageInfo logical operation in BlocksMap::replaceBlock()
liaoyuxiangqin created HDFS-13479:
-------------------------------------

    Summary: Simplify find StorageInfo logical operation in BlocksMap::replaceBlock()
        Key: HDFS-13479
        URL: https://issues.apache.org/jira/browse/HDFS-13479
    Project: Hadoop HDFS
 Issue Type: Improvement
 Components: namenode
   Reporter: liaoyuxiangqin

While reading replaceBlock() in the BlocksMap class of the HDFS blockmanagement package, I found that the following code for finding the storage could be simpler and easier to understand.
{code:java|title=BlocksMap.java|borderStyle=solid}
for (int i = currentBlock.numNodes() - 1; i >= 0; i--) {
  final DatanodeDescriptor dn = currentBlock.getDatanode(i);
  final DatanodeStorageInfo storage = currentBlock.findStorageInfo(dn);
  final boolean removed = storage.removeBlock(currentBlock);
  Preconditions.checkState(removed, "currentBlock not found.");

  final AddBlockResult result = storage.addBlock(newBlock);
  Preconditions.checkState(result == AddBlockResult.ADDED,
      "newBlock already exists.");
}
{code}
As the code segment above shows, there is no need to get dn first: the storage can be found directly by index, so I think this logic can be simplified.
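The simplification proposed in HDFS-13479 can be illustrated with toy classes. Everything below is a stand-in: Storage and Block only imitate the shape of DatanodeStorageInfo and BlockInfo, and the index-based accessor is assumed to exist on the real class as getStorageInfo(int), which is what the report implies but does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of replaceBlock() that looks storages up by index. */
final class ReplaceBlockSketch {
  /** Toy storage that only tracks the names of the blocks it holds. */
  static final class Storage {
    final List<String> blocks = new ArrayList<>();

    boolean removeBlock(String blockName) {
      return blocks.remove(blockName);
    }

    boolean addBlock(String blockName) {
      return !blocks.contains(blockName) && blocks.add(blockName);
    }
  }

  /** Toy block that keeps one storage per replica, addressable by index. */
  static final class Block {
    final String name;
    final List<Storage> storages;

    Block(String name, List<Storage> storages) {
      this.name = name;
      this.storages = storages;
    }

    int numNodes() {
      return storages.size();
    }

    Storage getStorageInfo(int index) {
      return storages.get(index);
    }
  }

  static void replaceBlock(Block currentBlock, Block newBlock) {
    for (int i = currentBlock.numNodes() - 1; i >= 0; i--) {
      // Direct lookup by index: no datanode lookup followed by a search.
      final Storage storage = currentBlock.getStorageInfo(i);
      if (!storage.removeBlock(currentBlock.name)) {
        throw new IllegalStateException("currentBlock not found.");
      }
      if (!storage.addBlock(newBlock.name)) {
        throw new IllegalStateException("newBlock already exists.");
      }
    }
  }

  public static void main(String[] args) {
    Storage s1 = new Storage();
    Storage s2 = new Storage();
    s1.blocks.add("blk_1");
    s2.blocks.add("blk_1");
    List<Storage> storages = Arrays.asList(s1, s2);
    replaceBlock(new Block("blk_1", storages), new Block("blk_2", storages));
    System.out.println(s1.blocks + " " + s2.blocks);
  }
}
```

The loop body keeps the same two checks as the quoted code; only the way the storage is obtained changes.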
[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router
[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443511#comment-16443511 ]

Xiao Chen commented on HDFS-13469:
----------------------------------

Thanks folks for bringing up this issue.

Is it even possible to solve this in a compatible way? Suppose I'm the client and I previously read from {{/.reserved/.inodes/123}}, from the only nameservice I had. Now the cluster is federated and another nameservice is added. Router based or not, if that other nameservice also has inode 123, there is no way that my input {{/.reserved/.inodes/123}} can map to the 2 inodes in the 2 nameservices, both ID'ed 123...

It seems to me we'd require the client to explicitly read from {{hdfs://nameserviceX/.reserved/.inodes/123}} to have this working, and call this out in documentation.

I'm not very familiar with NFS, does HDFS-11575 address this?

> RBF: Support InodeID in the Router
> ----------------------------------
>
>         Key: HDFS-13469
>         URL: https://issues.apache.org/jira/browse/HDFS-13469
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>    Reporter: Íñigo Goiri
>    Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this
> functionality.
[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443503#comment-16443503 ] Yiqun Lin commented on HDFS-13478: -- [~elgoiri], I took a quick look at the patch; some thoughts from me: * If we keep the decommission info in a separate table, the number of State Store queries goes from one to two. Will this have any performance impact? * The patch looks large. We could split it into two parts: # Decommission Store API implementation (HDFS-13478.000.patch seems to cover just this). # The checking logic on the Router server side and the corresponding test for the decommission case. > RBF: Decommission subclusters from the federation > - > > Key: HDFS-13478 > URL: https://issues.apache.org/jira/browse/HDFS-13478 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13478.000.patch > > > We have a subcluster in our federation that is for testing and is > misbehaving. This has a negative impact on the performance with operations > that go to every subcluster (e.g., renewLease() or setSafeMode()). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443471#comment-16443471 ] genericqa commented on HDFS-13441: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 41s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 1m 7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 99m 18s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}163m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | HDFS-13441 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919706/HDFS-13441.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 79a13594c2b2 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e4c39f3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/23992/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23992/testReport/ | | Max. process+thread count | 3243 (vs. ulimit of 1) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23992/console | | Powe
[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443469#comment-16443469 ] BELUGA BEHR commented on HDFS-13448: Not sure why the compilation failed trying again... > HDFS Block Placement - Ignore Locality for First Block Replica > -- > > Key: HDFS-13448 > URL: https://issues.apache.org/jira/browse/HDFS-13448 > Project: Hadoop HDFS > Issue Type: New Feature > Components: block placement, hdfs-client >Affects Versions: 2.9.0, 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, > HDFS-13448.3.patch, HDFS-13448.4.patch, HDFS-13448.5.patch > > > According to the HDFS Block Place Rules: > {quote} > /** > * The replica placement strategy is that if the writer is on a datanode, > * the 1st replica is placed on the local machine, > * otherwise a random datanode. The 2nd replica is placed on a datanode > * that is on a different rack. The 3rd replica is placed on a datanode > * which is on a different node of the rack as the second replica. > */ > {quote} > However, there is a hint for the hdfs-client that allows the block placement > request to not put a block replica on the local datanode _where 'local' means > the same host as the client is being run on._ > {quote} > /** >* Advise that a block replica NOT be written to the local DataNode where >* 'local' means the same host as the client is being run on. >* >* @see CreateFlag#NO_LOCAL_WRITE >*/ > {quote} > I propose that we add a new flag that allows the hdfs-client to request that > the first block replica be placed on a random DataNode in the cluster. The > subsequent block replicas should follow the normal block placement rules. > The issue is that when the {{NO_LOCAL_WRITE}} is enabled, the first block > replica is not placed on the local node, but it is still placed on the local > rack. 
Where this comes into play is where you have, for example, a flume > agent that is loading data into HDFS. > If the Flume agent is running on a DataNode, then by default, the DataNode > local to the Flume agent will always get the first block replica and this > leads to un-even block placements, with the local node always filling up > faster than any other node in the cluster. > Modifying this example, if the DataNode is removed from the host where the > Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then > the default block placement policy will still prefer the local rack. This > remedies the situation only so far as now the first block replica will always > be distributed to a DataNode on the local rack. > This new flag would allow a single Flume agent to distribute the blocks > randomly, evenly, over the entire cluster instead of hot-spotting the local > node or the local rack. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
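The three behaviors discussed in HDFS-13448 for the first replica can be compared in a small sketch. This is a simplified model, not the HDFS block placement policy: the Node class and the chooseFirst method are hypothetical, and IGNORE_LOCALITY is a made-up name for the proposed flag, which the issue does not name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class FirstReplicaSketch {
  /** IGNORE_LOCALITY is a hypothetical name for the flag proposed in the issue. */
  enum Mode { DEFAULT, NO_LOCAL_WRITE, IGNORE_LOCALITY }

  /** Hypothetical stand-in for a DataNode: a host name and its rack. */
  static final class Node {
    final String host;
    final String rack;

    Node(String host, String rack) {
      this.host = host;
      this.rack = rack;
    }
  }

  /** Picks the DataNode for the first replica when the writer runs on a DataNode. */
  static Node chooseFirst(Mode mode, Node writer, List<Node> cluster, Random rnd) {
    if (mode == Mode.DEFAULT) {
      return writer; // the local node always gets the first replica
    }
    if (mode == Mode.NO_LOCAL_WRITE) {
      // Skip the local node but stay on the local rack when possible.
      List<Node> sameRack = new ArrayList<>();
      for (Node n : cluster) {
        if (n != writer && n.rack.equals(writer.rack)) {
          sameRack.add(n);
        }
      }
      if (!sameRack.isEmpty()) {
        return sameRack.get(rnd.nextInt(sameRack.size()));
      }
    }
    // IGNORE_LOCALITY (or no other node on the rack): any node in the cluster.
    return cluster.get(rnd.nextInt(cluster.size()));
  }

  public static void main(String[] args) {
    List<Node> cluster = new ArrayList<>();
    cluster.add(new Node("dn1", "/rack1"));
    cluster.add(new Node("dn2", "/rack1"));
    cluster.add(new Node("dn3", "/rack2"));
    Node writer = cluster.get(0);
    Random rnd = new Random(42);
    for (Mode mode : Mode.values()) {
      System.out.println(mode + " -> " + chooseFirst(mode, writer, cluster, rnd).host);
    }
  }
}
```

In this model only the third mode can place the first replica outside the writer's rack, which matches the hot-spotting concern raised in the description.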
[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table
[ https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443466#comment-16443466 ] Dibyendu Karmakar commented on HDFS-13453: -- Thanks [~elgoiri]. With the approach mentioned above, when the mount table has both */parent* and */parent/child* mount points, it will return the modified date of */parent/child* for */parent*. But the actual modified date of /parent in its mount table entry will be different. I think that if /parent is in the mount table, we should return the actual modified date stored in the mount table. When /parent is not in the mount table, we should return the modified date of /parent/child. What do you suggest? > RBF: getMountPointDates should fetch latest subdir time/date when parent dir > is not present but /parent/child dirs are present in mount table > - > > Key: HDFS-13453 > URL: https://issues.apache.org/jira/browse/HDFS-13453 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Dibyendu Karmakar >Assignee: Dibyendu Karmakar >Priority: Major > Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch > > > [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling > the case when /parent is not present in mount table but /parent/subdir is in > mount table. > In this case getMountPointDates is not able to fetch the latest time for > /parent as /parent is not present in mount table. > For this scenario we will display latest modified subdir date/time as /parent > modified time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
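The rule suggested in this comment can be written down as a short sketch. It assumes, for illustration only, that the mount table is a plain map from mount point path to modification time in milliseconds; the class and method names are hypothetical and this is not the getMountPointDates implementation.

```java
import java.util.Map;
import java.util.TreeMap;

class MountPointDateSketch {
  /**
   * Returns the path's own mount table date when it has an entry; otherwise
   * the latest date among mount points below it, or 0 if there are none.
   */
  static long modifiedTime(String path, Map<String, Long> mountTable) {
    Long own = mountTable.get(path);
    if (own != null) {
      return own; // /parent is itself a mount point: use its stored date
    }
    // Append "/" so that /parent does not match /parentX/child.
    String prefix = path.endsWith("/") ? path : path + "/";
    long latest = 0L;
    for (Map.Entry<String, Long> e : mountTable.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        latest = Math.max(latest, e.getValue());
      }
    }
    return latest;
  }

  public static void main(String[] args) {
    Map<String, Long> table = new TreeMap<>();
    table.put("/parent/child", 200L);
    table.put("/parent/other", 300L);
    System.out.println(modifiedTime("/parent", table));
    table.put("/parent", 100L);
    System.out.println(modifiedTime("/parent", table));
  }
}
```

Checking the exact entry first keeps the stored date of /parent authoritative, and the child scan only runs for paths that are not mount points themselves.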
[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443464#comment-16443464 ] Íñigo Goiri commented on HDFS-13478: [^HDFS-13478.000.patch] is a work in progress for what I'm thinking of doing. It is still missing a proper unit test to check that we ignore a subcluster. The main problem, as usual with all these interface patches, is that the proto wiring is huge. > RBF: Decommission subclusters from the federation > - > > Key: HDFS-13478 > URL: https://issues.apache.org/jira/browse/HDFS-13478 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13478.000.patch > > > We have a subcluster in our federation that is for testing and is > misbehaving. This has a negative impact on the performance with operations > that go to every subcluster (e.g., renewLease() or setSafeMode()). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13478: --- Attachment: HDFS-13478.000.patch > RBF: Decommission subclusters from the federation > - > > Key: HDFS-13478 > URL: https://issues.apache.org/jira/browse/HDFS-13478 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13478.000.patch > > > We have a subcluster in our federation that is for testing and is > misbehaving. This has a negative impact on the performance with operations > that go to every subcluster (e.g., renewLease() or setSafeMode()). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-13448: --- Attachment: HDFS-13448.5.patch > HDFS Block Placement - Ignore Locality for First Block Replica > -- > > Key: HDFS-13448 > URL: https://issues.apache.org/jira/browse/HDFS-13448 > Project: Hadoop HDFS > Issue Type: New Feature > Components: block placement, hdfs-client >Affects Versions: 2.9.0, 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, > HDFS-13448.3.patch, HDFS-13448.4.patch, HDFS-13448.5.patch > > > According to the HDFS Block Place Rules: > {quote} > /** > * The replica placement strategy is that if the writer is on a datanode, > * the 1st replica is placed on the local machine, > * otherwise a random datanode. The 2nd replica is placed on a datanode > * that is on a different rack. The 3rd replica is placed on a datanode > * which is on a different node of the rack as the second replica. > */ > {quote} > However, there is a hint for the hdfs-client that allows the block placement > request to not put a block replica on the local datanode _where 'local' means > the same host as the client is being run on._ > {quote} > /** >* Advise that a block replica NOT be written to the local DataNode where >* 'local' means the same host as the client is being run on. >* >* @see CreateFlag#NO_LOCAL_WRITE >*/ > {quote} > I propose that we add a new flag that allows the hdfs-client to request that > the first block replica be placed on a random DataNode in the cluster. The > subsequent block replicas should follow the normal block placement rules. > The issue is that when the {{NO_LOCAL_WRITE}} is enabled, the first block > replica is not placed on the local node, but it is still placed on the local > rack. Where this comes into play is where you have, for example, a flume > agent that is loading data into HDFS. 
> If the Flume agent is running on a DataNode, then by default, the DataNode > local to the Flume agent will always get the first block replica and this > leads to un-even block placements, with the local node always filling up > faster than any other node in the cluster. > Modifying this example, if the DataNode is removed from the host where the > Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then > the default block placement policy will still prefer the local rack. This > remedies the situation only so far as now the first block replica will always > be distributed to a DataNode on the local rack. > This new flag would allow a single Flume agent to distribute the blocks > randomly, evenly, over the entire cluster instead of hot-spotting the local > node or the local rack. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router
[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443439#comment-16443439 ] Íñigo Goiri commented on HDFS-13469: Will the client come with 123 or with a path like /.reserved/.inodes/123? I'm not sure how to figure out which subcluster has that inode. > RBF: Support InodeID in the Router > -- > > Key: HDFS-13469 > URL: https://issues.apache.org/jira/browse/HDFS-13469 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Priority: Major > > The Namenode supports identifying files through inode identifiers. > Currently the Router does not handle this properly, we need to add this > functionality. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443402#comment-16443402 ] genericqa commented on HDFS-13286: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} HDFS-12943 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 49s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 11s{color} | {color:green} HDFS-12943 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 35s{color} | {color:green} HDFS-12943 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 46s{color} | {color:green} HDFS-12943 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 46s{color} | {color:green} HDFS-12943 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 47s{color} | {color:green} HDFS-12943 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 9s{color} | {color:green} HDFS-12943 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 25m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 25m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 25m 52s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 39s{color} | {color:orange} root: The patch generated 13 new + 287 unchanged - 5 fixed = 300 total (was 292) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 19s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 40s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 49s{color} | {color:red} hadoop-yarn-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 50s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}249m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestSafeModeWithStripedFile | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.TestRollingUpgrade | \\ \\ ||
[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443386#comment-16443386 ] genericqa commented on HDFS-13448: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 27s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 10s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 27s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 46s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 47s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 38s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}151m 30s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | HDFS-13448 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919698/HDFS-13448.4.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 329dd1f68fa7 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bf
[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443349#comment-16443349 ] yunjiong zhao commented on HDFS-13441: -- [~daryn], you are right, it's not the best or most reliable way to fix this issue. After rethinking it, I believe one line of code should fix this issue. When the NameNode runs startActiveServices, it calls {code:java} blockManager.getDatanodeManager().markAllDatanodesStale(); {code} Inside markAllDatanodesStale, add one line of code to make sure each DataNode gets the current key from the active NameNode. {code:java} dn.setNeedKeyUpdate(true); {code} > DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Attachments: HDFS-13441.002.patch, HDFS-13441.003.patch, > HDFS-13441.patch > > > After NameNode failover, lots of application failed due to some DataNodes > can't re-compute password from block token. > {code:java} > 2018-04-11 20:10:52,448 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error > processing unknown operation src: /10.142.74.116:57404 dst: > /10.142.77.45:50010 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist.] 
> at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist. 
> at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 7 more > {code} > > In the DataNode log, we didn't see DataNode update block keys around > 2018-04-11 09:55:00 and around 2018-04-11 19:55:00. > {code:java} > 2018-04-10 14:51:36,424 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-10 23:55:38,420 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 00:51:34,792 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 10:51:39,403 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys
[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yunjiong zhao updated HDFS-13441:
---------------------------------
    Attachment: HDFS-13441.003.patch

> DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-13441
>                 URL: https://issues.apache.org/jira/browse/HDFS-13441
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.7.1
>            Reporter: yunjiong zhao
>            Assignee: yunjiong zhao
>            Priority: Major
>         Attachments: HDFS-13441.002.patch, HDFS-13441.003.patch, HDFS-13441.patch
>
>
> After NameNode failover, many applications failed because some DataNodes could not re-compute the password from the block token.
> {code:java}
> 2018-04-11 20:10:52,448 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error processing unknown operation src: /10.142.74.116:57404 dst: /10.142.77.45:50010
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist.]
>     at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
>     at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't re-compute password for block_token_identifier (expiryDate=1523538652448, keyId=1762737944, userId=hadoop, blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, access modes=[WRITE]), since the required block key (keyID=1762737944) doesn't exist.
>     at org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382)
>     at org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
>     at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>     ... 7 more
> {code}
>
> In the DataNode log, we did not see the DataNode update its block keys around 2018-04-11 09:55:00 or around 2018-04-11 19:55:00.
> {code:java}
> 2018-04-10 14:51:36,424 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys
> 2018-04-10 23:55:38,420 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys
> 2018-04-11 00:51:34,792 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys
> 2018-04-11 10:51:39,403 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys
> 2018-04-11 20:51:44,422 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys
> 2018-04-12 02:54:47,855 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys
> 2018-04-12 05:55:44,456 INFO org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting block keys
> {code}
> The reason is there is SocketTimeOutException when sending heartbeat
[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router
[ https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443317#comment-16443317 ]

Daryn Sharp commented on HDFS-13469:
------------------------------------

It's really as simple as: the path /.reserved/.inodes/123 will access inode 123, which may really be /user/daryn/dir/mystuff. The NFS implementation relies on inode paths. HDFS-7878 cannot solve the problem because it's an abstraction for a different purpose.

> RBF: Support InodeID in the Router
> ----------------------------------
>
>                 Key: HDFS-13469
>                 URL: https://issues.apache.org/jira/browse/HDFS-13469
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly; we need to add this functionality.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
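The inode path convention described in the comment on HDFS-13469 can be illustrated with a small, self-contained sketch. Only the `/.reserved/.inodes/` prefix comes from the comment; the class `InodePaths` and its method `parseInodeId` are hypothetical names used for illustration and are not Router or NameNode code.

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class InodePaths {
  // Prefix taken from the comment: /.reserved/.inodes/123 refers to inode 123.
  static final String INODE_PREFIX = "/.reserved/.inodes/";

  private InodePaths() {}

  /**
   * Returns the inode id encoded in the first component after the prefix,
   * or an empty value when the path is not an inode path.
   */
  static OptionalLong parseInodeId(String path) {
    if (path == null || !path.startsWith(INODE_PREFIX)) {
      return OptionalLong.empty();
    }
    String rest = path.substring(INODE_PREFIX.length());
    int slash = rest.indexOf('/');
    String id = slash < 0 ? rest : rest.substring(0, slash);
    // Accept only non-empty, all-digit components.
    if (id.isEmpty() || !id.chars().allMatch(Character::isDigit)) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(id));
    } catch (NumberFormatException e) {
      // Digits that overflow a long are treated as "not an inode path".
      return OptionalLong.empty();
    }
  }

  public static void main(String[] args) {
    System.out.println(parseInodeId("/.reserved/.inodes/123"));
    System.out.println(parseInodeId("/user/daryn/dir/mystuff"));
  }
}
```

A Router that wanted to support such paths would still need to decide which subcluster owns a given inode id; this sketch only shows how the path form can be recognized.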
[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443309#comment-16443309 ]

Íñigo Goiri commented on HDFS-13478:
------------------------------------

I could make this part of the Membership table, but I think it's better to keep it in a separate table. That way they are independent, and we can keep the subcluster decommissioned even when no Router is heartbeating.

> RBF: Decommission subclusters from the federation
> -------------------------------------------------
>
>                 Key: HDFS-13478
>                 URL: https://issues.apache.org/jira/browse/HDFS-13478
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Íñigo Goiri
>            Assignee: Íñigo Goiri
>            Priority: Major
>
> We have a subcluster in our federation that is for testing and is misbehaving.
> This has a negative impact on the performance of operations that go to every subcluster (e.g., renewLease() or setSafeMode()).
[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443277#comment-16443277 ]

Íñigo Goiri commented on HDFS-13478:
------------------------------------

The idea would be to add a decommission table in the State Store and ignore this subcluster when checking for locations. We would need a way in dfsrouteradmin to disable/enable subclusters.
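The proposal in the HDFS-13478 comment (keep a table of decommissioned subclusters and skip them when resolving locations) could look roughly like the sketch below. This is an assumption-laden illustration: `SubclusterFilter`, `disable`, `enable`, and `activeSubclusters` are hypothetical names, and an in-memory set stands in for the State Store table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; these names are not from the Router code base. */
final class SubclusterFilter {
  // Stand-in for the proposed State Store table of disabled subclusters.
  private final Set<String> disabled = new HashSet<>();

  /** Marks a subcluster (nameservice id) as decommissioned. */
  void disable(String nsId) {
    disabled.add(nsId);
  }

  /** Puts a subcluster back into service. */
  void enable(String nsId) {
    disabled.remove(nsId);
  }

  /** Returns the candidates minus the disabled subclusters, preserving order. */
  List<String> activeSubclusters(List<String> candidates) {
    List<String> active = new ArrayList<>();
    for (String nsId : candidates) {
      if (!disabled.contains(nsId)) {
        active.add(nsId);
      }
    }
    return active;
  }

  public static void main(String[] args) {
    SubclusterFilter filter = new SubclusterFilter();
    filter.disable("ns-test");
    // Operations that fan out to every subcluster would iterate over this list.
    System.out.println(filter.activeSubclusters(Arrays.asList("ns0", "ns-test", "ns1")));
  }
}
```

Keeping the disabled set separate from membership data matches the reasoning in the thread: the entry persists even when no Router is heartbeating for that subcluster.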
[jira] [Assigned] (HDFS-13478) RBF: Decommission subclusters from the federation
[ https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri reassigned HDFS-13478:
----------------------------------
    Assignee: Íñigo Goiri
[jira] [Created] (HDFS-13478) RBF: Decommission subclusters from the federation
Íñigo Goiri created HDFS-13478:
----------------------------------

             Summary: RBF: Decommission subclusters from the federation
                 Key: HDFS-13478
                 URL: https://issues.apache.org/jira/browse/HDFS-13478
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: Íñigo Goiri

We have a subcluster in our federation that is for testing and is misbehaving. This has a negative impact on the performance of operations that go to every subcluster (e.g., renewLease() or setSafeMode()).
[jira] [Commented] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode
[ https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443251#comment-16443251 ]

Wei Yan commented on HDFS-13475:
--------------------------------

On the NameNode side, there are isInSafeMode() and isInStartupSafeMode(). The Router can follow a similar concept, with two different safe mode functions: isInSafeMode() and isInForcedSafeMode().

> RBF: Admin cannot enforce Router enter SafeMode
> -----------------------------------------------
>
>                 Key: HDFS-13475
>                 URL: https://issues.apache.org/jira/browse/HDFS-13475
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Wei Yan
>            Assignee: Wei Yan
>            Priority: Major
>
> To reproduce the issue:
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode enter
> Successfully enter safe mode.
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: true{code}
> And then,
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: false{code}
> From the code, it looks like the periodicInvoke triggers the leave.
> {code:java}
> public void periodicInvoke() {
>   ..
>   // Always update to indicate our cache was updated
>   if (isCacheStale) {
>     if (!rpcServer.isInSafeMode()) {
>       enter();
>     }
>   } else if (rpcServer.isInSafeMode()) {
>     // Cache recently updated, leave safe mode
>     leave();
>   }
> }
> {code}
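One way to read the suggestion in the HDFS-13475 comment is that the periodic check should only leave safe mode when it was entered automatically. The sketch below models that idea in isolation. `SafeModeModel`, `adminEnter`, and `adminLeave` are hypothetical names; only `isInSafeMode()`, `isInForcedSafeMode()`, and the shape of `periodicInvoke()` come from the thread, and the real Router classes are not used here.

```java
/** Hypothetical model of the safe mode logic; not the actual Router classes. */
final class SafeModeModel {
  private boolean safeMode; // what "-safemode get" would report
  private boolean forced;   // true when an admin entered safe mode explicitly

  boolean isInSafeMode() {
    return safeMode;
  }

  boolean isInForcedSafeMode() {
    return safeMode && forced;
  }

  /** Models "dfsrouteradmin -safemode enter". */
  void adminEnter() {
    safeMode = true;
    forced = true;
  }

  /** Models "dfsrouteradmin -safemode leave". */
  void adminLeave() {
    safeMode = false;
    forced = false;
  }

  /** Same structure as periodicInvoke() in the issue, plus a forced check. */
  void periodicInvoke(boolean isCacheStale) {
    if (isCacheStale) {
      if (!safeMode) {
        safeMode = true; // automatic enter; 'forced' stays false
      }
    } else if (safeMode && !forced) {
      // Cache recently updated: leave only if safe mode was automatic.
      safeMode = false;
    }
  }

  public static void main(String[] args) {
    SafeModeModel router = new SafeModeModel();
    router.adminEnter();
    router.periodicInvoke(false); // fresh cache must not undo the admin's request
    System.out.println("Safe Mode: " + router.isInSafeMode());
  }
}
```

With this split, a stale cache can still push the Router into safe mode on its own, while an admin request stays in effect until the admin explicitly leaves.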
[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443236#comment-16443236 ]

BELUGA BEHR commented on HDFS-13448:
------------------------------------

[~daryn] Thanks for pointing me in this direction. I have attached a new patch as you recommended.

> HDFS Block Placement - Ignore Locality for First Block Replica
> --------------------------------------------------------------
>
>                 Key: HDFS-13448
>                 URL: https://issues.apache.org/jira/browse/HDFS-13448
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: block placement, hdfs-client
>    Affects Versions: 2.9.0, 3.0.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Minor
>         Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS block placement rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine,
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement request to not put a block replica on the local datanode _where 'local' means the same host as the client is being run on._
> {quote}
> /**
>  * Advise that a block replica NOT be written to the local DataNode where
>  * 'local' means the same host as the client is being run on.
>  *
>  * @see CreateFlag#NO_LOCAL_WRITE
>  */
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that the first block replica be placed on a random DataNode in the cluster. The subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block replica is not placed on the local node, but it is still placed on the local rack. Where this comes into play is where you have, for example, a Flume agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode local to the Flume agent will always get the first block replica. This leads to uneven block placement, with the local node always filling up faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the default block placement policy will still prefer the local rack. This remedies the situation only insofar as the first block replica will now always be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks randomly and evenly over the entire cluster instead of hot-spotting the local node or the local rack.
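The three first-replica behaviours discussed in the HDFS-13448 description (local node by default, local rack with {{NO_LOCAL_WRITE}}, and the proposed random node) can be compared with a toy model. This is a simplified sketch under stated assumptions, not the HDFS block placement policy: `FirstReplicaSketch`, `Mode`, `Node`, and `chooseFirst` are hypothetical names, the cluster list is assumed to be non-empty, and the fallback to any node is an illustrative simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Toy model for illustration; not the HDFS block placement policy. */
final class FirstReplicaSketch {
  /** Hypothetical modes: default, NO_LOCAL_WRITE as described, and the proposed flag. */
  enum Mode { LOCAL_NODE, NO_LOCAL_WRITE, RANDOM_NODE }

  static final class Node {
    final String host;
    final String rack;

    Node(String host, String rack) {
      this.host = host;
      this.rack = rack;
    }
  }

  /** Picks the node for the first replica; assumes a non-empty cluster. */
  static Node chooseFirst(Mode mode, Node writer, List<Node> cluster, Random rnd) {
    List<Node> candidates = new ArrayList<>();
    for (Node n : cluster) {
      boolean isWriter = n.host.equals(writer.host);
      boolean sameRack = n.rack.equals(writer.rack);
      switch (mode) {
        case LOCAL_NODE:
          if (isWriter) candidates.add(n);
          break;
        case NO_LOCAL_WRITE:
          // Per the description: skips the local node but stays on the local rack.
          if (!isWriter && sameRack) candidates.add(n);
          break;
        default:
          candidates.add(n); // proposed flag: any node in the cluster
      }
    }
    if (candidates.isEmpty()) {
      candidates.addAll(cluster); // simplification: fall back to any node
    }
    return candidates.get(rnd.nextInt(candidates.size()));
  }

  public static void main(String[] args) {
    List<Node> cluster = Arrays.asList(
        new Node("dn1", "rackA"), new Node("dn2", "rackA"),
        new Node("dn3", "rackB"), new Node("dn4", "rackB"));
    Node writer = cluster.get(0); // e.g. a Flume agent co-located with dn1
    Random rnd = new Random();
    for (Mode mode : Mode.values()) {
      System.out.println(mode + " -> " + chooseFirst(mode, writer, cluster, rnd).host);
    }
  }
}
```

Running the loop many times per mode shows the hot-spotting described in the issue: the first two modes never leave the writer's rack, while the third spreads first replicas across all nodes.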
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-13448:
-------------------------------
    Attachment:     (was: HDFS-13448.4.patch)
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-13448:
-------------------------------
    Attachment: HDFS-13448.4.patch
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-13448:
-------------------------------
    Attachment:     (was: HDFS-13448.4.patch)
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-13448:
-------------------------------
    Attachment: HDFS-13448.4.patch
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-13448:
-------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-13448:
-------------------------------
    Attachment: HDFS-13448.4.patch
[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BELUGA BEHR updated HDFS-13448: --- Status: Open (was: Patch Available) > HDFS Block Placement - Ignore Locality for First Block Replica > -- > > Key: HDFS-13448 > URL: https://issues.apache.org/jira/browse/HDFS-13448 > Project: Hadoop HDFS > Issue Type: New Feature > Components: block placement, hdfs-client >Affects Versions: 3.0.1, 2.9.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, > HDFS-13448.3.patch > > > According to the HDFS Block Place Rules: > {quote} > /** > * The replica placement strategy is that if the writer is on a datanode, > * the 1st replica is placed on the local machine, > * otherwise a random datanode. The 2nd replica is placed on a datanode > * that is on a different rack. The 3rd replica is placed on a datanode > * which is on a different node of the rack as the second replica. > */ > {quote} > However, there is a hint for the hdfs-client that allows the block placement > request to not put a block replica on the local datanode _where 'local' means > the same host as the client is being run on._ > {quote} > /** >* Advise that a block replica NOT be written to the local DataNode where >* 'local' means the same host as the client is being run on. >* >* @see CreateFlag#NO_LOCAL_WRITE >*/ > {quote} > I propose that we add a new flag that allows the hdfs-client to request that > the first block replica be placed on a random DataNode in the cluster. The > subsequent block replicas should follow the normal block placement rules. > The issue is that when the {{NO_LOCAL_WRITE}} is enabled, the first block > replica is not placed on the local node, but it is still placed on the local > rack. Where this comes into play is where you have, for example, a flume > agent that is loading data into HDFS. 
> If the Flume agent is running on a DataNode, then by default, the DataNode > local to the Flume agent will always get the first block replica and this > leads to un-even block placements, with the local node always filling up > faster than any other node in the cluster. > Modifying this example, if the DataNode is removed from the host where the > Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then > the default block placement policy will still prefer the local rack. This > remedies the situation only so far as now the first block replica will always > be distributed to a DataNode on the local rack. > This new flag would allow a single Flume agent to distribute the blocks > randomly, evenly, over the entire cluster instead of hot-spotting the local > node or the local rack. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13079) Provide a config to start namenode in safemode state upto a certain transaction id
[ https://issues.apache.org/jira/browse/HDFS-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443191#comment-16443191 ]

Hanisha Koneru commented on HDFS-13079:
---------------------------------------

Thanks for working on this [~shashikant].

bq. Please note that in case a checkpoint has already happened and the requested transaction id has been subsumed in an FSImage, then the namenode will be started with the next nearest transaction id. Further FSImage files and edits will be ignored.

If the requested txId falls within the latest fsImage, do we want to load that fsImage, or fall back to a previous fsImage with lastTxId < requested txId? IMO, we should load the fsImage with endTxId <= requested txId.

* In {{FSImage#loadFSImage}}, the check for whether we should load an fsImage is made after the image has already been loaded. The line {{loader.load(curFile, requireSameLayoutVersion)}} loads the fsImage transactions into the NN.
{code}
FSImageFormat.LoaderDelegator loader = FSImageFormat.newLoader(conf, target);
loader.load(curFile, requireSameLayoutVersion);
long lastTxIdToLoad = target.getLastTxidToLoad();
long txId = loader.getLoadedImageTxId();
if (lastTxIdToLoad != HdfsServerConstants.INVALID_TXID && txId > lastTxIdToLoad) {
{code}
* When we skip loading the latest fsImage, we should keep falling back and try to load the next latest fsImage. For example, say we have two fsImages, fsimage_00090 and fsimage_00150, and we want to start the namenode in safemode up to txId 120. We first check fsimage_00150 and reject it. After this, the NN should attempt to load the next latest fsImage, i.e. fsimage_00090. We can throw an exception when skipping an fsImage and catch that exception in the following code path in {{FSImage#loadFSImage}}. This way the next latest fsImage will be loaded.
{code}
FSImageFile imageFile = null;
for (int i = 0; i < imageFiles.size(); i++) {
  try {
    imageFile = imageFiles.get(i);
    loadFSImageFile(target, recovery, imageFile, startOpt);
    break;
{code}
* What do we do when there are no fsImages with endTxId <= requested txId? IMO, we should stop the NN and throw an error.

> Provide a config to start namenode in safemode state upto a certain transaction id
> ----------------------------------------------------------------------------------
>
> Key: HDFS-13079
> URL: https://issues.apache.org/jira/browse/HDFS-13079
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Mukul Kumar Singh
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-13079.001.patch, HDFS-13079.002.patch
>
>
> In some cases it is necessary to roll the Namenode back to a certain
> transaction id. This is especially needed when the user issues a {{rm -Rf
> -skipTrash}} by mistake.
> Rolling back to a transaction id helps in taking a peek at the filesystem at
> a particular instant. This jira proposes to provide a configuration variable
> with which the namenode can be started up to a certain transaction id. The
> filesystem will be in a read-only safemode which cannot be overridden
> manually. It can only be overridden by removing the config value from the
> config file. Please also note that this will not cause any changes in the
> filesystem state: the filesystem will be in safemode and no changes to
> the filesystem state will be allowed.
> Please note that in case a checkpoint has already happened and the requested
> transaction id has been subsumed in an FSImage, then the namenode will be
> started with the next nearest transaction id. Further FSImage files and edits
> will be ignored.
> If the checkpoint hasn't happened, then the namenode will be started with the
> exact transaction id.
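The fallback rule proposed in the comment above (pick the newest fsImage whose end txId is <= the requested txId, and fail if none qualifies) can be sketched in isolation. This is a minimal illustration, not the patch: `FsImageFallbackSketch`, `selectImageTxId` and the `NO_TXID_LIMIT` sentinel are hypothetical names, and images are reduced to their end transaction ids instead of real `FSImageFile` objects.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of "newest fsImage not past the requested txId"; not Hadoop code. */
final class FsImageFallbackSketch {

  /** Hypothetical sentinel meaning "no transaction limit was configured". */
  static final long NO_TXID_LIMIT = -1L;

  /**
   * @param imageTxIdsNewestFirst end txIds of the available fsImages, newest first
   * @param lastTxIdToLoad        requested txId, or NO_TXID_LIMIT
   * @return the end txId of the image that should be loaded
   */
  static long selectImageTxId(List<Long> imageTxIdsNewestFirst, long lastTxIdToLoad) {
    for (long txId : imageTxIdsNewestFirst) {
      if (lastTxIdToLoad == NO_TXID_LIMIT || txId <= lastTxIdToLoad) {
        return txId; // newest image that does not go past the requested txId
      }
      // This image ends after the requested txId: skip it and try the next older one.
    }
    // No usable image: fail instead of starting with a later state.
    throw new IllegalStateException("No fsImage with end txId <= " + lastTxIdToLoad);
  }

  public static void main(String[] args) {
    // fsimage_00090 and fsimage_00150 from the example, requested txId 120.
    System.out.println(selectImageTxId(Arrays.asList(150L, 90L), 120L)); // prints 90
  }
}
```

Deciding before loading, as here, avoids the problem noted in the first bullet, where the image is already loaded by the time the txId check runs.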
[jira] [Commented] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443190#comment-16443190 ] genericqa commented on HDFS-13477: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 18s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 15s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 6s{color} | {color:green} HDFS-7240 passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 24s{color} | {color:red} server-scm in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 26s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 47s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 23s{color} | {color:red} server-scm in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s{color} | {color:red} server-scm in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} server-scm in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} ozone-manager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 15s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 21s{color} | {color:red} server-scm in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 20s{color} | {color:red} ozone-manager in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 30s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} server-scm in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} ozone-manager in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} server-scm in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s{color} | {color:red} ozone-manager in the patch failed. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 22s{color} | {color:red} server-scm in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 21s{color} | {color:red} ozone-manager in the
[jira] [Updated] (HDFS-13399) Make Client field AlignmentContext non-static.
[ https://issues.apache.org/jira/browse/HDFS-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13399: Attachment: (was: HDFS-13286-HDFS-12943.000.patch) > Make Client field AlignmentContext non-static. > -- > > Key: HDFS-13399 > URL: https://issues.apache.org/jira/browse/HDFS-13399 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: Plamen Jeliazkov >Assignee: Plamen Jeliazkov >Priority: Major > Attachments: HDFS-13399-HDFS-12943.000.patch, > HDFS-13399-HDFS-12943.001.patch, HDFS-13399-HDFS-12943.002.patch > > > In HDFS-12977, DFSClient's constructor was altered to make use of a new > static method in Client that allowed one to set an AlignmentContext. This > work is to remove that static field and make each DFSClient pass it's > AlignmentContext down to the proxy Call level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13399) Make Client field AlignmentContext non-static.
[ https://issues.apache.org/jira/browse/HDFS-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13399: Attachment: HDFS-13286-HDFS-12943.000.patch > Make Client field AlignmentContext non-static. > -- > > Key: HDFS-13399 > URL: https://issues.apache.org/jira/browse/HDFS-13399 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-12943 >Reporter: Plamen Jeliazkov >Assignee: Plamen Jeliazkov >Priority: Major > Attachments: HDFS-13399-HDFS-12943.000.patch, > HDFS-13399-HDFS-12943.001.patch, HDFS-13399-HDFS-12943.002.patch > > > In HDFS-12977, DFSClient's constructor was altered to make use of a new > static method in Client that allowed one to set an AlignmentContext. This > work is to remove that static field and make each DFSClient pass it's > AlignmentContext down to the proxy Call level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: HDFS-13286-HDFS-12943.000.patch > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286-HDFS-12943.000.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: (was: HDFS-13286.0.patch) > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: (was: HDFS-13286.1.patch) > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443165#comment-16443165 ] genericqa commented on HDFS-13286: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-13286 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13286 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919686/HDFS-13286.1.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23989/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286.0.patch, HDFS-13286.1.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
[ https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443156#comment-16443156 ]

feng xu edited comment on HDFS-13476 at 4/18/18 8:38 PM:
---------------------------------------------------------

By the way, java.io.File.exists() is not sufficient to determine whether a file exists, because [fs|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/File.java#File.0fs].getBooleanAttributes() could fail for other reasons.

was (Author: fxu...@hotmail.com):

By the way, java.io.File.exists() is not sufficient to determine whether a file exists, because [fs|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/File.java#File.0fs].[getBooleanAttributes|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/FileSystem.java#FileSystem.getBooleanAttributes%28java.io.File%29] could fail for other reasons.

> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
> ---------------------------------------------------------
>
> Key: HDFS-13476
> URL: https://issues.apache.org/jira/browse/HDFS-13476
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.4
> Reporter: feng xu
> Priority: Critical
>
> We have security software that runs on the local file system (ext4), and it
> denies particular users access to particular HDFS folders based on a
> security policy. For example, the security policy always gives the user hdfs
> full permission, and denies the user yarn access to /dir1. If the user yarn
> tries to access a file under the HDFS folder /dir1, the security software
> denies the access and returns EACCES from the file system call through errno.
> This used to work because data corruption was determined by the block
> scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).
> On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because of the
> security policy that denies file access in HDFS from the local file system. We
> debugged HDFS and found that BlockSender() directly calls the following
> statements, which may cause the problem:
> datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
> datanode.data.invalidate(block.getBlockPoolId(), new Block[]{block.getLocalBlock()});
> In the meantime, the block scanner is not triggered because of the
> undocumented property dfs.datanode.disk.check.min.gap. However, the problem
> is still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0.
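The point about java.io.File.exists() can be illustrated with a small probe that separates "missing" from "could not be determined". It is a sketch only: `ExistenceProbe` and its `Result` enum are hypothetical names, and it uses the standard `java.nio.file.Files.readAttributes` call, which reports a missing file as `NoSuchFileException` and other failures (for example a permission denial) as different `IOException` subclasses. It is not taken from the HDFS DataNode code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

/** Distinguishes "file is missing" from "could not tell", unlike File.exists(). */
final class ExistenceProbe {

  enum Result { EXISTS, MISSING, UNKNOWN }

  static Result probe(Path path) {
    try {
      Files.readAttributes(path, BasicFileAttributes.class);
      return Result.EXISTS;
    } catch (NoSuchFileException e) {
      return Result.MISSING;  // the file system says the path is not there
    } catch (IOException e) {
      // Any other failure, e.g. AccessDeniedException when a security layer
      // returns EACCES: existence is unknown, so do not treat it as deleted.
      return Result.UNKNOWN;
    }
  }

  public static void main(String[] args) {
    Path path = Paths.get(args.length > 0 ? args[0] : ".");
    System.out.println(path + " -> " + probe(path));
  }
}
```

A caller that only invalidates a block on `MISSING`, and leaves `UNKNOWN` alone, would avoid treating an access denial as a lost replica.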
[jira] [Commented] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
[ https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443156#comment-16443156 ]

feng xu commented on HDFS-13476:
--------------------------------

By the way, java.io.File.exists() is not sufficient to determine whether a file exists, because [fs|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/File.java#File.0fs].[getBooleanAttributes|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/FileSystem.java#FileSystem.getBooleanAttributes%28java.io.File%29] could fail for other reasons.

> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
> ---------------------------------------------------------
>
> Key: HDFS-13476
> URL: https://issues.apache.org/jira/browse/HDFS-13476
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.4
> Reporter: feng xu
> Priority: Critical
>
> We have security software that runs on the local file system (ext4), and it
> denies particular users access to particular HDFS folders based on a
> security policy. For example, the security policy always gives the user hdfs
> full permission, and denies the user yarn access to /dir1. If the user yarn
> tries to access a file under the HDFS folder /dir1, the security software
> denies the access and returns EACCES from the file system call through errno.
> This used to work because data corruption was determined by the block
> scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).
> On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because of the
> security policy that denies file access in HDFS from the local file system. We
> debugged HDFS and found that BlockSender() directly calls the following
> statements, which may cause the problem:
> datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
> datanode.data.invalidate(block.getBlockPoolId(), new Block[]{block.getLocalBlock()});
> In the meantime, the block scanner is not triggered because of the
> undocumented property dfs.datanode.disk.check.min.gap. However, the problem
> is still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0.
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443153#comment-16443153 ] Chao Sun commented on HDFS-13286: - Rebase to trunk. > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286.0.patch, HDFS-13286.1.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: HDFS-13286.1.patch > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286.0.patch, HDFS-13286.1.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443148#comment-16443148 ] genericqa commented on HDFS-13286: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-13286 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13286 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919684/HDFS-13286.0.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23988/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286.0.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443145#comment-16443145 ] Chao Sun commented on HDFS-13286: - Seems the change on {{haadmin}} is not as disruptive as I thought. Submitted the initial patch. [~shv], [~zero45], [~xkrogen]: can you take a look? > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286.0.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Status: Patch Available (was: Open) > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286.0.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer
[ https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HDFS-13286: Attachment: HDFS-13286.0.patch > Add haadmin commands to transition between standby and observer > --- > > Key: HDFS-13286 > URL: https://issues.apache.org/jira/browse/HDFS-13286 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Attachments: HDFS-13286.0.patch > > > As discussed in HDFS-12975, we should allow explicit transition between > standby and observer through haadmin command, such as: > {code} > haadmin -transitionToObserver > {code} > Initially we should support transition from observer to standby, and standby > to observer. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443095#comment-16443095 ] genericqa commented on HDFS-13442: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 37s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} HDFS-7240 passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 13s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 14s{color} | {color:red} server-scm in HDFS-7240 failed. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 21s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 58s{color} | {color:red} hadoop-hdds/common in HDFS-7240 has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 13s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 15s{color} | {color:red} server-scm in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s{color} | {color:red} server-scm in HDFS-7240 failed. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 9s{color} | {color:red} container-service in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} server-scm in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 7s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 9s{color} | {color:red} container-service in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 10s{color} | {color:red} server-scm in the patch failed. {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 11s{color} | {color:red} container-service in the patch failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 14s{color} | {color:red} server-scm in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 13s{color} | {color:red} container-service in the patch failed. {color} | | {color:red}-1{color} | {co
[jira] [Updated] (HDFS-13355) Create IO provider for hdsl
[ https://issues.apache.org/jira/browse/HDFS-13355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13355: -- Issue Type: Sub-task (was: Improvement) Parent: HDFS-7240 > Create IO provider for hdsl > --- > > Key: HDFS-13355 > URL: https://issues.apache.org/jira/browse/HDFS-13355 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > > Create an abstraction like FileIoProvider for hdsl to handle disk failure and > other issues.
[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13477: -- Attachment: HDFS-13477-HDFS-7240.00.patch > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13477-HDFS-7240.00.patch > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13442: -- Attachment: HDFS-13442-HDFS-7240.002.patch > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch, > HDFS-13442-HDFS-7240.002.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts.
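The bounded-retry behavior described in HDFS-13442 could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the attached patches: the class `BoundedRegistration`, the `maxAttempts` parameter (standing in for the configurable maximum), and the `BooleanSupplier` that stands in for the registration call to the SCM are all hypothetical names.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: stop registering once a configurable failure budget is spent. */
final class BoundedRegistration {
  private final int maxAttempts;   // assumed to be read from a configuration key
  private int failedAttempts = 0;

  BoundedRegistration(int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
  }

  /**
   * Calls registerOnce until it succeeds or the failure budget is exhausted.
   *
   * @return true if registration succeeded, false if the node gave up
   */
  boolean register(BooleanSupplier registerOnce) {
    while (failedAttempts < maxAttempts) {
      boolean ok;
      try {
        ok = registerOnce.getAsBoolean();
      } catch (RuntimeException e) {
        ok = false;  // an exception counts as one failed attempt
      }
      if (ok) {
        return true;
      }
      failedAttempts++;
    }
    return false;  // no further attempts are made
  }

  int failedAttempts() {
    return failedAttempts;
  }
}
```

A real datanode would presumably also wait between attempts; the sketch leaves that out to keep the stopping condition visible.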
[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13477: -- Attachment: (was: HDFS-13477-HDFS-7240.00.patch) > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13477-HDFS-7240.00.patch > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13477: -- Attachment: HDFS-13477-HDFS-7240.00.patch > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13477-HDFS-7240.00.patch > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Assigned] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar reassigned HDFS-13477: - Assignee: Ajay Kumar > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13477-HDFS-7240.00.patch > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13477: -- Status: Patch Available (was: Open) > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-13477-HDFS-7240.00.patch > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13477: -- Issue Type: Sub-task (was: Improvement) Parent: HDFS-7240 > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13477: -- Fix Version/s: HDFS-7240 > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
[ https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajay Kumar updated HDFS-13477: -- Affects Version/s: HDFS-7240 > Httpserver start failure should be non fatal for KSM and SCM startup > > > Key: HDFS-13477 > URL: https://issues.apache.org/jira/browse/HDFS-13477 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: HDFS-7240 >Reporter: Ajay Kumar >Priority: Major > Fix For: HDFS-7240 > > > Currently KSM and SCM startup will fail if corresponding HttpServer fails > with some Exception. HttpServer is not essential for operations of KSM and > SCM so we should allow them to start even if httpServer fails.
[jira] [Created] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup
Ajay Kumar created HDFS-13477: - Summary: Httpserver start failure should be non fatal for KSM and SCM startup Key: HDFS-13477 URL: https://issues.apache.org/jira/browse/HDFS-13477 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ajay Kumar Currently KSM and SCM startup will fail if corresponding HttpServer fails with some Exception. HttpServer is not essential for operations of KSM and SCM so we should allow them to start even if httpServer fails.
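The change proposed in HDFS-13477 amounts to catching the HTTP server's start-up failure instead of letting it abort the service. A minimal sketch of that idea follows; it is not the KSM or SCM code, and `ServiceWithOptionalHttp` and its nested `HttpEndpoint` interface are hypothetical stand-ins introduced only for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: a service that starts even if its optional HTTP server does not. */
final class ServiceWithOptionalHttp {
  private static final Logger LOG =
      Logger.getLogger(ServiceWithOptionalHttp.class.getName());

  /** Stand-in for an HTTP server whose start() may throw. */
  interface HttpEndpoint {
    void start() throws Exception;
  }

  private boolean running = false;
  private boolean httpAvailable = false;

  void start(HttpEndpoint http) {
    try {
      http.start();
      httpAvailable = true;
    } catch (Exception e) {
      // Log and continue: the HTTP endpoint is treated as non-essential.
      LOG.log(Level.SEVERE, "HTTP server failed to start; continuing without it", e);
    }
    running = true;  // core start-up proceeds either way
  }

  boolean isRunning() {
    return running;
  }

  boolean isHttpAvailable() {
    return httpAvailable;
  }
}
```

Recording whether the endpoint came up (here via `isHttpAvailable()`) lets operators tell a healthy service with no web UI apart from a fully healthy one.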
[jira] [Updated] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
[ https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feng xu updated HDFS-13476: --- Description: We have a security software runs on local file system(ext4), and the security software denies some particular users to access some particular HDFS folders based on security policy. For example, the security policy always gives the user hdfs full permission, and denies the user yarn to access /dir1. If the user yarn tries to access a file under HDFS folder /dir1, the security software denies the access and returns EACCES from file system call through errno. This used to work because the data corruption was determined by block scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/). On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot data corruptions because of the security policy to deny file access in HDFS from local file system. We debugged HDFS and found out BlockSender() directly calls the following statements and may cause the problem: datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid()); datanode.data.invalidate(block.getBlockPoolId(), new Block[]{block.getLocalBlock()}); In the mean time, the block scanner is not triggered because of the undocumented property dfs.datanode.disk.check.min.gap. However the problem is still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0.
was: We have a security software runs on local file system(ext4), and the security software denies some particular users to access some particular HDFS folders based on security policy. For example, the security policy always gives the user hdfs full permission, and denies the user yarn to access /dir1. If the user yarn tries to access a file under HDFS folder /dir1, the security software denies the access and returns EACCES from file system call through errno. This used to work because the data corruption was determined by block scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/). On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot data corruptions because of the security policy to deny file access in HDFS from local file system. We debugged HDFS and found out BlockSender() directly calls the following statements and causes the problem: datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid()); datanode.data.invalidate(block.getBlockPoolId(), new Block[]{block.getLocalBlock()}); In the mean time, the block scanner is not triggered because of the undocumented property dfs.datanode.disk.check.min.gap. However the problem is still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0.
> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files > > > Key: HDFS-13476 > URL: https://issues.apache.org/jira/browse/HDFS-13476 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.4 >Reporter: feng xu >Priority: Critical > > We have a security software runs on local file system(ext4), and the security > software denies some particular users to access some > particular HDFS folders based on security policy. For > example, the security policy always gives the user hdfs full permission, and > denies the user yarn to access /dir1. If the user yarn tries to access a > file under HDFS folder /dir1, the security software > denies the access and returns EACCES from file system call through errno. > This used to work because the data corruption was determined by block > scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/). > On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot data corruptions because of the > security policy to deny file access in HDFS from local file system. We > debugged HDFS and found out BlockSender() directly calls the following > statements and may cause the problem: > datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid()); > datanode.data.invalidate(block.getBlockPoolId(), new > Block[]{block.getLocalBlock()}); > In the mean time, the block scanner is not triggered because of the > undocumented property dfs.datanode.disk.check.min.gap. However > the problem is still there if we disable > dfs.datanode.disk.check.min.gap by setting it to 0.
[jira] [Commented] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
[ https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443023#comment-16443023 ] feng xu commented on HDFS-13476:
2018-04-18 12:40:48,466 ERROR datanode.DataNode (DataXceiver.java:run(278)) - 4381-fxu-centos7:50010:DataXceiver error processing READ_BLOCK operation src: /10.3.43.81:51424 dst: /10.3.43.81:50010
java.io.FileNotFoundException: BlockId 1073741896 is not valid.
 at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:739)
 at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:730)
 at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:232)
 at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:299)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:547)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
 at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
 at java.lang.Thread.run(Thread.java:745)
> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files > > > Key: HDFS-13476 > URL: https://issues.apache.org/jira/browse/HDFS-13476 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.4 >Reporter: feng xu >Priority: Critical > > We have a security software runs on local file system(ext4), and the security > software denies some particular users to access some > particular HDFS folders based on security policy. For > example, the security policy always gives the user hdfs full permission, and > denies the user yarn to access /dir1. If the user yarn tries to access a > file under HDFS folder /dir1, the security software > denies the access and returns EACCES from file system call through errno. > This used to work because the data corruption was determined by block > scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/). > On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot data corruptions because of the > security policy to deny file access in HDFS from local file system. We > debugged HDFS and found out BlockSender() directly calls the following > statements and may cause the problem: > datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid()); > datanode.data.invalidate(block.getBlockPoolId(), new > Block[]{block.getLocalBlock()}); > In the mean time, the block scanner is not triggered because of the > undocumented property dfs.datanode.disk.check.min.gap. However > the problem is still there if we disable > dfs.datanode.disk.check.min.gap by setting it to 0.
[jira] [Commented] (HDFS-12950) [oiv] ls will fail in secure cluster
[ https://issues.apache.org/jira/browse/HDFS-12950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443013#comment-16443013 ] Brahma Reddy Battula commented on HDFS-12950: - [~jojochuang] thanks for the patch. The approach looks good to me. Additionally, I feel we could mention that ls will work if "-Dhadoop.security.authentication=simple" is passed...? > [oiv] ls will fail in secure cluster > - > > Key: HDFS-12950 > URL: https://issues.apache.org/jira/browse/HDFS-12950 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Brahma Reddy Battula >Assignee: Wei-Chiu Chuang >Priority: Major > Attachments: HDFS-12950.001.patch, HDFS-12950.002.patch > > > If we execute ls, it throws the following. > {noformat} > hdfs dfs -ls webhdfs://127.0.0.1:5978/ > ls: Invalid value for webhdfs parameter "op" > {noformat} > When the client is configured with security (i.e. "hadoop.security.authentication=KERBEROS"), > webhdfs will request a delegation token, which is not implemented, and > hence it will throw “ls: Invalid value for webhdfs parameter "op"”.
[jira] [Commented] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
[ https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442998#comment-16442998 ] Brahma Reddy Battula commented on HDFS-13476: - This issue looks similar to HDFS-11711. So you are getting the FileNotFoundException? Can you please attach the trace as well? I hope HDP-2.7.3 is the same as hadoop-2.7.3. > HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files > > > Key: HDFS-13476 > URL: https://issues.apache.org/jira/browse/HDFS-13476 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.4 >Reporter: feng xu >Priority: Critical > > We have a security software runs on local file system(ext4), and the security > software denies some particular users to access some > particular HDFS folders based on security policy. For > example, the security policy always gives the user hdfs full permission, and > denies the user yarn to access /dir1. If the user yarn tries to access a > file under HDFS folder /dir1, the security software > denies the access and returns EACCES from file system call through errno. > This used to work because the data corruption was determined by block > scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/). > On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot data corruptions because of the > security policy to deny file access in HDFS from local file system. We > debugged HDFS and found out BlockSender() directly calls the following > statements and causes the problem: > datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid()); > datanode.data.invalidate(block.getBlockPoolId(), new > Block[]{block.getLocalBlock()}); > In the mean time, the block scanner is not triggered because of the > undocumented property dfs.datanode.disk.check.min.gap. However > the problem is still there if we disable > dfs.datanode.disk.check.min.gap by setting it to 0.
[jira] [Commented] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI
[ https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442990#comment-16442990 ] Íñigo Goiri commented on HDFS-13470: To make the whole UI easier to maintain I would make all the header and tab links to be generated. Right now, we have to keep consistency between the NN pages and the Router pages. Not sure how to do it with js but I'll give it a try. > RBF: Add Browse the Filesystem button to the UI > --- > > Key: HDFS-13470 > URL: https://issues.apache.org/jira/browse/HDFS-13470 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13470.000.patch > > > After HDFS-12512 added WebHDFS, we can add the support to browse the > filesystem to the UI.
[jira] [Created] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
feng xu created HDFS-13476: -- Summary: HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files Key: HDFS-13476 URL: https://issues.apache.org/jira/browse/HDFS-13476 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.4 Reporter: feng xu We run security software on the local file system (ext4) that, based on a security policy, denies particular users access to particular HDFS folders. For example, the security policy always gives the user hdfs full permission and denies the user yarn access to /dir1. If the user yarn tries to access a file under the HDFS folder /dir1, the security software denies the access and the file system call returns EACCES through errno. This used to work because data corruption was determined by the block scanner (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/). On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the security policy denies file access in HDFS from the local file system. We debugged HDFS and found that BlockSender() directly calls the following statements, which causes the problem: datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid()); datanode.data.invalidate(block.getBlockPoolId(), new Block[]{block.getLocalBlock()}); In the meantime, the block scanner is not triggered because of the undocumented property dfs.datanode.disk.check.min.gap. However, the problem is still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0.
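One way to see why a denied open can be mistaken for a lost block: `java.io.FileInputStream` throws `FileNotFoundException` both when a file is missing and when it cannot be opened for another reason, such as a permission error. The sketch below shows how the two cases can be told apart with `java.nio.file`; it is not how the DataNode is implemented, the report does not confirm which API the affected code path uses, and `ReplicaOpenCheck` and its `Result` enum are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/** Hypothetical sketch: tell "file is gone" apart from "access was denied". */
final class ReplicaOpenCheck {
  enum Result { READABLE, MISSING, ACCESS_DENIED, OTHER_IO_ERROR }

  static Result probe(Path blockFile) {
    try (InputStream in = Files.newInputStream(blockFile)) {
      return Result.READABLE;
    } catch (NoSuchFileException e) {
      return Result.MISSING;         // the only case that suggests a lost replica
    } catch (AccessDeniedException e) {
      return Result.ACCESS_DENIED;   // e.g. EACCES returned by a security layer
    } catch (IOException e) {
      return Result.OTHER_IO_ERROR;  // anything else is left undecided
    }
  }
}
```

Under this scheme only `MISSING` would justify reporting a replica as deleted; an `ACCESS_DENIED` result would be surfaced as an error without invalidating the block.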
[jira] [Commented] (HDFS-13474) Unable to start Hadoop DataNodes
[ https://issues.apache.org/jira/browse/HDFS-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442945#comment-16442945 ] Brahma Reddy Battula commented on HDFS-13474: - Thanks for reporting,Could you try with Java8..? Could have this query in mailing list, Jira is to track the issues. > Unable to start Hadoop DataNodes > > > Key: HDFS-13474 > URL: https://issues.apache.org/jira/browse/HDFS-13474 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: robbie >Priority: Major > Attachments: hadoop-roycecoll...@steelydan.com-datanode-c0315.log, > hadoop-roycecoll...@steelydan.com-datanode-c0315.out, > hadoop-roycecoll...@steelydan.com-namenode-c0315.log, > hadoop-roycecoll...@steelydan.com-namenode-c0315.out, > hadoop-roycecoll...@steelydan.com-secondarynamenode-c0315.log, > hadoop-roycecoll...@steelydan.com-secondarynamenode-c0315.out > > > I am trying to follow the instructions in the Getting Started guide, > [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node] > I have confirmed, that I can `ssh localhost` without a password prompt. I > have also run the following steps, > {quote}1. $ bin/hdfs namenode -format > 2. $ sbin/start-dfs.sh > {quote} > But I cant run step 3. to browse the location at `[http://localhost:9870/]`. > When I run `>jsp` from the terminal prompt I just get returned, > {quote}14900 Jps > {quote} > I was expecting a list of my nodes. 
> In the Logs I see two error messages towards the end, > {quote}2018-04-18 14:15:42,516 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM > {quote} > {quote}2018-04-18 14:15:42,516 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP > {quote} > {quote}2018-04-18 14:15:42,517 INFO > org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1 > / > {quote} > I will attach the full logs with this bug report. > Can anyone help even with ways to debug this please ? > > Java Version, > {quote}rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java --version > java 9.0.4 > Java(TM) SE Runtime Environment (build 9.0.4+11) > Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode) > {quote} > Ubuntu version, > {quote}$ lsb_release -a > No LSB modules are available. > Distributor ID: neon > Description: KDE neon User Edition 5.12 > Release: 16.04 > Codename: xenial > {quote} > I have tried running the commands, `bin/hdfs version` > {quote}Hadoop 3.1.0 > Source code repository [https://github.com/apache/hadoop] -r > 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d > Compiled by centos on 2018-03-30T00:00Z > Compiled with protoc 2.5.0 > From source with checksum 14182d20c972b3e2105580a1ad6990 > This command was run using > /home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar > {quote} > when I try `bin/hdfs groups` it doesnt return but gives me, > {quote}018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: > localhost/127.0.0.1:9000. 
Already tried 0 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 > MILLISECONDS) > {quote} > when I try, `$ bin/hdfs lsSnapshottableDir` > {quote}lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed > on connection exception: java.net.ConnectException: Connection refused; For > more details see: [http://wiki|http://wiki/]. > apache.org/hadoop/ConnectionRefused > {quote} > > when I try, `$ bin/hdfs classpath` > {quote}/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/* > {quote} > core-site.xml > {quote} > > > fs.defaultFS > hdfs://localhost:9000 > > > {quote} > > hdfs-site.xml > {quote} > > dfs.replication > 1 > > > {quote} > mapred-site.xml > {quote} > > mapreduce.framework.name > yarn > > > {quote} -- This message was sent by Atlassian JIRA (v7
[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart
[ https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442942#comment-16442942 ] He Xiaoqiao commented on HDFS-12749: ping [~kihwal],[~daryn],[~arpitagarwal],[~ajayydv] do you mind having a look? > DN may not send block report to NN after NN restart > --- > > Key: HDFS-12749 > URL: https://issues.apache.org/jira/browse/HDFS-12749 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1, 2.8.3, 2.7.5, 3.0.0, 2.9.1 >Reporter: TanYuxin >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-12749-branch-2.7.002.patch, > HDFS-12749-trunk.003.patch, HDFS-12749-trunk.004.patch, HDFS-12749.001.patch > > > Now our cluster have thousands of DN, millions of files and blocks. When NN > restart, NN's load is very high. > After NN restart,DN will call BPServiceActor#reRegister method to register. > But register RPC will get a IOException since NN is busy dealing with Block > Report. The exception is caught at BPServiceActor#processCommand. > Next is the caught IOException: > {code:java} > WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing > datanode Command > java.io.IOException: Failed on local exception: java.io.IOException: > java.net.SocketTimeoutException: 6 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local > host is: "DataNode_Host/Datanode_IP"; destination host is: > "NameNode_Host":Port; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773) > at org.apache.hadoop.ipc.Client.call(Client.java:1474) > at org.apache.hadoop.ipc.Client.call(Client.java:1407) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864) > at java.lang.Thread.run(Thread.java:745) > {code} > The un-catched IOException breaks BPServiceActor#register, and the Block > Report can not be sent immediately. > {code} > /** >* Register one bp with the corresponding NameNode >* >* The bpDatanode needs to register with the namenode on startup in order >* 1) to report which storage it is serving now and >* 2) to receive a registrationID >* >* issued by the namenode to recognize registered datanodes. 
>* >* @param nsInfo current NamespaceInfo >* @see FSNamesystem#registerDatanode(DatanodeRegistration) >* @throws IOException >*/ > void register(NamespaceInfo nsInfo) throws IOException { > // The handshake() phase loaded the block pool storage > // off disk - so update the bpRegistration object from that info > DatanodeRegistration newBpRegistration = bpos.createRegistration(); > LOG.info(this + " beginning handshake with NN"); > while (shouldRun()) { > try { > // Use returned registration from namenode with updated fields > newBpRegistration = bpNamenode.registerDatanode(newBpRegistration); > newBpRegistration.setNamespaceInfo(nsInfo); > bpRegistration = newBpRegistration; > break; > } catch(EOFException e) { // namenode might have just restarted > LOG.info("Problem connecting to server: " + nnAddr + " :" > + e.getLocalizedMessage()); > sleepAndLogInterrupts(1000, "connecting to server"); > } catch(SocketTimeoutException e) { // namenode is busy > LOG.info("Problem connecting to server: " + nnAddr); > sleepAndLogInterrupts(1000, "connecting to server"); > } > } > > LOG.info("Block pool " + this + " successfully registered with NN"); > bpos.registrationSucceeded(this, bpRegistration); > // random short delay - helps scatter the BR from all DNs > scheduler.scheduleBlockReport(dnConf.initialBlockReportDelay); > } >
[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442934#comment-16442934 ]

He Xiaoqiao commented on HDFS-13473:
------------------------------------

Thanks [~daryn] for your comments.
{quote}What about something like the DN's heartbeat contains the current key version it has? The NN's handleHeartbeat compares with its current key, calls setNeedKeyUpdate if different.{quote}
That is a good suggestion for updating block keys on the DataNode. But wouldn't it need more code changes, since we would have to update {{DatanodeProtocol#sendHeartbeat}} and add a new parameter for the block key version?

> DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
> ------------------------------------------------------------------------
>
> Key: HDFS-13473
> URL: https://issues.apache.org/jira/browse/HDFS-13473
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-13473-trunk.001.patch
>
>
> Currently the DataNode updates its block keys passively: an update happens only if the NameNode returns a #KeyUpdateCommand in the heartbeat response.
> This synchronization mode has several problems:
> a. The NameNode cannot tell whether the block keys reached the DataNode successfully.
> b. The DataNode cannot tell either when it hits an exception while receiving or processing a heartbeat response that includes a BlockKeyCommand, as mentioned in HDFS-13441 and HDFS-12749.
> So I propose changing from the NameNode pushing block keys to the DataNode pulling them.
[jira] [Updated] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode
[ https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated HDFS-13475: --- Summary: RBF: Admin cannot enforce Router enter SafeMode (was: RBF: Router always leaves SafeMode after some time even manually entering SafeMode) > RBF: Admin cannot enforce Router enter SafeMode > --- > > Key: HDFS-13475 > URL: https://issues.apache.org/jira/browse/HDFS-13475 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Wei Yan >Assignee: Wei Yan >Priority: Major > > To reproduce the issue: > {code:java} > $ bin/hdfs dfsrouteradmin -safemode enter > Successfully enter safe mode. > $ bin/hdfs dfsrouteradmin -safemode get > Safe Mode: true{code} > And then, > {code:java} > $ bin/hdfs dfsrouteradmin -safemode get > Safe Mode: false{code} > From the code, it looks like the periodicInvoke triggers the leave. > {code:java} > public void periodicInvoke() { > .. > // Always update to indicate our cache was updated > if (isCacheStale) { > if (!rpcServer.isInSafeMode()) { > enter(); > } > } else if (rpcServer.isInSafeMode()) { > // Cache recently updated, leave safe mode > leave(); > } > } > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442917#comment-16442917 ] genericqa commented on HDFS-13473: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 52s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 15 new + 546 unchanged - 0 fixed = 561 total (was 546) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}113m 8s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}178m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.tools.TestHdfsConfigFields | | | hadoop.hdfs.server.datanode.TestBpServiceActorScheduler | | | hadoop.hdfs.TestPread | | | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.server.namenode.TestReencryptionWithKMS | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b | | JIRA Issue | HDFS-13473 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919631/HDFS-13473-trunk.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5bc69cb6f389 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bf2f493 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-R
[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanisha Koneru updated HDFS-13442: -- Attachment: (was: HDFS-13442-HDFS-7240.002.patch) > Ozone: Handle Datanode Registration failure > --- > > Key: HDFS-13442 > URL: https://issues.apache.org/jira/browse/HDFS-13442 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Hanisha Koneru >Assignee: Hanisha Koneru >Priority: Major > Attachments: HDFS-13442-HDFS-7240.001.patch > > > If a datanode is not able to register itself, we need to handle that > correctly. > If the number of unsuccessful attempts to register with the SCM exceeds a > configurable max number, the datanode should not make any more attempts. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13475) RBF: Router always leaves SafeMode after some time even manually entering SafeMode
Wei Yan created HDFS-13475:
---------------------------

Summary: RBF: Router always leaves SafeMode after some time even manually entering SafeMode
Key: HDFS-13475
URL: https://issues.apache.org/jira/browse/HDFS-13475
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan

To reproduce the issue:
{code:java}
$ bin/hdfs dfsrouteradmin -safemode enter
Successfully enter safe mode.
$ bin/hdfs dfsrouteradmin -safemode get
Safe Mode: true{code}
And then,
{code:java}
$ bin/hdfs dfsrouteradmin -safemode get
Safe Mode: false{code}
From the code, it looks like the periodicInvoke triggers the leave.
{code:java}
public void periodicInvoke() {
  ..
  // Always update to indicate our cache was updated
  if (isCacheStale) {
    if (!rpcServer.isInSafeMode()) {
      enter();
    }
  } else if (rpcServer.isInSafeMode()) {
    // Cache recently updated, leave safe mode
    leave();
  }
}
{code}
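The periodicInvoke snippet in this report leaves safe mode whenever the cache is fresh, no matter how safe mode was entered. A minimal sketch of one possible guard, assuming a hypothetical `manualSafeMode` flag that the admin path sets. The class `SafeModeSketch` and all of its methods are illustrative stand-ins, not the Router's real code and not necessarily the fix adopted for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative stand-in for the Router safe mode service; all names are hypothetical. */
public class SafeModeSketch {
  private final AtomicBoolean inSafeMode = new AtomicBoolean(false);
  // Assumption: set by "-safemode enter" and cleared by "-safemode leave".
  private final AtomicBoolean manualSafeMode = new AtomicBoolean(false);

  /** Admin explicitly enters safe mode. */
  public void adminEnter() {
    manualSafeMode.set(true);
    inSafeMode.set(true);
  }

  /** Admin explicitly leaves safe mode. */
  public void adminLeave() {
    manualSafeMode.set(false);
    inSafeMode.set(false);
  }

  public boolean isInSafeMode() {
    return inSafeMode.get();
  }

  /** Periodic check: automatic transitions never override a manual enter. */
  public void periodicInvoke(boolean isCacheStale) {
    if (isCacheStale) {
      inSafeMode.set(true);
    } else if (inSafeMode.get() && !manualSafeMode.get()) {
      // Cache recently updated and nobody forced safe mode: leave.
      inSafeMode.set(false);
    }
  }

  public static void main(String[] args) {
    SafeModeSketch router = new SafeModeSketch();
    router.adminEnter();
    router.periodicInvoke(false);
    System.out.println("Safe Mode after manual enter and fresh cache: " + router.isInSafeMode());
    router.adminLeave();
    System.out.println("Safe Mode after manual leave: " + router.isInSafeMode());
  }
}
```

The design choice here is to keep the automatic stale-cache behaviour unchanged and only block the automatic leave while the manual flag is set.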
[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure
[ https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hanisha Koneru updated HDFS-13442:
----------------------------------
Attachment: HDFS-13442-HDFS-7240.002.patch

> Ozone: Handle Datanode Registration failure
> -------------------------------------------
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Affects Versions: HDFS-7240
> Reporter: Hanisha Koneru
> Assignee: Hanisha Koneru
> Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch, HDFS-13442-HDFS-7240.002.patch
>
>
> If a datanode is not able to register itself, we need to handle that correctly.
> If the number of unsuccessful attempts to register with the SCM exceeds a configurable max number, the datanode should not make any more attempts.
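The behaviour described in this issue, giving up after a configurable number of failed registration attempts, can be sketched roughly as follows. Everything here is hypothetical: `BoundedRegistrationSketch`, the `maxAttempts` and `sleepMillis` parameters, and the `Callable` standing in for the registration call to the SCM are assumptions for illustration, not the contents of the attached patch.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: stop registering after a configurable number of failures. */
public class BoundedRegistrationSketch {

  public enum Result { REGISTERED, GAVE_UP }

  /**
   * @param registerCall one registration attempt; throwing means the attempt failed
   * @param maxAttempts assumed configurable limit (no real config key is implied)
   * @param sleepMillis pause between failed attempts
   */
  public static Result register(Callable<Void> registerCall, int maxAttempts, long sleepMillis)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        registerCall.call();
        return Result.REGISTERED;
      } catch (InterruptedException e) {
        // Do not swallow interruption; let the caller shut down.
        throw e;
      } catch (Exception e) {
        System.err.println("Registration attempt " + attempt + " of " + maxAttempts
            + " failed: " + e.getMessage());
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    // Limit reached: the caller should stop trying.
    return Result.GAVE_UP;
  }

  public static void main(String[] args) throws InterruptedException {
    Result result = register(() -> {
      throw new java.io.IOException("SCM unreachable (simulated)");
    }, 3, 10L);
    System.out.println("Outcome: " + result);
  }
}
```

Returning an explicit outcome instead of throwing lets the caller decide what "no more attempts" means, for example moving to a shutdown state.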
[jira] [Commented] (HDFS-13456) Ozone: Update ozone to latest ratis snapshot build (0.1.1-alpha-4309324-SNAPSHOT)
[ https://issues.apache.org/jira/browse/HDFS-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442849#comment-16442849 ] genericqa commented on HDFS-13456: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 23s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 7s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 10s{color} | {color:green} HDFS-7240 passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 22s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 1s{color} | {color:red} hadoop-hdds/common in HDFS-7240 has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} container-service in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} container-service in HDFS-7240 failed. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} container-service in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 10s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 22s{color} | {color:red} container-service in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 24s{color} | {color:red} container-service in the patch failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} container-service in the patch failed. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s{color} | {color:green} hadoop-project in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 4s{color} | {color:green} common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 23s{color} | {color:red} container-service in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:gr
[jira] [Commented] (HDFS-13431) Ozone: Ozone Shell should use RestClient and RpcClient
[ https://issues.apache.org/jira/browse/HDFS-13431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442844#comment-16442844 ] genericqa commented on HDFS-13431: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 44s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 4s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 36s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 17s{color} | {color:green} HDFS-7240 passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 27s{color} | {color:red} client in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 27s{color} | {color:red} integration-test in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 25s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 40s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-ozone/integration-test {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 57s{color} | {color:red} hadoop-hdds/common in HDFS-7240 has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 21s{color} | {color:red} client in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 22s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} client in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} integration-test in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 19s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 11s{color} | {color:red} client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} integration-test in the patch failed. 
{color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} ozone-manager in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 11s{color} | {color:red} hadoop-ozone in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s{color} | {color:red} client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 23s{color} | {color:red} integration-test in the patch fai
[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped
[ https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442824#comment-16442824 ] Daryn Sharp commented on HDFS-13441: This is a bad approach for a couple reasons. Checking if the exception contains "Can't recompute" is very fragile. Exception messages should be considered opaque. Also consider that an invalid token hash caused by a missed key update is rare. The more common case is something like the balancer using an expired secret. Or consider a faulty or malicious client using an expired token. This approach may easily cause DNs to go into re-registration loops and ruin a cluster. Please see discussion on HDFS-13473 for a cleaner way to handle this problem. > DataNode missed BlockKey update from NameNode due to HeartbeatResponse was > dropped > -- > > Key: HDFS-13441 > URL: https://issues.apache.org/jira/browse/HDFS-13441 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.7.1 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Attachments: HDFS-13441.002.patch, HDFS-13441.patch > > > After NameNode failover, lots of application failed due to some DataNodes > can't re-compute password from block token. > {code:java} > 2018-04-11 20:10:52,448 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error > processing unknown operation src: /10.142.74.116:57404 dst: > /10.142.77.45:50010 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password > [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist.] 
> at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598) > at > com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't > re-compute password for block_token_identifier (expiryDate=1523538652448, > keyId=1762737944, userId=hadoop, > blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, > access modes=[WRITE]), since the required block key (keyID=1762737944) > doesn't exist. 
> at > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382) > at > org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241) > at > com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589) > ... 7 more > {code} > > In the DataNode log, we didn't see DataNode update block keys around > 2018-04-11 09:55:00 and around 2018-04-11 19:55:00. > {code:java} > 2018-04-10 14:51:36,424 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-10 23:55:38,420 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 00:51:34,792 INFO > org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting > block keys > 2018-04-11 10:51:39,4
[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442811#comment-16442811 ]

Daryn Sharp commented on HDFS-13473:
------------------------------------

A bit concerned about doing a blocking rpc with retries. What about something like the DN's heartbeat contains the current key version it has? The NN's handleHeartbeat compares it with its current key and calls setNeedKeyUpdate if they differ. Then we don't need additional rpcs or confs, and the code changes are minimized.

> DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
> ------------------------------------------------------------------------
>
> Key: HDFS-13473
> URL: https://issues.apache.org/jira/browse/HDFS-13473
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: He Xiaoqiao
> Assignee: He Xiaoqiao
> Priority: Major
> Attachments: HDFS-13473-trunk.001.patch
>
>
> Currently the DataNode updates its block keys passively: an update happens only if the NameNode returns a #KeyUpdateCommand in the heartbeat response.
> This synchronization mode has several problems:
> a. The NameNode cannot tell whether the block keys reached the DataNode successfully.
> b. The DataNode cannot tell either when it hits an exception while receiving or processing a heartbeat response that includes a BlockKeyCommand, as mentioned in HDFS-13441 and HDFS-12749.
> So I propose changing from the NameNode pushing block keys to the DataNode pulling them.
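The heartbeat-based idea in the comment above can be modelled in a few lines. This is only a toy sketch under stated assumptions: the classes `NameNodeSide`, `DataNodeSide` and `DatanodeState`, the integer key version, and the `OptionalInt` response are all hypothetical stand-ins; only the names handleHeartbeat and setNeedKeyUpdate come from the comment, and their real signatures are not reproduced here.

```java
import java.util.OptionalInt;

/** Illustrative model only; these are not the real HDFS classes or signatures. */
public class KeyVersionHeartbeatSketch {

  /** Hypothetical NameNode-side bookkeeping for one DataNode. */
  public static class DatanodeState {
    private boolean needKeyUpdate;
    public void setNeedKeyUpdate(boolean value) { needKeyUpdate = value; }
    public boolean needsKeyUpdate() { return needKeyUpdate; }
  }

  /** Hypothetical NameNode side: owns the current block key version. */
  public static class NameNodeSide {
    private int currentKeyVersion;
    public NameNodeSide(int initialVersion) { currentKeyVersion = initialVersion; }
    public void rollKeys() { currentKeyVersion++; }

    /** Compares the reported version with the current one; replies with keys only on mismatch. */
    public OptionalInt handleHeartbeat(DatanodeState dn, int reportedKeyVersion) {
      dn.setNeedKeyUpdate(reportedKeyVersion != currentKeyVersion);
      return dn.needsKeyUpdate() ? OptionalInt.of(currentKeyVersion) : OptionalInt.empty();
    }
  }

  /** Hypothetical DataNode side: reports its key version in every heartbeat. */
  public static class DataNodeSide {
    private int keyVersion;
    public DataNodeSide(int initialVersion) { keyVersion = initialVersion; }
    public int currentKeyVersion() { return keyVersion; }
    public void applyResponse(OptionalInt update) { update.ifPresent(v -> keyVersion = v); }
  }

  public static void main(String[] args) {
    NameNodeSide nn = new NameNodeSide(1);
    DatanodeState state = new DatanodeState();
    DataNodeSide dn = new DataNodeSide(1);

    nn.rollKeys();
    // First response is "dropped": the DataNode never applies it.
    nn.handleHeartbeat(state, dn.currentKeyVersion());
    // The next heartbeat still reports the old version, so the update is offered again.
    dn.applyResponse(nn.handleHeartbeat(state, dn.currentKeyVersion()));
    System.out.println("DataNode key version after retry: " + dn.currentKeyVersion());
  }
}
```

Because the DataNode keeps reporting its version, a dropped heartbeat response is repaired on the next heartbeat without any extra RPC, which is the property the comment is after.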
[jira] [Comment Edited] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI
[ https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442794#comment-16442794 ] Wei Yan edited comment on HDFS-13470 at 4/18/18 4:29 PM: - {quote}Generating the header from javascript {quote} I guess this may work here. We can let explorer.js generate the NN/Router contents, including the header, links, etc. I'm not sure there is a better idea. One minor issue in [^HDFS-13470.000.patch]: the tab links need to be updated: {code:java} Overview Subclusters Routers Datanodes Mount table{code} > RBF: Add Browse the Filesystem button to the UI > --- > > Key: HDFS-13470 > URL: https://issues.apache.org/jira/browse/HDFS-13470 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13470.000.patch > > > After HDFS-12512 added WebHDFS, we can add support for browsing the > filesystem to the UI.
[jira] [Updated] (HDFS-13474) Unable to start Hadoop DataNodes
[ https://issues.apache.org/jira/browse/HDFS-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] robbie updated HDFS-13474: -- Description: I am trying to follow the instructions in the Getting Started guide, [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node] I have confirmed, that I can `ssh localhost` without a password prompt. I have also run the following steps, {quote}1. $ bin/hdfs namenode -format 2. $ sbin/start-dfs.sh {quote} But I cant run step 3. to browse the location at `[http://localhost:9870/]`. When I run `>jsp` from the terminal prompt I just get returned, {quote}14900 Jps {quote} I was expecting a list of my nodes. In the Logs I see two error messages towards the end, {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM {quote} {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP {quote} {quote}2018-04-18 14:15:42,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1 / {quote} I will attach the full logs with this bug report. Can anyone help even with ways to debug this please ? Java Version, {quote}rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java --version java 9.0.4 Java(TM) SE Runtime Environment (build 9.0.4+11) Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode) {quote} Ubuntu version, {quote}$ lsb_release -a No LSB modules are available. 
Distributor ID: neon Description: KDE neon User Edition 5.12 Release: 16.04 Codename: xenial {quote} I have tried running the commands, `bin/hdfs version` {quote}Hadoop 3.1.0 Source code repository [https://github.com/apache/hadoop] -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d Compiled by centos on 2018-03-30T00:00Z Compiled with protoc 2.5.0 From source with checksum 14182d20c972b3e2105580a1ad6990 This command was run using /home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar {quote} when I try `bin/hdfs groups` it doesn't return but gives me, {quote}018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) {quote} when I try, `$ bin/hdfs lsSnapshottableDir` {quote}lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: [http://wiki.apache.org/hadoop/ConnectionRefused] {quote}
when I try, `$ bin/hdfs classpath` {quote}/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/* {quote} core-site.xml {quote} fs.defaultFS hdfs://localhost:9000 {quote} hdfs-site.xml {quote} dfs.replication 1 {quote} mapred-site.xml {quote} mapreduce.framework.name yarn {quote} was: I am trying to follow the instrutions in the GettingStarted guide, [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node] I have confirmed, that I can `ssh localhost` without a password prompt. I have also run the following steps, {quote}1. $ bin/hdfs namenode -format 2. $ sbin/start-dfs.sh {quote} But I cant run step 3. to browse the location at `[http://localhost:9870/]`. When I run `>jsp` from the terminal prompt I just get returned, {quote}14900 Jps {quote} I was expecting a list of my nodes. 
In the Logs I see two error messages towards the end, {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM {quote} {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP {quote} {quote}2018-04-18 14:15:42,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1 ***
[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica
[ https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442755#comment-16442755 ] Daryn Sharp commented on HDFS-13448: If we are going to add this feature, it shouldn't have fuzzy semantics. The {{NO_LOCAL_WRITE}} feature is a different, although a valid case for comparison. The {{NO_LOCAL_WRITE}} requires the policy to know the node to provide rack locality, as opposed to this feature where the node is or should be irrelevant. Excluding the local rack is broken for a small number of racks. Take the extreme case of 2 racks. Excluding the local rack will cause placement to fail. The uneven placement this jira seeks to fix will break down if the flume agents are concentrated on a few racks in a cluster with a small number of racks. Simply not providing the node will work with all existing placement policies, and achieve even/random distribution. > HDFS Block Placement - Ignore Locality for First Block Replica > -- > > Key: HDFS-13448 > URL: https://issues.apache.org/jira/browse/HDFS-13448 > Project: Hadoop HDFS > Issue Type: New Feature > Components: block placement, hdfs-client >Affects Versions: 2.9.0, 3.0.1 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Minor > Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, > HDFS-13448.3.patch > > > According to the HDFS Block Place Rules: > {quote} > /** > * The replica placement strategy is that if the writer is on a datanode, > * the 1st replica is placed on the local machine, > * otherwise a random datanode. The 2nd replica is placed on a datanode > * that is on a different rack. The 3rd replica is placed on a datanode > * which is on a different node of the rack as the second replica. 
> */ > {quote} > However, there is a hint for the hdfs-client that allows the block placement > request to not put a block replica on the local datanode _where 'local' means > the same host as the client is being run on._ > {quote} > /** >* Advise that a block replica NOT be written to the local DataNode where >* 'local' means the same host as the client is being run on. >* >* @see CreateFlag#NO_LOCAL_WRITE >*/ > {quote} > I propose that we add a new flag that allows the hdfs-client to request that > the first block replica be placed on a random DataNode in the cluster. The > subsequent block replicas should follow the normal block placement rules. > The issue is that when the {{NO_LOCAL_WRITE}} is enabled, the first block > replica is not placed on the local node, but it is still placed on the local > rack. Where this comes into play is where you have, for example, a flume > agent that is loading data into HDFS. > If the Flume agent is running on a DataNode, then by default, the DataNode > local to the Flume agent will always get the first block replica and this > leads to un-even block placements, with the local node always filling up > faster than any other node in the cluster. > Modifying this example, if the DataNode is removed from the host where the > Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then > the default block placement policy will still prefer the local rack. This > remedies the situation only so far as now the first block replica will always > be distributed to a DataNode on the local rack. > This new flag would allow a single Flume agent to distribute the blocks > randomly, evenly, over the entire cluster instead of hot-spotting the local > node or the local rack. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
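The suggestion in the comment above (do not provide the writer's node at all) can be sketched as follows. This is a simplified illustration of the first-replica rule quoted from the placement javadoc, not the real BlockPlacementPolicyDefault; the Node class, the chooseFirstReplica helper and the cluster layout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class FirstReplicaSketch {
  /** Hypothetical DataNode descriptor: a host name plus its rack. */
  static final class Node {
    final String name;
    final String rack;

    Node(String name, String rack) {
      this.name = name;
      this.rack = rack;
    }
  }

  /**
   * First-replica rule from the quoted javadoc: the local machine if the
   * writer is on a datanode, otherwise a random datanode. Passing
   * writer == null models a client that does not provide its node.
   */
  static Node chooseFirstReplica(List<Node> cluster, Node writer, Random rnd) {
    if (writer != null && cluster.contains(writer)) {
      return writer;
    }
    return cluster.get(rnd.nextInt(cluster.size()));
  }

  public static void main(String[] args) {
    List<Node> cluster = new ArrayList<>();
    cluster.add(new Node("dn1", "/rackA"));
    cluster.add(new Node("dn2", "/rackA"));
    cluster.add(new Node("dn3", "/rackB"));
    cluster.add(new Node("dn4", "/rackB"));

    Random rnd = new Random(42);
    int[] hits = new int[cluster.size()];
    for (int i = 0; i < 1000; i++) {
      // No writer node is given, so every datanode is a candidate.
      hits[cluster.indexOf(chooseFirstReplica(cluster, null, rnd))]++;
    }
    for (int i = 0; i < hits.length; i++) {
      System.out.println(cluster.get(i).name + " " + hits[i]);
    }
  }
}
```

With the writer supplied, every call returns the same node, which is the hot-spotting described in the issue. With the writer omitted, the choice falls through to the random branch and no rack has to be excluded.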
[jira] [Commented] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI
[ https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442745#comment-16442745 ] Íñigo Goiri commented on HDFS-13470: My main concern with this is that the code for explorer.html and explorer.js is pretty much the same as the one for the Namenode. The problem is the header as they have different content there. I would add a header page but not sure how to achieve this: * Using iframes (not a big fan) * Generating the header from javascript Any suggestions? > RBF: Add Browse the Filesystem button to the UI > --- > > Key: HDFS-13470 > URL: https://issues.apache.org/jira/browse/HDFS-13470 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13470.000.patch > > > After HDFS-12512 added WebHDFS, we can add the support to browse the > filesystem to the UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13472) Compilation error in trunk in hadoop-aws
[ https://issues.apache.org/jira/browse/HDFS-13472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442734#comment-16442734 ] Jason Lowe commented on HDFS-13472: --- I am unable to reproduce the compilation error in trunk. Given StagingTestBase has not been modified since November, it looks like many others have been unable to reproduce the error as well for some time. How are you building Hadoop to reproduce this error (i.e.: what does the command-line look like)? bq. getArgumentAt(int, Class) method is available only from version 2.0.0-beta getArgumentAt is available in 1.10.19. https://static.javadoc.io/org.mockito/mockito-core/1.10.19/org/mockito/invocation/InvocationOnMock.html The reason this is working for me is because mockito-core 1.10.19 is being pulled in by the DynamoDBLocal dependency, and that is appearing in the classpath before the mockito-all 1.8.5 dependency (as reported by mvn dependency:build-classpath). I agree that the version of mockito-all being requested by Hadoop is wrong. It's trying to call a method that isn't available in 1.8.5. I think we should upgrade the mockito dependency to at least 1.10.19. > Compilation error in trunk in hadoop-aws > - > > Key: HDFS-13472 > URL: https://issues.apache.org/jira/browse/HDFS-13472 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Mohammad Arshad >Priority: Major > > *Problem:* hadoop trunk compilation is failing > *Root Cause:* > compilation error is coming from > {{org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase}}. Compilation > error is "The method getArgumentAt(int, Class) is > undefined for the type InvocationOnMock". > StagingTestBase is using getArgumentAt(int, Class) method > which is not available in mockito-all 1.8.5 version. getArgumentAt(int, > Class) method is available only from version 2.0.0-beta > *Expectations:* > Either mockito-all version to be upgraded or test case to be written only > with available functions in 1.8.5. 
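For the second option in the expectations above (writing the test with only the functions available in 1.8.5), a typed lookup can be built on getArguments(), which InvocationOnMock already provides. The sketch below uses a hypothetical InvocationLike interface in place of InvocationOnMock so that it compiles without Mockito on the classpath; the helper name argumentAt and the sample values are made up for illustration.

```java
final class ArgumentAtSketch {
  /** Hypothetical stand-in for the one InvocationOnMock method used here. */
  interface InvocationLike {
    Object[] getArguments();
  }

  /**
   * Typed access to the argument at the given index, built only on
   * getArguments(). Throws ClassCastException if the type does not match.
   */
  static <T> T argumentAt(InvocationLike invocation, int index, Class<T> type) {
    return type.cast(invocation.getArguments()[index]);
  }

  public static void main(String[] args) {
    // Made-up arguments standing in for a mocked call's parameters.
    InvocationLike invocation = () -> new Object[] {"s3a://bucket/key", 3};
    String path = argumentAt(invocation, 0, String.class);
    Integer parts = argumentAt(invocation, 1, Integer.class);
    System.out.println(path + " " + parts);
  }
}
```

In a real test the helper would take InvocationOnMock directly. Upgrading the mockito dependency, as suggested in the comment, avoids the need for such a helper.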
[jira] [Assigned] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Xiaoqiao reassigned HDFS-13473: -- Assignee: He Xiaoqiao > DataNode update BlockKeys using mode PULL rather than PUSH from NameNode > > > Key: HDFS-13473 > URL: https://issues.apache.org/jira/browse/HDFS-13473 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-13473-trunk.001.patch > > > Currently the DataNode updates block keys passively: it depends on whether > the NameNode returns a #KeyUpdateCommand in the heartbeat response. > There are several problems with this block key synchronization mode: > a. The NameNode cannot tell whether the block keys reached the DataNode successfully. > b. Nor can it tell when a DataNode hits an exception while receiving or > processing a heartbeat response that includes a BlockKeyCommand, > as mentioned in HDFS-13441 and HDFS-12749. > So I propose changing from the NameNode pushing block keys to the DataNode > pulling block keys.
[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
[ https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442732#comment-16442732 ] He Xiaoqiao commented on HDFS-13473: Submitted an initial patch that uses the {{NamenodeProtocol#getBlockKeys}} interface to let the DataNode update block keys periodically. It also adds configuration items to switch this feature on or off. > DataNode update BlockKeys using mode PULL rather than PUSH from NameNode > > > Key: HDFS-13473 > URL: https://issues.apache.org/jira/browse/HDFS-13473 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: He Xiaoqiao >Priority: Major > Attachments: HDFS-13473-trunk.001.patch > > > Currently the DataNode updates block keys passively: it depends on whether > the NameNode returns a #KeyUpdateCommand in the heartbeat response. > There are several problems with this block key synchronization mode: > a. The NameNode cannot tell whether the block keys reached the DataNode successfully. > b. Nor can it tell when a DataNode hits an exception while receiving or > processing a heartbeat response that includes a BlockKeyCommand, > as mentioned in HDFS-13441 and HDFS-12749. > So I propose changing from the NameNode pushing block keys to the DataNode > pulling block keys.
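A rough sketch of the periodic pull described in the comment above, assuming a feature switch and a fixed pull interval. The Supplier stands in for the {{NamenodeProtocol#getBlockKeys}} call and a String stands in for the exported keys; the class, its fields and the interval parameter are hypothetical and not taken from the attached patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class BlockKeyPullerSketch {
  private final Supplier<String> keySource; // stand-in for the getBlockKeys RPC
  private final boolean pullEnabled;        // stand-in for the feature switch
  private volatile String currentKeys;
  private ScheduledExecutorService scheduler;

  BlockKeyPullerSketch(Supplier<String> keySource, boolean pullEnabled) {
    this.keySource = keySource;
    this.pullEnabled = pullEnabled;
  }

  /**
   * Performs one pull. A failure keeps the previous keys and returns false,
   * leaving the retry to the next scheduled run instead of blocking here.
   */
  boolean pullOnce() {
    if (!pullEnabled) {
      return false;
    }
    try {
      currentKeys = keySource.get();
      return true;
    } catch (RuntimeException e) {
      return false;
    }
  }

  /** Starts the periodic pull; does nothing when the feature is switched off. */
  void start(long intervalMillis) {
    if (!pullEnabled) {
      return;
    }
    scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleWithFixedDelay(
        this::pullOnce, 0, intervalMillis, TimeUnit.MILLISECONDS);
  }

  void stop() {
    if (scheduler != null) {
      scheduler.shutdownNow();
    }
  }

  String currentKeys() {
    return currentKeys;
  }
}
```

Keeping each pull non-blocking and deferring retries to the next run is one way to limit the cost of a slow or unavailable NameNode. Whether the patch behaves this way is not stated in the thread.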
[jira] [Updated] (HDFS-13474) Unable to start Hadoop DataNodes
[ https://issues.apache.org/jira/browse/HDFS-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] robert updated HDFS-13474: -- Description: I am trying to follow the instrutions in the GettingStarted guide, [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node] I have confirmed, that I can `ssh localhost` without a password prompt. I have also run the following steps, {quote}1. $ bin/hdfs namenode -format 2. $ sbin/start-dfs.sh {quote} But I cant run step 3. to browse the location at `[http://localhost:9870/]`. When I run `>jsp` from the terminal prompt I just get returned, {quote}14900 Jps {quote} I was expecting a list of my nodes. In the Logs I see two error messages towards the end, {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM {quote} {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP {quote} {quote}2018-04-18 14:15:42,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1 / {quote} I will attach the full logs with this bug report. Can anyone help even with ways to debug this please ? Java Version, {quote}rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java --version java 9.0.4 Java(TM) SE Runtime Environment (build 9.0.4+11) Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode) {quote} Ubuntu version, {quote}$ lsb_release -a No LSB modules are available. 
Distributor ID: neon Description: KDE neon User Edition 5.12 Release: 16.04 Codename: xenial {quote} I have tried running the commands, `bin/hdfs version` {quote}Hadoop 3.1.0 Source code repository [https://github.com/apache/hadoop] -r 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d Compiled by centos on 2018-03-30T00:00Z Compiled with protoc 2.5.0 From source with checksum 14182d20c972b3e2105580a1ad6990 This command was run using /home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar {quote} when I try `bin/hdfs groups` it doesn't return but gives me, {quote}018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) {quote} when I try, `$ bin/hdfs lsSnapshottableDir` {quote}lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: [http://wiki.apache.org/hadoop/ConnectionRefused] {quote}
when I try, `$ bin/hdfs classpath` {quote}/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/* {quote} core-site.xml {quote} fs.defaultFS hdfs://localhost:9000 {quote} hdfs-site.xml {quote} dfs.replication 1 {quote} mapred-site.xml {quote} mapreduce.framework.name yarn {quote} was: I am trying to follow the instrutions in the GettingStarted guide, [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node] I have confirmed, that I can `ssh localhost` without a password prompt. I have also run the following steps, {quote}1. $ bin/hdfs namenode -format 2. $ sbin/start-dfs.sh {quote} But I cant run step 3. to browse the location at `[http://localhost:9870/]`. When I run `>jsp` from the terminal prompt I just get returned, {quote}14900 Jps {quote} I was expecting a list of my nodes. In the Logs I see two error messages towards the end, {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM {quote} {quote}2018-04-18 14:15:42,516 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP {quote} {quote}2018-04-18 14:15:42,517 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: /