[jira] [Commented] (HDFS-15082) RBF: Check each component length of destination path when add/update mount entry
[ https://issues.apache.org/jira/browse/HDFS-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066416#comment-17066416 ] Hadoop QA commented on HDFS-15082: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 32m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 22m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 3s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 33s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 89m 53s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterFaultTolerant | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 | | JIRA Issue | HDFS-15082 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997621/HDFS-15082.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e357f4a9735b 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d353b30 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/29019/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29019/testReport/ | | Max. process+thread count | 3178 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29019/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066391#comment-17066391 ] Siddharth Wagle commented on HDFS-15154: Thanks [~arp] for your feedback. [~ayushtkn] I have uploaded version 15 with a minor change: replacing most of the CaseUtils calls with string literals for the operation name. > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch, HDFS-15154.10.patch, HDFS-15154.11.patch, > HDFS-15154.12.patch, HDFS-15154.13.patch, HDFS-15154.14.patch, > HDFS-15154.15.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
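[Editor's sketch] As context for the patch under review: a minimal sketch of a superuser-only gate for storage policy operations, assuming the caller's identity comes from UserGroupInformation and that a plain string such as "setStoragePolicy" is the operation-name literal mentioned above. The class, method, and error message here are illustrative, not the patch's actual code.

{code}
import java.io.IOException;

import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.UserGroupInformation;

/** Illustrative superuser-only gate for storage policy operations. */
public class StoragePolicyGate {

  /** Throws unless the caller is the HDFS superuser or in the supergroup. */
  static void checkSuperuserPrivilege(String superUser, String superGroup,
      String operationName) throws IOException {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    if (ugi.getShortUserName().equals(superUser)) {
      return;
    }
    for (String group : ugi.getGroupNames()) {
      if (group.equals(superGroup)) {
        return;
      }
    }
    throw new AccessControlException("Access denied for user "
        + ugi.getShortUserName() + ". Superuser privilege is required for "
        + operationName);
  }
}
{code}

Unlike dfs.storage.policy.enabled=false, which turns the feature off for everyone, a gate like this keeps policies usable while restricting who may assign them.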
[jira] [Updated] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated HDFS-15154: --- Attachment: HDFS-15154.15.patch > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch, HDFS-15154.10.patch, HDFS-15154.11.patch, > HDFS-15154.12.patch, HDFS-15154.13.patch, HDFS-15154.14.patch, > HDFS-15154.15.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15082) RBF: Check each component length of destination path when add/update mount entry
[ https://issues.apache.org/jira/browse/HDFS-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066381#comment-17066381 ] Xiaoqiao He commented on HDFS-15082: Thanks [~elgoiri] for picking up this issue. Submitted v003 (identical to v002) to trigger Jenkins again. > RBF: Check each component length of destination path when add/update mount > entry > > > Key: HDFS-15082 > URL: https://issues.apache.org/jira/browse/HDFS-15082 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15082.001.patch, HDFS-15082.002.patch, > HDFS-15082.003.patch > > > When add/update mount entry, each component length of destination path could > exceed filesystem path component length limit, reference to > `dfs.namenode.fs-limits.max-component-length` of NameNode. So we should check > each component length of destination path when add/update mount entry at > Router side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
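[Editor's sketch] To illustrate the check the description asks for: a sketch of router-side validation, assuming the Router reads the NameNode's dfs.namenode.fs-limits.max-component-length setting via DFSConfigKeys. The class name and error message are illustrative, not the patch's actual code.

{code}
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSConfigKeys;

/** Illustrative validator for mount entry destination paths. */
public class MountPathComponentCheck {

  /** Rejects destinations with a component longer than the NameNode limit. */
  static void verifyMaxComponentLength(Configuration conf, String destPath) {
    int limit = conf.getInt(
        DFSConfigKeys.DFS_NAMENODE_MAX_COMPONENT_LENGTH_KEY,
        DFSConfigKeys.DFS_NAMENODE_MAX_COMPONENT_LENGTH_DEFAULT);
    if (limit <= 0) {
      return; // a non-positive limit disables the check on the NameNode
    }
    for (String component : destPath.split(Path.SEPARATOR)) {
      // The NameNode compares the UTF-8 byte length of each component.
      int length = component.getBytes(StandardCharsets.UTF_8).length;
      if (length > limit) {
        throw new IllegalArgumentException("The maximum path component name"
            + " limit of " + component + " in directory " + destPath
            + " is exceeded: limit=" + limit + " length=" + length);
      }
    }
  }
}
{code}

Validating at add/update time fails fast in the dfsrouteradmin command instead of surfacing later when a client first writes under the mount point.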
[jira] [Updated] (HDFS-15082) RBF: Check each component length of destination path when add/update mount entry
[ https://issues.apache.org/jira/browse/HDFS-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-15082: --- Attachment: HDFS-15082.003.patch > RBF: Check each component length of destination path when add/update mount > entry > > > Key: HDFS-15082 > URL: https://issues.apache.org/jira/browse/HDFS-15082 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15082.001.patch, HDFS-15082.002.patch, > HDFS-15082.003.patch > > > When add/update mount entry, each component length of destination path could > exceed filesystem path component length limit, reference to > `dfs.namenode.fs-limits.max-component-length` of NameNode. So we should check > each component length of destination path when add/update mount entry at > Router side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI
[ https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066362#comment-17066362 ] Hadoop QA commented on HDFS-13470: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 34m 1s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch 36 line(s) with tabs. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 16s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 50m 35s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 | | JIRA Issue | HDFS-13470 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997615/HDFS-13470.001.patch | | Optional Tests | dupname asflicense shadedclient | | uname | Linux fd9b788c30d9 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d353b30 | | maven | version: Apache Maven 3.3.9 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/29018/artifact/out/whitespace-tabs.txt | | Max. process+thread count | 305 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29018/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF: Add Browse the Filesystem button to the UI > --- > > Key: HDFS-13470 > URL: https://issues.apache.org/jira/browse/HDFS-13470 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13470.000.patch, HDFS-13470.001.patch > > > After HDFS-12512 added WebHDFS, we can add the support to browse the > filesystem to the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066353#comment-17066353 ] Íñigo Goiri commented on HDFS-15238: There is a typo in "Cachec HA protocol." in [^HDFS-15238-002.patch]. I don't think there is a need for tests as this is an optimization and is covered already by other tests. Other than the typo, this is ready to go. > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-002.patch, HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
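[Editor's sketch] To make the leak concrete: a sketch of the per-heartbeat pattern the issue description refers to, assuming the proxy is obtained through HAServiceTarget.getProxy. This is simplified relative to the real NamenodeHeartbeatService, and the 30-second timeout is an example value.

{code}
import java.io.IOException;

import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceStatus;
import org.apache.hadoop.ha.HAServiceTarget;
import org.apache.hadoop.hdfs.HdfsConfiguration;

/** Illustrates the leak pattern described in the issue, not the real code. */
public class HeartbeatLeakDemo {

  /** Called every 5s: each call builds a proxy over a fresh Configuration. */
  static HAServiceStatus getHAStatus(HAServiceTarget localTarget)
      throws IOException {
    // Every new Configuration registers itself in Configuration's static
    // registry, so instances pile up between full GCs.
    HAServiceProtocol proxy =
        localTarget.getProxy(new HdfsConfiguration(), 30_000);
    return proxy.getServiceStatus();
  }
}
{code}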
[jira] [Updated] (HDFS-15239) Add button to go to the parent directory in the explorer
[ https://issues.apache.org/jira/browse/HDFS-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-15239: --- Attachment: screenshot-1.png > Add button to go to the parent directory in the explorer > > > Key: HDFS-15239 > URL: https://issues.apache.org/jira/browse/HDFS-15239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Priority: Major > Attachments: screenshot-1.png > > > Currently, when using the HDFS explorer page, it is easy to go into a folder. > However, to go back one has to use the browser back button (if one is coming > from that folder) or to edit the path by hand. > It would be nice to have the typical button to go to the parent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15239) Add button to go to the parent directory in the explorer
[ https://issues.apache.org/jira/browse/HDFS-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066349#comment-17066349 ] Íñigo Goiri commented on HDFS-15239: Currently it looks like this: !screenshot-1.png! > Add button to go to the parent directory in the explorer > > > Key: HDFS-15239 > URL: https://issues.apache.org/jira/browse/HDFS-15239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Priority: Major > Attachments: screenshot-1.png > > > Currently, when using the HDFS explorer page, it is easy to go into a folder. > However, to go back one has to use the browser back button (if one is coming > from that folder) or to edit the path by hand. > It would be nice to have the typical button to go to the parent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15239) Add button to go to the parent directory in the explorer
Íñigo Goiri created HDFS-15239: -- Summary: Add button to go to the parent directory in the explorer Key: HDFS-15239 URL: https://issues.apache.org/jira/browse/HDFS-15239 Project: Hadoop HDFS Issue Type: Improvement Reporter: Íñigo Goiri Currently, when using the HDFS explorer page, it is easy to go into a folder. However, to go back one has to use the browser back button (if one is coming from that folder) or to edit the path by hand. It would be nice to have the typical button to go to the parent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI
[ https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066342#comment-17066342 ] Íñigo Goiri commented on HDFS-13470: I was playing around with trunk and I realized that the explorer never made it into OSS. I updated with the latest HDFS explorer and made it consistent with the NN. Initially, I was trying to reuse the same files but I wasn't able to do a proper refactor. Having duplicated files is not optimal but I think at this point it is better than not having this. > RBF: Add Browse the Filesystem button to the UI > --- > > Key: HDFS-13470 > URL: https://issues.apache.org/jira/browse/HDFS-13470 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13470.000.patch, HDFS-13470.001.patch > > > After HDFS-12512 added WebHDFS, we can add the support to browse the > filesystem to the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI
[ https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-13470: --- Attachment: HDFS-13470.001.patch > RBF: Add Browse the Filesystem button to the UI > --- > > Key: HDFS-13470 > URL: https://issues.apache.org/jira/browse/HDFS-13470 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-13470.000.patch, HDFS-13470.001.patch > > > After HDFS-12512 added WebHDFS, we can add the support to browse the > filesystem to the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066336#comment-17066336 ] Hadoop QA commented on HDFS-15238: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 40s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 4s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 | | JIRA Issue | HDFS-15238 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997614/HDFS-15238-002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux e20b0a71b713 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / d353b30 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29017/testReport/ | | Max. process+thread count | 3593 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29017/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-002.patch, HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15196) RBF: RouterRpcServer getListing cannot list large dirs correctly
[ https://issues.apache.org/jira/browse/HDFS-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066315#comment-17066315 ] Íñigo Goiri commented on HDFS-15196: If possible, I would like to have the correct remaining entries value. As you guys say, the client will just keep asking as long as the value is non-0. However, this could be used for pagination or something else. Even though I cannot point to an example that uses this, I would try to get the right value in each call. > RBF: RouterRpcServer getListing cannot list large dirs correctly > > > Key: HDFS-15196 > URL: https://issues.apache.org/jira/browse/HDFS-15196 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Fengnan Li >Assignee: Fengnan Li >Priority: Critical > Attachments: HDFS-15196.001.patch, HDFS-15196.002.patch, > HDFS-15196.003.patch, HDFS-15196.003.patch, HDFS-15196.004.patch, > HDFS-15196.005.patch, HDFS-15196.006.patch, HDFS-15196.007.patch, > HDFS-15196.008.patch, HDFS-15196.009.patch, HDFS-15196.010.patch > > > In RouterRpcServer, getListing function is handled as two parts: > # Union all partial listings from destination ns + paths > # Append mount points for the dir to be listed > In the case of large dir which is bigger than DFSConfigKeys.DFS_LIST_LIMIT > (with default value 1k), the batch listing will be used and the startAfter > will be used to define the boundary of each batch listing. However, step 2 > here will add existing mount points, which will mess up with the boundary of > the batch, thus making the next batch startAfter wrong. > The fix is just to append the mount points when there is no more batch query > necessary. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
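[Editor's sketch] A simplified sketch of the merge step under discussion. The real RouterClientProtocol also aggregates the remaining-entries count across namespaces; this only shows the ordering rule (mount points appended on the last batch), with illustrative names.

{code}
import java.util.TreeMap;

import org.apache.hadoop.hdfs.protocol.DirectoryListing;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

/** Simplified sketch of the merge step in the getListing fix. */
public class ListingMergeSketch {

  /** Appends mount points only once the namespaces report no more batches. */
  static DirectoryListing merge(TreeMap<String, HdfsFileStatus> nnListing,
      TreeMap<String, HdfsFileStatus> mountPoints, int remainingEntries) {
    if (remainingEntries == 0) {
      // Last batch: adding mount points cannot shift the startAfter boundary.
      nnListing.putAll(mountPoints);
    }
    HdfsFileStatus[] combined =
        nnListing.values().toArray(new HdfsFileStatus[0]);
    return new DirectoryListing(combined, remainingEntries);
  }
}
{code}

Returning the aggregated remainingEntries from the namespaces, as the comment above asks, also keeps pagination honest for any client that inspects the value rather than just looping until it hits 0.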
[jira] [Commented] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066311#comment-17066311 ] xuzq commented on HDFS-15238: - Thanks [~elgoiri], please review [^HDFS-15238-002.patch] > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-002.patch, HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuzq updated HDFS-15238: Attachment: (was: HDFS-15238-trunk-002.patch) > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-002.patch, HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuzq updated HDFS-15238: Attachment: HDFS-15238-002.patch > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-002.patch, HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuzq updated HDFS-15238: Attachment: HDFS-15238-trunk-002.patch > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-trunk-001.patch, HDFS-15238-trunk-002.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12733) Option to disable to namenode local edits
[ https://issues.apache.org/jira/browse/HDFS-12733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066297#comment-17066297 ] Íñigo Goiri commented on HDFS-12733: I guess [^HDFS-12733.008.patch] assumes that one would set DFS_NAMENODE_EDITS_DIR_KEY to null to disable the local edits? I think it is fine; can we document this behavior in the HA QJM md file mentioning that if we want to disable the local edits (assuming that the shared ones are enough), one needs to set dfs.namenode.edits.dir to blank? In addition, in FSNameSystem let's add a log message when {{!noSharedEditDirs.isEmpty()}} saying that the local edits are disabled and that it will rely on the shared ones. BTW, is there a need to check that there is at least one edits location specified? I think that may already be implicit. > Option to disable to namenode local edits > - > > Key: HDFS-12733 > URL: https://issues.apache.org/jira/browse/HDFS-12733 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode, performance >Reporter: Brahma Reddy Battula >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-12733-001.patch, HDFS-12733-002.patch, > HDFS-12733-003.patch, HDFS-12733.004.patch, HDFS-12733.005.patch, > HDFS-12733.006.patch, HDFS-12733.007.patch, HDFS-12733.008.patch > > > As of now, Edits will be written in local and shared locations which will be > redundant and local edits never used in HA setup. > Disabling local edits gives little performance improvement. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
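[Editor's sketch] A sketch of the configuration this comment describes, assuming the patch's convention that a blank dfs.namenode.edits.dir disables local edits; the JournalNode addresses and nameservice are example values.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

/** Shows the proposed way to run with shared (QJM) edits only. */
public class SharedEditsOnlyConfig {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // Shared edits on QJM stay configured as usual.
    conf.set(DFSConfigKeys.DFS_NAMENODE_SHARED_EDITS_DIR_KEY,
        "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster");
    // Blank local edits dir => no local copy, under the patch's proposal.
    conf.set(DFSConfigKeys.DFS_NAMENODE_EDITS_DIR_KEY, "");
    System.out.println("local edits dirs: '"
        + conf.get(DFSConfigKeys.DFS_NAMENODE_EDITS_DIR_KEY) + "'");
  }
}
{code}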
[jira] [Commented] (HDFS-15082) RBF: Check each component length of destination path when add/update mount entry
[ https://issues.apache.org/jira/browse/HDFS-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066291#comment-17066291 ] Íñigo Goiri commented on HDFS-15082: [~hexiaoqiao], do you mind submitting again? It has been some time and I'd like to make sure it still runs. Other than that, it looks good. > RBF: Check each component length of destination path when add/update mount > entry > > > Key: HDFS-15082 > URL: https://issues.apache.org/jira/browse/HDFS-15082 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15082.001.patch, HDFS-15082.002.patch > > > When add/update mount entry, each component length of destination path could > exceed filesystem path component length limit, reference to > `dfs.namenode.fs-limits.max-component-length` of NameNode. So we should check > each component length of destination path when add/update mount entry at > Router side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14385) RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066288#comment-17066288 ] Íñigo Goiri commented on HDFS-14385: [~hexiaoqiao], do you mind rebasing to trunk and posting a patch for that? HDFS-13891 has already been merged. > RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster > > > Key: HDFS-14385 > URL: https://issues.apache.org/jira/browse/HDFS-14385 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14385-HDFS-13891.001.patch > > > MiniRouterDFSCluster mimic federated HDFS cluster with routers to support RBF > test, In MiniRouterDFSCluster, it starts MiniDFSCluster with complete roles > of HDFS which have significant time cost. As HDFS-14351 discussed, it is > better to provide mock MiniDFSCluster/Namenodes as one option to support some > test case and reduce time cost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066267#comment-17066267 ] Hadoop QA commented on HDFS-13916: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-13916 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13916 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12976595/HDFS-13916.006.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29016/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Distcp SnapshotDiff to support WebHDFS > -- > > Key: HDFS-13916 > URL: https://issues.apache.org/jira/browse/HDFS-13916 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp, webhdfs >Affects Versions: 3.0.1, 3.1.1 >Reporter: Xun REN >Assignee: Xun REN >Priority: Major > Labels: easyfix, newbie, patch > Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, > HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, > HDFS-13916.patch > > > [~ljain] has worked on the JIRA: HDFS-13052 to provide the possibility to > make DistCP of SnapshotDiff with WebHDFSFileSystem. However, in the patch, > there is no modification for the real java class which is used by launching > the command "hadoop distcp ..." > > You can check in the latest version here: > [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100] > In the method "preSyncCheck" of the class "DistCpSync", we still check if the > file system is DFS. > So I propose to change the class DistCpSync in order to take into > consideration what was committed by Lokesh Jain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
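[Editor's sketch] A sketch of the relaxed filesystem check being proposed. The real DistCpSync.preSyncCheck also validates snapshots and sync options, so this only illustrates accepting WebHDFS alongside DFS; the class and method names are illustrative.

{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.web.WebHdfsFileSystem;

/** Illustrative relaxation of the DFS-only check in DistCpSync. */
public class SnapshotDiffFsCheck {

  static void checkFilesystemSupport(FileSystem fs) {
    // Previously only DistributedFileSystem was accepted; WebHDFS also
    // exposes the snapshot diff operations after HDFS-13052.
    if (!(fs instanceof DistributedFileSystem)
        && !(fs instanceof WebHdfsFileSystem)) {
      throw new IllegalArgumentException(
          "Unsupported filesystem for snapshot diff: " + fs.getUri());
    }
  }
}
{code}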
[jira] [Updated] (HDFS-13916) Distcp SnapshotDiff to support WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDFS-13916: -- Target Version/s: 3.3.1, 3.4.0 > Distcp SnapshotDiff to support WebHDFS > -- > > Key: HDFS-13916 > URL: https://issues.apache.org/jira/browse/HDFS-13916 > Project: Hadoop HDFS > Issue Type: New Feature > Components: distcp, webhdfs >Affects Versions: 3.0.1, 3.1.1 >Reporter: Xun REN >Assignee: Xun REN >Priority: Major > Labels: easyfix, newbie, patch > Attachments: HDFS-13916.002.patch, HDFS-13916.003.patch, > HDFS-13916.004.patch, HDFS-13916.005.patch, HDFS-13916.006.patch, > HDFS-13916.patch > > > [~ljain] has worked on the JIRA: HDFS-13052 to provide the possibility to > make DistCP of SnapshotDiff with WebHDFSFileSystem. However, in the patch, > there is no modification for the real java class which is used by launching > the command "hadoop distcp ..." > > You can check in the latest version here: > [https://github.com/apache/hadoop/blob/branch-3.1.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpSync.java#L96-L100] > In the method "preSyncCheck" of the class "DistCpSync", we still check if the > file system is DFS. > So I propose to change the class DistCpSync in order to take into > consideration what was committed by Lokesh Jain. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14385) RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066252#comment-17066252 ] Hadoop QA commented on HDFS-14385: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 7m 52s{color} | {color:red} Docker failed to build yetus/hadoop:bdbca0e53b4. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-14385 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12964438/HDFS-14385-HDFS-13891.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29015/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster > > > Key: HDFS-14385 > URL: https://issues.apache.org/jira/browse/HDFS-14385 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14385-HDFS-13891.001.patch > > > MiniRouterDFSCluster mimic federated HDFS cluster with routers to support RBF > test, In MiniRouterDFSCluster, it starts MiniDFSCluster with complete roles > of HDFS which have significant time cost. As HDFS-14351 discussed, it is > better to provide mock MiniDFSCluster/Namenodes as one option to support some > test case and reduce time cost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14385) RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-14385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066242#comment-17066242 ] Wei-Chiu Chuang commented on HDFS-14385: Triggering a rebuild: https://builds.apache.org/job/PreCommit-HDFS-Build/29014/ > RBF: Optimize MiniRouterDFSCluster with optional light weight MiniDFSCluster > > > Key: HDFS-14385 > URL: https://issues.apache.org/jira/browse/HDFS-14385 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-14385-HDFS-13891.001.patch > > > MiniRouterDFSCluster mimic federated HDFS cluster with routers to support RBF > test, In MiniRouterDFSCluster, it starts MiniDFSCluster with complete roles > of HDFS which have significant time cost. As HDFS-14351 discussed, it is > better to provide mock MiniDFSCluster/Namenodes as one option to support some > test case and reduce time cost. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15215) The Timestamp for longest write/read lock held log is wrong
[ https://issues.apache.org/jira/browse/HDFS-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066222#comment-17066222 ] Hudson commented on HDFS-15215: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18083 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18083/]) HDFS-15215. The Timestamp for longest write/read lock held log is wrong (github: rev d353b30baf6da5b70685cf837cf7095636f345e1) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystemLock.java * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/FakeTimer.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystemLock.java > The Timestamp for longest write/read lock held log is wrong > --- > > Key: HDFS-15215 > URL: https://issues.apache.org/jira/browse/HDFS-15215 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.3.0 > > > I found the Timestamp for longest write/read lock held log is wrong in trunk: > {code} > 2020-03-10 16:01:26,585 [main] INFO namenode.FSNamesystem > (FSNamesystemLock.java:writeUnlock(281)) - Number of suppressed > write-lock reports: 0 > Longest write-lock held at 1970-01-03 07:07:40,841+0900 for 3ms via > java.lang.Thread.getStackTrace(Thread.java:1559) > ... > {code} > Looking at the code, it looks like the timestamp comes from System.nanoTime() > that returns the current value of the running Java Virtual Machine's > high-resolution time source and this method can only be used to measure > elapsed time: > https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime-- > We need to make the timestamp from System.currentTimeMillis(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
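[Editor's sketch] A self-contained demo of the described bug class; this is not the FSNamesystemLock code itself, just the difference between the two clocks.

{code}
import java.util.Date;
import java.util.concurrent.TimeUnit;

/** Demonstrates why nanoTime() cannot be formatted as a wall-clock date. */
public class LockTimestampDemo {
  public static void main(String[] args) {
    // nanoTime() has an arbitrary origin (often JVM/OS start), so treating
    // it as an epoch offset prints a date near 1970-01-01.
    long nanoBased = TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    System.out.println("nanoTime as date: " + new Date(nanoBased));
    // currentTimeMillis() is wall-clock time since the epoch: correct here.
    System.out.println("wall clock:       "
        + new Date(System.currentTimeMillis()));
  }
}
{code}

nanoTime() remains the right choice for measuring the lock's held duration; only the "held at" timestamp needs the wall clock.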
[jira] [Commented] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066220#comment-17066220 ] Arpit Agarwal commented on HDFS-15154: -- Sorry I didn't get time to look at this in detail. [~ayushtkn] if the changes look good to you then please go ahead. Main thing would be to ensure there is no incompatibility introduced by the change. > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch, HDFS-15154.10.patch, HDFS-15154.11.patch, > HDFS-15154.12.patch, HDFS-15154.13.patch, HDFS-15154.14.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15154) Allow only hdfs superusers the ability to assign HDFS storage policies
[ https://issues.apache.org/jira/browse/HDFS-15154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066220#comment-17066220 ] Arpit Agarwal edited comment on HDFS-15154 at 3/24/20, 10:03 PM: - Sorry I didn't get time to look at this in detail. [~ayushtkn] if the changes look good to you then please go ahead and commit. Main thing would be to ensure there is no incompatibility introduced by the change. was (Author: arpitagarwal): Sorry I didn't get time to look at this in detail. [~ayushtkn] if the changes look good to you then please go ahead. Main thing would be to ensure there is no incompatibility introduced by the change. > Allow only hdfs superusers the ability to assign HDFS storage policies > -- > > Key: HDFS-15154 > URL: https://issues.apache.org/jira/browse/HDFS-15154 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Bob Cauthen >Assignee: Siddharth Wagle >Priority: Major > Attachments: HDFS-15154.01.patch, HDFS-15154.02.patch, > HDFS-15154.03.patch, HDFS-15154.04.patch, HDFS-15154.05.patch, > HDFS-15154.06.patch, HDFS-15154.07.patch, HDFS-15154.08.patch, > HDFS-15154.09.patch, HDFS-15154.10.patch, HDFS-15154.11.patch, > HDFS-15154.12.patch, HDFS-15154.13.patch, HDFS-15154.14.patch > > > Please provide a way to limit only HDFS superusers the ability to assign HDFS > Storage Policies to HDFS directories. > Currently, and based on Jira HDFS-7093, all storage policies can be disabled > cluster wide by setting the following: > dfs.storage.policy.enabled to false > But we need a way to allow only HDFS superusers the ability to assign an HDFS > Storage Policy to an HDFS directory. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15215) The Timestamp for longest write/read lock held log is wrong
[ https://issues.apache.org/jira/browse/HDFS-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066209#comment-17066209 ] Íñigo Goiri commented on HDFS-15215: Thanks [~brfrn169] for the fix. Merged the PR to trunk. > The Timestamp for longest write/read lock held log is wrong > --- > > Key: HDFS-15215 > URL: https://issues.apache.org/jira/browse/HDFS-15215 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.3.0 > > > I found the Timestamp for longest write/read lock held log is wrong in trunk: > {code} > 2020-03-10 16:01:26,585 [main] INFO namenode.FSNamesystem > (FSNamesystemLock.java:writeUnlock(281)) - Number of suppressed > write-lock reports: 0 > Longest write-lock held at 1970-01-03 07:07:40,841+0900 for 3ms via > java.lang.Thread.getStackTrace(Thread.java:1559) > ... > {code} > Looking at the code, it looks like the timestamp comes from System.nanoTime() > that returns the current value of the running Java Virtual Machine's > high-resolution time source and this method can only be used to measure > elapsed time: > https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime-- > We need to make the timestamp from System.currentTimeMillis(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15215) The Timestamp for longest write/read lock held log is wrong
[ https://issues.apache.org/jira/browse/HDFS-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri resolved HDFS-15215. Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed > The Timestamp for longest write/read lock held log is wrong > --- > > Key: HDFS-15215 > URL: https://issues.apache.org/jira/browse/HDFS-15215 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 3.3.0 > > > I found the Timestamp for longest write/read lock held log is wrong in trunk: > {code} > 2020-03-10 16:01:26,585 [main] INFO namenode.FSNamesystem > (FSNamesystemLock.java:writeUnlock(281)) - Number of suppressed > write-lock reports: 0 > Longest write-lock held at 1970-01-03 07:07:40,841+0900 for 3ms via > java.lang.Thread.getStackTrace(Thread.java:1559) > ... > {code} > Looking at the code, it looks like the timestamp comes from System.nanoTime() > that returns the current value of the running Java Virtual Machine's > high-resolution time source and this method can only be used to measure > elapsed time: > https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#nanoTime-- > We need to make the timestamp from System.currentTimeMillis(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15075) Remove process command timing from BPServiceActor
[ https://issues.apache.org/jira/browse/HDFS-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066208#comment-17066208 ] Íñigo Goiri commented on HDFS-15075: Thanks [~weichiu] for the comment, I think it is fair. [~hexiaoqiao] do you mind limiting this JIRA to fixing the metrics in BPServiceActor and creating a separate JIRA for the other metrics? > Remove process command timing from BPServiceActor > - > > Key: HDFS-15075 > URL: https://issues.apache.org/jira/browse/HDFS-15075 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15075.001.patch, HDFS-15075.002.patch, > HDFS-15075.003.patch, HDFS-15075.004.patch, HDFS-15075.005.patch, > HDFS-15075.006.patch, HDFS-15075.007.patch > > > HDFS-14997 moved the command processing into async. > Right now, we are checking the time to add to a queue. > We should remove this one and maybe move the timing within the thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
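[Editor's sketch] A sketch of what moving the timing into the async thread could look like; the queue type and the commented-out metrics hook are hypothetical stand-ins for the BPServiceActor internals.

{code}
import java.util.concurrent.BlockingQueue;

import org.apache.hadoop.util.Time;

/** Illustrative timing of command processing inside the async thread. */
public class CommandProcessingTimer {

  /** Drains the queue; the metric hook name is a hypothetical stand-in. */
  static void processQueue(BlockingQueue<Runnable> queue)
      throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      Runnable command = queue.take();
      // Measure the processing itself, not the enqueue on the heartbeat path.
      long start = Time.monotonicNow();
      command.run();
      long elapsedMs = Time.monotonicNow() - start;
      // dnMetrics.addProcessCommandTime(elapsedMs); // hypothetical hook
      System.out.println("processed command in " + elapsedMs + " ms");
    }
  }
}
{code}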
[jira] [Commented] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066203#comment-17066203 ] Íñigo Goiri commented on HDFS-15238: BTW, I'm not sure if Yetus will catch this patch file name; let's call it HDFS-15238.002.patch just in case. > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066202#comment-17066202 ] Íñigo Goiri commented on HDFS-15238: It makes sense to cache it; I'm guessing you have profiled this and the size is significant? We may want to reset it on exceptions just in case. So my comments would be: * When we catch the exception, set localTargetHAProtocol to null. * Add a javadoc comment to localTargetHAProtocol: {code} /** Cachec HA protocol. */ private HAServiceProtocol localTargetHAProtocol; {code} > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService will get NameNode's HA status every 5s, and created > HAServiceProtocol every time. > When creating HAServiceProtocol, it also will new Configuration. > Over time, there will be more and more entries for REGISTER in Configuration > until fullGc happen. > The entry will piles up again, after reaching a certain threshold, the > fullGc is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
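[Editor's sketch] A sketch of the caching approach suggested in this comment, using the field name from the discussion; the wrapper class, constructor, and 30-second timeout are illustrative rather than the patch's actual code.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceStatus;
import org.apache.hadoop.ha.HAServiceTarget;

/** Illustrative cached-proxy approach with reset-on-exception. */
public class CachedHaStatusFetcher {
  private final HAServiceTarget localTarget;
  private final Configuration conf;
  /** Cached HA protocol. */
  private HAServiceProtocol localTargetHAProtocol;

  CachedHaStatusFetcher(HAServiceTarget localTarget, Configuration conf) {
    this.localTarget = localTarget;
    this.conf = conf;
  }

  HAServiceStatus getHAServiceStatus() throws IOException {
    try {
      if (localTargetHAProtocol == null) {
        // Build the proxy (and its Configuration) once instead of every 5s.
        localTargetHAProtocol = localTarget.getProxy(conf, 30_000);
      }
      return localTargetHAProtocol.getServiceStatus();
    } catch (IOException e) {
      // Reset so the next heartbeat rebuilds the proxy, per the suggestion.
      localTargetHAProtocol = null;
      throw e;
    }
  }
}
{code}

Resetting on exception means a dead proxy (e.g. after a NameNode restart) is discarded instead of being retried forever.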
[jira] [Commented] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066165#comment-17066165 ] Hadoop QA commented on HDFS-15191: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 6s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 52s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}104m 21s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}184m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 | | JIRA Issue | HDFS-15191 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997584/HDFS-15191.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 24ea7c8c3a8e 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4454c6d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/29013/testReport/ | | Max. process+thread count | 2803 (vs. ulimit of 5500) | | modu
[jira] [Commented] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066072#comment-17066072 ] Hudson commented on HDFS-15219: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18082 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18082/]) HDFS-15219. DFS Client will stuck when ResponseProcessor.run throw Error (github: rev d9c4f1129c0814ab61fce6ea8baf4b272f84c252) * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java > DFS Client will stuck when ResponseProcessor.run throw Error > > > Key: HDFS-15219 > URL: https://issues.apache.org/jira/browse/HDFS-15219 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.3 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Fix For: 3.3.0 > > Original Estimate: 672h > Remaining Estimate: 672h > > In my case, a Tez application was stuck for more than 2 hours until we killed this > application. The reason is that a task attempt was stuck, because speculative > execution is disabled. > The exception looked like this: > {code:java} > 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records > read - 10 > 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: > records written - 100 > 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records > read - 100 > 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] > |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for > block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] > threw an Error. Shutting down now... > java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253) > at java.lang.String.valueOf(String.java:2847) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737) > Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 4 more > Caused by: java.util.zip.ZipException: error reading zip file > at java.util.zip.ZipFile.read(Native Method) > at java.util.zip.ZipFile.access$1400(ZipFile.java:56) > at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679) > at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415) > at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) > at sun.misc.Resource.getBytes(Resource.java:124) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:444) > at java.net.URLClassLoader.access$100(URLClassLoader.java:71) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > ... 
10 more > 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] > |util.ExitUtil|: Exiting with status -1 > 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: > Received should die response from AM > 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: > Asked to die via task heartbeat > 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: > Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an > invocation of shutdownRequested > {code} > The reason is an uncaught Error. At 01:29 a disk error occurred, which caused a > NoClassDefFoundError. ResponseProcessor.run only catches Exception, so it cannot catch > NoClassDefFoundError. As a result, the ResponseProcessor did not set errorState, so > DataStreamer did not know the ResponseProcessor was dead and could not trigger > closeResponder, and it got stuck in DataStreamer.run. > I tested this in the unit test TestDataStream.testDfsClient. When I throw > NoClassDefFoundError in ResponseProcessor.run, > TestDataStream.testDfsClient fails because of a timeout. > I think we should catch Throwable instead of Exception in ResponseProcessor.run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.o
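A minimal sketch of the proposed fix, with illustrative placeholder names rather than the actual DataStreamer internals: catching Throwable keeps an Error from silently killing the ack-processing thread before it can record the failure.
{code:java}
public class ResponseLoopSketch implements Runnable {
  private volatile boolean errorState = false;

  @Override
  public void run() {
    while (!errorState) {
      try {
        processOneAck();
      } catch (Throwable t) {
        // An Error (e.g. NoClassDefFoundError from a failing disk) would
        // escape a catch (Exception) block and silently kill this thread.
        // Catching Throwable records the failure so the writer thread can
        // observe errorState and close the responder instead of hanging.
        errorState = true;
      }
    }
  }

  /** Placeholder for reading and handling one pipeline ack. */
  private void processOneAck() {
  }
}
{code}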
[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066043#comment-17066043 ] YCozy commented on HDFS-15235: -- [~ayushtkn] Thanks for looking at this! I'll try to upload a UT and a fix. > Transient network failure during NameNode failover makes cluster unavailable > > > Key: HDFS-15235 > URL: https://issues.apache.org/jira/browse/HDFS-15235 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: YCozy >Priority: Major > > We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At > some point, NN1 becomes unhealthy and the admin tries to manually failover to > NN2 by running command > {code:java} > $ hdfs haadmin -failover NN1 NN2 > {code} > NN2 receives the request and becomes active: > {code:java} > 2020-03-24 00:24:56,412 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services > started for standby state > 2020-03-24 00:24:56,413 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer > interrupted: sleep interrupted > 2020-03-24 00:24:56,415 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services > required for active state > 2020-03-24 00:24:56,417 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in /app/ha-name-dir-shared/current > 2020-03-24 00:24:56,419 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in /app/nn2/name/current > 2020-03-24 00:24:56,419 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest > edits from old active before taking over writer role in edits logs > 2020-03-24 00:24:56,435 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Reading > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c3095fa > expecting start txid #1 > 2020-03-24 00:24:56,436 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Start loading edits file > /app/ha-name-dir-shared/current/edits_001-019 > maxTxnsToRead = 9223372036854775807 > 2020-03-24 00:24:56,441 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream > '/app/ha-name-dir-shared/current/edits_001-019' > to transaction ID 1 > 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded 1 edits file(s) (the last named > /app/ha-name-dir-shared/current/edits_001-019) > of total size 1305.0, total edits 19.0, total load time 109.0 ms > 2020-03-24 00:24:56,567 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all > datanodes as stale > 2020-03-24 00:24:56,568 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Processing 4 > messages from DataNodes that were previously queued during standby state > 2020-03-24 00:24:56,569 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication > and invalidation queues > 2020-03-24 00:24:56,569 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: initializing > replication queues > 2020-03-24 00:24:56,570 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing > edit logs at txnid 20 > 2020-03-24 00:24:56,571 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 20 > 2020-03-24 00:24:56,812 INFO > org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 > thread(s) > 2020-03-24 00:24:56,819 INFO > 
org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization > completed in 6 millisecondsname space=3storage space=24690storage > types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 > 2020-03-24 00:24:56,827 INFO > org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: > Starting CacheReplicationMonitor with interval 3 milliseconds > {code} > But NN2 fails to send back the RPC response because of temporary network > partitioning. > {code:java} > java.io.EOFException: End of File Exception between local host is: > "24e7b5a52e85/172.17.0.2"; destination host is: "127.0.0.3":8180; : > java.io.EOFException; For more details see: > http://wiki.apache.org/hadoop/EOFException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop
[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066042#comment-17066042 ] YCozy commented on HDFS-15235: -- A bit more info: After NN2 fails to send back a response, haadmin first tries to fail back to NN1 before it fences NN2. The fail-back succeeds, i.e., NN2 goes back to standby and NN1 becomes active again. Here's a log snippet from NN2 (the following log is from a different run, so please ignore the timestamps): {code:java} 2020-03-24 17:17:27,254 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state 2020-03-24 17:17:27,255 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 19, 19 2020-03-24 17:17:27,255 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: LazyPersistFileScrubber was interrupted, exiting 2020-03-24 17:17:27,256 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNodeEditLogRoller was interrupted, exiting 2020-03-24 17:17:27,257 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 5 Number of transactions batched in Syncs: 18 Number of syncs: 3 SyncTimes(ms): 4 6 2020-03-24 17:17:27,259 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /app/ha-name-dir-shared/current/edits_inprogress_019 -> /app/ha-name-dir-shared/current/edits_019-020 2020-03-24 17:17:27,285 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /app/nn2/name/current/edits_inprogress_019 -> /app/nn2/name/current/edits_019-020 2020-03-24 17:17:27,286 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: FSEditLogAsync was interrupted, exiting 2020-03-24 17:17:27,290 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Shutting down CacheReplicationMonitor 2020-03-24 17:17:27,290 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state 2020-03-24 17:17:27,294 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node every 120 seconds. 2020-03-24 17:17:27,299 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread... {code} Since we only want to make sure that "only one NameNode be in the Active state at any given time" ([from the description of the fencing config|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Configuration_details]), and the successful fail-back has already achieved that, we shouldn't kill NN2. > Transient network failure during NameNode failover makes cluster unavailable > > > Key: HDFS-15235 > URL: https://issues.apache.org/jira/browse/HDFS-15235 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: YCozy >Priority: Major > > We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. 
At > some point, NN1 becomes unhealthy and the admin tries to manually failover to > NN2 by running command > {code:java} > $ hdfs haadmin -failover NN1 NN2 > {code} > NN2 receives the request and becomes active: > {code:java} > 2020-03-24 00:24:56,412 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services > started for standby state > 2020-03-24 00:24:56,413 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer > interrupted: sleep interrupted > 2020-03-24 00:24:56,415 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services > required for active state > 2020-03-24 00:24:56,417 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in /app/ha-name-dir-shared/current > 2020-03-24 00:24:56,419 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in /app/nn2/name/current > 2020-03-24 00:24:56,419 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest > edits from old active before taking over writer role in edits logs > 2020-03-24 00:24:56,435 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Reading > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c3095fa > expecting start txid #1 > 2020-03-24 00:24:56,436 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Start loading edits file > /app/ha-name-dir-shared/current/edits_001-019 > maxTxnsToRead = 9223372036854775807 > 2020-03-24 00:24:56,441 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding s
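A sketch of the behavior argued for here, assuming the failover controller can re-query the old active before fencing; the helper below is illustrative, not the actual haadmin/ZKFC code:
{code:java}
import java.io.IOException;
import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;

public final class FencingSketch {
  private FencingSketch() {
  }

  /** Returns true only if the old active must still be fenced. */
  public static boolean needsFencing(HAServiceProtocol oldActive) {
    try {
      HAServiceState state = oldActive.getServiceStatus().getState();
      // If the node already transitioned back to standby (as NN2 did in
      // the log above), the "at most one active" invariant already holds
      // and fencing it is unnecessary. Anything other than a provable
      // standby state is still fenced.
      return state != HAServiceState.STANDBY;
    } catch (IOException e) {
      // Unreachable node: we cannot prove it is standby, so fence it.
      return true;
    }
  }
}
{code}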
[jira] [Commented] (HDFS-13377) The owner of folder can set quota for his sub folder
[ https://issues.apache.org/jira/browse/HDFS-13377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066039#comment-17066039 ] Hudson commented on HDFS-13377: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #18081 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18081/]) HDFS-13377. The owner of folder can set quota for his sub folder. (ayushsaxena: rev ea87d6049340d1df040047aa08ce7784c03dd69e) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirAttrOp.java * (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuotaAllowOwner.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java > The owner of folder can set quota for his sub folder > > > Key: HDFS-13377 > URL: https://issues.apache.org/jira/browse/HDFS-13377 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-13377.003.patch, HDFS-13377.004.patch, > HDFS-13377.005.patch, HDFS-13377.006.patch, HDFS-13377.patch, > HDFS-13377.patch, HDFS-13377.patch > > > Currently, only the superuser can set quota. That is a huge burden for the > administrator in a large system. Add a new feature to let the owner of a > folder also have the privilege to set quota for his sub folders. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
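An illustrative-only sketch of the permission rule this feature adds: with a (hypothetical) switch enabled, the owner of the folder may set quota without being the superuser. The flag and method names below are placeholders, not the committed FSDirAttrOp code:
{code:java}
import org.apache.hadoop.security.AccessControlException;

public final class QuotaPermissionSketch {
  private QuotaPermissionSketch() {
  }

  public static void checkCanSetQuota(boolean isSuperUser, boolean allowOwner,
      String callerUser, String dirOwner) throws AccessControlException {
    if (isSuperUser) {
      return; // the superuser keeps full quota rights
    }
    if (allowOwner && callerUser.equals(dirOwner)) {
      return; // the folder owner may manage quota on his sub folders
    }
    throw new AccessControlException("Access denied: only the superuser"
        + (allowOwner ? " or the folder owner" : "") + " can set quota");
  }
}
{code}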
[jira] [Updated] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15219: Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > DFS Client will stuck when ResponseProcessor.run throw Error > > > Key: HDFS-15219 > URL: https://issues.apache.org/jira/browse/HDFS-15219 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.3 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Fix For: 3.3.0 > > Original Estimate: 672h > Remaining Estimate: 672h > > In my case, a Tez application was stuck for more than 2 hours until we killed this > application. The reason is that a task attempt was stuck, because speculative > execution is disabled. > The exception looked like this: > {code:java} > 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records > read - 10 > 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: > records written - 100 > 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records > read - 100 > 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] > |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for > block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] > threw an Error. Shutting down now... > java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253) > at java.lang.String.valueOf(String.java:2847) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737) > Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 4 more > Caused by: java.util.zip.ZipException: error reading zip file > at java.util.zip.ZipFile.read(Native Method) > at java.util.zip.ZipFile.access$1400(ZipFile.java:56) > at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679) > at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415) > at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) > at sun.misc.Resource.getBytes(Resource.java:124) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:444) > at java.net.URLClassLoader.access$100(URLClassLoader.java:71) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > ... 
10 more > 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] > |util.ExitUtil|: Exiting with status -1 > 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: > Received should die response from AM > 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: > Asked to die via task heartbeat > 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: > Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an > invocation of shutdownRequested > {code} > The reason is an uncaught Error. At 01:29 a disk error occurred, which caused a > NoClassDefFoundError. ResponseProcessor.run only catches Exception, so it cannot catch > NoClassDefFoundError. As a result, the ResponseProcessor did not set errorState, so > DataStreamer did not know the ResponseProcessor was dead and could not trigger > closeResponder, and it got stuck in DataStreamer.run. > I tested this in the unit test TestDataStream.testDfsClient. When I throw > NoClassDefFoundError in ResponseProcessor.run, > TestDataStream.testDfsClient fails because of a timeout. > I think we should catch Throwable instead of Exception in ResponseProcessor.run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15219) DFS Client will stuck when ResponseProcessor.run throw Error
[ https://issues.apache.org/jira/browse/HDFS-15219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066029#comment-17066029 ] Ayush Saxena commented on HDFS-15219: - Merged PR. Thanx [~zhengchenyu] for the contribution. > DFS Client will stuck when ResponseProcessor.run throw Error > > > Key: HDFS-15219 > URL: https://issues.apache.org/jira/browse/HDFS-15219 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.3 >Reporter: zhengchenyu >Assignee: zhengchenyu >Priority: Major > Original Estimate: 672h > Remaining Estimate: 672h > > In my case, a Tez application was stuck for more than 2 hours until we killed this > application. The reason is that a task attempt was stuck, because speculative > execution is disabled. > The exception looked like this: > {code:java} > 2020-03-11 01:23:59,141 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records > read - 10 > 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.FileSinkOperator|: FS[3]: > records written - 100 > 2020-03-11 01:24:50,294 [INFO] [TezChild] |exec.MapOperator|: MAP[4]: records > read - 100 > 2020-03-11 01:29:02,967 [FATAL] [ResponseProcessor for block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] > |yarn.YarnUncaughtExceptionHandler|: Thread Thread[ResponseProcessor for > block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073,5,main] > threw an Error. Shutting down now... > java.lang.NoClassDefFoundError: com/google/protobuf/TextFormat > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.toString(PipelineAck.java:253) > at java.lang.String.valueOf(String.java:2847) > at java.lang.StringBuilder.append(StringBuilder.java:128) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:737) > Caused by: java.lang.ClassNotFoundException: com.google.protobuf.TextFormat > at java.net.URLClassLoader$1.run(URLClassLoader.java:363) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 4 more > Caused by: java.util.zip.ZipException: error reading zip file > at java.util.zip.ZipFile.read(Native Method) > at java.util.zip.ZipFile.access$1400(ZipFile.java:56) > at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679) > at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415) > at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) > at sun.misc.Resource.getBytes(Resource.java:124) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:444) > at java.net.URLClassLoader.access$100(URLClassLoader.java:71) > at java.net.URLClassLoader$1.run(URLClassLoader.java:361) > ... 
10 more > 2020-03-11 01:29:02,970 [INFO] [ResponseProcessor for block > BP-1856561198-172.16.6.67-1421842461517:blk_15177828027_14109212073] > |util.ExitUtil|: Exiting with status -1 > 2020-03-11 03:27:26,833 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: > Received should die response from AM > 2020-03-11 03:27:26,834 [INFO] [TaskHeartbeatThread] |task.TaskReporter|: > Asked to die via task heartbeat > 2020-03-11 03:27:26,839 [INFO] [TaskHeartbeatThread] |task.TezTaskRunner2|: > Attempting to abort attempt_1583335296048_917815_3_01_000704_0 due to an > invocation of shutdownRequested > {code} > The reason is an uncaught Error. At 01:29 a disk error occurred, which caused a > NoClassDefFoundError. ResponseProcessor.run only catches Exception, so it cannot catch > NoClassDefFoundError. As a result, the ResponseProcessor did not set errorState, so > DataStreamer did not know the ResponseProcessor was dead and could not trigger > closeResponder, and it got stuck in DataStreamer.run. > I tested this in the unit test TestDataStream.testDfsClient. When I throw > NoClassDefFoundError in ResponseProcessor.run, > TestDataStream.testDfsClient fails because of a timeout. > I think we should catch Throwable instead of Exception in ResponseProcessor.run. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066019#comment-17066019 ] Ayush Saxena commented on HDFS-15235: - Is there a UT to reproduce this, or do you have a fix to contribute? > Transient network failure during NameNode failover makes cluster unavailable > > > Key: HDFS-15235 > URL: https://issues.apache.org/jira/browse/HDFS-15235 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: YCozy >Priority: Major > > We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At > some point, NN1 becomes unhealthy and the admin tries to manually failover to > NN2 by running command > {code:java} > $ hdfs haadmin -failover NN1 NN2 > {code} > NN2 receives the request and becomes active: > {code:java} > 2020-03-24 00:24:56,412 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services > started for standby state > 2020-03-24 00:24:56,413 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer > interrupted: sleep interrupted > 2020-03-24 00:24:56,415 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services > required for active state > 2020-03-24 00:24:56,417 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in /app/ha-name-dir-shared/current > 2020-03-24 00:24:56,419 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering > unfinalized segments in /app/nn2/name/current > 2020-03-24 00:24:56,419 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest > edits from old active before taking over writer role in edits logs > 2020-03-24 00:24:56,435 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Reading > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c3095fa > expecting start txid #1 > 2020-03-24 00:24:56,436 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Start loading edits file > /app/ha-name-dir-shared/current/edits_001-019 > maxTxnsToRead = 9223372036854775807 > 2020-03-24 00:24:56,441 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream > '/app/ha-name-dir-shared/current/edits_001-019' > to transaction ID 1 > 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded 1 edits file(s) (the last named > /app/ha-name-dir-shared/current/edits_001-019) > of total size 1305.0, total edits 19.0, total load time 109.0 ms > 2020-03-24 00:24:56,567 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all > datanodes as stale > 2020-03-24 00:24:56,568 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Processing 4 > messages from DataNodes that were previously queued during standby state > 2020-03-24 00:24:56,569 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication > and invalidation queues > 2020-03-24 00:24:56,569 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: initializing > replication queues > 2020-03-24 00:24:56,570 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing > edit logs at txnid 20 > 2020-03-24 00:24:56,571 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 20 > 2020-03-24 00:24:56,812 INFO > org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 > thread(s) > 2020-03-24 00:24:56,819 INFO > 
org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization > completed in 6 millisecondsname space=3storage space=24690storage > types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 > 2020-03-24 00:24:56,827 INFO > org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: > Starting CacheReplicationMonitor with interval 3 milliseconds > {code} > But NN2 fails to send back the RPC response because of temporary network > partitioning. > {code:java} > java.io.EOFException: End of File Exception between local host is: > "24e7b5a52e85/172.17.0.2"; destination host is: "127.0.0.3":8180; : > java.io.EOFException; For more details see: > http://wiki.apache.org/hadoop/EOFException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.ha
[jira] [Updated] (HDFS-13377) The owner of folder can set quota for his sub folder
[ https://issues.apache.org/jira/browse/HDFS-13377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-13377: Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > The owner of folder can set quota for his sub folder > > > Key: HDFS-13377 > URL: https://issues.apache.org/jira/browse/HDFS-13377 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-13377.003.patch, HDFS-13377.004.patch, > HDFS-13377.005.patch, HDFS-13377.006.patch, HDFS-13377.patch, > HDFS-13377.patch, HDFS-13377.patch > > > Currently, only the superuser can set quota. That is a huge burden for the > administrator in a large system. Add a new feature to let the owner of a > folder also have the privilege to set quota for his sub folders. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13377) The owner of folder can set quota for his sub folder
[ https://issues.apache.org/jira/browse/HDFS-13377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066012#comment-17066012 ] Ayush Saxena commented on HDFS-13377: - +1 for v006, Committed to trunk. Thanx [~hadoop_yangyun] for the contribution and [~elgoiri] for the review!!! > The owner of folder can set quota for his sub folder > > > Key: HDFS-13377 > URL: https://issues.apache.org/jira/browse/HDFS-13377 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-13377.003.patch, HDFS-13377.004.patch, > HDFS-13377.005.patch, HDFS-13377.006.patch, HDFS-13377.patch, > HDFS-13377.patch, HDFS-13377.patch > > > Currently, only the superuser can set quota. That is a huge burden for the > administrator in a large system. Add a new feature to let the owner of a > folder also have the privilege to set quota for his sub folders. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Status: Patch Available (was: Open) > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch, > HDFS-15191.003.patch, HDFS-15191.004.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a 
copy of the block, so the read > fails. > I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
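For context, a block token identifier can arrive in either the legacy Writable layout or the newer protobuf layout, and decoding with the wrong branch runs the reader off the end of the buffer, as in the EOFException above. A hedged sketch of a fall-back-on-EOF dispatch follows; the actual dispatch in BlockTokenIdentifier.readFields may differ, and the two readers below are placeholders:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public final class TokenReadSketch {
  private TokenReadSketch() {
  }

  public static void readEitherFormat(byte[] identifier) throws IOException {
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(identifier))) {
      readLegacy(in); // try the legacy Writable layout first
    } catch (EOFException e) {
      // The bytes were not the legacy layout; re-parse from the start of
      // the buffer as protobuf instead of failing the whole read.
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(identifier))) {
        readProtobuf(in);
      }
    }
  }

  /** Placeholder for the legacy Writable deserialization. */
  private static void readLegacy(DataInputStream in) throws IOException {
  }

  /** Placeholder for the protobuf deserialization. */
  private static void readProtobuf(DataInputStream in) throws IOException {
  }
}
{code}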
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Attachment: HDFS-15191.004.patch > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch, > HDFS-15191.003.patch, HDFS-15191.004.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a 
copy of the block, so the read > fails. > I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Status: Open (was: Patch Available) > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch, > HDFS-15191.003.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a copy of the block, so 
the read > fails. > I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065944#comment-17065944 ] Hadoop QA commented on HDFS-15191: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 30m 46s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 32s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 8s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 48s{color} | {color:orange} hadoop-hdfs-project: The patch generated 9 new + 20 unchanged - 0 fixed = 29 total (was 20) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 53s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}102m 16s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}214m 51s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.8 Server=19.03.8 Image:yetus/hadoop:4454c6d14b7 | | JIRA Issue | HDFS-15191 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12997550/HDFS-15191.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ecfe56bb171a 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4454c6d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_242 | | findbugs
[jira] [Commented] (HDFS-13243) Get CorruptBlock because of calling close and sync in same time
[ https://issues.apache.org/jira/browse/HDFS-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065533#comment-17065533 ] Hadoop QA commented on HDFS-13243: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-13243 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13243 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918525/HDFS-13243-v6.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/29010/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Get CorruptBlock because of calling close and sync in same time > --- > > Key: HDFS-13243 > URL: https://issues.apache.org/jira/browse/HDFS-13243 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2, 3.2.0 >Reporter: Zephyr Guo >Assignee: Zephyr Guo >Priority: Critical > Attachments: HDFS-13243-v1.patch, HDFS-13243-v2.patch, > HDFS-13243-v3.patch, HDFS-13243-v4.patch, HDFS-13243-v5.patch, > HDFS-13243-v6.patch > > > An HDFS file might get broken because of corrupt block(s) that can be produced > by calling close and sync at the same time. > When the close call is not successful, the UC block's status changes to > COMMITTED, and if a sync request gets popped from the queue and processed, the sync > operation changes the last block length. > After that, the DataNode reports all received blocks to the NameNode, which > checks the block length of all COMMITTED blocks. But the block length recorded in > NameNode memory already differs from the length reported by the DataNode, and > consequently, the last block is marked as corrupted because of the inconsistent > length. 
> > {panel:title=Log in my hdfs} > 2018-03-05 04:05:39,261 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_1085498930_11758129\{UCState=UNDER_CONSTRUCTION, > truncateBlock=null, primaryNodeIndex=-1, > replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW], > > ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], > > ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]} > for > /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515 > 2018-03-05 04:05:39,760 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > fsync: > /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515 > for DFSClient_NONMAPREDUCE_1077513762_1 > 2018-03-05 04:05:39,761 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, > primaryNodeIndex=-1, > replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW], > > ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], > > ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]} > is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in > file > /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515 > 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.0.0.220:50010 is added to > blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, > primaryNodeIndex=-1, > replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW], > > ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], > > ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]} > size 2054413 > 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1085498930 added as corrupt on > 10.0.0.219:50010 by > hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com/10.0.0.219 because block is > COMMITTED and reported length 2054413 does not match length in block map > 141232 > 2018-03-05 04:05:39,762 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_108
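The race described above is between sync updating the length of the last under-construction block and close committing it. As a conceptual illustration only (the actual fix for this issue belongs on the NameNode side, not in client code like this), serializing the two calls on one lock closes the window:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

public final class CloseSyncSketch {
  private final Object lock = new Object();
  private final FSDataOutputStream out;

  public CloseSyncSketch(FSDataOutputStream out) {
    this.out = out;
  }

  /** Flush client buffers to the DataNodes, updating the block length. */
  public void safeSync() throws IOException {
    synchronized (lock) {
      out.hsync();
    }
  }

  /** Commit the last block; no sync may change its length concurrently. */
  public void safeClose() throws IOException {
    synchronized (lock) {
      out.close();
    }
  }
}
{code}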
[jira] [Updated] (HDFS-11310) Reduce the performance impact of the balancer (trunk port)
[ https://issues.apache.org/jira/browse/HDFS-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-11310: - Target Version/s: 3.4.0 (was: 3.3.0) > Reduce the performance impact of the balancer (trunk port) > -- > > Key: HDFS-11310 > URL: https://issues.apache.org/jira/browse/HDFS-11310 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Daryn Sharp >Priority: Critical > > HDFS-7967 introduced a highly performant balancer getBlocks() query that > scales to large/dense clusters. The simple design implementation depends on > the triplets data structure. HDFS-9260 removed the triplets which > fundamentally changes the implementation. Either that patch must be reverted > or the getBlocks() patch needs reimplementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11310) Reduce the performance impact of the balancer (trunk port)
[ https://issues.apache.org/jira/browse/HDFS-11310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-11310: - As the code freeze for 3.3 has passed, moving this Jira to 3.4. Thank you. > Reduce the performance impact of the balancer (trunk port) > -- > > Key: HDFS-11310 > URL: https://issues.apache.org/jira/browse/HDFS-11310 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Affects Versions: 3.0.0-alpha1 >Reporter: Daryn Sharp >Priority: Critical > > HDFS-7967 introduced a highly performant balancer getBlocks() query that > scales to large/dense clusters. The simple design implementation depends on > the triplets data structure. HDFS-9260 removed the triplets, which > fundamentally changes the implementation. Either that patch must be reverted > or the getBlocks() patch needs reimplementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15234) Add a default method body for the INodeAttributeProvider#checkPermissionWithContext API
[ https://issues.apache.org/jira/browse/HDFS-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-15234: - Status: Patch Available (was: Open) > Add a default method body for the > INodeAttributeProvider#checkPermissionWithContext API > --- > > Key: HDFS-15234 > URL: https://issues.apache.org/jira/browse/HDFS-15234 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Blocker > > The new API INodeAttributeProvider#checkPermissionWithContext() needs a > default method body. Otherwise old implementations fail to compile. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
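[Editor's note] The compatibility problem HDFS-15234 describes follows directly from Java interface semantics: a new abstract method breaks every existing implementor, while a method with a default body does not. Below is a minimal, self-contained sketch of that pattern; the interface shape and signatures are simplified stand-ins for illustration, not the actual INodeAttributeProvider API.

{code:java}
// Simplified stand-in for the source-compatibility fix this issue proposes;
// the real INodeAttributeProvider signatures differ.
interface PermissionEnforcer {

  // Old method that every existing implementation already overrides.
  void checkPermission(String path, String user) throws SecurityException;

  // Newly added method. Giving it a default body means classes written
  // against the old interface still compile and run: they inherit this
  // fallback, which delegates to the legacy check.
  default void checkPermissionWithContext(AuthorizationContext ctx)
      throws SecurityException {
    checkPermission(ctx.path, ctx.user);
  }

  // Hypothetical context holder, only for this sketch.
  final class AuthorizationContext {
    final String path;
    final String user;
    AuthorizationContext(String path, String user) {
      this.path = path;
      this.user = user;
    }
  }
}
{code}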
[jira] [Created] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
xuzq created HDFS-15238: --- Summary: RBF:NamenodeHeartbeatService caused memory to grow rapidly Key: HDFS-15238 URL: https://issues.apache.org/jira/browse/HDFS-15238 Project: Hadoop HDFS Issue Type: Improvement Reporter: xuzq Assignee: xuzq NamenodeHeartbeatService gets the NameNode's HA status every 5s and creates a new HAServiceProtocol each time. Creating the HAServiceProtocol also creates a new Configuration. Over time, more and more REGISTER entries accumulate in Configuration until a full GC happens. The entries then pile up again, and after reaching a certain threshold the full GC is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
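[Editor's note] To make the leak pattern concrete: a hedged sketch (not the actual NamenodeHeartbeatService code) of a poller that builds its Configuration and HA proxy once and reuses them across polls, instead of re-creating both on every 5-second check. The class name and structure are assumptions for illustration; the Hadoop calls used (HAServiceProtocolClientSideTranslatorPB, getServiceStatus) are real APIs.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceStatus;
import org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB;

public class HaStatusPoller {
  // Built once. Calling "new Configuration()" inside every poll is what the
  // report describes: each instance registers itself and the entries
  // accumulate until a full GC runs.
  private final Configuration conf = new Configuration();
  private HAServiceProtocol proxy;

  /** Called every 5 seconds; reuses one proxy instead of rebuilding it. */
  public HAServiceStatus poll(InetSocketAddress nnAddr) throws IOException {
    if (proxy == null) {
      proxy = new HAServiceProtocolClientSideTranslatorPB(nnAddr, conf);
    }
    return proxy.getServiceStatus();
  }
}
{code}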
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-8893: Target Version/s: 3.4.0 (was: 3.3.0) Moved to 3.4.0. > DNs with failed volumes stop serving during rolling upgrade > --- > > Key: HDFS-8893 > URL: https://issues.apache.org/jira/browse/HDFS-8893 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh Shah >Priority: Critical > > When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker > to each of their volumes. If one of the volumes is bad, this will fail. When > this failure happens, the DN does not update the key it received from the NN. > Unfortunately we had one failed volume on all the 3 datanodes which were > having replica. > Keys expire after 20 hours so at about 20 hours into the rolling upgrade, the > DNs with failed volumes will stop serving clients. > Here is the stack trace on the datanode size: > {noformat} > 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN > datanode.DataNode: IOException in offerService > java.io.IOException: Read-only file system > at java.io.UnixFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:947) > at > org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721) > at > org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357) > at > org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833) > at java.lang.Thread.run(Thread.java:722) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
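[Editor's note] The stack trace above shows the marker write failing on a single read-only volume and the IOException escaping all the way out of offerService. A hedged sketch, assuming a simple per-volume loop (the real BlockPoolSliceStorage code differs, and the marker file name below is invented), of how catching the error per volume would let the healthy volumes, and the DN's key refresh, keep working:

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.List;

public class RollingUpgradeMarkers {
  /**
   * Writes a marker file to each volume, tolerating individual failures.
   */
  public static void setMarkers(List<File> volumeDirs) {
    for (File dir : volumeDirs) {
      try {
        File marker = new File(dir, "rollingUpgradeInProgress");
        if (!marker.createNewFile()) {
          System.err.println("Marker already present on " + dir);
        }
      } catch (IOException e) {
        // A read-only or failed volume throws here. Logging and moving on
        // keeps the heartbeat path alive, so the DN can still refresh the
        // block keys it received from the NN instead of stopping service
        // when the old keys expire.
        System.err.println("Skipping failed volume " + dir + ": " + e);
      }
    }
  }
}
{code}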
[jira] [Updated] (HDFS-13243) Get CorruptBlock because of calling close and sync in same time
[ https://issues.apache.org/jira/browse/HDFS-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-13243: - Target Version/s: 3.4.0 (was: 3.3.0) As the code freeze for 3.3 has passed, moving this Jira to 3.4. Please feel free to revert if anyone has concerns. Thank you. > Get CorruptBlock because of calling close and sync in same time > --- > > Key: HDFS-13243 > URL: https://issues.apache.org/jira/browse/HDFS-13243 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2, 3.2.0 >Reporter: Zephyr Guo >Assignee: Zephyr Guo >Priority: Critical > Attachments: HDFS-13243-v1.patch, HDFS-13243-v2.patch, > HDFS-13243-v3.patch, HDFS-13243-v4.patch, HDFS-13243-v5.patch, > HDFS-13243-v6.patch > > > An HDFS file might get broken because of corrupt block(s) produced by calling > close and sync at the same time. > When the close call is not successful, the UCBlock status changes to > COMMITTED, and if a sync request is then popped from the queue and processed, > the sync operation changes the last block length. > After that, the DataNode reports all received blocks to the NameNode, which > checks the block length of all COMMITTED blocks. But the length recorded in > NameNode memory already differs from the one reported by the DataNode, so the > last block is marked as corrupt because of the inconsistent length. > > {panel:title=Log in my hdfs} > 2018-03-05 04:05:39,261 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > allocate blk_1085498930_11758129\{UCState=UNDER_CONSTRUCTION, > truncateBlock=null, primaryNodeIndex=-1, > replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW], > > ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], > > ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]} > for > /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515 > 2018-03-05 04:05:39,760 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* > fsync: > /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515 > for DFSClient_NONMAPREDUCE_1077513762_1 > 2018-03-05 04:05:39,761 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, > primaryNodeIndex=-1, > replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW], > > ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], > > ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]} > is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in > file > /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515 > 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK* addStoredBlock: > blockMap updated: 10.0.0.220:50010 is added to > blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, > primaryNodeIndex=-1, > replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW], > > ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], > >
ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]} > size 2054413 > 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1085498930 added as corrupt on > 10.0.0.219:50010 by > hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com/10.0.0.219 because block is > COMMITTED and reported length 2054413 does not match length in block map > 141232 > 2018-03-05 04:05:39,762 INFO BlockStateChange: BLOCK > NameSystem.addToCorruptReplicasMap: blk_1085498930 added as corrupt on > 10.0.0.218:50010 by > hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is > COMMITTED and reported length 2054413 does not match length in block map > 141232 > 2018-03-05 04:05:40,162 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* > blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null, > primaryNodeIndex=-1, > replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW], > > ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW], > > ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]} > is not COMPLET
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Status: Open (was: Patch Available) > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch, > HDFS-15191.003.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a copy of the block, so 
the read > fails. > I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
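[Editor's note] The stack trace above passes through BlockTokenIdentifier.readFieldsLegacy, which indicates the token bytes are being parsed with the legacy Writable layout. As a rough, hedged sketch of the dual-format reading involved (simplified, not the actual BlockTokenIdentifier source; the method names mirror the stack trace but the dispatch logic is an assumption), the usual pattern is to mark the stream, attempt one layout, then reset and retry with the other on failure:

{code:java}
import java.io.DataInputStream;
import java.io.IOException;

// Simplified sketch of dual-format token parsing.
abstract class DualFormatIdentifier {

  abstract void readFieldsLegacy(DataInputStream in) throws IOException;

  abstract void readFieldsProtobuf(DataInputStream in) throws IOException;

  void readFields(DataInputStream in) throws IOException {
    // Remember the current position so a failed attempt can be undone.
    in.mark(in.available());
    try {
      readFieldsLegacy(in);          // vint/Writable layout; the EOFException
    } catch (IOException e) {        // above is thrown from this path
      in.reset();
      readFieldsProtobuf(in);        // reparse the same bytes as protobuf
    }
  }
}
{code}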
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Status: Patch Available (was: Open) > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch, > HDFS-15191.003.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a copy of the block, so 
the read > fails. > I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Attachment: HDFS-15191.003.patch > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch, > HDFS-15191.003.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a copy of the block, so 
the read > fails. > I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: - Description: We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At some point, NN1 becomes unhealthy and the admin tries to manually failover to NN2 by running command {code:java} $ hdfs haadmin -failover NN1 NN2 {code} NN2 receives the request and becomes active: {code:java} 2020-03-24 00:24:56,412 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2020-03-24 00:24:56,413 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer interrupted: sleep interrupted 2020-03-24 00:24:56,415 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state 2020-03-24 00:24:56,417 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/ha-name-dir-shared/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/nn2/name/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs 2020-03-24 00:24:56,435 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c3095fa expecting start txid #1 2020-03-24 00:24:56,436 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file /app/ha-name-dir-shared/current/edits_001-019 maxTxnsToRead = 9223372036854775807 2020-03-24 00:24:56,441 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream '/app/ha-name-dir-shared/current/edits_001-019' to transaction ID 1 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s) (the last named /app/ha-name-dir-shared/current/edits_001-019) of total size 1305.0, total edits 19.0, total load time 109.0 ms 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all datanodes as stale 2020-03-24 00:24:56,568 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Processing 4 messages from DataNodes that were previously queued during standby state 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication and invalidation queues 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: initializing replication queues 2020-03-24 00:24:56,570 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 20 2020-03-24 00:24:56,571 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 20 2020-03-24 00:24:56,812 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 thread(s) 2020-03-24 00:24:56,819 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization completed in 6 millisecondsname space=3storage space=24690storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 2020-03-24 00:24:56,827 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 3 milliseconds {code} But NN2 fails to send back the RPC response because of temporary network partitioning. 
{code:java} java.io.EOFException: End of File Exception between local host is: "24e7b5a52e85/172.17.0.2"; destination host is: "127.0.0.3":8180; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:791) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1597) at org.apache.hadoop.ipc.Client.call(Client.java:1539) at org.apache.hadoop.ipc.Client.call(Client.java:1436) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source) at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClien
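[Editor's note] In other words, the transition itself succeeded but the acknowledgement was lost. A hedged sketch, not the actual FailoverController source, of re-checking the target's real HA state before concluding the transition failed, which distinguishes a lost RPC response from a lost transition; the retry count and backoff are illustrative choices:

{code:java}
import java.io.IOException;

import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;

public final class FailoverVerifier {
  /** Polls the target NN a few times to see whether it really became active. */
  public static boolean becameActive(HAServiceProtocol target, int attempts) {
    for (int i = 0; i < attempts; i++) {
      try {
        return target.getServiceStatus().getState() == HAServiceState.ACTIVE;
      } catch (IOException e) {
        // Transient partition: back off briefly and ask again rather than
        // declaring the failover failed on the first lost response.
        try {
          Thread.sleep(1000L);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }
}
{code}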
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Status: Patch Available (was: Open) > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a copy of the block, so the read > fails. 
> I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15191) EOF when reading legacy buffer in BlockTokenIdentifier
[ https://issues.apache.org/jira/browse/HDFS-15191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rand updated HDFS-15191: --- Status: Open (was: Patch Available) > EOF when reading legacy buffer in BlockTokenIdentifier > -- > > Key: HDFS-15191 > URL: https://issues.apache.org/jira/browse/HDFS-15191 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Steven Rand >Assignee: Steven Rand >Priority: Major > Attachments: HDFS-15191-001.patch, HDFS-15191-002.patch > > > We have an HDFS client application which recently upgraded from 3.2.0 to > 3.2.1. After this upgrade (but not before), we sometimes see these errors > when this application is used with clusters still running Hadoop 2.x (more > specifically CDH 5.12.1): > {code} > WARN [2020-02-24T00:54:32.856Z] > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: I/O error constructing > remote block reader. (_sampled: true) > java.io.EOFException: > at java.io.DataInputStream.readByte(DataInputStream.java:272) > at > org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308) > at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:329) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFieldsLegacy(BlockTokenIdentifier.java:240) > at > org.apache.hadoop.hdfs.security.token.block.BlockTokenIdentifier.readFields(BlockTokenIdentifier.java:221) > at > org.apache.hadoop.security.token.Token.decodeIdentifier(Token.java:200) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:530) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:342) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:276) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:245) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:227) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:170) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:730) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2942) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380) > at > org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644) > at > org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575) > at > org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2314) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2270) > at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2291) > at org.apache.commons.io.IOUtils.copy(IOUtils.java:2246) > at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:765) > {code} > We get this warning for all DataNodes with a copy of the block, so the read > fails. 
> I haven't been able to figure out what changed between 3.2.0 and 3.2.1 to > cause this, but HDFS-13617 and HDFS-14611 seem related, so tagging > [~vagarychen] in case you have any ideas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
[ https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065536#comment-17065536 ] Aiphago commented on HDFS-15180: ping [~sodonnell], [~linyiqun], [~weichiu] Any advice? Thanks. > DataNode FsDatasetImpl Fine-Grained Locking via BlockPool. > --- > > Key: HDFS-15180 > URL: https://issues.apache.org/jira/browse/HDFS-15180 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.2.0 >Reporter: zhuqi >Assignee: Aiphago >Priority: Major > Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, > HDFS-15180.003.patch, HDFS-15180.004.patch, > image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, > image-2020-03-10-17-34-26-368.png > > > The FsDatasetImpl datasetLock is currently heavy when there are many > namespaces in a big cluster. It would help to split the FsDatasetImpl > datasetLock by block pool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
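[Editor's note] A minimal sketch of the block-pool-granular locking this issue proposes, assuming one ReentrantReadWriteLock per block pool ID; the actual patches against FsDatasetImpl are necessarily more involved, since some dataset operations span every pool at once:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

public class BlockPoolLockManager {
  private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
      new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String bpid) {
    return locks.computeIfAbsent(bpid, id -> new ReentrantReadWriteLock());
  }

  /**
   * Runs an operation under the write lock of one block pool only, so heavy
   * activity in one namespace no longer serializes all the others.
   */
  public <T> T withWriteLock(String bpid, Supplier<T> op) {
    ReentrantReadWriteLock lock = lockFor(bpid);
    lock.writeLock().lock();
    try {
      return op.get(); // must touch only state owned by this block pool
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}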
[jira] [Updated] (HDFS-15238) RBF:NamenodeHeartbeatService caused memory to grow rapidly
[ https://issues.apache.org/jira/browse/HDFS-15238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuzq updated HDFS-15238: Attachment: HDFS-15238-trunk-001.patch Status: Patch Available (was: Open) > RBF:NamenodeHeartbeatService caused memory to grow rapidly > -- > > Key: HDFS-15238 > URL: https://issues.apache.org/jira/browse/HDFS-15238 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: xuzq >Assignee: xuzq >Priority: Major > Attachments: HDFS-15238-trunk-001.patch > > > NamenodeHeartbeatService gets the NameNode's HA status every 5s and creates a > new HAServiceProtocol each time. > Creating the HAServiceProtocol also creates a new Configuration. > Over time, more and more REGISTER entries accumulate in Configuration until a > full GC happens. > The entries then pile up again, and after reaching a certain threshold the > full GC is triggered again. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15237) Get checksum of EC file failed, when some block is missing or corrupt
[ https://issues.apache.org/jira/browse/HDFS-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065509#comment-17065509 ] zhengchenyu commented on HDFS-15237: [~ayushtkn] Hadoop-3.2.1 > Get checksum of EC file failed, when some block is missing or corrupt > - > > Key: HDFS-15237 > URL: https://issues.apache.org/jira/browse/HDFS-15237 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec, hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Priority: Major > Fix For: 3.2.2 > > > When we distcp from an ec directory to another one, I found some error like > this. > {code} > 2020-03-20 20:18:21,366 WARN [main] > org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]2020-03-20 > 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: > src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]java.io.EOFException: > Unexpected EOF while trying to read response from server at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:550) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.tryDatanode(FileChecksumHelper.java:709) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlockGroup(FileChecksumHelper.java:664) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:638) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1790) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1810) > at > org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1691) > at > org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1688) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1700) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:138) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:115) > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:220) at > org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48) at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > {code} > And Then I found some error in datanode like this > {code} > 2020-03-20 20:54:16,573 INFO > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: > SASL encryption trust check: 
localHostTrusted = false, remoteHostTrusted = > false > 2020-03-20 20:54:16,577 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > bd-hadoop-128050.zeus.lianjia.com:9866:DataXceiver error processing > BLOCK_GROUP_CHECKSUM operation src: /10.201.1.38:33264 dst: > /10.200.128.50:9866 > java.lang.UnsupportedOperationException > at java.nio.ByteBuffer.array(ByteBuffer.java:994) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockChecksumReconstructor.reconstruct(StripedBlockChecksumReconstructor.java:90) > at > org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.recalculateChecksum(BlockChecksumHelper.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:489) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1047) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java:327) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:119) > at > org.apache.hadoop.hdfs.server.da
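[Editor's note] The datanode-side trace above bottoms out in java.nio.ByteBuffer.array(), which throws UnsupportedOperationException when the buffer is direct (off-heap) and has no accessible backing array. A small, self-contained illustration of the standard guard for that call; whether this is where the real fix belongs in StripedBlockChecksumReconstructor is an assumption:

{code:java}
import java.nio.ByteBuffer;

public final class BufferBytes {
  /**
   * Extracts the remaining bytes from any ByteBuffer without calling array()
   * on a buffer that has no accessible backing array.
   */
  public static byte[] toBytes(ByteBuffer buf) {
    byte[] out = new byte[buf.remaining()];
    if (buf.hasArray()) {
      // Heap buffer: copy straight from the backing array.
      System.arraycopy(buf.array(), buf.arrayOffset() + buf.position(),
          out, 0, out.length);
    } else {
      // Direct buffer: array() would throw UnsupportedOperationException,
      // so copy through the buffer API on a duplicate, leaving the
      // original position untouched.
      buf.duplicate().get(out);
    }
    return out;
  }
}
{code}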
[jira] [Commented] (HDFS-15237) Get checksum of EC file failed, when some block is missing or corrupt
[ https://issues.apache.org/jira/browse/HDFS-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065508#comment-17065508 ] Ayush Saxena commented on HDFS-15237: - What is the Hadoop version? > Get checksum of EC file failed, when some block is missing or corrupt > - > > Key: HDFS-15237 > URL: https://issues.apache.org/jira/browse/HDFS-15237 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec, hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Priority: Major > Fix For: 3.2.2 > > > When we distcp from an ec directory to another one, I found some error like > this. > {code} > 2020-03-20 20:18:21,366 WARN [main] > org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]2020-03-20 > 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: > src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]java.io.EOFException: > Unexpected EOF while trying to read response from server at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:550) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.tryDatanode(FileChecksumHelper.java:709) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlockGroup(FileChecksumHelper.java:664) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:638) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1790) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1810) > at > org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1691) > at > org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1688) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1700) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:138) > at > org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:115) > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:220) at > org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48) at > org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at > org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > {code} > And Then I found some error in datanode like this > {code} > 2020-03-20 20:54:16,573 INFO > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: > SASL encryption trust check: 
localHostTrusted = false, remoteHostTrusted = > false > 2020-03-20 20:54:16,577 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > bd-hadoop-128050.zeus.lianjia.com:9866:DataXceiver error processing > BLOCK_GROUP_CHECKSUM operation src: /10.201.1.38:33264 dst: > /10.200.128.50:9866 > java.lang.UnsupportedOperationException > at java.nio.ByteBuffer.array(ByteBuffer.java:994) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockChecksumReconstructor.reconstruct(StripedBlockChecksumReconstructor.java:90) > at > org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.recalculateChecksum(BlockChecksumHelper.java:711) > at > org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:489) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1047) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java:327) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:119) > at > org.apache.hadoop.hdfs.serv
[jira] [Commented] (HDFS-15201) SnapshotCounter hits MaxSnapshotID limit
[ https://issues.apache.org/jira/browse/HDFS-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065484#comment-17065484 ] Hudson commented on HDFS-15201: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #18078 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18078/]) HDFS-15201 SnapshotCounter hits MaxSnapshotID limit (#1870) (github: rev 5250cd6db3a6b7abeb5c9a0a4059f1d95986c07b) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotManager.java > SnapshotCounter hits MaxSnapshotID limit > > > Key: HDFS-15201 > URL: https://issues.apache.org/jira/browse/HDFS-15201 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > > Users reported that they are unable to take HDFS snapshots because their > snapshotCounter hits the MaxSnapshotID limit, which is 16777215. > {code:java} > SnapshotManager.java > private static final int SNAPSHOT_ID_BIT_WIDTH = 24; > /** > * Returns the maximum allowable snapshot ID based on the bit width of the > * snapshot ID. > * > * @return maximum allowable snapshot ID. > */ > public int getMaxSnapshotID() { > return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1); > } > {code} > > I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase > SNAPSHOT_ID_BIT_WIDTH to 31, to align with our CURRENT_STATE_ID limit > (Integer.MAX_VALUE - 1). > > {code:java} > /** > * This id is used to indicate the current state (vs. snapshots) > */ > public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15201) SnapshotCounter hits MaxSnapshotID limit
[ https://issues.apache.org/jira/browse/HDFS-15201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HDFS-15201. Resolution: Fixed > SnapshotCounter hits MaxSnapshotID limit > > > Key: HDFS-15201 > URL: https://issues.apache.org/jira/browse/HDFS-15201 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Karthik Palanisamy >Assignee: Karthik Palanisamy >Priority: Major > > Users reported that they are unable to take HDFS snapshots because their > snapshotCounter hits the MaxSnapshotID limit, which is 16777215. > {code:java} > SnapshotManager.java > private static final int SNAPSHOT_ID_BIT_WIDTH = 24; > /** > * Returns the maximum allowable snapshot ID based on the bit width of the > * snapshot ID. > * > * @return maximum allowable snapshot ID. > */ > public int getMaxSnapshotID() { > return ((1 << SNAPSHOT_ID_BIT_WIDTH) - 1); > } > {code} > > I think SNAPSHOT_ID_BIT_WIDTH is too low. It may be a good idea to increase > SNAPSHOT_ID_BIT_WIDTH to 31, to align with our CURRENT_STATE_ID limit > (Integer.MAX_VALUE - 1). > > {code:java} > /** > * This id is used to indicate the current state (vs. snapshots) > */ > public static final int CURRENT_STATE_ID = Integer.MAX_VALUE - 1; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
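[Editor's note] The arithmetic in the description can be checked standalone. Note that in Java's 32-bit int arithmetic, (1 << 31) - 1 wraps around to Integer.MAX_VALUE, which sits just above CURRENT_STATE_ID:

{code:java}
public class SnapshotIdWidthCheck {
  // Mirrors getMaxSnapshotID() from the description, parameterized on width.
  static int maxSnapshotId(int bitWidth) {
    return (1 << bitWidth) - 1;
  }

  public static void main(String[] args) {
    System.out.println(maxSnapshotId(24));     // 16777215, the limit users hit
    System.out.println(maxSnapshotId(31));     // 2147483647 = Integer.MAX_VALUE
    System.out.println(Integer.MAX_VALUE - 1); // 2147483646 = CURRENT_STATE_ID
  }
}
{code}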
[jira] [Comment Edited] (HDFS-15237) Get checksum of EC file failed, when some block is missing or corrupt
[ https://issues.apache.org/jira/browse/HDFS-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065461#comment-17065461 ] zhengchenyu edited comment on HDFS-15237 at 3/24/20, 9:43 AM: -- Here I note some strange phenomena. Why did this error happen? Because I found a wrong internal block distribution like this: {code:java} 0. BP-1936287042-10.200.128.33-1573194961291:blk_-9223372036797335472_3783205 len=247453749 Live_repl=8 [blk_-9223372036797335472:DatanodeInfoWithStorage[10.200.128.43:9866,DS-2ddde0b8-6a84-4d06-8a40-d4ae5691e81c,DISK], blk_-9223372036797335471:DatanodeInfoWithStorage[10.200.128.41:9866,DS-a4fc5486-6c45-481e-84e7-9393eeaf1313,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.50:9866,DS-fc0632c6-8916-42d8-8219-57b022bb2786,DISK], blk_-9223372036797335469:DatanodeInfoWithStorage[10.200.128.54:9866,DS-1b6cb52a-f55a-4ef8-beaf-a5d7b7fe93aa,DISK], blk_-9223372036797335467:DatanodeInfoWithStorage[10.200.128.52:9866,DS-fc6e00dd-ca5a-4580-9403-aeb6906da81a,DISK], blk_-9223372036797335466:DatanodeInfoWithStorage[10.200.128.53:9866,DS-2c926a3b-64c0-441b-abe2-188e79918abe,DISK], blk_-9223372036797335465:DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK], blk_-9223372036797335464:DatanodeInfoWithStorage[10.200.128.44:9866,DS-3725af76-fe86-4f97-9740-d77bfa339b3f,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.45:9866,DS-250fd4cf-705f-4cb5-bc3a-c7a105247e35,DISK]] {code} This is the result of hdfs fsck. You can see this block group has 9 internal blocks, but no blk_-9223372036797335468 and two repeated blk_-9223372036797335470. Though the distribution is wrong, the block group has 9 internal blocks, so the ReplicationMonitor can't repair this error. I think this may be another issue. This block is too old and the log is missing, so I don't know the reason and can't reproduce this error now. was (Author: zhengchenyu): Here comment some strange phenomenon. Why this error happen? Because I found some wrong internal block distribution like this: {code:java} 0. BP-1936287042-10.200.128.33-1573194961291:blk_-9223372036797335472_3783205 len=247453749 Live_repl=8 [blk_-9223372036797335472:DatanodeInfoWithStorage[10.200.128.43:9866,DS-2ddde0b8-6a84-4d06-8a40-d4ae5691e81c,DISK], blk_-9223372036797335471:DatanodeInfoWithStorage[10.200.128.41:9866,DS-a4fc5486-6c45-481e-84e7-9393eeaf1313,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.50:9866,DS-fc0632c6-8916-42d8-8219-57b022bb2786,DISK], blk_-9223372036797335469:DatanodeInfoWithStorage[10.200.128.54:9866,DS-1b6cb52a-f55a-4ef8-beaf-a5d7b7fe93aa,DISK], blk_-9223372036797335467:DatanodeInfoWithStorage[10.200.128.52:9866,DS-fc6e00dd-ca5a-4580-9403-aeb6906da81a,DISK], blk_-9223372036797335466:DatanodeInfoWithStorage[10.200.128.53:9866,DS-2c926a3b-64c0-441b-abe2-188e79918abe,DISK], blk_-9223372036797335465:DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK], blk_-9223372036797335464:DatanodeInfoWithStorage[10.200.128.44:9866,DS-3725af76-fe86-4f97-9740-d77bfa339b3f,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.45:9866,DS-250fd4cf-705f-4cb5-bc3a-c7a105247e35,DISK]] {code} this is the result of hdfs fsck. Your can see this block group has 9 internal block, but no blk_-9223372036797335468, two repeated blk_-9223372036797335470. this block is too old so that the log is missing, so I don't know the reason, and can't reproduction this error now. 
> Get checksum of EC file failed, when some block is missing or corrupt > - > > Key: HDFS-15237 > URL: https://issues.apache.org/jira/browse/HDFS-15237 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec, hdfs >Affects Versions: 3.2.1 >Reporter: zhengchenyu >Priority: Major > Fix For: 3.2.2 > > > When we distcp from an ec directory to another one, I found some error like > this. > {code} > 2020-03-20 20:18:21,366 WARN [main] > org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]2020-03-20 > 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: > src=/EC/6-3//000325_0, > datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]java.io.EOFException: > Unexpected EOF while trying to read response from server at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:550) > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.tryDatanode(FileChecksumHelper.java:709)
[jira] [Comment Edited] (HDFS-15237) Get checksum of EC file failed, when some block is missing or corrupt
[ https://issues.apache.org/jira/browse/HDFS-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065461#comment-17065461 ] zhengchenyu edited comment on HDFS-15237 at 3/24/20, 9:42 AM: -- Here comment some strange phenomenon. Why this error happen? Because I found some wrong internal block distribution like this: {code:java} 0. BP-1936287042-10.200.128.33-1573194961291:blk_-9223372036797335472_3783205 len=247453749 Live_repl=8 [blk_-9223372036797335472:DatanodeInfoWithStorage[10.200.128.43:9866,DS-2ddde0b8-6a84-4d06-8a40-d4ae5691e81c,DISK], blk_-9223372036797335471:DatanodeInfoWithStorage[10.200.128.41:9866,DS-a4fc5486-6c45-481e-84e7-9393eeaf1313,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.50:9866,DS-fc0632c6-8916-42d8-8219-57b022bb2786,DISK], blk_-9223372036797335469:DatanodeInfoWithStorage[10.200.128.54:9866,DS-1b6cb52a-f55a-4ef8-beaf-a5d7b7fe93aa,DISK], blk_-9223372036797335467:DatanodeInfoWithStorage[10.200.128.52:9866,DS-fc6e00dd-ca5a-4580-9403-aeb6906da81a,DISK], blk_-9223372036797335466:DatanodeInfoWithStorage[10.200.128.53:9866,DS-2c926a3b-64c0-441b-abe2-188e79918abe,DISK], blk_-9223372036797335465:DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK], blk_-9223372036797335464:DatanodeInfoWithStorage[10.200.128.44:9866,DS-3725af76-fe86-4f97-9740-d77bfa339b3f,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.45:9866,DS-250fd4cf-705f-4cb5-bc3a-c7a105247e35,DISK]] {code} this is the result of hdfs fsck. Your can see this block group has 9 internal block, but no blk_-9223372036797335468, two repeated blk_-9223372036797335470. this block is too old so that the log is missing, so I don't know the reason, and can't reproduction this error now. was (Author: zhengchenyu): Here comment some strange phenomenon. Why this error happen? Because I found some wrong internal block distribution like this: {code} 0. BP-1936287042-10.200.128.33-1573194961291:blk_-9223372036797335472_3783205 len=247453749 Live_repl=8 [blk_-9223372036797335472:DatanodeInfoWithStorage[10.200.128.43:9866,DS-2ddde0b8-6a84-4d06-8a40-d4ae5691e81c,DISK], blk_-9223372036797335471:DatanodeInfoWithStorage[10.200.128.41:9866,DS-a4fc5486-6c45-481e-84e7-9393eeaf1313,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.50:9866,DS-fc0632c6-8916-42d8-8219-57b022bb2786,DISK], blk_-9223372036797335469:DatanodeInfoWithStorage[10.200.128.54:9866,DS-1b6cb52a-f55a-4ef8-beaf-a5d7b7fe93aa,DISK], blk_-9223372036797335467:DatanodeInfoWithStorage[10.200.128.52:9866,DS-fc6e00dd-ca5a-4580-9403-aeb6906da81a,DISK], blk_-9223372036797335466:DatanodeInfoWithStorage[10.200.128.53:9866,DS-2c926a3b-64c0-441b-abe2-188e79918abe,DISK], blk_-9223372036797335465:DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK], blk_-9223372036797335464:DatanodeInfoWithStorage[10.200.128.44:9866,DS-3725af76-fe86-4f97-9740-d77bfa339b3f,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.45:9866,DS-250fd4cf-705f-4cb5-bc3a-c7a105247e35,DISK]] {code} this is the result of hdfs fsck. Your can see this block group has 9 internal block, but no blk_-9223372036797335468, two repeated blk_-9223372036797335480. this block is too old so that the log is missing, so I don't know the reason, and can't reproduction this error now. 
[jira] [Commented] (HDFS-15237) Get checksum of EC file failed, when some block is missing or corrupt
[ https://issues.apache.org/jira/browse/HDFS-15237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065461#comment-17065461 ] zhengchenyu commented on HDFS-15237: Here I note a strange phenomenon. Why does this error happen? Because I found a wrong internal block distribution like this: {code} 0. BP-1936287042-10.200.128.33-1573194961291:blk_-9223372036797335472_3783205 len=247453749 Live_repl=8 [blk_-9223372036797335472:DatanodeInfoWithStorage[10.200.128.43:9866,DS-2ddde0b8-6a84-4d06-8a40-d4ae5691e81c,DISK], blk_-9223372036797335471:DatanodeInfoWithStorage[10.200.128.41:9866,DS-a4fc5486-6c45-481e-84e7-9393eeaf1313,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.50:9866,DS-fc0632c6-8916-42d8-8219-57b022bb2786,DISK], blk_-9223372036797335469:DatanodeInfoWithStorage[10.200.128.54:9866,DS-1b6cb52a-f55a-4ef8-beaf-a5d7b7fe93aa,DISK], blk_-9223372036797335467:DatanodeInfoWithStorage[10.200.128.52:9866,DS-fc6e00dd-ca5a-4580-9403-aeb6906da81a,DISK], blk_-9223372036797335466:DatanodeInfoWithStorage[10.200.128.53:9866,DS-2c926a3b-64c0-441b-abe2-188e79918abe,DISK], blk_-9223372036797335465:DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK], blk_-9223372036797335464:DatanodeInfoWithStorage[10.200.128.44:9866,DS-3725af76-fe86-4f97-9740-d77bfa339b3f,DISK], blk_-9223372036797335470:DatanodeInfoWithStorage[10.200.128.45:9866,DS-250fd4cf-705f-4cb5-bc3a-c7a105247e35,DISK]] {code} This is the result of hdfs fsck. You can see this block group has 9 internal blocks, but no blk_-9223372036797335468 and two repeated blk_-9223372036797335480. This block is too old and the log is missing, so I don't know the reason and can't reproduce this error now.
[jira] [Created] (HDFS-15237) Get checksum of EC file failed, when some block is missing or corrupt
zhengchenyu created HDFS-15237: -- Summary: Get checksum of EC file failed, when some block is missing or corrupt Key: HDFS-15237 URL: https://issues.apache.org/jira/browse/HDFS-15237 Project: Hadoop HDFS Issue Type: Bug Components: ec, hdfs Affects Versions: 3.2.1 Reporter: zhengchenyu Fix For: 3.2.2 When we distcp from one EC directory to another, I found errors like this. {code} 2020-03-20 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3//000325_0, datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]2020-03-20 20:18:21,366 WARN [main] org.apache.hadoop.hdfs.FileChecksumHelper: src=/EC/6-3//000325_0, datanodes[6]=DatanodeInfoWithStorage[10.200.128.40:9866,DS-65ac4407-9d33-4c59-8f72-dd1d80d26d9f,DISK]java.io.EOFException: Unexpected EOF while trying to read response from server at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:550) at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.tryDatanode(FileChecksumHelper.java:709) at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlockGroup(FileChecksumHelper.java:664) at org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:638) at org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) at org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1790) at org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1810) at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1691) at org.apache.hadoop.hdfs.DistributedFileSystem$33.doCall(DistributedFileSystem.java:1688) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1700) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:138) at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:115) at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87) at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:220) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) {code} And then I found this error in the datanode log: {code} 2020-03-20 20:54:16,573 INFO org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false 2020-03-20 20:54:16,577 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: bd-hadoop-128050.zeus.lianjia.com:9866:DataXceiver error processing BLOCK_GROUP_CHECKSUM operation src: /10.201.1.38:33264 dst: /10.200.128.50:9866 java.lang.UnsupportedOperationException at
java.nio.ByteBuffer.array(ByteBuffer.java:994) at org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockChecksumReconstructor.reconstruct(StripedBlockChecksumReconstructor.java:90) at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.recalculateChecksum(BlockChecksumHelper.java:711) at org.apache.hadoop.hdfs.server.datanode.BlockChecksumHelper$BlockGroupNonStripedChecksumComputer.compute(BlockChecksumHelper.java:489) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockGroupChecksum(DataXceiver.java:1047) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opStripedBlockChecksum(Receiver.java:327) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:119) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292) at java.lang.Thread.run(Thread.java:748) {code} The reason is that when some block is missing or corrupt, the datanode triggers recalculateChecksum. But if StripedBlockChecksumReconstructor.targetBuffer is a DirectByteBuffer, we can't use DirectByteBuffer.array(), so the exception above is thrown and we can't get the checksum.
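For anyone unfamiliar with the failure mode, a minimal self-contained demonstration (plain JDK code, not the HDFS sources): array() on a direct ByteBuffer throws UnsupportedOperationException, so code that assumes a heap-backed buffer must check hasArray() and fall back to an explicit copy.

{code:java}
import java.nio.ByteBuffer;

public class DirectBufferArrayDemo {
  // Returns the buffer's remaining bytes as a byte[], whether or not the
  // buffer is heap-backed. A direct buffer has no accessible backing array:
  // hasArray() returns false and array() throws UnsupportedOperationException.
  static byte[] toByteArray(ByteBuffer buf) {
    if (buf.hasArray()) {
      return buf.array();            // heap buffer: backing array available
    }
    byte[] copy = new byte[buf.remaining()];
    buf.duplicate().get(copy);       // direct buffer: explicit copy, position untouched
    return copy;
  }

  public static void main(String[] args) {
    ByteBuffer direct = ByteBuffer.allocateDirect(8);
    try {
      direct.array();                // throws, just like the stack trace above
    } catch (UnsupportedOperationException e) {
      System.out.println("array() on a direct buffer throws " + e);
    }
    System.out.println("safe copy length: " + toByteArray(direct).length);
  }
}
{code}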
[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065398#comment-17065398 ] Sean Chow commented on HDFS-14476: -- Hi [~weichiu], I can't see why the build failed. Any chance to merge this to 2.10.1? > lock too long when fix inconsistent blocks between disk and in-memory > - > > Key: HDFS-14476 > URL: https://issues.apache.org/jira/browse/HDFS-14476 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0, 2.7.0, 3.0.3 >Reporter: Sean Chow >Assignee: Sean Chow >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-14476-branch-2.01.patch, > HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, > HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, > datanode-with-patch-14476.png > > > When the DirectoryScanner finds differences between on-disk and in-memory > blocks, it tries to run {{checkAndUpdate}} to fix them. However, > {{FsDatasetImpl.checkAndUpdate}} is a synchronized call. > I have about 6 million blocks on every datanode, and each 6-hour scan finds > about 25000 abnormal blocks to fix. That leads to the FsDatasetImpl lock > being held for a long time. > Let's assume every block needs 10ms to fix (because of SAS disk latency); > that will cost 250 seconds to finish. That means all reads and writes will be > blocked for over four minutes on that datanode. > > {code:java} > 2019-05-06 08:06:51,704 INFO > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool > BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing > metadata files:23574, missing block files:23574, missing blocks in > memory:47625, mismatched blocks:0 > ... > 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Took 588402ms to process 1 commands from NN > {code} > It takes a long time to process commands from the NN because threads are > blocked, and the namenode will see a long lastContact time for this datanode. > This may affect all HDFS versions. > *How to fix:* > Just as invalidate commands from the namenode are processed with a batch > size of 1000, these abnormal blocks should be fixed in batches too, sleeping > 2 seconds between batches to allow normal block reads and writes.
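A minimal sketch of the batching idea described above (illustrative names, not the actual patch): hold the dataset lock for one batch at a time and pause between batches so normal reads and writes can acquire it.

{code:java}
import java.util.List;

public class BatchedFixer {
  private static final int BATCH_SIZE = 1000;  // mirrors the invalidate batch size mentioned above
  private static final long PAUSE_MS = 2000;   // the proposed 2-second pause between batches

  private final Object datasetLock = new Object();  // stands in for the FsDatasetImpl lock

  // diffBlockIds: the abnormal blocks the directory scanner reported
  // (a hypothetical representation; the real diff entries carry more state).
  void checkAndUpdateInBatches(List<Long> diffBlockIds) throws InterruptedException {
    for (int i = 0; i < diffBlockIds.size(); i += BATCH_SIZE) {
      int end = Math.min(i + BATCH_SIZE, diffBlockIds.size());
      synchronized (datasetLock) {             // hold the lock for one batch only
        for (long blockId : diffBlockIds.subList(i, end)) {
          fixBlock(blockId);
        }
      }
      if (end < diffBlockIds.size()) {
        Thread.sleep(PAUSE_MS);                // let normal reads/writes get the lock
      }
    }
  }

  private void fixBlock(long blockId) {
    // placeholder for the per-block reconciliation work
  }
}
{code}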
[jira] [Comment Edited] (HDFS-14429) Block remain in COMMITTED but not COMPLETE caused by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065375#comment-17065375 ] dark_num edited comment on HDFS-14429 at 3/24/20, 7:31 AM: --- {code:java} // Why not use this method to judge the replica state? // It also includes the "MAINTENANCE" case, avoiding a wrong calculation, // because currentLiveReplic only includes the "normal/live" state if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 1 : 0; //... } public enum AdminStates { NORMAL("In Service"), DECOMMISSION_INPROGRESS("Decommission In Progress"), DECOMMISSIONED("Decommissioned"), ENTERING_MAINTENANCE("Entering Maintenance"), IN_MAINTENANCE("In Maintenance"); //... }{code} [~caiyicong] Thank you for your efforts; I look forward to your reply. was (Author: dark_num): {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 1 : 0; //... } public enum AdminStates { NORMAL("In Service"), DECOMMISSION_INPROGRESS("Decommission In Progress"), DECOMMISSIONED("Decommissioned"), ENTERING_MAINTENANCE("Entering Maintenance"), IN_MAINTENANCE("In Maintenance"); //... }{code} [~caiyicong] Thank you for your efforts and look forward to your reply, > Block remain in COMMITTED but not COMPLETE caused by Decommission > - > > Key: HDFS-14429 > URL: https://issues.apache.org/jira/browse/HDFS-14429 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.2 >Reporter: Yicong Cai >Assignee: Yicong Cai >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-14429.01.patch, HDFS-14429.02.patch, > HDFS-14429.03.patch, HDFS-14429.branch-2.01.patch, > HDFS-14429.branch-2.02.patch > > > In the following scenario, the Block will remain in the COMMITTED but not > COMPLETE state and cannot be closed properly: > # Client writes Block(bk1) to three data nodes (dn1/dn2/dn3). > # bk1 has been completely written to the three datanodes; each datanode finishes > FinalizeBlock, adds it to the IBR, and waits to report to the NameNode. > # The client commits bk1 after receiving the ACK. > # Before the DNs have reported the IBR, all three nodes dn1/dn2/dn3 > enter Decommissioning. > # The DN reports the IBR, but the block cannot be completed normally.
> > Then it will lead to the following related exceptions: > {panel:title=Exception} > 2019-04-02 13:40:31,882 INFO namenode.FSNamesystem > (FSNamesystem.java:checkBlocksComplete(2790)) - BLOCK* > blk_4313483521_3245321090 is COMMITTED but not COMPLETE(numNodes= 3 >= > minimum = 1) in file xxx > 2019-04-02 13:40:31,882 INFO ipc.Server (Server.java:logException(2650)) - > IPC Server handler 499 on 8020, call Call#122552 Retry#0 > org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from xxx:47615 > org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not > replicated yet: xxx > at > org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2579) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:846) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606) > {panel} > This will then cause the scenario described in HDFS-12747. > The root cause is that addStoredBlock does not consider the case where the > replicas are in Decommission. > This problem needs to be fixed like HDFS-11499.
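For context on the suggestion in the comment above, a minimal sketch (illustrative stand-in classes, not the HDFS sources or the committed patch) of why counting with isInService() excludes decommissioning and maintenance replicas in a single check; the `? 1 : 0` ordering matches the final version of the comment.

{code:java}
// Illustrative only: minimal stand-ins for the real HDFS classes, to show
// the counting semantics the comment argues for.
enum AdminStates { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE }

class Node {
  AdminStates adminState = AdminStates.NORMAL;

  // True only in the NORMAL ("In Service") state, so decommissioning and
  // maintenance replicas are excluded by a single check.
  boolean isInService() { return adminState == AdminStates.NORMAL; }
}

class ReplicaCounting {
  // An in-service node contributes one live replica; any other admin state
  // contributes none.
  static int curReplicaDelta(Node node) {
    return node.isInService() ? 1 : 0;
  }

  public static void main(String[] args) {
    Node live = new Node();
    Node decommissioning = new Node();
    decommissioning.adminState = AdminStates.DECOMMISSION_INPROGRESS;
    System.out.println(curReplicaDelta(live));            // 1
    System.out.println(curReplicaDelta(decommissioning)); // 0
  }
}
{code}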
[jira] [Comment Edited] (HDFS-14429) Block remain in COMMITTED but not COMPLETE caused by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065375#comment-17065375 ] dark_num edited comment on HDFS-14429 at 3/24/20, 7:30 AM: --- {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 1 : 0; //... } public enum AdminStates { NORMAL("In Service"), DECOMMISSION_INPROGRESS("Decommission In Progress"), DECOMMISSIONED("Decommissioned"), ENTERING_MAINTENANCE("Entering Maintenance"), IN_MAINTENANCE("In Maintenance"); //... }{code} [~caiyicong] Thank you for your efforts and look forward to your reply, was (Author: dark_num): {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } public enum AdminStates { NORMAL("In Service"), DECOMMISSION_INPROGRESS("Decommission In Progress"), DECOMMISSIONED("Decommissioned"), ENTERING_MAINTENANCE("Entering Maintenance"), IN_MAINTENANCE("In Maintenance"); //... }{code} [~caiyicong] Thank you for your efforts and look forward to your reply,
[jira] [Comment Edited] (HDFS-14429) Block remain in COMMITTED but not COMPLETE caused by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065375#comment-17065375 ] dark_num edited comment on HDFS-14429 at 3/24/20, 7:28 AM: --- {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } public enum AdminStates { NORMAL("In Service"), DECOMMISSION_INPROGRESS("Decommission In Progress"), DECOMMISSIONED("Decommissioned"), ENTERING_MAINTENANCE("Entering Maintenance"), IN_MAINTENANCE("In Maintenance"); //... }{code} [~caiyicong] Thank you for your efforts and look forward to your reply, was (Author: dark_num): {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } {code} Thank you for your efforts and look forward to your reply,
[jira] [Comment Edited] (HDFS-14429) Block remain in COMMITTED but not COMPLETE caused by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065375#comment-17065375 ] dark_num edited comment on HDFS-14429 at 3/24/20, 7:26 AM: --- {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } {code} Thank you for your efforts and look forward to your reply, was (Author: dark_num): {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } {code} Thank you for your efforts and look forward to your reply,
[jira] [Comment Edited] (HDFS-14429) Block remain in COMMITTED but not COMPLETE caused by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065375#comment-17065375 ] dark_num edited comment on HDFS-14429 at 3/24/20, 7:25 AM: --- {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } {code} Thank you for your efforts and look forward to your reply, was (Author: dark_num): {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } Thank you for your efforts and look forward to your reply {code}
[jira] [Commented] (HDFS-14429) Block remain in COMMITTED but not COMPLETE caused by Decommission
[ https://issues.apache.org/jira/browse/HDFS-14429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065375#comment-17065375 ] dark_num commented on HDFS-14429: - {code:java} // Why not use this method to judge the replica state // It also include the "MAINTENANCE" case, avoid wrong calculation if (result == AddBlockResult.ADDED) { curReplicaDelta = (node.isInService()) ? 0 : 1; //... } Thank you for your efforts and look forward to your reply {code}