[jira] [Comment Edited] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968124#comment-16968124 ] Xieming Li edited comment on HDFS-14928 at 11/6/19 6:52 AM: I have tried implementing #1 on the web UI of NameNode, Router, and DataNode. After applying `HDFS-14928.001.patch`, the WebUI will look like the following: !HDFS-14928.jpg|width=600! During the implementation, I came across two issues that I want to discuss: # I haven't modified the Web UI of DataNode because the JMX of DN does not contain any information about its own running status. We have to either A) expose the DN's running status in JMX metrics or B) use ajax to query the JMX of NN. We can also C) skip the changes for now. Which do you think is better among A, B, or C? # NNs can be in Safemode and Standby at the same time. In the current implementation, safemode will never be shown on the Overview page. Should we change it so that when a Standby NN is in Safemode, we show the Safemode icon? was (Author: risyomei): I have tried implementing #1 on the web UI of NameNode, Router, and DataNode. After applying HDFS-14928.001.patch, the WebUI will look like the following: !HDFS-14928.jpg|width=600! During the implementation, I came across two issues that I want to discuss: # I haven't modified the Web UI of DataNode because the JMX of DN does not contain any information about its own running status. We have to either A) expose the DN's running status in JMX metrics or B) use ajax to query the JMX of NN. We can also C) skip the changes for now. Which do you think is better among A, B, or C? # NNs can be in Safemode and Standby at the same time. In the current implementation, safemode will never be shown on the Overview page. Should we change it so that when a Standby NN is in Safemode, we show the Safemode icon? > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, > NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
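For illustration of option A above: the DataNode could publish its own run state through a small JMX bean that the DataNode page then reads over the /jmx servlet. A minimal sketch, assuming a hypothetical {{DataNodeStatusMXBean}} name and state string (this is not part of HDFS-14928.001.patch):
{code}
import org.apache.hadoop.metrics2.util.MBeans;

/**
 * Hypothetical MXBean exposing the DataNode's own run state, so the
 * DataNode web UI could render a status icon without querying the NN.
 */
public interface DataNodeStatusMXBean {
  String getRunningState();   // e.g. "RUNNING", "DECOMMISSIONING"
}

class DataNodeStatusSource implements DataNodeStatusMXBean {
  private final String state;

  DataNodeStatusSource(String state) {
    this.state = state;
  }

  @Override
  public String getRunningState() {
    return state;
  }

  static void register(DataNodeStatusSource source) {
    // Publishes the bean as Hadoop:service=DataNode,name=DataNodeStatus,
    // which a dfshealth-style page can read via the /jmx servlet.
    MBeans.register("DataNode", "DataNodeStatus", source);
  }
}
{code}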
[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968124#comment-16968124 ] Xieming Li commented on HDFS-14928: --- I have tried implementing #1 on the web UI of NameNode, Router, and DataNode. After applying HDFS-14928.001.patch, the WebUI will look like the following: !HDFS-14928.jpg|width=600! During the implementation, I came across two issues that I want to discuss: # I haven't modified the Web UI of DataNode because the JMX of DN does not contain any information about its own running status. We have to either A) expose the DN's running status in JMX metrics or B) use ajax to query the JMX of NN. We can also C) skip the changes for now. Which do you think is better among A, B, or C? # NNs can be in Safemode and Standby at the same time. In the current implementation, safemode will never be shown on the Overview page. Should we change it so that when a Standby NN is in Safemode, we show the Safemode icon? > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, > NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968122#comment-16968122 ] Íñigo Goiri commented on HDFS-14928: dfshealth.html shouldn't have dependencies on RBF but the other way around. The common stuff should go to HDFS. > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, > NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968121#comment-16968121 ] Hadoop QA commented on HDFS-14928: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-14928 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-14928 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28260/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, > NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14928: -- Attachment: HDFS-14928.001.patch Status: Patch Available (was: Open) > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, > NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li updated HDFS-14928: -- Attachment: HDFS-14928.jpg > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > HDFS-14928.001.patch, HDFS-14928.jpg, NN_orig.png, NN_with_legend.png, > NN_wo_legend.png, RBF_orig.png, RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2407: --- Status: Patch Available (was: Open) > Reduce log level of per-node failure in XceiverClientGrpc > - > > Key: HDDS-2407 > URL: https://issues.apache.org/jira/browse/HDDS-2407 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When reading from a pipeline, client should not care if some datanode could > not service the request, as long as the pipeline as a whole is OK. The [log > message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] > indicating node failure was [increased to error > level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] > in HDDS-1780. This task proposes to change it back to debug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
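The proposal above boils down to demoting the per-node message while keeping a pipeline-level error. A paraphrased slf4j sketch of that retry shape (the loop and method names below are illustrative, not the actual XceiverClientGrpc code):
{code}
import java.io.IOException;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PipelineReadSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(PipelineReadSketch.class);

  byte[] readFromAnyNode(List<String> datanodes) throws IOException {
    IOException lastFailure = null;
    for (String dn : datanodes) {
      try {
        return readFrom(dn);
      } catch (IOException e) {
        // A per-node failure is expected as long as the pipeline is OK:
        // debug, not error (the HDDS-2407 proposal).
        LOG.debug("Failed to execute command on datanode {}", dn, e);
        lastFailure = e;
      }
    }
    // Only the whole-pipeline failure deserves an error-level log.
    LOG.error("Failed to execute command on the pipeline {}", datanodes);
    throw lastFailure != null ? lastFailure : new IOException("empty pipeline");
  }

  private byte[] readFrom(String dn) throws IOException {
    throw new IOException("stub");  // placeholder for the real gRPC call
  }
}
{code}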
[jira] [Commented] (HDFS-14953) [Dynamometer] Missing blocks gradually increase after NN starts
[ https://issues.apache.org/jira/browse/HDFS-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968113#comment-16968113 ] Takanobu Asanuma commented on HDFS-14953: - The case is very similar to [this issue|https://github.com/linkedin/dynamometer/issues/64]. (Thanks for reporting it, [~weichiu].) {quote} After HDFS-9260, NN expects block replicas to be reported in ascending order of block id. If a block id is not in order, NN discards it silently. Because the simulated DataNode in Dynamometer uses a hash map to store block replicas, the replicas are not reported in order. The Dynamometer cluster would then see missing blocks gradually increase several minutes after NN starts. {quote} > [Dynamometer] Missing blocks gradually increase after NN starts > --- > > Key: HDFS-14953 > URL: https://issues.apache.org/jira/browse/HDFS-14953 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Reporter: Takanobu Asanuma >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
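The root cause quoted above points at the likely shape of a fix: sort the block ids before building the report so they reach the NN in ascending order. A minimal sketch with made-up types, not the actual Dynamometer code:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class SimulatedBlockReport {
  /**
   * After HDFS-9260 the NN silently drops replicas whose ids arrive out of
   * order, so a HashMap-backed replica store must be sorted before reporting.
   */
  static List<Long> blockIdsInReportOrder(Map<Long, byte[]> replicas) {
    List<Long> ids = new ArrayList<>(replicas.keySet());
    Collections.sort(ids);  // ascending block-id order expected by the NN
    return ids;
  }
}
{code}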
[jira] [Created] (HDFS-14953) [Dynamometer] Missing blocks gradually increase after NN starts
Takanobu Asanuma created HDFS-14953: --- Summary: [Dynamometer] Missing blocks gradually increase after NN starts Key: HDFS-14953 URL: https://issues.apache.org/jira/browse/HDFS-14953 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Takanobu Asanuma -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14384) When lastLocatedBlock token expires, it will take 1~3 seconds to refetch it.
[ https://issues.apache.org/jira/browse/HDFS-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968105#comment-16968105 ] Surendra Singh Lilhore commented on HDFS-14384: --- Fixed check-style warnings. > When lastLocatedBlock token expires, it will take 1~3 seconds to refetch it. > --- > > Key: HDFS-14384 > URL: https://issues.apache.org/jira/browse/HDFS-14384 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.2 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-14384.001.patch, HDFS-14384.002.patch, > HDFS-14384.003.patch > > > Scenario : > 1. Write a file with one block which is in-progress. > 2. Open an input stream and close the output stream. > 3. Wait for block token expiration and read the data. > 4. The last block read takes 1~3 sec. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14384) When lastLocatedBlock token expires, it will take 1~3 seconds to refetch it.
[ https://issues.apache.org/jira/browse/HDFS-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Surendra Singh Lilhore updated HDFS-14384: -- Attachment: HDFS-14384.003.patch > When lastLocatedBlock token expires, it will take 1~3 seconds to refetch it. > --- > > Key: HDFS-14384 > URL: https://issues.apache.org/jira/browse/HDFS-14384 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.7.2 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-14384.001.patch, HDFS-14384.002.patch, > HDFS-14384.003.patch > > > Scenario : > 1. Write a file with one block which is in-progress. > 2. Open an input stream and close the output stream. > 3. Wait for block token expiration and read the data. > 4. The last block read takes 1~3 sec. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
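The scenario quoted above maps to a small driver against the public FileSystem API. A rough sketch, where the cluster URI, file path, and the token-lifetime wait are all assumptions:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LastBlockTokenExpiryRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);
    Path file = new Path("/tmp/token-expiry-test");

    // 1. Write a file whose single block is still in progress.
    FSDataOutputStream out = fs.create(file);
    out.write(new byte[1024]);
    out.hflush();

    // 2. Open the input stream, then close the output stream.
    FSDataInputStream in = fs.open(file);
    out.close();

    // 3. Wait past the block token lifetime (cluster-specific; assumed here).
    Thread.sleep(15 * 60 * 1000L);

    // 4. Time the read of the last block; per the report this took 1~3s
    //    because the client had to refetch the expired token.
    long start = System.nanoTime();
    byte[] buf = new byte[1024];
    in.readFully(0, buf);
    System.out.println(
        "read took " + (System.nanoTime() - start) / 1_000_000 + " ms");
    in.close();
  }
}
{code}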
[jira] [Updated] (HDDS-2393) HDDS-1847 broke some unit tests
[ https://issues.apache.org/jira/browse/HDDS-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2393: - Labels: pull-request-available (was: ) > HDDS-1847 broke some unit tests > --- > > Key: HDDS-2393 > URL: https://issues.apache.org/jira/browse/HDDS-2393 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Chris Teoh >Assignee: Chris Teoh >Priority: Major > Labels: pull-request-available > > Siyao Meng commented on HDDS-1847: > -- > Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and > {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. > I believe there could be other tests that are broken by this. > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74) > at > org.apache.hadoop.hdds.server.BaseHttpServer.<init>(BaseHttpServer.java:81) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.<init>(StorageContainerManagerHttpServer.java:36) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:330) > at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544) > at > org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} > {code} > java.lang.NullPointerException > at > org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
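Judging from the first trace, the NPE surfaces in the HTTP server constructor path when a security-related value is missing from the test Configuration. One defensive shape such a lookup could take (the key parameter and null contract here are assumptions, not the actual HDDS fix):
{code}
import org.apache.hadoop.conf.Configuration;

final class SpnegoConfigGuard {
  private SpnegoConfigGuard() { }

  /** Returns the SPNEGO principal, or null when security is not configured. */
  static String getSpnegoPrincipalOrNull(Configuration conf, String key) {
    String principal = conf.get(key);
    if (principal == null || principal.isEmpty()) {
      // Unsecured test clusters legitimately omit this; callers should
      // skip the Kerberos wiring instead of dereferencing null.
      return null;
    }
    return principal;
  }
}
{code}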
[jira] [Work logged] (HDDS-2393) HDDS-1847 broke some unit tests
[ https://issues.apache.org/jira/browse/HDDS-2393?focusedWorklogId=339169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339169 ] ASF GitHub Bot logged work on HDDS-2393: Author: ASF GitHub Bot Created on: 06/Nov/19 05:55 Start Date: 06/Nov/19 05:55 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #111: HDDS-2393: Fixing NPE in unit test from HDDS-1847 URL: https://github.com/apache/hadoop-ozone/pull/111 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339169) Remaining Estimate: 0h Time Spent: 10m > HDDS-1847 broke some unit tests > --- > > Key: HDDS-2393 > URL: https://issues.apache.org/jira/browse/HDDS-2393 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Chris Teoh >Assignee: Chris Teoh >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Siyao Meng commented on HDDS-1847: > -- > Looks like this commit breaks {{TestKeyManagerImpl}} in {{setUp()}} and > {{cleanup()}}. Run {{TestKeyManagerImpl#testListStatus()}} to steadily repro. > I believe there could be other tests that are broken by this. > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.getSpnegoPrincipal(StorageContainerManagerHttpServer.java:74) > at > org.apache.hadoop.hdds.server.BaseHttpServer.<init>(BaseHttpServer.java:81) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManagerHttpServer.<init>(StorageContainerManagerHttpServer.java:36) > at > org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:330) > at org.apache.hadoop.hdds.scm.TestUtils.getScm(TestUtils.java:544) > at > org.apache.hadoop.ozone.om.TestKeyManagerImpl.setUp(TestKeyManagerImpl.java:150) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at > com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47) > at > com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242) > at > com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) > {code} > {code} > java.lang.NullPointerException > at > org.apache.hadoop.ozone.om.TestKeyManagerImpl.cleanup(TestKeyManagerImpl.java:176) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at org.junit.runner.JUnitCore.run(JUnitCore.java:160) > at > com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) > at >
[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2407: --- Description: When reading from a pipeline, client should not care if some datanode could not service the request, as long as the pipeline as a whole is OK. The [log message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] indicating node failure was [increased to error level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] in HDDS-1780. This task proposes to change it back to debug. (was: When reading from a pipeline, client should not care if some datanode could not service the request, as long as the pipeline as a whole is OK. The [log message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] indicating node failure was [increased to error level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] in HDDS-1780. This task proposes to change it back to debug.) > Reduce log level of per-node failure in XceiverClientGrpc > - > > Key: HDDS-2407 > URL: https://issues.apache.org/jira/browse/HDDS-2407 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When reading from a pipeline, client should not care if some datanode could > not service the request, as long as the pipeline as a whole is OK. The [log > message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] > indicating node failure was [increased to error > level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] > in HDDS-1780. This task proposes to change it back to debug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2407: - Labels: pull-request-available (was: ) > Reduce log level of per-node failure in XceiverClientGrpc > - > > Key: HDDS-2407 > URL: https://issues.apache.org/jira/browse/HDDS-2407 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > When reading from a pipeline, client should not care if some datanode could > not service the request, as long as the pipeline as a whole is OK. The [log > message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] > indicating node failure was [increased to error > level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] > in HDDS-1780. This task proposes to change it back to debug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2407?focusedWorklogId=339167=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339167 ] ASF GitHub Bot logged work on HDDS-2407: Author: ASF GitHub Bot Created on: 06/Nov/19 05:54 Start Date: 06/Nov/19 05:54 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #120: HDDS-2407. Reduce log level of per-node failure in XceiverClientGrpc URL: https://github.com/apache/hadoop-ozone/pull/120 ## What changes were proposed in this pull request? When reading from a pipeline, client should not care if some datanode could not service the request, as long as the pipeline as a whole is OK. The [log message](https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304) indicating node failure was [increased to error level](https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288) in [HDDS-1780](https://issues.apache.org/jira/browse/HDDS-1780). This PR proposes to change it back to debug. Pipeline-level failure is still logged as error. https://issues.apache.org/jira/browse/HDDS-2407 ## How was this patch tested? Tested locally on docker-compose cluster with 0/1/2/3 datanodes down. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339167) Remaining Estimate: 0h Time Spent: 10m > Reduce log level of per-node failure in XceiverClientGrpc > - > > Key: HDDS-2407 > URL: https://issues.apache.org/jira/browse/HDDS-2407 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When reading from a pipeline, client should not care if some datanode could > not service the request, as long as the pipeline as a whole is OK. The [log > message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] > indicating node failure was [increased to error > level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] > in HDDS-1780. This task proposes to change it back to debug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2404) Add support for Registered id as service identifier for CSR.
[ https://issues.apache.org/jira/browse/HDDS-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968096#comment-16968096 ] Bharat Viswanadham commented on HDDS-2404: -- Can we move this task under HDDS-505, as it is related to the OM HA work? > Add support for Registered id as service identifier for CSR. > > > Key: HDDS-2404 > URL: https://issues.apache.org/jira/browse/HDDS-2404 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: SCM >Reporter: Anu Engineer >Assignee: Abhishek Purohit >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The SCM HA needs the ability to represent a group as a single entity, so that > tokens for each of the OMs which are part of an HA group can be honored by the > datanodes. > This patch adds the notion of a service group ID to the Certificate > Infrastructure. In the next JIRAs, we will use this capability when issuing > certificates to OM -- especially when they are in HA mode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
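For background, a registered id is an ASN.1 object identifier carried in the CSR's subjectAltName. A minimal BouncyCastle sketch of attaching one (the OID value and subject name are placeholders, and the actual patch may structure this differently):
{code}
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import org.bouncycastle.asn1.pkcs.PKCSObjectIdentifiers;
import org.bouncycastle.asn1.x500.X500Name;
import org.bouncycastle.asn1.x509.Extension;
import org.bouncycastle.asn1.x509.ExtensionsGenerator;
import org.bouncycastle.asn1.x509.GeneralName;
import org.bouncycastle.asn1.x509.GeneralNames;
import org.bouncycastle.operator.ContentSigner;
import org.bouncycastle.operator.jcajce.JcaContentSignerBuilder;
import org.bouncycastle.pkcs.PKCS10CertificationRequest;
import org.bouncycastle.pkcs.jcajce.JcaPKCS10CertificationRequestBuilder;

public class RegisteredIdCsrSketch {
  public static void main(String[] args) throws Exception {
    KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
    kpg.initialize(2048);
    KeyPair keyPair = kpg.generateKeyPair();

    // The registered id (an ASN.1 OID) identifies the HA service group;
    // the OID below is a placeholder, not one reserved by Ozone.
    GeneralNames san = new GeneralNames(
        new GeneralName(GeneralName.registeredID, "1.3.6.1.4.1.99999.1"));

    ExtensionsGenerator extGen = new ExtensionsGenerator();
    extGen.addExtension(Extension.subjectAlternativeName, false, san);

    JcaPKCS10CertificationRequestBuilder builder =
        new JcaPKCS10CertificationRequestBuilder(
            new X500Name("CN=om-service"), keyPair.getPublic());
    builder.addAttribute(
        PKCSObjectIdentifiers.pkcs_9_at_extensionRequest, extGen.generate());

    ContentSigner signer =
        new JcaContentSignerBuilder("SHA256withRSA").build(keyPair.getPrivate());
    PKCS10CertificationRequest csr = builder.build(signer);
    System.out.println("CSR with registered id for: " + csr.getSubject());
  }
}
{code}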
[jira] [Work logged] (HDDS-1643) Send hostName also part of OMRequest
[ https://issues.apache.org/jira/browse/HDDS-1643?focusedWorklogId=339164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339164 ] ASF GitHub Bot logged work on HDDS-1643: Author: ASF GitHub Bot Created on: 06/Nov/19 05:49 Start Date: 06/Nov/19 05:49 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #70: HDDS-1643. Send hostName also part of OMRequest. URL: https://github.com/apache/hadoop-ozone/pull/70 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339164) Time Spent: 20m (was: 10m) > Send hostName also part of OMRequest > > > Key: HDDS-1643 > URL: https://issues.apache.org/jira/browse/HDDS-1643 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: YiSheng Lien >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This Jira was created based on the comment from [~eyang] on the HDDS-1600 jira. > [~bharatviswa] can the hostname be used as part of an OM request? When running in a > docker container, the virtual private network address may not be routable or > exposed to the outside world. Using the IP to identify the source client location may > not be enough. It would be nice to have the ability to support hostname-based > requests too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-1643) Send hostName also part of OMRequest
[ https://issues.apache.org/jira/browse/HDDS-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-1643. -- Fix Version/s: 0.5.0 Resolution: Fixed > Send hostName also part of OMRequest > > > Key: HDDS-1643 > URL: https://issues.apache.org/jira/browse/HDDS-1643 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: YiSheng Lien >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This Jira was created based on the comment from [~eyang] on the HDDS-1600 jira. > [~bharatviswa] can the hostname be used as part of an OM request? When running in a > docker container, the virtual private network address may not be routable or > exposed to the outside world. Using the IP to identify the source client location may > not be enough. It would be nice to have the ability to support hostname-based > requests too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc
Attila Doroszlai created HDDS-2407: -- Summary: Reduce log level of per-node failure in XceiverClientGrpc Key: HDDS-2407 URL: https://issues.apache.org/jira/browse/HDDS-2407 Project: Hadoop Distributed Data Store Issue Type: Task Components: Ozone Client Reporter: Attila Doroszlai Assignee: Attila Doroszlai When reading from a pipeline, client should not care if some datanode could not service the request, as long as the pipeline as a whole is OK. The [log message|https://github.com/bshashikant/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] indicating node failure was [increased to error level|https://github.com/bshashikant/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] in HDDS-1780. This task proposes to change it back to debug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2064. -- Fix Version/s: 0.5.0 Resolution: Fixed > Add tests for incorrect OM HA config when node ID or RPC address is not > configured > -- > > Key: HDDS-2064 > URL: https://issues.apache.org/jira/browse/HDDS-2064 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but > `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't > exist.- > -Root cause:- > -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. > This happens when the local OM doesn't match any `ozone.om.address.idX.omX` in > the config.- > Due to the refactoring done in HDDS-2162, this fix has been included in that > commit. I will repurpose the jira to add some tests for the HA config. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
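The quoted root cause suggests the missing guard was a fail-fast check when no address entry matches the local OM. A paraphrased sketch (the method and message are illustrative, not the HDDS-2162 code):
{code}
import org.apache.hadoop.conf.Configuration;

final class OmHaConfigCheck {
  private OmHaConfigCheck() { }

  /**
   * After scanning the ozone.om.address.<serviceId>.<nodeId> keys, a local
   * OM that matched none of them must fail fast instead of NPE-ing later.
   */
  static void validate(Configuration conf, String serviceId, int found) {
    if (found == 0) {
      throw new IllegalArgumentException(
          "Configuration has no ozone.om.address entry matching this host "
              + "for service id " + serviceId
              + "; check ozone.om.nodes." + serviceId + " as well.");
    }
  }
}
{code}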
[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=339157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339157 ] ASF GitHub Bot logged work on HDDS-2064: Author: ASF GitHub Bot Created on: 06/Nov/19 05:29 Start Date: 06/Nov/19 05:29 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #119: HDDS-2064. Add tests for incorrect OM HA config when node ID or RPC address is not configured URL: https://github.com/apache/hadoop-ozone/pull/119 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339157) Time Spent: 2h 10m (was: 2h) > Add tests for incorrect OM HA config when node ID or RPC address is not > configured > -- > > Key: HDDS-2064 > URL: https://issues.apache.org/jira/browse/HDDS-2064 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but > `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't > exist.- > -Root cause:- > -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. > This happens when the local OM doesn't match any `ozone.om.address.idX.omX` in > the config.- > Due to the refactoring done in HDDS-2162, this fix has been included in that > commit. I will repurpose the jira to add some tests for the HA config. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads
[ https://issues.apache.org/jira/browse/HDDS-2359?focusedWorklogId=339153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339153 ] ASF GitHub Bot logged work on HDDS-2359: Author: ASF GitHub Bot Created on: 06/Nov/19 05:26 Start Date: 06/Nov/19 05:26 Worklog Time Spent: 10m Work Description: bharatviswa504 commented on pull request #82: HDDS-2359. Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads URL: https://github.com/apache/hadoop-ozone/pull/82 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339153) Time Spent: 20m (was: 10m) > Seeking randomly in a key with more than 2 blocks of data leads to > inconsistent reads > - > > Key: HDDS-2359 > URL: https://issues.apache.org/jira/browse/HDDS-2359 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > During Hive testing we found the following exception: > {code} > TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : > attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 
16 more > Caused by: java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at >
[jira] [Resolved] (HDDS-2359) Seeking randomly in a key with more than 2 blocks of data leads to inconsistent reads
[ https://issues.apache.org/jira/browse/HDDS-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharat Viswanadham resolved HDDS-2359. -- Fix Version/s: 0.5.0 Resolution: Fixed > Seeking randomly in a key with more than 2 blocks of data leads to > inconsistent reads > - > > Key: HDDS-2359 > URL: https://issues.apache.org/jira/browse/HDDS-2359 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Shashikant Banerjee >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > During Hive testing we found the following exception: > {code} > TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : > attempt_1569246922012_0214_1_03_00_3:java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) > ... 
16 more > Caused by: java.io.IOException: java.io.IOException: error iterating > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:366) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) > at > org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > ... 18 more > Caused by: java.io.IOException: error iterating > at > org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:835) > at > org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.next(VectorizedOrcAcidRowBatchReader.java:74) > at > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:361) > ... 24 more > Caused by: java.io.IOException: Error reading file: > o3fs://hive.warehouse.vc0136.halxg.cloudera.com:9862/data/inventory/delta_001_001_/bucket_0 > at >
[jira] [Resolved] (HDDS-2380) Use the Table.isExist API instead of get() call while checking for presence of key.
[ https://issues.apache.org/jira/browse/HDDS-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HDDS-2380. Resolution: Fixed > Use the Table.isExist API instead of get() call while checking for presence > of key. > --- > > Key: HDDS-2380 > URL: https://issues.apache.org/jira/browse/HDDS-2380 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Manager >Reporter: Aravindan Vijayan >Assignee: Aravindan Vijayan >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently, when OM creates a file/directory, it checks the absence of all > prefix paths of the key in its RocksDB. Since we don't care about the > deserialization of the actual value, we should use the isExist API added in > org.apache.hadoop.hdds.utils.db.Table which internally uses the more > performant keyMayExist API of RocksDB. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
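The before/after shape of the change, using the Table API named in the description (the table generic types and key below are illustrative):
{code}
import java.io.IOException;
import org.apache.hadoop.hdds.utils.db.Table;

final class KeyPresenceCheck {
  private KeyPresenceCheck() { }

  static boolean prefixExists(Table<String, ?> keyTable, String prefixKey)
      throws IOException {
    // Before: get() fetched and deserialized the value just to test presence.
    //   return keyTable.get(prefixKey) != null;
    // After: isExist() rides on RocksDB's keyMayExist and skips the
    // value deserialization entirely.
    return keyTable.isExist(prefixKey);
  }
}
{code}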
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968063#comment-16968063 ] Hadoop QA commented on HDFS-14941: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 58s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 12s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 22m 9s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 14s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 36s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 25s{color} | {color:green} root: The patch generated 0 new + 705 unchanged - 1 fixed = 705 total (was 706) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 34s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 44s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}238m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.security.TestFixKerberosTicketOrder | | | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy | | | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap | | | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14941 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985005/HDFS-14941.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7cdf1ae334d4 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968013#comment-16968013 ] Hadoop QA commented on HDFS-14941: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 32s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 40s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 45s{color} | {color:green} root: The patch generated 0 new + 705 unchanged - 1 fixed = 705 total (was 706) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 31s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 41s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 49s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}230m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.security.TestFixKerberosTicketOrder | | | hadoop.conf.TestCommonConfigurationFields | | | hadoop.hdfs.server.namenode.TestNameNodeMXBean | | | hadoop.hdfs.TestMultipleNNPortQOP | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14941 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12985000/HDFS-14941.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7a96a3dee053 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision |
[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968005#comment-16968005 ] Íñigo Goiri commented on HDFS-14922: I meant the full javadoc:
{code}
/**
 * Log that a snapshot is created.
 * @param snapRoot Root of the snapshot.
 * @param snapName Name of the snapshot.
 * @param toLogRpcIds If it is logging RPC ids.
 * @param mtime The snapshot creation time set by Time.now().
 */
void logCreateSnapshot(String snapRoot, String snapName, boolean toLogRpcIds, long mtime) {
{code}
It doesn't hurt to improve the readability of the existing code. BTW, even though {{setSnapshotMTime()}} is package protected, we should also add the javadoc there.
> On StartUp , Snapshot modification time got changed
> ---------------------------------------------------
>
> Key: HDFS-14922
> URL: https://issues.apache.org/jira/browse/HDFS-14922
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: hemanthboyina
> Assignee: hemanthboyina
> Priority: Major
> Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, HDFS-14922.003.patch, HDFS-14922.004.patch
>
> Snapshot modification time got changed on namenode restart
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
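A hedged sketch of what the suggested javadoc on the package-protected {{setSnapshotMTime()}} could look like; the signature and wording here are assumptions for illustration, not the actual patch:
{code:java}
/**
 * Set the modification time of a snapshot, e.g. while loading edits,
 * so that the snapshot keeps the creation time recorded in the edit
 * log rather than the time of a NameNode restart.
 *
 * @param snapRoot Root of the snapshot.
 * @param snapName Name of the snapshot.
 * @param mtime The modification time to set.
 */
void setSnapshotMTime(String snapRoot, String snapName, long mtime) {
{code}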
[jira] [Commented] (HDFS-14949) HttpFS does not support getServerDefaults()
[ https://issues.apache.org/jira/browse/HDFS-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968000#comment-16968000 ] Íñigo Goiri commented on HDFS-14949: Let's do the same then: implement both and have one refer to the other. The deprecation warning would be fine in this case.
> HttpFS does not support getServerDefaults()
> -------------------------------------------
>
> Key: HDFS-14949
> URL: https://issues.apache.org/jira/browse/HDFS-14949
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Kihwal Lee
> Assignee: hemanthboyina
> Priority: Major
> Attachments: HDFS-14949.001.patch, HDFS-14949.002.patch, HDFS-14949.003.patch
>
> For the HttpFS server to function as a fully webhdfs-compatible service, getServerDefaults() support is needed. It is increasingly used in new features and improvements.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
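As a rough, hedged sketch of "implement both and have one refer to the other" inside {{HttpFSFileSystem}}: the helper names below ({{getConnection}}, {{toFsServerDefaults}}, the {{GETSERVERDEFAULTS}} operation) are assumptions modeled on the existing HttpFS operation plumbing, not the actual patch:
{code:java}
// Hypothetical fragment inside HttpFSFileSystem:
@Override
@Deprecated
public FsServerDefaults getServerDefaults() throws IOException {
  // Server defaults are filesystem-wide; refer to the Path-based variant.
  return getServerDefaults(new Path("/"));
}

@Override
public FsServerDefaults getServerDefaults(Path f) throws IOException {
  Map<String, String> params = new HashMap<>();
  params.put(OP_PARAM, Operation.GETSERVERDEFAULTS.toString());
  HttpURLConnection conn =
      getConnection(Operation.GETSERVERDEFAULTS.getMethod(), params, f, true);
  HttpExceptionUtils.validateResponse(conn, HttpURLConnection.HTTP_OK);
  JSONObject json = (JSONObject) HttpFSUtils.jsonParse(conn);
  return toFsServerDefaults(json); // hypothetical JSON-to-FsServerDefaults helper
}
{code}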
[jira] [Assigned] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xieming Li reassigned HDFS-14928: - Assignee: Xieming Li > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Assignee: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > NN_orig.png, NN_with_legend.png, NN_wo_legend.png, RBF_orig.png, > RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing
[ https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967978#comment-16967978 ] Konstantin Shvachko commented on HDFS-14806: +1 on v4 from me as well.
> Bootstrap standby may fail if used in-progress tailing
> ------------------------------------------------------
>
> Key: HDFS-14806
> URL: https://issues.apache.org/jira/browse/HDFS-14806
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch, HDFS-14806.003.patch, HDFS-14806.004.patch
>
> One issue we came across was that if in-progress tailing is enabled, bootstrap standby could fail.
> When in-progress tailing is enabled, bootstrap uses the RPC mechanism to get edits. There is a config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} that sets an upper bound on how many txnids can be included in one RPC call; the default is 5000, meaning a bootstrapping NN (say NN1) can pull at most 5000 edits from a JN. However, as part of bootstrap, NN1 queries another NN (say NN2) for NN2's current transaction ID, and NN2 may return a state that is more than 5000 txnids ahead of NN1's current image. But NN1 can only see 5000 more txnids from the JNs. At this point NN1 panics: the txnid returned by the JNs is behind NN2's returned state, so bootstrap fails.
> Essentially, bootstrap standby can fail if both of the following conditions are met:
> # in-progress tailing is enabled AND
> # the bootstrapping NN is too far (>5000 txids) behind
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to some super large value allowed bootstrap to continue, but this is hardly the ideal solution.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
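For reference, the workaround mentioned in the description amounts to a config override; a minimal hedged sketch is below (the 100000 value is arbitrary and would need to be sized to how far behind a bootstrapping NN can realistically be):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class TailEditsRpcCapWorkaround {
  public static void main(String[] args) {
    // Raise the per-RPC cap on edits served to in-progress tailing
    // (default 5000) so bootstrapStandby can catch up; in production
    // this would be set in hdfs-site.xml rather than in code.
    Configuration conf = new HdfsConfiguration();
    conf.setInt("dfs.ha.tail-edits.qjm.rpc.max-txns", 100000);
    System.out.println(conf.getInt("dfs.ha.tail-edits.qjm.rpc.max-txns", 5000));
  }
}
{code}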
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967969#comment-16967969 ] Konstantin Shvachko commented on HDFS-14941: +1 for the v6 patch. If anybody wants to review, please do.
> Potential editlog race condition can cause corrupted file
> ---------------------------------------------------------
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch, HDFS-14941.006.patch
>
> Recently we encountered an issue where, after a failover, the NameNode complains about corrupted files/missing blocks. The blocks did recover after full block reports, so the blocks are not actually missing. After further investigation, we believe this is what happened:
> First of all, the SbN may receive block reports before the corresponding edit tailing has happened, in which case the SbN postpones processing the DN block report, handled by the guarding logic below:
> {code:java}
> if (shouldPostponeBlocksFromFuture &&
>     namesystem.isGenStampInFuture(iblk)) {
>   queueReportedBlock(storageInfo, iblk, reportedState,
>       QUEUE_REASON_FUTURE_GENSTAMP);
>   continue;
> }
> {code}
> Basically, if a reported block has a future generation stamp, the DN report gets requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
> // allocate new block, record block locations in INode.
> newBlock = createNewBlock();
> INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
> saveAllocatedBlock(src, inodesInPath, newBlock, targets);
> persistNewBlock(src, pendingFile);
> offset = pendingFile.computeFileSize();
> {code}
> The line {{newBlock = createNewBlock();}} logs an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on the Standby, while the following line {{persistNewBlock(src, pendingFile);}} logs another edit entry {{OP_ADD_BLOCK}} to actually add the block on the Standby.
> The race condition is this: imagine the Standby has just processed {{OP_SET_GENSTAMP_V2}}, but not yet {{OP_ADD_BLOCK}} (if they just happen to be in different segments), when a block report with the new generation stamp comes in. Since the genstamp bump has already been processed, the reported block is not considered a future block, so the guarding logic passes. But the block hasn't actually been added to the block map, because the second edit is yet to be tailed. The block then gets added to the invalidated block list, and we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no information about this block until the next full block report, so after a failover the NN marks it as corrupt.
> This issue won't happen if both edit entries get tailed together, so that no IBR processing can happen in between. But in our case, we set the edit tailing interval very low (to allow Standby reads), so under high workload there is a much higher chance that the two entries are tailed separately, causing the issue.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
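For readers following the review: a much-simplified, hedged sketch of the "impending genstamp" idea the comments refer to; the class and method names are illustrative, not the actual patch:
{code:java}
/**
 * Hypothetical illustration of an "impending genstamp" guard.
 * On the Standby, OP_SET_GENSTAMP_V2 only records the bumped stamp as
 * impending; the current stamp advances when the corresponding
 * OP_ADD_BLOCK is applied. A block report carrying the bumped stamp is
 * therefore still "in the future" until the block is in the block map.
 */
class GenStampTracker {
  private long currentGenStamp;
  private long impendingGenStamp;

  synchronized void onSetGenStampV2(long genStamp) {
    impendingGenStamp = genStamp; // seen in edits, block not added yet
  }

  synchronized void onAddBlock() {
    // Set the current genstamp to the impending genstamp.
    currentGenStamp = impendingGenStamp;
  }

  /** True if a reported genstamp should be postponed (requeued). */
  synchronized boolean isGenStampInFuture(long reportedGenStamp) {
    return reportedGenStamp > currentGenStamp;
  }
}
{code}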
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967962#comment-16967962 ] Chen Liang commented on HDFS-14941: --- Post v006 patch after offline discussion with [~shv]. The diff is changing the unit test to check for correctness after failover.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14941: -- Attachment: HDFS-14941.006.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients
[ https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967931#comment-16967931 ] Anu Engineer commented on HDDS-2384: Nope, we can encode the data as a set of small packets, or read it as a sequence of small packets. Say an 8KB/64KB buffer, and we read and write the data continually to the underlying disk.
> Large chunks during write can have memory pressure on DN with multiple clients
> ------------------------------------------------------------------------------
>
> Key: HDDS-2384
> URL: https://issues.apache.org/jira/browse/HDDS-2384
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Anu Engineer
> Priority: Major
> Labels: performance
>
> During large file writes, it ends up writing {{16 MB}} chunks:
> https://github.com/apache/hadoop-ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java#L691
> In large clusters, 100s of clients may connect to a DN. In such cases, depending on the incoming write workload, memory load on the DN can increase significantly.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
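A hedged sketch of the small-buffer read/write loop described above, assuming a 64 KB buffer; this is illustrative only, not the Ozone datanode code:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamingChunkWriter {
  private static final int BUFFER_SIZE = 64 * 1024; // assumed 64 KB buffer

  // Stream an incoming chunk to the underlying disk in small fixed-size
  // buffers instead of materializing the whole 16 MB chunk in memory.
  static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n); // write each small packet as it arrives
      total += n;
    }
    out.flush();
    return total;
  }
}
{code}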
[jira] [Commented] (HDDS-2384) Large chunks during write can have memory pressure on DN with multiple clients
[ https://issues.apache.org/jira/browse/HDDS-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967923#comment-16967923 ] Rajesh Balamohan commented on HDDS-2384: Thanks [~aengineer] for sharing the details. Wouldn't this still need 16 MB mem when constructing ChunkInfo from protobuf?
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967912#comment-16967912 ] Hadoop QA commented on HDFS-14941: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 50s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 22m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 13s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 20m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 20m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 3m 25s{color} | {color:green} root: The patch generated 0 new + 705 unchanged - 1 fixed = 705 total (was 706) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 15s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 14s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 9m 48s{color} | {color:red} hadoop-common in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}101m 22s{color} | {color:green} hadoop-hdfs in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 52s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}241m 44s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.security.TestFixKerberosTicketOrder | | | hadoop.conf.TestCommonConfigurationFields | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14941 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984985/HDFS-14941.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux a61f79ae4988 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bfb8f28 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | |
[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14941: -- Attachment: HDFS-14941.005.patch
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967905#comment-16967905 ] Chen Liang commented on HDFS-14941: --- Thanks for the catch [~shv], uploaded v05 patch.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs
[ https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967903#comment-16967903 ] Erik Krogen commented on HDFS-14884: [~weichiu] any chance you can review the branch-2 port? I am hoping to save myself the time of understanding what's going on here :) Let me know if you don't have the time.
> Add sanity check that zone key equals feinfo key while setting Xattrs
> ----------------------------------------------------------------------
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: encryption, hdfs
> Affects Versions: 2.11.0
> Reporter: Mukul Kumar Singh
> Assignee: Yuval Degani
> Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.11.0
> Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch
>
> Currently, it is possible to set an extended attribute where the zone key is not the same as the feinfo key. This jira will add a precondition before setting this.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
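For context, the sanity check described amounts to a precondition along these lines (a hedged sketch; the class, method, and message wording are placeholders, not the actual patch):
{code:java}
import com.google.common.base.Preconditions;

final class EncryptionZoneSanityCheck {
  // Hypothetical guard: refuse to set an encryption xattr whose
  // FileEncryptionInfo key name disagrees with the zone's key name.
  static void checkZoneKeyMatchesFeInfo(String zoneKeyName,
      String feInfoKeyName) {
    Preconditions.checkArgument(zoneKeyName.equals(feInfoKeyName),
        "Key name %s in the file encryption info does not match the "
            + "encryption zone key name %s", feInfoKeyName, zoneKeyName);
  }
}
{code}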
[jira] [Comment Edited] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967888#comment-16967888 ] Konstantin Shvachko edited comment on HDFS-14941 at 11/5/19 9:40 PM: - Small thing. Still one "cached" genstamp remaining, should be "impending". {code} * Set the current genstamp to the impending genstamp. {code} Don't need Jenkins build for that, since it is a comment change only. was (Author: shv): Small thing. Still one "cached" genstamp remaining, should be "impending". {code} * Set the current genstamp to the impending genstamp. {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967888#comment-16967888 ] Konstantin Shvachko commented on HDFS-14941: Small thing. Still one "cached" genstamp remaining, should be "impending". {code} * Set the current genstamp to the impending genstamp. {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 104857600
[ https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967877#comment-16967877 ] Hadoop QA commented on HDFS-14940: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 24s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 59s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 13s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}146m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14940 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984989/HDFS-14940.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 71e197b546db 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bfb8f28 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28256/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28256/testReport/ | | Max. process+thread count | 3396 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28256/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |
[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967873#comment-16967873 ] Tsz-wo Sze commented on HDDS-2372: -- It makes sense to check the chunk file again after a temporary chunk file failure to avoid the problem here. This solution is simple, and no synchronization is needed.
> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Assignee: Shashikant Banerjee
> Priority: Critical
>
> Found it on a k8s-based test cluster using a simple 3-node cluster and the HDDS-2327 freon test. After a while the StateMachine became unhealthy after this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: java.nio.file.NoSuchFileException: /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
> {code}
> Can be reproduced.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
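A hedged sketch of the re-check being suggested: on a {{NoSuchFileException}} for the temporary chunk file, look for the committed chunk file before surfacing the error. Names and structure are illustrative, not the Ozone code:
{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.NoSuchFileException;

final class ChunkWriteRecovery {
  interface ChunkWriterFn {
    void write(File target, byte[] data) throws IOException;
  }

  // If writing the tmp chunk file fails because it vanished, re-check
  // whether the final chunk file already exists (e.g. committed by a
  // retried write) and treat that as success; otherwise rethrow.
  static void writeChunk(File tmpChunkFile, File chunkFile, byte[] data,
      ChunkWriterFn writer) throws IOException {
    try {
      writer.write(tmpChunkFile, data);
    } catch (NoSuchFileException e) {
      if (chunkFile.exists()) {
        return; // chunk already committed by another attempt
      }
      throw e; // genuine failure: neither tmp nor final file is present
    }
  }
}
{code}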
[jira] [Commented] (HDFS-14384) When lastLocatedBlock token expire, it will take 1~3s second to refetch it.
[ https://issues.apache.org/jira/browse/HDFS-14384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967865#comment-16967865 ] Hadoop QA commented on HDFS-14384: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 3m 11s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-hdfs-project: The patch generated 2 new + 54 unchanged - 0 fixed = 56 total (was 54) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 38s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}166m 18s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.protocol.datatransfer.sasl.TestSaslDataTransfer | | | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy | | | hadoop.hdfs.TestFileChecksumCompositeCrc | | | hadoop.hdfs.TestErasureCodingExerciseAPIs | | | hadoop.hdfs.server.datanode.TestDataNodeLifeline | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.TestDFSPermission | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14384 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12982833/HDFS-14384.002.patch | | Optional Tests | dupname asflicense compile
[jira] [Commented] (HDFS-14922) On StartUp , Snapshot modification time got changed
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967843#comment-16967843 ] Hadoop QA commented on HDFS-14922: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 48s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 49s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 623 unchanged - 1 fixed = 624 total (was 624) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 55s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 23s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}173m 35s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer | | | hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate | | | hadoop.hdfs.tools.TestDFSZKFailoverController | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-14922 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12984981/HDFS-14922.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 35f9990087a3 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bfb8f28 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28253/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28253/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Updated] (HDDS-2195) Apply spotbugs check to test code
[ https://issues.apache.org/jira/browse/HDDS-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2195: --- Labels: newbie (was: ) > Apply spotbugs check to test code > - > > Key: HDDS-2195 > URL: https://issues.apache.org/jira/browse/HDDS-2195 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Priority: Major > Labels: newbie > > The goal of this task is to [enable Spotbugs to run on test > code|https://spotbugs.github.io/spotbugs-maven-plugin/spotbugs-mojo.html#includeTests], > and fix all issues it reports (both to improve code and to avoid breaking > CI). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2195) Apply spotbugs check to test code
[ https://issues.apache.org/jira/browse/HDDS-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai reassigned HDDS-2195: -- Assignee: (was: Attila Doroszlai) > Apply spotbugs check to test code > - > > Key: HDDS-2195 > URL: https://issues.apache.org/jira/browse/HDDS-2195 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: test >Reporter: Attila Doroszlai >Priority: Major > > The goal of this task is to [enable Spotbugs to run on test > code|https://spotbugs.github.io/spotbugs-maven-plugin/spotbugs-mojo.html#includeTests], > and fix all issues it reports (both to improve code and to avoid breaking > CI). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2405) int2ByteString unnecessary byte array allocation
[ https://issues.apache.org/jira/browse/HDDS-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2405: --- Description: {{int2ByteString}} implementations (currently duplicated in [RatisHelper|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java#L280-L289] and [Checksum|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L64-L73], but the first one is being removed in HDDS-2375) result in unnecessary byte array allocations: # {{ByteString.Output}} creates 128-byte buffer by default, which is too large for writing a single int # {{DataOutputStream}} allocates an [extra 8-byte array|https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/io/DataOutputStream.java#l204], used only for writing longs # {{ByteString.Output}} also creates 10-element array for {{flushedBuffers}} was: {{int2ByteString}} implementations (currently duplicated in [RatisHelper|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java#L280-L289] and [Checksum|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L64-L73], but the first one is being removed in HDDS-2375) result in unnecessary byte array allocations: # {{ByteString.Output}} creates 128-byte buffer by default, which is too large for writing a single int # {{DataOutputStream}} allocates an [extra 8-byte array|https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/io/DataOutputStream.java#l204], used only for writing longs > int2ByteString unnecessary byte array allocation > > > Key: HDDS-2405 > URL: https://issues.apache.org/jira/browse/HDDS-2405 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > > {{int2ByteString}} implementations (currently duplicated in > [RatisHelper|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/ratis/RatisHelper.java#L280-L289] > and > [Checksum|https://github.com/apache/hadoop-ozone/blob/6b2cda125b3647870ef5b01cf64e3b3e4cdc55db/hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/Checksum.java#L64-L73], > but the first one is being removed in HDDS-2375) result in unnecessary byte > array allocations: > # {{ByteString.Output}} creates 128-byte buffer by default, which is too > large for writing a single int > # {{DataOutputStream}} allocates an [extra 8-byte > array|https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/io/DataOutputStream.java#l204], > used only for writing longs > # {{ByteString.Output}} also creates 10-element array for {{flushedBuffers}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
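For illustration, a minimal allocation-conscious {{int2ByteString}} sketch (assuming plain {{com.google.protobuf.ByteString}}; Ozone may use a shaded variant), writing the int big-endian into exactly four bytes the way {{DataOutputStream.writeInt}} would, with none of the three allocations listed above:
{code:java}
import com.google.protobuf.ByteString;

public final class IntToByteString {
  private IntToByteString() { }

  static ByteString int2ByteString(int n) {
    // Exactly the four bytes needed, big-endian, matching DataOutputStream.writeInt.
    final byte[] bytes = new byte[4];
    bytes[0] = (byte) (n >>> 24);
    bytes[1] = (byte) (n >>> 16);
    bytes[2] = (byte) (n >>> 8);
    bytes[3] = (byte) n;
    // copyFrom makes one defensive copy; UnsafeByteOperations.unsafeWrap(bytes)
    // could avoid even that, since the array does not escape this method.
    return ByteString.copyFrom(bytes);
  }
}
{code}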
[jira] [Commented] (HDFS-14929) Hadoop 2.9.1 rename functionality infrequent breaking
[ https://issues.apache.org/jira/browse/HDFS-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967797#comment-16967797 ] Wei-Chiu Chuang commented on HDFS-14929: Is this a duplicate of HDFS-14947?
> Hadoop 2.9.1 rename functionality infrequent breaking
> --
>
> Key: HDFS-14929
> URL: https://issues.apache.org/jira/browse/HDFS-14929
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Affects Versions: 2.9.1
>Reporter: abhishek sahani
>Priority: Major
>
> We are infrequently seeing the rename functionality not working properly. In the logs the rename appears to succeed, but in the UI, and when listing the files using ./hdfs dfs -ls -R, the file is not present.
> DEBUG hdfs.StateChange: *DIR* NameNode.rename: /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet to /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet
> 19/10/23 19:06:41 DEBUG hdfs.StateChange: *DIR* NameNode.rename: /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet to /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet
> 19/10/23 19:06:41 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet to /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet
> 19/10/23 19:06:41 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet to /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet
> 19/10/23 19:06:41 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet is renamed to /topics/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic+10+00+99.parquet
> 19/10/23 19:06:41 DEBUG namenode.FSEditLog: logEdit [RpcEdit op:RenameOldOp [length=0, src=/topics/+tmp/datapipelinefinaltest2.5da59e664cedfd00090d3757.dataPipeLineEvent_15.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_15/year=2019/month=10/day=08/hour=19/894901ae-5913-4ad9-8d65-7071655c2db0_tmp.parquet,
> >
[jira] [Commented] (HDFS-14947) infrequent data loss due to rename functionality breaking
[ https://issues.apache.org/jira/browse/HDFS-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967794#comment-16967794 ] Wei-Chiu Chuang commented on HDFS-14947: Hey [~abhishek.sahani] thanks for the details.
bq. Firstly the connector task creates a temporary file for partition assigned to it in hdfs inmemory file system and later after certain rotation time temporary file is closed and persisted to filesystem and later the temp file is also renamed in hdfs.
In other words, is RAM_DISK / LAZY_PERSIST used (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/MemoryStorage.html)? That can be an issue. Neither Cloudera nor Hortonworks supports this feature officially, and I know it's not as robust as it should be. Still, a file going missing without a reason doesn't make sense to me.
> infrequent data loss due to rename functionality breaking
> -
>
> Key: HDFS-14947
> URL: https://issues.apache.org/jira/browse/HDFS-14947
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: abhishek sahani
>Priority: Critical
>
> We are facing an issue where data is getting lost from HDFS during rename. In the NameNode logs we see that the file is renamed successfully, but after the rename the file is not present at the destination location, and thus we are losing the data.
>
> namenode logs:
> 19/10/31 16:54:09 DEBUG top.TopAuditLogger: --- logged event for top service: allowed=true ugi=root (auth:SIMPLE) ip=/*.*.*.* cmd=rename src=/topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet dst=/topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet perm=root:supergroup:rw-r--r--
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
> 19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 8 on 9000: responding to org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 Call#48333 Retry#0
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
> 19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 6 on 9000: org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 Call#48337 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet is renamed to >
[jira] [Updated] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 1048576000g
[ https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hemanthboyina updated HDFS-14940: - Attachment: HDFS-14940.001.patch Status: Patch Available (was: Open)
> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 1048576000g/1048p/1e
> ---
>
> Key: HDFS-14940
> URL: https://issues.apache.org/jira/browse/HDFS-14940
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer mover
>Affects Versions: 3.1.1
> Environment: 3 Node HA Setup
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: BalancerBW.PNG, HDFS-14940.001.patch
>
> HDFS Balancer: getBalancerBandwidth displays wrong values for the maximum network bandwidth used by the datanode while the network bandwidth is set with values such as 1048576000g/1048p/1e
> Steps:
> * Set the balancer bandwidth with the setBalancerBandwidth command and values such as [1048576000g/1048p/1e]
> * Check the bandwidth used by the datanode during HDFS block balancing with the command "hdfs dfsadmin -getBalancerBandwidth"; it will display different values, not the same value as set
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14940) HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 104857600
[ https://issues.apache.org/jira/browse/HDFS-14940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967772#comment-16967772 ] hemanthboyina commented on HDFS-14940: -- Thanks for the suggestion [~kihwal]. I have added a constant for the maximum bandwidth per datanode of 1 TB/sec and made an upper-bound check for the bandwidth. Attached the patch, please review.
> HDFS Balancer : getBalancerBandwidth displaying wrong values for the maximum network bandwidth used by the datanode while network bandwidth set with values as 1048576000g/1048p/1e
> ---
>
> Key: HDFS-14940
> URL: https://issues.apache.org/jira/browse/HDFS-14940
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: balancer mover
>Affects Versions: 3.1.1
> Environment: 3 Node HA Setup
>Reporter: Souryakanta Dwivedy
>Priority: Minor
> Attachments: BalancerBW.PNG
>
> HDFS Balancer: getBalancerBandwidth displays wrong values for the maximum network bandwidth used by the datanode while the network bandwidth is set with values such as 1048576000g/1048p/1e
> Steps:
> * Set the balancer bandwidth with the setBalancerBandwidth command and values such as [1048576000g/1048p/1e]
> * Check the bandwidth used by the datanode during HDFS block balancing with the command "hdfs dfsadmin -getBalancerBandwidth"; it will display different values, not the same value as set
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
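As a rough illustration of the upper-bound check described in the comment above (the constant name, value, and placement are assumptions, not the actual HDFS-14940 patch):
{code:java}
public final class BalancerBandwidthCheck {
  /** Assumed cap from the comment above: 1 TB/s per datanode. */
  static final long MAX_BANDWIDTH_PER_DN = 1024L * 1024L * 1024L * 1024L;

  static long validateBandwidth(long requested) {
    if (requested < 0) {
      throw new IllegalArgumentException(
          "Bandwidth cannot be negative: " + requested);
    }
    // Inputs like 1048576000g/1048p/1e parse to values no datanode can use;
    // bounding them keeps getBalancerBandwidth reporting meaningful numbers.
    return Math.min(requested, MAX_BANDWIDTH_PER_DN);
  }
}
{code}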
[jira] [Updated] (HDFS-14947) infrequent data loss due to rename functionality breaking
[ https://issues.apache.org/jira/browse/HDFS-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] abhishek sahani updated HDFS-14947: --- Priority: Blocker (was: Critical)
> infrequent data loss due to rename functionality breaking
> -
>
> Key: HDFS-14947
> URL: https://issues.apache.org/jira/browse/HDFS-14947
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: abhishek sahani
>Priority: Blocker
>
> We are facing an issue where data is getting lost from HDFS during rename. In the NameNode logs we see that the file is renamed successfully, but after the rename the file is not present at the destination location, and thus we are losing the data.
>
> namenode logs:
> 19/10/31 16:54:09 DEBUG top.TopAuditLogger: --- logged event for top service: allowed=true ugi=root (auth:SIMPLE) ip=/*.*.*.* cmd=rename src=/topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet dst=/topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet perm=root:supergroup:rw-r--r--
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
> 19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 8 on 9000: responding to org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 Call#48333 Retry#0
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
> 19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 6 on 9000: org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 Call#48337 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet is renamed to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14947) infrequent data loss due to rename functionality breaking
[ https://issues.apache.org/jira/browse/HDFS-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] abhishek sahani updated HDFS-14947: --- Priority: Critical (was: Blocker)
> infrequent data loss due to rename functionality breaking
> -
>
> Key: HDFS-14947
> URL: https://issues.apache.org/jira/browse/HDFS-14947
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
>Affects Versions: 2.7.3
>Reporter: abhishek sahani
>Priority: Critical
>
> We are facing an issue where data is getting lost from HDFS during rename. In the NameNode logs we see that the file is renamed successfully, but after the rename the file is not present at the destination location, and thus we are losing the data.
>
> namenode logs:
> 19/10/31 16:54:09 DEBUG top.TopAuditLogger: --- logged event for top service: allowed=true ugi=root (auth:SIMPLE) ip=/*.*.*.* cmd=rename src=/topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet dst=/topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet perm=root:supergroup:rw-r--r--
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* NameSystem.renameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
> 19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 8 on 9000: responding to org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 Call#48333 Retry#0
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.renameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
> 19/10/31 16:54:09 DEBUG ipc.Server: IPC Server handler 6 on 9000: org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from *.*.*.*:39854 Call#48337 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
> 19/10/31 16:54:09 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: /topics/+tmp/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/351bffa9-15e3-427b-9e02-c9e8823d68d6_tmp.parquet is renamed to /topics/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic/tenant=5da59e664cedfd00090d3757/groupid=5da59e664cedfd00090d3758/project=5da59e664cedfd00090d3759/name=dataPipeLineEvent_17/year=2019/month=10/day=16/hour=17/datapipelinefinaltest14.5da59e664cedfd00090d3757.dataPipeLineEvent_17.topic+9+00+99.parquet
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing
[ https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967747#comment-16967747 ] Chen Liang commented on HDFS-14806: --- The remaining javadoc warnings are not introduced by this patch, and the failed tests all passed in my local run.
> Bootstrap standby may fail if used in-progress tailing
> --
>
> Key: HDFS-14806
> URL: https://issues.apache.org/jira/browse/HDFS-14806
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Affects Versions: 3.3.0
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch, HDFS-14806.003.patch, HDFS-14806.004.patch
>
> One issue we came across was that if in-progress tailing is enabled, bootstrapping a standby could fail.
> When in-progress tailing is enabled, bootstrap uses the RPC mechanism to get edits. There is a config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} that sets an upper bound on how many txnids can be included in one RPC call. The default is 5000, meaning the bootstrapping NN (say NN1) can pull at most 5000 edits from the JNs. However, as part of bootstrap, NN1 queries another NN (say NN2) for NN2's current transaction ID, and NN2 may return a state that is > 5000 txnids ahead of NN1's current image. But NN1 can only see 5000 more txnids from the JNs. At this point NN1 panics: because the txnid returned by the JNs is behind NN2's returned state, the bootstrap then fails.
> Essentially, bootstrap standby can fail if both of the two following conditions are met:
> # in-progress tailing is enabled AND
> # the bootstrapping NN is too far (>5000 txids) behind
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to some super large value allowed bootstrap to continue, but this is hardly the ideal solution.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
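For reference, the workaround mentioned at the end of the description can be expressed with the standard {{Configuration}} API; the value below is an arbitrary example, not a recommended setting:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class TailEditsWorkaround {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Default is 5000; a bootstrapping NN more than 5000 txids behind can fail.
    conf.setInt("dfs.ha.tail-edits.qjm.rpc.max-txns", 100000);
    System.out.println(conf.getInt("dfs.ha.tail-edits.qjm.rpc.max-txns", 5000));
  }
}
{code}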
[jira] [Work logged] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd
[ https://issues.apache.org/jira/browse/HDDS-2321?focusedWorklogId=338910=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338910 ] ASF GitHub Bot logged work on HDDS-2321: Author: ASF GitHub Bot Created on: 05/Nov/19 18:19 Start Date: 05/Nov/19 18:19 Worklog Time Spent: 10m Work Description: xiaoyuyao commented on pull request #110: HDDS-2321. Ozone Block Token verify should not apply to all datanode … URL: https://github.com/apache/hadoop-ozone/pull/110 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338910) Time Spent: 20m (was: 10m)
> Ozone Block Token verify should not apply to all datanode cmd
> -
>
> Key: HDDS-2321
> URL: https://issues.apache.org/jira/browse/HDDS-2321
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
>Affects Versions: 0.4.1
>Reporter: Nilotpal Nandi
>Assignee: Xiaoyu Yao
>Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The DN container protocol has commands sent from SCM or other DNs, which do not bear an OM block token the way OM client requests do. We should restrict the OM block token check to only those commands issued from OM clients.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2321) Ozone Block Token verify should not apply to all datanode cmd
[ https://issues.apache.org/jira/browse/HDDS-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDDS-2321: - Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thanks all for the reviews. I've merged the PR to master.
> Ozone Block Token verify should not apply to all datanode cmd
> -
>
> Key: HDDS-2321
> URL: https://issues.apache.org/jira/browse/HDDS-2321
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
>Affects Versions: 0.4.1
>Reporter: Nilotpal Nandi
>Assignee: Xiaoyu Yao
>Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The DN container protocol has commands sent from SCM or other DNs, which do not bear an OM block token the way OM client requests do. We should restrict the OM block token check to only those commands issued from OM clients.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
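To illustrate the intent of the change, a hypothetical sketch (the enum below stands in for the real datanode command-type proto, and the chosen command set is an assumption, not the merged HDDS-2321 code):
{code:java}
import java.util.EnumSet;

public final class BlockTokenCheckSketch {
  /** Simplified stand-in for the datanode command-type enum; values illustrative. */
  enum CmdType { ReadChunk, WriteChunk, GetBlock, PutBlock, CloseContainer }

  /** Commands assumed to originate from OM clients and thus carry a block token. */
  private static final EnumSet<CmdType> CLIENT_CMDS =
      EnumSet.of(CmdType.ReadChunk, CmdType.WriteChunk,
          CmdType.GetBlock, CmdType.PutBlock);

  static boolean requiresBlockToken(CmdType cmd) {
    // SCM- or DN-issued commands (e.g. CloseContainer) skip the block token check.
    return CLIENT_CMDS.contains(cmd);
  }
}
{code}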
[jira] [Updated] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-14941: -- Attachment: HDFS-14941.004.patch
> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, HDFS-14941.003.patch, HDFS-14941.004.patch
>
> Recently we encountered an issue where, after a failover, the NameNode complains about corrupted files/missing blocks. The blocks did recover after full block reports, so the blocks are not actually missing. After further investigation, we believe this is what happened:
> First of all, on the SbN, it is possible that it receives block reports before the corresponding edit tailing has happened, in which case the SbN postpones processing the DN block report, handled by the guarding logic below:
> {code:java}
> if (shouldPostponeBlocksFromFuture &&
> namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
> }
> {code}
> Basically, if a reported block has a future generation stamp, the DN report gets requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
> // allocate new block, record block locations in INode.
> newBlock = createNewBlock();
> INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
> saveAllocatedBlock(src, inodesInPath, newBlock, targets);
> persistNewBlock(src, pendingFile);
> offset = pendingFile.computeFileSize();
> {code}
> The line {{newBlock = createNewBlock();}} would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on the Standby, while the following line {{persistNewBlock(src, pendingFile);}} would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on the Standby.
> Then the race condition is this: imagine the Standby has just processed {{OP_SET_GENSTAMP_V2}} but not yet {{OP_ADD_BLOCK}} (they may just happen to be in different segments). Now a block report with the new generation stamp comes in. Since the genstamp bump has already been processed, the reported block may not be considered a future block, so the guarding logic passes. But actually the block hasn't been added to the block map, because the second edit is yet to be tailed. So the block then gets added to the invalidated block list and we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no information about this block until the next full block report. So after a failover, the NN marks it as corrupt.
> This issue won't happen, though, if both edit entries get tailed together, so that no IBR processing can happen in between. But in our case we set the edit tailing interval very low (to allow Standby reads), so under high workload there is a much higher chance that the two entries are tailed separately, causing the issue.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14941) Potential editlog race condition can cause corrupted file
[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967735#comment-16967735 ] Chen Liang commented on HDFS-14941: --- v004 patch to fix checkstyle warnings.
> Potential editlog race condition can cause corrupted file
> -
>
> Key: HDFS-14941
> URL: https://issues.apache.org/jira/browse/HDFS-14941
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
>Reporter: Chen Liang
>Assignee: Chen Liang
>Priority: Major
> Labels: ha
> Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, HDFS-14941.003.patch, HDFS-14941.004.patch
>
> Recently we encountered an issue where, after a failover, the NameNode complains about corrupted files/missing blocks. The blocks did recover after full block reports, so the blocks are not actually missing. After further investigation, we believe this is what happened:
> First of all, on the SbN, it is possible that it receives block reports before the corresponding edit tailing has happened, in which case the SbN postpones processing the DN block report, handled by the guarding logic below:
> {code:java}
> if (shouldPostponeBlocksFromFuture &&
> namesystem.isGenStampInFuture(iblk)) {
> queueReportedBlock(storageInfo, iblk, reportedState,
> QUEUE_REASON_FUTURE_GENSTAMP);
> continue;
> }
> {code}
> Basically, if a reported block has a future generation stamp, the DN report gets requeued.
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
> // allocate new block, record block locations in INode.
> newBlock = createNewBlock();
> INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
> saveAllocatedBlock(src, inodesInPath, newBlock, targets);
> persistNewBlock(src, pendingFile);
> offset = pendingFile.computeFileSize();
> {code}
> The line {{newBlock = createNewBlock();}} would log an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on the Standby, while the following line {{persistNewBlock(src, pendingFile);}} would log another edit entry {{OP_ADD_BLOCK}} to actually add the block on the Standby.
> Then the race condition is this: imagine the Standby has just processed {{OP_SET_GENSTAMP_V2}} but not yet {{OP_ADD_BLOCK}} (they may just happen to be in different segments). Now a block report with the new generation stamp comes in. Since the genstamp bump has already been processed, the reported block may not be considered a future block, so the guarding logic passes. But actually the block hasn't been added to the block map, because the second edit is yet to be tailed. So the block then gets added to the invalidated block list and we saw messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no information about this block until the next full block report. So after a failover, the NN marks it as corrupt.
> This issue won't happen, though, if both edit entries get tailed together, so that no IBR processing can happen in between. But in our case we set the edit tailing interval very low (to allow Standby reads), so under high workload there is a much higher chance that the two entries are tailed separately, causing the issue.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
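One hypothetical way to close the window described above, sketched as a standalone decision function (an illustration of the idea, not the attached patch): also postpone a reported block that is absent from the block map, since its {{OP_ADD_BLOCK}} may not have been tailed yet.
{code:java}
public final class ReportQueueSketch {
  /**
   * Queueing decision corresponding to the guard quoted in the description,
   * extended so a block missing from the block map is requeued like a
   * future-genstamp block instead of being invalidated.
   */
  static boolean shouldPostpone(boolean postponeBlocksFromFuture,
      boolean genStampInFuture, boolean presentInBlocksMap) {
    return postponeBlocksFromFuture && (genStampInFuture || !presentInBlocksMap);
  }
}
{code}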
[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967713#comment-16967713 ] Hudson commented on HDFS-14775: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17609 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17609/]) HDFS-14775. Add Timestamp for longest FSN write/read lock held log. (inigoiri: rev bfb8f28cc995241e7387ceba8e14791b8c121956) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystemLock.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystemLock.java
> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
> Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch
>
> HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
> In some conditions, we need to locate the detailed call information (user, IP, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. I think we should add the timestamp for the {{longestWriteLockHeldStackTrace}}.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
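As a sketch of what the improvement adds (names and format are illustrative; see the committed change for the real code):
{code:java}
import java.text.SimpleDateFormat;
import java.util.Date;

public final class LockReportStamp {
  /** Attach the wall-clock capture time to the longest-lock-held report. */
  static String stampLongestHold(long heldIntervalMs, String stackTrace) {
    String when = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS").format(new Date());
    // The timestamp lets the matching audit-log entry be located even though
    // the report itself is throttled to one log line per interval.
    return String.format("Longest write-lock held at %s for %dms via:%n%s",
        when, heldIntervalMs, stackTrace);
  }
}
{code}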
[jira] [Commented] (HDFS-14949) HttpFS does not support getServerDefaults()
[ https://issues.apache.org/jira/browse/HDFS-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967711#comment-16967711 ] hemanthboyina commented on HDFS-14949: -- I have gone through some FileSystem APIs: getServerDefaults() was deprecated, and getServerDefaults(Path) in turn calls getServerDefaults(). I think we need to add getServerDefaults(Path p). Please correct me if I am wrong. Your suggestions, [~elgoiri] [~kihwal]?
> HttpFS does not support getServerDefaults()
> ---
>
> Key: HDFS-14949
> URL: https://issues.apache.org/jira/browse/HDFS-14949
> Project: Hadoop HDFS
> Issue Type: Improvement
>Reporter: Kihwal Lee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14949.001.patch, HDFS-14949.002.patch, HDFS-14949.003.patch
>
> For the HttpFS server to function as a fully webhdfs-compatible service, getServerDefaults() support is needed. It is increasingly used in new features and improvements.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
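A rough sketch of the overload under discussion (the class name and the delegation shown are assumptions, not the attached patch):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsServerDefaults;
import org.apache.hadoop.fs.Path;

abstract class HttpFSFileSystemSketch extends FileSystem {
  @Override
  public FsServerDefaults getServerDefaults(Path p) throws IOException {
    // A real implementation would issue the server-defaults REST call to the
    // HttpFS server here rather than fall back to the deprecated no-arg method.
    return getServerDefaults();
  }
}
{code}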
[jira] [Updated] (HDFS-14922) On StartUp , Snapshot modification time got changed
[ https://issues.apache.org/jira/browse/HDFS-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hemanthboyina updated HDFS-14922: - Attachment: HDFS-14922.004.patch > On StartUp , Snapshot modification time got changed > --- > > Key: HDFS-14922 > URL: https://issues.apache.org/jira/browse/HDFS-14922 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: hemanthboyina >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14922.001.patch, HDFS-14922.002.patch, > HDFS-14922.003.patch, HDFS-14922.004.patch > > > Snapshot modification time got changed on namenode restart -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14928) UI: unifying the WebUI across different components.
[ https://issues.apache.org/jira/browse/HDFS-14928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967702#comment-16967702 ] Íñigo Goiri commented on HDFS-14928: Yes, please, go ahead with #1. > UI: unifying the WebUI across different components. > --- > > Key: HDFS-14928 > URL: https://issues.apache.org/jira/browse/HDFS-14928 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ui >Reporter: Xieming Li >Priority: Trivial > Attachments: DN_orig.png, DN_with_legend.png.png, DN_wo_legend.png, > NN_orig.png, NN_with_legend.png, NN_wo_legend.png, RBF_orig.png, > RBF_with_legend.png, RBF_wo_legend.png > > > The WebUI of different components could be unified. > *Router:* > |Current| !RBF_orig.png|width=500! | > |Proposed 1 (With Icon) | !RBF_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)|!RBF_with_legend.png|width=500! | > *NameNode:* > |Current| !NN_orig.png|width=500! | > |Proposed 1 (With Icon) | !NN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !NN_with_legend.png|width=500! | > *DataNode:* > |Current| !DN_orig.png|width=500! | > |Proposed 1 (With Icon) | !DN_wo_legend.png|width=500! | > |Proposed 2 (With Icon and Legend)| !DN_with_legend.png.png|width=500! | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967691#comment-16967691 ] Íñigo Goiri commented on HDFS-14775: Thanks [~zhangchen] for the patch and [~xkrogen] and [~hexiaoqiao] for the reviews. Committed to trunk.
> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
> Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch
>
> HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
> In some conditions, we need to locate the detailed call information (user, IP, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. I think we should add the timestamp for the {{longestWriteLockHeldStackTrace}}.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-14775: --- Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available)
> Add Timestamp for longest FSN write/read lock held log
> --
>
> Key: HDFS-14775
> URL: https://issues.apache.org/jira/browse/HDFS-14775
> Project: Hadoop HDFS
> Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch
>
> HDFS-13946 improved the log for the longest read/write lock held time; it's a very useful improvement.
> In some conditions, we need to locate the detailed call information (user, IP, path, etc.) for the longest lock holder, but the default throttle interval (10s) is too long to find the corresponding audit log. I think we should add the timestamp for the {{longestWriteLockHeldStackTrace}}.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-4935) add symlink support to HttpFS server side
[ https://issues.apache.org/jira/browse/HDFS-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-4935: Attachment: HDFS-4935.001.patch > add symlink support to HttpFS server side > - > > Key: HDFS-4935 > URL: https://issues.apache.org/jira/browse/HDFS-4935 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.3.0 > Environment: followup on HADOOP-8040 >Reporter: Alejandro Abdelnur >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-4935.001.patch > > > follow up on HADOOP-8040 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
[ https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2270: --- Status: Patch Available (was: In Progress)
> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> --
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile into a byte[] and then parses it into ContainerProtos.Container2BCSIDMapProto. The buffer copying can be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
> byte[] container2BCSIDData = IOUtils.toByteArray(fin);
> ContainerProtos.Container2BCSIDMapProto proto =
> ContainerProtos.Container2BCSIDMapProto
> .parseFrom(container2BCSIDData);
> ...
> }
> {code}
> - persistContainerSet(..) has a similar problem.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
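For illustration, the intermediate byte[] can be dropped by parsing straight from the stream (import path assumed; a sketch, not the attached change):
{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;

final class SnapshotLoadSketch {
  static ContainerProtos.Container2BCSIDMapProto load(File snapshotFile)
      throws IOException {
    try (FileInputStream fin = new FileInputStream(snapshotFile)) {
      // Generated protobuf messages can parse directly from an InputStream,
      // so IOUtils.toByteArray and its full-size extra copy are unnecessary.
      return ContainerProtos.Container2BCSIDMapProto.parseFrom(fin);
    }
  }
}
{code}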
[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=338829=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338829 ] ASF GitHub Bot logged work on HDDS-2064: Author: ASF GitHub Bot Created on: 05/Nov/19 16:06 Start Date: 05/Nov/19 16:06 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #119: HDDS-2064. Add tests for incorrect OM HA config when node ID or RPC address is not configured URL: https://github.com/apache/hadoop-ozone/pull/119 ## What changes were proposed in this pull request? Add two unit tests for HDDS-2162, when OM service ID is specified: (1) Cluster should fail to start if a list of OM Node IDs is **not** specified; (2) Cluster should fail to start if a list of OM Node IDs is specified, but OM RPC address is **not** specified. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2064 ## How was this patch tested? Run the two newly added unit tests in this patch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338829) Time Spent: 2h (was: 1h 50m)
> Add tests for incorrect OM HA config when node ID or RPC address is not configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Labels: pull-request-available
> Time Spent: 2h
> Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This happens when local OM doesn't match any `ozone.om.address.idX.omX` in the config.-
> Due to the refactoring done in HDDS-2162, this fix has been included in that commit. I will repurpose the jira to add some tests for the HA config.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=338822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338822 ] ASF GitHub Bot logged work on HDDS-2064: Author: ASF GitHub Bot Created on: 05/Nov/19 16:04 Start Date: 05/Nov/19 16:04 Worklog Time Spent: 10m Work Description: smengcl commented on issue #1398: HDDS-2064. OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured incorrectly URL: https://github.com/apache/hadoop/pull/1398#issuecomment-549885489 Due to the refactoring done in HDDS-2162, this fix has been included in that commit. I will repurpose the jira to add a unit test for the HA config. I'm closing this PR. Will open another one in the hadoop-ozone repo. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338822) Time Spent: 1h 40m (was: 1.5h)
> Add tests for incorrect OM HA config when node ID or RPC address is not configured
> --
>
> Key: HDDS-2064
> URL: https://issues.apache.org/jira/browse/HDDS-2064
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist.-
> -Root cause:-
> -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This happens when local OM doesn't match any `ozone.om.address.idX.omX` in the config.-
> Due to the refactoring done in HDDS-2162, this fix has been included in that commit. I will repurpose the jira to add some tests for the HA config.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?focusedWorklogId=338823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338823 ] ASF GitHub Bot logged work on HDDS-2064: Author: ASF GitHub Bot Created on: 05/Nov/19 16:04 Start Date: 05/Nov/19 16:04 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #1398: HDDS-2064. OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured incorrectly URL: https://github.com/apache/hadoop/pull/1398 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338823) Time Spent: 1h 50m (was: 1h 40m) > Add tests for incorrect OM HA config when node ID or RPC address is not > configured > -- > > Key: HDDS-2064 > URL: https://issues.apache.org/jira/browse/HDDS-2064 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but > `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't > exist.- > -Root cause:- > -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. > This happens when local OM doesn't match any `ozone.om.address.idX.omX` in > the config.- > Due to the refactoring done in HDDS-2162, this fix has been included in that > commit. I will repurpose the jira to add some tests for the HA config. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDDS-2064: - Summary: Add tests for incorrect OM HA config when node ID or RPC address is not configured (was: OzoneManagerRatisServer#newOMRatisServer throws NPE when OM HA is configured incorrectly) > Add tests for incorrect OM HA config when node ID or RPC address is not > configured > -- > > Key: HDDS-2064 > URL: https://issues.apache.org/jira/browse/HDDS-2064 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but > `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't > exist. > Root cause: > `OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. > This happens when local OM doesn't match any `ozone.om.address.idX.omX` in > the config. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDDS-2064: - Description: -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist.- -Root cause:- -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This happens when local OM doesn't match any `ozone.om.address.idX.omX` in the config.- Due to the refactoring done in HDDS-2162, this fix has been included in that commit. I will repurpose the jira to add some tests for the HA config. was: -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist. Root cause: `OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This happens when local OM doesn't match any `ozone.om.address.idX.omX` in the config. - Due to the refactoring done in HDDS-2162, this fix has been included in that commit. I will repurpose the jira to add some tests for the HA config. > Add tests for incorrect OM HA config when node ID or RPC address is not > configured > -- > > Key: HDDS-2064 > URL: https://issues.apache.org/jira/browse/HDDS-2064 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but > `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't > exist.- > -Root cause:- > -`OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. > This happens when local OM doesn't match any `ozone.om.address.idX.omX` in > the config.- > Due to the refactoring done in HDDS-2162, this fix has been included in that > commit. I will repurpose the jira to add some tests for the HA config. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2064) Add tests for incorrect OM HA config when node ID or RPC address is not configured
[ https://issues.apache.org/jira/browse/HDDS-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDDS-2064: - Description: -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist. Root cause: `OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This happens when local OM doesn't match any `ozone.om.address.idX.omX` in the config. - Due to the refactoring done in HDDS-2162, this fix has been included in that commit. I will repurpose the jira to add some tests for the HA config. was: OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't exist. Root cause: `OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. This happens when local OM doesn't match any `ozone.om.address.idX.omX` in the config. > Add tests for incorrect OM HA config when node ID or RPC address is not > configured > -- > > Key: HDDS-2064 > URL: https://issues.apache.org/jira/browse/HDDS-2064 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > -OM will NPE and crash when `ozone.om.service.ids=id1,id2` is configured but > `ozone.om.nodes.id1` doesn't exist; or `ozone.om.address.id1.omX` doesn't > exist. > Root cause: > `OzoneManager#loadOMHAConfigs()` didn't check the case where `found == 0`. > This happens when local OM doesn't match any `ozone.om.address.idX.omX` in > the config. > - > Due to the refactoring done in HDDS-2162, this fix has been included in that > commit. I will repurpose the jira to add some tests for the HA config. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
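For context, the missing check described above amounts to a guard like the following after the address-matching loop (a sketch of the idea only; the real validation now lives in the HDDS-2162 refactoring):
{code:java}
// Sketch for OzoneManager#loadOMHAConfigs: after scanning the
// ozone.om.address.<serviceId>.<nodeId> entries, fail fast with a clear
// message if the local OM matched none of them, instead of NPE-ing later.
if (found == 0) {
  throw new IllegalArgumentException(
      "No ozone.om.address entry matched the local OM for any configured"
      + " node ID; check ozone.om.service.ids and ozone.om.nodes.* settings.");
}
{code}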
[jira] [Commented] (HDFS-14775) Add Timestamp for longest FSN write/read lock held log
[ https://issues.apache.org/jira/browse/HDFS-14775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967617#comment-16967617 ] Erik Krogen commented on HDFS-14775: +1 thanks [~zhangchen]! > Add Timestamp for longest FSN write/read lock held log > -- > > Key: HDFS-14775 > URL: https://issues.apache.org/jira/browse/HDFS-14775 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chen Zhang >Assignee: Chen Zhang >Priority: Major > Attachments: HDFS-14775.001.patch, HDFS-14775.002.patch, > HDFS-14775.003.patch, HDFS-14775.004.patch, HDFS-14775.005.patch > > > HDFS-13946 improved the log for the longest read/write lock held time; it's a very > useful improvement. > In some conditions, we need to locate the detailed call information (user, ip, > path, etc.) for the longest lock holder, but the default throttle interval (10s) > is too long to find the corresponding audit log. I think we should add the > timestamp for the {{longestWriteLockHeldStackTrace}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
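As a rough illustration, attaching the capture time to the retained stack trace could look like the following (field and method names are assumptions based on the description, not the committed patch):
{code:java}
// Sketch: stamp the retained longest-lock stack trace with the time it was
// captured, so it can be matched against audit log entries even though the
// report itself is throttled (10s by default).
if (lockHeldTimeMs > longestLockHeldTimeMs) {
  longestLockHeldTimeMs = lockHeldTimeMs;
  longestLockHeldStackTrace =
      FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss,SSS").format(Time.now())
          + ": " + StringUtils.getStackTrace(Thread.currentThread());
}
{code}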
[jira] [Work started] (HDDS-1987) Fix listStatus API
[ https://issues.apache.org/jira/browse/HDDS-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-1987 started by Siyao Meng. > Fix listStatus API > -- > > Key: HDDS-1987 > URL: https://issues.apache.org/jira/browse/HDDS-1987 > Project: Hadoop Distributed Data Store > Issue Type: Sub-task >Reporter: Bharat Viswanadham >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This Jira is to fix the listStatus API in the HA code path. > In HA, we have an in-memory cache: we put the result into the in-memory cache > and return the response. It will be picked up by the double buffer thread and > flushed to disk later. So when a user calls listStatus, it should use both the > in-memory cache and the RocksDB key table to return the correct result. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
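Conceptually, the merge this calls for looks like the following simplified model (type and variable names are assumptions, not the actual OM code):
{code:java}
// Simplified model: overlay un-flushed cache entries on a snapshot of the
// RocksDB key table so listStatus sees creates and deletes that the double
// buffer has not yet persisted.
List<OmKeyInfo> listMerged(NavigableMap<String, OmKeyInfo> diskKeyTableSnapshot,
    Map<String, OmKeyInfo> inMemoryCache, String keyPrefix) {
  NavigableMap<String, OmKeyInfo> merged = new TreeMap<>(diskKeyTableSnapshot);
  for (Map.Entry<String, OmKeyInfo> e : inMemoryCache.entrySet()) {
    if (e.getValue() == null) {
      merged.remove(e.getKey());            // delete not yet flushed to disk
    } else {
      merged.put(e.getKey(), e.getValue()); // create/update not yet flushed
    }
  }
  // Return the entries under the requested prefix from the merged view.
  return new ArrayList<>(
      merged.subMap(keyPrefix, keyPrefix + Character.MAX_VALUE).values());
}
{code}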
[jira] [Updated] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
[ https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2270: - Labels: pull-request-available (was: ) > Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet > -- > > Key: HDDS-2270 > URL: https://issues.apache.org/jira/browse/HDDS-2270 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Tsz-wo Sze >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > > ContainerStateMachine: > - In loadSnapshot(..), it first reads the snapshotFile to a byte[] and then > parses it to ContainerProtos.Container2BCSIDMapProto. The buffer copying can > be avoided. > {code} > try (FileInputStream fin = new FileInputStream(snapshotFile)) { > byte[] container2BCSIDData = IOUtils.toByteArray(fin); > ContainerProtos.Container2BCSIDMapProto proto = > ContainerProtos.Container2BCSIDMapProto > .parseFrom(container2BCSIDData); > ... > } > {code} > - persistContainerSet(..) has a similar problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
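For reference, protobuf messages can be parsed straight from a stream, so the intermediate byte[] can be dropped along these lines (a sketch of the direction, not necessarily the committed change):
{code:java}
try (FileInputStream fin = new FileInputStream(snapshotFile)) {
  // parseFrom(InputStream) reads the message without an intermediate byte[].
  ContainerProtos.Container2BCSIDMapProto proto =
      ContainerProtos.Container2BCSIDMapProto.parseFrom(fin);
  // ... use proto as before
}
{code}
Symmetrically, persistContainerSet(..) can write with proto.writeTo(OutputStream) instead of materializing a byte array first.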
[jira] [Work logged] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
[ https://issues.apache.org/jira/browse/HDDS-2270?focusedWorklogId=338720=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338720 ] ASF GitHub Bot logged work on HDDS-2270: Author: ASF GitHub Bot Created on: 05/Nov/19 13:43 Start Date: 05/Nov/19 13:43 Worklog Time Spent: 10m Work Description: adoroszlai commented on pull request #118: HDDS-2270. Avoid buffer copying in ContainerStateMachine URL: https://github.com/apache/hadoop-ozone/pull/118 ## What changes were proposed in this pull request? Eliminate temporary `byte[]` buffer in `ContainerStateMachine` (`loadSnapshot` and `persistContainerSet`). https://issues.apache.org/jira/browse/HDDS-2270 ## How was this patch tested? Verified on a docker-compose cluster that the datanode writes/reads the snapshot info successfully. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 338720) Remaining Estimate: 0h Time Spent: 10m > Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet > -- > > Key: HDDS-2270 > URL: https://issues.apache.org/jira/browse/HDDS-2270 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Tsz-wo Sze >Assignee: Attila Doroszlai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > ContainerStateMachine: > - In loadSnapshot(..), it first reads the snapshotFile to a byte[] and then > parses it to ContainerProtos.Container2BCSIDMapProto. The buffer copying can > be avoided. > {code} > try (FileInputStream fin = new FileInputStream(snapshotFile)) { > byte[] container2BCSIDData = IOUtils.toByteArray(fin); > ContainerProtos.Container2BCSIDMapProto proto = > ContainerProtos.Container2BCSIDMapProto > .parseFrom(container2BCSIDData); > ... > } > {code} > - persistContainerSet(..) has a similar problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14643) [Dynamometer] Merge extra commits from GitHub to Hadoop
[ https://issues.apache.org/jira/browse/HDFS-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967520#comment-16967520 ] Takanobu Asanuma edited comment on HDFS-14643 at 11/5/19 1:22 PM: -- Some of them have already been merged. * HDFS-14817: [PR #70|https://github.com/linkedin/dynamometer/pull/70] * HDFS-14824: [PR #76|https://github.com/linkedin/dynamometer/pull/76], [PR #92|https://github.com/linkedin/dynamometer/pull/92], [PR #96|https://github.com/linkedin/dynamometer/pull/96] * HDFS-14825: [PR #84|https://github.com/linkedin/dynamometer/pull/84] was (Author: tasanuma0829): Some of them have already been merged. * HDFS-14817: [PR #90|https://github.com/linkedin/dynamometer/pull/90] * HDFS-14824: [PR #76|https://github.com/linkedin/dynamometer/pull/76], [PR #92|https://github.com/linkedin/dynamometer/pull/92], [PR #96|https://github.com/linkedin/dynamometer/pull/96] * HDFS-14825: [PR #84|https://github.com/linkedin/dynamometer/pull/84] > [Dynamometer] Merge extra commits from GitHub to Hadoop > --- > > Key: HDFS-14643 > URL: https://issues.apache.org/jira/browse/HDFS-14643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > While Dynamometer was in the process of being committed to Hadoop, a few > patches went into the GitHub version that haven't yet made it into the > version committed here. Some of them are related to TravisCI and Bintray > deployment, which can safely be ignored in a Hadoop context, but a few are > relevant: > {code} > * 2d2591e 2019-05-24 Make XML parsing error message more explicit (PR #97) > [lfengnan ] > * 755a298 2019-04-04 Fix misimplemented CountTimeWritable setter and update > the README docs regarding the output file (PR #96) [Christopher Gregorian > ] > * 66d3e19 2019-03-14 Modify AuditReplay workflow to output count and latency > of operations (PR #92) [Christopher Gregorian ] > * 5c1d8cd 2019-02-28 Fix issues with the start-workload.sh script (PR #84) > [Erik Krogen ] > {code} > I will use this ticket to track porting these 4 commits into Hadoop's > Dynamometer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14643) [Dynamometer] Merge extra commits from GitHub to Hadoop
[ https://issues.apache.org/jira/browse/HDFS-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967520#comment-16967520 ] Takanobu Asanuma commented on HDFS-14643: - Some of them have already been merged. * HDFS-14817: [PR #90|https://github.com/linkedin/dynamometer/pull/90] * HDFS-14824: [PR #76|https://github.com/linkedin/dynamometer/pull/76], [PR #92|https://github.com/linkedin/dynamometer/pull/92], [PR #96|https://github.com/linkedin/dynamometer/pull/96] * HDFS-14825: [PR #84|https://github.com/linkedin/dynamometer/pull/84] > [Dynamometer] Merge extra commits from GitHub to Hadoop > --- > > Key: HDFS-14643 > URL: https://issues.apache.org/jira/browse/HDFS-14643 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > > While Dynamometer was in the process of being committed to Hadoop, a few > patches went into the GitHub version that haven't yet made it into the > version committed here. Some of them are related to TravisCI and Bintray > deployment, which can safely be ignored in a Hadoop context, but a few are > relevant: > {code} > * 2d2591e 2019-05-24 Make XML parsing error message more explicit (PR #97) > [lfengnan ] > * 755a298 2019-04-04 Fix misimplemented CountTimeWritable setter and update > the README docs regarding the output file (PR #96) [Christopher Gregorian > ] > * 66d3e19 2019-03-14 Modify AuditReplay workflow to output count and latency > of operations (PR #92) [Christopher Gregorian ] > * 5c1d8cd 2019-02-28 Fix issues with the start-workload.sh script (PR #84) > [Erik Krogen ] > {code} > I will use this ticket to track porting these 4 commits into Hadoop's > Dynamometer. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory
[ https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967474#comment-16967474 ] Surendra Singh Lilhore commented on HDFS-14499: --- I feel this fix is wrong; {{INodeReference}}s are created for snapshots. If the space quota for {{INodeReference}} is calculated, then the snapshot diff should also be considered. {code:java} return referred.computeContentSummary(id, summary);{code} This call just counts the current Inode size, not the FileDiff size. Please correct me if I am wrong... > Misleading REM_QUOTA value with snapshot and trash feature enabled for a > directory > -- > > Key: HDFS-14499 > URL: https://issues.apache.org/jira/browse/HDFS-14499 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, > HDFS-14499.002.patch > > > This is the flow of steps where we see a discrepancy between REM_QUOTA and > new file operation failure. REM_QUOTA shows a value of 1 but file creation > operation does not succeed. > {code:java} > hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1 > hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1 > Allowing snaphot on /dir1 succeeded > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1 > Created snapshot /dir1/.snapshot/snap1 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 0 none inf 1 1 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1 > 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: > 'hdfs://smajetinn/dir1/file1' to trash at: > hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772 > hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1 > QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE > PATHNAME > 2 1 none inf 1 0 0 /dir1 > hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1 > touchz: The NameSpace quota (directories and files) of directory /dir1 is > exceeded: quota=2 file count=3{code} > The issue here is that the count command takes only files and directories > into account, not the inode references. When trash is enabled, the deletion of > files inside a directory actually does a rename operation, as a result of > which an inode reference is maintained in the deleted list of the snapshot > diff, which is taken into account while computing the namespace quota; but the > count command (getContentSummary()) takes into account just the files > and directories, not the referenced entity, for calculating the REM_QUOTA. The > referenced entity is taken into account for space quota only. > InodeReference.java: > --- > {code:java} > @Override > public final ContentSummaryComputationContext computeContentSummary( > int snapshotId, ContentSummaryComputationContext summary) { > final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId; > // only count storagespace for WithName > final QuotaCounts q = computeQuotaUsage( > summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, > s); > summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace()); > summary.getCounts().addTypeSpaces(q.getTypeSpaces()); > return summary; > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
[ https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDDS-2270 started by Attila Doroszlai. -- > Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet > -- > > Key: HDDS-2270 > URL: https://issues.apache.org/jira/browse/HDDS-2270 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone Datanode >Reporter: Tsz-wo Sze >Assignee: Attila Doroszlai >Priority: Major > > ContainerStateMachine: > - In loadSnapshot(..), it first reads the snapshotFile to a byte[] and then > parses it to ContainerProtos.Container2BCSIDMapProto. The buffer copying can > be avoided. > {code} > try (FileInputStream fin = new FileInputStream(snapshotFile)) { > byte[] container2BCSIDData = IOUtils.toByteArray(fin); > ContainerProtos.Container2BCSIDMapProto proto = > ContainerProtos.Container2BCSIDMapProto > .parseFrom(container2BCSIDData); > ... > } > {code} > - persistContainerSet(..) has a similar problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2406) ozone shell key get throws IllegalArgumentException if pipeline is empty
Attila Doroszlai created HDDS-2406: -- Summary: ozone shell key get throws IllegalArgumentException if pipeline is empty Key: HDDS-2406 URL: https://issues.apache.org/jira/browse/HDDS-2406 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone CLI Reporter: Attila Doroszlai {{ozone shell key get}} throws when trying to get a key from a pipeline whose datanodes are all down: {code} java.lang.IllegalArgumentException at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:169) at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:162) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getChunkInfos(BlockInputStream.java:154) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:118) at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:224) at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:173) at org.apache.hadoop.ozone.client.io.OzoneInputStream.read(OzoneInputStream.java:47) at java.base/java.io.InputStream.read(InputStream.java:205) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94) at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:98) at org.apache.hadoop.ozone.web.ozShell.keys.GetKeyHandler.call(GetKeyHandler.java:48) at picocli.CommandLine.execute(CommandLine.java:1173) at picocli.CommandLine.access$800(CommandLine.java:141) at picocli.CommandLine$RunLast.handle(CommandLine.java:1367) at picocli.CommandLine$RunLast.handle(CommandLine.java:1335) at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243) at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526) at picocli.CommandLine.parseWithHandler(CommandLine.java:1465) at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65) at org.apache.hadoop.ozone.web.ozShell.OzoneShell.execute(OzoneShell.java:60) at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56) at org.apache.hadoop.ozone.web.ozShell.OzoneShell.main(OzoneShell.java:53) {code} I think the exception should be caught and the shell should output a friendlier, less verbose message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
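A minimal sketch of that, assuming the handler structure visible in the stack trace (the catch site and message are illustrative, not the actual fix):
{code:java}
// Sketch for GetKeyHandler#call: turn the low-level precondition failure
// into a short, user-facing error instead of a raw stack trace.
try (InputStream input = bucket.readKey(keyName);
     OutputStream output = new FileOutputStream(fileName)) {
  IOUtils.copyBytes(input, output, 4096);
} catch (IllegalArgumentException e) {
  // Raised by XceiverClientManager when the pipeline has no live datanodes.
  throw new IOException("Failed to read key " + keyName
      + ": no available datanodes in the pipeline", e);
}
{code}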
[jira] [Commented] (HDFS-14902) RBF: NullPointer When Misconfigured
[ https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967342#comment-16967342 ] Akira Ajisaka commented on HDFS-14902: -- Compiled with the patch and ran the {{./hdfs dfsrouter}} command. {noformat} 2019-11-05 17:35:50,276 ERROR router.NamenodeHeartbeatService: Namenode is not operational: Namenode is unregistered {noformat} This error message seems confusing to me; it suggests that the NameNode is simply not running or is in safe mode. I think the error message is shown if and only if the DFSRouter is misconfigured, so it's better for the message to state that the configuration is wrong. > RBF: NullPointer When Misconfigured > --- > > Key: HDFS-14902 > URL: https://issues.apache.org/jira/browse/HDFS-14902 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: Takanobu Asanuma >Priority: Minor > Attachments: HDFS-14902.001.patch > > > Admittedly the server was mis-configured, but this should be a bit more > elegant. > {code:none} > 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled > exception updating NN registration for null:null > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831) > at > org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119) > at > org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108) > at > org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223) > at > org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
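For illustration, the registration path could guard the missing service address and point at the configuration explicitly (a sketch only; the variable names and exact configuration keys are assumptions):
{code:java}
// Sketch for NamenodeHeartbeatService: fail with a configuration hint
// instead of an NPE when the monitored NameNode's service address is unknown.
if (report.getServiceAddress() == null) {
  LOG.error("Cannot register NameNode {} of nameservice {}: no service RPC "
      + "address found. Check dfs.namenode.servicerpc-address and the "
      + "dfs.federation.router.monitor.namenode setting.", nnId, nsId);
  return;
}
{code}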