[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017759#comment-17017759 ]

Sean Chow commented on HDFS-14476:
----------------------------------

Build failed :(

> lock too long when fix inconsistent blocks between disk and in-memory
> ---------------------------------------------------------------------
>
>                 Key: HDFS-14476
>                 URL: https://issues.apache.org/jira/browse/HDFS-14476
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0, 2.7.0, 3.0.3
>            Reporter: Sean Chow
>            Assignee: Sean Chow
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14476-branch-2.01.patch, HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, datanode-with-patch-14476.png
>
>
> When the DirectoryScanner has results describing differences between on-disk and in-memory blocks, it runs {{checkAndUpdate}} to fix them. However, {{FsDatasetImpl.checkAndUpdate}} is a synchronized call.
> As I have about 6 million blocks on every datanode, and every 6-hour scan finds about 25000 abnormal blocks to fix, this leads to a long hold of the lock on the FsDatasetImpl object.
> Let's assume every block needs 10ms to fix (because of SAS disk latency); fixing all of them takes 250 seconds. That means all reads and writes on that datanode are blocked for over four minutes.
>
> {code:java}
> 2019-05-06 08:06:51,704 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing metadata files:23574, missing block files:23574, missing blocks in memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Took 588402ms to process 1 commands from NN
> {code}
> Processing commands from the NameNode takes a long time because threads are blocked, and the NameNode sees a long lastContact time for this datanode.
> This likely affects all HDFS versions.
> *How to fix:*
> Just as the invalidate command from the NameNode is processed with a batch size of 1000, fixing these abnormal blocks should be batched too, sleeping 2 seconds between batches to allow normal block reads and writes to proceed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
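The batching proposal above can be sketched roughly as follows. This is only an illustration, not the actual HDFS-14476 patch: `datasetLock`, `fixOne`, and the diff list are hypothetical stand-ins for the real FsDatasetImpl code. The key point is that the lock is taken per block, with a sleep after each batch, instead of one synchronized call covering the whole fix-up run.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of batched checkAndUpdate: fix scan differences in
// batches of 1000 and sleep between batches so the dataset lock is released
// periodically and normal block reads/writes can proceed.
public class BatchedCheckAndUpdate {
  static final int BATCH_SIZE = 1000;           // same batch size used for invalidates
  static final Object datasetLock = new Object(); // stands in for FsDatasetImpl

  static int fixed = 0;

  // Stand-in for the per-block reconciliation done under the lock.
  static void fixOne(Object diff) {
    fixed++;
  }

  static void checkAndUpdateInBatches(List<?> diffs, long sleepMs)
      throws InterruptedException {
    for (int i = 0; i < diffs.size(); i++) {
      synchronized (datasetLock) { // held per block, not for the whole run
        fixOne(diffs.get(i));
      }
      // After each full batch, pause so blocked readers/writers catch up
      // (the comment proposes 2 seconds between batches).
      if ((i + 1) % BATCH_SIZE == 0 && i + 1 < diffs.size()) {
        Thread.sleep(sleepMs);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // 2500 diffs -> 3 batches, 2 pauses (short sleep here just for the demo)
    checkAndUpdateInBatches(Collections.nCopies(2500, "diff"), 10);
    System.out.println(fixed); // 2500
  }
}
```

With 25000 abnormal blocks this adds roughly 48 seconds of sleep (24 pauses of 2s), but each individual lock hold shrinks to one block's fix time, so reads and writes are never blocked for minutes at a stretch.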
[jira] [Commented] (HDFS-15126) testForcedRegistration fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017686#comment-17017686 ]

Hadoop QA commented on HDFS-15126:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 29m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 30s | trunk passed |
| +1 | compile | 1m 1s | trunk passed |
| +1 | checkstyle | 0m 43s | trunk passed |
| +1 | mvnsite | 1m 6s | trunk passed |
| +1 | shadedclient | 13m 30s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 15s | trunk passed |
| +1 | javadoc | 1m 17s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 1s | the patch passed |
| +1 | compile | 0m 56s | the patch passed |
| +1 | javac | 0m 56s | the patch passed |
| +1 | checkstyle | 0m 38s | the patch passed |
| +1 | mvnsite | 1m 3s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 4s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 24s | the patch passed |
| +1 | javadoc | 1m 11s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 96m 58s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 183m 31s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestRedudantBlocks |
| | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| | hadoop.hdfs.server.balancer.TestBalancer |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15126 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991187/HDFS-15126.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux db3694bbc4a4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28688/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28688/testReport/ |
| Max. process+thread count | 4291 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output |
[jira] [Commented] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.
[ https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017672#comment-17017672 ]

Hadoop QA commented on HDFS-15127:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 47s | trunk passed |
| +1 | compile | 0m 31s | trunk passed |
| +1 | checkstyle | 0m 22s | trunk passed |
| +1 | mvnsite | 0m 35s | trunk passed |
| +1 | shadedclient | 14m 48s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 7s | trunk passed |
| +1 | javadoc | 0m 57s | trunk passed |
|| || || || Patch Compile Tests ||
| -1 | mvninstall | 0m 20s | hadoop-hdfs-rbf in the patch failed. |
| -1 | compile | 0m 21s | hadoop-hdfs-rbf in the patch failed. |
| -1 | javac | 0m 21s | hadoop-hdfs-rbf in the patch failed. |
| +1 | checkstyle | 0m 16s | the patch passed |
| -1 | mvnsite | 0m 22s | hadoop-hdfs-rbf in the patch failed. |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 23s | patch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 0m 22s | hadoop-hdfs-rbf in the patch failed. |
| +1 | javadoc | 0m 49s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 0m 21s | hadoop-hdfs-rbf in the patch failed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 57m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15127 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991169/HDFS-15127.000.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ceb23d392f79 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/28689/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt |
| compile | https://builds.apache.org/job/PreCommit-HDFS-Build/28689/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf.txt |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/28689/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf.txt |
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017653#comment-17017653 ]

Hadoop QA commented on HDFS-15124:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 21m 23s | trunk passed |
| +1 | compile | 1m 6s | trunk passed |
| +1 | checkstyle | 0m 48s | trunk passed |
| +1 | mvnsite | 1m 13s | trunk passed |
| +1 | shadedclient | 15m 11s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 28s | trunk passed |
| +1 | javadoc | 1m 16s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 11s | the patch passed |
| +1 | compile | 1m 1s | the patch passed |
| +1 | javac | 1m 1s | the patch passed |
| -0 | checkstyle | 0m 41s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 166 unchanged - 0 fixed = 167 total (was 166) |
| +1 | mvnsite | 1m 13s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 25s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 45s | the patch passed |
| +1 | javadoc | 1m 24s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 123m 13s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 191m 22s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
| | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
| | hadoop.hdfs.server.namenode.TestRedudantBlocks |
| | hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991181/HDFS-15124.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0fb99bc6d2d0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Commented] (HDFS-15126) testForcedRegistration fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017650#comment-17017650 ]

Íñigo Goiri commented on HDFS-15126:
------------------------------------

I went over this class and this is a little hard to debug. Basically, the test goes into a wait that, on timeout, says nothing and just fails the assert. I would propose one of the two following options:
# Log the timeout for the function.
# Add a message in the assertTrue to say what happened.

> testForcedRegistration fails Intermittently
> -------------------------------------------
>
>                 Key: HDFS-15126
>                 URL: https://issues.apache.org/jira/browse/HDFS-15126
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>         Attachments: HDFS-15126.001.patch
>
>
> Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail in the precommit builds:
> {code:bash}
> [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.333 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration
> [ERROR] testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration)  Time elapsed: 11.39 s  <<< FAILURE!
> java.lang.AssertionError
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertTrue(Assert.java:52)
> 	at org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
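Option 2 from the comment above (a message in the assertTrue) could look roughly like this. The helper below is a plain-Java stand-in for JUnit's `Assert.assertTrue(String, boolean)`, and `TIMEOUT_MS` and the registration flag are hypothetical names, not the test's real ones; the point is only that a bare `java.lang.AssertionError` becomes a failure that says what timed out.

```java
public class AssertWithMessage {
  static final long TIMEOUT_MS = 10_000; // hypothetical wait bound, used in the message

  // Plain-Java stand-in for org.junit.Assert.assertTrue(String, boolean).
  static void assertTrue(String message, boolean condition) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    // Would come from waiting on the DataNode re-registration in the real test.
    boolean registered = false;
    try {
      assertTrue("DataNode did not re-register within " + TIMEOUT_MS + " ms",
          registered);
    } catch (AssertionError e) {
      // The intermittent failure now reports why instead of a bare AssertionError.
      System.out.println(e.getMessage()); // DataNode did not re-register within 10000 ms
    }
  }
}
```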
[jira] [Commented] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
[ https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017645#comment-17017645 ]

Hadoop QA commented on HDFS-15118:
----------------------------------

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 43s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
|  0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 11s | trunk passed |
| +1 | compile | 3m 16s | trunk passed |
| +1 | checkstyle | 0m 50s | trunk passed |
| +1 | mvnsite | 1m 46s | trunk passed |
| +1 | shadedclient | 16m 7s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 36s | trunk passed |
| +1 | javadoc | 1m 43s | trunk passed |
|| || || || Patch Compile Tests ||
|  0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 43s | the patch passed |
| +1 | compile | 3m 20s | the patch passed |
| +1 | javac | 3m 20s | the patch passed |
| -0 | checkstyle | 0m 52s | hadoop-hdfs-project: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) |
| +1 | mvnsite | 1m 55s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 37s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 58s | the patch passed |
| +1 | javadoc | 1m 46s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 2m 0s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 123m 56s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 42s | The patch does not generate ASF License warnings. |
| | | 202m 45s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
| | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
| | hadoop.hdfs.server.namenode.TestRedudantBlocks |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15118 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991178/HDFS-15118.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux d1ea322184a0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |
[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and crashed future Maven test run
[ https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ctest updated HDFS-15128:
-------------------------
    Summary: Unit test failing to clean testing data and crashed future Maven test run  (was: Unit test failing to clean testing data and crashed subsequent Maven test run)

> Unit test failing to clean testing data and crashed future Maven test run
> --------------------------------------------------------------------------
>
>                 Key: HDFS-15128
>                 URL: https://issues.apache.org/jira/browse/HDFS-15128
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, test
>    Affects Versions: 3.2.1
>            Reporter: Ctest
>            Priority: Critical
>              Labels: easyfix, patch, test
>         Attachments: HDFS-15128-000.patch
>
>
> The actively-used test helper `testVolumeConfig` in `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` chmods a directory to the invalid perm 000 for testing purposes, but fails to chmod the directory back to a valid perm if an assertion inside the function fails. Any subsequent `mvn test` command then fails to run, because Maven cannot build itself: it does not have permission to clean the temporarily-generated directory with perm 000. See below for the buggy code snippet:
> {code:java}
> try {
>   for (int i = 0; i < volumesFailed; i++) {
>     prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
>   }
>   restartDatanodes(volumesTolerated, manageDfsDirs);
> } catch (DiskErrorException e) {
>   ...
> } finally {
>   ...
> }
>
> assertEquals(expectedBPServiceState, bpServiceState);
>
> for (File dir : dirs) {
>   FileUtil.chmod(dir.toString(), "755");
> }
> {code}
> If `assertEquals(expectedBPServiceState, bpServiceState)` fails, the function terminates without executing `FileUtil.chmod(dir.toString(), "755")` for each temporary perm-000 directory the test created.
>
> *Consequence*
> Any subsequent `mvn test` command fails to run if this test has failed before, because Maven cannot clean the temporarily-generated directory. For details of the failure, see below:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
> [INFO] Executing tasks
>
> main:
>    [delete] Deleting directory /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 8.349 s
> [INFO] Finished at: 2019-12-27T03:53:04-06:00
> [INFO] ------------------------------------------------------------------------
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on project hadoop-hdfs: An Ant BuildException has occured: Unable to delete directory /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
> [ERROR] around Ant part ...<delete dir="/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data"/>... @ 4:105 in /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please read the following articles:
> [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> {noformat}
>
> *Root Cause*
> The test helper `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig` purposely sets the directory `/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current` to perm 000, and at the end of the function changes it back to 755. However, an assertion in this function runs before the perm can be changed to 755. Once that assertion fails, the function terminates with the directory still at perm 000, so Maven cannot later remove the directory when executing `mvn test`.
>
> *Fix*
> In `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`, move the assertion `assertEquals(expectedBPServiceState, bpServiceState)` to the last line of the function. This fix will fix the bug and will not
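The shape of the fix described above can be sketched in isolation like this. This is an illustration, not the actual HDFS-15128 patch: it simulates the problem with a temp directory instead of the real DataNode volume dirs, uses a try/finally so the 755 restore is unconditional even if the checks throw, and only runs on POSIX filesystems (the method names and the `AssertionError` stand-in for JUnit's assert are hypothetical).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Illustrative sketch: restore perm 755 before any assertion can fire, so a
// failing test can no longer strand a perm-000 directory that breaks later
// `mvn test` runs.
public class RestorePermsBeforeAssert {

  static void runVolumeCheck(Path dir, boolean expected, boolean actual)
      throws IOException {
    // Simulates prepareDirToFail(): make the directory perm 000.
    Files.setPosixFilePermissions(dir,
        PosixFilePermissions.fromString("---------"));
    try {
      // ... restartDatanodes(...) and state collection would happen here ...
    } finally {
      // Unconditional restore to 755, even if the body above throws.
      Files.setPosixFilePermissions(dir,
          PosixFilePermissions.fromString("rwxr-xr-x"));
    }
    // Assertion moved after the restore, as the fix describes.
    if (expected != actual) {
      throw new AssertionError("BPService state mismatch");
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("perm-restore-demo");
    try {
      runVolumeCheck(dir, true, false); // the "assertion" fails...
    } catch (AssertionError expectedFailure) {
      // ...but the directory is readable again, so cleanup can delete it.
      System.out.println(Files.isReadable(dir)); // true
    }
  }
}
```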
[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and crashed subsequent Maven test run
[ https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ctest updated HDFS-15128: - Summary: Unit test failing to clean testing data and crashed subsequent Maven test run (was: Unit test failing to clean testing data and caused Maven to crash) > Unit test failing to clean testing data and crashed subsequent Maven test run > - > > Key: HDFS-15128 > URL: https://issues.apache.org/jira/browse/HDFS-15128 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, test >Affects Versions: 3.2.1 >Reporter: Ctest >Priority: Critical > Labels: easyfix, patch, test > Attachments: HDFS-15128-000.patch > > > Actively-used test helper function `testVolumeConfig` in > `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` > chmod a directory with invalid perm 000 for testing purposes but later failed > to chmod back this directory with a valid perm if the assertion inside this > function failed. Any subsequent `mvn test` command would fail to run if this > test had failed before. It is because Maven failed to build itself as it did > not have permission to clean the temporarily-generated directory that has > perm 000. See below for the code snippet that is buggy. > {code:java} > try { > for (int i = 0; i < volumesFailed; i++) { > prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000 > } > restartDatanodes(volumesTolerated, manageDfsDirs); > } catch (DiskErrorException e) { > ... > } finally { > ... > } > > assertEquals(expectedBPServiceState, bpServiceState); > > for (File dir : dirs) { > FileUtil.chmod(dir.toString(), "755"); > } > } > {code} > The failure of the statement `assertEquals(expectedBPServiceState, > bpServiceState)` caused function to terminate without executing > `FileUtil.chmod(dir.toString(), "755")` for each temporary directory with > invalid perm 000 the test has created. 
> > *Consequence* > Any subsequent `mvn test` command fails to run if this test failed before, > because Maven does not have permission to clean the temporarily-generated > directory. For details of the failure, see below: > {noformat} > [INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs --- > [INFO] Executing tasks > > main: > [delete] Deleting directory > /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 8.349 s > [INFO] Finished at: 2019-12-27T03:53:04-06:00 > [INFO] > > [ERROR] Failed to execute > goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on > project hadoop-hdfs: An Ant BuildException has occured: Unable to delete > directory > /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current > [ERROR] around Ant part ... dir="/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data"/>... > @ 4:105 in > /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat} > > *Root Cause* > The test helper function > `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig` > purposely sets the directory > `/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current` > to permission 000 and, at the end of the function, changes it back to 755. 
However, an assertion in this function runs before > the permission can be changed back to 755. Once this assertion fails, the function > terminates before the directory’s permission is restored. Hence, Maven is later > unable to remove this directory when executing `mvn > test`. > > *Fix* > In > `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`, > move the assertion `assertEquals(expectedBPServiceState, bpServiceState)` > to the last line of this function. This fixes the bug without > changing the test outcome.
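The ordering change described in the Fix section can be sketched with a small, self-contained simulation. The class and method names below are hypothetical stand-ins for the test's logic, not the actual Hadoop code; adding to a list stands in for `FileUtil.chmod(dir, "755")`:

```java
import java.util.List;

// Illustration of the cleanup-ordering bug: a failing assertion placed before
// the chmod-back loop skips the cleanup; moving the assertion to the last
// line guarantees permissions are restored even on test failure.
public class CleanupOrder {

    // Original ordering: assert first, then restore permissions.
    // If the assertion fails, no directory is ever chmod-ed back to 755.
    static void buggyOrder(List<String> restored, boolean assertionPasses) {
        if (!assertionPasses) {
            throw new AssertionError("expectedBPServiceState != bpServiceState");
        }
        restored.add("data1");  // stands in for FileUtil.chmod(dir, "755")
        restored.add("data2");
    }

    // Fixed ordering (as described in the Fix section): restore permissions
    // first, assert on the very last line of the function.
    static void fixedOrder(List<String> restored, boolean assertionPasses) {
        restored.add("data1");  // cleanup now runs unconditionally
        restored.add("data2");
        if (!assertionPasses) {
            throw new AssertionError("expectedBPServiceState != bpServiceState");
        }
    }
}
```

With a failing assertion, `buggyOrder` restores nothing while `fixedOrder` still restores both directories before reporting the failure, which is exactly why a later `mvn test` run can delete the test data.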
[jira] [Updated] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.
[ https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-15127: --- Status: Patch Available (was: Open) > RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount > points. > > > Key: HDFS-15127 > URL: https://issues.apache.org/jira/browse/HDFS-15127 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Priority: Major > Attachments: HDFS-15127.000.patch > > > A HASH_ALL mount point should not allow creating new files if one subcluster > is down. > If the file already existed in the past, this could lead to inconsistencies. > We should return an unavailable exception. > {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be > changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node
[ https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017634#comment-17017634 ] Konstantin Shvachko commented on HDFS-12943: Hey [~lindy_hopper], yes, we currently recommend turning off access time updates on Observers, as [~vagarychen] said. We plan to bring aTime updates back with HDFS-15118. The Observer will bounce such getBlockLocation() calls to the Active, so that it can actually update the time. > Consistent Reads from Standby Node > -- > > Key: HDFS-12943 > URL: https://issues.apache.org/jira/browse/HDFS-12943 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Reporter: Konstantin Shvachko >Assignee: Konstantin Shvachko >Priority: Major > Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2 > > Attachments: ConsistentReadsFromStandbyNode.pdf, > ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, > HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, > TestPlan-ConsistentReadsFromStandbyNode.pdf > > > StandbyNode in HDFS is a replica of the active NameNode. The states of the > NameNodes are coordinated via the journal. It is natural to consider > StandbyNode as a read-only replica. As with any replicated distributed system, > the problem of stale reads should be resolved. Our main goal is to provide > reads from standby in a consistent way in order to enable a wide range of > existing applications running on top of HDFS.
[jira] [Updated] (HDFS-15111) start / stopStandbyServices() should log which service it is transitioning to/from.
[ https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-15111: --- Labels: newbie++ (was: ) > start / stopStandbyServices() should log which service it is transitioning > to/from. > --- > > Key: HDFS-15111 > URL: https://issues.apache.org/jira/browse/HDFS-15111 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, logging >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Priority: Major > Labels: newbie++ > > Trying to transition Observer to Standby state. Both > {{stopStandbyServices()}} and {{startStandbyServices()}} log that they are > stopping/starting Standby services. > # {{startStandbyServices()}} should log which state it is transitioning TO. > # {{stopStandbyServices()}} should log which state it is transitioning FROM.
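The two requested messages can be sketched as plain strings. The method names and exact wording below are assumptions for illustration, not the actual Hadoop code:

```java
// Sketch of HA-state-aware log messages: each message names the state being
// transitioned to or from, instead of a generic "Starting standby services".
public class HaTransitionLog {

    // What startStandbyServices() could log: the state it transitions TO.
    static String startMessage(String toState) {
        return "Starting services required for " + toState + " state";
    }

    // What stopStandbyServices() could log: the state it transitions FROM.
    static String stopMessage(String fromState) {
        return "Stopping services started for " + fromState + " state";
    }
}
```

With messages like these, an Observer-to-Standby transition is unambiguous in the log, since "observer" and "standby" appear explicitly rather than both methods printing the same "Standby services" text.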
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017619#comment-17017619 ] Ctest commented on HDFS-15124: -- Uploaded one new patch for trunk. > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > the namenode fails to start because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing the namenode, `initAuditLoggers` is > called and tries to invoke the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, which doesn't > have a default constructor. Thus an `InstantiationException` is thrown. 
> > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed.java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... > 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. 
It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static final Logger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) { > } > {code} > As long as the configuration parameter `dfs.namenode.audit.loggers` is set to > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > `initAuditLoggers` will try to call its default constructor to make a new > instance: > {code:java} > private List<AuditLogger> initAuditLoggers(Configuration conf) { > // Initialize the custom access loggers if configured. > Collection<String> alClasses = > conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY); > List<AuditLogger> auditLoggers = Lists.newArrayList(); > if (alClasses != null && !alClasses.isEmpty()) { > for (String className : alClasses) { > try { > AuditLogger logger; > if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) { > logger = new DefaultAuditLogger(); > } else { > logger = (AuditLogger) Class.forName(className).newInstance(); > } > logger.initialize(conf); > auditLoggers.add(logger); > } catch (RuntimeException re) { > throw re; >
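The failure mode quoted above is generic to `Class.newInstance()`: it requires a nullary (no-argument) constructor, which `TopAuditLogger` does not declare. A minimal, self-contained sketch — using a hypothetical stand-in class, not the real `TopAuditLogger` — reproduces the `InstantiationException` and shows the usual fix pattern of looking up the parameterized constructor explicitly:

```java
import java.lang.reflect.Constructor;

// Stand-in demo: OnlyParamCtor plays the role of TopAuditLogger, which
// declares only a parameterized constructor and therefore cannot be
// created via Class.newInstance().
public class NewInstanceDemo {

    public static class OnlyParamCtor {
        public final String name;
        public OnlyParamCtor(String name) { this.name = name; }
    }

    // Returns true iff reflective no-arg instantiation fails with
    // InstantiationException, the same error seen in initAuditLoggers.
    public static boolean newInstanceFails() {
        try {
            OnlyParamCtor.class.newInstance(); // needs a nullary constructor
            return false;
        } catch (InstantiationException e) {
            return true;                        // no nullary constructor exists
        } catch (IllegalAccessException e) {
            return false;
        }
    }

    // One possible fix pattern: locate the parameterized constructor and
    // supply the required argument explicitly.
    public static String viaExplicitCtor(String arg) {
        try {
            Constructor<OnlyParamCtor> c =
                OnlyParamCtor.class.getConstructor(String.class);
            return c.newInstance(arg).name;
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("newInstance fails: " + newInstanceFails());
        System.out.println("explicit ctor: " + viaExplicitCtor("demo"));
    }
}
```

This is only an illustration of the JDK reflection behavior; the actual HDFS-15124 patches are in the attached `.patch` files and may resolve the problem differently.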
[jira] [Updated] (HDFS-15126) testForcedRegistration fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15126: - Attachment: HDFS-15126.001.patch > testForcedRegistration fails Intermittently > --- > > Key: HDFS-15126 > URL: https://issues.apache.org/jira/browse/HDFS-15126 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-15126.001.patch > > > Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail > in the precommit builds: > {code:bash} > [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 25.333 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration > [ERROR] > testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration) Time > elapsed: 11.39 s <<< FAILURE! > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15126) testForcedRegistration fails Intermittently
[ https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15126: - Status: Patch Available (was: Open) > testForcedRegistration fails Intermittently > --- > > Key: HDFS-15126 > URL: https://issues.apache.org/jira/browse/HDFS-15126 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ctest updated HDFS-15124: - Attachment: HDFS-15124.002.patch > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, > HDFS-15124.002.patch >
[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017550#comment-17017550 ] Hadoop QA commented on HDFS-15125: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 39s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 39s{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} | || || || 
|| {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 4s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 56s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys | | | hadoop.hdfs.qjournal.TestSecureNNWithQJM | | | hadoop.hdfs.TestSetrepIncreasing | | | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer | | | hadoop.hdfs.TestDFSStartupVersions | | | hadoop.hdfs.tools.TestDFSAdminWithHA | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.hdfs.TestFileConcurrentReader | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:a969cad0a12 | | JIRA Issue | HDFS-15125 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991172/HDFS-15125-branch-2.10.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 397ff213a26b
[jira] [Updated] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
[ https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-15118: -- Status: Patch Available (was: Open) > [SBN Read] Slow clients when Observer reads are enabled but there are no > Observers on the cluster. > -- > > Key: HDFS-15118 > URL: https://issues.apache.org/jira/browse/HDFS-15118 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-15118.001.patch > > > We see substantial degradation in performance of HDFS clients, when Observer > reads are enabled via {{ObserverReadProxyProvider}}, but there are no > ObserverNodes on the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
[ https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-15118: -- Attachment: HDFS-15118.001.patch > [SBN Read] Slow clients when Observer reads are enabled but there are no > Observers on the cluster. > -- > > Key: HDFS-15118 > URL: https://issues.apache.org/jira/browse/HDFS-15118 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Priority: Major > Attachments: HDFS-15118.001.patch > > > We see substantial degradation in performance of HDFS clients, when Observer > reads are enabled via {{ObserverReadProxyProvider}}, but there are no > ObserverNodes on the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
[ https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang reassigned HDFS-15118: - Assignee: Chen Liang > [SBN Read] Slow clients when Observer reads are enabled but there are no > Observers on the cluster. > -- > > Key: HDFS-15118 > URL: https://issues.apache.org/jira/browse/HDFS-15118 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-15118.001.patch > > > We see substantial degradation in performance of HDFS clients, when Observer > reads are enabled via {{ObserverReadProxyProvider}}, but there are no > ObserverNodes on the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017471#comment-17017471 ] Hadoop QA commented on HDFS-15124: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 166 unchanged - 0 fixed = 167 total (was 166) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}134m 8s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}198m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestDecommission | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshot | | | hadoop.hdfs.server.namenode.TestINodeAttributeProvider | | | hadoop.hdfs.TestDatanodeRegistration | | | hadoop.hdfs.server.namenode.TestRedudantBlocks | | | hadoop.hdfs.server.namenode.TestFSImage | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | HDFS-15124 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991162/HDFS-15124.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8fac4d4b27be 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a0ff42d | | maven | version: Apache Maven
[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017464#comment-17017464 ] Jim Brennan commented on HDFS-15125: There was a problem with my back-port of HDFS-13945, so TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure was still failing for me intermittently. This is fixed in patch 002. > Pull back fixes for DataNodeVolume* unit tests > -- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15125: --- Attachment: HDFS-15125-branch-2.10.002.patch > Pull back fixes for DataNodeVolume* unit tests > -- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch, > HDFS-15125-branch-2.10.002.patch > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017418#comment-17017418 ] Ctest edited comment on HDFS-15124 at 1/16/20 7:55 PM: --- [~elgoiri] Sorry, I was using hadoop-2.10.0, where FSNamesystem was using {code:java} public static final Log LOG = LogFactory.getLog(FSNamesystem.class);{code} [https://github.com/apache/hadoop/blob/e2f1f118e465e787d8567dfa6e2f3b72a0eb9194/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L328] It seems that 3.x.x is using org.slf4j.Logger but 2.x.x is using org.apache.commons.logging.Log for FSNamesystem.java. I will write another patch for trunk and upload it again. Thank you for pointing this out! was (Author: ctest.team): I was using hadoop-2.10.0, where FSNamesystem was using {code:java} public static final Log LOG = LogFactory.getLog(FSNamesystem.class);{code} [https://github.com/apache/hadoop/blob/e2f1f118e465e787d8567dfa6e2f3b72a0eb9194/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L328] Should I use the 3.x.x version to do the fix? It seems that 3.x.x is using org.slf4j.Logger but 2.x.x is using org.apache.commons.logging.Log for FSNamesystem.java > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. 
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017418#comment-17017418 ] Ctest commented on HDFS-15124: -- I was using hadoop-2.10.0 and FSNamesystem was using {code:java} public static final Log LOG = LogFactory.getLog(FSNamesystem.class);{code} [https://github.com/apache/hadoop/blob/e2f1f118e465e787d8567dfa6e2f3b72a0eb9194/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L328] Should I use the 3.x.x version to do the fix? It seems that 3.x.x is using org.slf4j.Logger but 2.x.x is using org.apache.commons.logging.Log for FSNamesystem.java > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
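The failure mode quoted above can be reproduced outside Hadoop with a minimal sketch (the class and names below are illustrative, not Hadoop APIs): `Class.newInstance()` requires a visible no-arg constructor, which is exactly what `TopAuditLogger` lacks, and `getDeclaredConstructor(...)` is how a caller can instead match the constructor that actually exists.

```java
// Minimal sketch of the InstantiationException root cause (names are
// illustrative stand-ins, not Hadoop classes).
class OneArgOnly {
    // Like TopAuditLogger: only a one-argument constructor, no default one.
    OneArgOnly(String dependency) { }
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        try {
            // Mirrors the reflective no-arg instantiation in initAuditLoggers.
            OneArgOnly o = OneArgOnly.class.newInstance();
        } catch (InstantiationException e) {
            System.out.println("InstantiationException: no default constructor");
        }
        // A caller can instead look up the constructor that matches the
        // dependencies it can supply.
        OneArgOnly ok = OneArgOnly.class
            .getDeclaredConstructor(String.class)
            .newInstance("top");
        System.out.println("constructed via one-arg constructor");
    }
}
```

Either side can fix this: add a no-arg constructor to the logger class, or make the instantiating code aware of the constructor it actually has.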
[jira] [Updated] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.
[ https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-15127: --- Attachment: HDFS-15127.000.patch > RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount > points. > > > Key: HDFS-15127 > URL: https://issues.apache.org/jira/browse/HDFS-15127 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Priority: Major > Attachments: HDFS-15127.000.patch > > > A HASH_ALL mount point should not allow creating new files if one subcluster > is down. > If the file already existed in the past, this could lead to inconsistencies. > We should return an unavailable exception. > {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be > changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable
[ https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017409#comment-17017409 ] Hudson commented on HDFS-15112: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17873 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17873/]) HDFS-15112. RBF: Do not return FileNotFoundException when a subcluster (inigoiri: rev 263413e83840c7795a988e3939cd292d020c8d5f) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterFaultTolerant.java > RBF: Do not return FileNotFoundException when a subcluster is unavailable > -- > > Key: HDFS-15112 > URL: https://issues.apache.org/jira/browse/HDFS-15112 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, > HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, > HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, > HDFS-15112.009.patch, HDFS-15112.patch > > > If we have a mount point using HASH_ALL across two subclusters and one of > them is down, we may return FileNotFoundException while the file is just in > the unavailable subcluster. > We should not return FileNotFoundException but something that shows that the > subcluster is unavailable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable
[ https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017405#comment-17017405 ] Íñigo Goiri commented on HDFS-15112: Thanks [~ayushtkn] for the review. Committed to trunk. Opened HDFS-15127 to fix the writes. > RBF: Do not return FileNotFoundException when a subcluster is unavailable > -- > > Key: HDFS-15112 > URL: https://issues.apache.org/jira/browse/HDFS-15112 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, > HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, > HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, > HDFS-15112.009.patch, HDFS-15112.patch > > > If we have a mount point using HASH_ALL across two subclusters and one of > them is down, we may return FileNotFoundException while the file is just in > the unavailable subcluster. > We should not return FileNotFoundException but something that shows that the > subcluster is unavailable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.
Íñigo Goiri created HDFS-15127: -- Summary: RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points. Key: HDFS-15127 URL: https://issues.apache.org/jira/browse/HDFS-15127 Project: Hadoop HDFS Issue Type: Improvement Reporter: Íñigo Goiri A HASH_ALL mount point should not allow creating new files if one subcluster is down. If the file already existed in the past, this could lead to inconsistencies. We should return an unavailable exception. {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be changed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
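The HASH_ALL behavior discussed in HDFS-15112/HDFS-15127 can be sketched with a toy router (illustrative names only, not the Hadoop RBF API): a path hashes to exactly one subcluster, so when that subcluster is down the router cannot know whether the file exists, and it should surface unavailability rather than FileNotFound or a fresh create.

```java
// Toy model of a HASH_ALL mount point. The hash function here is a
// deliberately simple stand-in; the real router hashes paths differently.
public class HashAllDemo {
    static int route(String path, int subclusters) {
        return path.length() % subclusters;
    }

    public static void main(String[] args) {
        boolean[] up = {true, false};      // subcluster 1 is unavailable
        String path = "/data/file1";       // length 11 -> routes to subcluster 1
        int target = route(path, up.length);
        if (!up[target]) {
            // Fail closed: the file may live (or have lived) on the down
            // subcluster, so neither "not found" nor a new create is safe.
            System.out.println("unavailable");
        } else {
            System.out.println("routed to subcluster " + target);
        }
    }
}
```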
[jira] [Updated] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable
[ https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated HDFS-15112: --- Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > RBF: Do not return FileNotFoundException when a subcluster is unavailable > -- > > Key: HDFS-15112 > URL: https://issues.apache.org/jira/browse/HDFS-15112 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, > HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, > HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, > HDFS-15112.009.patch, HDFS-15112.patch > > > If we have a mount point using HASH_ALL across two subclusters and one of > them is down, we may return FileNotFoundException while the file is just in > the unavailable subcluster. > We should not return FileNotFoundException but something that shows that the > subcluster is unavailable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable
[ https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017401#comment-17017401 ] Íñigo Goiri commented on HDFS-15112: Correct, the change in RouterRPCServer is only there to make the current testWriteWithFailedSubcluster() work. Let's open a new JIRA to change testWriteWithFailedSubcluster() and make it always fail for writes if one subcluster is down. > RBF: Do not return FileNotFoundException when a subcluster is unavailable > -- > > Key: HDFS-15112 > URL: https://issues.apache.org/jira/browse/HDFS-15112 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, > HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, > HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, > HDFS-15112.009.patch, HDFS-15112.patch > > > If we have a mount point using HASH_ALL across two subclusters and one of > them is down, we may return FileNotFoundException while the file is just in > the unavailable subcluster. > We should not return FileNotFoundException but something that shows that the > subcluster is unavailable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017346#comment-17017346 ] Íñigo Goiri commented on HDFS-15124: FSNameSystem seems to be using logger, right? https://github.com/apache/hadoop/blob/a0ff42d7612e744e0bf88aa14078ea3ab6bcce49/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L373 It was done by HDFS-12552. > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
> > *Symptom* > *$ ./start-dfs.sh* > {code:java} > 2019-12-18 14:05:20,670 ERROR > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem > initialization failed.java.lang.RuntimeException: > java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782) > Caused by: java.lang.InstantiationException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger > at java.lang.Class.newInstance(Class.java:427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)... > 8 more > Caused by: java.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.() > at java.lang.Class.getConstructor0(Class.java:3082) > at java.lang.Class.newInstance(Class.java:412) > ... 9 more{code} > > > *Detailed Root Cause* > There is no default constructor in > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: > {code:java} > /** > * An {@link AuditLogger} that sends logged data directly to the metrics > * systems. 
It is used when the top service is used directly by the name node > */ > @InterfaceAudience.Private > public class TopAuditLogger implements AuditLogger { > public static finalLogger LOG = > LoggerFactory.getLogger(TopAuditLogger.class); > private final TopMetrics topMetrics; > public TopAuditLogger(TopMetrics topMetrics) { > Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + > "TopMetrics"); > this.topMetrics = topMetrics; > } > @Override > public void initialize(Configuration conf) { > } > {code} > As long as the configuration parameter `dfs.namenode.audit.loggers` is set to > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > `initAuditLoggers` will try to call its default constructor to make a new > instance: > {code:java} > private List initAuditLoggers(Configuration conf) { > // Initialize the custom access loggers if configured. > Collection alClasses = > conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY); > List auditLoggers = Lists.newArrayList(); > if (alClasses != null && !alClasses.isEmpty()) { > for (String className : alClasses) { > try { > AuditLogger logger; > if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) { > logger = new DefaultAuditLogger(); > } else { >
[jira] [Commented] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature
[ https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017339#comment-17017339 ] Íñigo Goiri commented on HDFS-15113: I think we need to extend [^HDFS-15113.003.patch] to cover the old and the new tests. > Missing IBR when NameNode restart if open processCommand async feature > -- > > Key: HDFS-15113 > URL: https://issues.apache.org/jira/browse/HDFS-15113 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, > HDFS-15113.003.patch > > > Recently, I met a case where the NameNode was missing blocks after a restart, > which is related to HDFS-14997. > a. During a NameNode restart, the NameNode returns the `DNA_REGISTER` command > to a DataNode when it receives some RPC request from that DataNode. > b. When the DataNode receives the `DNA_REGISTER` command, it runs #reRegister > asynchronously. > {code:java} > void reRegister() throws IOException { > if (shouldRun()) { > // re-retrieve namespace info to make sure that, if the NN > // was restarted, we still match its version (HDFS-2120) > NamespaceInfo nsInfo = retrieveNamespaceInfo(); > // and re-register > register(nsInfo); > scheduler.scheduleHeartbeat(); > // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down > // for sometime. > if (state == HAServiceState.STANDBY || state == > HAServiceState.OBSERVER) { > ibrManager.clearIBRs(); > } > } > } > {code} > c. As we know, #register triggers a BR immediately. > d. Because #reRegister runs asynchronously, we cannot be sure which happens > first: sending the FBR or clearing the IBRs. If clearing the IBRs runs first, > it is OK. But if the FBR is sent first and the IBRs are cleared afterwards, > blocks received between those two points in time will be missing until the > next FBR. 
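The ordering hazard described in point d. above can be sketched with a toy model (illustrative names, not DataNode internals): a pending IBR queue, a full block report that snapshots state, and the clear that reRegister performs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the FBR/clearIBRs ordering hazard. sendFBR() stands for the
// full block report snapshot; clearIBRs() is what the async re-registration
// does for STANDBY/OBSERVER state.
public class IbrOrderingDemo {
    static List<String> pendingIBR = new ArrayList<>();

    static void sendFBR() {
        // A snapshot of all blocks is sent here; blocks arriving afterwards
        // are only reported to the NN through future IBRs.
    }

    static void clearIBRs() {
        pendingIBR.clear();
    }

    public static void main(String[] args) {
        // Safe order: clear IBRs first, then send the FBR. A block received
        // after the FBR stays queued and reaches the NN in the next IBR.
        clearIBRs();
        sendFBR();
        pendingIBR.add("blk_1");
        System.out.println("safe order, queued: " + pendingIBR);

        // Hazardous order: FBR first, then clear. A block received between
        // the two calls is dropped and stays unknown until the next FBR.
        pendingIBR.clear();
        sendFBR();
        pendingIBR.add("blk_2");   // received inside the window
        clearIBRs();               // blk_2 is silently lost
        System.out.println("hazardous order, queued: " + pendingIBR);
    }
}
```

Because the two orderings race, the fix has to make the clear happen before (or atomically with) the report, not merely usually before it.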
[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017329#comment-17017329 ] Ctest commented on HDFS-15124: -- I uploaded a new patch with all these changes. [~elgoiri] I am sorry I didn't use the {} style for the LOG in FSNamesystem.java, because there the LOG is `org.apache.commons.logging.Log` instead of `org.slf4j.Logger`, and `org.apache.commons.logging.Log` doesn't support {code:java} LOG.error("xxx {}", "yyy"){code} Please let me know if anything else is needed. Thank you! > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
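The API gap mentioned in the comment above (commons-logging has no `{}` placeholder support, while slf4j defers formatting with `LOG.error("xxx {}", arg)`) can be illustrated with a toy single-placeholder formatter; the helper below is an illustrative stand-in, not the real slf4j MessageFormatter.

```java
// Toy illustration of the logging-API difference:
//   org.apache.commons.logging.Log (2.x FSNamesystem) forces concatenation:
//       LOG.error("Failed to load " + className);
//   org.slf4j.Logger (3.x FSNamesystem) substitutes lazily:
//       LOG.error("Failed to load {}", className);
public class PlaceholderDemo {
    // Substitute the first "{}" in the pattern with the argument.
    static String format(String pattern, Object arg) {
        int i = pattern.indexOf("{}");
        if (i < 0) {
            return pattern;
        }
        return pattern.substring(0, i) + arg + pattern.substring(i + 2);
    }

    public static void main(String[] args) {
        System.out.println(format("Failed to load {}", "TopAuditLogger"));
    }
}
```

Beyond syntax, the slf4j style also skips the string build entirely when the log level is disabled, which is why the `{}` form is preferred where the Logger type allows it.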
[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`
[ https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ctest updated HDFS-15124: - Attachment: HDFS-15124.001.patch > Crashing bugs in NameNode when using a valid configuration for > `dfs.namenode.audit.loggers` > --- > > Key: HDFS-15124 > URL: https://issues.apache.org/jira/browse/HDFS-15124 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Ctest >Priority: Critical > Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch > > > I am using Hadoop-2.10.0. > The configuration parameter `dfs.namenode.audit.loggers` allows `default` > (which is the default value) and > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`. > When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, > namenode will not be started successfully because of an > `InstantiationException` thrown from > `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. > The root cause is that while initializing namenode, `initAuditLoggers` will > be called and it will try to call the default constructor of > `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't > have a default constructor. Thus the `InstantiationException` exception is > thrown. 
[jira] [Created] (HDFS-15126) testForcedRegistration fails Intermittently
Ahmed Hussein created HDFS-15126: Summary: testForcedRegistration fails Intermittently Key: HDFS-15126 URL: https://issues.apache.org/jira/browse/HDFS-15126 Project: Hadoop HDFS Issue Type: Bug Reporter: Ahmed Hussein Assignee: Ahmed Hussein Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail in the precommit builds: {code:bash} [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.333 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration [ERROR] testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration) Time elapsed: 11.39 s <<< FAILURE! java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests
[ https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016994#comment-17016994 ] Jim Brennan commented on HDFS-15125: The unit tests are unrelated, but I am still seeing one of the tests fail locally with this patch. Once I resolve that, I will upload a new patch. > Pull back fixes for DataNodeVolume* unit tests > -- > > Key: HDFS-15125 > URL: https://issues.apache.org/jira/browse/HDFS-15125 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Attachments: HDFS-15125-branch-2.10.001.patch > > > I would like to pull back some fixes for the DataNodeVolume* tests to resolve > some intermittent failures we are seeing on branch-2.10. > The fixes are: > HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing > HDFS-13993 > TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is > flaky > HDFS-14324 Fix TestDataNodeVolumeFailure > HDFS-13945 TestDataNodeVolumeFailure is Flaky -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14757) TestBalancerRPCDelay.testBalancerRPCDelay failed
[ https://issues.apache.org/jira/browse/HDFS-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016985#comment-17016985 ] Ahmed Hussein commented on HDFS-14757: -- The failed JUnit tests have been failing intermittently on the pre-commit build for some time. * hadoop.hdfs.TestDatanodeRegistration: failing for some time on qb reports HDFS-9880, HDFS-13554 (need to file new Jiras) * hadoop.hdfs.server.namenode.TestRedudantBlocks: HDFS-15092 * hadoop.hdfs.qjournal.server.TestJournalNodeSync: HDFS-14261 [~kihwal], can you please take a look at the patch that fixes the failures reported from {{testBalancerRPCDelay}}? > TestBalancerRPCDelay.testBalancerRPCDelay failed > > > Key: HDFS-14757 > URL: https://issues.apache.org/jira/browse/HDFS-14757 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Wei-Chiu Chuang >Assignee: Ahmed Hussein >Priority: Major > Attachments: HDFS-14757.001.patch > > > {noformat} > Error Message > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.spyFSNamesystem(TestBalancer.java:1948) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2. although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > Stacktrace > org.mockito.exceptions.misusing.UnfinishedStubbingException: > Unfinished stubbing detected here: > -> at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.spyFSNamesystem(TestBalancer.java:1948) > E.g. thenReturn() may be missing. > Examples of correct stubbing: > when(mock.isOk()).thenReturn(true); > when(mock.isOk()).thenThrow(exception); > doThrow(exception).when(mock).someVoidMethod(); > Hints: > 1. missing thenReturn() > 2.
although stubbed methods may return mocks, you cannot inline mock > creation (mock()) call inside a thenReturn method (see issue 53) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.spyFSNamesystem(TestBalancer.java:1957) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:811) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:1976) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelay(TestBalancerRPCDelay.java:30) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces
[ https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016871#comment-17016871 ] Jinglun commented on HDFS-15087: Hi everyone, any further comments? Please let me know your thoughts, thanks! > RBF: Balance/Rename across federation namespaces > > > Key: HDFS-15087 > URL: https://issues.apache.org/jira/browse/HDFS-15087 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Priority: Major > Attachments: HDFS-15087.initial.patch, HFR_Rename Across Federation > Namespaces.pdf > > > The Xiaomi storage team has developed a new feature called HFR (HDFS > Federation Rename) that enables us to do balance/rename across federation > namespaces. The idea is to first move the meta to the dst NameNode and then > link all the replicas. It has been working in our largest production cluster > for 2 months. We use it to balance the namespaces. It turns out HFR is fast > and flexible. The details can be found in the design doc. > Looking forward to a lively discussion. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.
[ https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016695#comment-17016695 ] Jinglun edited comment on HDFS-15120 at 1/16/20 12:31 PM: -- Hi [~ayushtkn], thanks for your comment! I found the ReconfigurationProtocol is used by the DataNode too. Using ReconfigurationProtocol would give the DataNode the refreshBlockPlacementPolicy interface even though it doesn't have a BlockPlacementPolicy. I also found the NameNode already has a GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and use the GenericRefreshProtocol to refresh. Please let me know your thoughts. Upload v02. was (Author: lijinglun): Hi [~ayushtkn], thanks for your comment! I found the ReconfigurationProtocol is used by the DataNode too. Using ReconfigurationProtocol would give the DataNode the refreshBlockPlacementPolicy interface even though it doesn't have a BlockPlacementPolicy. I also found the NameNode already has a GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and use the GenericRefreshProtocol to refresh. What do you think? Upload v02. > Refresh BlockPlacementPolicy at runtime. > > > Key: HDFS-15120 > URL: https://issues.apache.org/jira/browse/HDFS-15120 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch > > > Now, if we want to switch BlockPlacementPolicies we need to restart the > NameNode. It would be convenient if we could switch it at runtime. For example, > we could switch between AvailableSpaceBlockPlacementPolicy and > BlockPlacementPolicyDefault as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
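The proposal above is to hook a RefreshBlockPlacementPolicyHandler into the NameNode's existing GenericRefreshProtocol, which dispatches refresh requests by identifier to registered handlers. A minimal standalone sketch of that dispatch pattern follows; the interface, identifier string, and handler are simplified, hypothetical stand-ins (not the actual org.apache.hadoop.ipc API or the proposed patch):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Standalone sketch of identifier-based refresh dispatch, modeled loosely on
 * Hadoop's GenericRefreshProtocol/RefreshRegistry. Names are illustrative.
 */
public class RefreshDispatchSketch {
    interface RefreshHandler {
        /** Returns 0 on success, non-zero on failure (like a RefreshResponse code). */
        int handleRefresh(String identifier, String[] args);
    }

    private final Map<String, RefreshHandler> registry = new HashMap<>();
    private volatile String placementPolicyClass = "BlockPlacementPolicyDefault";

    void register(String identifier, RefreshHandler handler) {
        registry.put(identifier, handler);
    }

    int refresh(String identifier, String... args) {
        RefreshHandler h = registry.get(identifier);
        if (h == null) {
            return 255;  // unknown identifier: no handler registered
        }
        return h.handleRefresh(identifier, args);
    }

    String currentPolicy() {
        return placementPolicyClass;
    }

    static RefreshDispatchSketch withPolicyHandler() {
        RefreshDispatchSketch nn = new RefreshDispatchSketch();
        // Hypothetical handler mirroring the proposed RefreshBlockPlacementPolicyHandler:
        // swaps the configured policy class name at runtime, no NameNode restart.
        nn.register("refreshBlockPlacementPolicy", (id, args) -> {
            if (args.length != 1) {
                return 1;  // expected exactly one arg: the policy class name
            }
            nn.placementPolicyClass = args[0];
            return 0;
        });
        return nn;
    }

    public static void main(String[] args) {
        RefreshDispatchSketch nn = withPolicyHandler();
        nn.refresh("refreshBlockPlacementPolicy", "AvailableSpaceBlockPlacementPolicy");
        System.out.println(nn.currentPolicy());
    }
}
```

The appeal of this route, as the comment notes, is that only the NameNode exposes the new refresh identifier; the DataNode, which has no BlockPlacementPolicy, is untouched.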
[jira] [Commented] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.
[ https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016854#comment-17016854 ] Hadoop QA commented on HDFS-15120: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 31m 53s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 346 unchanged - 0 fixed = 349 total (was 346) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}122m 31s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 46s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}216m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract | | | hadoop.hdfs.TestDeadNodeDetection | | | hadoop.hdfs.server.namenode.TestRedudantBlocks | | | hadoop.hdfs.TestDFSOutputStream | | | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 | | JIRA Issue | HDFS-15120 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12991089/HDFS-15120.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ed8c73cc4e32 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / a0ff42d | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_232 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/28683/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit |
[jira] [Commented] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature
[ https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016851#comment-17016851 ] Xiaoqiao He commented on HDFS-15113: [~elgoiri], [~weichiu], [~brahmareddy], do you have any bandwidth to push this issue forward? Considering this is a serious bug and some users want to backport it to their internal branches, it would be better to fix it ASAP. I will follow up on any suggestions. Thanks again. > Missing IBR when NameNode restart if open processCommand async feature > -- > > Key: HDFS-15113 > URL: https://issues.apache.org/jira/browse/HDFS-15113 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, > HDFS-15113.003.patch > > > Recently, I met a case where the NameNode was missing blocks after a restart, which is > related to HDFS-14997. > a. During a NameNode restart, it returns the command `DNA_REGISTER` to a DataNode > when it receives some RPC request from that DataNode. > b. When the DataNode receives the `DNA_REGISTER` command, it runs #reRegister > asynchronously. > {code:java} > void reRegister() throws IOException { > if (shouldRun()) { > // re-retrieve namespace info to make sure that, if the NN > // was restarted, we still match its version (HDFS-2120) > NamespaceInfo nsInfo = retrieveNamespaceInfo(); > // and re-register > register(nsInfo); > scheduler.scheduleHeartbeat(); > // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down > // for sometime. > if (state == HAServiceState.STANDBY || state == > HAServiceState.OBSERVER) { > ibrManager.clearIBRs(); > } > } > } > {code} > c. As we know, #register will trigger a BR immediately. > d. Because #reRegister runs asynchronously, we cannot be sure which runs > first: sending the FBR or clearing the IBRs. If clearing the IBRs runs first, it will be OK.
> But if it sends the FBR first and then clears the IBRs, it will miss some blocks > received between these two points in time, until the next FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
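The ordering hazard described in (d) can be sketched with a tiny standalone simulation (the class and method names are hypothetical simplifications, not the actual DataNode code): when the IBR queue is cleared after the full block report has already been sent, any block received in between is dropped from both reports.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Minimal sketch of the FBR-vs-clearIBRs ordering hazard (hypothetical names). */
public class IbrRaceSketch {
    private final Set<String> blocksOnDisk = new LinkedHashSet<>();
    private final Set<String> pendingIbrs = new LinkedHashSet<>();
    private final Set<String> blocksKnownToNn = new LinkedHashSet<>();

    void receiveBlock(String b) {       // DataNode receives a block
        blocksOnDisk.add(b);
        pendingIbrs.add(b);             // queued for an incremental report
    }

    void sendFullBlockReport() {        // FBR: NN learns the current on-disk set
        blocksKnownToNn.addAll(blocksOnDisk);
    }

    void clearIbrs() {                  // queued IBRs dropped without being sent
        pendingIbrs.clear();
    }

    void sendIbrs() {                   // IBR: NN learns the queued blocks
        blocksKnownToNn.addAll(pendingIbrs);
        pendingIbrs.clear();
    }

    /** Returns blocks the NN will not learn about until the next FBR. */
    static Set<String> lostBlocks(boolean clearBeforeFbr) {
        IbrRaceSketch dn = new IbrRaceSketch();
        dn.receiveBlock("blk_1");
        if (clearBeforeFbr) {
            dn.clearIbrs();             // safe order: clear first, then FBR
            dn.sendFullBlockReport();
            dn.receiveBlock("blk_2");   // arrives after FBR, reported by IBR
        } else {
            dn.sendFullBlockReport();   // FBR first...
            dn.receiveBlock("blk_2");   // ...a block arrives in between...
            dn.clearIbrs();             // ...then the clear drops blk_2's IBR
        }
        dn.sendIbrs();
        Set<String> lost = new LinkedHashSet<>(dn.blocksOnDisk);
        lost.removeAll(dn.blocksKnownToNn);
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("clear-then-FBR lost: " + lostBlocks(true));   // []
        System.out.println("FBR-then-clear lost: " + lostBlocks(false));  // [blk_2]
    }
}
```

The sketch compresses the real asynchrony into two fixed orderings, but it shows why only the "clear IBRs, then FBR" interleaving is safe.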
[jira] [Commented] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug
[ https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016844#comment-17016844 ] Xiaoqiao He commented on HDFS-15115: [~wzx513], would you like to check all similar cases in #BlockPlacementPolicyDefault and update a new patch? It would be better to add a UT as well, as [~weichiu] mentioned above. Thanks. > Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically > change logger to debug > --- > > Key: HDFS-15115 > URL: https://issues.apache.org/jira/browse/HDFS-15115 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: wangzhixiang >Priority: Major > Attachments: HDFS-15115.001.patch > > > To get debug info, we dynamically changed the logger of > BlockPlacementPolicyDefault to debug while the NameNode was running. However, the > NameNode crashed. From the log, we found an NPE in > BlockPlacementPolicyDefault.chooseRandom. The *StringBuilder builder* > is used 4 times in the BlockPlacementPolicyDefault.chooseRandom method, > while the *builder* is only initialized at the first of those uses. If we > change the logger of BlockPlacementPolicyDefault to debug after that first use, the > *builder* in the remaining parts is *NULL* and causes an *NPE* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
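The failure mode described above — a debug-level check guarding lazy initialization of a StringBuilder, then re-checked later in the same method after the level may have changed — can be reproduced in a standalone sketch. The flag and method names below are illustrative, not the actual chooseRandom code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of the lazy-StringBuilder NPE when debug is enabled mid-method. */
public class LazyDebugBuilderSketch {
    /** Stands in for LOG.isDebugEnabled(), togglable at runtime. */
    static final AtomicBoolean DEBUG = new AtomicBoolean(false);

    /** Buggy pattern: builder created only at the first debug check. */
    static String buggyChoose(Runnable toggleMidway) {
        StringBuilder builder = null;
        if (DEBUG.get()) {              // first of several debug checks
            builder = new StringBuilder();
            builder.append("start;");
        }
        toggleMidway.run();             // logger level changed dynamically here
        if (DEBUG.get()) {
            builder.append("end;");     // NPE: debug is now on, builder is null
        }
        return builder == null ? "" : builder.toString();
    }

    /** Fixed pattern: null-check (or re-create) the builder before each use. */
    static String fixedChoose(Runnable toggleMidway) {
        StringBuilder builder = null;
        if (DEBUG.get()) {
            builder = new StringBuilder();
            builder.append("start;");
        }
        toggleMidway.run();
        if (DEBUG.get()) {
            if (builder == null) {      // the guard the patch needs at every use
                builder = new StringBuilder();
            }
            builder.append("end;");
        }
        return builder == null ? "" : builder.toString();
    }

    public static void main(String[] args) {
        DEBUG.set(false);
        try {
            buggyChoose(() -> DEBUG.set(true));
        } catch (NullPointerException e) {
            System.out.println("buggy pattern: NPE");
        }
        DEBUG.set(false);
        System.out.println("fixed pattern: " + fixedChoose(() -> DEBUG.set(true)));
    }
}
```

Checking "all similar cases" then means auditing every later use of the lazily-created builder in chooseRandom (and any sibling methods) for the same missing null-check.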
[jira] [Commented] (HDFS-14672) Backport HDFS-12703 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016843#comment-17016843 ] Xiaoqiao He commented on HDFS-14672: [~Tao Yang], [^HDFS-14672.branch-2.8.001.patch] offers a patch for branch-2.8 without careful testing, FYI. > Backport HDFS-12703 to branch-2 > --- > > Key: HDFS-14672 > URL: https://issues.apache.org/jira/browse/HDFS-14672 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 2.10.0 > > Attachments: HDFS-12703.branch-2.001.patch, > HDFS-12703.branch-2.002.patch, HDFS-12703.branch-2.003.patch, > HDFS-14672.branch-2.8.001.patch > > > Currently, `decommission monitor exception cause namenode fatal` is only in > trunk (branch-3). This JIRA aims to backport this bugfix to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14672) Backport HDFS-12703 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-14672: --- Attachment: HDFS-14672.branch-2.8.001.patch > Backport HDFS-12703 to branch-2 > --- > > Key: HDFS-14672 > URL: https://issues.apache.org/jira/browse/HDFS-14672 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 2.10.0 > > Attachments: HDFS-12703.branch-2.001.patch, > HDFS-12703.branch-2.002.patch, HDFS-12703.branch-2.003.patch, > HDFS-14672.branch-2.8.001.patch > > > Currently, `decommission monitor exception cause namenode fatal` is only in > trunk (branch-3). This JIRA aims to backport this bugfix to branch-2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.
[ https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016695#comment-17016695 ] Jinglun edited comment on HDFS-15120 at 1/16/20 8:38 AM: - Hi [~ayushtkn], thanks for your comment! I found the ReconfigurationProtocol is used by the DataNode too. Using ReconfigurationProtocol would give the DataNode the refreshBlockPlacementPolicy interface even though it doesn't have a BlockPlacementPolicy. I also found the NameNode already has a GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and use the GenericRefreshProtocol to refresh. What do you think? Upload v02. was (Author: lijinglun): Hi [~ayushtkn], thanks for your comment! I found the ReconfigurationProtocol is used by the DataNode too. I also found the NameNode already has a GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and use the GenericRefreshProtocol to refresh. What do you think? Upload v02. > Refresh BlockPlacementPolicy at runtime. > > > Key: HDFS-15120 > URL: https://issues.apache.org/jira/browse/HDFS-15120 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch > > > Now, if we want to switch BlockPlacementPolicies we need to restart the > NameNode. It would be convenient if we could switch it at runtime. For example, > we could switch between AvailableSpaceBlockPlacementPolicy and > BlockPlacementPolicyDefault as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.
[ https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinglun updated HDFS-15120: --- Attachment: HDFS-15120.002.patch > Refresh BlockPlacementPolicy at runtime. > > > Key: HDFS-15120 > URL: https://issues.apache.org/jira/browse/HDFS-15120 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch > > > Now if we want to switch BlockPlacementPolicies we need to restart the > NameNode. It would be convenient if we can switch it at runtime. For example > we can switch between AvailableSpaceBlockPlacementPolicy and > BlockPlacementPolicyDefault as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.
[ https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016695#comment-17016695 ] Jinglun commented on HDFS-15120: Hi [~ayushtkn], thanks for your comment! I found the ReconfigurationProtocol is used by the DataNode too. I also found the NameNode already has a GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and use the GenericRefreshProtocol to refresh. What do you think? Upload v02. > Refresh BlockPlacementPolicy at runtime. > > > Key: HDFS-15120 > URL: https://issues.apache.org/jira/browse/HDFS-15120 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch > > > Now, if we want to switch BlockPlacementPolicies we need to restart the > NameNode. It would be convenient if we could switch it at runtime. For example, > we could switch between AvailableSpaceBlockPlacementPolicy and > BlockPlacementPolicyDefault as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org