[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory

2020-01-16 Thread Sean Chow (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017759#comment-17017759
 ] 

Sean Chow commented on HDFS-14476:
--

Build failed:(

> lock too long when fix inconsistent blocks between disk and in-memory
> -
>
> Key: HDFS-14476
> URL: https://issues.apache.org/jira/browse/HDFS-14476
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.6.0, 2.7.0, 3.0.3
>Reporter: Sean Chow
>Assignee: Sean Chow
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14476-branch-2.01.patch, 
> HDFS-14476-branch-2.02.patch, HDFS-14476.00.patch, HDFS-14476.002.patch, 
> HDFS-14476.01.patch, HDFS-14476.branch-3.2.001.patch, 
> datanode-with-patch-14476.png
>
>
> When directoryScanner have the results of differences between disk and 
> in-memory blocks. it will try to run {{checkAndUpdate}} to fix it. However 
> {{FsDatasetImpl.checkAndUpdate}} is a synchronized call
> As I have about 6millions blocks for every datanodes and every 6hours' scan 
> will have about 25000 abnormal blocks to fix. That leads to a long lock 
> holding FsDatasetImpl object.
> let's assume every block need 10ms to fix(because of latency of SAS disk), 
> that will cost 250 seconds to finish. That means all reads and writes will be 
> blocked for 3mins for that datanode.
>  
> {code:java}
> 2019-05-06 08:06:51,704 INFO 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool 
> BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing 
> metadata files:23574, missing block files:23574, missing blocks in 
> memory:47625, mismatched blocks:0
> ...
> 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 588402ms to process 1 commands from NN
> {code}
> Take long time to process command from nn because threads are blocked. And 
> namenode will see long lastContact time for this datanode.
> Maybe this affect all hdfs versions.
> *how to fix:*
> just like process invalidate command from namenode with 1000 batch size, fix 
> these abnormal block should be handled with batch too and sleep 2 seconds 
> between the batch to allow normal reading/writing blocks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15126) testForcedRegistration fails Intermittently

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017686#comment-17017686
 ] 

Hadoop QA commented on HDFS-15126:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 29m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  4s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 58s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}183m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15126 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991187/HDFS-15126.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux db3694bbc4a4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28688/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28688/testReport/ |
| Max. process+thread count | 4291 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 

[jira] [Commented] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017672#comment-17017672
 ] 

Hadoop QA commented on HDFS-15127:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
20s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
21s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 21s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
22s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 21s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15127 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991169/HDFS-15127.000.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ceb23d392f79 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28689/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
| compile | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28689/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28689/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
 |
| 

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017653#comment-17017653
 ] 

Hadoop QA commented on HDFS-15124:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 166 unchanged - 0 fixed = 167 total (was 166) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}123m 13s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991181/HDFS-15124.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0fb99bc6d2d0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-15126) testForcedRegistration fails Intermittently

2020-01-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017650#comment-17017650
 ] 

Íñigo Goiri commented on HDFS-15126:


I went over this class and this is a little hard to debug.
Basically, this goes into a wait for that if it fails it does not say anything 
but fail the assert.
I would propose one of the two following options:
# Log the timeout for the function.
# Add a message in the assertTrue to say what happened.

> testForcedRegistration fails Intermittently
> ---
>
> Key: HDFS-15126
> URL: https://issues.apache.org/jira/browse/HDFS-15126
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15126.001.patch
>
>
> Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail 
> in the recommit builds:
> {code:bash}
> [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 25.333 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration
> [ERROR] 
> testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration)  Time 
> elapsed: 11.39 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017645#comment-17017645
 ] 

Hadoop QA commented on HDFS-15118:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 52s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 
2 unchanged - 0 fixed = 3 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
0s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}123m 56s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}202m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15118 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991178/HDFS-15118.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d1ea322184a0 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 263413e |

[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and crashed future Maven test run

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15128:
-
Summary: Unit test failing to clean testing data and crashed future Maven 
test run  (was: Unit test failing to clean testing data and crashed subsequent 
Maven test run)

> Unit test failing to clean testing data and crashed future Maven test run
> -
>
> Key: HDFS-15128
> URL: https://issues.apache.org/jira/browse/HDFS-15128
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, test
>Affects Versions: 3.2.1
>Reporter: Ctest
>Priority: Critical
>  Labels: easyfix, patch, test
> Attachments: HDFS-15128-000.patch
>
>
> Actively-used test helper function `testVolumeConfig` in 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
> chmod a directory with invalid perm 000 for testing purposes but later failed 
> to chmod back this directory with a valid perm if the assertion inside this 
> function failed. Any subsequent `mvn test` command would fail to run if this 
> test had failed before. It is because Maven failed to build itself as it did 
> not have permission to clean the temporarily-generated directory that has 
> perm 000. See below for the code snippet that is buggy.
> {code:java}
> try {
>   for (int i = 0; i < volumesFailed; i++) {
> prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
>   }
>   restartDatanodes(volumesTolerated, manageDfsDirs);
> } catch (DiskErrorException e) {
>  ...
> } finally {
> ...
> }
>  
>   assertEquals(expectedBPServiceState, bpServiceState);
>  
>   for (File dir : dirs) {
> FileUtil.chmod(dir.toString(), "755");
>   }
> }
> {code}
> The failure of the statement `assertEquals(expectedBPServiceState, 
> bpServiceState)` caused function to terminate without executing 
> `FileUtil.chmod(dir.toString(), "755")` for each temporary directory with 
> invalid perm 000 the test has created. 
>  
> *Consequence*
> Any subsequent `mvn test` command would fail to run if this test had failed 
> before. It is because Maven failed to build itself since it does not have 
> permission to clean this temporarily-generated directory. For details of the 
> failure, see below:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
> [INFO] Executing tasks
>  
> main:
> [delete] Deleting directory 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  8.349 s
> [INFO] Finished at: 2019-12-27T03:53:04-06:00
> [INFO] 
> 
> [ERROR] Failed to execute 
> goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on 
> project hadoop-hdfs: An Ant BuildException has occured: Unable to delete 
> directory 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
> [ERROR] around Ant part ... dir="/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data"/>...
>  @ 4:105 in 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat}
>  
> *Root Cause*
> The test helper function 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`
>  purposely set the directory 
> `/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current`
>  to have perm 000. And at the end of this function, it changed the perm of 
> this directory to 755. However, there is an assertion in this function before 
> the perm was able to changed to 755. Once this assertion fails, the function 
> terminates before the directory’s perm can be changed to 755. Hence, this 
> directory was later unable to be removed by Maven for when executing `mvn 
> test`. 
>  
> *Fix*
> In 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`,
>  move the assertion `assertEquals(expectedBPServiceState, bpServiceState)`  
> to the last line of this function. This fix will fix the bug and will not 
> 

[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and crashed subsequent Maven test run

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15128:
-
Summary: Unit test failing to clean testing data and crashed subsequent 
Maven test run  (was: Unit test failing to clean testing data and caused Maven 
to crash)

> Unit test failing to clean testing data and crashed subsequent Maven test run
> -
>
> Key: HDFS-15128
> URL: https://issues.apache.org/jira/browse/HDFS-15128
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, test
>Affects Versions: 3.2.1
>Reporter: Ctest
>Priority: Critical
>  Labels: easyfix, patch, test
> Attachments: HDFS-15128-000.patch
>
>
> Actively-used test helper function `testVolumeConfig` in 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
> chmod a directory with invalid perm 000 for testing purposes but later failed 
> to chmod back this directory with a valid perm if the assertion inside this 
> function failed. Any subsequent `mvn test` command would fail to run if this 
> test had failed before. It is because Maven failed to build itself as it did 
> not have permission to clean the temporarily-generated directory that has 
> perm 000. See below for the code snippet that is buggy.
> {code:java}
> try {
>   for (int i = 0; i < volumesFailed; i++) {
> prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
>   }
>   restartDatanodes(volumesTolerated, manageDfsDirs);
> } catch (DiskErrorException e) {
>  ...
> } finally {
> ...
> }
>  
>   assertEquals(expectedBPServiceState, bpServiceState);
>  
>   for (File dir : dirs) {
> FileUtil.chmod(dir.toString(), "755");
>   }
> }
> {code}
> The failure of the statement `assertEquals(expectedBPServiceState, 
> bpServiceState)` caused function to terminate without executing 
> `FileUtil.chmod(dir.toString(), "755")` for each temporary directory with 
> invalid perm 000 the test has created. 
>  
> *Consequence*
> Any subsequent `mvn test` command would fail to run if this test had failed 
> before. It is because Maven failed to build itself since it does not have 
> permission to clean this temporarily-generated directory. For details of the 
> failure, see below:
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
> [INFO] Executing tasks
>  
> main:
> [delete] Deleting directory 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  8.349 s
> [INFO] Finished at: 2019-12-27T03:53:04-06:00
> [INFO] 
> 
> [ERROR] Failed to execute 
> goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on 
> project hadoop-hdfs: An Ant BuildException has occured: Unable to delete 
> directory 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
> [ERROR] around Ant part ... dir="/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data"/>...
>  @ 4:105 in 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat}
>  
> *Root Cause*
> The test helper function 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`
>  purposely set the directory 
> `/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current`
>  to have perm 000. And at the end of this function, it changed the perm of 
> this directory to 755. However, there is an assertion in this function before 
> the perm was able to changed to 755. Once this assertion fails, the function 
> terminates before the directory’s perm can be changed to 755. Hence, this 
> directory was later unable to be removed by Maven for when executing `mvn 
> test`. 
>  
> *Fix*
> In 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`,
>  move the assertion `assertEquals(expectedBPServiceState, bpServiceState)`  
> to the last line of this function. This fix will fix the bug and will not 
> 

[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and caused Maven to crash

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15128:
-
Description: 
Actively-used test helper function `testVolumeConfig` in 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
chmod a directory with invalid perm 000 for testing purposes but later failed 
to chmod back this directory with a valid perm if the assertion inside this 
function failed. Any subsequent `mvn test` command would fail to run if this 
test had failed before. It is because Maven failed to build itself as it did 
not have permission to clean the temporarily-generated directory that has perm 
000. See below for the code snippet that is buggy.
{code:java}
try {
  for (int i = 0; i < volumesFailed; i++) {
prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
  }
  restartDatanodes(volumesTolerated, manageDfsDirs);
} catch (DiskErrorException e) {
 ...
} finally {
...
}
 
  assertEquals(expectedBPServiceState, bpServiceState);
 
  for (File dir : dirs) {
FileUtil.chmod(dir.toString(), "755");
  }
}
{code}
The failure of the statement `assertEquals(expectedBPServiceState, 
bpServiceState)` caused function to terminate without executing 
`FileUtil.chmod(dir.toString(), "755")` for each temporary directory with 
invalid perm 000 the test has created. 

 

*Consequence*

Any subsequent `mvn test` command would fail to run if this test had failed 
before. It is because Maven failed to build itself since it does not have 
permission to clean this temporarily-generated directory. For details of the 
failure, see below:
{noformat}
[INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
[INFO] Executing tasks
 
main:
[delete] Deleting directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  8.349 s
[INFO] Finished at: 2019-12-27T03:53:04-06:00
[INFO] 
[ERROR] Failed to execute 
goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on 
project hadoop-hdfs: An Ant BuildException has occured: Unable to delete 
directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
[ERROR] around Ant part ..
 @ 4:105 in 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat}
 

*Root Cause*

The test helper function 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`
 purposely set the directory 
`/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current`
 to have perm 000. And at the end of this function, it changed the perm of this 
directory to 755. However, there is an assertion in this function before the 
perm was able to changed to 755. Once this assertion fails, the function 
terminates before the directory’s perm can be changed to 755. Hence, this 
directory was later unable to be removed by Maven for when executing `mvn 
test`. 

 

*Fix*

In 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`,
 move the assertion `assertEquals(expectedBPServiceState, bpServiceState)`  to 
the last line of this function. This fix will fix the bug and will not change 
the test outcome. 

  was:
Actively-used test helper function `testVolumeConfig` in 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
chmod a directory with invalid perm 000 for testing purposes but later failed 
to chmod back this directory with a valid perm if the assertion inside this 
function failed. Any subsequent `mvn test` command would fail to run if this 
test had failed before. It is because Maven failed to build itself as it did 
not have permission to clean the temporarily-generated directory that has perm 
000. See below for the code snippet that is buggy.
{code:java}
try {
  for (int i = 0; i < volumesFailed; i++) {
prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
  }
  restartDatanodes(volumesTolerated, manageDfsDirs);
} catch (DiskErrorException e) {
 ...
} finally {
...
}
 
  assertEquals(expectedBPServiceState, bpServiceState);
 
  

[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and caused Maven to crash

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15128:
-
Description: 
Actively-used test helper function `testVolumeConfig` in 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
chmod a directory with invalid perm 000 for testing purposes but later failed 
to chmod back this directory with a valid perm if the assertion inside this 
function failed. Any subsequent `mvn test` command would fail to run if this 
test had failed before. It is because Maven failed to build itself as it did 
not have permission to clean the temporarily-generated directory that has perm 
000. See below for the code snippet that is buggy.
{code:java}
try {
  for (int i = 0; i < volumesFailed; i++) {
prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
  }
  restartDatanodes(volumesTolerated, manageDfsDirs);
} catch (DiskErrorException e) {
 ...
} finally {
...
}
 
  assertEquals(expectedBPServiceState, bpServiceState);
 
  for (File dir : dirs) {
FileUtil.chmod(dir.toString(), "755");
  }
}
{code}
The failure of the statement `assertEquals(expectedBPServiceState, 
bpServiceState)` caused function to terminate without executing 
`FileUtil.chmod(dir.toString(), "755")` for each temporary directory with 
invalid perm 000 the test has created. 

 

*Consequence:*

Any subsequent `mvn test` command would fail to run if this test had failed 
before. It is because Maven failed to build itself since it does not have 
permission to clean this temporarily-generated directory. For details of the 
failure, see below:
{noformat}
[INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
[INFO] Executing tasks
 
main:
[delete] Deleting directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  8.349 s
[INFO] Finished at: 2019-12-27T03:53:04-06:00
[INFO] 
[ERROR] Failed to execute 
goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on 
project hadoop-hdfs: An Ant BuildException has occured: Unable to delete 
directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
[ERROR] around Ant part ..
 @ 4:105 in 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat}
 

*Root Cause:*

The test helper function 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`
 purposely set the directory 
`/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current`
 to have perm 000. And at the end of this function, it changed the perm of this 
directory to 755. However, there is an assertion in this function before the 
perm was able to changed to 755. Once this assertion fails, the function 
terminates before the directory’s perm can be changed to 755. Hence, this 
directory was later unable to be removed by Maven for when executing `mvn 
test`. 

 

*Fix:*

In 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`,
 move the assertion `assertEquals(expectedBPServiceState, bpServiceState)`  to 
the last line of this function. This fix will fix the bug and will not change 
the test outcome. 

  was:
*Description:*

Actively-used test helper function `testVolumeConfig` in 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
chmod a directory with invalid perm 000 for testing purposes but later failed 
to chmod back this directory with a valid perm if the assertion inside this 
function failed. Any subsequent `mvn test` command would fail to run if this 
test had failed before. It is because Maven failed to build itself as it did 
not have permission to clean the temporarily-generated directory that has perm 
000. See below for the code snippet that is buggy.
{code:java}
try {
  for (int i = 0; i < volumesFailed; i++) {
prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
  }
  restartDatanodes(volumesTolerated, manageDfsDirs);
} catch (DiskErrorException e) {
 ...
} finally {
...
}
 
  assertEquals(expectedBPServiceState, 

[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and caused Maven to crash

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15128:
-
Description: 
*Description:*

Actively-used test helper function `testVolumeConfig` in 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
chmod a directory with invalid perm 000 for testing purposes but later failed 
to chmod back this directory with a valid perm if the assertion inside this 
function failed. Any subsequent `mvn test` command would fail to run if this 
test had failed before. It is because Maven failed to build itself as it did 
not have permission to clean the temporarily-generated directory that has perm 
000. See below for the code snippet that is buggy.
{code:java}
try {
  for (int i = 0; i < volumesFailed; i++) {
prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
  }
  restartDatanodes(volumesTolerated, manageDfsDirs);
} catch (DiskErrorException e) {
 ...
} finally {
...
}
 
  assertEquals(expectedBPServiceState, bpServiceState);
 
  for (File dir : dirs) {
FileUtil.chmod(dir.toString(), "755");
  }
}
{code}
The failure of the statement `assertEquals(expectedBPServiceState, 
bpServiceState)` caused function to terminate without executing 
`FileUtil.chmod(dir.toString(), "755")` for each temporary directory with 
invalid perm 000 the test has created. 

 

*Consequence:*

Any subsequent `mvn test` command would fail to run if this test had failed 
before. It is because Maven failed to build itself since it does not have 
permission to clean this temporarily-generated directory. For details of the 
failure, see below:
{noformat}
[INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
[INFO] Executing tasks
 
main:
[delete] Deleting directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  8.349 s
[INFO] Finished at: 2019-12-27T03:53:04-06:00
[INFO] 
[ERROR] Failed to execute 
goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on 
project hadoop-hdfs: An Ant BuildException has occured: Unable to delete 
directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
[ERROR] around Ant part ..
 @ 4:105 in 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat}
 

*Root Cause:*

The test helper function 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`
 purposely set the directory 
`/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current`
 to have perm 000. And at the end of this function, it changed the perm of this 
directory to 755. However, there is an assertion in this function before the 
perm was able to changed to 755. Once this assertion fails, the function 
terminates before the directory’s perm can be changed to 755. Hence, this 
directory was later unable to be removed by Maven for when executing `mvn 
test`. 

 

*Fix:*

In 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`,
 move the assertion `assertEquals(expectedBPServiceState, bpServiceState)`  to 
the last line of this function. This fix will fix the bug and will not change 
the test outcome. 

  was:
*Description:*

Actively-used test helper function `testVolumeConfig` in 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
chmod a directory with invalid perm 000 for testing purposes but later failed 
to chmod back this directory with a valid perm if the assertion inside this 
function failed. Any subsequent `mvn test` command would fail to run if this 
test had failed before. It is because Maven failed to build itself as it did 
not have permission to clean the temporarily-generated directory that has perm 
000. See below for the code snippet that is buggy.

 

 
{code:java}
try {
  for (int i = 0; i < volumesFailed; i++) {
prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
  }
  restartDatanodes(volumesTolerated, manageDfsDirs);
} catch (DiskErrorException e) {
 ...
} finally {
...
}
 
  

[jira] [Updated] (HDFS-15128) Unit test failing to clean testing data and caused Maven to crash

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15128:
-
Attachment: HDFS-15128-000.patch
  Tags:   (was: test)

> Unit test failing to clean testing data and caused Maven to crash
> -
>
> Key: HDFS-15128
> URL: https://issues.apache.org/jira/browse/HDFS-15128
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, test
>Affects Versions: 3.2.1
>Reporter: Ctest
>Priority: Critical
>  Labels: easyfix, patch, test
> Attachments: HDFS-15128-000.patch
>
>
> *Description:*
> Actively-used test helper function `testVolumeConfig` in 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
> chmod a directory with invalid perm 000 for testing purposes but later failed 
> to chmod back this directory with a valid perm if the assertion inside this 
> function failed. Any subsequent `mvn test` command would fail to run if this 
> test had failed before. It is because Maven failed to build itself as it did 
> not have permission to clean the temporarily-generated directory that has 
> perm 000. See below for the code snippet that is buggy.
>  
>  
> {code:java}
> try {
>   for (int i = 0; i < volumesFailed; i++) {
> prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
>   }
>   restartDatanodes(volumesTolerated, manageDfsDirs);
> } catch (DiskErrorException e) {
>  ...
> } finally {
> ...
> }
>  
>   assertEquals(expectedBPServiceState, bpServiceState);
>  
>   for (File dir : dirs) {
> FileUtil.chmod(dir.toString(), "755");
>   }
> }
> {code}
>  
>  
> The failure of the statement `assertEquals(expectedBPServiceState, 
> bpServiceState)` caused function to terminate without executing 
> `FileUtil.chmod(dir.toString(), "755")` for each temporary directory with 
> invalid perm 000 the test has created. 
>  
> *Consequence:*
> Any subsequent `mvn test` command would fail to run if this test had failed 
> before. It is because Maven failed to build itself since it does not have 
> permission to clean this temporarily-generated directory. For details of the 
> failure, see below:
>  
> {noformat}
> [INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
> [INFO] Executing tasks
>  
> main:
> [delete] Deleting directory 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  8.349 s
> [INFO] Finished at: 2019-12-27T03:53:04-06:00
> [INFO] 
> 
> [ERROR] Failed to execute 
> goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on 
> project hadoop-hdfs: An Ant BuildException has occured: Unable to delete 
> directory 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
> [ERROR] around Ant part ... dir="/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data"/>...
>  @ 4:105 in 
> /home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat}
>  
>  
>  
> *Root Cause:*
> The test helper function 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`
>  purposely set the directory 
> `/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current`
>  to have perm 000. And at the end of this function, it changed the perm of 
> this directory to 755. However, there is an assertion in this function before 
> the perm was able to changed to 755. Once this assertion fails, the function 
> terminates before the directory’s perm can be changed to 755. Hence, this 
> directory was later unable to be removed by Maven for when executing `mvn 
> test`. 
>  
> *Fix:*
> In 
> `org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`,
>  move the assertion `assertEquals(expectedBPServiceState, bpServiceState)`  
> to the last line of this function. This fix will fix the bug and will not 
> change the test outcome. 
>  
> *Content for the patch:*
> {code:java}
> diff 

[jira] [Updated] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.

2020-01-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15127:
---
Status: Patch Available  (was: Open)

> RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount 
> points.
> 
>
> Key: HDFS-15127
> URL: https://issues.apache.org/jira/browse/HDFS-15127
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-15127.000.patch
>
>
> A HASH_ALL mount point should not allow creating new files if one subcluster 
> is down.
> If the file already existed in the past, this could lead to inconsistencies.
> We should return an unavailable exception.
> {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be 
> changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15128) Unit test failing to clean testing data and caused Maven to crash

2020-01-16 Thread Ctest (Jira)
Ctest created HDFS-15128:


 Summary: Unit test failing to clean testing data and caused Maven 
to crash
 Key: HDFS-15128
 URL: https://issues.apache.org/jira/browse/HDFS-15128
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs, test
Affects Versions: 3.2.1
Reporter: Ctest


*Description:*

Actively-used test helper function `testVolumeConfig` in 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration` 
chmod a directory with invalid perm 000 for testing purposes but later failed 
to chmod back this directory with a valid perm if the assertion inside this 
function failed. Any subsequent `mvn test` command would fail to run if this 
test had failed before. It is because Maven failed to build itself as it did 
not have permission to clean the temporarily-generated directory that has perm 
000. See below for the code snippet that is buggy.

 

 
{code:java}
try {
  for (int i = 0; i < volumesFailed; i++) {
prepareDirToFail(dirs[i]); // this will chmod dirs[i] to perm 000
  }
  restartDatanodes(volumesTolerated, manageDfsDirs);
} catch (DiskErrorException e) {
 ...
} finally {
...
}
 
  assertEquals(expectedBPServiceState, bpServiceState);
 
  for (File dir : dirs) {
FileUtil.chmod(dir.toString(), "755");
  }
}
{code}
 

 

The failure of the statement `assertEquals(expectedBPServiceState, 
bpServiceState)` caused function to terminate without executing 
`FileUtil.chmod(dir.toString(), "755")` for each temporary directory with 
invalid perm 000 the test has created. 

 

*Consequence:*

Any subsequent `mvn test` command would fail to run if this test had failed 
before. It is because Maven failed to build itself since it does not have 
permission to clean this temporarily-generated directory. For details of the 
failure, see below:

 
{noformat}
[INFO] --- maven-antrun-plugin:1.7:run (create-log-dir) @ hadoop-hdfs ---
[INFO] Executing tasks
 
main:
[delete] Deleting directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  8.349 s
[INFO] Finished at: 2019-12-27T03:53:04-06:00
[INFO] 
[ERROR] Failed to execute 
goalorg.apache.maven.plugins:maven-antrun-plugin:1.7:run (create-log-dir) on 
project hadoop-hdfs: An Ant BuildException has occured: Unable to delete 
directory 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current
[ERROR] around Ant part ..
 @ 4:105 in 
/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException{noformat}
 

 

 

*Root Cause:*

The test helper function 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`
 purposely set the directory 
`/home/ctest/app/Ctest-Hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/current`
 to have perm 000. And at the end of this function, it changed the perm of this 
directory to 755. However, there is an assertion in this function before the 
perm was able to changed to 755. Once this assertion fails, the function 
terminates before the directory’s perm can be changed to 755. Hence, this 
directory was later unable to be removed by Maven for when executing `mvn 
test`. 

 

*Fix:*

In 
`org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration#testVolumeConfig`,
 move the assertion `assertEquals(expectedBPServiceState, bpServiceState)`  to 
the last line of this function. This fix will fix the bug and will not change 
the test outcome. 

 

*Content for the patch:*
{code:java}
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java
 
b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java
index a9e4096df4b..a492fa5fd44 100644
--- 
a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java
+++ 
b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailureToleration.java
@@ -256,11 +256,11 @@ private void testVolumeConfig(int volumesTolerated, int 

[jira] [Commented] (HDFS-12943) Consistent Reads from Standby Node

2020-01-16 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017634#comment-17017634
 ] 

Konstantin Shvachko commented on HDFS-12943:


Hey [~lindy_hopper] yes we currently recommend turning off access time updates 
on Observers as [~vagarychen] said.
We plan to bring aTime updates back with HDFS-15118. Observer will bounce such 
getBlockLocation() calls to Active, so that it could actually update the time.

> Consistent Reads from Standby Node
> --
>
> Key: HDFS-12943
> URL: https://issues.apache.org/jira/browse/HDFS-12943
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 2.10.0, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: ConsistentReadsFromStandbyNode.pdf, 
> ConsistentReadsFromStandbyNode.pdf, HDFS-12943-001.patch, 
> HDFS-12943-002.patch, HDFS-12943-003.patch, HDFS-12943-004.patch, 
> TestPlan-ConsistentReadsFromStandbyNode.pdf
>
>
> StandbyNode in HDFS is a replica of the active NameNode. The states of the 
> NameNodes are coordinated via the journal. It is natural to consider 
> StandbyNode as a read-only replica. As with any replicated distributed system 
> the problem of stale reads should be resolved. Our main goal is to provide 
> reads from standby in a consistent way in order to enable a wide range of 
> existing applications running on top of HDFS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15111) start / stopStandbyServices() should log which service it is transitioning to/from.

2020-01-16 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15111:
---
Labels: newbie++  (was: )

> start / stopStandbyServices() should log which service it is transitioning 
> to/from.
> ---
>
> Key: HDFS-15111
> URL: https://issues.apache.org/jira/browse/HDFS-15111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, logging
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
>  Labels: newbie++
>
> Trying to transition Observer to Standby state. Both 
> {{stopStandbyServices()}} and {{startStandbyServices()}} log that they are 
> stopping/starting Standby services.
> # {{startStandbyServices()}} should log which state it is transitioning TO.
> # {{stopStandbyServices()}} should log which state it is transitioning FROM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017619#comment-17017619
 ] 

Ctest commented on HDFS-15124:
--

Uploaded one new patch for trunk.

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
>       

[jira] [Updated] (HDFS-15126) testForcedRegistration fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15126:
-
Attachment: HDFS-15126.001.patch

> testForcedRegistration fails Intermittently
> ---
>
> Key: HDFS-15126
> URL: https://issues.apache.org/jira/browse/HDFS-15126
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-15126.001.patch
>
>
> Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail 
> in the recommit builds:
> {code:bash}
> [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 25.333 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration
> [ERROR] 
> testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration)  Time 
> elapsed: 11.39 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15126) testForcedRegistration fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated HDFS-15126:
-
Status: Patch Available  (was: Open)

> testForcedRegistration fails Intermittently
> ---
>
> Key: HDFS-15126
> URL: https://issues.apache.org/jira/browse/HDFS-15126
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>
> Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail 
> in the recommit builds:
> {code:bash}
> [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 25.333 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration
> [ERROR] 
> testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration)  Time 
> elapsed: 11.39 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15124:
-
Attachment: HDFS-15124.002.patch

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch, 
> HDFS-15124.002.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
>       } catch (Exception e) {
>         

[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017550#comment-17017550
 ] 

Hadoop QA commented on HDFS-15125:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.10 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
39s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
39s{color} | {color:green} branch-2.10 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} branch-2.10 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} branch-2.10 passed with JDK v1.8.0_232 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed with JDK v1.8.0_232 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m  4s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
56s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}136m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
|   | hadoop.hdfs.qjournal.TestSecureNNWithQJM |
|   | hadoop.hdfs.TestSetrepIncreasing |
|   | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
|   | hadoop.hdfs.TestDFSStartupVersions |
|   | hadoop.hdfs.tools.TestDFSAdminWithHA |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.TestFileConcurrentReader |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:a969cad0a12 |
| JIRA Issue | HDFS-15125 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991172/HDFS-15125-branch-2.10.002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 397ff213a26b 

[jira] [Updated] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.

2020-01-16 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-15118:
--
Status: Patch Available  (was: Open)

> [SBN Read] Slow clients when Observer reads are enabled but there are no 
> Observers on the cluster.
> --
>
> Key: HDFS-15118
> URL: https://issues.apache.org/jira/browse/HDFS-15118
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-15118.001.patch
>
>
> We see substantial degradation in performance of HDFS clients, when Observer 
> reads are enabled via {{ObserverReadProxyProvider}}, but there are no 
> ObserverNodes on the cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.

2020-01-16 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-15118:
--
Attachment: HDFS-15118.001.patch

> [SBN Read] Slow clients when Observer reads are enabled but there are no 
> Observers on the cluster.
> --
>
> Key: HDFS-15118
> URL: https://issues.apache.org/jira/browse/HDFS-15118
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15118.001.patch
>
>
> We see substantial degradation in performance of HDFS clients, when Observer 
> reads are enabled via {{ObserverReadProxyProvider}}, but there are no 
> ObserverNodes on the cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.

2020-01-16 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reassigned HDFS-15118:
-

Assignee: Chen Liang

> [SBN Read] Slow clients when Observer reads are enabled but there are no 
> Observers on the cluster.
> --
>
> Key: HDFS-15118
> URL: https://issues.apache.org/jira/browse/HDFS-15118
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-15118.001.patch
>
>
> We see substantial degradation in performance of HDFS clients, when Observer 
> reads are enabled via {{ObserverReadProxyProvider}}, but there are no 
> ObserverNodes on the cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017471#comment-17017471
 ] 

Hadoop QA commented on HDFS-15124:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 166 unchanged - 0 fixed = 167 total (was 166) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}134m  8s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}198m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
|   | hadoop.hdfs.server.namenode.TestINodeAttributeProvider |
|   | hadoop.hdfs.TestDatanodeRegistration |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.namenode.TestFSImage |
|   | hadoop.hdfs.server.namenode.TestFsck |
|   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15124 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991162/HDFS-15124.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8fac4d4b27be 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a0ff42d |
| maven | version: Apache Maven 

[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-16 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017464#comment-17017464
 ] 

Jim Brennan commented on HDFS-15125:


There was a problem with my back-port of HDFS-13945, so 
TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure was still failing 
for me intermittently.

This is fixed in patch 002.

> Pull back fixes for DataNodeVolume* unit tests
> --
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-16 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated HDFS-15125:
---
Attachment: HDFS-15125-branch-2.10.002.patch

> Pull back fixes for DataNodeVolume* unit tests
> --
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch, 
> HDFS-15125-branch-2.10.002.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017418#comment-17017418
 ] 

Ctest edited comment on HDFS-15124 at 1/16/20 7:55 PM:
---

[~elgoiri] Sorry that I was using hadoop-2.10.0 and the FSNameSystem was using 
{code:java}
public static final Log LOG = LogFactory.getLog(FSNamesystem.class);{code}
[https://github.com/apache/hadoop/blob/e2f1f118e465e787d8567dfa6e2f3b72a0eb9194/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L328]

It seems that 3.x.x is using org.slf4j.Logger but 2.x.x is using 
org.apache.commons.logging.Log for FSNanesystem.java.

I will write another patch for trunk and upload it again. Thank you for 
pointing this out!


was (Author: ctest.team):
I was using hadoop-2.10.0 and the FSNameSystem was using 
{code:java}
public static final Log LOG = LogFactory.getLog(FSNamesystem.class);{code}
[https://github.com/apache/hadoop/blob/e2f1f118e465e787d8567dfa6e2f3b72a0eb9194/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L328]

Should I use the 3.x.x version to do the fix? It seems that 3.x.x is using 
org.slf4j.Logger but 2.x.x is using org.apache.commons.logging.Log for 
FSNanesystem.java

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   

[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017418#comment-17017418
 ] 

Ctest commented on HDFS-15124:
--

I was using hadoop-2.10.0 and the FSNameSystem was using 
{code:java}
public static final Log LOG = LogFactory.getLog(FSNamesystem.class);{code}
[https://github.com/apache/hadoop/blob/e2f1f118e465e787d8567dfa6e2f3b72a0eb9194/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L328]

Should I use the 3.x.x version to do the fix? It seems that 3.x.x is using 
org.slf4j.Logger but 2.x.x is using org.apache.commons.logging.Log for 
FSNanesystem.java

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for 

[jira] [Updated] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.

2020-01-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15127:
---
Attachment: HDFS-15127.000.patch

> RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount 
> points.
> 
>
> Key: HDFS-15127
> URL: https://issues.apache.org/jira/browse/HDFS-15127
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-15127.000.patch
>
>
> A HASH_ALL mount point should not allow creating new files if one subcluster 
> is down.
> If the file already existed in the past, this could lead to inconsistencies.
> We should return an unavailable exception.
> {{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be 
> changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable

2020-01-16 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017409#comment-17017409
 ] 

Hudson commented on HDFS-15112:
---

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17873 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17873/])
HDFS-15112. RBF: Do not return FileNotFoundException when a subcluster 
(inigoiri: rev 263413e83840c7795a988e3939cd292d020c8d5f)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterFaultTolerant.java


> RBF: Do not return FileNotFoundException when a subcluster is unavailable 
> --
>
> Key: HDFS-15112
> URL: https://issues.apache.org/jira/browse/HDFS-15112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, 
> HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, 
> HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, 
> HDFS-15112.009.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable

2020-01-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017405#comment-17017405
 ] 

Íñigo Goiri commented on HDFS-15112:


Thanks [~ayushtkn] for the review.
Committed to trunk.
Opened HDFS-15127 to fix the writes.

> RBF: Do not return FileNotFoundException when a subcluster is unavailable 
> --
>
> Key: HDFS-15112
> URL: https://issues.apache.org/jira/browse/HDFS-15112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, 
> HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, 
> HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, 
> HDFS-15112.009.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15127) RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points.

2020-01-16 Thread Jira
Íñigo Goiri created HDFS-15127:
--

 Summary: RBF: Do not allow writes when a subcluster is unavailable 
for HASH_ALL mount points.
 Key: HDFS-15127
 URL: https://issues.apache.org/jira/browse/HDFS-15127
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Íñigo Goiri


A HASH_ALL mount point should not allow creating new files if one subcluster is 
down.
If the file already existed in the past, this could lead to inconsistencies.
We should return an unavailable exception.
{{TestRouterFaultTolerant#testWriteWithFailedSubcluster()}} needs to be changed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable

2020-01-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15112:
---
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> RBF: Do not return FileNotFoundException when a subcluster is unavailable 
> --
>
> Key: HDFS-15112
> URL: https://issues.apache.org/jira/browse/HDFS-15112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, 
> HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, 
> HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, 
> HDFS-15112.009.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15112) RBF: Do not return FileNotFoundException when a subcluster is unavailable

2020-01-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017401#comment-17017401
 ] 

Íñigo Goiri commented on HDFS-15112:


Correct, the change in RouterRPCServer is only there to make the current 
testWriteWithFailedSubcluster() work.
Let's open a new JIRA to change testWriteWithFailedSubcluster() and make it 
always fail for writes if one subcluster is down.

> RBF: Do not return FileNotFoundException when a subcluster is unavailable 
> --
>
> Key: HDFS-15112
> URL: https://issues.apache.org/jira/browse/HDFS-15112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-15112.000.patch, HDFS-15112.001.patch, 
> HDFS-15112.002.patch, HDFS-15112.004.patch, HDFS-15112.005.patch, 
> HDFS-15112.006.patch, HDFS-15112.007.patch, HDFS-15112.008.patch, 
> HDFS-15112.009.patch, HDFS-15112.patch
>
>
> If we have a mount point using HASH_ALL across two subclusters and one of 
> them is down, we may return FileNotFoundException while the file is just in 
> the unavailable subcluster.
> We should not return FileNotFoundException but something that shows that the 
> subcluster is unavailable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017346#comment-17017346
 ] 

Íñigo Goiri commented on HDFS-15124:


FSNameSystem seems to be using logger, right?
https://github.com/apache/hadoop/blob/a0ff42d7612e744e0bf88aa14078ea3ab6bcce49/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L373
It was done by HDFS-12552.

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           

[jira] [Commented] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature

2020-01-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017339#comment-17017339
 ] 

Íñigo Goiri commented on HDFS-15113:


I think we need to extend [^HDFS-15113.003.patch] to cover the old and the new 
tests.

> Missing IBR when NameNode restart if open processCommand async feature
> --
>
> Key: HDFS-15113
> URL: https://issues.apache.org/jira/browse/HDFS-15113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, 
> HDFS-15113.003.patch
>
>
> Recently, I meet one case that NameNode missing block after restart which is 
> related with HDFS-14997.
> a. during NameNode restart, it will return command `DNA_REGISTER` to DataNode 
> when receive some RPC request from DataNode.
> b. when DataNode receive `DNA_REGISTER` command, it will run #reRegister 
> async.
> {code:java}
>   void reRegister() throws IOException {
> if (shouldRun()) {
>   // re-retrieve namespace info to make sure that, if the NN
>   // was restarted, we still match its version (HDFS-2120)
>   NamespaceInfo nsInfo = retrieveNamespaceInfo();
>   // and re-register
>   register(nsInfo);
>   scheduler.scheduleHeartbeat();
>   // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down
>   // for sometime.
>   if (state == HAServiceState.STANDBY || state == 
> HAServiceState.OBSERVER) {
> ibrManager.clearIBRs();
>   }
> }
>   }
> {code}
> c. As we know, #register will trigger BR immediately.
> d. because #reRegister run async, so we could not make sure which one run 
> first between send FBR and clear IBR. If clean IBR run first, it will be OK. 
> But if send FBR first then clear IBR, it will missing some blocks received 
> between these two time point until next FBR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Ctest (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017329#comment-17017329
 ] 

Ctest commented on HDFS-15124:
--

I uploaded a new patch to do all these changes.

[~elgoiri] I am sorry I didn't use {} style for the LOG in FSNamesystem.java 
because in FSNamesystem.java the LOG is `org.apache.commons.logging.Log` 
instead of `org.slf4j.Logger`. The `org.apache.commons.logging.Log` doesn't 
support
{code:java}
LOG.error("xxx {}", "yyy"){code}
Please let me know if anything else needed. Thank you!

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if 

[jira] [Updated] (HDFS-15124) Crashing bugs in NameNode when using a valid configuration for `dfs.namenode.audit.loggers`

2020-01-16 Thread Ctest (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ctest updated HDFS-15124:
-
Attachment: HDFS-15124.001.patch

> Crashing bugs in NameNode when using a valid configuration for 
> `dfs.namenode.audit.loggers`
> ---
>
> Key: HDFS-15124
> URL: https://issues.apache.org/jira/browse/HDFS-15124
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Ctest
>Priority: Critical
> Attachments: HDFS-15124.000.patch, HDFS-15124.001.patch
>
>
> I am using Hadoop-2.10.0.
> The configuration parameter `dfs.namenode.audit.loggers` allows `default` 
> (which is the default value) and 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`.
> When I use `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> namenode will not be started successfully because of an 
> `InstantiationException` thrown from 
> `org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers`. 
> The root cause is that while initializing namenode, `initAuditLoggers` will 
> be called and it will try to call the default constructor of 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger` which doesn't 
> have a default constructor. Thus the `InstantiationException` exception is 
> thrown.
>  
> *Symptom*
> *$ ./start-dfs.sh*
> {code:java}
> 2019-12-18 14:05:20,670 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
> initialization failed.java.lang.RuntimeException: 
> java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1024)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.(FSNamesystem.java:858)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:677)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:736)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:961)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:940)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1714)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1782)
> Caused by: java.lang.InstantiationException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger
> at java.lang.Class.newInstance(Class.java:427)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initAuditLoggers(FSNamesystem.java:1017)...
> 8 more
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger.()
> at java.lang.Class.getConstructor0(Class.java:3082)
> at java.lang.Class.newInstance(Class.java:412)
> ... 9 more{code}
>  
>  
> *Detailed Root Cause*
> There is no default constructor in 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`: 
> {code:java}
> /** 
>  * An {@link AuditLogger} that sends logged data directly to the metrics 
>  * systems. It is used when the top service is used directly by the name node 
>  */ 
> @InterfaceAudience.Private 
> public class TopAuditLogger implements AuditLogger { 
>   public static finalLogger LOG = 
> LoggerFactory.getLogger(TopAuditLogger.class); 
>   private final TopMetrics topMetrics; 
>   public TopAuditLogger(TopMetrics topMetrics) {
> Preconditions.checkNotNull(topMetrics, "Cannot init with a null " + 
> "TopMetrics");
> this.topMetrics = topMetrics; 
>   }
>   @Override
>   public void initialize(Configuration conf) { 
>   }
> {code}
> As long as the configuration parameter `dfs.namenode.audit.loggers` is set to 
> `org.apache.hadoop.hdfs.server.namenode.top.TopAuditLogger`, 
> `initAuditLoggers` will try to call its default constructor to make a new 
> instance: 
> {code:java}
> private List initAuditLoggers(Configuration conf) {
>   // Initialize the custom access loggers if configured.
>   Collection alClasses =
>       conf.getTrimmedStringCollection(DFS_NAMENODE_AUDIT_LOGGERS_KEY);
>   List auditLoggers = Lists.newArrayList();
>   if (alClasses != null && !alClasses.isEmpty()) {
>     for (String className : alClasses) {
>       try {
>         AuditLogger logger;
>         if (DFS_NAMENODE_DEFAULT_AUDIT_LOGGER_NAME.equals(className)) {
>           logger = new DefaultAuditLogger();
>         } else {
>           logger = (AuditLogger) Class.forName(className).newInstance();
>         }
>         logger.initialize(conf);
>         auditLoggers.add(logger);
>       } catch (RuntimeException re) {
>         throw re;
>       } catch (Exception e) {
>         throw new 

[jira] [Created] (HDFS-15126) testForcedRegistration fails Intermittently

2020-01-16 Thread Ahmed Hussein (Jira)
Ahmed Hussein created HDFS-15126:


 Summary: testForcedRegistration fails Intermittently
 Key: HDFS-15126
 URL: https://issues.apache.org/jira/browse/HDFS-15126
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein


Recently, {{TestDatanodeRegistration.testForcedRegistration}} started to fail 
in the recommit builds:


{code:bash}
[ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.333 
s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDatanodeRegistration
[ERROR] testForcedRegistration(org.apache.hadoop.hdfs.TestDatanodeRegistration) 
 Time elapsed: 11.39 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hdfs.TestDatanodeRegistration.testForcedRegistration(TestDatanodeRegistration.java:385)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15125) Pull back fixes for DataNodeVolume* unit tests

2020-01-16 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016994#comment-17016994
 ] 

Jim Brennan commented on HDFS-15125:


The unit tests are unrelated, but I am still seeing one of the tests fail 
locally with this patch.    Once I resolve that, I will upload a new patch.

 

> Pull back fixes for DataNodeVolume* unit tests
> --
>
> Key: HDFS-15125
> URL: https://issues.apache.org/jira/browse/HDFS-15125
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Attachments: HDFS-15125-branch-2.10.001.patch
>
>
> I would like to pull back some fixes for the DataNodeVolume* tests to resolve 
> some intermittent failures we are seeing on branch-2.10.
> The fixes are:
> HDFS-11353 Improve the unit tests relevant to DataNode volume failure testing
> HDFS-13993 
> TestDataNodeVolumeFailure#testTolerateVolumeFailuresAfterAddingMoreVolumes is 
> flaky
> HDFS-14324 Fix TestDataNodeVolumeFailure
> HDFS-13945 TestDataNodeVolumeFailure is Flaky



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14757) TestBalancerRPCDelay.testBalancerRPCDelay failed

2020-01-16 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016985#comment-17016985
 ] 

Ahmed Hussein commented on HDFS-14757:
--

The Failed junit tests were failing Intermittently on the recommit build for 
some time. 
* hadoop.hdfs.TestDatanodeRegistration: failing for some time on qb reports 
HDFS-9880, HDFS-13554 (need to file a new Jiras)
* hadoop.hdfs.server.namenode.TestRedudantBlocks: HDFS-15092
* hadoop.hdfs.qjournal.server.TestJournalNodeSync: HDFS-14261

[~kihwal], Can you please take a look that patch to fix the failures reported 
from {{testBalancerRPCDelay}}?

> TestBalancerRPCDelay.testBalancerRPCDelay failed
> 
>
> Key: HDFS-14757
> URL: https://issues.apache.org/jira/browse/HDFS-14757
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Wei-Chiu Chuang
>Assignee: Ahmed Hussein
>Priority: Major
> Attachments: HDFS-14757.001.patch
>
>
> {noformat}
> Error Message
> Unfinished stubbing detected here:
> -> at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.spyFSNamesystem(TestBalancer.java:1948)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
>  1. missing thenReturn()
>  2. although stubbed methods may return mocks, you cannot inline mock 
> creation (mock()) call inside a thenReturn method (see issue 53)
> Stacktrace
> org.mockito.exceptions.misusing.UnfinishedStubbingException: 
> Unfinished stubbing detected here:
> -> at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.spyFSNamesystem(TestBalancer.java:1948)
> E.g. thenReturn() may be missing.
> Examples of correct stubbing:
> when(mock.isOk()).thenReturn(true);
> when(mock.isOk()).thenThrow(exception);
> doThrow(exception).when(mock).someVoidMethod();
> Hints:
>  1. missing thenReturn()
>  2. although stubbed methods may return mocks, you cannot inline mock 
> creation (mock()) call inside a thenReturn method (see issue 53)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.spyFSNamesystem(TestBalancer.java:1957)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:811)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:1976)
>   at 
> org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelay(TestBalancerRPCDelay.java:30)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15087) RBF: Balance/Rename across federation namespaces

2020-01-16 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016871#comment-17016871
 ] 

Jinglun commented on HDFS-15087:


Hi everyone, any further comments? Please let me know your thoughts, thanks !

> RBF: Balance/Rename across federation namespaces
> 
>
> Key: HDFS-15087
> URL: https://issues.apache.org/jira/browse/HDFS-15087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Priority: Major
> Attachments: HDFS-15087.initial.patch, HFR_Rename Across Federation 
> Namespaces.pdf
>
>
> The Xiaomi storage team has developed a new feature called HFR(HDFS 
> Federation Rename) that enables us to do balance/rename across federation 
> namespaces. The idea is to first move the meta to the dst NameNode and then 
> link all the replicas. It has been working in our largest production cluster 
> for 2 months. We use it to balance the namespaces. It turns out HFR is fast 
> and flexible. The detail could be found in the design doc. 
> Looking forward to a lively discussion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.

2020-01-16 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016695#comment-17016695
 ] 

Jinglun edited comment on HDFS-15120 at 1/16/20 12:31 PM:
--

Hi [~ayushtkn], thanks your comment ! I found the ReconfigurationProtocol is 
used by DataNode too. Using ReconfigurationProtocol will give DataNode the 
refreshBlockPlacementPolicy interface even it doesn't has a 
BlockPlacementPolicy. And I found the NameNode already has a 
GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and 
use the GenericRefreshProtocol to refresh. Please let me know your thoughts.

Upload v02.


was (Author: lijinglun):
Hi [~ayushtkn], thanks your comment ! I found the ReconfigurationProtocol is 
used by DataNode too. Using ReconfigurationProtocol will give DataNode the 
refreshBlockPlacementPolicy interface even it doesn't has a 
BlockPlacementPolicy. And I found the NameNode already has a 
GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and 
use the GenericRefreshProtocol to refresh. What do you think ?

Upload v02.

> Refresh BlockPlacementPolicy at runtime.
> 
>
> Key: HDFS-15120
> URL: https://issues.apache.org/jira/browse/HDFS-15120
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch
>
>
> Now if we want to switch BlockPlacementPolicies we need to restart the 
> NameNode. It would be convenient if we can switch it at runtime. For example 
> we can switch between AvailableSpaceBlockPlacementPolicy and 
> BlockPlacementPolicyDefault as needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.

2020-01-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016854#comment-17016854
 ] 

Hadoop QA commented on HDFS-15120:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 31m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 30s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 3 new + 346 unchanged - 0 fixed = 349 total (was 346) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}122m 31s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}216m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
|   | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.TestDFSOutputStream |
|   | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15120 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12991089/HDFS-15120.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ed8c73cc4e32 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a0ff42d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28683/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature

2020-01-16 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016851#comment-17016851
 ] 

Xiaoqiao He commented on HDFS-15113:


[~elgoiri],[~weichiu],[~brahmareddy], any bandwidth to push this issue forward? 
Consider this is serious bug, and some users want to backport to their own 
internal branch, it is better to fix it ASAP. I would like to follow up if any 
suggestions. Thanks again.

> Missing IBR when NameNode restart if open processCommand async feature
> --
>
> Key: HDFS-15113
> URL: https://issues.apache.org/jira/browse/HDFS-15113
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, 
> HDFS-15113.003.patch
>
>
> Recently, I meet one case that NameNode missing block after restart which is 
> related with HDFS-14997.
> a. during NameNode restart, it will return command `DNA_REGISTER` to DataNode 
> when receive some RPC request from DataNode.
> b. when DataNode receive `DNA_REGISTER` command, it will run #reRegister 
> async.
> {code:java}
>   void reRegister() throws IOException {
> if (shouldRun()) {
>   // re-retrieve namespace info to make sure that, if the NN
>   // was restarted, we still match its version (HDFS-2120)
>   NamespaceInfo nsInfo = retrieveNamespaceInfo();
>   // and re-register
>   register(nsInfo);
>   scheduler.scheduleHeartbeat();
>   // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down
>   // for sometime.
>   if (state == HAServiceState.STANDBY || state == 
> HAServiceState.OBSERVER) {
> ibrManager.clearIBRs();
>   }
> }
>   }
> {code}
> c. As we know, #register will trigger BR immediately.
> d. because #reRegister run async, so we could not make sure which one run 
> first between send FBR and clear IBR. If clean IBR run first, it will be OK. 
> But if send FBR first then clear IBR, it will missing some blocks received 
> between these two time point until next FBR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15115) Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug

2020-01-16 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016844#comment-17016844
 ] 

Xiaoqiao He commented on HDFS-15115:


[~wzx513], would you like to check all same cases in 
#BlockPlacementPolicyDefault and try to update new patch, it is better to add 
UT just like [~weichiu] mentioned above. Thanks.

> Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically 
> change logger to debug
> ---
>
> Key: HDFS-15115
> URL: https://issues.apache.org/jira/browse/HDFS-15115
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: wangzhixiang
>Priority: Major
> Attachments: HDFS-15115.001.patch
>
>
> To get debug info, we dynamically change the logger of 
> BlockPlacementPolicyDefault to debug when namenode is running. However, the 
> Namenode crashs. From the log, we find some NPE in 
> BlockPlacementPolicyDefault.chooseRandom. Because *StringBuilder builder* 
> will be used 4 times in BlockPlacementPolicyDefault.chooseRandom method. 
> While the *builder* only initializes in the first time of this method. If we 
> change the logger of BlockPlacementPolicyDefault to debug after the part, the 
> *builder* in remaining part is *NULL* and cause *NPE*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14672) Backport HDFS-12703 to branch-2

2020-01-16 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016843#comment-17016843
 ] 

Xiaoqiao He commented on HDFS-14672:


[~Tao Yang],  [^HDFS-14672.branch-2.8.001.patch] try to offer path for 
branch-2.8 without carefully test, FYI.

> Backport HDFS-12703 to branch-2
> ---
>
> Key: HDFS-14672
> URL: https://issues.apache.org/jira/browse/HDFS-14672
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 2.10.0
>
> Attachments: HDFS-12703.branch-2.001.patch, 
> HDFS-12703.branch-2.002.patch, HDFS-12703.branch-2.003.patch, 
> HDFS-14672.branch-2.8.001.patch
>
>
> Currently, `decommission monitor exception cause namenode fatal` is only in 
> trunk (branch-3). This JIRA aims to backport this bugfix to branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14672) Backport HDFS-12703 to branch-2

2020-01-16 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14672:
---
Attachment: HDFS-14672.branch-2.8.001.patch

> Backport HDFS-12703 to branch-2
> ---
>
> Key: HDFS-14672
> URL: https://issues.apache.org/jira/browse/HDFS-14672
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 2.10.0
>
> Attachments: HDFS-12703.branch-2.001.patch, 
> HDFS-12703.branch-2.002.patch, HDFS-12703.branch-2.003.patch, 
> HDFS-14672.branch-2.8.001.patch
>
>
> Currently, `decommission monitor exception cause namenode fatal` is only in 
> trunk (branch-3). This JIRA aims to backport this bugfix to branch-2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.

2020-01-16 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016695#comment-17016695
 ] 

Jinglun edited comment on HDFS-15120 at 1/16/20 8:38 AM:
-

Hi [~ayushtkn], thanks your comment ! I found the ReconfigurationProtocol is 
used by DataNode too. Using ReconfigurationProtocol will give DataNode the 
refreshBlockPlacementPolicy interface even it doesn't has a 
BlockPlacementPolicy. And I found the NameNode already has a 
GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and 
use the GenericRefreshProtocol to refresh. What do you think ?

Upload v02.


was (Author: lijinglun):
Hi [~ayushtkn], thanks your comment ! I found the ReconfigurationProtocol is 
used by DataNode too. And I found the NameNode already has a 
GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and 
use the GenericRefreshProtocol to refresh. What do you think ?

Upload v02.

> Refresh BlockPlacementPolicy at runtime.
> 
>
> Key: HDFS-15120
> URL: https://issues.apache.org/jira/browse/HDFS-15120
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch
>
>
> Now if we want to switch BlockPlacementPolicies we need to restart the 
> NameNode. It would be convenient if we can switch it at runtime. For example 
> we can switch between AvailableSpaceBlockPlacementPolicy and 
> BlockPlacementPolicyDefault as needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.

2020-01-16 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15120:
---
Attachment: HDFS-15120.002.patch

> Refresh BlockPlacementPolicy at runtime.
> 
>
> Key: HDFS-15120
> URL: https://issues.apache.org/jira/browse/HDFS-15120
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch
>
>
> Now if we want to switch BlockPlacementPolicies we need to restart the 
> NameNode. It would be convenient if we can switch it at runtime. For example 
> we can switch between AvailableSpaceBlockPlacementPolicy and 
> BlockPlacementPolicyDefault as needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15120) Refresh BlockPlacementPolicy at runtime.

2020-01-16 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016695#comment-17016695
 ] 

Jinglun commented on HDFS-15120:


Hi [~ayushtkn], thanks your comment ! I found the ReconfigurationProtocol is 
used by DataNode too. And I found the NameNode already has a 
GenericRefreshProtocol. I can create a RefreshBlockPlacementPolicyHandler and 
use the GenericRefreshProtocol to refresh. What do you think ?

Upload v02.

> Refresh BlockPlacementPolicy at runtime.
> 
>
> Key: HDFS-15120
> URL: https://issues.apache.org/jira/browse/HDFS-15120
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15120.001.patch, HDFS-15120.002.patch
>
>
> Now if we want to switch BlockPlacementPolicies we need to restart the 
> NameNode. It would be convenient if we can switch it at runtime. For example 
> we can switch between AvailableSpaceBlockPlacementPolicy and 
> BlockPlacementPolicyDefault as needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org