[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443572#comment-16443572
 ] 

genericqa commented on HDFS-13478:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 52s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch 
generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
58s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-rbf generated 3 new + 
0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m  6s{color} 
| {color:red} hadoop-hdfs-rbf in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
24s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs-rbf |
|  |  Unchecked/unconfirmed cast from 
org.apache.hadoop.hdfs.server.federation.store.protocol.GetDisabledNamespacesRequest
 to 
org.apache.hadoop.hdfs.server.federation.store.protocol.impl.pb.GetDisabledNamespacesRequestPBImpl
 in 
org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getDisabledNamespaces(GetDisabledNamespacesRequest)
  At 
RouterAdminProtocolTranslatorPB.java:org.apache.hadoop.hdfs.server.federation.store.protocol.impl.pb.GetDisabledNamespacesRequestPBImpl
 in 
org.apache.hadoop.hdfs.protocolPB.RouterAdminProtocolTranslatorPB.getDisabledNamespaces(GetDisabledNamespacesRequest)
  At RouterAdminProtocolTranslatorPB.java:[line 261] |
|  |  Useless object stored in variable ret of method 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.getNamespaces()
  At MembershipNamenodeResolver.java:ret of method 

[jira] [Comment Edited] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443568#comment-16443568
 ] 

Íñigo Goiri edited comment on HDFS-13453 at 4/19/18 4:47 AM:
-

The only thing left is to extract the mod time retrieval. Other than that, this
is good to go.




was (Author: elgoiri):
The only thing left is to extract the mod time retrieval. Other than that, this
is good to go.

On Wed, Apr 18, 2018, 20:58 Dibyendu Karmakar (JIRA) 



> RBF: getMountPointDates should fetch latest subdir time/date when parent dir 
> is not present but /parent/child dirs are present in mount table
> -
>
> Key: HDFS-13453
> URL: https://issues.apache.org/jira/browse/HDFS-13453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Dibyendu Karmakar
>Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling 
> the case when /parent is not present in the mount table but /parent/subdir is 
> in the mount table.
> In this case getMountPointDates is not able to fetch the latest time for 
> /parent, as /parent is not present in the mount table.
> For this scenario we will display the latest modified subdir date/time as the 
> /parent modified time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443568#comment-16443568
 ] 

Íñigo Goiri commented on HDFS-13453:


The only thing left is to extract the mod time retrieval. Other than that, this
is good to go.

On Wed, Apr 18, 2018, 20:58 Dibyendu Karmakar (JIRA) 



> RBF: getMountPointDates should fetch latest subdir time/date when parent dir 
> is not present but /parent/child dirs are present in mount table
> -
>
> Key: HDFS-13453
> URL: https://issues.apache.org/jira/browse/HDFS-13453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Dibyendu Karmakar
>Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling 
> the case when /parent is not present in the mount table but /parent/subdir is 
> in the mount table.
> In this case getMountPointDates is not able to fetch the latest time for 
> /parent, as /parent is not present in the mount table.
> For this scenario we will display the latest modified subdir date/time as the 
> /parent modified time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445

2018-04-18 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443557#comment-16443557
 ] 

Rushabh S Shah edited comment on HDFS-13430 at 4/19/18 4:10 AM:


Someone needs to properly manage the "Fix Version" fields. 2.9.1 is still 
showing as unreleased; the same goes for the other branches.


was (Author: shahrs87):
Someone needs to fix the "Fix Version" fields. 2.9.1 is still showing as 
unreleased; the same goes for the other branches.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> --
>
> Key: HDFS-13430
> URL: https://issues.apache.org/jira/browse/HDFS-13430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the 
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be 
> the only failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445

2018-04-18 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443557#comment-16443557
 ] 

Rushabh S Shah commented on HDFS-13430:
---

Someone needs to fix the "Fix Version" fields. 2.9.1 is still showing as 
unreleased; the same goes for the other branches.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> --
>
> Key: HDFS-13430
> URL: https://issues.apache.org/jira/browse/HDFS-13430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the 
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be 
> the only failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445

2018-04-18 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443556#comment-16443556
 ] 

Xiao Chen commented on HDFS-13430:
--

Thanks Rushabh, branch enum LGTM.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> --
>
> Key: HDFS-13430
> URL: https://issues.apache.org/jira/browse/HDFS-13430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the 
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be 
> the only failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13430) Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445

2018-04-18 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-13430:
--
   Resolution: Fixed
Fix Version/s: 3.0.3
   2.9.2
   3.1.1
   2.8.4
   2.10.0
   Status: Resolved  (was: Patch Available)

Cherry-picked to branch-3.1, branch-3.0, branch-2, branch-2.9, branch-2.8.
Thanks [~xiaochen] for the patch.
Hope I didn't mess anything up; let me know if I did.

> Fix TestEncryptionZonesWithKMS failure due to HADOOP-14445
> --
>
> Key: HDFS-13430
> URL: https://issues.apache.org/jira/browse/HDFS-13430
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 2.10.0, 2.8.4, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: HDFS-13430.01.patch
>
>
> Unfortunately HADOOP-14445 had an HDFS test failure that's not caught in the 
> hadoop-common precommit runs.
> This is caught by our internal pre-commit using dist-test, and appears to be 
> the only failure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table

2018-04-18 Thread Dibyendu Karmakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443552#comment-16443552
 ] 

Dibyendu Karmakar commented on HDFS-13453:
--

[^HDFS-13453-001.patch] contains the unit test for this scenario. 

> RBF: getMountPointDates should fetch latest subdir time/date when parent dir 
> is not present but /parent/child dirs are present in mount table
> -
>
> Key: HDFS-13453
> URL: https://issues.apache.org/jira/browse/HDFS-13453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Dibyendu Karmakar
>Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling 
> the case when /parent is not present in the mount table but /parent/subdir is 
> in the mount table.
> In this case getMountPointDates is not able to fetch the latest time for 
> /parent, as /parent is not present in the mount table.
> For this scenario we will display the latest modified subdir date/time as the 
> /parent modified time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router

2018-04-18 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443544#comment-16443544
 ] 

Xiao Chen commented on HDFS-13469:
--

{quote}Does anybody have a pointer to an end-to-end use of inodes?
{quote}
The only use case I'm aware of is that you can, for example, run {{hdfs dfs -ls 
/.reserved/.inodes/123}} to access the file (replace the CLI with API calls, 
and ls with other operations). See -HDFS-4434.-
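
For illustration, a minimal API-level sketch of the same lookup, assuming a 
plain {{FileSystem}} client and an illustrative inode ID (not from any patch on 
this issue):
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InodePathLookup {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // The NameNode resolves this reserved path by inode ID (123 is
    // illustrative), not by file name.
    Path byInode = new Path("/.reserved/.inodes/123");
    FileStatus status = fs.getFileStatus(byInode);
    System.out.println("Resolved to: " + status.getPath());
  }
}
{code}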

> RBF: Support InodeID in the Router
> --
>
> Key: HDFS-13469
> URL: https://issues.apache.org/jira/browse/HDFS-13469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13478:
---
Status: Patch Available  (was: Open)

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13478.000.patch, HDFS-13478.001.patch
>
>
> We have a subcluster in our federation that is for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443536#comment-16443536
 ] 

Íñigo Goiri commented on HDFS-13469:


Does anybody have a pointer to an end-to-end use of inodes?
For now, it might be good to throw an unsupported exception if we get an access 
to an inode path.
Tracking the locations might be too involved.
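
A hedged sketch of that interim behavior, with an assumed helper name 
(checkNotInodePath); the actual Router code paths may differ:
{code:java}
// Illustrative guard, not patch code: reject reserved inode paths in the
// Router until it can map an inode ID to a subcluster.
final class InodePathGuard {
  private static final String RESERVED_INODES_PREFIX = "/.reserved/.inodes/";

  static void checkNotInodePath(String src) {
    if (src != null && src.startsWith(RESERVED_INODES_PREFIX)) {
      // Fail fast instead of forwarding to an arbitrary subcluster.
      throw new UnsupportedOperationException(
          "Inode paths are not supported by the Router: " + src);
    }
  }
}
{code}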

> RBF: Support InodeID in the Router
> --
>
> Key: HDFS-13469
> URL: https://issues.apache.org/jira/browse/HDFS-13469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443533#comment-16443533
 ] 

Íñigo Goiri commented on HDFS-13453:


[~dibyendu_hadoop], that makes sense, if the parent is there, let's return that 
date.
Let's just make sure that's in the unit test.

> RBF: getMountPointDates should fetch latest subdir time/date when parent dir 
> is not present but /parent/child dirs are present in mount table
> -
>
> Key: HDFS-13453
> URL: https://issues.apache.org/jira/browse/HDFS-13453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Dibyendu Karmakar
>Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling 
> the case when /parent is not present in the mount table but /parent/subdir is 
> in the mount table.
> In this case getMountPointDates is not able to fetch the latest time for 
> /parent, as /parent is not present in the mount table.
> For this scenario we will display the latest modified subdir date/time as the 
> /parent modified time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443532#comment-16443532
 ] 

Íñigo Goiri commented on HDFS-13478:


Thanks [~linyiqun] for the comments.
After checking, we cannot reuse the membership table as it is updated per 
Router and the way the heartbeat is done would overwrite everything.
On the patch size, I agree.
We can go with [^HDFS-13478.001.patch], which just does the basic wiring, and 
then do the RPC checks in a follow-up.
We can remove the change for MembershipNamenodeResolver from 
[^HDFS-13478.001.patch].

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13478.000.patch, HDFS-13478.001.patch
>
>
> We have a subcluster in our federation that is for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router

2018-04-18 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443531#comment-16443531
 ] 

Yiqun Lin commented on HDFS-13469:
--

{quote}I'm not sure how to figure which subcluster has that inode.
{quote}
[~elgoiri], currently in the Router, I think we cannot directly know which 
subcluster has the inode. One way would be to get the HdfsFileStatus from each 
subcluster for the given source path, verify the fileId in the HdfsFileStatus, 
and thereby confirm the desired subcluster (namespace). Not sure this is the 
way [~daryn] wants.
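
A rough illustration of that probing idea; getClientForNamespace is a 
hypothetical helper here, and real Router code would go through its connection 
manager instead:
{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

abstract class InodeLocator {
  // Hypothetical helper: obtain a client for one subcluster (namespace).
  abstract DFSClient getClientForNamespace(String nsId) throws IOException;

  // Probe each subcluster via the reserved inode path and confirm the
  // returned fileId matches; return the owning namespace or null.
  String findNamespaceForInode(long inodeId, Iterable<String> nsIds)
      throws IOException {
    for (String nsId : nsIds) {
      HdfsFileStatus status = getClientForNamespace(nsId)
          .getFileInfo("/.reserved/.inodes/" + inodeId);
      if (status != null && status.getFileId() == inodeId) {
        return nsId;  // this subcluster has the inode
      }
    }
    return null;  // not found in any subcluster
  }
}
{code}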

> RBF: Support InodeID in the Router
> --
>
> Key: HDFS-13469
> URL: https://issues.apache.org/jira/browse/HDFS-13469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13478:
---
Attachment: HDFS-13478.001.patch

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13478.000.patch, HDFS-13478.001.patch
>
>
> We have a subcluster in our federation that is for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13479) Simplify find StorageInfo logical operation in BlocksMap::replaceBlock()

2018-04-18 Thread liaoyuxiangqin (JIRA)
liaoyuxiangqin created HDFS-13479:
-

 Summary: Simplify find StorageInfo  logical operation in 
BlocksMap::replaceBlock()
 Key: HDFS-13479
 URL: https://issues.apache.org/jira/browse/HDFS-13479
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: liaoyuxiangqin


When I read the replaceBlock() of the BlocksMap class in hdfs-blockmanager, I 
found that the following find-storage code could be simplified and made easier 
to understand.
{code:java|title=BlocksMap.java|borderStyle=solid}
for (int i = currentBlock.numNodes() - 1; i >= 0; i--) {
  final DatanodeDescriptor dn = currentBlock.getDatanode(i);
  final DatanodeStorageInfo storage = currentBlock.findStorageInfo(dn);
  final boolean removed = storage.removeBlock(currentBlock);
  Preconditions.checkState(removed, "currentBlock not found.");

  final AddBlockResult result = storage.addBlock(newBlock);
  Preconditions.checkState(result == AddBlockResult.ADDED,
  "newBlock already exists.");
}
{code}
As shown in the code segment above, there is no need to get dn first; we can 
find the storage by index directly, so this logic can be simplified further.
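
A sketch of the proposed simplification, assuming BlockInfo exposes a 
getStorageInfo(int) accessor within the blockmanagement package; this restates 
the idea above and is not a committed patch:
{code:java}
for (int i = currentBlock.numNodes() - 1; i >= 0; i--) {
  // Fetch the storage directly by index instead of resolving the
  // datanode first and then searching for its storage.
  final DatanodeStorageInfo storage = currentBlock.getStorageInfo(i);
  final boolean removed = storage.removeBlock(currentBlock);
  Preconditions.checkState(removed, "currentBlock not found.");

  final AddBlockResult result = storage.addBlock(newBlock);
  Preconditions.checkState(result == AddBlockResult.ADDED,
      "newBlock already exists.");
}
{code}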



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router

2018-04-18 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443511#comment-16443511
 ] 

Xiao Chen commented on HDFS-13469:
--

Thanks folks for bringing up this issue.

Is it even possible to solve this in a compatible way?

Suppose I'm the client and I previously read from {{/.reserved/inode/123}}, 
from the only nameservice I had. Now the cluster is federated and another 
nameservice is added. Router based or not, if that other nameservice also has 
inode 123, there is no way that my input {{/.reserved/inode/123}} can map to 
the 2 inodes in the 2 nameservices, both ID'ed 123...

It seems to me we'd require the client to explicitly read from 
{{hdfs://nameserviceX/.reserved/inode/123}} to have this working, and call out 
this in documentation.

I'm not very familiar with NFS, does HDFS-11575 address this?

> RBF: Support InodeID in the Router
> --
>
> Key: HDFS-13469
> URL: https://issues.apache.org/jira/browse/HDFS-13469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443503#comment-16443503
 ] 

Yiqun Lin commented on HDFS-13478:
--

[~elgoiri], I did a quick glance at the patch; some thoughts from me:
 * If we make the decommission info a separate table, that means we will go 
from one State Store query to two. Will this have any performance impact?

 * This patch looks large. We could separate it into two parts:

 # Decommission Store API implementation (HDFS-13478.000.patch seems to do 
just this).
 # Checking logic on the Router server side and the corresponding test for the 
decommission case.

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13478.000.patch
>
>
> We have a subcluster in our federation that is for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443471#comment-16443471
 ] 

genericqa commented on HDFS-13441:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  1m  
7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 99m 
18s{color} | {color:green} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HDFS-13441 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12919706/HDFS-13441.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 79a13594c2b2 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e4c39f3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| mvninstall | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23992/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23992/testReport/ |
| Max. process+thread count | 3243 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23992/console |
| Powered by | Apache 

[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443469#comment-16443469
 ] 

BELUGA BEHR commented on HDFS-13448:


Not sure why the compilation failed; trying again...

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch, HDFS-13448.5.patch
>
>
> According to the HDFS block placement rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when the {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is where you have, for example, a flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode 
> local to the Flume agent will always get the first block replica and this 
> leads to un-even block placements, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far as now the first block replica will always 
> be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.
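
For reference on the description above, a minimal sketch of how a client passes 
{{NO_LOCAL_WRITE}} today through the flag-taking create() overload; the path, 
buffer size, and replication values are illustrative, and the new flag proposed 
here would be supplied the same way:
{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class NoLocalWriteExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Ask the NameNode not to place the first replica on this host;
    // the local rack is still preferred, which is the gap this issue
    // proposes to close with an additional flag.
    FSDataOutputStream out = fs.create(
        new Path("/flume/events.log"),     // illustrative path
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE),
        4096,                              // buffer size
        (short) 3,                         // replication factor
        fs.getDefaultBlockSize(),          // block size
        null);                             // no progress callback
    out.close();
  }
}
{code}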



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13453) RBF: getMountPointDates should fetch latest subdir time/date when parent dir is not present but /parent/child dirs are present in mount table

2018-04-18 Thread Dibyendu Karmakar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443466#comment-16443466
 ] 

Dibyendu Karmakar commented on HDFS-13453:
--

Thanks [~elgoiri].

Using the above-mentioned approach, when the mount table has both */parent* 
and */parent/child* mount points, it will return the modified date of 
*/parent/child* for */parent*. But the actual modified date of /parent in its 
mount table entry may be different. I think if we have /parent in the mount 
table we should return the actual modified date present there; only when 
/parent is not in the mount table should we return the modified date of 
/parent/child.

What do you suggest?
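
A compact sketch of that rule; names like mountTableDates and childDates are 
illustrative stand-ins for the mount table state, not patch code:
{code:java}
import java.util.Map;

final class MountPointDates {
  // Prefer the parent's own mount entry date; otherwise fall back to the
  // newest modification time among its child mount points.
  static long getMountPointDate(String parent,
      Map<String, Long> mountTableDates, Map<String, Long> childDates) {
    Long own = mountTableDates.get(parent);
    if (own != null) {
      return own;  // /parent is in the mount table: use its actual date
    }
    long latest = 0L;
    for (long time : childDates.values()) {
      latest = Math.max(latest, time);
    }
    return latest;  // /parent absent: newest child date
  }
}
{code}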

> RBF: getMountPointDates should fetch latest subdir time/date when parent dir 
> is not present but /parent/child dirs are present in mount table
> -
>
> Key: HDFS-13453
> URL: https://issues.apache.org/jira/browse/HDFS-13453
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Dibyendu Karmakar
>Assignee: Dibyendu Karmakar
>Priority: Major
> Attachments: HDFS-13453-000.patch, HDFS-13453-001.patch
>
>
> [HDFS-13386|https://issues.apache.org/jira/browse/HDFS-13386] is not handling 
> the case when /parent in not present in mount table but /parent/subdir is in 
> mount table.
> In this case getMountPointDates is not able to fetch the latest time for 
> /parent as /parent is not present in mount table.
> For this scenario we will display latest modified subdir date/time as /parent 
> modified time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443464#comment-16443464
 ] 

Íñigo Goiri commented on HDFS-13478:


[^HDFS-13478.000.patch] is a work in progress for what I'm thinking of doing.
It is still missing a proper unit test to check that we ignore a subcluster.
The main problem, as usual with all these interface patches, is that the proto 
wiring is huge.

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13478.000.patch
>
>
> We have a subcluster in our federation that is for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13478:
---
Attachment: HDFS-13478.000.patch

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13478.000.patch
>
>
> We have a subcluster in our federation that is for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Attachment: HDFS-13448.5.patch

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch, HDFS-13448.5.patch
>
>
> According to the HDFS block placement rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when the {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is where you have, for example, a flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode 
> local to the Flume agent will always get the first block replica and this 
> leads to un-even block placements, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far as now the first block replica will always 
> be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443439#comment-16443439
 ] 

Íñigo Goiri commented on HDFS-13469:


Will the client come with 123 or with a path like /.reserved/.inodes/123?
I'm not sure how to figure which subcluster has that inode.

> RBF: Support InodeID in the Router
> --
>
> Key: HDFS-13469
> URL: https://issues.apache.org/jira/browse/HDFS-13469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly, we need to add this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443402#comment-16443402
 ] 

genericqa commented on HDFS-13286:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-12943 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
11s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
35s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
46s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
46s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
47s{color} | {color:green} HDFS-12943 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
9s{color} | {color:green} HDFS-12943 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 25m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 25m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 25m 
52s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 39s{color} | {color:orange} root: The patch generated 13 new + 287 unchanged 
- 5 fixed = 300 total (was 292) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 26s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
19s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}103m 40s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 49s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 50s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}249m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.TestRollingUpgrade |
\\
\\
|| Subsystem || 

[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443386#comment-16443386
 ] 

genericqa commented on HDFS-13448:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
27s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
46s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 47s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HDFS-13448 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12919698/HDFS-13448.4.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 329dd1f68fa7 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf7694d |
| maven 

[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped

2018-04-18 Thread yunjiong zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443349#comment-16443349
 ] 

yunjiong zhao commented on HDFS-13441:
--

[~daryn], you are right, it's not the best or most reliable way to fix this 
issue.

After rethinking, I think one line of code should fix it.

When the NameNode runs startActiveServices, it will call
{code:java}
blockManager.getDatanodeManager().markAllDatanodesStale();
{code}
Inside markAllDatanodesStale, add one line of code to make sure each DataNode 
gets the current key from the active NameNode:
{code:java}
dn.setNeedKeyUpdate(true);
{code}
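
Roughly, the change would land as below; the loop body is a sketch assumed from 
the description of markAllDatanodesStale, not the committed patch:
{code:java}
// Inside DatanodeManager (sketch): on failover, mark every datanode's
// storages stale and force the datanode to refresh its block keys from
// the new active NameNode.
void markAllDatanodesStale() {
  for (DatanodeDescriptor dn : datanodeMap.values()) {
    for (DatanodeStorageInfo storage : dn.getStorageInfos()) {
      storage.markStaleAfterFailover();
    }
    dn.setNeedKeyUpdate(true);  // the proposed one-line fix
  }
}
{code}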

> DataNode missed BlockKey update from NameNode due to HeartbeatResponse was 
> dropped
> --
>
> Key: HDFS-13441
> URL: https://issues.apache.org/jira/browse/HDFS-13441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.7.1
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Major
> Attachments: HDFS-13441.002.patch, HDFS-13441.003.patch, 
> HDFS-13441.patch
>
>
> After NameNode failover, lots of application failed due to some DataNodes 
> can't re-compute password from block token.
> {code:java}
> 2018-04-11 20:10:52,448 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error 
> processing unknown operation  src: /10.142.74.116:57404 dst: 
> /10.142.77.45:50010
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=1523538652448, 
> keyId=1762737944, userId=hadoop, 
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, 
> access modes=[WRITE]), since the required block key (keyID=1762737944) 
> doesn't exist.]
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=1523538652448, 
> keyId=1762737944, userId=hadoop, 
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, 
> access modes=[WRITE]), since the required block key (keyID=1762737944) 
> doesn't exist.
>         at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382)
>         at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>         ... 7 more
> {code}
>  
> In the DataNode log, we didn't see the DataNode update block keys around 
> 2018-04-11 09:55:00 or around 2018-04-11 19:55:00.
> {code:java}
> 2018-04-10 14:51:36,424 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-10 23:55:38,420 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 00:51:34,792 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 10:51:39,403 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 

[jira] [Updated] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped

2018-04-18 Thread yunjiong zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yunjiong zhao updated HDFS-13441:
-
Attachment: HDFS-13441.003.patch

> DataNode missed BlockKey update from NameNode due to HeartbeatResponse was 
> dropped
> --
>
> Key: HDFS-13441
> URL: https://issues.apache.org/jira/browse/HDFS-13441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.7.1
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Major
> Attachments: HDFS-13441.002.patch, HDFS-13441.003.patch, 
> HDFS-13441.patch
>
>
> After a NameNode failover, lots of applications failed because some DataNodes 
> couldn't re-compute passwords from block tokens.
> {code:java}
> 2018-04-11 20:10:52,448 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error 
> processing unknown operation  src: /10.142.74.116:57404 dst: 
> /10.142.77.45:50010
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=1523538652448, 
> keyId=1762737944, userId=hadoop, 
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, 
> access modes=[WRITE]), since the required block key (keyID=1762737944) 
> doesn't exist.]
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=1523538652448, 
> keyId=1762737944, userId=hadoop, 
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, 
> access modes=[WRITE]), since the required block key (keyID=1762737944) 
> doesn't exist.
>         at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382)
>         at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>         ... 7 more
> {code}
>  
> In the DataNode log, we didn't see the DataNode update block keys around 
> 2018-04-11 09:55:00 or around 2018-04-11 19:55:00.
> {code:java}
> 2018-04-10 14:51:36,424 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-10 23:55:38,420 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 00:51:34,792 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 10:51:39,403 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 20:51:44,422 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-12 02:54:47,855 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-12 05:55:44,456 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> {code}
> The reason is there is SocketTimeOutException when sending heartbeat 

[jira] [Commented] (HDFS-13469) RBF: Support InodeID in the Router

2018-04-18 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443317#comment-16443317
 ] 

Daryn Sharp commented on HDFS-13469:


It's really as simple as: the path /.reserved/.inodes/123 will access inode 
123, which may really be /user/daryn/dir/mystuff. The NFS implementation relies 
on inode paths.

HDFS-7878 cannot solve the problem because it's an abstraction for a different 
purpose.
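
For illustration, the two paths below reach the same file once the inode id is 
known (the FileSystem calls are a sketch; the paths and the inode id come from 
the example above):
{code:java}
FileSystem fs = FileSystem.get(new Configuration());
// Both paths resolve to the same inode on the NameNode.
Path byName  = new Path("/user/daryn/dir/mystuff");
Path byInode = new Path("/.reserved/.inodes/123"); // 123 = the file's inode id
// fs.open(byName) and fs.open(byInode) read the same file contents.
{code}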

> RBF: Support InodeID in the Router
> --
>
> Key: HDFS-13469
> URL: https://issues.apache.org/jira/browse/HDFS-13469
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Priority: Major
>
> The Namenode supports identifying files through inode identifiers.
> Currently the Router does not handle this properly; we need to add this 
> functionality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443309#comment-16443309
 ] 

Íñigo Goiri commented on HDFS-13478:


I could make this part of the Membership table, but I think it's better to keep 
it in a separate table.
In this way, they are independent, and we can keep the subcluster decommissioned 
even when no Router is heartbeating.

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>
> We have a subcluster in our federation that is used for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443277#comment-16443277
 ] 

Íñigo Goiri commented on HDFS-13478:


The idea would be to add a decommission table to the State Store and ignore 
this subcluster when checking for locations.
We would also need support in dfsrouteradmin to disable/enable subclusters, as 
sketched below.
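
As a rough sketch, the State Store record for such a table might look like this 
(the class and field names are hypothetical, not from any patch):
{code:java}
// Hypothetical record: one entry per decommissioned subcluster, independent of
// the Membership table, so the state survives even with no Router heartbeating.
public class DecommissionedNameserviceSketch {
  private String nameserviceId;   // subcluster to skip when resolving locations
  private long decommissionTime;  // when the admin disabled it

  public String getNameserviceId() { return nameserviceId; }
  public long getDecommissionTime() { return decommissionTime; }
}
{code}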

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>
> We have a subcluster in our federation that is used for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned HDFS-13478:
--

Assignee: Íñigo Goiri

> RBF: Decommission subclusters from the federation
> -
>
> Key: HDFS-13478
> URL: https://issues.apache.org/jira/browse/HDFS-13478
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>
> We have a subcluster in our federation that is used for testing and is 
> misbehaving. This has a negative impact on the performance of operations 
> that go to every subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13478) RBF: Decommission subclusters from the federation

2018-04-18 Thread JIRA
Íñigo Goiri created HDFS-13478:
--

 Summary: RBF: Decommission subclusters from the federation
 Key: HDFS-13478
 URL: https://issues.apache.org/jira/browse/HDFS-13478
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Íñigo Goiri


We have a subcluster in our federation that is used for testing and is misbehaving. 
This has a negative impact on the performance of operations that go to every 
subcluster (e.g., renewLease() or setSafeMode()).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode

2018-04-18 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443251#comment-16443251
 ] 

Wei Yan commented on HDFS-13475:


From the NameNode side, it has isInSafeMode() and isInStartupSafeMode(). The Router 
can follow a similar concept, where we have two different safe mode functions: 
isInSafeMode() and isInForcedSafeMode().
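
A minimal sketch of that split (the method names come from the comment above; 
the fields and the periodicInvoke note are illustrative, not an actual patch):
{code:java}
// Sketch: track admin-forced safe mode separately from automatic safe mode.
class RouterSafemodeSketch {
  private volatile boolean safeMode;        // entered automatically (e.g., stale cache)
  private volatile boolean forcedSafeMode;  // entered explicitly via dfsrouteradmin

  boolean isInSafeMode()       { return safeMode || forcedSafeMode; }
  boolean isInForcedSafeMode() { return forcedSafeMode; }

  void enterForced() { forcedSafeMode = true;  }
  void leaveForced() { forcedSafeMode = false; }

  // periodicInvoke() would only toggle 'safeMode', never 'forcedSafeMode',
  // so an admin-forced safe mode survives the periodic cache check.
}
{code}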

> RBF: Admin cannot enforce Router enter SafeMode
> ---
>
> Key: HDFS-13475
> URL: https://issues.apache.org/jira/browse/HDFS-13475
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
>
> To reproduce the issue: 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode enter
> Successfully enter safe mode.
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: true{code}
> And then, 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: false{code}
> From the code, it looks like the periodicInvoke triggers the leave.
> {code:java}
> public void periodicInvoke() {
> ..
>   // Always update to indicate our cache was updated
>   if (isCacheStale) {
> if (!rpcServer.isInSafeMode()) {
>   enter();
> }
>   } else if (rpcServer.isInSafeMode()) {
> // Cache recently updated, leave safe mode
> leave();
>   }
> }
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443236#comment-16443236
 ] 

BELUGA BEHR commented on HDFS-13448:


[~daryn]  Thanks for pointing me in this direction.  I have attached a new 
patch as you have recommended.

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.
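
For reference, a rough sketch of how a client passes these placement hints 
today; the proposed flag's name is hypothetical and appears only in the 
comment, everything else uses existing API:
{code:java}
// Sketch: create() with the existing NO_LOCAL_WRITE hint. A new flag (name
// hypothetical, e.g. IGNORE_CLIENT_LOCALITY) would also randomize the 1st replica.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
EnumSet<CreateFlag> flags =
    EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE);
FSDataOutputStream out = fs.create(new Path("/flume/events.log"),
    FsPermission.getFileDefault(), flags, 4096 /* buffer */,
    (short) 3 /* replication */, 128L * 1024 * 1024 /* block size */, null);
out.close();
{code}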



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Attachment: (was: HDFS-13448.4.patch)

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Attachment: HDFS-13448.4.patch

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Attachment: (was: HDFS-13448.4.patch)

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Attachment: HDFS-13448.4.patch

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Status: Patch Available  (was: Open)

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 3.0.1, 2.9.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Attachment: HDFS-13448.4.patch

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch, HDFS-13448.4.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Status: Open  (was: Patch Available)

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 3.0.1, 2.9.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when, for example, a Flume agent is loading data 
> into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then the 
> default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far: the first block replica will still always 
> go to a DataNode on the local rack.
> The new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13079) Provide a config to start namenode in safemode state upto a certain transaction id

2018-04-18 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443191#comment-16443191
 ] 

Hanisha Koneru commented on HDFS-13079:
---

Thanks for working on this [~shashikant]. 
bq. Please note that in case a checkpoint has already happened and the 
requested transaction id has been subsumed in an FSImage, then the namenode 
will be started with the next nearest transaction id. Further FSImage files and 
edits will be ignored.
In case the requested txId falls within the latest fsImage, do we want to 
load that fsImage or fall back to a previous fsImage with lastTxId < 
requested txId? IMO, we should load the fsImage with endTxId <= requested 
txId.

* In {{FSImage#loadFSImage}}, the check for whether we should load an fsImage is 
made after the image has already been loaded. The line {{loader.load(curFile, 
requireSameLayoutVersion)}} loads the fsImage transactions into the NN.
{code}
FSImageFormat.LoaderDelegator loader = FSImageFormat.newLoader(conf, target);
loader.load(curFile, requireSameLayoutVersion);

long lastTxIdToLoad = target.getLastTxidToLoad();
long txId = loader.getLoadedImageTxId();
if (lastTxIdToLoad != HdfsServerConstants.INVALID_TXID && txId > 
lastTxIdToLoad) {
{code}

* When we skip loading the latest fsImage, we should keep falling back and try 
to load the next-latest fsImage. For example, say we have two fsImages, 
fsimage_00090 and fsimage_00150, and we want to start the namenode in safemode 
up to txId 120. We first check fsimage_00150 and reject it. After this, the NN 
should attempt to load the next-latest fsImage, i.e. fsimage_00090. 
We can throw an exception when skipping an fsImage and catch that exception in 
the following code path in {{FSImage#loadFSImage}}. This way the next-latest 
fsImage will be loaded.
{code}
FSImageFile imageFile = null;
for (int i = 0; i < imageFiles.size(); i++) {
  try {
    imageFile = imageFiles.get(i);
    loadFSImageFile(target, recovery, imageFile, startOpt);
    break;
{code}

* What do we do when there are no fsImages with endTxId <= requested txId? 
IMO, we should stop the NN and throw an error.
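
As a rough sketch, the suggested fallback could look like this (hypothetical 
code, not from the patch; it assumes {{FSImageFile#getCheckpointTxId}} exposes 
each image's last txId):
{code:java}
// Hypothetical sketch: walk the candidate images newest-first, skip any whose
// txId exceeds the requested limit, and fail fast if none qualifies.
for (FSImageFile candidate : imageFiles) {
  if (candidate.getCheckpointTxId() > lastTxIdToLoad) {
    continue; // image subsumes transactions past the requested txId; try older
  }
  loadFSImageFile(target, recovery, candidate, startOpt);
  return;
}
throw new IOException("No FSImage with endTxId <= " + lastTxIdToLoad);
{code}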


> Provide a config to start namenode in safemode state upto a certain 
> transaction id
> --
>
> Key: HDFS-13079
> URL: https://issues.apache.org/jira/browse/HDFS-13079
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Mukul Kumar Singh
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13079.001.patch, HDFS-13079.002.patch
>
>
> In some cases it is necessary to roll back the Namenode to a certain 
> transaction id. This is especially needed when the user issues a {{rm -Rf 
> -skipTrash}} by mistake.
> Rolling back to a transaction id helps in taking a peek at the filesystem at 
> a particular instant. This jira proposes to provide a configuration variable 
> with which the namenode can be started up to a certain transaction id. The 
> filesystem will be in a read-only safemode that cannot be overridden 
> manually; it can only be overridden by removing the config value from the 
> config file. Please also note that this will not cause any changes to the 
> filesystem state: the filesystem will be in safemode and no changes to 
> it will be allowed.
> Please note that in case a checkpoint has already happened and the requested 
> transaction id has been subsumed in an FSImage, then the namenode will be 
> started with the next nearest transaction id. Further FSImage files and edits 
> will be ignored.
> If the checkpoint hasn't happened, the namenode will be started with the 
> exact transaction id.
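
A sketch of how such a configuration might be consumed (the property name 
below is hypothetical; the jira has not fixed one):
{code:java}
// Hypothetical property name, for illustration only.
Configuration conf = new HdfsConfiguration();
conf.setLong("dfs.namenode.startup.safemode.txid", 120L);
// The NameNode would then start in read-only safemode, loading state only
// up to transaction id 120 and ignoring later FSImage files and edits.
{code}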



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443190#comment-16443190
 ] 

genericqa commented on HDFS-13477:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
18s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
15s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 6s{color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
24s{color} | {color:red} server-scm in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
26s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
23s{color} | {color:red} server-scm in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
23s{color} | {color:red} server-scm in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
10s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
10s{color} | {color:red} ozone-manager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 26m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 26m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
21s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
20s{color} | {color:red} ozone-manager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 30s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} ozone-manager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
23s{color} | {color:red} ozone-manager in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 22s{color} 
| {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 21s{color} 
| {color:red} ozone-manager in the patch failed. 

[jira] [Updated] (HDFS-13399) Make Client field AlignmentContext non-static.

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13399:

Attachment: (was: HDFS-13286-HDFS-12943.000.patch)

> Make Client field AlignmentContext non-static.
> --
>
> Key: HDFS-13399
> URL: https://issues.apache.org/jira/browse/HDFS-13399
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Major
> Attachments: HDFS-13399-HDFS-12943.000.patch, 
> HDFS-13399-HDFS-12943.001.patch, HDFS-13399-HDFS-12943.002.patch
>
>
> In HDFS-12977, DFSClient's constructor was altered to make use of a new 
> static method in Client that allowed one to set an AlignmentContext. This 
> work is to remove that static field and make each DFSClient pass its 
> AlignmentContext down to the proxy Call level.
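
A rough sketch of the direction (the class shape is illustrative, not the 
actual patch):
{code:java}
// Illustrative: each DFSClient owns its own AlignmentContext instead of a
// JVM-wide static one, and hands it to the RPC layer when proxies are created.
class DFSClientSketch {
  private final AlignmentContext alignmentContext = new ClientGSIContext();
  // ...passed down so each proxy Call can carry this client's state id.
}
{code}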



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13399) Make Client field AlignmentContext non-static.

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13399:

Attachment: HDFS-13286-HDFS-12943.000.patch

> Make Client field AlignmentContext non-static.
> --
>
> Key: HDFS-13399
> URL: https://issues.apache.org/jira/browse/HDFS-13399
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-12943
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Major
> Attachments: HDFS-13399-HDFS-12943.000.patch, 
> HDFS-13399-HDFS-12943.001.patch, HDFS-13399-HDFS-12943.002.patch
>
>
> In HDFS-12977, DFSClient's constructor was altered to make use of a new 
> static method in Client that allowed one to set an AlignmentContext. This 
> work is to remove that static field and make each DFSClient pass its 
> AlignmentContext down to the proxy Call level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13286:

Attachment: HDFS-13286-HDFS-12943.000.patch

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286-HDFS-12943.000.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through the haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13286:

Attachment: (was: HDFS-13286.0.patch)

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through the haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13286:

Attachment: (was: HDFS-13286.1.patch)

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through the haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443165#comment-16443165
 ] 

genericqa commented on HDFS-13286:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-13286 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13286 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12919686/HDFS-13286.1.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23989/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286.0.patch, HDFS-13286.1.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through the haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files

2018-04-18 Thread feng xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443156#comment-16443156
 ] 

feng xu edited comment on HDFS-13476 at 4/18/18 8:38 PM:
-

By the way, java.io.File.exists() is not sufficient to 
determine if a file exists, because 
[fs|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/File.java#File.0fs].getBooleanAttributes()
could fail for other reasons.


was (Author: fxu...@hotmail.com):
By the way, java.io.File::exists() is not sufficient to determine whether a 
file exists, because 
[fs|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/File.java#File.0fs].[getBooleanAttributes|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/FileSystem.java#FileSystem.getBooleanAttributes%28java.io.File%29]
 could fail for other reasons.
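
To make the distinction concrete, here is a minimal, self-contained Java 
sketch (assuming a POSIX-style local file system): File.exists() collapses 
EACCES and a missing file into the same false, while Files.readAttributes() 
surfaces the cause as a typed exception.

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.attribute.BasicFileAttributes;

public class ExistsCheck {
  // File.exists() returns false both when the file is absent and when the
  // underlying stat fails with EACCES, so the caller cannot tell them apart.
  static boolean existsCauseHidden(File f) {
    return f.exists();
  }

  // Files.readAttributes() throws a typed exception instead, so access
  // denial and a genuinely missing file are distinguishable.
  static boolean existsCauseVisible(File f) throws IOException {
    try {
      Files.readAttributes(f.toPath(), BasicFileAttributes.class);
      return true;
    } catch (NoSuchFileException e) {
      return false;          // really absent
    } catch (AccessDeniedException e) {
      throw e;               // present but denied: not the same as missing
    }
  }
}
{code}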

> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
> 
>
> Key: HDFS-13476
> URL: https://issues.apache.org/jira/browse/HDFS-13476
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.4
>Reporter: feng xu
>Priority: Critical
>
> We have security software that runs on the local file system (ext4), and it 
> denies particular users access to particular HDFS folders based on a 
> security policy. For example, the policy always gives the user hdfs full 
> permission, and denies the user yarn access to /dir1.  If the user yarn 
> tries to access a file under the HDFS folder /dir1, the security software 
> denies the access and returns EACCES from the file system call through 
> errno. This used to work because data corruption was determined by the 
> block scanner 
> (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).
> On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
> security policy denies file access in HDFS at the local file system level. 
> We debugged HDFS and found that BlockSender() directly calls the following 
> statements, which may cause the problem:
> datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
> datanode.data.invalidate(block.getBlockPoolId(), new 
> Block[]\{block.getLocalBlock()});
> In the meantime, the block scanner is not triggered because of the 
> undocumented property dfs.datanode.disk.check.min.gap. However, the problem 
> is still there if we disable dfs.datanode.disk.check.min.gap by setting it 
> to 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files

2018-04-18 Thread feng xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443156#comment-16443156
 ] 

feng xu commented on HDFS-13476:


By the way, java.io.File::exists() is not sufficient to determine whether a 
file exists, because 
[fs|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/File.java#File.0fs].[getBooleanAttributes|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/FileSystem.java#FileSystem.getBooleanAttributes%28java.io.File%29]
 could fail for other reasons.

> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
> 
>
> Key: HDFS-13476
> URL: https://issues.apache.org/jira/browse/HDFS-13476
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.4
>Reporter: feng xu
>Priority: Critical
>
> We have security software that runs on the local file system (ext4), and it 
> denies particular users access to particular HDFS folders based on a 
> security policy. For example, the policy always gives the user hdfs full 
> permission, and denies the user yarn access to /dir1.  If the user yarn 
> tries to access a file under the HDFS folder /dir1, the security software 
> denies the access and returns EACCES from the file system call through 
> errno. This used to work because data corruption was determined by the 
> block scanner 
> (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).
> On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
> security policy denies file access in HDFS at the local file system level. 
> We debugged HDFS and found that BlockSender() directly calls the following 
> statements, which may cause the problem:
> datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
> datanode.data.invalidate(block.getBlockPoolId(), new 
> Block[]\{block.getLocalBlock()});
> In the meantime, the block scanner is not triggered because of the 
> undocumented property dfs.datanode.disk.check.min.gap. However, the problem 
> is still there if we disable dfs.datanode.disk.check.min.gap by setting it 
> to 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443153#comment-16443153
 ] 

Chao Sun commented on HDFS-13286:
-

Rebase to trunk.

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286.0.patch, HDFS-13286.1.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through an haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13286:

Attachment: HDFS-13286.1.patch

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286.0.patch, HDFS-13286.1.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through an haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443148#comment-16443148
 ] 

genericqa commented on HDFS-13286:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-13286 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-13286 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12919684/HDFS-13286.0.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23988/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286.0.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through an haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443145#comment-16443145
 ] 

Chao Sun commented on HDFS-13286:
-

It seems the change to {{haadmin}} is not as disruptive as I thought. I 
submitted the initial patch. [~shv], [~zero45], [~xkrogen]: can you take a 
look?

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286.0.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through an haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13286:

Status: Patch Available  (was: Open)

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286.0.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through an haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13286) Add haadmin commands to transition between standby and observer

2018-04-18 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-13286:

Attachment: HDFS-13286.0.patch

> Add haadmin commands to transition between standby and observer
> ---
>
> Key: HDFS-13286
> URL: https://issues.apache.org/jira/browse/HDFS-13286
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-13286.0.patch
>
>
> As discussed in HDFS-12975, we should allow explicit transitions between 
> standby and observer through an haadmin command, such as:
> {code}
> haadmin -transitionToObserver
> {code}
> Initially we should support transition from observer to standby, and standby 
> to observer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443095#comment-16443095
 ] 

genericqa commented on HDFS-13442:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
37s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
13s{color} | {color:red} container-service in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
14s{color} | {color:red} server-scm in HDFS-7240 failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
58s{color} | {color:red} hadoop-hdds/common in HDFS-7240 has 1 extant Findbugs 
warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
13s{color} | {color:red} container-service in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
15s{color} | {color:red} server-scm in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
13s{color} | {color:red} container-service in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
13s{color} | {color:red} server-scm in HDFS-7240 failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m  
9s{color} | {color:red} container-service in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
10s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m  
9s{color} | {color:red} container-service in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
10s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
11s{color} | {color:red} container-service in the patch failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
14s{color} | {color:red} server-scm in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
13s{color} | {color:red} container-service in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc 

[jira] [Updated] (HDFS-13355) Create IO provider for hdsl

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13355:
--
Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-7240

> Create IO provider for hdsl
> ---
>
> Key: HDFS-13355
> URL: https://issues.apache.org/jira/browse/HDFS-13355
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
>
> Create an abstraction like FileIoProvider for hdsl to handle disk failure and 
> other issues.
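
As a rough, hypothetical sketch of what such an abstraction could look like 
(the interface and names below are illustrative, not FileIoProvider's actual 
API):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative shape only: all disk reads funnel through one provider so
// failures can be observed and handled (counted, logged, disk marked bad)
// in a single place.
interface HdslIoProvider {
  byte[] readFully(Path path) throws IOException;
}

class DefaultIoProvider implements HdslIoProvider {
  @Override
  public byte[] readFully(Path path) throws IOException {
    try {
      return Files.readAllBytes(path);
    } catch (IOException e) {
      onFailure(path, e);    // central hook for disk-failure handling
      throw e;
    }
  }

  protected void onFailure(Path path, IOException cause) {
    System.err.println("IO failure on " + path + ": " + cause);
  }
}
{code}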



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13477:
--
Attachment: HDFS-13477-HDFS-7240.00.patch

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13477-HDFS-7240.00.patch
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.
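
A minimal sketch of the behavior being asked for, with startHttpServer and 
startRpcServer as hypothetical stand-ins for the real startup calls:

{code:java}
import java.util.logging.Logger;

public class NonFatalHttpStart {
  private static final Logger LOG = Logger.getLogger("NonFatalHttpStart");

  // An HttpServer failure is logged and swallowed; startup continues.
  static void start(Runnable startHttpServer, Runnable startRpcServer) {
    try {
      startHttpServer.run();
    } catch (RuntimeException e) {
      LOG.warning("HttpServer failed to start, continuing without it: " + e);
    }
    startRpcServer.run();    // the essential services still come up
  }
}
{code}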



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-18 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Attachment: HDFS-13442-HDFS-7240.002.patch

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch, 
> HDFS-13442-HDFS-7240.002.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.
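
A small, self-contained sketch of the capped-retry behavior described 
(tryRegister and maxAttempts are illustrative names, not the patch's API):

{code:java}
import java.util.function.BooleanSupplier;

public class CappedRegistration {
  // Attempt registration up to maxAttempts times, then give up for good.
  static boolean register(BooleanSupplier tryRegister, int maxAttempts)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (tryRegister.getAsBoolean()) {
        return true;         // registered with the SCM
      }
      Thread.sleep(1000L);   // back off before the next attempt
    }
    return false;            // cap reached: stop making attempts
  }
}
{code}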



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13477:
--
Attachment: (was: HDFS-13477-HDFS-7240.00.patch)

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13477-HDFS-7240.00.patch
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13477:
--
Attachment: HDFS-13477-HDFS-7240.00.patch

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13477-HDFS-7240.00.patch
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar reassigned HDFS-13477:
-

Assignee: Ajay Kumar

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13477-HDFS-7240.00.patch
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13477:
--
Status: Patch Available  (was: Open)

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13477-HDFS-7240.00.patch
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13477:
--
Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-7240

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13477:
--
Fix Version/s: HDFS-7240

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDFS-13477:
--
Affects Version/s: HDFS-7240

> Httpserver start failure should be non fatal for KSM and SCM startup
> 
>
> Key: HDFS-13477
> URL: https://issues.apache.org/jira/browse/HDFS-13477
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: HDFS-7240
>Reporter: Ajay Kumar
>Priority: Major
> Fix For: HDFS-7240
>
>
> Currently KSM and SCM startup will fail if the corresponding HttpServer 
> fails with some exception. The HttpServer is not essential for the 
> operation of KSM and SCM, so we should allow them to start even if the 
> HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13477) Httpserver start failure should be non fatal for KSM and SCM startup

2018-04-18 Thread Ajay Kumar (JIRA)
Ajay Kumar created HDFS-13477:
-

 Summary: Httpserver start failure should be non fatal for KSM and 
SCM startup
 Key: HDFS-13477
 URL: https://issues.apache.org/jira/browse/HDFS-13477
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ajay Kumar


Currently KSM and SCM startup will fail if the corresponding HttpServer fails 
with some exception. The HttpServer is not essential for the operation of KSM 
and SCM, so we should allow them to start even if the HttpServer fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files

2018-04-18 Thread feng xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feng xu updated HDFS-13476:
---
Description: 
We have security software that runs on the local file system (ext4), and it 
denies particular users access to particular HDFS folders based on a security 
policy. For example, the policy always gives the user hdfs full permission, 
and denies the user yarn access to /dir1.  If the user yarn tries to access a 
file under the HDFS folder /dir1, the security software denies the access and 
returns EACCES from the file system call through errno. This used to work 
because data corruption was determined by the block scanner 
(https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).

On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
security policy denies file access in HDFS at the local file system level. We 
debugged HDFS and found that BlockSender() directly calls the following 
statements, which may cause the problem:

datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
 datanode.data.invalidate(block.getBlockPoolId(), new 
Block[]\{block.getLocalBlock()});

In the meantime, the block scanner is not triggered because of the 
undocumented property dfs.datanode.disk.check.min.gap. However, the problem is 
still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0. 

  was:
We have security software that runs on the local file system (ext4), and it 
denies particular users access to particular HDFS folders based on a security 
policy. For example, the policy always gives the user hdfs full permission, 
and denies the user yarn access to /dir1.  If the user yarn tries to access a 
file under the HDFS folder /dir1, the security software denies the access and 
returns EACCES from the file system call through errno. This used to work 
because data corruption was determined by the block scanner 
(https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).

On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
security policy denies file access in HDFS at the local file system level. We 
debugged HDFS and found that BlockSender() directly calls the following 
statements and causes the problem:

datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
datanode.data.invalidate(block.getBlockPoolId(), new 
Block[]\{block.getLocalBlock()});

In the meantime, the block scanner is not triggered because of the 
undocumented property dfs.datanode.disk.check.min.gap. However, the problem is 
still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0. 


> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
> 
>
> Key: HDFS-13476
> URL: https://issues.apache.org/jira/browse/HDFS-13476
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.4
>Reporter: feng xu
>Priority: Critical
>
> We have security software that runs on the local file system (ext4), and it 
> denies particular users access to particular HDFS folders based on a 
> security policy. For example, the policy always gives the user hdfs full 
> permission, and denies the user yarn access to /dir1.  If the user yarn 
> tries to access a file under the HDFS folder /dir1, the security software 
> denies the access and returns EACCES from the file system call through 
> errno. This used to work because data corruption was determined by the 
> block scanner 
> (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).
> On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
> security policy denies file access in HDFS at the local file system level. 
> We debugged HDFS and found that BlockSender() directly calls the following 
> statements, which may cause the problem:
> datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
> datanode.data.invalidate(block.getBlockPoolId(), new 
> Block[]\{block.getLocalBlock()});
> In the meantime, the block scanner is not triggered because of the 
> undocumented property dfs.datanode.disk.check.min.gap. However, the problem 
> is still there if we disable dfs.datanode.disk.check.min.gap by setting it 
> to 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 

[jira] [Commented] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files

2018-04-18 Thread feng xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443023#comment-16443023
 ] 

feng xu commented on HDFS-13476:


2018-04-18 12:40:48,466 ERROR datanode.DataNode (DataXceiver.java:run(278)) - 
4381-fxu-centos7:50010:DataXceiver error processing READ_BLOCK operation src: 
/10.3.43.81:51424 dst: /10.3.43.81:50010
java.io.FileNotFoundException: BlockId 1073741896 is not valid.
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:739)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockFile(FsDatasetImpl.java:730)
 at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getMetaDataInputStream(FsDatasetImpl.java:232)
 at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:299)
 at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:547)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
 at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
 at java.lang.Thread.run(Thread.java:745)

 

 

> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
> 
>
> Key: HDFS-13476
> URL: https://issues.apache.org/jira/browse/HDFS-13476
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.4
>Reporter: feng xu
>Priority: Critical
>
> We have security software that runs on the local file system (ext4), and it 
> denies particular users access to particular HDFS folders based on a 
> security policy. For example, the policy always gives the user hdfs full 
> permission, and denies the user yarn access to /dir1.  If the user yarn 
> tries to access a file under the HDFS folder /dir1, the security software 
> denies the access and returns EACCES from the file system call through 
> errno. This used to work because data corruption was determined by the 
> block scanner 
> (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).
> On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
> security policy denies file access in HDFS at the local file system level. 
> We debugged HDFS and found that BlockSender() directly calls the following 
> statements, which may cause the problem:
> datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
> datanode.data.invalidate(block.getBlockPoolId(), new 
> Block[]\{block.getLocalBlock()});
> In the meantime, the block scanner is not triggered because of the 
> undocumented property dfs.datanode.disk.check.min.gap. However, the problem 
> is still there if we disable dfs.datanode.disk.check.min.gap by setting it 
> to 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12950) [oiv] ls will fail in secure cluster

2018-04-18 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443013#comment-16443013
 ] 

Brahma Reddy Battula commented on HDFS-12950:
-

[~jojochuang] thanks for the patch. The approach looks good to me.

Additionally, I feel we can mention that passing 
"-Dhadoop.security.authentication=simple" makes ls work...?

> [oiv] ls will fail in  secure cluster
> -
>
> Key: HDFS-12950
> URL: https://issues.apache.org/jira/browse/HDFS-12950
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Brahma Reddy Battula
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-12950.001.patch, HDFS-12950.002.patch
>
>
> If we execute ls, it will throw the following:
> {noformat}
> hdfs dfs -ls webhdfs://127.0.0.1:5978/
> ls: Invalid value for webhdfs parameter "op"
> {noformat}
> When the client is configured with security (i.e. 
> hadoop.security.authentication=KERBEROS), webhdfs will request a delegation 
> token, which is not implemented, and hence it will throw “ls: Invalid value 
> for webhdfs parameter "op"”.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files

2018-04-18 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442998#comment-16442998
 ] 

Brahma Reddy Battula commented on HDFS-13476:
-

This issue looks similar to HDFS-11711.

So you are getting the FileNotFoundException..?

Can you please attach the trace as well..? I hope HDP-2.7.3 behaves the same 
as hadoop-2.7.3.

> HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
> 
>
> Key: HDFS-13476
> URL: https://issues.apache.org/jira/browse/HDFS-13476
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.4
>Reporter: feng xu
>Priority: Critical
>
> We have security software that runs on the local file system (ext4), and it 
> denies particular users access to particular HDFS folders based on a 
> security policy. For example, the policy always gives the user hdfs full 
> permission, and denies the user yarn access to /dir1.  If the user yarn 
> tries to access a file under the HDFS folder /dir1, the security software 
> denies the access and returns EACCES from the file system call through 
> errno. This used to work because data corruption was determined by the 
> block scanner 
> (https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).
> On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
> security policy denies file access in HDFS at the local file system level. 
> We debugged HDFS and found that BlockSender() directly calls the following 
> statements and causes the problem:
> datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
> datanode.data.invalidate(block.getBlockPoolId(), new 
> Block[]\{block.getLocalBlock()});
> In the meantime, the block scanner is not triggered because of the 
> undocumented property dfs.datanode.disk.check.min.gap. However, the problem 
> is still there if we disable dfs.datanode.disk.check.min.gap by setting it 
> to 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI

2018-04-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442990#comment-16442990
 ] 

Íñigo Goiri commented on HDFS-13470:


To make the whole UI easier to maintain, I would have all the header and tab 
links generated.
Right now, we have to keep the NN pages and the Router pages consistent by 
hand.
Not sure how to do it with js, but I'll give it a try.

> RBF: Add Browse the Filesystem button to the UI
> ---
>
> Key: HDFS-13470
> URL: https://issues.apache.org/jira/browse/HDFS-13470
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13470.000.patch
>
>
> After HDFS-12512 added WebHDFS, we can add support for browsing the 
> filesystem to the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13476) HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files

2018-04-18 Thread feng xu (JIRA)
feng xu created HDFS-13476:
--

 Summary: HDFS (Hadoop/HDP 2.7.3.2.6.4.0-91) reports CORRUPT files
 Key: HDFS-13476
 URL: https://issues.apache.org/jira/browse/HDFS-13476
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.4
Reporter: feng xu


We have security software that runs on the local file system (ext4), and it 
denies particular users access to particular HDFS folders based on a security 
policy. For example, the policy always gives the user hdfs full permission, 
and denies the user yarn access to /dir1.  If the user yarn tries to access a 
file under the HDFS folder /dir1, the security software denies the access and 
returns EACCES from the file system call through errno. This used to work 
because data corruption was determined by the block scanner 
(https://blog.cloudera.com/blog/2016/12/hdfs-datanode-scanners-and-disk-checker-explained/).

On HDP 2.7.3.2.6.4.0-91, HDFS reports a lot of data corruption because the 
security policy denies file access in HDFS at the local file system level. We 
debugged HDFS and found that BlockSender() directly calls the following 
statements and causes the problem:

datanode.notifyNamenodeDeletedBlock(block, replica.getStorageUuid());
datanode.data.invalidate(block.getBlockPoolId(), new 
Block[]\{block.getLocalBlock()});

In the meantime, the block scanner is not triggered because of the 
undocumented property dfs.datanode.disk.check.min.gap. However, the problem is 
still there if we disable dfs.datanode.disk.check.min.gap by setting it to 0. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13474) Unable to start Hadoop DataNodes

2018-04-18 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442945#comment-16442945
 ] 

Brahma Reddy Battula commented on HDFS-13474:
-

Thanks for reporting. Could you try with Java 8..? Queries like this could go 
to the mailing list; JIRA is for tracking issues.

> Unable to start Hadoop DataNodes
> 
>
> Key: HDFS-13474
> URL: https://issues.apache.org/jira/browse/HDFS-13474
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: robbie
>Priority: Major
> Attachments: hadoop-roycecoll...@steelydan.com-datanode-c0315.log, 
> hadoop-roycecoll...@steelydan.com-datanode-c0315.out, 
> hadoop-roycecoll...@steelydan.com-namenode-c0315.log, 
> hadoop-roycecoll...@steelydan.com-namenode-c0315.out, 
> hadoop-roycecoll...@steelydan.com-secondarynamenode-c0315.log, 
> hadoop-roycecoll...@steelydan.com-secondarynamenode-c0315.out
>
>
> I am trying to follow the instructions in the Getting Started guide,
> [http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]
> I have confirmed that I can `ssh localhost` without a password prompt. I 
> have also run the following steps,
> {quote}1. $ bin/hdfs namenode -format
>  2. $ sbin/start-dfs.sh
> {quote}
> But I can't run step 3 to browse the location at `[http://localhost:9870/]`. 
> When I run `jps` from the terminal prompt I just get,
> {quote}14900 Jps
> {quote}
> I was expecting a list of my nodes.
> In the Logs I see two error messages towards the end,
> {quote}2018-04-18 14:15:42,516 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
> {quote}
> {quote}2018-04-18 14:15:42,516 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
> {quote}
> {quote}2018-04-18 14:15:42,517 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
>  /
>  SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1
>  /
> {quote}
> I will attach the full logs with this bug report.
> Can anyone help even with ways to debug this please ?
>   
>  Java Version,
> {quote}rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java --version 
>  java 9.0.4 
>  Java(TM) SE Runtime Environment (build 9.0.4+11) 
>  Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
> {quote}
> Ubuntu version,
> {quote}$ lsb_release -a
>  No LSB modules are available.
>  Distributor ID: neon
>  Description: KDE neon User Edition 5.12
>  Release: 16.04
>  Codename: xenial
> {quote}
> I have tried running the commands, `bin/hdfs version`
> {quote}Hadoop 3.1.0 
>  Source code repository [https://github.com/apache/hadoop] -r 
> 16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d 
>  Compiled by centos on 2018-03-30T00:00Z 
>  Compiled with protoc 2.5.0 
>  From source with checksum 14182d20c972b3e2105580a1ad6990 
>  This command was run using 
> /home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar
> {quote}
>  when I try `bin/hdfs groups` it doesn't return but gives me,
> {quote}2018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: 
> localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> {quote}
> when I try, `$ bin/hdfs lsSnapshottableDir`
> {quote}lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed 
> on connection exception: java.net.ConnectException: Connection refused; For 
> more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> {quote}
>  
>  when I try, `$ bin/hdfs classpath`
> {quote}/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/*
> {quote}
> core-site.xml
> {quote} 
>  
>  
>  fs.defaultFS
>  hdfs://localhost:9000
>  
>  
> {quote}
>  
>  hdfs-site.xml
> {quote}
>  
>  dfs.replication
>  1
>  
>  
> {quote}
> mapred-site.xml
> {quote}
>  
>  mapreduce.framework.name
>  yarn
>  
>  
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HDFS-12749) DN may not send block report to NN after NN restart

2018-04-18 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442942#comment-16442942
 ] 

He Xiaoqiao commented on HDFS-12749:


ping [~kihwal],[~daryn],[~arpitagarwal],[~ajayydv] do you mind having a look?

> DN may not send block report to NN after NN restart
> ---
>
> Key: HDFS-12749
> URL: https://issues.apache.org/jira/browse/HDFS-12749
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1, 2.8.3, 2.7.5, 3.0.0, 2.9.1
>Reporter: TanYuxin
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-12749-branch-2.7.002.patch, 
> HDFS-12749-trunk.003.patch, HDFS-12749-trunk.004.patch, HDFS-12749.001.patch
>
>
> Our cluster now has thousands of DNs and millions of files and blocks. When 
> the NN restarts, its load is very high.
> After the NN restarts, the DN will call the BPServiceActor#reRegister method 
> to register. But the register RPC will get an IOException since the NN is 
> busy dealing with block reports. The exception is caught at 
> BPServiceActor#processCommand.
> The caught IOException is:
> {code:java}
> WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing 
> datanode Command
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/DataNode_IP:Port remote=NameNode_Host/IP:Port]; Host Details : local 
> host is: "DataNode_Host/Datanode_IP"; destination host is: 
> "NameNode_Host":Port;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> at org.apache.hadoop.ipc.Client.call(Client.java:1474)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy13.registerDatanode(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:126)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:793)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reRegister(BPServiceActor.java:926)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:604)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:898)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:711)
> at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:864)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The uncaught IOException breaks BPServiceActor#register, and the block 
> report cannot be sent immediately. 
> {code}
>   /**
>* Register one bp with the corresponding NameNode
>* 
>* The bpDatanode needs to register with the namenode on startup in order
>* 1) to report which storage it is serving now and 
>* 2) to receive a registrationID
>*  
>* issued by the namenode to recognize registered datanodes.
>* 
>* @param nsInfo current NamespaceInfo
>* @see FSNamesystem#registerDatanode(DatanodeRegistration)
>* @throws IOException
>*/
>   void register(NamespaceInfo nsInfo) throws IOException {
> // The handshake() phase loaded the block pool storage
> // off disk - so update the bpRegistration object from that info
> DatanodeRegistration newBpRegistration = bpos.createRegistration();
> LOG.info(this + " beginning handshake with NN");
> while (shouldRun()) {
>   try {
> // Use returned registration from namenode with updated fields
> newBpRegistration = bpNamenode.registerDatanode(newBpRegistration);
> newBpRegistration.setNamespaceInfo(nsInfo);
> bpRegistration = newBpRegistration;
> break;
>   } catch(EOFException e) {  // namenode might have just restarted
> LOG.info("Problem connecting to server: " + nnAddr + " :"
> + e.getLocalizedMessage());
> sleepAndLogInterrupts(1000, "connecting to server");
>   } catch(SocketTimeoutException e) {  // namenode is busy
> LOG.info("Problem connecting to server: " + nnAddr);
> sleepAndLogInterrupts(1000, "connecting to server");
>   }
> }
> 
> LOG.info("Block pool " + this + " successfully registered with NN");
> bpos.registrationSucceeded(this, bpRegistration);
> // random short delay - helps scatter the BR from all DNs
> scheduler.scheduleBlockReport(dnConf.initialBlockReportDelay);
>   }
> {code}
> But 
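
For illustration, here is a self-contained toy loop with the same shape as the 
register() code above, extended so a generic IOException from a busy NameNode 
is also retried; RegisterCall is a hypothetical stand-in for the 
registerDatanode RPC, and this sketches the reported gap rather than the 
actual patch:

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.net.SocketTimeoutException;

public class RetryingRegister {
  interface RegisterCall { void attempt() throws IOException; }

  static void register(RegisterCall call) throws InterruptedException {
    while (true) {
      try {
        call.attempt();
        return;                  // registered successfully
      } catch (EOFException | SocketTimeoutException e) {
        Thread.sleep(1000L);     // NN restarting or busy: retry
      } catch (IOException e) {
        Thread.sleep(1000L);     // other RPC failures: also retry
      }
    }
  }
}
{code}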

[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode

2018-04-18 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442934#comment-16442934
 ] 

He Xiaoqiao commented on HDFS-13473:


Thanks [~daryn] for your comments.
{quote}What about something like the DN's heartbeat contains the current key 
version it has? The NN's handleHeartbeat compares with its current key, calls 
setNeedKeyUpdate if different.{quote}
That is a good suggestion for updating block keys on the DataNode. But it may 
require more code changes, since we would need to update 
{{DatanodeProtocol#sendHeartbeat}} and add a new parameter for the block-key 
version.

> DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
> 
>
> Key: HDFS-13473
> URL: https://issues.apache.org/jira/browse/HDFS-13473
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13473-trunk.001.patch
>
>
> Updating block keys is currently passive behavior for the DataNode: it 
> depends on whether the NameNode returns a KeyUpdateCommand in the heartbeat 
> response. There are several problems with this block-key synchronization 
> mode:
> a. The NameNode cannot tell whether the block keys reached the DataNode 
> successfully.
> b. A DataNode that hits an exception while receiving or processing a 
> heartbeat response that includes the KeyUpdateCommand is also left out, as 
> HDFS-13441 and HDFS-12749 describe.
> So I propose changing from the NameNode pushing block keys to the DataNode 
> pulling them.
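
A tiny sketch of the pull/version-compare idea from the comments (the field 
and method names are illustrative, not the patch's API): the DN reports the 
key version it holds in each heartbeat, and the NN flags an update only when 
the versions differ.

{code:java}
public class BlockKeyVersionCheck {
  private volatile long latestKeyVersion;  // NN-side current block-key version

  // Called from heartbeat handling with the version the DN says it holds.
  boolean needKeyUpdate(long dnReportedVersion) {
    return dnReportedVersion != latestKeyVersion;
  }

  void rollKeys(long newVersion) {
    latestKeyVersion = newVersion;         // DNs notice on their next heartbeat
  }
}
{code}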



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13475) RBF: Admin cannot enforce Router enter SafeMode

2018-04-18 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated HDFS-13475:
---
Summary: RBF: Admin cannot enforce Router enter SafeMode  (was: RBF: Router 
always leaves SafeMode after some time even manually entering SafeMode)

> RBF: Admin cannot enforce Router enter SafeMode
> ---
>
> Key: HDFS-13475
> URL: https://issues.apache.org/jira/browse/HDFS-13475
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Major
>
> To reproduce the issue: 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode enter
> Successfully enter safe mode.
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: true{code}
> And then, 
> {code:java}
> $ bin/hdfs dfsrouteradmin -safemode get
> Safe Mode: false{code}
> From the code, it looks like the periodicInvoke triggers the leave.
> {code:java}
> public void periodicInvoke() {
> ..
>   // Always update to indicate our cache was updated
>   if (isCacheStale) {
> if (!rpcServer.isInSafeMode()) {
>   enter();
> }
>   } else if (rpcServer.isInSafeMode()) {
> // Cache recently updated, leave safe mode
> leave();
>   }
> }
> {code}
>  
>  
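
A self-contained sketch of one possible fix, assuming a hypothetical manual 
flag: the periodic check leaves the state alone while an administrator has 
explicitly entered safe mode.

{code:java}
public class RouterSafeMode {
  private volatile boolean inSafeMode;
  private volatile boolean manual;   // hypothetical: set by the admin command

  void enterManually() { manual = true;  inSafeMode = true;  }
  void leaveManually() { manual = false; inSafeMode = false; }

  // Periodic cache check; it never overrides an explicit admin decision.
  void periodicInvoke(boolean cacheStale) {
    if (manual) {
      return;
    }
    inSafeMode = cacheStale;
  }

  boolean isInSafeMode() { return inSafeMode; }
}
{code}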



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442917#comment-16442917
 ] 

genericqa commented on HDFS-13473:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 52s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 15 new + 546 unchanged - 0 fixed = 561 total (was 546) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}113m  8s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}178m 32s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.datanode.TestBpServiceActorScheduler |
|   | hadoop.hdfs.TestPread |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.TestNameNodeMXBean |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HDFS-13473 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12919631/HDFS-13473-trunk.001.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 5bc69cb6f389 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bf2f493 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| 

[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-18 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Attachment: (was: HDFS-13442-HDFS-7240.002.patch)

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13475) RBF: Router always leaves SafeMode after some time even manually entering SafeMode

2018-04-18 Thread Wei Yan (JIRA)
Wei Yan created HDFS-13475:
--

 Summary: RBF: Router always leaves SafeMode after some time even 
manually entering SafeMode
 Key: HDFS-13475
 URL: https://issues.apache.org/jira/browse/HDFS-13475
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Wei Yan
Assignee: Wei Yan


To reproduce the issue: 
{code:java}
$ bin/hdfs dfsrouteradmin -safemode enter
Successfully enter safe mode.
$ bin/hdfs dfsrouteradmin -safemode get
Safe Mode: true{code}
And then, 
{code:java}
$ bin/hdfs dfsrouteradmin -safemode get
Safe Mode: false{code}
From the code, it looks like the periodicInvoke triggers the leave.
{code:java}
public void periodicInvoke() {
..
  // Always update to indicate our cache was updated
  if (isCacheStale) {
if (!rpcServer.isInSafeMode()) {
  enter();
}
  } else if (rpcServer.isInSafeMode()) {
// Cache recently updated, leave safe mode
leave();
  }
}
{code}
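
For illustration, a self-contained toy model of one possible direction: remember that safe mode was entered manually and let the periodic check skip the automatic leave in that case. The class and the {{enteredManually}} flag are invented for this sketch, not existing Router code:
{code:java}
/**
 * Toy model of a possible fix: track that the admin entered safe mode
 * manually, and have periodicInvoke() skip the automatic leave() then.
 * All names here are illustrative, not the actual Router implementation.
 */
public class RouterSafemodeDemo {
  private boolean inSafeMode = false;
  private boolean enteredManually = false;  // hypothetical flag
  private boolean cacheStale = false;

  void enterManually() {          // what "-safemode enter" would set
    inSafeMode = true;
    enteredManually = true;
  }

  void periodicInvoke() {
    if (cacheStale) {
      if (!inSafeMode) {
        inSafeMode = true;        // stale cache: enter safe mode
      }
    } else if (inSafeMode && !enteredManually) {
      inSafeMode = false;         // only auto-leave if not admin-enforced
    }
  }

  public static void main(String[] args) {
    RouterSafemodeDemo router = new RouterSafemodeDemo();
    router.enterManually();
    router.periodicInvoke();      // cache is fresh, but the manual flag wins
    System.out.println("Safe Mode: " + router.inSafeMode);  // true
  }
}
{code}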
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13442) Ozone: Handle Datanode Registration failure

2018-04-18 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13442:
--
Attachment: HDFS-13442-HDFS-7240.002.patch

> Ozone: Handle Datanode Registration failure
> ---
>
> Key: HDFS-13442
> URL: https://issues.apache.org/jira/browse/HDFS-13442
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13442-HDFS-7240.001.patch, 
> HDFS-13442-HDFS-7240.002.patch
>
>
> If a datanode is not able to register itself, we need to handle that 
> correctly. 
> If the number of unsuccessful attempts to register with the SCM exceeds a 
> configurable max number, the datanode should not make any more attempts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13456) Ozone: Update ozone to latest ratis snapshot build (0.1.1-alpha-4309324-SNAPSHOT)

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442849#comment-16442849
 ] 

genericqa commented on HDFS-13456:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
23s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m  
7s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
10s{color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
22s{color} | {color:red} container-service in HDFS-7240 failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
1s{color} | {color:red} hadoop-hdds/common in HDFS-7240 has 1 extant Findbugs 
warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} container-service in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} container-service in HDFS-7240 failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
10s{color} | {color:red} container-service in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
10s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
22s{color} | {color:red} container-service in the patch failed. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
3s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
24s{color} | {color:red} container-service in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} container-service in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
23s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 23s{color} 
| {color:red} container-service in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch 

[jira] [Commented] (HDFS-13431) Ozone: Ozone Shell should use RestClient and RpcClient

2018-04-18 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442844#comment-16442844
 ] 

genericqa commented on HDFS-13431:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
44s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
 4s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
36s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
17s{color} | {color:green} HDFS-7240 passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
27s{color} | {color:red} client in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
27s{color} | {color:red} integration-test in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
23s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
25s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
57s{color} | {color:red} hadoop-hdds/common in HDFS-7240 has 1 extant Findbugs 
warnings. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
21s{color} | {color:red} client in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} client in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} integration-test in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
22s{color} | {color:red} ozone-manager in HDFS-7240 failed. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
23s{color} | {color:red} hadoop-ozone in HDFS-7240 failed. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
11s{color} | {color:red} client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
10s{color} | {color:red} integration-test in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
10s{color} | {color:red} ozone-manager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
11s{color} | {color:red} hadoop-ozone in the patch failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
23s{color} | {color:red} client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
23s{color} | {color:red} integration-test in the patch failed. {color} |
| 

[jira] [Commented] (HDFS-13441) DataNode missed BlockKey update from NameNode due to HeartbeatResponse was dropped

2018-04-18 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442824#comment-16442824
 ] 

Daryn Sharp commented on HDFS-13441:


This is a bad approach for a couple of reasons.  Checking whether the exception 
message contains "Can't re-compute" is very fragile.  Exception messages should 
be considered opaque.

Also consider that an invalid token hash caused by a missed key update is rare. 
 The more common case is something like the balancer using an expired secret.  
Or consider a faulty or malicious client using an expired token.  This approach 
may easily cause DNs to go into re-registration loops and ruin a cluster.

Please see discussion on HDFS-13473 for a cleaner way to handle this problem.



> DataNode missed BlockKey update from NameNode due to HeartbeatResponse was 
> dropped
> --
>
> Key: HDFS-13441
> URL: https://issues.apache.org/jira/browse/HDFS-13441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.7.1
>Reporter: yunjiong zhao
>Assignee: yunjiong zhao
>Priority: Major
> Attachments: HDFS-13441.002.patch, HDFS-13441.patch
>
>
> After NameNode failover, lots of application failed due to some DataNodes 
> can't re-compute password from block token.
> {code:java}
> 2018-04-11 20:10:52,448 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> hdc3-lvs01-400-1701-048.stratus.lvs.ebay.com:50010:DataXceiver error 
> processing unknown operation  src: /10.142.74.116:57404 dst: 
> /10.142.77.45:50010
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=1523538652448, 
> keyId=1762737944, userId=hadoop, 
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, 
> access modes=[WRITE]), since the required block key (keyID=1762737944) 
> doesn't exist.]
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:194)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Can't 
> re-compute password for block_token_identifier (expiryDate=1523538652448, 
> keyId=1762737944, userId=hadoop, 
> blockPoolId=BP-36315570-10.103.108.13-1423055488042, blockId=12142862700, 
> access modes=[WRITE]), since the required block key (keyID=1762737944) 
> doesn't exist.
>         at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:382)
>         at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:79)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
>         at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
>         at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>         ... 7 more
> {code}
>  
> In the DataNode log, we didn't see DataNode update block keys around 
> 2018-04-11 09:55:00 and around 2018-04-11 19:55:00.
> {code:java}
> 2018-04-10 14:51:36,424 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-10 23:55:38,420 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 00:51:34,792 INFO 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager: Setting 
> block keys
> 2018-04-11 10:51:39,403 INFO 
> 

[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode

2018-04-18 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442811#comment-16442811
 ] 

Daryn Sharp commented on HDFS-13473:


I'm a bit concerned about doing a blocking RPC with retries.

What about something like this: the DN's heartbeat contains the current key 
version it has, and the NN's handleHeartbeat compares that with its current 
key and calls setNeedKeyUpdate if they differ.  Then we don't need additional 
RPCs or configs, and it minimizes the code changes.
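
A self-contained toy model of that flow, for illustration only; the class and field names are invented, not the actual Hadoop heartbeat-handling code:
{code:java}
/**
 * Toy model of the suggestion above: the DN reports its current block-key
 * version in each heartbeat, and the NN flags a key update only when that
 * version differs from its own. Names are illustrative, not Hadoop's.
 */
public class KeyVersionHeartbeatDemo {

  static final class NameNodeSide {
    long currentKeyVersion = 2;      // NN's latest block-key version
    boolean needKeyUpdate = false;

    void handleHeartbeat(long dnKeyVersion) {
      // Equivalent of calling setNeedKeyUpdate when versions differ.
      needKeyUpdate = (dnKeyVersion != currentKeyVersion);
    }
  }

  public static void main(String[] args) {
    NameNodeSide nn = new NameNodeSide();
    nn.handleHeartbeat(1);   // DN still has a stale key
    System.out.println(nn.needKeyUpdate);  // true -> send KeyUpdateCommand
    nn.handleHeartbeat(2);   // DN caught up
    System.out.println(nn.needKeyUpdate);  // false -> nothing to do
  }
}
{code}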

> DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
> 
>
> Key: HDFS-13473
> URL: https://issues.apache.org/jira/browse/HDFS-13473
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13473-trunk.001.patch
>
>
> Currently the DataNode updates its Block Keys passively: it depends on 
> whether the NameNode returns a #KeyUpdateCommand in the heartbeat response.
> There are several problems with this Block Keys synchronization mode:
> a. The NameNode cannot tell whether the Block Keys actually reached the 
> DataNode,
> b. The DataNode is likewise unaware when it hits an exception while receiving 
> or processing a heartbeat response that includes a BlockKeyCommand,
> as HDFS-13441 and HDFS-12749 mention.
> So I propose changing from the NameNode pushing Block Keys to the DataNode 
> pulling Block Keys.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI

2018-04-18 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442794#comment-16442794
 ] 

Wei Yan edited comment on HDFS-13470 at 4/18/18 4:29 PM:
-

{quote}Generating the header from javascript
{quote}
I guess this may work here. We can let explorer.js generate the NN/Router 
contents, including the header, links, etc. Not sure if there is a better idea here.

One minor issue in [^HDFS-13470.000.patch]: the tab links need to be updated:
{code:java}
Overview
Subclusters
Routers
Datanodes
Mount table{code}


was (Author: ywskycn):
{quote}Generating the header from javascript
{quote}
I guess this may work here.. We can let explorer.js generate NN/Router 
contents, including header, links, etc. Not sure any better idea here.

One minor in [^HDFS-13470.000.patch], the tab links need to be updated:
{code:java}
Overview
Subclusters
Routers
Datanodes
Mount table{code}
 

> RBF: Add Browse the Filesystem button to the UI
> ---
>
> Key: HDFS-13470
> URL: https://issues.apache.org/jira/browse/HDFS-13470
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13470.000.patch
>
>
> After HDFS-12512 added WebHDFS, we can add the support to browse the 
> filesystem to the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI

2018-04-18 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442794#comment-16442794
 ] 

Wei Yan commented on HDFS-13470:


{quote}Generating the header from javascript
{quote}
I guess this may work here. We can let explorer.js generate the NN/Router 
contents, including the header, links, etc. Not sure if there is a better idea here.

One minor issue in [^HDFS-13470.000.patch]: the tab links need to be updated:
{code:java}
Overview
Subclusters
Routers
Datanodes
Mount table{code}
 

> RBF: Add Browse the Filesystem button to the UI
> ---
>
> Key: HDFS-13470
> URL: https://issues.apache.org/jira/browse/HDFS-13470
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13470.000.patch
>
>
> After HDFS-12512 added WebHDFS, we can add the support to browse the 
> filesystem to the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13474) Unable to start Hadoop DataNodes

2018-04-18 Thread robbie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

robbie updated HDFS-13474:
--
Description: 
I am trying to follow the instructions in the Getting Started guide,

[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]

I have confirmed that I can `ssh localhost` without a password prompt. I have 
also run the following steps,
{quote}1. $ bin/hdfs namenode -format
 2. $ sbin/start-dfs.sh
{quote}
But I can't run step 3 to browse the location at [http://localhost:9870/]. 
When I run `jps` from the terminal prompt, all I get back is,
{quote}14900 Jps
{quote}
I was expecting a list of my nodes.

In the Logs I see two error messages towards the end,
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
{quote}
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
{quote}
{quote}2018-04-18 14:15:42,517 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1
 ************************************************************/
{quote}
I will attach the full logs with this bug report.

Can anyone help, even with ways to debug this, please?
  
 Java Version,
{quote}rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java --version 
 java 9.0.4 
 Java(TM) SE Runtime Environment (build 9.0.4+11) 
 Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
{quote}
Ubuntu version,
{quote}$ lsb_release -a
 No LSB modules are available.
 Distributor ID: neon
 Description: KDE neon User Edition 5.12
 Release: 16.04
 Codename: xenial
{quote}
I have tried running the commands, `bin/hdfs version`
{quote}Hadoop 3.1.0 
 Source code repository [https://github.com/apache/hadoop] -r 
16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d 
 Compiled by centos on 2018-03-30T00:00Z 
 Compiled with protoc 2.5.0 
 From source with checksum 14182d20c972b3e2105580a1ad6990 
 This command was run using 
/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar
{quote}
 when I try `bin/hdfs groups` it doesn't return but gives me,
{quote}2018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
{quote}
when I try, `$ bin/hdfs lsSnapshottableDir`
{quote}lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed 
on connection exception: java.net.ConnectException: Connection refused; For 
more details see: http://wiki.apache.org/hadoop/ConnectionRefused
{quote}
 
 when I try, `$ bin/hdfs classpath`
{quote}/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/*
{quote}
core-site.xml
{quote}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
{quote}

hdfs-site.xml
{quote}
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
{quote}

mapred-site.xml
{quote}
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
{quote}

  was:
I am trying to follow the instrutions in the GettingStarted guide,

[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]

I have confirmed, that I can `ssh localhost` without a password prompt. I have 
also run the following steps,
{quote}1. $ bin/hdfs namenode -format
 2. $ sbin/start-dfs.sh
{quote}
But I cant run step 3. to browse the location at `[http://localhost:9870/]`. 
When I run `>jsp` from the terminal prompt I just get returned,
{quote}14900 Jps
{quote}
I was expecting a list of my nodes.

In the Logs I see two error messages towards the end,
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
{quote}
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
{quote}
{quote}2018-04-18 14:15:42,517 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1
 

[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-04-18 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442755#comment-16442755
 ] 

Daryn Sharp commented on HDFS-13448:


If we are going to add this feature, it shouldn't have fuzzy semantics.  The 
{{NO_LOCAL_WRITE}} feature is a different, although valid, case for comparison.

The {{NO_LOCAL_WRITE}} requires the policy to know the node to provide rack 
locality, as opposed to this feature where the node is or should be irrelevant.

Excluding the local rack is broken for a small number of racks.  Take the 
extreme case of 2 racks.  Excluding the local rack will cause placement to 
fail.  The uneven placement this jira seeks to fix will break down if the flume 
agents are concentrated on a few racks in a cluster with a small number of 
racks.

Simply not providing the node will work with all existing placement policies, 
and achieve even/random distribution.
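
For illustration, a small self-contained simulation of that last point; the node count and write count are made up, and the code only mimics the placement choice, not the actual placement-policy classes:
{code:java}
import java.util.Random;

/**
 * Toy simulation: with a local-first policy every first replica lands on the
 * writer's node, while omitting the writer yields an even random spread.
 */
public class FirstReplicaDemo {
  public static void main(String[] args) {
    int nodes = 10, writes = 10_000, writerNode = 0;
    int[] localFirst = new int[nodes];
    int[] randomFirst = new int[nodes];
    Random rnd = new Random(42);

    for (int i = 0; i < writes; i++) {
      localFirst[writerNode]++;          // local-first: always the writer
      randomFirst[rnd.nextInt(nodes)]++; // writer not provided: random node
    }
    System.out.println("local-first, writer node:  " + localFirst[writerNode]);
    System.out.println("random-first, writer node: " + randomFirst[writerNode]);
  }
}
{code}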


> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.1.patch, HDFS-13448.2.patch, 
> HDFS-13448.3.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when the {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is where you have, for example, a flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode 
> local to the Flume agent will always get the first block replica and this 
> leads to un-even block placements, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far as now the first block replica will always 
> be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13470) RBF: Add Browse the Filesystem button to the UI

2018-04-18 Thread Íñigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442745#comment-16442745
 ] 

Íñigo Goiri commented on HDFS-13470:


My main concern with this is that the code for explorer.html and explorer.js is 
pretty much the same as the Namenode's.
The problem is the header, as they have different content there.
I would add a shared header page, but I'm not sure how to achieve this:
* Using iframes (not a big fan)
* Generating the header from javascript

Any suggestions?

> RBF: Add Browse the Filesystem button to the UI
> ---
>
> Key: HDFS-13470
> URL: https://issues.apache.org/jira/browse/HDFS-13470
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: HDFS-13470.000.patch
>
>
> After HDFS-12512 added WebHDFS, we can add the support to browse the 
> filesystem to the UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13472) Compilation error in trunk in hadoop-aws

2018-04-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442734#comment-16442734
 ] 

Jason Lowe commented on HDFS-13472:
---

I am unable to reproduce the compilation error in trunk.  Given StagingTestBase 
has not been modified since November, it looks like many others have been 
unable to reproduce the error as well for some time.  How are you building 
Hadoop to reproduce this error (i.e.: what does the command-line look like)?

bq.  getArgumentAt(int, Class) method is available only from 
version 2.0.0-beta

getArgumentAt is available in 1.10.19.  
https://static.javadoc.io/org.mockito/mockito-core/1.10.19/org/mockito/invocation/InvocationOnMock.html

The reason this works for me is that mockito-core 1.10.19 is pulled in by the 
DynamoDBLocal dependency, and it appears in the classpath before the 
mockito-all 1.8.5 dependency (as reported by mvn dependency:build-classpath).

I agree that the version of mockito-all being requested by Hadoop is wrong.  
It's trying to call a method that isn't available in 1.8.5.  I think we should 
upgrade the mockito dependency to at least 1.10.19.
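
For reference, a minimal self-contained example that compiles against mockito-core 1.10.19 but not against mockito-all 1.8.5; the demo class itself is invented, only the Mockito calls are real:
{code:java}
import static org.mockito.Mockito.*;

import java.util.List;
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

public class GetArgumentAtDemo {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) {
    List<String> mock = mock(List.class);
    // InvocationOnMock#getArgumentAt(int, Class) exists in 1.10.19 but not
    // in 1.8.5, which is why the build passes only when the newer jar wins
    // on the classpath.
    when(mock.get(anyInt())).thenAnswer(new Answer<String>() {
      @Override
      public String answer(InvocationOnMock invocation) {
        Integer index = invocation.getArgumentAt(0, Integer.class);
        return "element-" + index;
      }
    });
    System.out.println(mock.get(7));  // prints element-7
  }
}
{code}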


> Compilation error in trunk in hadoop-aws 
> -
>
> Key: HDFS-13472
> URL: https://issues.apache.org/jira/browse/HDFS-13472
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
>
> *Problem:* hadoop trunk compilation is failing
>  *Root Cause:*
>  compilation error is coming from 
> {{org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase}}. Compilation 
> error is "The method getArgumentAt(int, Class) is 
> undefined for the type InvocationOnMock".
> StagingTestBase uses the getArgumentAt(int, Class) method, 
> which is not available in mockito-all version 1.8.5; the getArgumentAt(int, 
> Class) method is available only from version 2.0.0-beta.
> *Expectations:*
>  Either the mockito-all version should be upgraded, or the test case should 
> be rewritten using only the functions available in 1.8.5.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode

2018-04-18 Thread He Xiaoqiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Xiaoqiao reassigned HDFS-13473:
--

Assignee: He Xiaoqiao

> DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
> 
>
> Key: HDFS-13473
> URL: https://issues.apache.org/jira/browse/HDFS-13473
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13473-trunk.001.patch
>
>
> Currently the DataNode updates its Block Keys passively: it depends on 
> whether the NameNode returns a #KeyUpdateCommand in the heartbeat response.
> There are several problems with this Block Keys synchronization mode:
> a. The NameNode cannot tell whether the Block Keys actually reached the 
> DataNode,
> b. The DataNode is likewise unaware when it hits an exception while receiving 
> or processing a heartbeat response that includes a BlockKeyCommand,
> as HDFS-13441 and HDFS-12749 mention.
> So I propose changing from the NameNode pushing Block Keys to the DataNode 
> pulling Block Keys.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13473) DataNode update BlockKeys using mode PULL rather than PUSH from NameNode

2018-04-18 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442732#comment-16442732
 ] 

He Xiaoqiao commented on HDFS-13473:


Submitted an initial patch that uses the {{NamenodeProtocol#getBlockKeys}} 
interface to have the DataNode update its Block Keys periodically.
Also added configuration items so the feature can be switched on or off.
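
As a rough illustration of the DN-side loop (the scheduling code, interval, and stub below are invented; only {{NamenodeProtocol#getBlockKeys}} is the real interface the patch relies on):
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Toy model of the pull mode: the DataNode polls the NameNode for block keys
 * on a fixed interval instead of waiting for a pushed KeyUpdateCommand.
 * fetchBlockKeys() stands in for an RPC like NamenodeProtocol#getBlockKeys.
 */
public class BlockKeyPullDemo {
  public static void main(String[] args) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    long intervalMs = 600_000L;  // would come from a new configuration item

    scheduler.scheduleAtFixedRate(new Runnable() {
      @Override
      public void run() {
        // On each tick, pull the keys and refresh the local secret manager.
        System.out.println("pulled: " + fetchBlockKeys());
      }
    }, 0L, intervalMs, TimeUnit.MILLISECONDS);
  }

  static String fetchBlockKeys() {
    return "ExportedBlockKeys(stub)";  // placeholder for the real RPC result
  }
}
{code}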

> DataNode update BlockKeys using mode PULL rather than PUSH from NameNode
> 
>
> Key: HDFS-13473
> URL: https://issues.apache.org/jira/browse/HDFS-13473
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: He Xiaoqiao
>Priority: Major
> Attachments: HDFS-13473-trunk.001.patch
>
>
> Currently the DataNode updates its Block Keys passively: it depends on 
> whether the NameNode returns a #KeyUpdateCommand in the heartbeat response.
> There are several problems with this Block Keys synchronization mode:
> a. The NameNode cannot tell whether the Block Keys actually reached the 
> DataNode,
> b. The DataNode is likewise unaware when it hits an exception while receiving 
> or processing a heartbeat response that includes a BlockKeyCommand,
> as HDFS-13441 and HDFS-12749 mention.
> So I propose changing from the NameNode pushing Block Keys to the DataNode 
> pulling Block Keys.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13474) Unable to start Hadoop DataNodes

2018-04-18 Thread robert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

robert updated HDFS-13474:
--
Description: 
I am trying to follow the instructions in the Getting Started guide,

[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]

I have confirmed that I can `ssh localhost` without a password prompt. I have 
also run the following steps,
{quote}1. $ bin/hdfs namenode -format
 2. $ sbin/start-dfs.sh
{quote}
But I can't run step 3 to browse the location at [http://localhost:9870/]. 
When I run `jps` from the terminal prompt, all I get back is,
{quote}14900 Jps
{quote}
I was expecting a list of my nodes.

In the Logs I see two error messages towards the end,
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
{quote}
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
{quote}
{quote}2018-04-18 14:15:42,517 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1
 ************************************************************/
{quote}
I will attach the full logs with this bug report.

Can anyone help, even with ways to debug this, please?
  
 Java Version,
{quote}rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java --version 
 java 9.0.4 
 Java(TM) SE Runtime Environment (build 9.0.4+11) 
 Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
{quote}
Ubuntu version,
{quote}$ lsb_release -a
 No LSB modules are available.
 Distributor ID: neon
 Description: KDE neon User Edition 5.12
 Release: 16.04
 Codename: xenial
{quote}
I have tried running the commands, `bin/hdfs version`
{quote}Hadoop 3.1.0 
 Source code repository [https://github.com/apache/hadoop] -r 
16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d 
 Compiled by centos on 2018-03-30T00:00Z 
 Compiled with protoc 2.5.0 
 From source with checksum 14182d20c972b3e2105580a1ad6990 
 This command was run using 
/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar
{quote}
 when I try `bin/hdfs groups` it doesn't return but gives me,
{quote}2018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
{quote}
when I try, `$ bin/hdfs lsSnapshottableDir`
{quote}lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed 
on connection exception: java.net.ConnectException: Connection refused; For 
more details see: http://wiki.apache.org/hadoop/ConnectionRefused
{quote}
 
 when I try, `$ bin/hdfs classpath`
{quote}/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/*
{quote}

core-site.xml
{quote}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
{quote}

hdfs-site.xml
{quote}
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
{quote}

mapred-site.xml
{quote}
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
{quote}


  was:
I am trying to follow the instrutions in the GettingStarted guide,

[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]

I have confirmed, that I can `ssh localhost` without a password prompt. I have 
also run the following steps,
{quote}1. $ bin/hdfs namenode -format
 2. $ sbin/start-dfs.sh
{quote}
But I cant run step 3. to browse the location at `[http://localhost:9870/]`. 
When I run `>jsp` from the terminal prompt I just get returned,
{quote}14900 Jps
{quote}
I was expecting a list of my nodes.

In the Logs I see two error messages towards the end,
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
{quote}
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
{quote}
{quote}2018-04-18 14:15:42,517 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 

[jira] [Updated] (HDFS-13474) Unable to start Hadoop DataNodes

2018-04-18 Thread robert (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

robert updated HDFS-13474:
--
Description: 
I am trying to follow the instructions in the Getting Started guide,

[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]

I have confirmed that I can `ssh localhost` without a password prompt. I have 
also run the following steps,
{quote}1. $ bin/hdfs namenode -format
 2. $ sbin/start-dfs.sh
{quote}
But I can't run step 3 to browse the location at [http://localhost:9870/]. 
When I run `jps` from the terminal prompt, all I get back is,
{quote}14900 Jps
{quote}
I was expecting a list of my nodes.

In the Logs I see two error messages towards the end,
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
{quote}
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
{quote}
{quote}2018-04-18 14:15:42,517 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1
 ************************************************************/
{quote}
I will attach the full logs with this bug report.

Can anyone help, even with ways to debug this, please?
 
Java Version,
{quote}
rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java --version 
java 9.0.4 
Java(TM) SE Runtime Environment (build 9.0.4+11) 
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)
{quote}

Ubuntu version,
{quote}
$ lsb_release -a
No LSB modules are available.
Distributor ID: neon
Description: KDE neon User Edition 5.12
Release: 16.04
Codename: xenial
{quote}

I have tried running the commands, `bin/hdfs version`
{quote}Hadoop 3.1.0 
 Source code repository [https://github.com/apache/hadoop] -r 
16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d 
 Compiled by centos on 2018-03-30T00:00Z 
 Compiled with protoc 2.5.0 
 From source with checksum 14182d20c972b3e2105580a1ad6990 
 This command was run using 
/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/hadoop-common-3.1.0.jar
{quote}
 when I try `bin/hdfs groups` it doesn't return but gives me,
{quote}2018-04-18 15:33:34,590 INFO ipc.Client: Retrying connect to server: 
localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is 
RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
{quote}
when I try, `$ bin/hdfs lsSnapshottableDir`
{quote}lsSnapshottableDir: Call From c0315/127.0.1.1 to localhost:9000 failed 
on connection exception: java.net.ConnectException: Connection refused; For 
more details see: http://wiki.apache.org/hadoop/ConnectionRefused
{quote}
 
 when I try, `$ bin/hdfs classpath`
{quote}/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/etc/hadoop:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/common/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/hdfs/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/mapreduce/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/lib/*:/home/steelydan.com/roycecollige/Apps/hadoop-3.1.0/share/hadoop/yarn/*
{quote}
 

  was:
I am trying to follow the instrutions in the GettingStarted guide,

[http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_Single_Node]

I have confirmed, that I can `ssh localhost` without a password prompt. I have 
also run the following steps,
{quote}1. $ bin/hdfs namenode -format
 2. $ sbin/start-dfs.sh
{quote}
But I cant run step 3. to browse the location at `[http://localhost:9870/]`. 
When I run `>jsp` from the terminal prompt I just get returned,
{quote}14900 Jps
{quote}
I was expecting a list of my nodes.

In the Logs I see two error messages towards the end,
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
{quote}
{quote}2018-04-18 14:15:42,516 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 1: SIGHUP
{quote}
{quote}2018-04-18 14:15:42,517 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
 /************************************************************
 SHUTDOWN_MSG: Shutting down DataNode at c0315/127.0.1.1
 ************************************************************/
{quote}
I will attach the full logs with this bug report.

Can anyone help even with ways to debug this please ?

 

Java Version,
rcoll...@steelydan.com@c0315:~/temp/logs/hadoop$ java 

[jira] [Commented] (HDFS-13464) Fix javadoc in FsVolumeList#handleVolumeFailures

2018-04-18 Thread Bharat Viswanadham (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442694#comment-16442694
 ] 

Bharat Viswanadham commented on HDFS-13464:
---

Thank You [~shashikant] for reporting and working on this and [~ajisakaa]  for 
review. I have committed this to trunk and branch-3.1.

> Fix javadoc in FsVolumeList#handleVolumeFailures
> 
>
> Key: HDFS-13464
> URL: https://issues.apache.org/jira/browse/HDFS-13464
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Minor
> Fix For: 3.2.0, 3.1.1
>
> Attachments: HDFS-13464.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


