[jira] [Updated] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.

2019-10-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14910:
---
Status: Patch Available  (was: In Progress)

> Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
> 
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.

2019-10-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-14910 started by Wei-Chiu Chuang.
--
> Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
> 
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14910) Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.

2019-10-23 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958298#comment-16958298
 ] 

Wei-Chiu Chuang commented on HDFS-14910:


Hey, sorry, for some reason my message didn't get out.
I think I have a patch that addresses the issue. Will post shortly.

> Rename Snapshot with Pre Descendants Fail With IllegalArgumentException.
> 
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Blocker
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13901) INode access time is ignored because of race between open and rename

2019-10-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13901:
---
Fix Version/s: 2.10.1
   2.9.3
   2.8.6

> INode access time is ignored because of race between open and rename
> 
>
> Key: HDFS-13901
> URL: https://issues.apache.org/jira/browse/HDFS-13901
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0, 2.8.6, 2.9.3, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-13901-branch-3.2.001.patch, HDFS-13901.000.patch, 
> HDFS-13901.001.patch, HDFS-13901.002.patch, HDFS-13901.003.patch, 
> HDFS-13901.004.patch, HDFS-13901.005.patch, HDFS-13901.006.patch, 
> HDFS-13901.007.patch, HDFS-13901.branch-2.001.patch
>
>
> That's because in getBlockLocations there is a gap between readUnlock and 
> re-acquiring the write lock (to update the access time). If a rename occurs in 
> the gap, the access time update will be ignored. We can calculate the new path 
> from the inode and use the new path to update the access time.
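A minimal sketch of that fix pattern, assuming hypothetical names (Inode and
currentPath() are stand-ins, not the actual FSNamesystem code):

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AccessTimeUpdateSketch {
  // Stand-in for the HDFS INode; currentPath() recomputes the path from
  // the inode's parent chain, so it stays correct across renames.
  interface Inode {
    String currentPath();
    void setAccessTime(long atime);
  }

  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

  void getBlockLocations(Inode inode) {
    fsLock.readLock().lock();
    try {
      // ... resolve the file and read its block locations ...
    } finally {
      fsLock.readLock().unlock();
    }
    // Gap: a concurrent rename may move the file right here.
    fsLock.writeLock().lock();
    try {
      // Re-resolve the path from the inode itself; a path string captured
      // before the gap may be stale, and updating by that stale path is
      // exactly how the access-time update gets lost.
      String freshPath = inode.currentPath();
      inode.setAccessTime(System.currentTimeMillis());
      System.out.println("updated atime of " + freshPath);
    } finally {
      fsLock.writeLock().unlock();
    }
  }
}
{code}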



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13901) INode access time is ignored because of race between open and rename

2019-10-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13901:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed the patch all the way from trunk to branch-2.8.

Thanks [~LiJinglun]!

> INode access time is ignored because of race between open and rename
> 
>
> Key: HDFS-13901
> URL: https://issues.apache.org/jira/browse/HDFS-13901
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0, 2.8.6, 2.9.3, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-13901-branch-3.2.001.patch, HDFS-13901.000.patch, 
> HDFS-13901.001.patch, HDFS-13901.002.patch, HDFS-13901.003.patch, 
> HDFS-13901.004.patch, HDFS-13901.005.patch, HDFS-13901.006.patch, 
> HDFS-13901.007.patch, HDFS-13901.branch-2.001.patch
>
>
> That's because in getBlockLocations there is a gap between readUnlock and 
> re-acquiring the write lock (to update the access time). If a rename occurs in 
> the gap, the access time update will be ignored. We can calculate the new path 
> from the inode and use the new path to update the access time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-10-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14884:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to trunk branch-3.2 and branch-3.1. 
Thanks [~msingh]!

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14884.001.patch, HDFS-14884.002.patch, 
> HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This Jira will add a precondition before 
> setting this.
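A hedged sketch of what such a precondition could look like (illustrative
names, not the actual FSDirXAttrOp code):

{code:java}
import java.util.Objects;

final class EncryptionKeySanityCheck {
  // Reject the xattr when the file's encryption info names a different
  // key than the file's encryption zone.
  static void checkFeInfoKeyMatchesZone(String zoneKeyName, String feInfoKeyName) {
    if (!Objects.equals(zoneKeyName, feInfoKeyName)) {
      throw new IllegalArgumentException(
          "FileEncryptionInfo key " + feInfoKeyName
              + " does not match the encryption zone key " + zoneKeyName);
    }
  }

  private EncryptionKeySanityCheck() { }
}
{code}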



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-10-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14884:
---
Fix Version/s: 3.2.2
   3.1.4
   3.3.0

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14884.001.patch, HDFS-14884.002.patch, 
> HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This Jira will add a precondition before 
> setting this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-10-22 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957330#comment-16957330
 ] 

Wei-Chiu Chuang commented on HDFS-14884:


+1 committing 003 patch.

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDFS-14884.001.patch, HDFS-14884.002.patch, 
> HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This Jira will add a precondition before 
> setting this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13901) INode access time is ignored because of race between open and rename

2019-10-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13901:
---
Fix Version/s: 3.2.2
   3.1.4

> INode access time is ignored because of race between open and rename
> 
>
> Key: HDFS-13901
> URL: https://issues.apache.org/jira/browse/HDFS-13901
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-13901-branch-3.2.001.patch, HDFS-13901.000.patch, 
> HDFS-13901.001.patch, HDFS-13901.002.patch, HDFS-13901.003.patch, 
> HDFS-13901.004.patch, HDFS-13901.005.patch, HDFS-13901.006.patch, 
> HDFS-13901.007.patch, HDFS-13901.branch-2.001.patch
>
>
> That's because in getBlockLocations there is a gap between readUnlock and 
> re-acquiring the write lock (to update the access time). If a rename occurs in 
> the gap, the access time update will be ignored. We can calculate the new path 
> from the inode and use the new path to update the access time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13901) INode access time is ignored because of race between open and rename

2019-10-22 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13901:
---
Attachment: HDFS-13901.branch-2.001.patch

> INode access time is ignored because of race between open and rename
> 
>
> Key: HDFS-13901
> URL: https://issues.apache.org/jira/browse/HDFS-13901
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13901-branch-3.2.001.patch, HDFS-13901.000.patch, 
> HDFS-13901.001.patch, HDFS-13901.002.patch, HDFS-13901.003.patch, 
> HDFS-13901.004.patch, HDFS-13901.005.patch, HDFS-13901.006.patch, 
> HDFS-13901.007.patch, HDFS-13901.branch-2.001.patch
>
>
> That's because in getBlockLocations there is a gap between readUnlock and 
> re-acquiring the write lock (to update the access time). If a rename occurs in 
> the gap, the access time update will be ignored. We can calculate the new path 
> from the inode and use the new path to update the access time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13901) INode access time is ignored because of race between open and rename

2019-10-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956560#comment-16956560
 ] 

Wei-Chiu Chuang commented on HDFS-13901:


Pushed to trunk. The patch has conflicts with branch-3.2, so a new patch is 
needed. [~LiJinglun], would you be interested in offering a branch-3.2 patch? 
Thanks

> INode access time is ignored because of race between open and rename
> 
>
> Key: HDFS-13901
> URL: https://issues.apache.org/jira/browse/HDFS-13901
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13901.000.patch, HDFS-13901.001.patch, 
> HDFS-13901.002.patch, HDFS-13901.003.patch, HDFS-13901.004.patch, 
> HDFS-13901.005.patch, HDFS-13901.006.patch, HDFS-13901.007.patch
>
>
> That's because in getBlockLocations there is a gap between readUnlock and 
> re-acquiring the write lock (to update the access time). If a rename occurs in 
> the gap, the access time update will be ignored. We can calculate the new path 
> from the inode and use the new path to update the access time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13901) INode access time is ignored because of race between open and rename

2019-10-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13901:
---
Fix Version/s: 3.3.0

> INode access time is ignored because of race between open and rename
> 
>
> Key: HDFS-13901
> URL: https://issues.apache.org/jira/browse/HDFS-13901
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-13901.000.patch, HDFS-13901.001.patch, 
> HDFS-13901.002.patch, HDFS-13901.003.patch, HDFS-13901.004.patch, 
> HDFS-13901.005.patch, HDFS-13901.006.patch, HDFS-13901.007.patch
>
>
> That's because in getBlockLocations there is a gap between readUnlock and 
> re-acquiring the write lock (to update the access time). If a rename occurs in 
> the gap, the access time update will be ignored. We can calculate the new path 
> from the inode and use the new path to update the access time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

2019-10-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956552#comment-16956552
 ] 

Wei-Chiu Chuang commented on HDFS-14854:


Thanks [~sodonnell]
I finally gave the patch a hard look. The patch looks reasonable to me, but a 
patch this big will inevitably introduce corner-case or race-condition bugs. 
The next step would be to set up a small cluster, generate a few million files, 
and shake out the easy bugs.

There's a minor conflict because of HDFS-14847.

There's a typo:
{code}
LOG.error("P{ is set to an invalid value, it must be greater than "+
  "zero. Defaulting to {}",
{code}
I think you wanted instead
{code}
LOG.error("{} is set to an invalid value, it must be greater than "+
  "zero. Defaulting to {}",
{code}

> Create improved decommission monitor implementation
> ---
>
> Key: HDFS-14854
> URL: https://issues.apache.org/jira/browse/HDFS-14854
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch, HDFS-14854.010.patch, 
> HDFS-14854.011.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster.
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before 
> they are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources.
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled, giving the option of either the existing implementation or the 
> new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12049) Recommissioning live nodes stalls the NN

2019-10-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-12049.

Resolution: Duplicate

> Recommissioning live nodes stalls the NN
> 
>
> Key: HDFS-12049
> URL: https://issues.apache.org/jira/browse/HDFS-12049
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> A node refresh will recommission included nodes that are alive and in 
> decommissioning or decommissioned state.  The recommission will scan all 
> blocks on the node, find over-replicated blocks, choose an excess, and queue 
> an invalidate.
> The process is expensive and worsened by the overhead of storage types (even 
> when not in use).  It can be especially devastating because the write lock is 
> held for the entire node refresh.  _Recommissioning 67 nodes with ~500k 
> blocks/node stalled rpc services for over 4 mins._



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13901) INode access time is ignored because of race between open and rename

2019-10-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956438#comment-16956438
 ] 

Wei-Chiu Chuang commented on HDFS-13901:


+1. Thanks [~LiJinglun], you're obviously smarter than I am.
The test leverages the fairness of the NameNode lock to guarantee thread order.
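For readers unfamiliar with the trick: a fair ReentrantReadWriteLock grants the
lock to waiting threads in queue order, so a test can line up its threads
deterministically. A self-contained sketch of the idea (not the actual test):

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockOrdering {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair

    Runnable first = () -> {
      lock.writeLock().lock();
      try {
        System.out.println("queued first, runs first (e.g. the rename)");
      } finally {
        lock.writeLock().unlock();
      }
    };
    Runnable second = () -> {
      lock.writeLock().lock();
      try {
        System.out.println("queued second, runs second (e.g. the atime update)");
      } finally {
        lock.writeLock().unlock();
      }
    };

    lock.writeLock().lock();        // hold the lock so both threads must queue
    Thread t1 = new Thread(first);
    t1.start();
    while (lock.getQueueLength() < 1) { Thread.yield(); }
    Thread t2 = new Thread(second);
    t2.start();
    while (lock.getQueueLength() < 2) { Thread.yield(); }
    lock.writeLock().unlock();      // a fair lock hands off in queue order
    t1.join();
    t2.join();
  }
}
{code}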

> INode access time is ignored because of race between open and rename
> 
>
> Key: HDFS-13901
> URL: https://issues.apache.org/jira/browse/HDFS-13901
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-13901.000.patch, HDFS-13901.001.patch, 
> HDFS-13901.002.patch, HDFS-13901.003.patch, HDFS-13901.004.patch, 
> HDFS-13901.005.patch, HDFS-13901.006.patch, HDFS-13901.007.patch
>
>
> That's because in getBlockLocations there is a gap between readUnlock and 
> re-acquiring the write lock (to update the access time). If a rename occurs in 
> the gap, the access time update will be ignored. We can calculate the new path 
> from the inode and use the new path to update the access time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2019-10-21 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14768:
---
Fix Version/s: (was: 3.3.0)

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.jpg, 
> guojh_UT_after_deomission.txt, guojh_UT_before_deomission.txt, 
> zhaoyiming_UT_after_deomission.txt, zhaoyiming_UT_beofre_deomission.txt
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission 
> indices [3,4] and increase the index-6 datanode's 
> pendingReplicationWithoutTargets so that it is larger than 
> replicationStreamsHardLimit (we set 14). Then, after the method 
> chooseSourceDatanodes of BlockManager, the liveBlockIndices is 
> [0,1,2,3,4,5,7,8] and the block counter is Live: 7, Decommission: 2.
> In the method scheduleReconstruction of BlockManager, additionalReplRequired 
> is 9 - 7 = 2. After the Namenode chooses two target datanodes, it will assign 
> an erasure-code task to the target datanodes.
> When the datanode gets the task, it will build targetIndices from 
> liveBlockIndices and the target length. The code is below.
> {code:java}
> targetIndices = new short[targets.length];
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] is always 0 from its initial value.
> The StripedReader always creates readers from the first 6 index blocks, i.e. 
> [0,1,2,3,4,5].
> Using the indices [0,1,2,3,4,5] to build target indices [6,0] will trigger the 
> ISA-L bug: block index 6's data is corrupted (all data is zero).
> I wrote a unit test that can reproduce this reliably.
> {code:java}
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   //
>   DatanodeDescriptor datanodeDescriptor = 
> cluster.getNameNode().getNamesystem()
>   
> .getBlockManager().getDatanodeManager().getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
> BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor, new 
> Block(i),
> new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
> final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, AdminStates.DECOMMISSIONED);
>   assertEquals(decommisionNodes.size(), fsn.getNumDecomLiveDataNodes());
>   bm.getDatanodeManager().removeDatanode(datanodeD

[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-10-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956328#comment-16956328
 ] 

Wei-Chiu Chuang commented on HDFS-14884:


LGTM +1 pending Jenkins.

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDFS-14884.001.patch, HDFS-14884.002.patch, 
> HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This Jira will add a precondition before 
> setting this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-10-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956276#comment-16956276
 ] 

Wei-Chiu Chuang commented on HDFS-14884:


LGTM
Nit:
{code}
import static org.apache.hadoop.hdfs.server.common.HdfsServerConstants.*;
{code}
Please refrain from using wildcard imports.

Test: should we add a fail() at the end of the try block to make sure it does 
throw an exception when setting an xattr on zone2File? Also, it would be better 
if you could note in the error log that the error is expected. Or use 
assertThrows().
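A sketch of both suggestions, with a stand-in method in place of the real
setXAttr call on zone2File (assertThrows is available from JUnit 4.13):

{code:java}
import static org.junit.Assert.assertThrows;
import static org.junit.Assert.fail;

import org.junit.Test;

public class ExpectedExceptionPatterns {
  // Stand-in for dfs.setXAttr(zone2File, ...), which should be rejected.
  private void setXAttrOnZone2File() {
    throw new IllegalStateException("zone key does not match feinfo key");
  }

  @Test
  public void withFailInTryBlock() {
    try {
      setXAttrOnZone2File();
      fail("expected setXAttr on zone2File to throw"); // catches the no-throw case
    } catch (IllegalStateException e) {
      System.out.println("Expected exception: " + e.getMessage());
    }
  }

  @Test
  public void withAssertThrows() {
    assertThrows(IllegalStateException.class, this::setXAttrOnZone2File);
  }
}
{code}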

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDFS-14884.001.patch, HDFS-14884.002.patch, 
> hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This Jira will add a precondition before 
> setting this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-10-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956276#comment-16956276
 ] 

Wei-Chiu Chuang edited comment on HDFS-14884 at 10/21/19 4:55 PM:
--

Looks really good, just a few nits.
Nit:
{code}
import static org.apache.hadoop.hdfs.server.common.HdfsServerConstants.*;
{code}
Please refrain from using wildcard imports.

Test: should we add a fail() at the end of the try block to make sure it does 
throw an exception when setting an xattr on zone2File? Also, it would be better 
if you could note in the error log that the error is expected. Or use 
assertThrows().


was (Author: jojochuang):
LGTM
Nit:
{code}
import static org.apache.hadoop.hdfs.server.common.HdfsServerConstants.*;
{code}
Please refrain from using wildcard imports.

Test: should we add a fail() at the end of the try block to make sure it does 
throw an exception when setting an xattr on zone2File? Also, it would be better 
if you could note in the error log that the error is expected. Or use 
assertThrows().

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Attachments: HDFS-14884.001.patch, HDFS-14884.002.patch, 
> hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This Jira will add a precondition before 
> setting this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14912) Set dfs.image.string-tables.expanded default to false in branch-2.7

2019-10-18 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955002#comment-16955002
 ] 

Wei-Chiu Chuang commented on HDFS-14912:


Started one for you: https://builds.apache.org/job/PreCommit-HDFS-Build/28122/
Let's see what it says.

> Set dfs.image.string-tables.expanded default to false in branch-2.7
> ---
>
> Key: HDFS-14912
> URL: https://issues.apache.org/jira/browse/HDFS-14912
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 2.7.8
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-14912.001.branch-2.7.patch
>
>
> In the branch-2.7 patch for CVE-2018-11768 HDFS FSImage Corruption, 
> dfs.image.string-tables.expanded is set to true by default: 
> https://github.com/apache/hadoop/commit/109d44604ca843212bdf22b50e86a5a41e1d21da#diff-36b19e9d8816002ed9dff8580055d3fbR627
> This is different from all other branches, which set it to false by default.
> For instance, branch-2.8: 
> https://github.com/apache/hadoop/commit/f697f3c4fc0067bb82494e445900d86942685b09#diff-36b19e9d8816002ed9dff8580055d3fbR629
> Goal: Flip the dfs.image.string-tables.expanded default in branch-2.7 to 
> false to make it consistent with other branches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-14910) TestRenameWithSnapshots#testRename2PreDescendant failing consistently

2019-10-17 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDFS-14910:
--

Assignee: Wei-Chiu Chuang

> TestRenameWithSnapshots#testRename2PreDescendant failing consistently
> -
>
> Key: HDFS-14910
> URL: https://issues.apache.org/jira/browse/HDFS-14910
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> TestRenameWithSnapshots#testRename2PreDescendant has been failing 
> consistently.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-15 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952094#comment-16952094
 ] 

Wei-Chiu Chuang commented on HDFS-14908:


[~linyiqun] wanna take a look?

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch
>
>
> Currently, when doing listOpenFiles(), LeaseManager only checks whether the 
> filter path is a string prefix of the open file paths. We should check whether 
> the filter path is a parent/ancestor of the open files.
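A small sketch of the difference (illustrative, not the actual LeaseManager
code): a raw prefix test wrongly matches sibling paths that merely share
leading characters, while an ancestor check enforces a path-component boundary.

{code:java}
public final class OpenFileFilter {
  // True when 'filter' is the open file itself or one of its ancestors.
  static boolean isAncestorOrSelf(String filter, String openFilePath) {
    if (openFilePath.equals(filter)) {
      return true;
    }
    String prefix = filter.endsWith("/") ? filter : filter + "/";
    return openFilePath.startsWith(prefix);
  }

  public static void main(String[] args) {
    // true: /data/warehouse is an ancestor of the open file
    System.out.println(isAncestorOrSelf("/data/warehouse", "/data/warehouse/t1/f"));
    // false, although "/data/warehouse" is a string prefix of "/data/warehouse2"
    System.out.println(isAncestorOrSelf("/data/warehouse", "/data/warehouse2/f"));
  }
}
{code}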



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-10-15 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14908:
---
Affects Version/s: 3.1.0
   3.0.1

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch
>
>
> Currently, when doing listOpenFiles(), LeaseManager only checks whether the 
> filter path is a string prefix of the open file paths. We should check whether 
> the filter path is a parent/ancestor of the open files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14271) [SBN read] StandbyException is logged if Observer is the first NameNode

2019-10-13 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14271:
---
Labels: multi-sbnn  (was: )

> [SBN read] StandbyException is logged if Observer is the first NameNode
> ---
>
> Key: HDFS-14271
> URL: https://issues.apache.org/jira/browse/HDFS-14271
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Shen Yinjie
>Priority: Minor
>  Labels: multi-sbnn
> Attachments: HDFS-14271_1.patch
>
>
> If I transition the first NameNode into Observer state and then create a 
> file from the command line, it prints the following StandbyException log 
> message, as if the command had failed. But it actually completed successfully:
> {noformat}
> [root@weichiu-sbsr-1 ~]# hdfs dfs -touchz /tmp/abf
> 19/02/12 16:35:17 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category WRITE is not supported in state observer. Visit 
> https://s.apache.org/sbnn-error
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1987)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1424)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:762)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:458)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:918)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:853)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2782)
> , while invoking $Proxy4.create over 
> [weichiu-sbsr-1.gce.cloudera.com/172.31.121.145:8020,weichiu-sbsr-2.gce.cloudera.com/172.31.121.140:8020].
>  Trying to failover immediately.
> {noformat}
> This is unlike the case when the first NameNode is the Standby, where this 
> StandbyException is suppressed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-6524) Choosing datanode retries times considering with block replica number

2019-10-13 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950369#comment-16950369
 ] 

Wei-Chiu Chuang commented on HDFS-6524:
---

Looks like a good improvement, thanks [~leosun08].
Additionally, you might find HDFS hedged reads useful. There doesn't appear to 
be a good reference in the Apache Hadoop docs, but you can find additional info 
in the HBase book: http://hbase.apache.org/book.html#hedged.reads

> Choosing datanode  retries times considering with block replica number
> --
>
> Key: HDFS-6524
> URL: https://issues.apache.org/jira/browse/HDFS-6524
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha1
>Reporter: Liang Xie
>Assignee: Lisheng Sun
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6524.001.patch, HDFS-6524.002.patch, 
> HDFS-6524.003.patch, HDFS-6524.004.patch, HDFS-6524.005(2).patch, 
> HDFS-6524.005.patch, HDFS-6524.006.patch, HDFS-6524.007.patch, HDFS-6524.txt
>
>
> Currently chooseDataNode() retries according to the setting 
> dfsClientConf.maxBlockAcquireFailures, which by default is 3 
> (DFS_CLIENT_MAX_BLOCK_ACQUIRE_FAILURES_DEFAULT = 3). It would be better to 
> have another option based on the block replication factor, e.g. for a cluster 
> configured with only two block replicas, or a Reed-Solomon encoding solution 
> with a single replica. It helps to reduce long-tail latency.
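A hedged sketch of the proposed policy (illustrative only, not the actual
DFSInputStream code): cap the retry count by the number of replicas actually
available, so a two-replica or single-replica read fails fast.

{code:java}
public final class BlockAcquireRetryPolicy {
  // Default of dfs.client.max.block.acquire.failures.
  static final int DEFAULT_MAX_BLOCK_ACQUIRE_FAILURES = 3;

  // Retrying more times than there are replicas to try cannot succeed,
  // so bound the retries by the block's replica count.
  static int maxRetries(int configuredMax, int blockReplicaCount) {
    return Math.min(configuredMax, Math.max(1, blockReplicaCount));
  }

  public static void main(String[] args) {
    System.out.println(maxRetries(DEFAULT_MAX_BLOCK_ACQUIRE_FAILURES, 3)); // 3
    System.out.println(maxRetries(DEFAULT_MAX_BLOCK_ACQUIRE_FAILURES, 1)); // 1
  }
}
{code}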



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14190) Copying folders containing = - characters between hdfs (using webhdfs) does not work in distcp

2019-10-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947850#comment-16947850
 ] 

Wei-Chiu Chuang commented on HDFS-14190:


I suspect HDFS-14323 or HDFS-14423 fixed it.

> Copying folders containing = - characters between hdfs (using webhdfs) does 
> not work in distcp
> --
>
> Key: HDFS-14190
> URL: https://issues.apache.org/jira/browse/HDFS-14190
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.1.1
>Reporter: yinsong
>Assignee: Aihua Xu
>Priority: Major
>
> Copying folders containing = - characters between hdfs (using webhdfs) does 
> not work in distcp
> for example:
> src:hadoop2.7  target:hadoop3.1.1
> (1)
> hadoop distcp \
> -pugp \
> -i \
> webhdfs://1.1.1.1:50070/sudiyi_datawarehouse 
> webhdfs://2.2.2.2:50070/sudiyi_datawarehouse
> ERROR tools.SimpleCopyListing: FileNotFoundException exception in listStatus: 
> File /sudiyi_datawarehouse/st_device_standard_ds/date_time%3D2018-10-10 does 
> not exist
>  
> (2)
> hadoop distcp \
> -Dmapreduce.framework.name=yarn \
> -pugp \
> -i \
> webhdfs://1.1.1.1:50070/druid webhdfs://2.2.2.2:50070/druid
> Error: java.io.IOException: File copy failed: 
> webhdfs://10.26.93.65:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800
>  --> 
> webhdfs://10.27.234.198:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800
>  at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:259)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:217)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> Caused by: java.io.IOException: Couldn't run retriable-command: Copying 
> webhdfs://10.26.93.65:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800
>  to 
> webhdfs://10.27.234.198:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800
>  at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
>  at 
> org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:256)
>  ... 10 more
> Caused by: java.io.IOException: Failed to promote 
> tmp-file:webhdfs://10.27.234.198:50070/druid/.distcp.tmp.attempt_1545990837043_0016_m_15_2
>  to: 
> webhdfs://10.27.234.198:50070/druid/indexing-logs/kill_task-myapp_V1-2018-04-26T16_20_55+0800
>  at 
> org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.promoteTmpToTarget(RetriableFileCopyCommand.java:250)
>  at 
> org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:140)
>  at 
> org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
>  at 
> org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14902) RBF: NullPointer When Misconfigured

2019-10-08 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14902:
---
Summary: RBF: NullPointer When Misconfigured  (was: NullPointer When 
Misconfigured)

> RBF: NullPointer When Misconfigured
> ---
>
> Key: HDFS-14902
> URL: https://issues.apache.org/jira/browse/HDFS-14902
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Priority: Minor
>
> Admittedly the server was misconfigured, but this should be handled a bit 
> more elegantly.
> {code:none}
> 2019-10-08 11:19:52,505 ERROR router.NamenodeHeartbeatService: Unhandled 
> exception updating NN registration for null:null
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
>   at 
> org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
>   at 
> org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
>   at 
> org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:259)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
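A hedged sketch of a more graceful failure mode (hypothetical names, based only
on the stack trace above): validate the resolved address before building the
membership record, instead of letting the protobuf builder throw an NPE.

{code:java}
final class RegistrationSanityCheck {
  // Would be called before the MembershipState record is built.
  static void requireResolved(String nsId, String nnId, String serviceAddress) {
    if (serviceAddress == null) {
      throw new IllegalStateException(
          "Cannot register NameNode " + nsId + ":" + nnId
              + ": no service address resolved; check the Router's NameNode"
              + " heartbeat configuration");
    }
  }

  private RegistrationSanityCheck() { }
}
{code}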



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13806) EC: No error message for unsetting EC policy of the directory inherits the erasure coding policy from an ancestor directory

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13806:
---
Fix Version/s: 3.1.4

> EC: No error message for unsetting EC policy of the directory inherits the 
> erasure coding policy from an ancestor directory
> ---
>
> Key: HDFS-13806
> URL: https://issues.apache.org/jira/browse/HDFS-13806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
> Environment: 3 Node SUSE Linux cluster
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Minor
> Fix For: 3.2.0, 3.1.4
>
> Attachments: HDFS-13806-01.patch, HDFS-13806-02.patch, 
> HDFS-13806-03.patch, HDFS-13806-04.patch, HDFS-13806-05.patch, 
> HDFS-13806-06.patch, No_error_unset_ec_policy.png
>
>
> No error message is thrown when unsetting the EC policy of a directory that 
> inherits the erasure coding policy from an ancestor directory
> Steps :-
> --
>  * Create a directory
>  - Set an EC policy for the directory
>  - Create a file inside that directory
>  - Create a sub-directory inside the parent directory
>  - Check that both the file and the sub-directory inherit the EC policy from 
> the parent directory
>  - Try to unset the EC policy for the file and check that it throws an error 
> such as [Cannot unset an erasure coding policy on a file]
>  - Try to unset the EC policy for the sub-directory and check that it shows a 
> success message [Unset erasure coding policy from ] 
>  instead of throwing the error message, which is wrong behavior
> Actual output :-
> No proper error message is thrown when unsetting the EC policy of a directory 
> that inherits the erasure coding policy from an ancestor directory.
>  A success message is displayed instead of an error message.
>  Expected output :-
>  
>  A proper error message should be thrown when trying to unset the EC policy 
> of a directory that inherits the erasure coding policy from an ancestor 
> directory, like the error message thrown when unsetting the EC policy of a 
> file that inherits the erasure coding policy from an ancestor directory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14497) Write lock held by metasave impact following RPC processing

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14497:
---
Fix Version/s: 3.2.2
   3.1.4

> Write lock held by metasave impact following RPC processing
> ---
>
> Key: HDFS-14497
> URL: https://issues.apache.org/jira/browse/HDFS-14497
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14497-addendum.001.patch, HDFS-14497.001.patch
>
>
> NameNode metasave currently holds the global write lock, so subsequent RPC 
> read/write requests or NameNode internal threads could be paused if they try 
> to acquire the global read/write lock and have to wait until metasave 
> releases it.
> I propose changing the write lock to a read lock so that some read requests 
> can be processed normally. I believe the information metasave gathers would 
> not change if we allow read requests in the meantime.
> Actually, we need to ensure that only one thread executes metaSave; 
> otherwise, the output streams could hit exceptions, especially when both 
> streams hold the same file handle or otherwise share an output stream.
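A hedged sketch of the two parts of the proposal (illustrative names, not
FSNamesystem itself): take the read lock instead of the write lock, and guard
metaSave with a flag so only one invocation writes at a time.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MetaSaveSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final AtomicBoolean metaSaveRunning = new AtomicBoolean(false);

  public void metaSave(String filename) {
    // Single-writer guard: two concurrent metaSave calls must not share
    // or interleave on the same output file.
    if (!metaSaveRunning.compareAndSet(false, true)) {
      throw new IllegalStateException("metaSave is already running");
    }
    try {
      fsLock.readLock().lock();   // read lock: other readers proceed normally
      try {
        // ... dump namespace and block state to 'filename' ...
      } finally {
        fsLock.readLock().unlock();
      }
    } finally {
      metaSaveRunning.set(false);
    }
  }
}
{code}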



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-10-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944767#comment-16944767
 ] 

Wei-Chiu Chuang commented on HDFS-2470:
---

Thanks! Just in time!

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, 
> HDFS-2470.09.patch, HDFS-2470.branch-3.1.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14890) Setting permissions on name directory fails on non posix compliant filesystems

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14890:
---
Fix Version/s: 3.1.4

> Setting permissions on name directory fails on non posix compliant filesystems
> --
>
> Key: HDFS-14890
> URL: https://issues.apache.org/jira/browse/HDFS-14890
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.1
> Environment: Windows 10.
>Reporter: hirik
>Assignee: Siddharth Wagle
>Priority: Blocker
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14890.01.patch
>
>
> Hi,
> HDFS NameNode and JournalNode are not starting on a Windows machine. Found 
> the related exception below in the logs:
> Caused by: java.lang.UnsupportedOperationException
> at java.base/java.nio.file.Files.setPosixFilePermissions(Files.java:2155)
> at 
> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:452)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:591)
> at org.apache.hadoop.hdfs.server.namenode.NNStorage.format(NNStorage.java:613)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:188)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1206)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:422)
> at 
> com.slog.dfs.hdfs.nn.NameNodeServiceImpl.delayedStart(NameNodeServiceImpl.java:147)
>  
> Code changes related to this issue: 
> [https://github.com/apache/hadoop/commit/07e3cf952eac9e47e7bd5e195b0f9fc28c468313#diff-1a56e69d50f21b059637cfcbf1d23f11]
>  
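A hedged sketch of a portable guard (not necessarily the committed fix): only
call setPosixFilePermissions() when the default filesystem supports the
"posix" attribute view, so Windows does not hit UnsupportedOperationException.

{code:java}
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public final class PortablePermissions {
  static void setPermissionsIfSupported(Path dir, Set<PosixFilePermission> perms)
      throws IOException {
    if (FileSystems.getDefault().supportedFileAttributeViews().contains("posix")) {
      Files.setPosixFilePermissions(dir, perms);
    }
    // On non-POSIX filesystems (e.g. NTFS on Windows) skip the call; a
    // stricter fix could translate the permissions to an ACL instead.
  }
}
{code}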



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-10-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944750#comment-16944750
 ] 

Wei-Chiu Chuang commented on HDFS-2470:
---

Pushed to branch-3.1 with trivial conflicts. Attached  
[^HDFS-2470.branch-3.1.patch]  for posterity.

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, 
> HDFS-2470.09.patch, HDFS-2470.branch-3.1.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-2470:
--
Attachment: HDFS-2470.branch-3.1.patch

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, 
> HDFS-2470.09.patch, HDFS-2470.branch-3.1.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-2470) NN should automatically set permissions on dfs.namenode.*.dir

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-2470:
--
Fix Version/s: 3.1.4

> NN should automatically set permissions on dfs.namenode.*.dir
> -
>
> Key: HDFS-2470
> URL: https://issues.apache.org/jira/browse/HDFS-2470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron Myers
>Assignee: Siddharth Wagle
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-2470.01.patch, HDFS-2470.02.patch, 
> HDFS-2470.03.patch, HDFS-2470.04.patch, HDFS-2470.05.patch, 
> HDFS-2470.06.patch, HDFS-2470.07.patch, HDFS-2470.08.patch, HDFS-2470.09.patch
>
>
> Much as the DN currently sets the correct permissions for the 
> dfs.datanode.data.dir, the NN should do the same for the 
> dfs.namenode.(name|edit).dir.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14892) Close the output stream if createWrappedOutputStream() fails

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14892:
---
Component/s: encryption

> Close the output stream if createWrappedOutputStream() fails
> 
>
> Key: HDFS-14892
> URL: https://issues.apache.org/jira/browse/HDFS-14892
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption
>Reporter: Kihwal Lee
>Priority: Major
>
> create() in an encryption zone is a two-step process by the client. First, a 
> regular FSOutputStream is created and then it is wrapped with an encrypted 
> stream.  When there is a system issue or a KMS ACL-based denial, the second 
> phase will fail. If the client terminates right away, the shutdown hook 
> closes the output stream opened in the first phase.  But if the client lives 
> on, the output stream will leak.
> Datanode's WebHdfsHandler, DFSClient, DistributedFileSystem, Hdfs 
> (FileContext) and RpcProgramNfs3 do this.  
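
A minimal sketch of the defensive pattern this calls for, assuming the caller 
holds the raw DFSOutputStream from phase one; the wrapper below is illustrative, 
not the committed fix:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DFSClient;
import org.apache.hadoop.hdfs.DFSOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.io.IOUtils;

class SafeWrap {
  static HdfsDataOutputStream wrapOrClose(DFSClient client,
      DFSOutputStream raw, FileSystem.Statistics stats) throws IOException {
    try {
      // Phase two of create(): may fail on a KMS ACL denial or system issue.
      return client.createWrappedOutputStream(raw, stats);
    } catch (IOException e) {
      // Without this, the phase-one stream leaks if the client lives on.
      IOUtils.closeStream(raw);
      throw e;
    }
  }
}
{code}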



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14686) HttpFS: HttpFSFileSystem#getErasureCodingPolicy always returns null

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14686:
---
Fix Version/s: 3.2.2
   3.1.4

> HttpFS: HttpFSFileSystem#getErasureCodingPolicy always returns null
> ---
>
> Key: HDFS-14686
> URL: https://issues.apache.org/jira/browse/HDFS-14686
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.2.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> The root cause is that *FSOperations#contentSummaryToJSON* doesn't parse 
> *ContentSummary.erasureCodingPolicy* into the json.
> The expected behavior is that *HttpFSFileSystem#getErasureCodingPolicy* 
> should at least return "" (empty string, for directories or symlinks), or 
> "Replicated" (for non-EC files), "RS-6-3-1024k", etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14892) Close the output stream if createWrappedOutputStream() fails

2019-10-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944675#comment-16944675
 ] 

Wei-Chiu Chuang commented on HDFS-14892:


Good finding, Kihwal. I am aware the client creates an empty file in such a 
case but didn't realize the upstream leaks.

> Close the output stream if createWrappedOutputStream() fails
> 
>
> Key: HDFS-14892
> URL: https://issues.apache.org/jira/browse/HDFS-14892
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Major
>
> create() in an encryption zone is a two-step process by the client. First, a 
> regular FSOutputStream is created and then it is wrapped with an encrypted 
> stream.  When there is a system issue or a KMS ACL-based denial, the second 
> phase will fail. If the client terminates right away, the shutdown hook 
> closes the output stream opened in the first phase.  But if the client lives 
> on, the output stream will leak.
> Datanode's WebHdfsHandler, DFSClient, DistributedFileSystem, Hdfs 
> (FileContext) and RpcProgramNfs3 do this.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13693) Remove unnecessary search in INodeDirectory.addChild during image loading

2019-10-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-13693:
---
Fix Version/s: 3.2.2
   3.1.4

> Remove unnecessary search in INodeDirectory.addChild during image loading
> -
>
> Key: HDFS-13693
> URL: https://issues.apache.org/jira/browse/HDFS-13693
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: zhouyingchao
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-13693-001.patch, HDFS-13693-002.patch, 
> HDFS-13693-003.patch, HDFS-13693-004.patch, HDFS-13693-005.patch
>
>
> In FSImageFormatPBINode.loadINodeDirectorySection, all child INodes are added 
> to their parent INode's map one by one. The adding procedure will search a 
> position in the parent's map and then insert the child to the position. 
> However, during image loading, the search is unnecessary since the insert 
> position should always be at the end of the map given the sequence they are 
> serialized on disk.
> Testing this patch against an fsimage of a 70PB cluster (200 million files and 
> 300 million blocks), the image loading time was reduced from 1210 seconds to 
> 1138 seconds, a saving of roughly 6%.
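
A simplified illustration of the optimization (stand-in names, not the 
INodeDirectory internals): because children are serialized in sorted order, the 
image-loading path can append at the tail instead of binary-searching for the 
insertion point.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedChildren {
  private final List<String> children = new ArrayList<>();

  /** Normal path: O(log n) search plus positional insert. */
  void addChild(String name) {
    int pos = Collections.binarySearch(children, name);
    if (pos < 0) {
      children.add(-pos - 1, name);
    }
  }

  /** Image-loading path: input is already sorted, so skip the search. */
  void addChildAtLoadingTime(String name) {
    children.add(name);  // insertion point is always the tail
  }
}
{code}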



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14637) Namenode may not replicate blocks to meet the policy after enabling upgradeDomain

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14637:
---
Fix Version/s: 3.2.2
   3.1.4
   3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to trunk. Cherry picked the commit to branch-3.2 and branch-3.1 with 
trivial conflicts in test code.

Thanks [~sodonnell] and [~ayushtkn]!

> Namenode may not replicate blocks to meet the policy after enabling 
> upgradeDomain
> -
>
> Key: HDFS-14637
> URL: https://issues.apache.org/jira/browse/HDFS-14637
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14637.001.patch, HDFS-14637.002.patch, 
> HDFS-14637.003.patch, HDFS-14637.004.patch, HDFS-14637.005.patch, 
> HDFS-14637.branch-3.1.patch, HDFS-14637.branch-3.2.patch
>
>
> After changing the network topology or placement policy on a cluster and 
> restarting the namenode, the namenode will scan all blocks on the cluster at 
> startup, and check if they meet the current placement policy. If they do not, 
> they are added to the replication queue and the namenode will arrange for 
> them to be replicated to ensure the placement policy is used.
> If you start with a cluster with no UpgradeDomain, and then enable 
> UpgradeDomain, then on restart the NN does notice all the blocks violate the 
> placement policy and it adds them to the replication queue. I believe there 
> are some issues in the logic that prevent the blocks from replicating, 
> depending on the setup:
> With UD enabled, but no racks configured, and possibly on a 2-rack cluster, 
> the queued replication work never makes any progress: in 
> blockManager.validateReconstructionWork(), it checks to see if the new 
> replica increases the number of racks, and if it does not, it skips it and 
> tries again later.
> {code:java}
> DatanodeStorageInfo[] targets = rw.getTargets();
> if ((numReplicas.liveReplicas() >= requiredRedundancy) &&
> (!isPlacementPolicySatisfied(block)) ) {
>   if (!isInNewRack(rw.getSrcNodes(), targets[0].getDatanodeDescriptor())) {
> // No use continuing, unless a new rack in this case
> return false;
>   }
>   // mark that the reconstruction work is to replicate internal block to a
>   // new rack.
>   rw.setNotEnoughRack();
> }
> {code}
> Additionally, in blockManager.scheduleReconstruction() there is some logic 
> that sets the number of new replicas required to one if the live replicas >= 
> requiredRedundancy:
> {code:java}
> int additionalReplRequired;
> if (numReplicas.liveReplicas() < requiredRedundancy) {
>   additionalReplRequired = requiredRedundancy - numReplicas.liveReplicas()
>   - pendingNum;
> } else {
>   additionalReplRequired = 1; // Needed on a new rack
> }{code}
> With UD, it is possible for 2 new replicas to be needed to meet the block 
> placement policy, if all existing replicas are on nodes with the same domain. 
> For traditional '2 rack redundancy', only 1 new replica would ever have been 
> needed in this scenario.
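
A hedged sketch of the direction a fix could take for the second issue: when 
replicas are numerically sufficient but the placement policy is still violated, 
ask how many additional replicas the policy actually needs instead of 
hard-coding 1. getAdditionalReplRequired(block) below is an illustrative 
helper name, not the committed API.

{code:java}
int additionalReplRequired;
if (numReplicas.liveReplicas() < requiredRedundancy) {
  additionalReplRequired = requiredRedundancy - numReplicas.liveReplicas()
      - pendingNum;
} else {
  // e.g. 2 when every existing replica shares one upgrade domain
  additionalReplRequired = getAdditionalReplRequired(block);
}
{code}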



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14637) Namenode may not replicate blocks to meet the policy after enabling upgradeDomain

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14637:
---
Attachment: HDFS-14637.branch-3.2.patch
HDFS-14637.branch-3.1.patch

> Namenode may not replicate blocks to meet the policy after enabling 
> upgradeDomain
> -
>
> Key: HDFS-14637
> URL: https://issues.apache.org/jira/browse/HDFS-14637
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14637.001.patch, HDFS-14637.002.patch, 
> HDFS-14637.003.patch, HDFS-14637.004.patch, HDFS-14637.005.patch, 
> HDFS-14637.branch-3.1.patch, HDFS-14637.branch-3.2.patch
>
>
> After changing the network topology or placement policy on a cluster and 
> restarting the namenode, the namenode will scan all blocks on the cluster at 
> startup, and check if they meet the current placement policy. If they do not, 
> they are added to the replication queue and the namenode will arrange for 
> them to be replicated to ensure the placement policy is used.
> If you start with a cluster with no UpgradeDomain, and then enable 
> UpgradeDomain, then on restart the NN does notice all the blocks violate the 
> placement policy and it adds them to the replication queue. I believe there 
> are some issues in the logic that prevent the blocks from replicating, 
> depending on the setup:
> With UD enabled, but no racks configured, and possibly on a 2-rack cluster, 
> the queued replication work never makes any progress: in 
> blockManager.validateReconstructionWork(), it checks to see if the new 
> replica increases the number of racks, and if it does not, it skips it and 
> tries again later.
> {code:java}
> DatanodeStorageInfo[] targets = rw.getTargets();
> if ((numReplicas.liveReplicas() >= requiredRedundancy) &&
> (!isPlacementPolicySatisfied(block)) ) {
>   if (!isInNewRack(rw.getSrcNodes(), targets[0].getDatanodeDescriptor())) {
> // No use continuing, unless a new rack in this case
> return false;
>   }
>   // mark that the reconstruction work is to replicate internal block to a
>   // new rack.
>   rw.setNotEnoughRack();
> }
> {code}
> Additionally, in blockManager.scheduleReconstruction() there is some logic 
> that sets the number of new replicas required to one if the live replicas >= 
> requiredRedundancy:
> {code:java}
> int additionalReplRequired;
> if (numReplicas.liveReplicas() < requiredRedundancy) {
>   additionalReplRequired = requiredRedundancy - numReplicas.liveReplicas()
>   - pendingNum;
> } else {
>   additionalReplRequired = 1; // Needed on a new rack
> }{code}
> With UD, it is possible for 2 new replicas to be needed to meet the block 
> placement policy, if all existing replicas are on nodes with the same domain. 
> For traditional '2 rack redundancy', only 1 new replica would ever have been 
> needed in this scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14850) Optimize FileSystemAccessService#getFileSystemConfiguration

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14850:
---
Fix Version/s: 3.2.2
   3.1.4

> Optimize FileSystemAccessService#getFileSystemConfiguration
> ---
>
> Key: HDFS-14850
> URL: https://issues.apache.org/jira/browse/HDFS-14850
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs, performance
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14850.001.patch, HDFS-14850.002.patch, 
> HDFS-14850.003.patch, HDFS-14850.004(2).patch, HDFS-14850.004.patch, 
> HDFS-14850.005.patch
>
>
> {code:java}
>  @Override
>   public Configuration getFileSystemConfiguration() {
> Configuration conf = new Configuration(true);
> ConfigurationUtils.copy(serviceHadoopConf, conf);
> conf.setBoolean(FILE_SYSTEM_SERVICE_CREATED, true);
> // Force-clear server-side umask to make HttpFS match WebHDFS behavior
> conf.set(FsPermission.UMASK_LABEL, "000");
> return conf;
>   }
> {code}
> As the code above shows, every call to 
> FileSystemAccessService#getFileSystemConfiguration constructs a new 
> Configuration. This is unnecessary and hurts performance. I think we only 
> need to build the Configuration once, in FileSystemAccessService#init, and 
> have FileSystemAccessService#getFileSystemConfiguration return it.
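
A hedged sketch of the proposed change, with simplified stand-ins for the 
HttpFS service internals (the manual copy loop replaces ConfigurationUtils.copy 
to keep the example self-contained): build the Configuration once at init time 
and hand the cached instance back.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsPermission;

class CachedFileSystemConf {
  private final Configuration serviceHadoopConf;
  private volatile Configuration fileSystemConf;

  CachedFileSystemConf(Configuration serviceHadoopConf) {
    this.serviceHadoopConf = serviceHadoopConf;
  }

  /** Called once, mirroring FileSystemAccessService#init. */
  void init() {
    Configuration conf = new Configuration(true);  // pay the parse cost once
    for (java.util.Map.Entry<String, String> e : serviceHadoopConf) {
      conf.set(e.getKey(), e.getValue());
    }
    conf.setBoolean("FileSystemAccessService.created", true);  // stand-in key
    // Force-clear server-side umask to match WebHDFS behavior.
    conf.set(FsPermission.UMASK_LABEL, "000");
    fileSystemConf = conf;
  }

  Configuration getFileSystemConfiguration() {
    return fileSystemConf;  // no per-call allocation or XML re-parse
  }
}
{code}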



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14527) Stop all DataNodes may result in NN terminate

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14527:
---
Fix Version/s: 3.1.4

> Stop all DataNodes may result in NN terminate
> -
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch, 
> HDFS-14527.003.patch, HDFS-14527.004.patch, HDFS-14527.005.patch
>
>
> If we stop all datanodes of a cluster, BlockPlacementPolicyDefault#chooseTarget 
> may get an ArithmeticException when calling #getMaxNodesPerRack, which throws 
> the runtime exception out to BlockManager's ReplicationMonitor thread and 
> then terminates the NN.
> The root cause is that BlockPlacementPolicyDefault#chooseTarget does not hold 
> the global lock, so if all DataNodes die between 
> {{clusterMap.getNumOfLeaves()}} and {{getMaxNodesPerRack}} it meets an 
> {{ArithmeticException}} while invoking {{getMaxNodesPerRack}}.
> {code:java}
>   private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
>       Node writer,
>       List<DatanodeStorageInfo> chosenStorage,
>       boolean returnChosenNodes,
>       Set<Node> excludedNodes,
>       long blocksize,
>       final BlockStoragePolicy storagePolicy,
>       EnumSet<AddBlockFlag> addBlockFlags,
>       EnumMap<StorageType, Integer> sTypes) {
>     if (numOfReplicas == 0 || clusterMap.getNumOfLeaves() == 0) {
>       return DatanodeStorageInfo.EMPTY_ARRAY;
>     }
>     ..
>     int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
>     ..
>   }
> {code}
> Some detailed log show as following.
> {code:java}
> 2019-05-31 12:29:21,803 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> ReplicationMonitor thread received Runtime exception. 
> java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.getMaxNodesPerRack(BlockPlacementPolicyDefault.java:282)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:228)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:132)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:4533)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$1800(BlockManager.java:4493)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1954)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1830)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4453)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4388)
> at java.lang.Thread.run(Thread.java:745)
> 2019-05-31 12:29:21,805 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> {code}
> To be honest, this is not a serious bug and is not easily reproduced, since if 
> we stop all DataNodes and only the NameNode stays alive, HDFS cannot offer 
> service normally anyway and we can only browse the directory tree. It may be 
> a corner case.
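
A hedged sketch of the kind of guard this implies, simplified from 
BlockPlacementPolicyDefault and not the committed patch: re-check the unlocked 
cluster size inside getMaxNodesPerRack and return an empty answer instead of 
dividing by zero.

{code:java}
// numOfChosen/numOfReplicas as in chooseTarget; counts are re-read here
// because chooseTarget runs without the global lock and the cluster can
// empty out between checks.
int[] getMaxNodesPerRackSafe(int numOfChosen, int numOfReplicas,
    int numOfLeaves, int numOfRacks) {
  if (numOfLeaves == 0 || numOfRacks == 0) {
    return new int[] {0, 0};  // no live DNs: no targets rather than / by zero
  }
  int totalNumOfReplicas = Math.min(numOfChosen + numOfReplicas, numOfLeaves);
  int maxNodesPerRack = (totalNumOfReplicas - 1) / numOfRacks + 2;
  return new int[] {totalNumOfReplicas, maxNodesPerRack};
}
{code}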



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14624:
---
Fix Version/s: 3.2.2
   3.1.4

> When decommissioning a node, log remaining blocks to replicate periodically
> ---
>
> Key: HDFS-14624
> URL: https://issues.apache.org/jira/browse/HDFS-14624
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14624.001.patch, HDFS-14624.002.patch, 
> HDFS-14624.003.patch
>
>
> When a node is marked for decommission, there is a monitor thread which runs 
> every 30 seconds by default, and checks if the node still has pending blocks 
> to be replicated before the node can complete replication.
> There are two existing debug level messages logged in the monitor thread, 
> DatanodeAdminManager$Monitor.check(), which log the correct information 
> already, first as the pending blocks are replicated:
> {code:java}
> LOG.debug("Node {} still has {} blocks to replicate "
> + "before it is a candidate to finish {}.",
> dn, blocks.size(), dn.getAdminState());{code}
> And then after the initial set of blocks has completed and a rescan happens:
> {code:java}
> LOG.debug("Node {} {} healthy."
> + " It needs to replicate {} more blocks."
> + " {} is still in progress.", dn,
> isHealthy ? "is": "isn't", blocks.size(), dn.getAdminState());{code}
> I would like to propose moving these messages to INFO level so it is easier 
> to monitor decommission progress over time from the Namenode log.
> Based on the default settings, this would result in at most 1 log message per 
> decommissioning node every 30 seconds. The reason this is an upper bound is 
> that the monitor thread stops after checking 500K blocks, so in practice it 
> could be as few as 1 log message per 30 seconds in total, even if many DNs 
> are being decommissioned at the same time.
> Note that the namenode webUI does display the above information, but having 
> this in the NN logs would allow progress to be tracked more easily.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14499) Misleading REM_QUOTA value with snapshot and trash feature enabled for a directory

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14499:
---
Fix Version/s: 3.2.2
   3.1.4

> Misleading REM_QUOTA value with snapshot and trash feature enabled for a 
> directory
> --
>
> Key: HDFS-14499
> URL: https://issues.apache.org/jira/browse/HDFS-14499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14499.000.patch, HDFS-14499.001.patch, 
> HDFS-14499.002.patch
>
>
> This is the flow of steps where we see a discrepancy between REM_QUOTA and 
> new file operation failure. REM_QUOTA shows a value of 1 but the file 
> creation operation does not succeed.
> {code:java}
> hdfs@c3265-node3 root$ hdfs dfs -mkdir /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -setQuota 2 /dir1
> hdfs@c3265-node3 root$ hdfs dfsadmin -allowSnapshot /dir1
> Allowing snaphot on /dir1 succeeded
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> hdfs@c3265-node3 root$ hdfs dfs -createSnapshot /dir1 snap1
> Created snapshot /dir1/.snapshot/snap1
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 0 none inf 1 1 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -rm /dir1/file1
> 19/03/26 11:20:25 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://smajetinn/dir1/file1' to trash at: 
> hdfs://smajetinn/user/hdfs/.Trash/Current/dir1/file11553599225772
> hdfs@c3265-node3 root$ hdfs dfs -count -v -q /dir1
> QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE 
> PATHNAME
> 2 1 none inf 1 0 0 /dir1
> hdfs@c3265-node3 root$ hdfs dfs -touchz /dir1/file1
> touchz: The NameSpace quota (directories and files) of directory /dir1 is 
> exceeded: quota=2 file count=3{code}
> The issue here is that the count command takes only files and directories 
> into account, not the inode references. When trash is enabled, deleting a 
> file inside a directory actually does a rename operation, as a result of 
> which an inode reference is maintained in the deleted list of the snapshot 
> diff. That reference is taken into account while computing the namespace 
> quota, but the count command (getContentSummary()) considers just the files 
> and directories, not the referenced entity, when calculating REM_QUOTA. The 
> referenced entity is taken into account for the space quota only.
> InodeReference.java:
> ---
> {code:java}
>  @Override
> public final ContentSummaryComputationContext computeContentSummary(
> int snapshotId, ContentSummaryComputationContext summary) {
>   final int s = snapshotId < lastSnapshotId ? snapshotId : lastSnapshotId;
>   // only count storagespace for WithName
>   final QuotaCounts q = computeQuotaUsage(
>   summary.getBlockStoragePolicySuite(), getStoragePolicyID(), false, 
> s);
>   summary.getCounts().addContent(Content.DISKSPACE, q.getStorageSpace());
>   summary.getCounts().addTypeSpaces(q.getTypeSpaces());
>   return summary;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14113) EC : Add Configuration to restrict UserDefined Policies

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14113:
---
Fix Version/s: 3.2.2
   3.1.4

> EC : Add Configuration to restrict UserDefined Policies
> ---
>
> Key: HDFS-14113
> URL: https://issues.apache.org/jira/browse/HDFS-14113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14113-01.patch, HDFS-14113-02.patch, 
> HDFS-14113-03.patch
>
>
> By default, the addition of erasure coding policies is enabled for all users. 
> We need to add a configuration that controls whether the addition of new 
> user-defined policies is allowed, configured as a Boolean value on the 
> server side.
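
For illustration, a hedged sketch of such a server-side gate; the key name and 
default below are assumptions, not necessarily what the patch introduces.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class ErasureCodingPolicyGate {
  // Assumed key and default, for illustration only.
  static final String USER_DEFINED_POLICY_ALLOWED_KEY =
      "dfs.namenode.ec.userdefined.policy.allowed";
  static final boolean USER_DEFINED_POLICY_ALLOWED_DEFAULT = true;

  /** Called at the top of the NameNode's add-policies handler. */
  static void checkUserDefinedPolicyAllowed(Configuration conf)
      throws IOException {
    if (!conf.getBoolean(USER_DEFINED_POLICY_ALLOWED_KEY,
        USER_DEFINED_POLICY_ALLOWED_DEFAULT)) {
      throw new IOException("Addition of user-defined erasure coding "
          + "policies is disabled by " + USER_DEFINED_POLICY_ALLOWED_KEY);
    }
  }
}
{code}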



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14124) EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14124:
---
Fix Version/s: 3.1.4

> EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs
> -
>
> Key: HDFS-14124
> URL: https://issues.apache.org/jira/browse/HDFS-14124
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, httpfs, webhdfs
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14124-01.patch, HDFS-14124-02.patch, 
> HDFS-14124-03.patch, HDFS-14124-04.patch, HDFS-14124-04.patch, 
> HDFS-14124.branch-3.1.patch
>
>
> EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs
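
For context, the REST calls this adds look roughly like the following (op names 
follow the WebHDFS convention; treat the exact parameters as assumptions to 
verify against the WebHDFS documentation):

{code}
# Set an EC policy on a directory
curl -i -X PUT  "http://<NN>:9870/webhdfs/v1/<PATH>?op=SETECPOLICY&ecpolicy=RS-6-3-1024k"
# Read back the effective policy
curl -i -X GET  "http://<NN>:9870/webhdfs/v1/<PATH>?op=GETECPOLICY"
# Remove the explicit policy
curl -i -X POST "http://<NN>:9870/webhdfs/v1/<PATH>?op=UNSETECPOLICY"
{code}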



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14124) EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs

2019-10-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944129#comment-16944129
 ] 

Wei-Chiu Chuang commented on HDFS-14124:


Pushed to branch-3.1. There's just a trivial conflict in the doc. Attached 
[^HDFS-14124.branch-3.1.patch] for posterity.

> EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs
> -
>
> Key: HDFS-14124
> URL: https://issues.apache.org/jira/browse/HDFS-14124
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, httpfs, webhdfs
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14124-01.patch, HDFS-14124-02.patch, 
> HDFS-14124-03.patch, HDFS-14124-04.patch, HDFS-14124-04.patch, 
> HDFS-14124.branch-3.1.patch
>
>
> EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14124) EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14124:
---
Attachment: HDFS-14124.branch-3.1.patch

> EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs
> -
>
> Key: HDFS-14124
> URL: https://issues.apache.org/jira/browse/HDFS-14124
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, httpfs, webhdfs
>Reporter: Souryakanta Dwivedy
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14124-01.patch, HDFS-14124-02.patch, 
> HDFS-14124-03.patch, HDFS-14124-04.patch, HDFS-14124-04.patch, 
> HDFS-14124.branch-3.1.patch
>
>
> EC : Support EC Commands (set/get/unset EcPolicy) via WebHdfs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14187) Make warning message more clear when there are not enough data nodes for EC write

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14187:
---
Fix Version/s: 3.2.2
   3.1.4

> Make warning message more clear when there are not enough data nodes for EC 
> write
> -
>
> Key: HDFS-14187
> URL: https://issues.apache.org/jira/browse/HDFS-14187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.1.1
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14187.001.patch
>
>
> When setting an erasure coding policy for which there are not enough racks or 
> data nodes, write will fail with the following message:
> {code:java}
> [root@oks-upgrade6727-1 ~]# sudo -u systest hdfs dfs -mkdir 
> /user/systest/testdir
> [root@oks-upgrade6727-1 ~]# sudo -u hdfs hdfs ec -setPolicy -path 
> /user/systest/testdir
> Set default erasure coding policy on /user/systest/testdir
> [root@oks-upgrade6727-1 ~]# sudo -u systest hdfs dfs -put /tmp/file1 
> /user/systest/testdir
> 18/11/12 05:41:26 WARN hdfs.DFSOutputStream: Cannot allocate parity 
> block(index=3, policy=RS-3-2-1024k). Not enough datanodes? Exclude nodes=[]
> 18/11/12 05:41:26 WARN hdfs.DFSOutputStream: Cannot allocate parity 
> block(index=4, policy=RS-3-2-1024k). Not enough datanodes? Exclude nodes=[]
> 18/11/12 05:41:26 WARN hdfs.DFSOutputStream: Block group <1> failed to write 
> 2 blocks. It's at high risk of losing data.
> {code}
> I suggest logging a more descriptive message that points users to the hdfs ec 
> -verifyClusterSetup command to verify the cluster setup against the EC policies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14064) WEBHDFS: Support Enable/Disable EC Policy

2019-10-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944111#comment-16944111
 ] 

Wei-Chiu Chuang commented on HDFS-14064:


Pushed to branch-3.1. There's just a trivial conflict. Attached the patch for 
posterity: [^HDFS-14064.branch-3.1.patch]

> WEBHDFS: Support Enable/Disable EC Policy
> -
>
> Key: HDFS-14064
> URL: https://issues.apache.org/jira/browse/HDFS-14064
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, webhdfs
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14064-01.patch, HDFS-14064-02.patch, 
> HDFS-14064-03.patch, HDFS-14064-04.patch, HDFS-14064-04.patch, 
> HDFS-14064-05.patch, HDFS-14064.branch-3.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14064) WEBHDFS: Support Enable/Disable EC Policy

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14064:
---
Fix Version/s: 3.1.4

> WEBHDFS: Support Enable/Disable EC Policy
> -
>
> Key: HDFS-14064
> URL: https://issues.apache.org/jira/browse/HDFS-14064
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, webhdfs
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14064-01.patch, HDFS-14064-02.patch, 
> HDFS-14064-03.patch, HDFS-14064-04.patch, HDFS-14064-04.patch, 
> HDFS-14064-05.patch, HDFS-14064.branch-3.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14064) WEBHDFS: Support Enable/Disable EC Policy

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14064:
---
Attachment: HDFS-14064.branch-3.1.patch

> WEBHDFS: Support Enable/Disable EC Policy
> -
>
> Key: HDFS-14064
> URL: https://issues.apache.org/jira/browse/HDFS-14064
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, webhdfs
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-14064-01.patch, HDFS-14064-02.patch, 
> HDFS-14064-03.patch, HDFS-14064-04.patch, HDFS-14064-04.patch, 
> HDFS-14064-05.patch, HDFS-14064.branch-3.1.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14064) WEBHDFS: Support Enable/Disable EC Policy

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14064:
---
Component/s: webhdfs
 erasure-coding

> WEBHDFS: Support Enable/Disable EC Policy
> -
>
> Key: HDFS-14064
> URL: https://issues.apache.org/jira/browse/HDFS-14064
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, webhdfs
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-14064-01.patch, HDFS-14064-02.patch, 
> HDFS-14064-03.patch, HDFS-14064-04.patch, HDFS-14064-04.patch, 
> HDFS-14064-05.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning

2019-10-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944096#comment-16944096
 ] 

Wei-Chiu Chuang commented on HDFS-14849:


Cherry-picked the commit to branch-3.2 without conflicts.
There is a trivial conflict for branch-3.1, so attached a patch 
[^HDFS-14849.branch-3.1.patch] for posterity.

> Erasure Coding: the internal block is replicated many times when datanode is 
> decommissioning
> 
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
>  Labels: EC, HDFS, NameNode
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, 
> HDFS-14849.branch-3.1.patch, fsck-file.png, liveBlockIndices.png, 
> scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal 
> blocks on that datanode will be replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes 
> simultaneously. 
>  !scheduleReconstruction.png! 
>  !fsck-file.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14849:
---
Fix Version/s: 3.1.4

> Erasure Coding: the internal block is replicated many times when datanode is 
> decommissioning
> 
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
>  Labels: EC, HDFS, NameNode
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, 
> HDFS-14849.branch-3.1.patch, fsck-file.png, liveBlockIndices.png, 
> scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal 
> blocks on that datanode will be replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes 
> simultaneously. 
>  !scheduleReconstruction.png! 
>  !fsck-file.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14849:
---
Attachment: HDFS-14849.branch-3.1.patch

> Erasure Coding: the internal block is replicated many times when datanode is 
> decommissioning
> 
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
>  Labels: EC, HDFS, NameNode
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, 
> HDFS-14849.branch-3.1.patch, fsck-file.png, liveBlockIndices.png, 
> scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal 
> blocks on that datanode will be replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes 
> simultaneously. 
>  !scheduleReconstruction.png! 
>  !fsck-file.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14849) Erasure Coding: the internal block is replicated many times when datanode is decommissioning

2019-10-03 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14849:
---
Fix Version/s: 3.2.2

> Erasure Coding: the internal block is replicated many times when datanode is 
> decommissioning
> 
>
> Key: HDFS-14849
> URL: https://issues.apache.org/jira/browse/HDFS-14849
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
>  Labels: EC, HDFS, NameNode
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14849.001.patch, HDFS-14849.002.patch, 
> fsck-file.png, liveBlockIndices.png, scheduleReconstruction.png
>
>
> When a datanode stays in DECOMMISSION_INPROGRESS status, the EC internal 
> blocks on that datanode will be replicated many times.
> // added 2019/09/19
> I reproduced this scenario in a 163-node cluster by decommissioning 100 nodes 
> simultaneously. 
>  !scheduleReconstruction.png! 
>  !fsck-file.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-10-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943826#comment-16943826
 ] 

Wei-Chiu Chuang commented on HDFS-14754:


[~surendrasingh] [~hemanthboyina]
the test doesn't seem valid. If I remove the fix, the test still passes. Can 
you please recheck?

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14754-addendum.001.patch, HDFS-14754.001.patch, 
> HDFS-14754.002.patch, HDFS-14754.003.patch, HDFS-14754.004.patch, 
> HDFS-14754.005.patch, HDFS-14754.006.patch, HDFS-14754.007.patch, 
> HDFS-14754.008.patch, HDFS-14754.branch-3.1.patch
>
>
> Using EC RS-3-2 with 6 DNs, we came across a scenario where, among the 5 
> blocks of an EC block group, the same block was replicated thrice and two 
> blocks went missing.
> The over-replicated block was not being deleted and the missing blocks could 
> not be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14637) Namenode may not replicate blocks to meet the policy after enabling upgradeDomain

2019-10-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943740#comment-16943740
 ] 

Wei-Chiu Chuang commented on HDFS-14637:


+1

> Namenode may not replicate blocks to meet the policy after enabling 
> upgradeDomain
> -
>
> Key: HDFS-14637
> URL: https://issues.apache.org/jira/browse/HDFS-14637
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-14637.001.patch, HDFS-14637.002.patch, 
> HDFS-14637.003.patch, HDFS-14637.004.patch, HDFS-14637.005.patch
>
>
> After changing the network topology or placement policy on a cluster and 
> restarting the namenode, the namenode will scan all blocks on the cluster at 
> startup, and check if they meet the current placement policy. If they do not, 
> they are added to the replication queue and the namenode will arrange for 
> them to be replicated to ensure the placement policy is used.
> If you start with a cluster with no UpgradeDomain, and then enable 
> UpgradeDomain, then on restart the NN does notice all the blocks violate the 
> placement policy and it adds them to the replication queue. I believe there 
> are some issues in the logic that prevents the blocks from replicating 
> depending on the setup:
> With UD enabled, but no racks configured, and possible on a 2 rack cluster, 
> the queued replication work never makes any progress, as in 
> blockManager.validateReconstructionWork(), it checks to see if the new 
> replica increases the number of racks, and if it does not, it skips it and 
> tries again later.
> {code:java}
> DatanodeStorageInfo[] targets = rw.getTargets();
> if ((numReplicas.liveReplicas() >= requiredRedundancy) &&
> (!isPlacementPolicySatisfied(block)) ) {
>   if (!isInNewRack(rw.getSrcNodes(), targets[0].getDatanodeDescriptor())) {
> // No use continuing, unless a new rack in this case
> return false;
>   }
>   // mark that the reconstruction work is to replicate internal block to a
>   // new rack.
>   rw.setNotEnoughRack();
> }
> {code}
> Additionally, in blockManager.scheduleReconstruction() is there some logic 
> that sets the number of new replicas required to one, if the live replicas >= 
> requiredReduncancy:
> {code:java}
> int additionalReplRequired;
> if (numReplicas.liveReplicas() < requiredRedundancy) {
>   additionalReplRequired = requiredRedundancy - numReplicas.liveReplicas()
>   - pendingNum;
> } else {
>   additionalReplRequired = 1; // Needed on a new rack
> }{code}
> With UD, it is possible for 2 new replicas to be needed to meet the block 
> placement policy, if all existing replicas are on nodes with the same domain. 
> For traditional '2 rack redundancy', only 1 new replica would ever have been 
> needed in this scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14216:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> NullPointerException happens in NamenodeWebHdfs
> ---
>
> Key: HDFS-14216
> URL: https://issues.apache.org/jira/browse/HDFS-14216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.3.0, 3.1.4, 3.2.1
>
> Attachments: HDFS-14216.branch-3.1.patch, HDFS-14216_1.patch, 
> HDFS-14216_2.patch, HDFS-14216_3.patch, HDFS-14216_4.patch, 
> HDFS-14216_5.patch, HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log
>
>
>  workload
> {code:java}
> curl -i -X PUT -T $HOMEPATH/test.txt \
> "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2"
> {code}
> the method
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(
>     String excludeDatanodes) {
>   HashSet<Node> excludes = new HashSet<Node>();
>   if (excludeDatanodes != null) {
>     for (String host : StringUtils
>         .getTrimmedStringCollection(excludeDatanodes)) {
>       int idx = host.indexOf(":");
>       if (idx != -1) {
>         excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr(
>             host.substring(0, idx),
>             Integer.parseInt(host.substring(idx + 1))));
>       } else {
>         excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); // line 280
>       }
>     }
>   }
> }
> {code}
> When the datanode (e.g. hadoop2) is wiped just before line 280, or we give a 
> wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will 
> return null and *_excludes_* will contain null. When *_excludes_* is used 
> later, an NPE happens:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113)
> at 
> org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
>  
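
A hedged sketch of the obvious defensive fix inside chooseDatanode (a fragment; 
bm and excludes as in the snippet above): skip null lookups so the exclude set 
never contains null.

{code:java}
for (String host : StringUtils.getTrimmedStringCollection(excludeDatanodes)) {
  int idx = host.indexOf(":");
  DatanodeDescriptor dn = (idx != -1)
      ? bm.getDatanodeManager().getDatanodeByXferAddr(
          host.substring(0, idx), Integer.parseInt(host.substring(idx + 1)))
      : bm.getDatanodeManager().getDatanodeByHost(host);
  if (dn != null) {  // a wiped or misnamed DN resolves to null; don't add it
    excludes.add(dn);
  }
}
{code}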



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14216:
---
Fix Version/s: 3.1.4

> NullPointerException happens in NamenodeWebHdfs
> ---
>
> Key: HDFS-14216
> URL: https://issues.apache.org/jira/browse/HDFS-14216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14216.branch-3.1.patch, HDFS-14216_1.patch, 
> HDFS-14216_2.patch, HDFS-14216_3.patch, HDFS-14216_4.patch, 
> HDFS-14216_5.patch, HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log
>
>
>  workload
> {code:java}
> curl -i -X PUT -T $HOMEPATH/test.txt \
> "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2"
> {code}
> the method
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(
>     String excludeDatanodes) {
>   HashSet<Node> excludes = new HashSet<Node>();
>   if (excludeDatanodes != null) {
>     for (String host : StringUtils
>         .getTrimmedStringCollection(excludeDatanodes)) {
>       int idx = host.indexOf(":");
>       if (idx != -1) {
>         excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr(
>             host.substring(0, idx),
>             Integer.parseInt(host.substring(idx + 1))));
>       } else {
>         excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); // line 280
>       }
>     }
>   }
> }
> {code}
> When the datanode (e.g. hadoop2) is wiped just before line 280, or we give a 
> wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will 
> return null and *_excludes_* will contain null. When *_excludes_* is used 
> later, an NPE happens:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113)
> at 
> org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs

2019-10-02 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943294#comment-16943294
 ] 

Wei-Chiu Chuang commented on HDFS-14216:


Failure doesn't look related. Pushing it to branch-3.1.

> NullPointerException happens in NamenodeWebHdfs
> ---
>
> Key: HDFS-14216
> URL: https://issues.apache.org/jira/browse/HDFS-14216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-14216.branch-3.1.patch, HDFS-14216_1.patch, 
> HDFS-14216_2.patch, HDFS-14216_3.patch, HDFS-14216_4.patch, 
> HDFS-14216_5.patch, HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log
>
>
>  workload
> {code:java}
> curl -i -X PUT -T $HOMEPATH/test.txt \
> "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2"
> {code}
> the method
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(
>     String excludeDatanodes) {
>   HashSet<Node> excludes = new HashSet<Node>();
>   if (excludeDatanodes != null) {
>     for (String host : StringUtils
>         .getTrimmedStringCollection(excludeDatanodes)) {
>       int idx = host.indexOf(":");
>       if (idx != -1) {
>         excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr(
>             host.substring(0, idx),
>             Integer.parseInt(host.substring(idx + 1))));
>       } else {
>         excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); // line 280
>       }
>     }
>   }
> }
> {code}
> When the datanode (e.g. hadoop2) is wiped just before line 280, or we give a 
> wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will 
> return null and *_excludes_* will contain null. When *_excludes_* is used 
> later, an NPE happens:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113)
> at 
> org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14678) Allow triggerBlockReport to a specific namenode

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14678:
---
Fix Version/s: 3.2.2
   3.1.4

> Allow triggerBlockReport to a specific namenode
> ---
>
> Key: HDFS-14678
> URL: https://issues.apache.org/jira/browse/HDFS-14678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.8.2
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> In our largest prod cluster (running 2.8.2) we have >3k hosts. Every time we 
> rolling-restart the NNs we need to wait for block reports, which takes 
> >2.5 hours for each NN.
> One way to make it faster is to manually trigger a full block report from all 
> datanodes. [HDFS-7278|https://issues.apache.org/jira/browse/HDFS-7278]. 
> However, the current triggerBlockReport command will trigger a block report 
> on all NNs, which will flood the active NN as well.
> A quick solution would be adding an option to specify the NN that the manually 
> triggered block report should go to, something like:
> *_hdfs dfsadmin [-triggerBlockReport [-incremental] <datanode_host:ipc_port>] 
> [-namenode <namenode_host:ipc_port>]_*
> So when doing a restart of standby NN or observer NN we can trigger an 
> aggressive block report to a specific NN to exit safemode faster without 
> risking active NN performance.
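
A hypothetical invocation under the proposed syntax (the hosts, ports, and
exact flag order below are placeholders and assumptions):

{code}
# Ask one datanode for a full block report, directed only at a specific
# (e.g. freshly restarted standby) namenode:
hdfs dfsadmin -triggerBlockReport dn1.example.com:9867 -namenode nn2.example.com:8020
{code}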



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14678) Allow triggerBlockReport to a specific namenode

2019-10-02 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943259#comment-16943259
 ] 

Wei-Chiu Chuang commented on HDFS-14678:


Cherry picking the commit into branch-3.2 and branch-3.1.
There's just a trivial conflict in the test code due to HADOOP-14178. 

> Allow triggerBlockReport to a specific namenode
> ---
>
> Key: HDFS-14678
> URL: https://issues.apache.org/jira/browse/HDFS-14678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.8.2
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> In our largest prod cluster (running 2.8.2) we have >3k hosts. Every time we
> rolling-restart the NNs, we need to wait for block reports, which take
> >2.5 hours for each NN.
> One way to make this faster is to manually trigger a full block report from all
> datanodes: [HDFS-7278|https://issues.apache.org/jira/browse/HDFS-7278].
> However, the current triggerBlockReport command triggers a block report
> on all NNs, which floods the active NN as well.
> A quick solution is to add an option specifying the NN that the manually
> triggered block report should go to, something like:
> *_hdfs dfsadmin [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
> [-namenode <namenode_host:ipc_port>]_*
> So when restarting a standby NN or observer NN, we can trigger an
> aggressive block report to that specific NN to exit safemode faster without
> risking active NN performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-8881) Erasure Coding: internal blocks got missed and got over-replicated at the same time

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-8881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-8881:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> Erasure Coding: internal blocks got missed and got over-replicated at the 
> same time
> ---
>
> Key: HDFS-8881
> URL: https://issues.apache.org/jira/browse/HDFS-8881
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Major
> Attachments: HDFS-8881.00.patch
>
>
> We know the Repl checking depends on {{BlockManager#countNodes()}}, but 
> countNodes() has limitations for a striped blockGroup.
> *One* missing internal block will be caught by Repl checking and handled by 
> ReplicationMonitor.
> *One* over-replicated internal block will be caught by Repl checking and 
> handled by processOverReplicatedBlocks.
> *One* missing internal block and *two* over-replicated internal blocks *at 
> the same time* will be caught by Repl checking, handled first by 
> processOverReplicatedBlocks and later by ReplicationMonitor.
> *One* missing internal block and *one* over-replicated internal block *at the 
> same time* will *NOT* be caught by Repl checking.
> "At the same time" means one missing internal block can't be recovered, while 
> one internal block got over-replicated anyway. For example:
> scenario A:
> 1. blocks #0 and #1 are reported missing.
> 2. a new #1 gets recovered.
> 3. the old #1 comes back, and the recovery work for #0 fails.
> scenario B:
> 1. a DN which has #1 is decommissioned or dead.
> 2. block #0 is reported missing.
> 3. the DN with #1 is recommissioned, and the recovery work for #0 fails.
> In the end, the blockGroup has \[1, 1, 2, 3, 4, 5, 6, 7, 8\], assuming a 6+3 
> schema. The client always needs to decode #0 if the blockGroup doesn't get 
> handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14523) Remove excess read lock for NetworkToplogy

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14523:
---
Fix Version/s: 3.2.2
   3.1.4

> Remove excess read lock for NetworkToplogy
> --
>
> Key: HDFS-14523
> URL: https://issues.apache.org/jira/browse/HDFS-14523
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Wu Weiwei
>Assignee: Wu Weiwei
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14523.1.patch
>
>
> getNumOfRacks() and getNumOfLeaves() are two methods called at high frequency
> by BlockPlacementPolicy. Both need to take the NetworkTopology read lock, and
> taking a lock in such hot methods may impact namenode performance.
> These two methods get the number of racks and the number of leaves just for
> the chooseTarget calculation; the lock in them cannot guarantee that the two
> values will not change in the subsequent calculations anyway.
> I think it's safe to remove the read lock from these two methods.
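
As a sketch of the pattern under discussion (illustrative only; the real
NetworkTopology fields and lock differ), dropping the read lock turns the
accessor into a plain read of a value that chooseTarget already treats as a
best-effort snapshot:

{code:java}
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class TopologySketch {
  private final ReadWriteLock netlock = new ReentrantReadWriteLock();
  private volatile int numOfLeaves;

  // Before: every call pays for the read lock.
  int getNumOfLeavesLocked() {
    netlock.readLock().lock();
    try {
      return numOfLeaves;
    } finally {
      netlock.readLock().unlock();
    }
  }

  // After: a plain volatile read. Callers already treat the value as a
  // snapshot that may be stale by the time it is used, so the lock buys
  // nothing here.
  int getNumOfLeaves() {
    return numOfLeaves;
  }
}
{code}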



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14618) Incorrect synchronization of ArrayList field (ArrayList is thread-unsafe).

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14618:
---
Fix Version/s: 3.2.2
   3.1.4

> Incorrect synchronization of ArrayList field (ArrayList is thread-unsafe).
> --
>
> Key: HDFS-14618
> URL: https://issues.apache.org/jira/browse/HDFS-14618
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Paul Ward
>Assignee: Paul Ward
>Priority: Critical
>  Labels: fix-provided, patch-available
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: race.patch
>
>
> I submitted a  CR for this issue at:
> https://github.com/apache/hadoop/pull/1030
> The field {{timedOutItems}}  (an {{ArrayList}}, i.e., not thread safe):
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L70
> is protected by synchronization on itself ({{timedOutItems}}):
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L167-L168
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L267-L268
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L178
> However, in one place:
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReconstructionBlocks.java#L133-L135
> it is (trying to be) protected by synchronizing on 
> {{pendingReconstructions}}, but this cannot protect {{timedOutItems}}.
> Synchronizing on different objects does not ensure mutual exclusion with the 
> other locations. That is, two code locations, one synchronized on 
> {{pendingReconstructions}} and the other on {{timedOutItems}}, can still 
> execute concurrently.
> This CR adds synchronization on {{timedOutItems}}.
> Note that this CR keeps the synchronization on {{pendingReconstructions}}, 
> which is needed for a different purpose (protecting {{pendingReconstructions}}).
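
A minimal sketch of why synchronizing on different monitors gives no mutual
exclusion (the names mirror the description; the method bodies are
placeholders, not the real PendingReconstructionBlocks code):

{code:java}
import java.util.ArrayList;
import java.util.List;

class LockMismatchSketch {
  private final Object pendingReconstructions = new Object();
  private final List<String> timedOutItems = new ArrayList<>();

  void buggyPath() {
    synchronized (pendingReconstructions) { // wrong monitor for the list
      timedOutItems.add("blk_1");           // can race with safePath()
    }
  }

  void safePath() {
    synchronized (timedOutItems) {          // the monitor the rest of the
      timedOutItems.clear();                // class uses for this list
    }
  }

  // The fix in the CR: buggyPath() additionally synchronizes on
  // timedOutItems, nested inside the pendingReconstructions block where
  // both invariants are needed.
}
{code}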



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14216:
---
Attachment: HDFS-14216.branch-3.1.patch

> NullPointerException happens in NamenodeWebHdfs
> ---
>
> Key: HDFS-14216
> URL: https://issues.apache.org/jira/browse/HDFS-14216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-14216.branch-3.1.patch, HDFS-14216_1.patch, 
> HDFS-14216_2.patch, HDFS-14216_3.patch, HDFS-14216_4.patch, 
> HDFS-14216_5.patch, HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log
>
>
>  workload
> {code:java}
> curl -i -X PUT -T $HOMEPATH/test.txt 
> "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2"
> {code}
> the method
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods
>     .chooseDatanode(String excludeDatanodes) {
>   HashSet<Node> excludes = new HashSet<>();
>   if (excludeDatanodes != null) {
>     for (String host : StringUtils
>         .getTrimmedStringCollection(excludeDatanodes)) {
>       int idx = host.indexOf(":");
>       if (idx != -1) {
>         excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr(
>             host.substring(0, idx),
>             Integer.parseInt(host.substring(idx + 1))));
>       } else {
>         excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); // line 280
>       }
>     }
>   }
> }
> {code}
> when a datanode (e.g. hadoop2) is wiped just before line 280, or we give a
> wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will
> return null and *_excludes_* *contains null*. When *_excludes_* is used
> later, an NPE happens:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113)
> at 
> org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14216:
---
Status: Patch Available  (was: Reopened)

> NullPointerException happens in NamenodeWebHdfs
> ---
>
> Key: HDFS-14216
> URL: https://issues.apache.org/jira/browse/HDFS-14216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-14216.branch-3.1.patch, HDFS-14216_1.patch, 
> HDFS-14216_2.patch, HDFS-14216_3.patch, HDFS-14216_4.patch, 
> HDFS-14216_5.patch, HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log
>
>
>  workload
> {code:java}
> curl -i -X PUT -T $HOMEPATH/test.txt 
> "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2"
> {code}
> the method
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods
>     .chooseDatanode(String excludeDatanodes) {
>   HashSet<Node> excludes = new HashSet<>();
>   if (excludeDatanodes != null) {
>     for (String host : StringUtils
>         .getTrimmedStringCollection(excludeDatanodes)) {
>       int idx = host.indexOf(":");
>       if (idx != -1) {
>         excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr(
>             host.substring(0, idx),
>             Integer.parseInt(host.substring(idx + 1))));
>       } else {
>         excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); // line 280
>       }
>     }
>   }
> }
> {code}
> when a datanode (e.g. hadoop2) is wiped just before line 280, or we give a
> wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will
> return null and *_excludes_* *contains null*. When *_excludes_* is used
> later, an NPE happens:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113)
> at 
> org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14216) NullPointerException happens in NamenodeWebHdfs

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-14216:


Reopening for branch-3.1. The only difference is the LOG class change: 
branch-3.1 can't use parameterized logging.

> NullPointerException happens in NamenodeWebHdfs
> ---
>
> Key: HDFS-14216
> URL: https://issues.apache.org/jira/browse/HDFS-14216
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lujie
>Assignee: lujie
>Priority: Critical
> Fix For: 3.3.0, 3.2.1
>
> Attachments: HDFS-14216_1.patch, HDFS-14216_2.patch, 
> HDFS-14216_3.patch, HDFS-14216_4.patch, HDFS-14216_5.patch, 
> HDFS-14216_6.patch, hadoop-hires-namenode-hadoop11.log
>
>
>  workload
> {code:java}
> curl -i -X PUT -T $HOMEPATH/test.txt 
> "http://hadoop1:9870/webhdfs/v1/input?op=CREATE&excludedatanodes=hadoop2"
> {code}
> the method
> {code:java}
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods
>     .chooseDatanode(String excludeDatanodes) {
>   HashSet<Node> excludes = new HashSet<>();
>   if (excludeDatanodes != null) {
>     for (String host : StringUtils
>         .getTrimmedStringCollection(excludeDatanodes)) {
>       int idx = host.indexOf(":");
>       if (idx != -1) {
>         excludes.add(bm.getDatanodeManager().getDatanodeByXferAddr(
>             host.substring(0, idx),
>             Integer.parseInt(host.substring(idx + 1))));
>       } else {
>         excludes.add(bm.getDatanodeManager().getDatanodeByHost(host)); // line 280
>       }
>     }
>   }
> }
> {code}
> when a datanode (e.g. hadoop2) is wiped just before line 280, or we give a
> wrong DN name, then bm.getDatanodeManager().getDatanodeByHost(host) will
> return null and *_excludes_* *contains null*. When *_excludes_* is used
> later, an NPE happens:
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.net.NodeBase.getPath(NodeBase.java:113)
> at 
> org.apache.hadoop.net.NetworkTopology.countNumOfAvailableNodes(NetworkTopology.java:672)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:533)
> at 
> org.apache.hadoop.net.NetworkTopology.chooseRandom(NetworkTopology.java:491)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.chooseDatanode(NamenodeWebHdfsMethods.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.redirectURI(NamenodeWebHdfsMethods.java:384)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods.put(NamenodeWebHdfsMethods.java:652)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:600)
> at 
> org.apache.hadoop.hdfs.server.namenode.web.resources.NamenodeWebHdfsMethods$2.run(NamenodeWebHdfsMethods.java:597)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:73)
> at org.apache.hadoop.ipc.ExternalCall.run(ExternalCall.java:30)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14610) HashMap is not thread safe. Field storageMap is typically synchronized by storageMap. However, in one place, field storageMap is not protected with synchronized.

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14610:
---
Fix Version/s: 3.2.2
   3.1.4

> HashMap is not thread safe. Field storageMap is typically synchronized by 
> storageMap. However, in one place, field storageMap is not protected with 
> synchronized.
> -
>
> Key: HDFS-14610
> URL: https://issues.apache.org/jira/browse/HDFS-14610
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Paul Ward
>Assignee: Paul Ward
>Priority: Critical
>  Labels: fix-provided, patch-available
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: addingSynchronization.patch
>
>
> I submitted a CR for this issue at:
> [https://github.com/apache/hadoop/pull/1015]
> The field *storageMap* (a *HashMap*)
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L155]
> is typically protected by synchronization on *storageMap*, e.g.,
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L294]
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L443]
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L484]
> For a total of 9 locations.
> The reason is that *HashMap* is not thread safe.
> However, here:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L455]
> {{DatanodeStorageInfo storage =}}
> {{   storageMap.get(report.getStorage().getStorageID());}}
> It is not synchronized.
> Note that in the same method:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L484]
> *storageMap* is again protected by synchronization:
> {{synchronized (storageMap) {}}
> {{   storageMapSize = storageMap.size();}}
> {{}}}
>  
> The CR linked above protects the instance at line 455 with synchronization, 
> like line 484 and all the other occurrences.
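
The fix pattern, sketched (the field names follow the description; the
surrounding DatanodeDescriptor code is omitted):

{code:java}
// Guard the one unsynchronized read the same way as the other nine call
// sites, so the HashMap is never read while another thread mutates it.
DatanodeStorageInfo storage;
synchronized (storageMap) {
  storage = storageMap.get(report.getStorage().getStorageID());
}
{code}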



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14527) Stop all DataNodes may result in NN terminate

2019-10-02 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943165#comment-16943165
 ] 

Wei-Chiu Chuang commented on HDFS-14527:


Patch applies cleanly in branch-3.2 also.
But it doesn't compile in branch-3.1. I'll provide a patch shortly.

> Stop all DataNodes may result in NN terminate
> -
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch, 
> HDFS-14527.003.patch, HDFS-14527.004.patch, HDFS-14527.005.patch
>
>
> If we stop all datanodes of the cluster, BlockPlacementPolicyDefault#chooseTarget 
> may get an ArithmeticException when calling #getMaxNodesPerRack, which throws 
> the runtime exception out to BlockManager's ReplicationMonitor thread and 
> then terminates the NN.
> The root cause is that BlockPlacementPolicyDefault#chooseTarget does not hold 
> the global lock, so if all DataNodes die between the 
> {{clusterMap.getNumOfLeaves()}} check and {{getMaxNodesPerRack}}, it meets an 
> {{ArithmeticException}} while invoking {{getMaxNodesPerRack}}.
> {code:java}
>   private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
>       Node writer,
>       List<DatanodeStorageInfo> chosenStorage,
>       boolean returnChosenNodes,
>       Set<Node> excludedNodes,
>       long blocksize,
>       final BlockStoragePolicy storagePolicy,
>       EnumSet<AddBlockFlag> addBlockFlags,
>       EnumMap<StorageType, Integer> sTypes) {
>     if (numOfReplicas == 0 || clusterMap.getNumOfLeaves() == 0) {
>       return DatanodeStorageInfo.EMPTY_ARRAY;
>     }
>     ..
>     int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
>     ..
>   }
> {code}
> Some detailed logs are shown below.
> {code:java}
> 2019-05-31 12:29:21,803 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> ReplicationMonitor thread received Runtime exception. 
> java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.getMaxNodesPerRack(BlockPlacementPolicyDefault.java:282)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:228)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:132)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:4533)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$1800(BlockManager.java:4493)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1954)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1830)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4453)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4388)
> at java.lang.Thread.run(Thread.java:745)
> 2019-05-31 12:29:21,805 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> {code}
> To be honest, this is not a serious bug and is not easily reproduced: if we 
> stop all DataNodes and keep only the NameNode alive, HDFS cannot offer 
> service normally anyway and we can only retrieve the directory tree. It may 
> be a corner case.
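
For illustration, a defensive sketch of the kind of guard that avoids
terminating the NN. This is a hypothetical fragment as it might sit inside
BlockPlacementPolicyDefault; the real method and the committed patch differ:

{code:java}
// Hypothetical re-check inside getMaxNodesPerRack(): do not assume the
// caller's earlier getNumOfLeaves() check still holds, since it was made
// outside the global lock.
private int[] getMaxNodesPerRack(int numOfChosen, int numOfReplicas) {
  int numOfRacks = clusterMap.getNumOfRacks();
  if (numOfRacks <= 0) {
    // Every datanode died between the caller's check and this call; an
    // empty answer beats an ArithmeticException (/ by zero).
    return new int[] {0, 0};
  }
  int total = numOfChosen + numOfReplicas;
  int maxNodesPerRack = (total - 1) / numOfRacks + 2;
  return new int[] {total, maxNodesPerRack};
}
{code}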



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14527) Stop all DataNodes may result in NN terminate

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14527:
---
Fix Version/s: 3.2.2

> Stop all DataNodes may result in NN terminate
> -
>
> Key: HDFS-14527
> URL: https://issues.apache.org/jira/browse/HDFS-14527
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14527.001.patch, HDFS-14527.002.patch, 
> HDFS-14527.003.patch, HDFS-14527.004.patch, HDFS-14527.005.patch
>
>
> If we stop all datanodes of the cluster, BlockPlacementPolicyDefault#chooseTarget 
> may get an ArithmeticException when calling #getMaxNodesPerRack, which throws 
> the runtime exception out to BlockManager's ReplicationMonitor thread and 
> then terminates the NN.
> The root cause is that BlockPlacementPolicyDefault#chooseTarget does not hold 
> the global lock, so if all DataNodes die between the 
> {{clusterMap.getNumOfLeaves()}} check and {{getMaxNodesPerRack}}, it meets an 
> {{ArithmeticException}} while invoking {{getMaxNodesPerRack}}.
> {code:java}
>   private DatanodeStorageInfo[] chooseTarget(int numOfReplicas,
>       Node writer,
>       List<DatanodeStorageInfo> chosenStorage,
>       boolean returnChosenNodes,
>       Set<Node> excludedNodes,
>       long blocksize,
>       final BlockStoragePolicy storagePolicy,
>       EnumSet<AddBlockFlag> addBlockFlags,
>       EnumMap<StorageType, Integer> sTypes) {
>     if (numOfReplicas == 0 || clusterMap.getNumOfLeaves() == 0) {
>       return DatanodeStorageInfo.EMPTY_ARRAY;
>     }
>     ..
>     int[] result = getMaxNodesPerRack(chosenStorage.size(), numOfReplicas);
>     ..
>   }
> {code}
> Some detailed logs are shown below.
> {code:java}
> 2019-05-31 12:29:21,803 ERROR 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> ReplicationMonitor thread received Runtime exception. 
> java.lang.ArithmeticException: / by zero
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.getMaxNodesPerRack(BlockPlacementPolicyDefault.java:282)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:228)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:132)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.chooseTargets(BlockManager.java:4533)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationWork.access$1800(BlockManager.java:4493)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1954)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1830)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4453)
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:4388)
> at java.lang.Thread.run(Thread.java:745)
> 2019-05-31 12:29:21,805 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> {code}
> To be honest, this is not a serious bug and is not easily reproduced: if we 
> stop all DataNodes and keep only the NameNode alive, HDFS cannot offer 
> service normally anyway and we can only retrieve the directory tree. It may 
> be a corner case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14808) EC: Improper size values for corrupt ec block in LOG

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14808:
---
Fix Version/s: 3.2.2
   3.1.4

> EC: Improper size values for corrupt ec block in LOG 
> -
>
> Key: HDFS-14808
> URL: https://issues.apache.org/jira/browse/HDFS-14808
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: Harshakiran Reddy
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14808-01.patch
>
>
> If the block corruption reason is a size mismatch, the values shown and 
> compared in the log are ambiguous.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14699) Erasure Coding: Storage not considered in live replica when replication streams hard limit reached to threshold

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14699:
---
Fix Version/s: 3.1.4

> Erasure Coding: Storage not considered in live replica when replication 
> streams hard limit reached to threshold
> ---
>
> Key: HDFS-14699
> URL: https://issues.apache.org/jira/browse/HDFS-14699
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.1.1, 3.3.0
>Reporter: Zhao Yi Ming
>Assignee: Zhao Yi Ming
>Priority: Critical
>  Labels: patch
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, 
> HDFS-14699.02.patch, HDFS-14699.03.patch, HDFS-14699.04.patch, 
> HDFS-14699.05.patch, image-2019-08-20-19-58-51-872.png, 
> image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the 
> same scenario as described in https://issues.apache.org/jira/browse/HDFS-8881. 
> Our testing steps follow; hope they are helpful. (The following DNs hold the 
> internal blocks under test.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 
> internal blocks (12 live blocks).
>  # Decommission one DN; after the decommission completes, we have 13 
> internal blocks (12 live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block id as the 
> decommissioned block; now we have 12 internal blocks (11 live blocks and 1 
> decommissioned block).
>  # After waiting about 600s (before the heartbeat comes), recommission the 
> decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 
> duplicate block).
>  # EC then does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production 
> env. Could you help? Thanks a lot!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14754:
---
Status: Patch Available  (was: Reopened)

Reopening to submit the branch-3.1 patch.
Branch-3.2 was cherry-picked without conflict.

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14754-addendum.001.patch, HDFS-14754.001.patch, 
> HDFS-14754.002.patch, HDFS-14754.003.patch, HDFS-14754.004.patch, 
> HDFS-14754.005.patch, HDFS-14754.006.patch, HDFS-14754.007.patch, 
> HDFS-14754.008.patch, HDFS-14754.branch-3.1.patch
>
>
> Using EC RS-3-2 on 6 DNs,
> we came across a scenario where, among the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing blocks could not be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-14754:


> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14754-addendum.001.patch, HDFS-14754.001.patch, 
> HDFS-14754.002.patch, HDFS-14754.003.patch, HDFS-14754.004.patch, 
> HDFS-14754.005.patch, HDFS-14754.006.patch, HDFS-14754.007.patch, 
> HDFS-14754.008.patch, HDFS-14754.branch-3.1.patch
>
>
> Using EC RS-3-2 on 6 DNs,
> we came across a scenario where, among the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing blocks could not be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14754) Erasure Coding : The number of Under-Replicated Blocks never reduced

2019-10-02 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14754:
---
Fix Version/s: 3.2.2

> Erasure Coding :  The number of Under-Replicated Blocks never reduced
> -
>
> Key: HDFS-14754
> URL: https://issues.apache.org/jira/browse/HDFS-14754
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Critical
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-14754-addendum.001.patch, HDFS-14754.001.patch, 
> HDFS-14754.002.patch, HDFS-14754.003.patch, HDFS-14754.004.patch, 
> HDFS-14754.005.patch, HDFS-14754.006.patch, HDFS-14754.007.patch, 
> HDFS-14754.008.patch, HDFS-14754.branch-3.1.patch
>
>
> Using EC RS-3-2 on 6 DNs,
> we came across a scenario where, among the 5 EC blocks, the same block was 
> replicated thrice and two blocks went missing.
> The replicated block was not being deleted and the missing blocks could not be reconstructed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-10-01 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942401#comment-16942401
 ] 

Wei-Chiu Chuang commented on HDFS-14313:


Conflicts are trivial so I pushed them into branch-3.2 and branch-3.1. Patch 
attached to the jira for posterity.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.1.patch, 
> HDFS-14313.branch-3.2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the in-memory FsDatasetImpl#volumeMap#ReplicaInfos 
> has very small cost and is accurate. 
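
The idea, sketched as a simplification (the real code lives in FsDatasetImpl
and its ReplicaMap, and checksum/meta files add a small amount on top of the
block bytes counted here):

{code:java}
import java.util.Collection;
import org.apache.hadoop.hdfs.server.datanode.ReplicaInfo;

final class UsedSpaceSketch {
  // Illustrative only: derive used space from the replicas already
  // tracked in memory instead of shelling out to du or df.
  static long getDfsUsed(Collection<ReplicaInfo> replicas) {
    long used = 0;
    for (ReplicaInfo r : replicas) {
      used += r.getNumBytes(); // block bytes known without any disk IO
    }
    return used;
  }
}
{code}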



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14313:
---
Fix Version/s: 3.2.2
   3.1.4

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.1.patch, 
> HDFS-14313.branch-3.2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the in-memory FsDatasetImpl#volumeMap#ReplicaInfos 
> has very small cost and is accurate. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14313:
---
Attachment: HDFS-14313.branch-3.2.patch
HDFS-14313.branch-3.1.patch

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.1.patch, 
> HDFS-14313.branch-3.2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the in-memory FsDatasetImpl#volumeMap#ReplicaInfos 
> has very small cost and is accurate. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

2019-10-01 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942389#comment-16942389
 ] 

Wei-Chiu Chuang commented on HDFS-14313:


This is missing from branch-3.2 and branch-3.1 unfortunately ...

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> 
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, performance
>Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.3.0
>
> Attachments: HDFS-14313-branch-2.v1.patch, 
> HDFS-14313-branch-2.v2.patch, HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch, HDFS-14313.012.patch, HDFS-14313.013.patch, 
> HDFS-14313.014.patch, HDFS-14313.branch-3.0.v1.patch, 
> HDFS-14313.branch-3.0.v2.patch, HDFS-14313.branch-3.v1.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the in-memory FsDatasetImpl#volumeMap#ReplicaInfos 
> has very small cost and is accurate. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14192) Track missing DFS operations in Statistics and StorageStatistics

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14192:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving. The failed tests don't appear related.
Cherry-picked the commit from trunk to branch-3.2, and pushed the patch to 
branch-3.1.

> Track missing DFS operations in Statistics and StorageStatistics
> 
>
> Key: HDFS-14192
> URL: https://issues.apache.org/jira/browse/HDFS-14192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14192-01.patch, HDFS-14192-02.patch, 
> HDFS-14192.branch-3.1.patch
>
>
> Track the missing DFS operations so that they update the Read/Write 
> Statistics and the StorageStatistics.
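
The pattern used throughout DistributedFileSystem, sketched (the class
wrapper here is an assumption, and CREATE stands in for whichever OpType
a given operation should record):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DFSOpsCountStatistics;
import org.apache.hadoop.hdfs.DFSOpsCountStatistics.OpType;

class StatsSketch {
  private final FileSystem.Statistics statistics;
  private final DFSOpsCountStatistics storageStatistics;

  StatsSketch(FileSystem.Statistics s, DFSOpsCountStatistics d) {
    statistics = s;
    storageStatistics = d;
  }

  // Each tracked operation bumps both counters before doing the real work.
  void someDfsOp() throws IOException {
    statistics.incrementWriteOps(1);                     // FileSystem-level
    storageStatistics.incrementOpCounter(OpType.CREATE); // per-op counter
    // ... the actual RPC to the NameNode ...
  }
}
{code}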



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14192) Track missing DFS operations in Statistics and StorageStatistics

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14192:
---
Fix Version/s: 3.2.2
   3.1.4

> Track missing DFS operations in Statistics and StorageStatistics
> 
>
> Key: HDFS-14192
> URL: https://issues.apache.org/jira/browse/HDFS-14192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14192-01.patch, HDFS-14192-02.patch, 
> HDFS-14192.branch-3.1.patch
>
>
> Track the missing DFS operations so that they update the Read/Write 
> Statistics and the StorageStatistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14349:
---
Labels: multi-sbnn  (was: )

> Edit log may be rolled more frequently than necessary with multiple Standby 
> nodes
> -
>
> Key: HDFS-14349
> URL: https://issues.apache.org/jira/browse/HDFS-14349
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, hdfs, qjm
>Reporter: Erik Krogen
>Assignee: Ekanth Sethuramalingam
>Priority: Major
>  Labels: multi-sbnn
>
> When HDFS-14317 was fixed, we tackled the problem that in a cluster with 
> in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the 
> edit logs, which can eventually cause data loss.
> Unfortunately, in the process, it was made so that if there are multiple 
> Standby NameNodes, they will all roll the edit logs at their specified 
> frequency, so the edit log will be rolled X times more frequently than it 
> should be (where X is the number of Standby NNs). This is not as bad as the 
> original bug, since rolling frequently does not affect correctness or data 
> availability, but it may degrade performance by creating more edit log 
> segments than necessary.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14192) Track missing DFS operations in Statistics and StorageStatistics

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14192:
---
Status: Patch Available  (was: Reopened)

> Track missing DFS operations in Statistics and StorageStatistics
> 
>
> Key: HDFS-14192
> URL: https://issues.apache.org/jira/browse/HDFS-14192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14192-01.patch, HDFS-14192-02.patch, 
> HDFS-14192.branch-3.1.patch
>
>
> Track the missing DFS operations so that they update the Read/Write 
> Statistics and the StorageStatistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14192) Track missing DFS operations in Statistics and StorageStatistics

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14192:
---
Attachment: HDFS-14192.branch-3.1.patch

> Track missing DFS operations in Statistics and StorageStatistics
> 
>
> Key: HDFS-14192
> URL: https://issues.apache.org/jira/browse/HDFS-14192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14192-01.patch, HDFS-14192-02.patch, 
> HDFS-14192.branch-3.1.patch
>
>
> Track the missing DFS operations so that they update the Read/Write 
> Statistics and the StorageStatistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14192) Track missing DFS operations in Statistics and StorageStatistics

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-14192:


Reopen for the branch-3.1 backport.

> Track missing DFS operations in Statistics and StorageStatistics
> 
>
> Key: HDFS-14192
> URL: https://issues.apache.org/jira/browse/HDFS-14192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14192-01.patch, HDFS-14192-02.patch, 
> HDFS-14192.branch-3.1.patch
>
>
> Track the missing DFS operations so that they update the Read/Write 
> Statistics and the StorageStatistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14192) Track missing DFS operations in Statistics and StorageStatistics

2019-10-01 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942226#comment-16942226
 ] 

Wei-Chiu Chuang commented on HDFS-14192:


Commit applies cleanly in branch-3.2, but because storage policy satisfier 
isn't in 3.1, the patch has a conflict.

> Track missing DFS operations in Statistics and StorageStatistics
> 
>
> Key: HDFS-14192
> URL: https://issues.apache.org/jira/browse/HDFS-14192
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14192-01.patch, HDFS-14192-02.patch
>
>
> Track the missing DFS operations so that they update the Read/Write 
> Statistics and the StorageStatistics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14460) DFSUtil#getNamenodeWebAddr should return HTTPS address based on policy configured

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14460:
---
Fix Version/s: 3.2.2
   3.1.4

> DFSUtil#getNamenodeWebAddr should return HTTPS address based on policy 
> configured
> -
>
> Key: HDFS-14460
> URL: https://issues.apache.org/jira/browse/HDFS-14460
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14460.001.patch, HDFS-14460.002.patch, 
> HDFS-14460.003.patch, HDFS-14460.004.patch
>
>
> DFSUtil#getNamenodeWebAddr looks up the HTTP address irrespective of the 
> configured policy. It should instead look at the configured policy and return 
> the appropriate web address.
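
A sketch of the intended behavior (the helper shape and the per-NN key
suffixing are assumptions; the keys are the standard
dfs.namenode.http-address / dfs.namenode.https-address pair):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.DFSUtil;
import org.apache.hadoop.http.HttpConfig;

final class WebAddrSketch {
  // Pick the web address key based on the configured dfs.http.policy
  // instead of unconditionally using the HTTP key.
  static String getNamenodeWebAddr(Configuration conf, String nsId, String nnId) {
    boolean https =
        DFSUtil.getHttpPolicy(conf) == HttpConfig.Policy.HTTPS_ONLY;
    String key = https
        ? DFSConfigKeys.DFS_NAMENODE_HTTPS_ADDRESS_KEY  // dfs.namenode.https-address
        : DFSConfigKeys.DFS_NAMENODE_HTTP_ADDRESS_KEY;  // dfs.namenode.http-address
    return conf.get(key + "." + nsId + "." + nnId);     // per-NN suffixing, simplified
  }
}
{code}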



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14175) EC: Native XOR decoder should reset the output buffer before using it.

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14175:
---
Fix Version/s: 3.2.2
   3.1.4

> EC: Native XOR decoder should reset the output buffer before using it.
> --
>
> Key: HDFS-14175
> URL: https://issues.apache.org/jira/browse/HDFS-14175
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, hdfs
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14175-01.patch, HDFS-14175-02.patch, 
> HDFS-14175-03.patch
>
>
> *jni_xor_decoder#xxx_decodeImpl()* should reset outputs[0] before 
> using it. Sometimes file decoding gives wrong data because of this. Please 
> refer to XORRawDecoder#doDecode() for the Java implementation of the XOR 
> decoder.
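
The Java behavior the native decoder should match, sketched from the
description (a simplification, not the actual XORRawDecoder#doDecode source):

{code:java}
import java.util.Arrays;

final class XorDecodeSketch {
  // The recovered unit is accumulated into the output with XOR, so stale
  // bytes left in the buffer from a previous call corrupt the result.
  // Zeroing the buffer first is the reset this issue adds to the native path.
  static void decode(byte[][] inputs, byte[] output) {
    Arrays.fill(output, (byte) 0);
    for (byte[] in : inputs) {
      for (int i = 0; i < output.length; i++) {
        output[i] ^= in[i];
      }
    }
  }
}
{code}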



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14202) "dfs.disk.balancer.max.disk.throughputInMBperSec" property is not working as per set value.

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14202:
---
Fix Version/s: 3.2.2
   3.1.4

> "dfs.disk.balancer.max.disk.throughputInMBperSec" property is not working as 
> per set value.
> ---
>
> Key: HDFS-14202
> URL: https://issues.apache.org/jira/browse/HDFS-14202
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Affects Versions: 3.0.1
>Reporter: Ranith Sardar
>Assignee: Ranith Sardar
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14202.001.patch, HDFS-14202.002.patch, 
> HDFS-14202.003.patch, HDFS-14202.004.patch, HDFS-14202.005.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14418) Remove redundant super user priveledge checks from namenode.

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14418:
---
Fix Version/s: 3.2.2
   3.1.4

> Remove redundant super user privilege checks from namenode.
> 
>
> Key: HDFS-14418
> URL: https://issues.apache.org/jira/browse/HDFS-14418
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14418-01.patch, HDFS-14418-02.patch, 
> HDFS-14418.branch-3.1.001.patch
>
>
> There are a couple of methods that unnecessarily check super user privileges 
> twice at the namenode; these can be reduced to a single check.
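>
> A hypothetical illustration of the pattern being removed (the method names 
> other than checkSuperuserPrivilege are made up for this sketch):
> {code:java}
> void someAdminOp() throws IOException {
>   checkSuperuserPrivilege();   // first check, at the RPC entry point
>   doAdminOpInternal();
> }
>
> void doAdminOpInternal() throws IOException {
>   checkSuperuserPrivilege();   // redundant second check -- safe to drop
>   // ... actual work ...
> }
> {code}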



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-14418) Remove redundant super user privilege checks from namenode.

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-14418.

Resolution: Fixed

> Remove redundant super user privilege checks from namenode.
> 
>
> Key: HDFS-14418
> URL: https://issues.apache.org/jira/browse/HDFS-14418
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14418-01.patch, HDFS-14418-02.patch, 
> HDFS-14418.branch-3.1.001.patch
>
>
> There are a couple of methods that unnecessarily check super user privileges 
> twice at the namenode; these can be reduced to a single check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14418) Remove redundant super user privilege checks from namenode.

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14418:
---
Attachment: HDFS-14418.branch-3.1.001.patch

> Remove redundant super user privilege checks from namenode.
> 
>
> Key: HDFS-14418
> URL: https://issues.apache.org/jira/browse/HDFS-14418
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14418-01.patch, HDFS-14418-02.patch, 
> HDFS-14418.branch-3.1.001.patch
>
>
> There are a couple of methods that unnecessarily check super user privileges 
> twice at the namenode; these can be reduced to a single check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14418) Remove redundant super user privilege checks from namenode.

2019-10-01 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16942118#comment-16942118
 ] 

Wei-Chiu Chuang commented on HDFS-14418:


Patch applies cleanly in branch-3.2. There's a trivial conflict in branch-3.1.

> Remove redundant super user privilege checks from namenode.
> 
>
> Key: HDFS-14418
> URL: https://issues.apache.org/jira/browse/HDFS-14418
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14418-01.patch, HDFS-14418-02.patch, 
> HDFS-14418.branch-3.1.001.patch
>
>
> There are a couple of methods that unnecessarily check super user privileges 
> twice at the namenode; these can be reduced to a single check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-14418) Remove redundant super user privilege checks from namenode.

2019-10-01 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-14418:


Reopening to add this to branch-3.1.

> Remove redundant super user privilege checks from namenode.
> 
>
> Key: HDFS-14418
> URL: https://issues.apache.org/jira/browse/HDFS-14418
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14418-01.patch, HDFS-14418-02.patch, 
> HDFS-14418.branch-3.1.001.patch
>
>
> There are a couple of methods that unnecessarily check super user privileges 
> twice at the namenode; these can be reduced to a single check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14235) Handle ArrayIndexOutOfBoundsException in DataNodeDiskMetrics#slowDiskDetectionDaemon

2019-09-30 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941411#comment-16941411
 ] 

Wei-Chiu Chuang commented on HDFS-14235:


Commit applies cleanly in branch-3.1. Updated the fix version.

> Handle ArrayIndexOutOfBoundsException in 
> DataNodeDiskMetrics#slowDiskDetectionDaemon 
> -
>
> Key: HDFS-14235
> URL: https://issues.apache.org/jira/browse/HDFS-14235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Surendra Singh Lilhore
>Assignee: Ranith Sardar
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14235.000.patch, HDFS-14235.001.patch, 
> HDFS-14235.002.patch, HDFS-14235.003.patch, NPE.png, exception.png
>
>
> The code below throws an exception because {{volumeIterator.next()}} is 
> called twice per iteration without checking hasNext(); even when it does not 
> throw, the metrics are read from the wrong volume.
> {code:java}
> while (volumeIterator.hasNext()) {
>   FsVolumeSpi volume = volumeIterator.next();
>   DataNodeVolumeMetrics metrics = volumeIterator.next().getMetrics();
>   String volumeName = volume.getBaseURI().getPath();
>   metadataOpStats.put(volumeName,
>   metrics.getMetadataOperationMean());
>   readIoStats.put(volumeName, metrics.getReadIoMean());
>   writeIoStats.put(volumeName, metrics.getWriteIoMean());
> }{code}
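>
> For reference, a corrected sketch (the attached patches may differ in 
> details): call next() once per iteration and reuse the returned volume.
> {code:java}
> while (volumeIterator.hasNext()) {
>   FsVolumeSpi volume = volumeIterator.next();          // single next() call
>   DataNodeVolumeMetrics metrics = volume.getMetrics(); // reuse the same volume
>   String volumeName = volume.getBaseURI().getPath();
>   metadataOpStats.put(volumeName, metrics.getMetadataOperationMean());
>   readIoStats.put(volumeName, metrics.getReadIoMean());
>   writeIoStats.put(volumeName, metrics.getWriteIoMean());
> }{code}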



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14235) Handle ArrayIndexOutOfBoundsException in DataNodeDiskMetrics#slowDiskDetectionDaemon

2019-09-30 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14235:
---
Fix Version/s: 3.1.4

> Handle ArrayIndexOutOfBoundsException in 
> DataNodeDiskMetrics#slowDiskDetectionDaemon 
> -
>
> Key: HDFS-14235
> URL: https://issues.apache.org/jira/browse/HDFS-14235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Surendra Singh Lilhore
>Assignee: Ranith Sardar
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.4
>
> Attachments: HDFS-14235.000.patch, HDFS-14235.001.patch, 
> HDFS-14235.002.patch, HDFS-14235.003.patch, NPE.png, exception.png
>
>
> The code below throws an exception because {{volumeIterator.next()}} is 
> called twice per iteration without checking hasNext(); even when it does not 
> throw, the metrics are read from the wrong volume.
> {code:java}
> while (volumeIterator.hasNext()) {
>   FsVolumeSpi volume = volumeIterator.next();
>   DataNodeVolumeMetrics metrics = volumeIterator.next().getMetrics();
>   String volumeName = volume.getBaseURI().getPath();
>   metadataOpStats.put(volumeName,
>   metrics.getMetadataOperationMean());
>   readIoStats.put(volumeName, metrics.getReadIoMean());
>   writeIoStats.put(volumeName, metrics.getWriteIoMean());
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10648) Expose Balancer metrics through Metrics2

2019-09-30 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941389#comment-16941389
 ] 

Wei-Chiu Chuang commented on HDFS-10648:


Thanks for doing this. Without metrics, HDFS-13783 isn't very useful.

> Expose Balancer metrics through Metrics2
> 
>
> Key: HDFS-10648
> URL: https://issues.apache.org/jira/browse/HDFS-10648
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: balancer & mover, metrics
>Reporter: Mark Wagner
>Assignee: Chen Zhang
>Priority: Major
>  Labels: metrics
>
> The Balancer currently prints progress information to the console. For 
> deployments that run the balancer frequently, it would be helpful to collect 
> those metrics for publishing to the available sinks. 
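>
> A hypothetical sketch of what a Metrics2 source for the balancer could look 
> like (the class and metric names here are illustrative, not the eventual 
> implementation):
> {code:java}
> import org.apache.hadoop.metrics2.annotation.Metric;
> import org.apache.hadoop.metrics2.annotation.Metrics;
> import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
> import org.apache.hadoop.metrics2.lib.MutableCounterLong;
> import org.apache.hadoop.metrics2.lib.MutableGaugeLong;
>
> @Metrics(about = "Balancer activity", context = "dfs")
> class BalancerMetrics {
>   @Metric("Bytes moved so far in this balancer run")
>   MutableCounterLong bytesMoved;
>
>   @Metric("Bytes left to move to reach the configured threshold")
>   MutableGaugeLong bytesLeftToMove;
>
>   static BalancerMetrics create() {
>     // Registering the annotated source publishes it to all configured sinks.
>     return DefaultMetricsSystem.instance()
>         .register("Balancer", "Balancer activity", new BalancerMetrics());
>   }
> }
> {code}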



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


