[jira] [Updated] (HDFS-6962) ACLs inheritance conflict with umaskmode

2015-02-16 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri updated HDFS-6962:

Attachment: HDFS-6962.1.patch

> ACLs inheritance conflict with umaskmode
> 
>
> Key: HDFS-6962
> URL: https://issues.apache.org/jira/browse/HDFS-6962
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
> Environment: CentOS release 6.5 (Final)
>Reporter: LINTE
>Assignee: Srikanth Upputuri
>  Labels: hadoop, security
> Attachments: HDFS-6962.1.patch
>
>
> In hdfs-site.xml:
> <property>
>   <name>dfs.umaskmode</name>
>   <value>027</value>
> </property>
> 1/ Create a directory as superuser
> bash# hdfs dfs -mkdir  /tmp/ACLS
> 2/ Set default ACLs on this directory: rwx access for group readwrite and user 
> toto
> bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS
> bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS
> 3/ check ACLs /tmp/ACLS/
> bash# hdfs dfs -getfacl /tmp/ACLS/
> # file: /tmp/ACLS
> # owner: hdfs
> # group: hadoop
> user::rwx
> group::r-x
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> user::rwx | group::r-x | other::--- matches the umaskmode defined in 
> hdfs-site.xml, so everything is OK.
> default:group:readwrite:rwx grants the readwrite group rwx access on 
> inheritance.
> default:user:toto:rwx grants the toto user rwx access on inheritance.
> default:mask::rwx means the inheritance mask is rwx, i.e. no masking.
> 4/ Create a subdir to test inheritance of ACL
> bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs
> 5/ check ACLs /tmp/ACLS/hdfs
> bash# hdfs dfs -getfacl /tmp/ACLS/hdfs
> # file: /tmp/ACLS/hdfs
> # owner: hdfs
> # group: hadoop
> user::rwx
> user:toto:rwx   #effective:r-x
> group::r-x
> group:readwrite:rwx #effective:r-x
> mask::r-x
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> Here we can see that the readwrite group has an rwx ACL but only r-x is 
> effective, because the mask is r-x (mask::r-x) even though the default mask 
> for inheritance on /tmp/ACLS/ is set to default:mask::rwx.
> 6/ Modify hdfs-site.xml and restart the NameNode
> <property>
>   <name>dfs.umaskmode</name>
>   <value>010</value>
> </property>
> 7/ Create a subdir to test inheritance of ACL with new parameter umaskmode
> bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs2
> 8/ Check ACL on /tmp/ACLS/hdfs2
> bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2
> # file: /tmp/ACLS/hdfs2
> # owner: hdfs
> # group: hadoop
> user::rwx
> user:toto:rwx   #effective:rw-
> group::r-x  #effective:r--
> group:readwrite:rwx #effective:rw-
> mask::rw-
> other::---
> default:user::rwx
> default:user:toto:rwx
> default:group::r-x
> default:group:readwrite:rwx
> default:mask::rwx
> default:other::---
> So HDFS masks the ACL values (user, group and other, except the POSIX 
> owner) with the group bits of the dfs.umaskmode property when creating a 
> directory with inherited ACLs.
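The masking behavior reported above can be reproduced with a short sketch. This is not HDFS source code; it assumes (as the report suggests) that on create, the child's ACL mask entry is taken from the group bits of the umask-filtered mode instead of the parent's default:mask entry.

```python
# Sketch (not HDFS code) of the reported behavior. Assumption: on create,
# the child's mask entry comes from the group bits of (mode & ~umask)
# rather than from the parent's default:mask::rwx.

def perm_str(bits):
    # render 3 permission bits as rwx notation, e.g. 5 -> "r-x"
    return "".join(ch if bits & b else "-" for ch, b in (("r", 4), ("w", 2), ("x", 1)))

def inherited_mask(requested_mode, umask):
    # group bits of the umask-filtered mode become the child's ACL mask
    return ((requested_mode & ~umask) >> 3) & 0o7

rwx = 0o7  # default:user:toto:rwx and default:group:readwrite:rwx entries

# Step 5: umask 027 -> mask::r-x, effective permissions drop to r-x
m = inherited_mask(0o777, 0o027)
print(perm_str(m), perm_str(rwx & m))   # r-x r-x

# Step 8: umask 010 -> mask::rw-, effective permissions drop to rw-
m = inherited_mask(0o777, 0o010)
print(perm_str(m), perm_str(rwx & m))   # rw- rw-
```

Both outputs match the `#effective:` annotations shown by `getfacl` in steps 5 and 8.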



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode

2015-02-16 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322555#comment-14322555
 ] 

Srikanth Upputuri commented on HDFS-6962:
-

[~cnauroth], after reading your comment above I have studied the relevant code 
and this is what I think. 

The umask should be loaded and applied on the server, depending on whether the 
parent directory has default ACLs or not. Only if default ACLs do not exist 
should the umask be applied to the mode. For the mode, the client will pass 
either the source permissions (cp, put, copyFromLocal) or the default 
permissions when no source permissions exist (create, mkdir, etc.). Currently 
the client code wrongly applies the mask to the permissions before making RPC 
calls; this happens in several places and needs to be changed.
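The proposed server-side rule can be sketched as follows. The helper name and signature are hypothetical, not actual NameNode code; the point is only the decision: the umask touches the mode solely when the parent has no default ACL.

```python
# Sketch of the proposed server-side rule (hypothetical helper, not
# NameNode code): apply the umask to the requested mode only when the
# parent directory carries no default ACL entries.

def mode_to_store(requested_mode, umask, parent_has_default_acl):
    if parent_has_default_acl:
        # inheritance is governed by the default ACL; leave the mode alone
        return requested_mode
    return requested_mode & ~umask

print(oct(mode_to_store(0o777, 0o027, parent_has_default_acl=True)))   # 0o777
print(oct(mode_to_store(0o777, 0o027, parent_has_default_acl=False)))  # 0o750
```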

For the copyFromLocal command, I have compared the behavior with 'cp' on a 
local Linux file system. The resulting permissions of the destination file are 
determined by the parent directory's default permissions and the source file's 
permissions (mode). The umask is used only when the parent directory doesn't 
have default permissions. This is just like the create API, except that for 
'create' the mode takes the default value (0666).
The second RPC, to 'setPermission', is used only when the 'preserve attributes' 
option -p is given and permissions/ACLs are expected to be retained; in that 
case the umask is not required. So the only change 'copyFromLocal' may require 
is to pass the source file's permissions as the mode, without masking.

Compatibility: older clients that apply the mask before passing the mode to the 
server retain their existing behavior if the parent directory has default 
permissions. If the parent directory does not have default permissions, the 
mask is applied once more on the server without changing the permissions. So 
clients effectively see the same behavior as today.
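The compatibility argument rests on masking being idempotent, which a one-line check confirms:

```python
# Sketch of the compatibility argument: masking with the same umask is
# idempotent, so a mode an old client already masked is unchanged when
# the server masks it again.

umask = 0o027
client_masked = 0o777 & ~umask          # what an old client sends
server_masked = client_masked & ~umask  # server masks once more
print(oct(client_masked), oct(server_masked))  # 0o750 0o750
assert client_masked == server_masked
```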

I am attaching a prototype patch, please take a look. I will add tests later 
once the approach is validated.

> ACLs inheritance conflict with umaskmode
> 
>
> Key: HDFS-6962
> URL: https://issues.apache.org/jira/browse/HDFS-6962
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
> Environment: CentOS release 6.5 (Final)
>Reporter: LINTE
>Assignee: Srikanth Upputuri
>  Labels: hadoop, security

[jira] [Assigned] (HDFS-6962) ACLs inheritance conflict with umaskmode

2015-02-13 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-6962:
---

Assignee: Srikanth Upputuri

> ACLs inheritance conflict with umaskmode
> 
>
> Key: HDFS-6962
> URL: https://issues.apache.org/jira/browse/HDFS-6962
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.4.1
> Environment: CentOS release 6.5 (Final)
>Reporter: LINTE
>Assignee: Srikanth Upputuri
>  Labels: hadoop, security


[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.

2015-02-05 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306893#comment-14306893
 ] 

Srikanth Upputuri commented on HDFS-7722:
-

Reassigned to you, [~eddyxu].

> DataNode#checkDiskError should also remove Storage when error is found.
> ---
>
> Key: HDFS-7722
> URL: https://issues.apache.org/jira/browse/HDFS-7722
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>
> When {{DataNode#checkDiskError}} finds disk errors, it removes all block 
> metadata from {{FsDatasetImpl}}. However, it does not remove the 
> corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. 
> As a result, we cannot directly run {{reconfig}} to hot-swap the failed 
> disks without changing the configuration file.





[jira] [Assigned] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.

2015-02-05 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-7722:
---

Assignee: Lei (Eddy) Xu  (was: Srikanth Upputuri)

> DataNode#checkDiskError should also remove Storage when error is found.
> ---
>
> Key: HDFS-7722
> URL: https://issues.apache.org/jira/browse/HDFS-7722
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu


[jira] [Assigned] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.

2015-02-03 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-7722:
---

Assignee: Srikanth Upputuri

> DataNode#checkDiskError should also remove Storage when error is found.
> ---
>
> Key: HDFS-7722
> URL: https://issues.apache.org/jira/browse/HDFS-7722
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0
>Reporter: Lei (Eddy) Xu
>Assignee: Srikanth Upputuri


[jira] [Commented] (HDFS-6753) When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .

2015-02-02 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302784#comment-14302784
 ] 

Srikanth Upputuri commented on HDFS-6753:
-

A write request to a DN first checks for a disk volume with available space, 
then proceeds to create an rbw file on it. The 'check disk error' is triggered 
when the rbw file cannot be created. But if a volume with sufficient space 
cannot be found, the request just throws an exception without initiating 
'check disk error'. This is reasonable: even if there is no space available on 
any volume, the DN may still be able to service read requests, so 'not enough 
space' is not a sufficient condition for DN shutdown. However, if all the 
volumes subsequently become faulty, a later read request will detect this 
condition and shut down the DN anyway. Therefore there is no need to change 
this behavior.
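The decision described above can be sketched as follows. The names are hypothetical, not DataNode code; the point is the asymmetry: running out of space rejects the write without triggering the disk check, while a failed rbw creation does trigger it.

```python
# Sketch (hypothetical names, not DataNode code): a full cluster raises
# an out-of-space error without any disk check; an unwritable chosen
# volume triggers 'check disk error' via the failed rbw-file creation.

class DiskOutOfSpaceError(Exception):
    pass

def handle_write(volumes, block_size):
    candidates = [v for v in volumes if v["free"] >= block_size]
    if not candidates:
        # no volume has room: reject the write, but reads may still be
        # serviceable, so this is not treated as a disk failure
        raise DiskOutOfSpaceError("no volume with enough free space")
    chosen = candidates[0]
    if not chosen["writable"]:
        return "check disk error"   # creating the rbw file failed
    return "ok"
```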

> When one the Disk is full and all the volumes configured are unhealthy , then 
> Datanode is not considering it as failure and datanode process is not 
> shutting down .
> ---
>
> Key: HDFS-6753
> URL: https://issues.apache.org/jira/browse/HDFS-6753
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: Srikanth Upputuri
>
> Env Details :
> =
> Cluster has 3 Datanodes
> Cluster installed with the "Rex" user
> dfs.datanode.failed.volumes.tolerated  = 3
> dfs.blockreport.intervalMsec  = 18000
> dfs.datanode.directoryscan.interval = 120
> DN_XX1.XX1.XX1.XX1 data dir = 
> /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
>  
>  
> /home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - 
> permission is denied (hence the DN considered these volumes as failed)
>  
> Expected behavior is observed when the disk is not full:
> 
> Step 1: Change the permissions of /mnt/tmp_Datanode to root
> Step 2: Perform write operations (the DN detects that all configured volumes 
> have failed and shuts down)
>  
> Scenario 1: 
> ===
>  
> Step 1: Make the /mnt/tmp_Datanode disk full and change its permissions to root
> Step 2: Perform client write operations (a disk-full exception is thrown, 
> but the Datanode does not shut down, even though all the configured volumes 
> have failed)
>  
> {noformat}
>  
> 2014-07-21 14:10:52,814 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation  
> src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
>  
> org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The 
> volume with the most available space (=4096 B) is less than the block size 
> (=134217728 B).
>  
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
>  
> {noformat}
>  
> Observations:
> ==
> 1. Write operations do not shut down the Datanode, even though all the 
> configured volumes have failed (one disk is full and permission is denied 
> on all the disks)
>  
> 2. Directory scanning fails, yet the DN still does not shut down
>  
>  
>  
> {noformat}
>  
> 2014-07-21 14:13:00,180 WARN 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured 
> while compiling report: 
>  
> java.io.IOException: Invalid directory or I/O error occurred for dir: 
> /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
>  
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
>  
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
>  
> {noformat}





[jira] [Work started] (HDFS-7082) When replication factor equals number of data nodes, corrupt replica will never get substituted with good replica

2014-09-18 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-7082 started by Srikanth Upputuri.
---
> When replication factor equals number of data nodes, corrupt replica will 
> never get substituted with good replica
> -
>
> Key: HDFS-7082
> URL: https://issues.apache.org/jira/browse/HDFS-7082
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Srikanth Upputuri
>Assignee: Srikanth Upputuri
>Priority: Minor
>
> BlockManager will not invalidate a corrupt replica if this brings down the 
> total number of replicas below replication factor (except if the corrupt 
> replica has a wrong genstamp). On clusters where the replication factor = 
> total data nodes, a new replica can not be created from a live replica as all 
> the available datanodes already have a replica each. Because of this, the 
> corrupt replicas will never be substituted with good replicas, so will never 
> get deleted. Sooner or later all replicas may get corrupt and there will be 
> no live replicas in the cluster for this block.





[jira] [Resolved] (HDFS-2932) Under replicated block after the pipeline recovery.

2014-09-17 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri resolved HDFS-2932.
-
   Resolution: Duplicate
Fix Version/s: (was: 0.24.0)

Closed as duplicate of HDFS-3493. 

> Under replicated block after the pipeline recovery.
> ---
>
> Key: HDFS-2932
> URL: https://issues.apache.org/jira/browse/HDFS-2932
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 0.24.0
>Reporter: J.Andreina
>Assignee: Srikanth Upputuri
>
> Started 1 NN, DN1, DN2 and DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for block-id-1005 was in progress, DN3 was brought down.
> After pipeline recovery, the block stamp changed to block_id_1006 on DN1 
> and DN2.
> After the write finished, DN3 was brought up and the fsck command was issued.
> The following message is displayed:
> "block-id_1006 is under-replicated. Target replicas is 3 but found 2 replicas."





[jira] [Assigned] (HDFS-2932) Under replicated block after the pipeline recovery.

2014-09-17 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-2932:
---

Assignee: Srikanth Upputuri

> Under replicated block after the pipeline recovery.
> ---
>
> Key: HDFS-2932
> URL: https://issues.apache.org/jira/browse/HDFS-2932
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 0.24.0
>Reporter: J.Andreina
>Assignee: Srikanth Upputuri
> Fix For: 0.24.0
>


[jira] [Commented] (HDFS-7082) When replication factor equals number of data nodes, corrupt replica will never get substituted with good replica

2014-09-17 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138555#comment-14138555
 ] 

Srikanth Upputuri commented on HDFS-7082:
-

Currently, if the condition below in BlockManager#markBlockAsCorrupt is true, 
we go ahead and invalidate the corrupt replica; for the scenario in question, 
it is false.
{code}
boolean hasMoreCorruptReplicas = minReplicationSatisfied &&
(numberOfReplicas.liveReplicas() + numberOfReplicas.corruptReplicas()) >
bc.getBlockReplication();
{code}
I propose to change this to 
{code}
boolean hasMoreCorruptReplicas = minReplicationSatisfied &&
    (numberOfReplicas.liveReplicas() + numberOfReplicas.corruptReplicas()) >=
    bc.getBlockReplication();
{code}
This solves the current problem while retaining almost all existing behavior.
We now let 'total replicas' drop to 'replication factor - 1', and no lower. 
This effectively vacates a slot on exactly one datanode and lets replication 
happen, thereby solving the reported problem. 

Example scenarios: 

1. DN1, DN2, DN3, replication factor =3, DN3 replica is corrupt. 
The corrupt replica is invalidated and deleted. New live replica will be 
written to DN3.

2. DN1, DN2, DN3, replication factor =3, DN2 and DN3 replicas are corrupt. 
DN3 sends block report. The corrupt replica on DN3 is invalidated and deleted. 
DN2 sends block report. The corrupt replica on DN2 will not be invalidated as 
the current 'total replicas' < 'replication factor'. New live replica will 
eventually be written to DN3. Then on further block report from DN2, the 
corrupt replica will get deleted.
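The effect of the one-character change can be sketched numerically (assuming, as in the patch, that min replication is already satisfied):

```python
# Sketch of the proposed condition change: with live + corrupt equal to
# the replication factor, '>' blocks invalidation while '>=' permits it,
# freeing exactly one datanode slot for a good replica.

def can_invalidate(live, corrupt, repl_factor, use_ge):
    total = live + corrupt
    return total >= repl_factor if use_ge else total > repl_factor

# Scenario 1: 3 DNs, replication factor 3, one corrupt replica
print(can_invalidate(2, 1, 3, use_ge=False))  # False - old code, replica stuck
print(can_invalidate(2, 1, 3, use_ge=True))   # True  - new code, replica deleted

# Scenario 2, after DN3's corrupt copy is deleted: live=1, corrupt=1
print(can_invalidate(1, 1, 3, use_ge=True))   # False - never below factor - 1
```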


> When replication factor equals number of data nodes, corrupt replica will 
> never get substituted with good replica
> -
>
> Key: HDFS-7082
> URL: https://issues.apache.org/jira/browse/HDFS-7082
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Srikanth Upputuri
>Assignee: Srikanth Upputuri
>Priority: Minor


[jira] [Created] (HDFS-7082) When replication factor equals number of data nodes, corrupt replica will never get substituted with good replica

2014-09-17 Thread Srikanth Upputuri (JIRA)
Srikanth Upputuri created HDFS-7082:
---

 Summary: When replication factor equals number of data nodes, 
corrupt replica will never get substituted with good replica
 Key: HDFS-7082
 URL: https://issues.apache.org/jira/browse/HDFS-7082
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Srikanth Upputuri
Assignee: Srikanth Upputuri
Priority: Minor




[jira] [Assigned] (HDFS-6805) NPE is thrown at Namenode , for every block report sent from DN

2014-09-17 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-6805:
---

Assignee: Srikanth Upputuri

> NPE is thrown at Namenode , for every block report sent from DN
> ---
>
> Key: HDFS-6805
> URL: https://issues.apache.org/jira/browse/HDFS-6805
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: Srikanth Upputuri
>
> Env Details :
> HA Cluster
> 2 DN 
> Procedure :
> ===
> While a client operation was in progress, one DN was restarted.
> After the restart, an NPE is thrown at both the Namenode and the DN for 
> every block report.
> Namenode Log:
> =
> {noformat}
> 2014-08-01 18:24:16,585 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
> 3 on 8020, call 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from 
> 10.18.40.14:38651 Call#7 Retry#0
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
> {noformat}
> Datanode Log:
> 
> {noformat}
> 2014-08-01 18:34:21,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> RemoteException in offerService
> org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019)
>   at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>   at java.security.AccessController.doPrivileged(Native Method)
> {noformat}





[jira] [Assigned] (HDFS-7033) dfs.web.authentication.filter should be documented

2014-09-16 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-7033:
---

Assignee: Srikanth Upputuri

> dfs.web.authentication.filter should be documented
> --
>
> Key: HDFS-7033
> URL: https://issues.apache.org/jira/browse/HDFS-7033
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation, security
>Affects Versions: 2.4.0
>Reporter: Allen Wittenauer
>Assignee: Srikanth Upputuri
>
> HDFS-5716 added dfs.web.authentication.filter but this doesn't appear to be 
> documented anywhere.





[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance

2014-09-16 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136781#comment-14136781
 ] 

Srikanth Upputuri commented on HDFS-6606:
-

{quote} In this JIRA, 3DES is used to encrypt/decrypt the negotiated cipher key 
(originally it was used to encrypt the transferred data). You are right, the 
channel confidentiality is the same, but it's enough. Our goal is to improve 
the performance.{quote}

Thank you for the explanation. I read about AES-NI and now understand that 
with a JCE provider like Diceros, AES performance will improve significantly. 
However, if we need to provide support for increased confidentiality with AES, 
can we not do it by implementing the GSSAPI mechanism in addition to the 
existing DIGEST-MD5, the same way it is implemented for RPC? The Java GSS API 
already supports AES, as described in 
http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/jgss-features.html.
That way we get better performance (with AES-NI support) as well as better 
data privacy. I have read through all the comments but did not quite see why 
this approach was ruled out. Any reasons?
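For context, the QOP selection both mechanisms rely on is driven by the standard javax.security.sasl property keys. A minimal sketch of requesting confidentiality protection follows; it only illustrates the standard API keys and their values, not Hadoop's actual negotiation code, and the method name is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

public class SaslQopExample {
    // Build the SASL properties that request confidentiality protection
    // (auth-conf) at a given cipher strength, as a DIGEST-MD5 negotiation
    // would. Illustrative only; not the code path Hadoop itself uses.
    static Map<String, String> confidentialityProps(String strength) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, "auth-conf");    // integrity + confidentiality
        props.put(Sasl.STRENGTH, strength);  // "high", "medium" or "low"
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = confidentialityProps("high");
        System.out.println(props.get(Sasl.QOP));      // auth-conf
        System.out.println(props.get(Sasl.STRENGTH)); // high
    }
}
```

With DIGEST-MD5, the "high" strength maps to the 3DES/RC4-128 ciphers discussed above; a GSSAPI mechanism would instead take its cipher from the Kerberos encryption types.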

> Optimize HDFS Encrypted Transport performance
> -
>
> Key: HDFS-6606
> URL: https://issues.apache.org/jira/browse/HDFS-6606
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, security
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, 
> HDFS-6606.003.patch, HDFS-6606.004.patch, 
> OptimizeHdfsEncryptedTransportperformance.pdf
>
>
> In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; 
> it was great work.
> It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf) and supports 
> three security strengths:
> * high: 3des or rc4 (128 bits)
> * medium: des or rc4 (56 bits)
> * low: rc4 (40 bits)
> 3des and rc4 are slow, only *tens of MB/s*:
> http://www.javamex.com/tutorials/cryptography/ciphers.shtml
> http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/
> I will give more detailed performance data in the future. This is clearly a 
> bottleneck and will vastly affect end-to-end performance. 
> AES (Advanced Encryption Standard) is recommended as a replacement for DES and 
> is more secure; with AES-NI support, throughput can reach nearly *2 GB/s*, so 
> encryption will no longer be the bottleneck. The AES and CryptoCodec work is 
> covered in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add 
> support for a new AES mode). 
> This JIRA will use AES with AES-NI support as the encryption algorithm for 
> DataTransferProtocol.





[jira] [Commented] (HDFS-2932) Under replicated block after the pipeline recovery.

2014-09-15 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133800#comment-14133800
 ] 

Srikanth Upputuri commented on HDFS-2932:
-

[~vinayrpet], though I really don't see a reason why we should not delete a 
mis-stamped replica (during block report processing) after the block is 
committed, I agree with you that this improvement in early detection may be 
unnecessary (or even slightly risky?), particularly when the benefit is so 
small.

Can I mark it as a duplicate of HDFS-3493?

> Under replicated block after the pipeline recovery.
> ---
>
> Key: HDFS-2932
> URL: https://issues.apache.org/jira/browse/HDFS-2932
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 0.24.0
>Reporter: J.Andreina
> Fix For: 0.24.0
>
>
> Started 1 NN and DN1, DN2, DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for block id 1005 was in progress, DN3 was brought down.
> After the pipeline recovery, the block stamp changed to block_id_1006 
> on DN1 and DN2.
> After the write finished, DN3 was brought up and the fsck command was issued.
> The following message is displayed:
> "block-id_1006 is under-replicated. Target replicas is 3 but found 2 replicas".





[jira] [Assigned] (HDFS-6753) When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .

2014-09-12 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-6753:
---

Assignee: Srikanth Upputuri

> When one the Disk is full and all the volumes configured are unhealthy , then 
> Datanode is not considering it as failure and datanode process is not 
> shutting down .
> ---
>
> Key: HDFS-6753
> URL: https://issues.apache.org/jira/browse/HDFS-6753
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: J.Andreina
>Assignee: Srikanth Upputuri
>
> Env Details :
> =
> Cluster has 3 Datanode
> Cluster installed with "Rex" user
> dfs.datanode.failed.volumes.tolerated  = 3
> dfs.blockreport.intervalMsec  = 18000
> dfs.datanode.directoryscan.interval = 120
> DN_XX1.XX1.XX1.XX1 data dir = 
> /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data
>  
>  
> /home/REX/data/dfs1/data, /home/REX/data/dfs2/data, /opt/REX/dfs/data - 
> permission is denied (hence the DN considered these volumes as failed)
>  
> Expected behavior is observed when disk is not full:
> 
>  
> Step 1: Change the permissions of /mnt/tmp_Datanode to root
>  
> Step 2: Perform write operations (the DN detects that all configured volumes 
> have failed and shuts down)
>  
> Scenario 1: 
> ===
>  
> Step 1: Make the /mnt/tmp_Datanode disk full and change its permissions to root
> Step 2: Perform client write operations (a disk-full exception is thrown, 
> but the Datanode does not shut down, even though all configured volumes 
> have failed)
>  
> {noformat}
>  
> 2014-07-21 14:10:52,814 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation  
> src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010
>  
> org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The 
> volume with the most available space (=4096 B) is less than the block size 
> (=134217728 B).
>  
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60)
>  
> {noformat}
>  
> Observations :
> ==
> 1. Write operations do not shut down the Datanode, even though all configured 
> volumes have failed (when one of the disks is full and permission is denied 
> on all the others)
>  
> 2. Directory scanning fails, yet the DN does not shut down
>  
>  
>  
> {noformat}
>  
> 2014-07-21 14:13:00,180 WARN 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured 
> while compiling report: 
>  
> java.io.IOException: Invalid directory or I/O error occurred for dir: 
> /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized
>  
> at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164)
>  
> at 
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596)
>  
> {noformat}





[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance

2014-09-12 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131300#comment-14131300
 ] 

Srikanth Upputuri commented on HDFS-6606:
-

This is a very nice effort. Reading through this jira and HDFS-3637 has been a 
great deal of learning for me. But I have a couple of fundamental questions.

Does this patch improve data transfer speed? Isn't the existing RC4 option 
much faster (as shown in the comparison analysis)?

Does this patch improve the confidentiality of the data transfer channel? If we 
transfer the AES keys and IVs over a 3DES-encrypted channel, isn't the overall 
confidentiality effectively the same, since anyone who can intercept and 
decrypt the 3DES traffic can also read the AES keys?

Am I missing something here?

> Optimize HDFS Encrypted Transport performance
> -
>
> Key: HDFS-6606
> URL: https://issues.apache.org/jira/browse/HDFS-6606
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, security
>Reporter: Yi Liu
>Assignee: Yi Liu
> Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, 
> HDFS-6606.003.patch, HDFS-6606.004.patch, 
> OptimizeHdfsEncryptedTransportperformance.pdf
>
>





[jira] [Commented] (HDFS-3586) Blocks are not getting replicate even DN's are availble.

2014-09-09 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126903#comment-14126903
 ] 

Srikanth Upputuri commented on HDFS-3586:
-

HDFS-3493 resolves the same issue. Now, if the number of live replicas is more 
than the minimum required and the total replica count (live + corrupt) exceeds 
the replication factor, we invalidate the extra corrupt replica(s). Also, if the 
replica happens to be one that was discarded during a pipeline recovery, it 
will be invalidated as long as the minimum number of live replicas exists, 
irrespective of the total replica count.

However, there is one possibility that can still result in the NN sending 
replication requests to copy a block to a DN holding a write-pipeline-failed 
replica: the block is still being written when the reconnected DN sends a block 
report with an RBW/RWR replica for it. I discussed this scenario in more 
detail in HDFS-2932. Apart from that case, I think this jira can be closed as a 
duplicate of HDFS-3493. Please suggest.
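The invalidation rule described above can be sketched as a small predicate. The method name and signature are illustrative, not the actual BlockManager code:

```java
public class ReplicaInvalidation {
    // Illustrative sketch of the HDFS-3493 rule described above: an extra
    // corrupt replica may be invalidated when at least the minimum number of
    // live replicas exists and the total replica count exceeds the
    // replication factor. Hypothetical names, not the real BlockManager API.
    static boolean canInvalidateCorruptReplica(int liveReplicas,
                                               int corruptReplicas,
                                               int minReplication,
                                               int replicationFactor) {
        boolean enoughLive = liveReplicas >= minReplication;
        boolean overReplicated = liveReplicas + corruptReplicas > replicationFactor;
        return enoughLive && overReplicated;
    }

    public static void main(String[] args) {
        // 2 live + 1 corrupt with RF=3: total equals RF, not over-replicated
        System.out.println(canInvalidateCorruptReplica(2, 1, 1, 3)); // false
        // 3 live + 1 corrupt with RF=3: the extra corrupt replica can go
        System.out.println(canInvalidateCorruptReplica(3, 1, 1, 3)); // true
    }
}
```

The pipeline-recovery special case mentioned above relaxes the second condition: a replica discarded during pipeline recovery is invalidated whenever the first condition alone holds.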

> Blocks are not getting replicate even DN's are availble.
> 
>
> Key: HDFS-3586
> URL: https://issues.apache.org/jira/browse/HDFS-3586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Brahma Reddy Battula
>Assignee: amith
> Attachments: HDFS-3586-analysis.txt
>
>
> Scenario:
> =
> Started four DNs (say DN1, DN2, DN3 and DN4).
> Writing files with RF=3.
> A pipeline was formed with DN1->DN2->DN3.
> Since DN3's network is very slow, it is not able to send acks.
> The pipeline is then re-formed as DN1->DN2->DN4.
> Here DN4's network is also slow.
> So finally commitBlockSync happened to DN1 and DN2 successfully.
> The block is present on all four DNs (in finalized state on two DNs and in rbw 
> state on the other two).
> Here the NN asks DN3 and DN4 to replicate, but this fails since replicas 
> are already present in the RBW dir.





[jira] [Commented] (HDFS-2932) Under replicated block after the pipeline recovery.

2014-09-08 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125490#comment-14125490
 ] 

Srikanth Upputuri commented on HDFS-2932:
-

Further analysing the two cases detailed by Vinay:

*Case 1*. I think the fix given for HDFS-3493 will solve this case, as the 
corrupt replica (the result of the pipeline failure) will eventually be 
invalidated, in spite of the fact that total replicas = replication factor. 
Please confirm.

*Case 2*. If a write-pipeline-failed replica from a restarted DN arrives before 
the stored block is 'completed', it will not be marked as corrupt. Later, when 
the NN computes the replication work, it is not aware that a corrupt replica 
exists on DN3, so it will keep scheduling replication from, say, DN2 to DN3 
without success until the next block report from DN3 is processed.


{code}
//BlockManager#checkReplicaCorrupt

case RBW:
case RWR:
  if (!storedBlock.isComplete()) {
return null; // not corrupt
  } 
{code}

There are two exclusive time windows in which such a replica can be reported:
1. The DN restarts and the replica is reported before the client finishes 
writing the block, i.e. the block is not yet 'committed'.
2. The DN restarts and the replica is reported after 'commit' but before 
'complete'. 


The solution is to detect and capture a write-pipeline-failed replica as early 
as possible. A first fix may be to change the check from 'isComplete' to 
'isCommitted'. This will capture write-pipeline-failed replicas reported just 
after 'commit' and before 'complete' and mark them as corrupt.

Then, to capture write-pipeline-failed replicas reported before commit, I am 
investigating whether this can be solved by marking them as corrupt as part of 
commit. There already exists a check that finds mis-stamped replicas during 
commit, but we only remove them from the blocksMap. In addition, can we not 
mark such replicas as corrupt?

{code}
//BlockInfoUnderConstruction#commitBlock

// Sort out invalid replicas.
setGenerationStampAndVerifyReplicas(block.getGenerationStamp());
{code}

Any thoughts/suggestions?

> Under replicated block after the pipeline recovery.
> ---
>
> Key: HDFS-2932
> URL: https://issues.apache.org/jira/browse/HDFS-2932
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 0.24.0
>Reporter: J.Andreina
> Fix For: 0.24.0
>
>





[jira] [Commented] (HDFS-6703) NFS: Files can be deleted from a read-only mount

2014-07-23 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071501#comment-14071501
 ] 

Srikanth Upputuri commented on HDFS-6703:
-

Thanks [~brandonli] and [~abutala] for your quick responses and support!

> NFS: Files can be deleted from a read-only mount
> 
>
> Key: HDFS-6703
> URL: https://issues.apache.org/jira/browse/HDFS-6703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Abhiraj Butala
>Assignee: Srikanth Upputuri
> Fix For: 2.5.0
>
> Attachments: HDFS-6703.patch
>
>
>   
> As reported by bigdatagroup  on hadoop-users mailing 
> list:
> {code}
> We exported our distributed filesystem with the following configuration 
> (Managed by Cloudera Manager over CDH 5.0.1):
>  
> <property>
>   <name>dfs.nfs.exports.allowed.hosts</name>
>   <value>192.168.0.153 ro</value>
> </property>
> As you can see, we expect the exported FS to be read-only, but in fact we are 
> able to delete files and folders stored on it (where the user has the correct 
> permissions), from  the client machine that mounted the FS.
> Other writing operations are correctly blocked.
> Hadoop Version in use: 2.3.0+cdh5.0.1+567"
> {code}
> I was able to reproduce the issue on latest hadoop trunk. Though I could only 
> delete files, deleting directories were correctly blocked:
> {code}
> abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127
> 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1)
> abutala@abutala-vBox:/mnt/hdfs$ ls -lh
> total 512
> -rw-r--r-- 1 abutala supergroup  0 Jul 17 18:51 abc.txt
> drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp
> abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt
> abutala@abutala-vBox:/mnt/hdfs$ ls
> temp
> abutala@abutala-vBox:/mnt/hdfs$ rm -r temp
> rm: cannot remove `temp': Permission denied
> abutala@abutala-vBox:/mnt/hdfs$ ls
> temp
> abutala@abutala-vBox:/mnt/hdfs$
> {code}
> Contents of hdfs-site.xml:
> {code}
> <configuration>
>   <property>
>     <name>dfs.nfs3.dump.dir</name>
>     <value>/tmp/.hdfs-nfs3</value>
>   </property>
>   <property>
>     <name>dfs.nfs.exports.allowed.hosts</name>
>     <value>localhost ro</value>
>   </property>
> </configuration>
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount

2014-07-22 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri updated HDFS-6703:


Status: Patch Available  (was: Open)

> NFS: Files can be deleted from a read-only mount
> 
>
> Key: HDFS-6703
> URL: https://issues.apache.org/jira/browse/HDFS-6703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Abhiraj Butala
>Assignee: Srikanth Upputuri
> Attachments: HDFS-6703.patch
>
>





[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount

2014-07-22 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri updated HDFS-6703:


Attachment: HDFS-6703.patch

Attached a patch. Please review.

> NFS: Files can be deleted from a read-only mount
> 
>
> Key: HDFS-6703
> URL: https://issues.apache.org/jira/browse/HDFS-6703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Affects Versions: 2.2.0
>Reporter: Abhiraj Butala
>Assignee: Srikanth Upputuri
> Attachments: HDFS-6703.patch
>
>





[jira] [Commented] (HDFS-6703) NFS: Files can be deleted from a read-only mount

2014-07-18 Thread Srikanth Upputuri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066332#comment-14066332
 ] 

Srikanth Upputuri commented on HDFS-6703:
-

I am interested in working on this. Below is my initial analysis.
The access-privilege check appears to be missing from the 'remove' 
implementation in RpcProgramNfs3.java. The check is present for 'rmdir', as 
shown below:
{code}
  if (!checkAccessPrivilege(client, AccessPrivilege.READ_WRITE)) {
return new RMDIR3Response(Nfs3Status.NFS3ERR_ACCES, errWcc); 
  }
{code}

Any thoughts? I will analyze further and update soon.
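For illustration, the missing guard reduces to the same privilege test that rmdir performs. This is sketched with simplified stand-in types rather than the real RpcProgramNfs3 signatures:

```java
public class ReadOnlyMountCheck {
    // Simplified stand-in for the NFS3 access-privilege type.
    enum AccessPrivilege { READ_ONLY, READ_WRITE }

    // Sketch of the missing check: a REMOVE (file delete) request must be
    // rejected unless the client mounted with read-write privilege, exactly
    // as RMDIR already does. Names here are illustrative.
    static boolean allowRemove(AccessPrivilege clientPrivilege) {
        return clientPrivilege == AccessPrivilege.READ_WRITE;
    }

    public static void main(String[] args) {
        // A read-only mount would get NFS3ERR_ACCES back instead of a delete.
        System.out.println(allowRemove(AccessPrivilege.READ_ONLY));  // false
        System.out.println(allowRemove(AccessPrivilege.READ_WRITE)); // true
    }
}
```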

> NFS: Files can be deleted from a read-only mount
> 
>
> Key: HDFS-6703
> URL: https://issues.apache.org/jira/browse/HDFS-6703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Abhiraj Butala
>Assignee: Srikanth Upputuri
>





[jira] [Assigned] (HDFS-6703) NFS: Files can be deleted from a read-only mount

2014-07-18 Thread Srikanth Upputuri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Upputuri reassigned HDFS-6703:
---

Assignee: Srikanth Upputuri

> NFS: Files can be deleted from a read-only mount
> 
>
> Key: HDFS-6703
> URL: https://issues.apache.org/jira/browse/HDFS-6703
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Abhiraj Butala
>Assignee: Srikanth Upputuri
>


