[jira] [Updated] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri updated HDFS-6962: Attachment: HDFS-6962.1.patch > ACLs inheritance conflict with umaskmode > > > Key: HDFS-6962 > URL: https://issues.apache.org/jira/browse/HDFS-6962 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 > Environment: CentOS release 6.5 (Final) >Reporter: LINTE >Assignee: Srikanth Upputuri > Labels: hadoop, security > Attachments: HDFS-6962.1.patch > > > In hdfs-site.xml > > dfs.umaskmode > 027 > > 1/ Create a directory as superuser > bash# hdfs dfs -mkdir /tmp/ACLS > 2/ Set default ACLs on this directory: rwx access for group readwrite and user > toto > bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS > bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS > 3/ Check ACLs on /tmp/ACLS/ > bash# hdfs dfs -getfacl /tmp/ACLS/ > # file: /tmp/ACLS > # owner: hdfs > # group: hadoop > user::rwx > group::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > user::rwx | group::r-x | other::--- matches the umaskmode defined in > hdfs-site.xml, everything ok! > default:group:readwrite:rwx allows the readwrite group rwx access through > inheritance. > default:user:toto:rwx allows the toto user rwx access through inheritance. 
> default:mask::rwx inheritance mask is rwx, so no mask > 4/ Create a subdir to test inheritance of ACLs > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs > 5/ Check ACLs on /tmp/ACLS/hdfs > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs > # file: /tmp/ACLS/hdfs > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:r-x > group::r-x > group:readwrite:rwx #effective:r-x > mask::r-x > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > Here we can see that the readwrite group has the rwx ACL but only r-x is effective > because the mask is r-x (mask::r-x), even though the default mask for inheritance > is set to default:mask::rwx on /tmp/ACLS/ > 6/ Modify hdfs-site.xml and restart the namenode > > dfs.umaskmode > 010 > > 7/ Create a subdir to test inheritance of ACLs with the new umaskmode parameter > bash# hdfs dfs -mkdir /tmp/ACLS/hdfs2 > 8/ Check ACLs on /tmp/ACLS/hdfs2 > bash# hdfs dfs -getfacl /tmp/ACLS/hdfs2 > # file: /tmp/ACLS/hdfs2 > # owner: hdfs > # group: hadoop > user::rwx > user:toto:rwx #effective:rw- > group::r-x #effective:r-- > group:readwrite:rwx #effective:rw- > mask::rw- > other::--- > default:user::rwx > default:user:toto:rwx > default:group::r-x > default:group:readwrite:rwx > default:mask::rwx > default:other::--- > So HDFS masks the ACL values (user, group and other -- except the POSIX > owner -- ) with the group mask of the dfs.umaskmode property when creating a > directory with inherited ACLs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
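As a sanity check on the getfacl output above: the observed (buggy) inherited mask equals the parent's default mask with the umask's group bits cleared. A standalone sketch of that arithmetic, not HDFS code:

```java
public class MaskCheck {
    // The inherited mask observed in the report is the parent's default mask
    // with the group bits of dfs.umaskmode cleared:
    //   inherited = defaultMask & ~groupBits(umask)
    static int inheritedMask(int defaultMask, int umask) {
        int groupBits = (umask >> 3) & 07; // middle octal digit of the umask
        return defaultMask & ~groupBits;
    }

    public static void main(String[] args) {
        // umask 027: group digit 2 (w) -> rwx (7) becomes r-x (5)
        System.out.println(inheritedMask(07, 0027)); // prints 5
        // umask 010: group digit 1 (x) -> rwx (7) becomes rw- (6)
        System.out.println(inheritedMask(07, 0010)); // prints 6
    }
}
```

This reproduces exactly the mask::r-x seen with umaskmode 027 and the mask::rw- seen with umaskmode 010.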
[jira] [Commented] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322555#comment-14322555 ] Srikanth Upputuri commented on HDFS-6962: - [~cnauroth], after reading your comment above I have studied the relevant code and this is what I think. The umask should be loaded and applied on the server, depending on whether the parent directory has default ACLs or not. Only if default ACLs do not exist will the umask be applied to the mode. For the mode, the client will either pass the source permissions (cp, put, copyFromLocal) or the default permissions if no source permissions exist (create, mkdir, etc.). Currently the client code wrongly applies the mask to the permissions before making RPC calls. This happens in several places and needs to be changed. For the copyFromLocal command, I have compared the behavior with 'cp' on a Linux local file system. The resultant permissions of the destination file are determined by the parent directory's default permissions and the source file's permissions (mode). The umask is used only when the parent directory doesn't have default permissions. This is just like the create API, except that in the case of 'create', the mode takes a default value (0666). The second RPC, 'setPermission', is only used when the 'preserve attributes' option -p is used and permissions/ACLs are expected to be retained; in this case the umask is not required. So, the only change 'copyFromLocal' may require is to pass the source file's permissions as the mode, without masking. Compatibility: older clients applying the mask before passing the mode to the server will retain their existing behavior if the parent directory has default permissions. In case the parent directory does not have default permissions, the mask gets applied one more time on the server without causing any change to the permissions. So, effectively, clients see the same behavior as before. I am attaching a prototype patch, please take a look. 
I will add tests later once the approach is validated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
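The server-side rule described in the comment ("umask applied only if default ACLs do not exist") can be sketched as follows; the class, method, and parameter names here are illustrative, not the actual HDFS API:

```java
public class UmaskRule {
    // Proposed rule: apply the configured umask to the requested mode only
    // when the parent directory has no default ACL entries; otherwise the
    // default ACL alone governs the child's inherited permissions.
    static short applyCreateMode(short requestedMode, short umask,
                                 boolean parentHasDefaultAcl) {
        if (parentHasDefaultAcl) {
            return requestedMode;                    // default ACL governs inheritance
        }
        return (short) (requestedMode & ~umask);     // POSIX: mode & ~umask
    }
}
```

With umask 027 and requested mode 0755, a parent without default ACLs yields 0750, while a parent carrying default ACLs leaves the mode untouched for the ACL machinery to handle.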
[jira] [Assigned] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-6962: --- Assignee: Srikanth Upputuri -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306893#comment-14306893 ] Srikanth Upputuri commented on HDFS-7722: - Reassigned to you, [~eddyxu]. > DataNode#checkDiskError should also remove Storage when error is found. > --- > > Key: HDFS-7722 > URL: https://issues.apache.org/jira/browse/HDFS-7722 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > > When {{DataNode#checkDiskError}} finds disk errors, it removes all block > metadata from {{FsDatasetImpl}}. However, it does not remove the > corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. > The result is that we cannot directly run {{reconfig}} to hot swap the > failed disks without changing the configuration file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-7722: --- Assignee: Lei (Eddy) Xu (was: Srikanth Upputuri) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-7722: --- Assignee: Srikanth Upputuri -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6753) When one the Disk is full and all the volumes configured are unhealthy , then Datanode is not considering it as failure and datanode process is not shutting down .
[ https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302784#comment-14302784 ] Srikanth Upputuri commented on HDFS-6753: - A write request to a DN will first check for a disk volume with available space and then proceed to create an rbw file on it. The 'check disk error' is triggered when the rbw file cannot be created. But if a volume with sufficient space could not be found, the request just throws an exception without initiating 'check disk error'. This is reasonable because if there is no space available on any volume, the DN may still be able to service read requests, so 'not enough space' is not a sufficient condition for DN shutdown. However, if after this condition all the volumes happen to become faulty, a subsequent read request will detect this and shut down the DN anyway. Therefore there is no need to fix this behavior. > When one the Disk is full and all the volumes configured are unhealthy , then > Datanode is not considering it as failure and datanode process is not > shutting down . 
> --- > > Key: HDFS-6753 > URL: https://issues.apache.org/jira/browse/HDFS-6753 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: J.Andreina >Assignee: Srikanth Upputuri > > Env Details : > = > Cluster has 3 Datanodes > Cluster installed with "Rex" user > dfs.datanode.failed.volumes.tolerated = 3 > dfs.blockreport.intervalMsec = 18000 > dfs.datanode.directoryscan.interval = 120 > DN_XX1.XX1.XX1.XX1 data dir = > /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data > > > /home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - > permission is denied ( hence DN considered these volumes as failed ) > > Expected behavior is observed when the disk is not full: > > > Step 1: Change the permissions of /mnt/tmp_Datanode to root > > Step 2: Perform write operations ( DN detects that all configured volumes have > failed and gets shut down ) > > Scenario 1: > === > > Step 1 : Make the /mnt/tmp_Datanode disk full and change the permissions to root > Step 2 : Perform client write operations ( a disk full exception is thrown , > but the Datanode is not shut down , even though all the configured volumes > have failed ) > > {noformat} > > 2014-07-21 14:10:52,814 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation > src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010 > > org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The > volume with the most available space (=4096 B) is less than the block size > (=134217728 B). > > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60) > > {noformat} > > Observations : > == > 1. Write operations do not shut down the Datanode , even though all the configured > volumes have failed ( when one of the disks is full and permission is denied on all the disks ) > > 2. 
Directory scanning fails , still the DN is not shut down > > > > {noformat} > > 2014-07-21 14:13:00,180 WARN > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured > while compiling report: > > java.io.IOException: Invalid directory or I/O error occurred for dir: > /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized > > at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164) > > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596) > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
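The control flow described in the comment above can be sketched as a small standalone routine (illustrative names and return codes, not the actual DataNode code): volume-selection failure throws without triggering the disk check, while an rbw-creation failure does trigger it.

```java
import java.util.List;

public class WritePathSketch {
    static final long BLOCK_SIZE = 134217728L; // 128 MB, as in the log above

    // Returns "OUT_OF_SPACE" when no volume can hold a block (exception thrown,
    // checkDiskError NOT initiated), "CHECK_DISK" when a volume was chosen but
    // rbw creation failed (checkDiskError IS initiated), "OK" otherwise.
    static String handleWrite(List<Long> freeBytesPerVolume, boolean rbwCreateFails) {
        long best = freeBytesPerVolume.stream().max(Long::compare).orElse(0L);
        if (best < BLOCK_SIZE) {
            return "OUT_OF_SPACE";
        }
        if (rbwCreateFails) {
            return "CHECK_DISK";
        }
        return "OK";
    }
}
```

The reported scenario (4096 B free, all volumes unwritable) takes the "OUT_OF_SPACE" path, which is why the full-disk case never reaches the volume-failure check.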
[jira] [Work started] (HDFS-7082) When replication factor equals number of data nodes, corrupt replica will never get substituted with good replica
[ https://issues.apache.org/jira/browse/HDFS-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7082 started by Srikanth Upputuri. --- > When replication factor equals number of data nodes, corrupt replica will > never get substituted with good replica > - > > Key: HDFS-7082 > URL: https://issues.apache.org/jira/browse/HDFS-7082 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Srikanth Upputuri >Assignee: Srikanth Upputuri >Priority: Minor > > BlockManager will not invalidate a corrupt replica if this brings down the > total number of replicas below replication factor (except if the corrupt > replica has a wrong genstamp). On clusters where the replication factor = > total data nodes, a new replica can not be created from a live replica as all > the available datanodes already have a replica each. Because of this, the > corrupt replicas will never be substituted with good replicas, so will never > get deleted. Sooner or later all replicas may get corrupt and there will be > no live replicas in the cluster for this block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-2932) Under replicated block after the pipeline recovery.
[ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri resolved HDFS-2932. - Resolution: Duplicate Fix Version/s: (was: 0.24.0) Closed as duplicate of HDFS-3493. > Under replicated block after the pipeline recovery. > --- > > Key: HDFS-2932 > URL: https://issues.apache.org/jira/browse/HDFS-2932 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 0.24.0 >Reporter: J.Andreina >Assignee: Srikanth Upputuri > > Started 1 NN, DN1, DN2, DN3 on the same machine. > Wrote a huge file of size 2 GB. > While the write for block-id-1005 was in progress, brought down DN3. > After the pipeline recovery happened, the block stamp changed into block_id_1006 > on DN1, DN2. > After the write was over, DN3 was brought up and the fsck command was issued. > The following message is displayed: > "block-id_1006 is under replicated. Target replicas is 3 but found 2 replicas". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-2932) Under replicated block after the pipeline recovery.
[ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-2932: --- Assignee: Srikanth Upputuri -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7082) When replication factor equals number of data nodes, corrupt replica will never get substituted with good replica
[ https://issues.apache.org/jira/browse/HDFS-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138555#comment-14138555 ] Srikanth Upputuri commented on HDFS-7082: - Currently if the below condition in BlockManager#markBlockAsCorrupt is true we go ahead and invalidate the corrupt replica. But for the scenario in question, it will be false. {code} boolean hasMoreCorruptReplicas = minReplicationSatisfied && (numberOfReplicas.liveReplicas() + numberOfReplicas.corruptReplicas()) > bc.getBlockReplication(); {code} I propose to change this to {code} boolean hasMoreCorruptReplicas = minReplicationSatisfied && (numberOfReplicas.liveReplicas() + numberOfReplicas.corruptReplicas()) >= bc.getBlockReplication(); {code} This solves the current problem as well as retains almost all the existing behavior. Now we let the 'total replicas' become 'replication factor - 1'. And we don't let it go down beyond that. This will effectively vacate a slot on exactly one datanode and let the replication happen, thereby solving the reported problem. Example scenarios: 1. DN1, DN2, DN3, replication factor =3, DN3 replica is corrupt. The corrupt replica is invalidated and deleted. New live replica will be written to DN3. 2. DN1, DN2, DN3, replication factor =3, DN2 and DN3 replicas are corrupt. DN3 sends block report. The corrupt replica on DN3 is invalidated and deleted. DN2 sends block report. The corrupt replica on DN2 will not be invalidated as the current 'total replicas' < 'replication factor'. New live replica will eventually be written to DN3. Then on further block report from DN2, the corrupt replica will get deleted. 
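The effect of switching '>' to '>=' can be checked with a small standalone sketch of the proposed predicate (illustrative, not the BlockManager code itself):

```java
public class InvalidatePredicate {
    // Proposed condition: corrupt-replica invalidation is allowed when
    // live + corrupt replicas reach the replication factor (">=", rather
    // than the current strict ">").
    static boolean hasMoreCorruptReplicas(int live, int corrupt, int replication,
                                          boolean minReplicationSatisfied) {
        return minReplicationSatisfied && (live + corrupt) >= replication;
    }
}
```

For scenario 1 above (3 DNs, replication factor 3, one corrupt replica: live=2, corrupt=1), the new predicate allows invalidation where the old `>` comparison (3 > 3) would not; for scenario 2, once DN3's replica is deleted (live=2, corrupt=1 on DN2 then live=1, corrupt=1 after DN2's report), the sum falls below the replication factor and no further invalidation happens until a new live replica is written.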
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7082) When replication factor equals number of data nodes, corrupt replica will never get substituted with good replica
Srikanth Upputuri created HDFS-7082: --- Summary: When replication factor equals number of data nodes, corrupt replica will never get substituted with good replica Key: HDFS-7082 URL: https://issues.apache.org/jira/browse/HDFS-7082 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Srikanth Upputuri Assignee: Srikanth Upputuri Priority: Minor BlockManager will not invalidate a corrupt replica if this brings down the total number of replicas below replication factor (except if the corrupt replica has a wrong genstamp). On clusters where the replication factor = total data nodes, a new replica can not be created from a live replica as all the available datanodes already have a replica each. Because of this, the corrupt replicas will never be substituted with good replicas, so will never get deleted. Sooner or later all replicas may get corrupt and there will be no live replicas in the cluster for this block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-6805) NPE is thrown at Namenode , for every block report sent from DN
[ https://issues.apache.org/jira/browse/HDFS-6805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-6805: --- Assignee: Srikanth Upputuri > NPE is thrown at Namenode , for every block report sent from DN > --- > > Key: HDFS-6805 > URL: https://issues.apache.org/jira/browse/HDFS-6805 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: J.Andreina >Assignee: Srikanth Upputuri > > Env Details : > HA Cluster > 2 DN > Procedure : > === > While a client operation was in progress, restarted one DN . > After the restart, for every block report an NPE is thrown at the Namenode and DN side. > Namenode Log: > = > {noformat} > 2014-08-01 18:24:16,585 WARN org.apache.hadoop.ipc.Server: IPC Server handler > 3 on 8020, call > org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from > 10.18.40.14:38651 Call#7 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > {noformat} > Datanode Log: > > {noformat} > 2014-08-01 18:34:21,793 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: > RemoteException in offerService > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.moveBlockToHead(BlockInfo.java:354) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.moveBlockToHead(DatanodeStorageInfo.java:242) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1905) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1772) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1699) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019) > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152) > at > org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) > at java.security.AccessController.doPrivileged(Native Method) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7033) dfs.web.authentication.filter should be documented
[ https://issues.apache.org/jira/browse/HDFS-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-7033: --- Assignee: Srikanth Upputuri > dfs.web.authentication.filter should be documented > -- > > Key: HDFS-7033 > URL: https://issues.apache.org/jira/browse/HDFS-7033 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation, security >Affects Versions: 2.4.0 >Reporter: Allen Wittenauer >Assignee: Srikanth Upputuri > > HDFS-5716 added dfs.web.authentication.filter but this doesn't appear to be > documented anywhere. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136781#comment-14136781 ] Srikanth Upputuri commented on HDFS-6606: - {quote} In this JIRA, 3DES is used to encrypt/decrypt the negotiated cipher key (originally it was used to encrypt the transferred data). You are right, the channel confidentiality is the same, but it's enough. Our goal is to improve the performance.{quote} Thank you for the explanation. I read about AES-NI and I now understand that with a JCE provider like Diceros AES performance will significantly improve. However, if we need to provide support for increased confidentiality with AES, can we not do it by implementing GSSAPI mechanism in addition to the existing DIGEST-MD5, the same way it is implemented for rpc? The java gss api has support for AES anyway as described in http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/jgss-features.html. That way we get better performance (with AES-NI support) as well as better data privacy. I have read through all the comments but didn't quite get why this approach is not considered. Any reasons? > Optimize HDFS Encrypted Transport performance > - > > Key: HDFS-6606 > URL: https://issues.apache.org/jira/browse/HDFS-6606 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs-client, security >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, > HDFS-6606.003.patch, HDFS-6606.004.patch, > OptimizeHdfsEncryptedTransportperformance.pdf > > > In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol, > it was a great work. 
> It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf), which supports > three security strengths: > * high 3des or rc4 (128 bits) > * medium des or rc4 (56 bits) > * low rc4 (40 bits) > 3des and rc4 are slow, only *tens of MB/s*: > http://www.javamex.com/tutorials/cryptography/ciphers.shtml > http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ > I will give more detailed performance data in future. It is absolutely a > bottleneck and will vastly affect the end-to-end performance. > AES (Advanced Encryption Standard) is recommended as a replacement for DES; > it's more secure. With AES-NI support, the throughput can reach nearly > *2GB/s*, so it won't be the bottleneck any more. AES and CryptoCodec work is > supported in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add > a new mode support for AES). > This JIRA will use AES with AES-NI support as the encryption algorithm for > DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
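For reference, the negotiation knobs discussed above (Qop and cipher strength) map onto standard javax.security.sasl properties. A generic sketch, not HDFS code; which cipher is actually used then depends on the negotiated mechanism (DIGEST-MD5 vs GSSAPI):

```java
import javax.security.sasl.Sasl;
import java.util.HashMap;
import java.util.Map;

public class SaslQopProps {
    // Builds SASL properties requesting privacy protection: "auth-conf"
    // asks for authentication + integrity + confidentiality, and STRENGTH
    // expresses the cipher-strength preference order.
    static Map<String, String> privacyProps() {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, "auth-conf");            // privacy (encrypted channel)
        props.put(Sasl.STRENGTH, "high,medium,low"); // prefer the strongest cipher
        return props;
    }
}
```

These same two properties are what a GSSAPI-based negotiation would consume, which is why swapping the mechanism (as the comment suggests) would not change the property surface, only the available ciphers.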
[jira] [Commented] (HDFS-2932) Under replicated block after the pipeline recovery.
[ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133800#comment-14133800 ] Srikanth Upputuri commented on HDFS-2932: - [~vinayrpet], though I really don't see a reason why we should not delete a mis-stamped replica (during block report processing) after the block is committed, I agree with you that this improvement in early detection may be unnecessary (or even slightly risky?), particularly when the benefit is very little. Can I mark it a duplicate of HDFS-3493? > Under replicated block after the pipeline recovery. > --- > > Key: HDFS-2932 > URL: https://issues.apache.org/jira/browse/HDFS-2932 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 0.24.0 >Reporter: J.Andreina > Fix For: 0.24.0 > > > Started 1 NN, DN1, DN2, DN3 on the same machine. > Wrote a huge file of size 2 GB. > While the write for block-id-1005 was in progress, DN3 was brought down. > After the pipeline recovery, the block stamp changed to block_id_1006 > on DN1 and DN2. > After the write was over, DN3 was brought up and the fsck command was issued. > The following message was displayed: > "block-id_1006 is under-replicated. Target replicas is 3 but found 2 replicas". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-6753) When one of the disks is full and all the configured volumes are unhealthy, the Datanode does not consider it a failure and the Datanode process does not shut down.
[ https://issues.apache.org/jira/browse/HDFS-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-6753: --- Assignee: Srikanth Upputuri > When one of the disks is full and all the configured volumes are unhealthy, > the Datanode does not consider it a failure and the Datanode process does not > shut down. > --- > > Key: HDFS-6753 > URL: https://issues.apache.org/jira/browse/HDFS-6753 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: J.Andreina >Assignee: Srikanth Upputuri > > Env Details : > = > Cluster has 3 Datanodes > Cluster installed with the "Rex" user > dfs.datanode.failed.volumes.tolerated = 3 > dfs.blockreport.intervalMsec = 18000 > dfs.datanode.directoryscan.interval = 120 > DN_XX1.XX1.XX1.XX1 data dir = > /mnt/tmp_Datanode,/home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data > > > /home/REX/data/dfs1/data,/home/REX/data/dfs2/data,/opt/REX/dfs/data - > permission is denied (hence the DN considered these volumes as failed) > > Expected behavior is observed when the disk is not full: > > > Step 1: Change the permissions of /mnt/tmp_Datanode to root > > Step 2: Perform write operations (the DN detects that all configured volumes > have failed and shuts down) > > Scenario 1: > === > > Step 1 : Make the /mnt/tmp_Datanode disk full and change the permissions to root > Step 2 : Perform client write operations (a disk full exception is thrown, > but the Datanode does not shut down, even though all the configured volumes > have failed) > > {noformat} > > 2014-07-21 14:10:52,814 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: > XX1.XX1.XX1.XX1:50010:DataXceiver error processing WRITE_BLOCK operation > src: /XX2.XX2.XX2.XX2:10106 dst: /XX1.XX1.XX1.XX1:50010 > > org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: Out of space: The > volume with the most available space (=4096 B) is less than the block size > (=134217728 B). 
> > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy.chooseVolume(RoundRobinVolumeChoosingPolicy.java:60) > > {noformat} > > Observations : > == > 1. Write operations do not shut down the Datanode, even though all the > configured volumes have failed (when one of the disks is full and permission > is denied for all the disks) > > 2. Directory scanning fails, but the DN still does not shut down > > > > {noformat} > > 2014-07-21 14:13:00,180 WARN > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: Exception occured > while compiling report: > > java.io.IOException: Invalid directory or I/O error occurred for dir: > /mnt/tmp_Datanode/current/BP-1384489961-XX2.XX2.XX2.XX2-845784615183/current/finalized > > at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1164) > > at > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner$ReportCompiler.compileReport(DirectoryScanner.java:596) > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
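The behavior the report expects can be sketched as a tiny predicate. This is a hypothetical simplification with invented names, not the actual DataNode volume-failure handling, but it captures the semantics of dfs.datanode.failed.volumes.tolerated that the scenario above exercises:

```java
public class VolumeFailureSketch {
    // Decide whether a DataNode should shut down given volume health.
    // Expected semantics: shut down when failures exceed the tolerated
    // count, or when no usable volume remains at all (the full disk in the
    // report is unusable even if it is not counted as "failed").
    static boolean shouldShutdown(int totalVolumes, int usableVolumes, int tolerated) {
        int failed = totalVolumes - usableVolumes;
        if (failed > tolerated) return true;
        return usableVolumes == 0; // nothing left to write to
    }

    public static void main(String[] args) {
        // The scenario in this report: 4 volumes, none usable, tolerated = 3
        // -- the node should still shut down.
        assert shouldShutdown(4, 0, 3);
        // One failure out of four with tolerated = 3: keep running.
        assert !shouldShutdown(4, 3, 3);
    }
}
```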
[jira] [Commented] (HDFS-6606) Optimize HDFS Encrypted Transport performance
[ https://issues.apache.org/jira/browse/HDFS-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131300#comment-14131300 ] Srikanth Upputuri commented on HDFS-6606: - This is a very nice effort. Reading through this jira and HDFS-3637 has been a great deal of learning for me. But I have a couple of fundamental questions here. Does this patch improve data transfer speed? But isn't the existing RC4 option much faster (as shown in the comparison analysis)? Does this patch improve the data transfer channel confidentiality? But if we transfer the AES keys and IVs over a 3DES-encrypted channel, isn't the overall confidentiality effectively the same, since someone who can successfully intercept and decrypt the 3DES traffic can read the AES keys? Am I missing something here? > Optimize HDFS Encrypted Transport performance > - > > Key: HDFS-6606 > URL: https://issues.apache.org/jira/browse/HDFS-6606 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, hdfs-client, security >Reporter: Yi Liu >Assignee: Yi Liu > Attachments: HDFS-6606.001.patch, HDFS-6606.002.patch, > HDFS-6606.003.patch, HDFS-6606.004.patch, > OptimizeHdfsEncryptedTransportperformance.pdf > > > In HDFS-3637, [~atm] added support for encrypting the DataTransferProtocol; > it was great work. > It utilizes the SASL {{Digest-MD5}} mechanism (with Qop: auth-conf), which supports > three security strengths: > * high: 3des or rc4 (128 bits) > * medium: des or rc4 (56 bits) > * low: rc4 (40 bits) > 3des and rc4 are slow, only *tens of MB/s*: > http://www.javamex.com/tutorials/cryptography/ciphers.shtml > http://www.cs.wustl.edu/~jain/cse567-06/ftp/encryption_perf/ > I will give more detailed performance data in the future. It's absolutely a > bottleneck and will vastly affect the end-to-end performance. 
> AES (Advanced Encryption Standard) is recommended as a replacement for DES, > as it's more secure; with AES-NI support, the throughput can reach nearly > *2 GB/s*, so it won't be the bottleneck any more. The AES and CryptoCodec work is > covered in HADOOP-10150, HADOOP-10603 and HADOOP-10693 (we may need to add > support for a new AES mode). > This JIRA will use AES with AES-NI support as the encryption algorithm for > DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3586) Blocks are not getting replicated even when DNs are available.
[ https://issues.apache.org/jira/browse/HDFS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126903#comment-14126903 ] Srikanth Upputuri commented on HDFS-3586: - HDFS-3493 resolves the same issue. Now, if the number of live replicas is more than the minimum required and the total replica count (live + corrupt) is more than the replication factor, we invalidate the extra corrupt replica(s). Also, if the replica happens to be one that was discarded during a pipeline recovery, it will be invalidated if there is the minimum number of live replicas, irrespective of the total replica count. However, there is one possibility that can result in the NN sending replication requests to copy a block to a DN with a write-pipeline-failed replica. This is if the block is still being written when the reconnected DN sends a block report with an RBW/RWR replica for this block. I discussed this scenario in more detail in HDFS-2932. But apart from this situation, I think this jira can be closed as a duplicate of HDFS-3493. Please suggest. > Blocks are not getting replicated even when DNs are available. > > > Key: HDFS-3586 > URL: https://issues.apache.org/jira/browse/HDFS-3586 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Brahma Reddy Battula >Assignee: amith > Attachments: HDFS-3586-analysis.txt > > > Scenario: > = > Started four DNs (say DN1, DN2, DN3 and DN4). > Wrote files with RF=3. > Formed a pipeline with DN1->DN2->DN3. > Since the DN3 network is very slow, it was not able to send acks. > The pipeline was formed again as DN1->DN2->DN4. > Here the DN4 network is also slow. > So finally commitBlockSynchronization happened successfully to DN1 and DN2. > The block is present on all four DNs (in finalized state on two DNs and in rbw state > on the other DNs). > Here the NN asks DN3 and DN4 to replicate, but it fails since replicas > are already present in the RBW dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
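The invalidation rule summarized in the comment above can be sketched as a standalone predicate. This is a hypothetical condensation of the described behavior, not the actual BlockManager code, and all names are invented:

```java
public class CorruptReplicaSketch {
    // Condenses the rule described above: a corrupt replica is safe to
    // invalidate when enough live replicas exist. For a replica known to be
    // left over from a failed write pipeline, the total replica count is
    // not considered.
    static boolean canInvalidate(boolean fromFailedPipeline, int live, int corrupt,
                                 int minReplicas, int replicationFactor) {
        if (fromFailedPipeline) {
            return live >= minReplicas;
        }
        return live > minReplicas && live + corrupt > replicationFactor;
    }

    public static void main(String[] args) {
        // RF = 3, min = 1: two live finalized replicas plus two stale RBW
        // replicas (roughly the HDFS-3586 scenario) -> the stale ones can go.
        assert canInvalidate(true, 2, 2, 1, 3);
        // Ordinary corrupt replica with total == RF (live=2, corrupt=1):
        // 2 + 1 is not more than 3, so it is kept for now.
        assert !canInvalidate(false, 2, 1, 1, 3);
    }
}
```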
[jira] [Commented] (HDFS-2932) Under replicated block after the pipeline recovery.
[ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125490#comment-14125490 ] Srikanth Upputuri commented on HDFS-2932: - Further analysing the two cases detailed by Vinay: *Case 1*. I think the fix given for HDFS-3493 will solve this case, as the corrupt replica (the result of the pipeline failure) will eventually be invalidated, in spite of the fact that total replicas = replication factor. Please confirm. *Case 2*. If a write-pipeline-failed replica from a restarted DN arrives before the stored block is 'completed', it will not be marked as corrupt. Later, when the NN computes the replication work, it is not aware that a corrupt replica exists on DN3, so it will keep scheduling replication from, say, DN2 to DN3 without success till the next block report from DN3 is processed. {code} //BlockManager#checkReplicaCorrupt case RBW: case RWR: if (!storedBlock.isComplete()) { return null; // not corrupt } {code} There are two exclusive time windows when such a replica can be reported: the DN restarts and the replica is reported before the client has finished writing the block, i.e. the block is not yet 'committed'; or the DN restarts and the replica is reported after 'commit' but before 'complete'. The solution is to detect and capture a write-pipeline-failed replica as early as possible. A first fix may be to change the check from 'isComplete' to 'isCommitted'. This will capture write-pipeline-failed replicas reported just after 'commit' and before 'complete' and mark them as corrupt. Then, to capture write-pipeline-failed replicas reported before commit, I am investigating whether this can be solved by marking them as corrupt as part of commit. There already exists a check to find any mis-stamped replicas during commit, but we only remove them from the blocksMap. In addition, can we not mark such replicas as corrupt? {code} //BlockInfoUnderConstruction#commitBlock // Sort out invalid replicas. 
setGenerationStampAndVerifyReplicas(block.getGenerationStamp()); {code} Any thoughts/suggestions? > Under replicated block after the pipeline recovery. > --- > > Key: HDFS-2932 > URL: https://issues.apache.org/jira/browse/HDFS-2932 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 0.24.0 >Reporter: J.Andreina > Fix For: 0.24.0 > > > Started 1 NN, DN1, DN2, DN3 on the same machine. > Wrote a huge file of size 2 GB. > While the write for block-id-1005 was in progress, DN3 was brought down. > After the pipeline recovery, the block stamp changed to block_id_1006 > on DN1 and DN2. > After the write was over, DN3 was brought up and the fsck command was issued. > The following message was displayed: > "block-id_1006 is under-replicated. Target replicas is 3 but found 2 replicas". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
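The proposed change from 'isComplete' to 'isCommitted' can be illustrated with a small state sketch. The enum and method names here are invented stand-ins for the NameNode's block state machine, not the real API; the sketch only shows which reporting window each check covers:

```java
public class ReplicaCheckSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    // Current behavior: a reported RBW/RWR replica is only treated as
    // corrupt once the stored block is COMPLETE.
    static boolean rbwIsCorrupt(BlockState state) {
        return state == BlockState.COMPLETE;
    }

    // Proposed: treat it as corrupt once the block is COMMITTED, closing
    // the commit-to-complete window described in the comment above.
    static boolean rbwIsCorruptProposed(BlockState state) {
        return state != BlockState.UNDER_CONSTRUCTION;
    }

    public static void main(String[] args) {
        // A stale RBW replica reported after commit but before complete:
        assert !rbwIsCorrupt(BlockState.COMMITTED);        // missed today
        assert rbwIsCorruptProposed(BlockState.COMMITTED); // caught by the fix
        // Before commit, neither check fires; that window needs the
        // mark-corrupt-during-commit idea instead.
        assert !rbwIsCorruptProposed(BlockState.UNDER_CONSTRUCTION);
    }
}
```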
[jira] [Commented] (HDFS-6703) NFS: Files can be deleted from a read-only mount
[ https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071501#comment-14071501 ] Srikanth Upputuri commented on HDFS-6703: - Thanks [~brandonli] and [~abutala] for your quick responses and support! > NFS: Files can be deleted from a read-only mount > > > Key: HDFS-6703 > URL: https://issues.apache.org/jira/browse/HDFS-6703 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.2.0 >Reporter: Abhiraj Butala >Assignee: Srikanth Upputuri > Fix For: 2.5.0 > > Attachments: HDFS-6703.patch > > > > As reported by bigdatagroup on hadoop-users mailing > list: > {code} > We exported our distributed filesystem with the following configuration > (Managed by Cloudera Manager over CDH 5.0.1): > > dfs.nfs.exports.allowed.hosts > 192.168.0.153 ro > > As you can see, we expect the exported FS to be read-only, but in fact we are > able to delete files and folders stored on it (where the user has the correct > permissions), from the client machine that mounted the FS. > Other writing operations are correctly blocked. > Hadoop Version in use: 2.3.0+cdh5.0.1+567" > {code} > I was able to reproduce the issue on latest hadoop trunk. 
Though I could only > delete files, deleting directories was correctly blocked: > {code} > abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127 > 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1) > abutala@abutala-vBox:/mnt/hdfs$ ls -lh > total 512 > -rw-r--r-- 1 abutala supergroup 0 Jul 17 18:51 abc.txt > drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp > abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ rm -r temp > rm: cannot remove `temp': Permission denied > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ > {code} > Contents of hdfs-site.xml: > {code} > <property> > <name>dfs.nfs3.dump.dir</name> > <value>/tmp/.hdfs-nfs3</value> > </property> > <property> > <name>dfs.nfs.exports.allowed.hosts</name> > <value>localhost ro</value> > </property> > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount
[ https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri updated HDFS-6703: Status: Patch Available (was: Open) > NFS: Files can be deleted from a read-only mount > > > Key: HDFS-6703 > URL: https://issues.apache.org/jira/browse/HDFS-6703 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.2.0 >Reporter: Abhiraj Butala >Assignee: Srikanth Upputuri > Attachments: HDFS-6703.patch > > > > As reported by bigdatagroup on hadoop-users mailing > list: > {code} > We exported our distributed filesystem with the following configuration > (Managed by Cloudera Manager over CDH 5.0.1): > > dfs.nfs.exports.allowed.hosts > 192.168.0.153 ro > > As you can see, we expect the exported FS to be read-only, but in fact we are > able to delete files and folders stored on it (where the user has the correct > permissions), from the client machine that mounted the FS. > Other writing operations are correctly blocked. > Hadoop Version in use: 2.3.0+cdh5.0.1+567" > {code} > I was able to reproduce the issue on latest hadoop trunk. Though I could only > delete files, deleting directories were correctly blocked: > {code} > abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127 > 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1) > abutala@abutala-vBox:/mnt/hdfs$ ls -lh > total 512 > -rw-r--r-- 1 abutala supergroup 0 Jul 17 18:51 abc.txt > drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp > abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ rm -r temp > rm: cannot remove `temp': Permission denied > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ > {code} > Contents of hdfs-site.xml: > {code} > > > dfs.nfs3.dump.dir > /tmp/.hdfs-nfs3 > > > dfs.nfs.exports.allowed.hosts > localhost ro > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount
[ https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri updated HDFS-6703: Attachment: HDFS-6703.patch Attached a patch. Please review. > NFS: Files can be deleted from a read-only mount > > > Key: HDFS-6703 > URL: https://issues.apache.org/jira/browse/HDFS-6703 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.2.0 >Reporter: Abhiraj Butala >Assignee: Srikanth Upputuri > Attachments: HDFS-6703.patch > > > > As reported by bigdatagroup on hadoop-users mailing > list: > {code} > We exported our distributed filesystem with the following configuration > (Managed by Cloudera Manager over CDH 5.0.1): > > dfs.nfs.exports.allowed.hosts > 192.168.0.153 ro > > As you can see, we expect the exported FS to be read-only, but in fact we are > able to delete files and folders stored on it (where the user has the correct > permissions), from the client machine that mounted the FS. > Other writing operations are correctly blocked. > Hadoop Version in use: 2.3.0+cdh5.0.1+567" > {code} > I was able to reproduce the issue on latest hadoop trunk. 
Though I could only > delete files, deleting directories were correctly blocked: > {code} > abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127 > 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1) > abutala@abutala-vBox:/mnt/hdfs$ ls -lh > total 512 > -rw-r--r-- 1 abutala supergroup 0 Jul 17 18:51 abc.txt > drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp > abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ rm -r temp > rm: cannot remove `temp': Permission denied > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ > {code} > Contents of hdfs-site.xml: > {code} > > > dfs.nfs3.dump.dir > /tmp/.hdfs-nfs3 > > > dfs.nfs.exports.allowed.hosts > localhost ro > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6703) NFS: Files can be deleted from a read-only mount
[ https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066332#comment-14066332 ] Srikanth Upputuri commented on HDFS-6703: - I am interested in working on this. Below is my initial analysis. The access privilege check seems to be missing from the 'remove' implementation in RpcProgramNfs3.java. This check is present for 'rmdir', as shown below: {code} if (!checkAccessPrivilege(client, AccessPrivilege.READ_WRITE)) { return new RMDIR3Response(Nfs3Status.NFS3ERR_ACCES, errWcc); } {code} Any thoughts? I will analyze further and will update soon. > NFS: Files can be deleted from a read-only mount > > > Key: HDFS-6703 > URL: https://issues.apache.org/jira/browse/HDFS-6703 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Reporter: Abhiraj Butala >Assignee: Srikanth Upputuri > > > As reported by bigdatagroup on the hadoop-users mailing > list: > {code} > We exported our distributed filesystem with the following configuration > (Managed by Cloudera Manager over CDH 5.0.1): > > dfs.nfs.exports.allowed.hosts > 192.168.0.153 ro > > As you can see, we expect the exported FS to be read-only, but in fact we are > able to delete files and folders stored on it (where the user has the correct > permissions), from the client machine that mounted the FS. > Other writing operations are correctly blocked. > Hadoop Version in use: 2.3.0+cdh5.0.1+567" > {code} > I was able to reproduce the issue on latest hadoop trunk. 
Though I could only > delete files, deleting directories were correctly blocked: > {code} > abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127 > 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1) > abutala@abutala-vBox:/mnt/hdfs$ ls -lh > total 512 > -rw-r--r-- 1 abutala supergroup 0 Jul 17 18:51 abc.txt > drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp > abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ rm -r temp > rm: cannot remove `temp': Permission denied > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ > {code} > Contents of hdfs-site.xml: > {code} > > > dfs.nfs3.dump.dir > /tmp/.hdfs-nfs3 > > > dfs.nfs.exports.allowed.hosts > localhost ro > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6703) NFS: Files can be deleted from a read-only mount
[ https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srikanth Upputuri reassigned HDFS-6703: --- Assignee: Srikanth Upputuri > NFS: Files can be deleted from a read-only mount > > > Key: HDFS-6703 > URL: https://issues.apache.org/jira/browse/HDFS-6703 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Reporter: Abhiraj Butala >Assignee: Srikanth Upputuri > > > As reported by bigdatagroup on hadoop-users mailing > list: > {code} > We exported our distributed filesystem with the following configuration > (Managed by Cloudera Manager over CDH 5.0.1): > > dfs.nfs.exports.allowed.hosts > 192.168.0.153 ro > > As you can see, we expect the exported FS to be read-only, but in fact we are > able to delete files and folders stored on it (where the user has the correct > permissions), from the client machine that mounted the FS. > Other writing operations are correctly blocked. > Hadoop Version in use: 2.3.0+cdh5.0.1+567" > {code} > I was able to reproduce the issue on latest hadoop trunk. Though I could only > delete files, deleting directories were correctly blocked: > {code} > abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127 > 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1) > abutala@abutala-vBox:/mnt/hdfs$ ls -lh > total 512 > -rw-r--r-- 1 abutala supergroup 0 Jul 17 18:51 abc.txt > drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp > abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ rm -r temp > rm: cannot remove `temp': Permission denied > abutala@abutala-vBox:/mnt/hdfs$ ls > temp > abutala@abutala-vBox:/mnt/hdfs$ > {code} > Contents of hdfs-site.xml: > {code} > > > dfs.nfs3.dump.dir > /tmp/.hdfs-nfs3 > > > dfs.nfs.exports.allowed.hosts > localhost ro > > > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)