[ https://issues.apache.org/jira/browse/HIVE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938184#comment-13938184 ]

Joe Rao commented on HIVE-6489:
-------------------------------

Put succinctly, the problem is:
- Hadoop daemons create the /tmp/hive-<username> directory, so it is owned by
the daemons' group
- User data loaded via LOAD DATA LOCAL INPATH is staged in
/tmp/hive-<username> and inherits that directory's group ownership
- The data is then moved to the table directory, but it keeps the group
ownership of /tmp/hive-<username>
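The mechanism can be illustrated with a toy model. HDFS gives a newly created file the group of its parent directory (the "BSD rule" in the HDFS Permissions Guide), and a rename leaves the group unchanged. The class below is a hypothetical in-memory simulation of just those two rules, not Hive or Hadoop code, and the staging path is simplified (the real staging directory is nested deeper under /tmp/hive-<username>).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of HDFS group assignment; not Hive or Hadoop code. */
public class GroupInheritanceModel {
    // Maps each path to the group that owns it.
    private final Map<String, String> groups = new HashMap<>();

    // Create a directory owned by the given group.
    void mkdir(String path, String group) {
        groups.put(path, group);
    }

    // BSD rule: a new file takes the group of its parent directory.
    void create(String path) {
        String parent = path.substring(0, path.lastIndexOf('/'));
        groups.put(path, groups.get(parent));
    }

    // A rename (move) keeps the entry's existing group.
    void rename(String src, String dst) {
        groups.put(dst, groups.remove(src));
    }

    String groupOf(String path) {
        return groups.get(path);
    }

    public static void main(String[] args) {
        GroupInheritanceModel fs = new GroupInheritanceModel();
        fs.mkdir("/tmp/hive-testuser1", "hadoop");               // created by the daemons
        fs.mkdir("/apps/hive/warehouse/testtable", "testgroup"); // the table directory
        fs.create("/tmp/hive-testuser1/test.txt");               // staged by LOAD DATA LOCAL INPATH
        fs.rename("/tmp/hive-testuser1/test.txt",
                  "/apps/hive/warehouse/testtable/test.txt");    // moved into the table
        // Prints the staging directory's group, not the table's group.
        System.out.println(fs.groupOf("/apps/hive/warehouse/testtable/test.txt"));
    }
}
```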

The desired behavior is:
- Data loaded with LOAD DATA LOCAL INPATH inherits the group ownership of the 
table directory

This could be solved by:
- Removing the need to stage the data in /tmp, OR
- Adding a step to LOAD DATA LOCAL INPATH that changes the group ownership to
that of the table directory after the load completes (probably the easier of
the two)
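The second option amounts to "move, then set the file's group to the destination directory's group". On HDFS that step would use FileSystem#getFileStatus(tableDir).getGroup() followed by FileSystem#setOwner(file, null, group). Since a runnable HDFS example needs a cluster, the sketch below is a local-filesystem analogue using java.nio on a POSIX filesystem. The helper name moveAndInheritGroup is hypothetical and this is not how Hive implements LOAD DATA. As with POSIX chgrp, a non-superuser can only switch a file to a group they belong to.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.GroupPrincipal;
import java.nio.file.attribute.PosixFileAttributeView;
import java.nio.file.attribute.PosixFileAttributes;

/** Local-filesystem analogue of "move, then chgrp to the destination's group". */
public class MoveAndInheritGroup {

    /** Hypothetical helper: moves src into destDir, then applies destDir's group. Requires POSIX. */
    static Path moveAndInheritGroup(Path src, Path destDir) throws IOException {
        Path dest = destDir.resolve(src.getFileName());
        Files.move(src, dest, StandardCopyOption.REPLACE_EXISTING);

        // Read the group that owns the destination (table) directory.
        GroupPrincipal tableGroup =
            Files.readAttributes(destDir, PosixFileAttributes.class).group();

        // The missing post-load step: give the moved file that group.
        PosixFileAttributeView view =
            Files.getFileAttributeView(dest, PosixFileAttributeView.class);
        if (view == null) {
            throw new IOException("POSIX file attributes are not supported here");
        }
        view.setGroup(tableGroup);
        return dest;
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("hive-staging");
        Path table = Files.createTempDirectory("testtable");
        Path staged = Files.write(staging.resolve("test.txt"),
                                  "Hello World\n".getBytes(StandardCharsets.UTF_8));

        Path loaded = moveAndInheritGroup(staged, table);
        System.out.println(loaded + " group="
            + Files.readAttributes(loaded, PosixFileAttributes.class).group().getName());
    }
}
```

Doing the group change after the move, rather than changing how staging works, keeps the existing staging path untouched, which is why this option looks like the smaller change.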

> Data loaded with LOAD DATA LOCAL INPATH has incorrect group ownership
> ---------------------------------------------------------------------
>
>                 Key: HIVE-6489
>                 URL: https://issues.apache.org/jira/browse/HIVE-6489
>             Project: Hive
>          Issue Type: Bug
>          Components: Import/Export
>    Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
>         Environment: OS and hardware are irrelevant.  Tested and reproduced 
> on multiple configurations, including SLES, RHEL, VM, Teradata Hadoop 
> Appliance, HDP 1.1, HDP 1.3.2, HDP 2.0.
>            Reporter: Joe Rao
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Data uploaded by <user> via the Hive client with the "LOAD DATA LOCAL INPATH"
> method gets the group ownership of hdfs:///tmp/hive-<user> instead of the
> primary group of <user>.  By default, hdfs:///tmp/hive-<user> belongs to the
> group of the user that runs the Hadoop daemons.  This means that, on a Hadoop
> system with default file permissions of 770, data loaded into Hive via LOAD
> DATA LOCAL INPATH by one user cannot be read by another user in the same
> group until the group ownership is changed manually, either on the files in
> Hive's warehouse directory or on hdfs:///tmp/hive-<user>.  The problem does
> not occur with the LOAD DATA INPATH method or with regular HDFS loads.
> Steps to reproduce the problem on a pseudo-distributed Hadoop cluster:
> - In hdfs-site.xml, set the umask to 007 (so that default permissions on
> files are 770).  The property was called "dfs.umaskmode" before Hadoop 2.0,
> where it was renamed "fs.permissions.umask-mode".
> - Restart HDFS.
> - Create a group called "testgroup".
> - Create two users that have testgroup as their primary group.  Call them
> "testuser1" and "testuser2".
> - Create a test file containing "Hello World" and call it "test.txt".  Store
> it on the local filesystem.
> - As testuser1, create a table called "testtable" in Hive with a single
> string column, textfile format, and comma-delimited fields.
> - As testuser1, use the LOAD DATA LOCAL INPATH command to load "test.txt"
> into testtable.
> - As testuser2, attempt to read testtable.  The read fails with a
> permissions error, although it should succeed.
> - Examine the contents of the hdfs:///apps/hive/warehouse/testtable
> directory.  The file belongs to the "hadoop", "users", or analogous group
> instead of the correct group "testgroup", although its permissions are
> correctly 770.
> - Change the group ownership of the directory "hdfs:///tmp/hive-testuser1" to
> "testgroup".
> - Repeat the data load.  testuser2 can now read the data, and the file has
> the correct group ownership.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
