[
https://issues.apache.org/jira/browse/HIVE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938184#comment-13938184
]
Joe Rao commented on HIVE-6489:
-------------------------------
A succinct way of wording the problem is:
- Hadoop daemons create the /tmp/hive-<username> directory, so it is owned by
the daemons' group
- User data loaded via LOAD DATA LOCAL INPATH is staged in
/tmp/hive-<username>, inheriting its group ownership
- User data is moved to the table directory, but keeps the group ownership of
/tmp/hive-<username>
The desired behavior is:
- Data loaded with LOAD DATA LOCAL INPATH inherits the group ownership of the
table directory
This could be solved by:
- Removing the need to stage the data in /tmp, OR
- Adding a step to LOAD DATA LOCAL INPATH that changes the group ownership
after the load completes (probably the easier of the two)
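
To make the sequence and the second option concrete, here is a rough sketch
using a toy in-memory model. It is not Hive or HDFS code: ToyFS,
load_data_local and the fix_group flag are hypothetical names invented for
illustration, and the /warehouse/testtable path is a placeholder. The model
only assumes two rules: a new file takes the group of its parent directory,
and a rename leaves the group untouched.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    owner: str
    group: str


@dataclass
class ToyFS:
    """Toy model of file ownership; hypothetical, not a real HDFS client."""
    entries: dict = field(default_factory=dict)

    def mkdir(self, path, owner, group):
        self.entries[path] = Entry(owner, group)

    def create(self, path, owner):
        # Assumed rule: a new file takes the group of its parent directory.
        parent = path.rsplit("/", 1)[0]
        self.entries[path] = Entry(owner, self.entries[parent].group)

    def rename(self, src, dst):
        # Assumed rule: a move keeps the existing owner and group.
        self.entries[dst] = self.entries.pop(src)

    def chgrp(self, path, group):
        self.entries[path].group = group


def load_data_local(fs, user, staging_dir, table_dir, name, fix_group=False):
    """Stage a file, move it into the table directory, optionally fix its group."""
    staged = f"{staging_dir}/{name}"
    final = f"{table_dir}/{name}"
    fs.create(staged, user)   # inherits the staging directory's group
    fs.rename(staged, final)  # group is carried along unchanged
    if fix_group:
        # Proposed extra step: adopt the table directory's group.
        fs.chgrp(final, fs.entries[table_dir].group)
    return fs.entries[final].group


fs = ToyFS()
fs.mkdir("/tmp/hive-testuser1", "testuser1", "hadoop")
fs.mkdir("/warehouse/testtable", "testuser1", "testgroup")

staging, table = "/tmp/hive-testuser1", "/warehouse/testtable"
print(load_data_local(fs, "testuser1", staging, table, "a.txt"))
print(load_data_local(fs, "testuser1", staging, table, "b.txt", fix_group=True))
```

Without the extra step the loaded file ends up in the staging directory's
group; with it, the file ends up in the table directory's group. A real fix
would presumably make the equivalent group change through the Hadoop
filesystem API once the move has finished.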
> Data loaded with LOAD DATA LOCAL INPATH has incorrect group ownership
> ---------------------------------------------------------------------
>
> Key: HIVE-6489
> URL: https://issues.apache.org/jira/browse/HIVE-6489
> Project: Hive
> Issue Type: Bug
> Components: Import/Export
> Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0
> Environment: OS and hardware are irrelevant. Tested and reproduced
> on multiple configurations, including SLES, RHEL, VM, Teradata Hadoop
> Appliance, HDP 1.1, HDP 1.3.2, HDP 2.0.
> Reporter: Joe Rao
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Data uploaded by <user> via the Hive client with the "LOAD DATA LOCAL INPATH"
> method takes the group ownership of the hdfs://tmp/hive-<user> directory
> instead of the primary group that <user> belongs to. The group ownership of
> hdfs://tmp/hive-<user> is, by default, the group of the user that runs the
> Hadoop daemons. This means that, on a Hadoop system with default file
> permissions of 770, any data loaded into Hive via the LOAD DATA LOCAL INPATH
> method by one user cannot be seen by another user in the same group until the
> group ownership is manually changed in Hive's internal directory, or the
> group ownership is manually changed on hdfs://tmp/hive-<user>. This problem
> is not present with the LOAD DATA INPATH method or with regular HDFS loads.
> Steps to reproduce the problem on a pseudo-distributed Hadoop cluster:
> - In hdfs-site.xml, modify the umask to 007 (meaning that default permissions
> on files are 770). The property changes names in Hadoop 2.0 but used to be
> called "dfs.umaskmode".
> - Restart HDFS.
> - Create a group called "testgroup".
> - Create two users that have testgroup as their primary group. Call them
> "testuser1" and "testuser2"
> - Create a test file containing "Hello World" and call it "test.txt". It
> should be stored on the local filesystem.
> - Create a table called "testtable" in Hive using testuser1. Give it a
> single string column, textfile format, comma delimited fields.
> - Have testuser1 use the LOAD DATA LOCAL INPATH command to load "test.txt"
> into testtable.
> - Attempt to read testtable using testuser2. The read will fail with a
> permissions error, although it should succeed.
> - Examine the contents of the hdfs://apps/hive/warehouse/testtable directory.
> The file will belong to the "hadoop" or "users" or analogous group, instead
> of the correct group "testgroup". It will have correct permissions of 770.
> - Change the group ownership of the folder "hdfs://tmp/hive-testuser1" to
> "testgroup".
> - Repeat the data load. testuser2 will now be able to correctly read the
> data, and the file will have the correct group ownership.