I favor the option of a conf variable, "strict.owner.mode", to indicate that dirs will not be created by the server and will instead be created by the client. In installations where there are thrift clients, this can be set to false until the clients are ready to create the dirs themselves. Is this an acceptable solution? If so, I can open a JIRA with this proposed solution.
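A minimal sketch of how the proposed flag might gate directory creation on each side. It assumes a plain `Map<String, String>` in place of the Hive configuration and a hypothetical `DirCreator` in place of `Warehouse.mkdirs()`; only the key name `strict.owner.mode` comes from the proposal, and none of these classes are existing Hive APIs. An unset key reads as false, which keeps the current server-creates-dirs behavior for thrift clients.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Warehouse.mkdirs(); not a real Hive interface.
interface DirCreator {
    void mkdirs(String path);
}

// Illustrative model of the proposed "strict.owner.mode" behavior.
final class StrictOwnerMode {
    static final String KEY = "strict.owner.mode";

    // A missing key, or any value other than "true", means false (legacy behavior).
    static boolean isStrict(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.get(KEY));
    }

    // Server side: create the table dir only when NOT in strict mode.
    static void serverCreateTable(Map<String, String> serverConf, String location, DirCreator serverFs) {
        // ... persist the table metadata here (omitted) ...
        if (!isStrict(serverConf)) {
            serverFs.mkdirs(location); // dir owned by the server's user
        }
    }

    // Client side: create the dir itself only when in strict mode.
    static void clientAfterCreateTable(Map<String, String> clientConf, String location, DirCreator clientFs) {
        if (isStrict(clientConf)) {
            clientFs.mkdirs(location); // dir owned by the calling user
        }
    }
}

// Records mkdirs calls so the behavior can be checked without a real filesystem.
final class RecordingDirCreator implements DirCreator {
    final String user;
    final List<String> created = new ArrayList<>();

    RecordingDirCreator(String user) { this.user = user; }

    @Override
    public void mkdirs(String path) { created.add(user + ":" + path); }
}
```

If the two sides disagree, either nobody creates the directory (server true, client false) or both try to (server false, client true), which is why the proposal asks for the same value on client and server.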
Thoughts?

Pradeep

________________________________
From: Pradeep Kamath [mailto:prade...@yahoo-inc.com]
Sent: Tuesday, July 20, 2010 10:10 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

In addition to the options below, if there is some way to add custom code to thrift clients, that could be a third option. From what little I know of thrift, I think the client code is generated and there is no way to add additional logic to the methods, but if there is a way to do that, it might be the best option.

________________________________
From: Pradeep Kamath [mailto:prade...@yahoo-inc.com]
Sent: Monday, July 19, 2010 1:09 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

I agree this will be an issue for direct thrift clients. How about the following options:

1) Add a conf variable, "strict.owner.mode". If this is set to true on the server, dirs will not be created by the server; they will be created on the client instead (client and server should have the same value, true or false).

OR

2) Add a new API method to the thrift API that takes an extra Boolean argument indicating whether or not to create dirs. The HiveMetaStoreClient code will call this new API with a "false" argument value and create the dir on the client side. The issue with this is that existing thrift clients would keep calling the current API method, which creates dirs as the thrift server user. So depending on whether you create the table using thrift (with the old method) or the CLI, you get different results. The old method could be deprecated and the thrift clients could migrate to the new one.

Thoughts?
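A sketch of option 2 combined with client-side directory creation. The service interface, `createTableWithDirOption`, `getTableLocation` and `DirCreatingClient` are all hypothetical simplifications, not the generated thrift API or the real `HiveMetaStoreClient`; `getTableLocation` stands in for `getTable().getSd().getLocation()`. Because thrift service definitions do not allow overloaded method names, the new call gets its own name, and the deprecated old call delegates to it with `createDirs = true` so existing thrift clients see no change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified metastore service; not the generated thrift API.
interface MetaStoreService {
    /** Existing call: the server always creates the table directory. */
    @Deprecated
    void createTable(String name, String location);

    /** Proposed call; thrift has no overloading, so it needs its own (made-up) name. */
    void createTableWithDirOption(String name, String location, boolean createDirs);

    /** Stands in for getTable(name).getSd().getLocation(). */
    String getTableLocation(String name);
}

final class InMemoryMetaStore implements MetaStoreService {
    final String serverUser;
    final Map<String, String> locations = new HashMap<>();
    final List<String> dirsCreatedByServer = new ArrayList<>();

    InMemoryMetaStore(String serverUser) { this.serverUser = serverUser; }

    @Deprecated
    @Override
    public void createTable(String name, String location) {
        // Old thrift clients keep the legacy semantics.
        createTableWithDirOption(name, location, true);
    }

    @Override
    public void createTableWithDirOption(String name, String location, boolean createDirs) {
        locations.put(name, location);
        if (createDirs) {
            // This directory would end up owned by the server's user.
            dirsCreatedByServer.add(serverUser + ":" + location);
        }
    }

    @Override
    public String getTableLocation(String name) { return locations.get(name); }
}

// Hypothetical wrapper playing the role of HiveMetaStoreClient.
final class DirCreatingClient {
    final String clientUser;
    final List<String> dirsCreatedByClient = new ArrayList<>();
    private final MetaStoreService service;

    DirCreatingClient(String clientUser, MetaStoreService service) {
        this.clientUser = clientUser;
        this.service = service;
    }

    void createTable(String name, String location) {
        service.createTableWithDirOption(name, location, false);
        // Read the stored location back before creating the directory.
        String stored = service.getTableLocation(name);
        // Stands in for wh.mkdirs(), run with the client's own credentials.
        dirsCreatedByClient.add(clientUser + ":" + stored);
    }
}
```

The behavioral split described above shows up directly: a table created through the old call gets a directory owned by the server user, while one created through the wrapper gets a directory owned by the calling user.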
(This directory creation/deletion is relevant to create table, drop table, add partition, alter table and alter partition, I think.)

Pradeep

-----Original Message-----
From: Paul Yang [mailto:py...@facebook.com]
Sent: Monday, July 19, 2010 10:53 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

That approach would work for the CLI, but then the semantics of the create table/create partition calls for thrift clients would be different: they would no longer create the table directory. This might be a problem if there are scripts that rely on this property for copying/moving files. The table renaming code would also need to be modified.

-----Original Message-----
From: Pradeep Kamath [mailto:prade...@yahoo-inc.com]
Sent: Monday, July 19, 2010 10:24 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

I was thinking about this a little more and was wondering if the following alternative approach is feasible: instead of the metastore code creating the directories, why not have HiveMetaStoreClient create them in createTable() after the table is created? That is, it can call getTable().getSd().getLocation() and perform wh.mkdirs() on that path. We could do the same thing in addPartition(). This way, we can have the metastore thrift server running as a non-HDFS-superuser. Also, we no longer need to keep track of user/group information, since the client is already running with the right user/group credentials. Thoughts?

Pradeep

-----Original Message-----
From: Pradeep Kamath [mailto:prade...@yahoo-inc.com]
Sent: Thursday, July 15, 2010 10:23 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

Currently, group information is not present in Table, and both owner and group information are absent from Database. If these are added to those classes, we could change Warehouse.mkdirs().
This method is also called from addPartition(). Should we just use the table's owner/group in that case? That could potentially fail in the non-thrift case if some other user is creating the partitions. OR we would need to add owner/group to Partition as well, with the implication that table and partition owners could differ, causing query failures. Paul's concern about security is valid, but is there any other way around this?

Pradeep

-----Original Message-----
From: Paul Yang [mailto:py...@facebook.com]
Sent: Wednesday, July 14, 2010 3:18 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

Yeah, you could overload Warehouse.mkdirs() to allow specification of an owner/group and then use FileSystem.setOwner() within the method. If the thrift server has full permissions on DFS, though, wouldn't this present a security hole?

-----Original Message-----
From: Ashish Thusoo [mailto:athu...@facebook.com]
Sent: Wednesday, July 14, 2010 12:34 PM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

Could we just fix this in Warehouse.java so that the mkdirs call creates the directories according to the owner field that is passed in with the table? That would probably be a simple fix for this, no?

Ashish

-----Original Message-----
From: Pradeep Kamath [mailto:prade...@yahoo-inc.com]
Sent: Wednesday, July 14, 2010 11:14 AM
To: hive-user@hadoop.apache.org
Subject: RE: Thrift metastore server and dfs file owner

  <name>dfs.permissions</name>
  <value>true</value>
  ..
  <name>dfs.permissions.supergroup</name>
  <value>hdfs</value>

You mentioned: "I think the thrift server can use the dfs processor." Were you suggesting that the metastore implementation in HiveMetaStore should always do chown user:user in create_table_core() (or selectively look at the conf, know that it is being run as a thrift server, and chown only in that case)?
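A sketch of the overload Paul describes, run against a tiny in-memory filesystem rather than Hadoop's `FileSystem` (`MemFs` and `OwnerAwareWarehouse` are hypothetical). As with Hadoop's `FileSystem.setOwner()`, a null owner or group leaves that field unchanged, which matters here because, per the thread, `Table` carries no group information. The model does not check permissions; on a real cluster with `dfs.permissions` set to true, the `setOwner` step only succeeds if the server runs as an HDFS superuser, which is the security concern raised above.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal in-memory filesystem; a stand-in for Hadoop's FileSystem, not its API.
final class MemFs {
    final Map<String, String> ownership = new HashMap<>(); // path -> "owner:group"
    final String processUser;
    final String processGroup;

    MemFs(String processUser, String processGroup) {
        this.processUser = processUser;
        this.processGroup = processGroup;
    }

    boolean mkdirs(String path) {
        // New dirs are owned by whoever runs the process (e.g. the thrift server).
        ownership.putIfAbsent(path, processUser + ":" + processGroup);
        return true;
    }

    void setOwner(String path, String owner, String group) {
        String current = ownership.get(path);
        if (current == null) throw new IllegalArgumentException("no such path: " + path);
        String[] parts = current.split(":", 2);
        // A null argument leaves that field unchanged.
        String newOwner = owner != null ? owner : parts[0];
        String newGroup = group != null ? group : parts[1];
        ownership.put(path, newOwner + ":" + newGroup);
    }
}

// Hypothetical warehouse helper showing the suggested overload.
final class OwnerAwareWarehouse {
    private final MemFs fs;

    OwnerAwareWarehouse(MemFs fs) { this.fs = fs; }

    // Existing behavior: dir is owned by the process user.
    boolean mkdirs(String path) { return fs.mkdirs(path); }

    // Suggested overload: create the dir, then hand it to the table owner.
    boolean mkdirs(String path, String owner, String group) {
        boolean ok = fs.mkdirs(path);
        if (ok) {
            fs.setOwner(path, owner, group);
        }
        return ok;
    }
}
```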
Pradeep

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Tuesday, July 13, 2010 4:52 PM
To: hive-user@hadoop.apache.org
Subject: Re: Thrift metastore server and dfs file owner

On Tue, Jul 13, 2010 at 6:20 PM, Pradeep Kamath <prade...@yahoo-inc.com> wrote:
> I tried:
> hive -e "set user.name=$USER;create table foo2 ( name string);"
>
> My warehouse table dir still got created by "root" (the user my thrift
> server is running as):
> drwxr-xr-x - root supergroup 0 2010-07-13 15:19 /user/pradeepk/hive/warehouse/foo2
>
> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Tuesday, July 13, 2010 2:47 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Thrift metastore server and dfs file owner
>
> On Tue, Jul 13, 2010 at 5:04 PM, Pradeep Kamath <prade...@yahoo-inc.com> wrote:
>> Hi,
>>
>> I suspect this is true but wanted to confirm: if I start a thrift
>> metastore service as user "joe", then all internal tables created will
>> have directories under the warehouse directory owned by "joe",
>> regardless of the actual user running the create table statement. Is
>> this correct? Is there no way for the thrift server to create the
>> directory as the actual user?
>> However, if the thrift service is not used and the hive client directly
>> works against the metastore database, then the directories are
>> created by the actual user. Is this correct?
>>
>> Thanks,
>>
>> Pradeep
>
> The hive web interface does this:
>
> queries.add("set hadoop.job.ugi=" + auth.getUser() + ","
>     + auth.getGroups()[0]);
> queries.add("set user.name=" + auth.getUser());
>
> You should be able to accomplish the same thing using set commands
> with the Thrift Server to impersonate.
>
> Regards,
> Edward

You are right. That technique may only affect files created during the map/reduce job. I think the thrift server can use the dfs processor.
hive> dfs -chown user:user /user/hive/warehouse/foo2;

Questions: Who is your Hadoop superuser? Are you enforcing DFS permissions? If you are enforcing permissions, only the Hadoop superuser (hadoop) will be able to chown files to other users and groups.
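A simplified model of the HDFS ownership rules behind Edward's answer, assuming the 0.20-era behavior: when `dfs.permissions` is true, the superuser is the user running the NameNode plus members of `dfs.permissions.supergroup`; only a superuser may change a file's owner, and a non-superuser owner may only change the group to one it belongs to. The class names are hypothetical and this is not the NameNode's actual code.

```java
import java.util.Set;

// Caller identity as the NameNode would see it (hypothetical type).
final class HdfsCaller {
    final String user;
    final Set<String> groups;

    HdfsCaller(String user, Set<String> groups) {
        this.user = user;
        this.groups = groups;
    }
}

// Simplified model of who may run "chown owner:group" on an HDFS path.
final class ChownModel {
    final boolean permissionsEnabled; // dfs.permissions
    final String nameNodeUser;        // user the NameNode runs as
    final String superGroup;          // dfs.permissions.supergroup

    ChownModel(boolean permissionsEnabled, String nameNodeUser, String superGroup) {
        this.permissionsEnabled = permissionsEnabled;
        this.nameNodeUser = nameNodeUser;
        this.superGroup = superGroup;
    }

    boolean isSuperuser(HdfsCaller c) {
        return c.user.equals(nameNodeUser) || c.groups.contains(superGroup);
    }

    // newOwner/newGroup of null mean "leave unchanged".
    boolean mayChown(HdfsCaller c, String fileOwner, String newOwner, String newGroup) {
        if (!permissionsEnabled || isSuperuser(c)) {
            return true; // no checks, or superuser may do anything
        }
        if (!c.user.equals(fileOwner)) {
            return false; // non-superusers must own the file
        }
        if (newOwner != null && !newOwner.equals(fileOwner)) {
            return false; // only a superuser may give the file away
        }
        return newGroup == null || c.groups.contains(newGroup);
    }
}
```

Under this model, a thrift server running as `root` (as in the `ls` output above) cannot chown a warehouse directory to another user unless `root` is the NameNode user or belongs to the `hdfs` supergroup configured earlier in the thread.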