[ https://issues.apache.org/jira/browse/SPARK-21918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155022#comment-16155022 ]
Hu Liu, edited comment on SPARK-21918 at 9/6/17 8:51 AM:
---------------------------------------------------------
[~mgaido] Yes, all the jobs are executed as the same user, but the problem isn't in STS. The STS opens the session with impersonation when doAs is enabled:
{code:java}
if (cliService.getHiveConf().getBoolVar(ConfVars.HIVE_SERVER2_ENABLE_DOAS) &&
    (userName != null)) {
  String delegationTokenStr = getDelegationToken(userName);
  sessionHandle = cliService.openSessionWithImpersonation(protocol, userName,
      req.getPassword(), ipAddress, req.getConfiguration(), delegationTokenStr);
} else {
{code}
and runs the SQL under the session UGI in HiveSessionProxy. For DDL operations, Spark SQL uses the Hive object in HiveClientImpl to communicate with the metastore. Currently that Hive object is shared between different threads, which is why all jobs are executed as the same user in HiveClientImpl:
{code:java}
private def client: Hive = {
  if (clientLoader.cachedHive != null) {
    clientLoader.cachedHive.asInstanceOf[Hive]
  } else {
    val c = Hive.get(conf)
    clientLoader.cachedHive = c
    c
  }
}
{code}
The Hive class actually stores a separate instance per thread, and HiveSessionImplwithUGI already creates a Hive object for the current user session:
{code:java}
// create a new metastore connection for this particular user session
Hive.set(null);
try {
  sessionHive = Hive.get(getHiveConf());
} catch (HiveException e) {
  throw new HiveSQLException("Failed to setup metastore connection", e);
}
{code}
If we pass the Hive object of the current user session to the working thread, we can fix this problem. I have already fixed it and can run DDL operations as the session user.
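To make the failure mode concrete, here is a minimal, self-contained Java sketch (all names hypothetical, not Spark or Hive code) contrasting the two caching strategies the comment describes: a process-wide cached client, where the first session's identity leaks to every later session, versus a per-thread cache like the one `Hive.get()` maintains internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Hive: each instance is bound to one user identity.
class MetastoreClient {
    final String user;
    MetastoreClient(String user) { this.user = user; }
}

public class SharedVsPerThread {
    // Shared cache: mirrors clientLoader.cachedHive -- the first thread wins,
    // and every later thread reuses the first user's connection.
    static volatile MetastoreClient sharedCache = null;

    static MetastoreClient sharedClient(String user) {
        if (sharedCache == null) {
            sharedCache = new MetastoreClient(user);
        }
        return sharedCache;
    }

    // Per-thread cache: mirrors the way Hive.get() keeps one instance per thread.
    static final ThreadLocal<MetastoreClient> perThread = new ThreadLocal<>();

    static MetastoreClient perThreadClient(String user) {
        if (perThread.get() == null) {
            perThread.set(new MetastoreClient(user));
        }
        return perThread.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> seenShared = new ConcurrentHashMap<>();
        Map<String, String> seenPerThread = new ConcurrentHashMap<>();

        Thread t1 = new Thread(() -> {
            seenShared.put("alice", sharedClient("alice").user);
            seenPerThread.put("alice", perThreadClient("alice").user);
        });
        t1.start();
        t1.join();                     // alice's session populates the shared cache first

        Thread t2 = new Thread(() -> {
            seenShared.put("bob", sharedClient("bob").user);
            seenPerThread.put("bob", perThreadClient("bob").user);
        });
        t2.start();
        t2.join();

        // Shared cache: bob's DDL would run as alice.
        System.out.println("shared: bob runs as " + seenShared.get("bob"));
        // Per-thread cache: each session keeps its own identity.
        System.out.println("per-thread: bob runs as " + seenPerThread.get("bob"));
    }
}
```

Running this prints `shared: bob runs as alice` but `per-thread: bob runs as bob`, which is the same symptom reported here: with the shared cache, every statement is executed under the identity that first created the cached object.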
> HiveClient shouldn't share Hive object between different thread
> ---------------------------------------------------------------
>
>                 Key: SPARK-21918
>                 URL: https://issues.apache.org/jira/browse/SPARK-21918
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Hu Liu,
>
> I'm testing the Spark Thrift Server and found that all the DDL statements are
> run by user hive even if hive.server2.enable.doAs=true.
> The root cause is that the Hive object is shared between different threads in
> HiveClientImpl:
> {code:java}
> private def client: Hive = {
>   if (clientLoader.cachedHive != null) {
>     clientLoader.cachedHive.asInstanceOf[Hive]
>   } else {
>     val c = Hive.get(conf)
>     clientLoader.cachedHive = c
>     c
>   }
> }
> {code}
> But in impersonation mode, we should share the Hive object only inside the
> thread so that the metastore client in Hive is associated with the right
> user.
> We can pass the Hive object of the parent thread to the child thread when
> running the SQL to fix it.
> I already have an initial patch for review and I'm glad to work on it if
> anyone could assign it to me.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
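The proposed fix, "pass the Hive object of the parent thread to the child thread when running the SQL", can be sketched as follows. This is a minimal illustration with hypothetical names, not the actual patch: the session (parent) thread owns a per-user client and explicitly installs it in the worker thread's ThreadLocal before the statement runs, so a pooled worker never falls back to some other session's cached instance.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical per-session client bound to one user identity.
class SessionClient {
    final String user;
    SessionClient(String user) { this.user = user; }
}

public class HandOffToWorker {
    // Per-thread slot the worker reads its client from (analogous to Hive's
    // thread-local instance).
    static final ThreadLocal<SessionClient> current = new ThreadLocal<>();

    // Wrap a task so the parent session's client is installed in the child
    // thread before the task runs, and cleared afterwards.
    static Runnable withClient(SessionClient parentClient, Runnable task) {
        return () -> {
            current.set(parentClient);   // child now sees the session's client
            try {
                task.run();
            } finally {
                current.remove();        // don't leak identity across pooled tasks
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // One pooled worker thread serves both sessions, as in a thrift server.
        ExecutorService pool = Executors.newFixedThreadPool(1);

        SessionClient alice = new SessionClient("alice");
        SessionClient bob = new SessionClient("bob");

        Future<?> f1 = pool.submit(withClient(alice,
            () -> System.out.println("DDL runs as " + current.get().user)));
        f1.get();
        Future<?> f2 = pool.submit(withClient(bob,
            () -> System.out.println("DDL runs as " + current.get().user)));
        f2.get();

        pool.shutdown();
    }
}
```

Even though the same pooled thread executes both statements, each one sees its own session's client ("DDL runs as alice", then "DDL runs as bob"). The `finally { current.remove(); }` step matters in a thread pool: without it, a later task on the same worker could observe the previous session's identity.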