[ 
https://issues.apache.org/jira/browse/SPARK-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769889#comment-16769889
 ] 

t oo commented on SPARK-17516:
------------------------------

gentle ping

> Current user info is not checked on STS in DML queries
> ------------------------------------------------------
>
>                 Key: SPARK-17516
>                 URL: https://issues.apache.org/jira/browse/SPARK-17516
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Tao Li
>            Priority: Critical
>
> I have found some issues with doAs support in the Spark Thrift Server (STS).
> My test environment is a non-secure cluster. In short, the end-user info is
> not passed when STS talks to the metastore, so impersonation does not happen
> on the metastore.
> STS uses a ClientWrapper instance (wrapped in a HiveContext) for each
> session. However, by design all ClientWrapper instances share the same Hive
> instance, which is responsible for talking to the metastore. A singleton
> IsolatedClientLoader instance is initialized when STS starts up, and it
> holds the cachedHive instance. The cachedHive is associated with the “hive”
> UGI: no session has been set up at that point, so the current user is
> “hive”. Each session then creates a ClientWrapper instance that is
> associated with the same cachedHive instance.
> When we run queries after a session is established, the code path that
> retrieves the Hive instance differs between DML and DDL operations. It looks
> like the DML code has less dependency on the hive-exec module.
> For DML operations (e.g. “select *”), STS calls into the ClientWrapper code
> and talks to the metastore directly through the singleton Hive instance. No
> code on this path checks the current user. That’s why doAs is not
> respected, even though the current user has already been switched to the
> end user in the thread context.
> For DDL operations (e.g. “ALTER table”), STS eventually calls into Hive
> driver code (e.g. BaseSemanticAnalyzer). From there, Hive.get() is called to
> get the thread-local Hive instance and refresh it if necessary. If the
> current user has changed, the Hive instance is refreshed by recreating the
> metastore connection with the current user info. So even though all thread
> locals actually reference the singleton Hive instance, calling Hive.get()
> plays an important role here because it takes any UGI change into account.
> That’s why DDL operations respect doAs.
> The fix should be to call Hive.get() for DML operations, as the Hive
> driver code does on the DDL path.
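
For anyone skimming the description above, the difference between the two code paths can be sketched with a toy model. This is only an illustration under stated assumptions: DoAsSketch, MetastoreClient, CURRENT_USER, directPath() and refreshingPath() are hypothetical names, a plain string stands in for the UGI, and none of this is the actual Spark or Hive code. directPath() mimics the DML path (shared client, current user never consulted); get() mimics the refresh-on-user-change behaviour the description attributes to Hive.get().

```java
class DoAsSketch {
    // Hypothetical stand-in for the current user in the thread context (a UGI in the real system).
    static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "hive");

    // Toy metastore connection that remembers which user opened it.
    static final class MetastoreClient {
        final String owner;
        MetastoreClient(String owner) { this.owner = owner; }
        String runAs() { return owner; }
    }

    // Created once at startup, before any session exists, so it is bound to "hive".
    static final MetastoreClient CACHED = new MetastoreClient("hive");

    // Every thread initially references the shared singleton client.
    static final ThreadLocal<MetastoreClient> LOCAL = ThreadLocal.withInitial(() -> CACHED);

    // DML-like path: uses the shared client directly and never looks at CURRENT_USER.
    static String directPath() {
        return CACHED.runAs();
    }

    // Refreshing accessor: reconnects as the current user if the user has changed.
    static MetastoreClient get() {
        String user = CURRENT_USER.get();
        MetastoreClient client = LOCAL.get();
        if (!client.owner.equals(user)) {
            client = new MetastoreClient(user); // recreate the connection with the new user
            LOCAL.set(client);
        }
        return client;
    }

    // DDL-like path: always goes through the refreshing accessor.
    static String refreshingPath() {
        return get().runAs();
    }

    public static void main(String[] args) {
        CURRENT_USER.set("alice");                // the session switches to the end user
        System.out.println(directPath());         // prints "hive"
        System.out.println(refreshingPath());     // prints "alice"
    }
}
```

In this model the proposed fix corresponds to routing the DML-like path through get() instead of touching the cached client directly.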



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)