[ https://issues.apache.org/jira/browse/SPARK-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769889#comment-16769889 ]
t oo commented on SPARK-17516:
------------------------------

gentle ping

> Current user info is not checked on STS in DML queries
> ------------------------------------------------------
>
>                 Key: SPARK-17516
>                 URL: https://issues.apache.org/jira/browse/SPARK-17516
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Tao Li
>            Priority: Critical
>
> I have captured some issues related to doAs support in STS. I am using a
> non-secure cluster as my test environment. Simply put, the end user info is
> not passed when STS talks to the metastore, so impersonation does not happen
> on the metastore.
> STS uses a ClientWrapper instance (which is wrapped in HiveContext) for each
> session. However, by design all ClientWrapper instances share the same Hive
> instance, which is responsible for talking to the metastore. A singleton
> IsolatedClientLoader instance is initialized when STS starts up, and it
> contains the cachedHive instance. The cachedHive is associated with the
> "hive" UGI: no session has been set up at that point, so the current user is
> "hive". Each session then creates a ClientWrapper instance that is associated
> with the same cachedHive instance.
> When we run queries after a session is established, the code path that
> retrieves the Hive instance differs between DML and DDL operations. It looks
> like the DML-related code has less dependency on the hive-exec module.
> For DML operations (e.g. "select *"), STS calls into the ClientWrapper code
> and talks to the metastore through the singleton Hive instance directly. No
> code on this path checks the current user. That is why doAs is not respected,
> even though the current user has already been switched to the end user in the
> thread context.
> For DDL operations (e.g. "ALTER table"), STS eventually calls into the Hive
> driver code (e.g. BaseSemanticAnalyzer). From there Hive.get() is called to
> get the thread-local Hive instance and refresh it if necessary. If the
> current user has changed, the Hive instance is refreshed by recreating the
> metastore connection with the current user info. So even though all thread
> locals actually reference the singleton Hive instance, calling Hive.get()
> plays an important role here because it takes any UGI change into account.
> That is why DDL operations respect doAs.
> The fix should be to call Hive.get() for DML operations as well, as the Hive
> driver code reached from DDL operations already does.
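
The difference between the two code paths described above can be illustrated with a toy model. This is only a sketch with no Hive or Spark dependency: ToyHive, MetastoreConnection, CURRENT_USER, and the run* methods are hypothetical names invented for illustration, standing in for the shared Hive instance, its metastore connection, and the thread's current UGI. It is not the actual Spark or Hive code.

```java
public class DoAsSketch {
    /** Hypothetical stand-in for a metastore connection opened as one user. */
    static final class MetastoreConnection {
        final String owner;
        MetastoreConnection(String owner) { this.owner = owner; }
    }

    /** Hypothetical stand-in for the thread's current user (UGI); defaults to "hive". */
    static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "hive");

    /** Hypothetical stand-in for the shared (cached) Hive client. */
    static final class ToyHive {
        private MetastoreConnection conn;

        ToyHive(String startupUser) { this.conn = new MetastoreConnection(startupUser); }

        /** Which user the metastore sees if the cached connection is used as-is. */
        String userSeenByMetastore() { return conn.owner; }

        /** Mimics the refresh step: reconnect if the current user differs from the owner. */
        ToyHive refreshIfUserChanged() {
            String user = CURRENT_USER.get();
            if (!user.equals(conn.owner)) {
                conn = new MetastoreConnection(user);
            }
            return this;
        }
    }

    /** DML-style path as described in the issue: no check of the current user. */
    static String runDml(ToyHive hive) {
        return hive.userSeenByMetastore();
    }

    /** DDL-style path: refresh against the current user before using the client. */
    static String runDdl(ToyHive hive) {
        return hive.refreshIfUserChanged().userSeenByMetastore();
    }

    /** Proposed DML path: same refresh step as the DDL path. */
    static String runDmlFixed(ToyHive hive) {
        return hive.refreshIfUserChanged().userSeenByMetastore();
    }

    public static void main(String[] args) {
        ToyHive cachedHive = new ToyHive("hive"); // created at server startup
        CURRENT_USER.set("alice");                // session switches to the end user

        System.out.println("DML, current path:  " + runDml(cachedHive));      // hive
        System.out.println("DML, proposed path: " + runDmlFixed(cachedHive)); // alice
    }
}
```

In this model the refresh mutates the shared instance, which mirrors the description that all thread locals reference the same singleton; how the real client behaves under concurrent sessions is outside what this sketch shows.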