[ 
https://issues.apache.org/jira/browse/SPARK-46566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenyu Zheng updated SPARK-46566:
---------------------------------
    Description: 
I set up a Thrift server based on v3.5.0. When I execute a command, it throws 
this error:
{code:java}
15:10:53.400 [HiveServer2-Handler-Pool: Thread-293] ERROR org.apache.thrift.transport.TSaslTransport - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed
## ignore very long stack trace ##
{code}
After some debugging and analysis, I found that the proxy user should use a 
token to access the metastore, but actually uses Kerberos. The direct cause is 
that "hive.metastore.token.signature" is lost.

In fact, we set "hive.metastore.token.signature" to 
"HiveServer2ImpersonationToken" in the config when constructing 
HiveSessionImplwithUGI, and store that config in 
HiveSessionImplwithUGI::sessionHive and HiveSessionImplwithUGI::sessionState.
When a session is acquired, sessionState and sessionHive should be set as 
thread-level variables. The executing statements then use their own 
sessionHive and sessionState, and therefore the right config.
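
The thread-level binding described above can be illustrated with a minimal 
sketch. DemoSessionState and ThreadLocalSessionDemo are hypothetical stand-ins 
written for this illustration; they are not Hive's SessionState or Spark's 
code. The sketch only shows that a statement sees the session config when the 
session is bound to its thread, and sees nothing when it is not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-session state object (not Hive's SessionState).
final class DemoSessionState {
    private static final ThreadLocal<DemoSessionState> CURRENT = new ThreadLocal<>();
    private final Map<String, String> conf = new HashMap<>();

    DemoSessionState set(String key, String value) {
        conf.put(key, value);
        return this;
    }

    String get(String key) {
        return conf.get(key);
    }

    // Bind, read, and unbind the session for the calling thread.
    static void setCurrent(DemoSessionState s) { CURRENT.set(s); }
    static DemoSessionState current() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

final class ThreadLocalSessionDemo {
    // Runs a "statement" on a worker thread. If a session is given, it is bound
    // to that thread first; the method returns the signature the worker saw.
    static String signatureSeenBy(DemoSessionState session) {
        String[] seen = new String[1];
        Thread worker = new Thread(() -> {
            if (session != null) {
                DemoSessionState.setCurrent(session);
            }
            DemoSessionState s = DemoSessionState.current();
            seen[0] = (s == null) ? null : s.get("hive.metastore.token.signature");
            DemoSessionState.clear();
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        DemoSessionState session = new DemoSessionState()
            .set("hive.metastore.token.signature", "HiveServer2ImpersonationToken");
        System.out.println(signatureSeenBy(session)); // session bound to the thread
        System.out.println(signatureSeenBy(null));    // no session bound
    }
}
```

The second call returns null, which is the same symptom as the lost signature 
reported here: the worker has no session-level config to read.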

But if isolation is enabled, a new SessionState and a new Hive are constructed 
using the specified Hive version. The config is not passed from 
HiveSessionImplwithUGI::sessionState to this new SessionState, nor from 
HiveSessionImplwithUGI::sessionHive to the new Hive. So 
hive.metastore.token.signature is missing.

How to fix?

When `spark.sql.hive.metastore.jars` is 'builtin', we can directly obtain the 
session-level config, which is a thread-local variable, via SessionState.get() 
or Hive.get().

When `spark.sql.hive.metastore.jars` is 'maven' or 'path' (a jars path), we 
use IsolatedClientLoader to reload the Hive metastore related classes. This 
means the thread-local variables in SessionState and Hive will be missing, so 
we need a new structure to store the thread-local config.
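
One possible shape for such a structure is sketched below. This is only an 
illustration under stated assumptions, not the actual fix: SessionConfOverrides 
and IsolatedClientSketch are hypothetical names, and the sketch assumes the 
holder class is visible to both the session code and the isolated client. The 
overrides are handed over as a plain java.util.Map, because JDK classes are the 
same on both sides of the class loader boundary.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for session-level config, kept per thread.
// Assumption: this class is loaded once and shared with the isolated client.
final class SessionConfOverrides {
    private static final ThreadLocal<Map<String, String>> OVERRIDES =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) {
        OVERRIDES.get().put(key, value);
    }

    // Returns a read-only copy so callers cannot mutate the thread's state.
    static Map<String, String> snapshot() {
        return Collections.unmodifiableMap(new HashMap<>(OVERRIDES.get()));
    }

    static void clear() {
        OVERRIDES.remove();
    }
}

// Hypothetical stand-in for the code that builds the isolated client's config.
final class IsolatedClientSketch {
    // Starts from the base config and applies the current thread's overrides,
    // so session-level keys win over the defaults.
    static Map<String, String> buildClientConf(Map<String, String> base) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(SessionConfOverrides.snapshot());
        return merged;
    }

    public static void main(String[] args) {
        SessionConfOverrides.put(
            "hive.metastore.token.signature", "HiveServer2ImpersonationToken");
        Map<String, String> conf = buildClientConf(new HashMap<>());
        System.out.println(conf.get("hive.metastore.token.signature"));
        SessionConfOverrides.clear();
    }
}
```

Clearing the overrides when the session is released matters on pooled handler 
threads; otherwise one session's config could leak into the next.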


> Session-level config was not loaded when isolation is enabled.
> --------------------------------------------------------------
>
>                 Key: SPARK-46566
>                 URL: https://issues.apache.org/jira/browse/SPARK-46566
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Chenyu Zheng
>            Priority: Major


