Hi,
I am new to Spark and have a quick question about end-user impersonation in
the Spark executor process.
I am running SQL queries through the Spark Thrift Server with doAs set to
true to enable end-user impersonation. In my experiment, I was able to start
sessions for multiple end users at the same time, and all queries ran fine. For
example, user A can query table 1, which is accessible to A exclusively
(according to HDFS permissions). At the same time, user B can query table 2,
which is accessible to B exclusively. It looks like the end user's UGI
(UserGroupInformation) has been propagated to the executor process
successfully. I checked the SparkContext code, and it looks like the end-user
info is passed to the executor through the "SPARK_USER" environment variable.
Correct me if I am wrong.
I see only one executor process running for all the queries from multiple
users in my experiment. My question is how a single process can impersonate
multiple end users at the same time. I assume the value of the "SPARK_USER"
environment variable in the executor must be either user A or user B, so I
would expect HDFS permission errors for the other user. But I did not see any
errors for either user.
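
To make that assumption concrete, here is a minimal sketch of my mental model. It is plain Python, not Spark's actual code: the helper `current_spark_user` and the dictionary `executor_env` are hypothetical names I made up to stand in for one executor process and its environment.

```python
import os


def current_spark_user(env=os.environ):
    """Return the user name read from an environment-like mapping.

    Hypothetical helper for illustration only; this is not how Spark
    itself is implemented.
    """
    return env.get("SPARK_USER", "<unknown>")


# Simulate a single executor process: it has exactly one environment,
# so "SPARK_USER" can hold only one value at a time.
executor_env = {"SPARK_USER": "userA"}

# Two queries from different end users running inside the same process
# would both read the same value.
user_seen_by_query_a = current_spark_user(executor_env)
user_seen_by_query_b = current_spark_user(executor_env)
```

Under this model both lookups return the same user, which is why I expected the other user's query to fail with a permission error.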
Can someone give some insight into this? Thanks so much.