Hi,

I am new to Spark and have a quick question about end-user impersonation in 
the Spark executor process.

I am running SQL queries through the Spark Thrift Server with doAs set to true 
to enable end-user impersonation. In my experiment, I was able to start 
sessions for multiple end users at the same time, and all queries worked fine. 
For example, user A can query table 1, which is accessible to A exclusively 
(according to HDFS permissions). At the same time, user B can query table 2, 
which is accessible to B exclusively. It looks like the end user's UGI is 
propagated to the executor process successfully. I checked the SparkContext 
code, and it looks like the end-user info is passed to the executor through 
the “SPARK_USER” environment variable. Correct me if I am wrong.

I see only one executor process running for all the queries from multiple 
users in my experiment. My question is how a single process can impersonate 
multiple end users at the same time. I assumed the value of the “SPARK_USER” 
environment variable in the executor would be either user A or user B, in 
which case the other user should hit HDFS permission errors. But I did not see 
an error for either user.
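
To make my assumption concrete, here is a small stand-alone Java sketch. It is 
not Spark or Hadoop code, and the class and method names are made up for 
illustration. It only shows the general JVM behaviour I am reasoning from: an 
environment variable is a single process-wide value, so every thread in the 
process reads the same thing. Whether the executor actually relies on 
“SPARK_USER” for each query is exactly what I am unsure about.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Spark).
class EnvPerProcess {
    // Reads the same environment variable from several threads and
    // returns what each thread saw ("null" if the variable is unset).
    static List<String> readFromThreads(String name, int threads) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(
                () -> seen.add(String.valueOf(System.getenv(name))));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Every thread reports the same value, since the environment
        // belongs to the process as a whole, not to individual threads.
        System.out.println(readFromThreads("SPARK_USER", 2));
    }
}
```

If the executor took the user identity only from this variable, I would expect 
queries from A and B running in the same process to be treated as the same 
user.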

Can someone shed some light on this? Thanks so much.
