Hi, I am new to Spark and have a quick question about end-user impersonation in the Spark executor process.
I am running SQL queries through the Spark Thrift Server with doAs set to true to enable end-user impersonation. In my experiment, I was able to start sessions for multiple end users at the same time, and all queries worked as expected. For example, user A can query table 1, which is accessible only to A (according to HDFS permissions). At the same time, user B can query table 2, which is accessible only to B. So it looks like the end user's UGI is propagated to the executor process successfully.

I checked the SparkContext code, and it looks like the end-user info is passed to the executor through the “SPARK_USER” environment variable. Correct me if I am wrong.

However, I only see one executor process running for all the queries from multiple users in my experiment. My question is how a single process can impersonate multiple end users at the same time. I would expect the value of “SPARK_USER” in the executor to be either user A or user B, in which case the other user should get HDFS permission errors. But I did not see any errors for either user.

Can someone give some insight into this? Thanks so much.
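
P.S. To make my assumption concrete, here is a small Python sketch of the model I have in mind. This is not Spark or Hadoop code: the function names, the fallback to USER, and the permission check are all hypothetical stand-ins for illustration only.

```python
def effective_user(env):
    """Return the user a process runs as, under my assumption."""
    # Assumption: identity comes only from the SPARK_USER variable,
    # falling back to the OS login name (hypothetical fallback).
    return env.get("SPARK_USER") or env.get("USER", "unknown")


def can_read(table_owner, env):
    """Hypothetical stand-in for an HDFS check on an owner-only table."""
    return effective_user(env) == table_owner


# One executor process has one environment, so it holds one value.
executor_env = {"SPARK_USER": "userA"}  # hypothetical value

print(can_read("userA", executor_env))  # True
print(can_read("userB", executor_env))  # False
```

Under this model, user B's query on table 2 should fail in an executor started with SPARK_USER=userA, which is not what I observe. So I suspect the model is missing something.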