How to improve the concurrent query performance of Spark SQL

2021-08-26 Thread Tao Li
In high-concurrency scenarios, Spark SQL query performance is limited by the 
NameNode and the Hive Metastore. There are some caches in the code, but their 
effect is limited. Is there a practical and effective way to reduce the time 
the driver spends on each query when many queries run concurrently?
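
The post notes that the existing caches have only a limited effect. As a generic illustration of the caching idea only (not Spark's code and not a proposed fix), here is a minimal Python sketch of a driver-side metadata cache. It assumes a hypothetical `fetch_partitions` loader standing in for a Hive Metastore or NameNode round trip. Entries expire after `ttl_seconds`, and a per-key lock makes concurrent queries for the same table wait for one backend call instead of each issuing its own.

```python
import threading
import time
from collections.abc import Callable, Hashable


class TtlMetadataCache:
    """Cache results of an expensive metadata lookup for ttl_seconds.

    A per-key lock makes concurrent callers asking for the same key wait
    for a single loader call instead of each issuing their own.
    """

    def __init__(self, loader: Callable[[Hashable], object], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}      # key -> (load_time, value)
        self._key_locks = {}    # key -> lock serializing loads of that key
        self._guard = threading.Lock()  # protects creation of per-key locks

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: Hashable) -> object:
        with self._lock_for(key):
            entry = self._entries.get(key)
            if entry is not None and self._clock() - entry[0] < self._ttl:
                return entry[1]  # fresh entry: no backend call
            value = self._loader(key)  # missing or expired: one backend call
            self._entries[key] = (self._clock(), value)
            return value


def demo(concurrent_queries: int = 8) -> int:
    """Run several concurrent lookups of one table; return loader call count."""
    calls = []

    def fetch_partitions(table):  # hypothetical stand-in for a metastore call
        calls.append(table)
        time.sleep(0.05)  # simulate a slow remote round trip
        return [table + "/part=1"]

    cache = TtlMetadataCache(fetch_partitions, ttl_seconds=60)
    threads = [threading.Thread(target=cache.get, args=("db.events",))
               for _ in range(concurrent_queries)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls)


print(demo())  # prints 1
```

The trade-off is staleness: a longer TTL saves more backend calls but serves older metadata to queries that run after a table changes.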




Question about impersonation on Spark executor

2016-09-14 Thread Tao Li
Hi,

I am new to Spark and have a quick question about end-user impersonation in 
the Spark executor process.

I am running SQL queries through the Spark Thrift Server with doAs set to true 
to enable end-user impersonation. In my experiment, I was able to start 
sessions for multiple end users at the same time, and all queries worked fine. 
For example, user A can query table 1, which only A can access (according to 
HDFS permissions). At the same time, user B can query table 2, which only B 
can access. It looks like the end user's UGI is propagated to the executor 
process successfully. I checked the SparkContext code, and it looks like the 
end-user information is passed to the executor through the “SPARK_USER” 
environment variable. Correct me if I am wrong.
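
The propagation mechanism described above can be modeled with a short Python toy (not Spark's code): the launcher copies its own environment, adds SPARK_USER, and starts a child process. The child stands in for an executor and can only read the value it was started with. The function name `launch_executor_as` is hypothetical.

```python
import os
import subprocess
import sys


def launch_executor_as(user: str) -> str:
    """Start a child process with SPARK_USER set; return what the child sees.

    The child is a hypothetical stand-in for an executor process: it just
    reads SPARK_USER from its own environment, which is fixed at launch.
    """
    env = dict(os.environ, SPARK_USER=user)  # copy parent env, add SPARK_USER
    child = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['SPARK_USER'])"],
        env=env, capture_output=True, text=True, check=True,
    )
    return child.stdout.strip()


print(launch_executor_as("A"))  # prints A
```

Because the variable is set once when the process starts, every piece of work running inside that child later sees the same value.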

In my experiment I see only one executor process running for all the queries 
from multiple users. So my question is how a single process can impersonate 
multiple end users at the same time. I would expect the “SPARK_USER” 
environment variable in the executor to hold either user A or user B, in which 
case the other user should hit HDFS permission errors. But I did not see any 
errors for either user.
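
A toy Python model of the reasoning above (not Spark or Hadoop code) makes the puzzle concrete. An environment variable holds one value for the whole process, so a task that reads only SPARK_USER sees user A even when it works on B's behalf. An identity attached to each call, similar in spirit to Hadoop's `UserGroupInformation.doAs`, can differ between concurrent tasks in the same process. The names `run_as`, `task` and `current_user` are hypothetical, and whether Spark executors behave like either model is exactly what the question asks.

```python
import contextvars
import os
import threading

# Process-wide identity: one value, visible to every thread in the process.
os.environ["SPARK_USER"] = "A"

# Per-call identity (hypothetical model): each task carries its own user.
current_user = contextvars.ContextVar("current_user", default=None)


def run_as(user, action):
    """Run action with current_user set to user, then restore the old value."""
    token = current_user.set(user)
    try:
        return action()
    finally:
        current_user.reset(token)


def task():
    # What a task would see under each model.
    return os.environ["SPARK_USER"], current_user.get()


results = {}


def worker(user):
    results[user] = run_as(user, task)


threads = [threading.Thread(target=worker, args=(u,)) for u in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results["A"])  # ('A', 'A')
print(results["B"])  # ('A', 'B'): the env var cannot say "B"
```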

Can someone offer some insight into this? Thanks so much.


Quick question about hive-exec 1.2.1.spark2

2016-08-03 Thread Tao Li
Hi,

The spark-hive module depends on the hive-exec module (a custom-built module 
from the “Hive on Spark” project). Can someone point me to the source code 
repository of the hive-exec module? Thanks.

Here is the Maven repository link: 
https://mvnrepository.com/artifact/org.spark-project.hive/hive-exec/1.2.1.spark2
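
One place to look for a source location is the artifact's own metadata. A minimal Python sketch, assuming the artifact is also available on Maven Central under the standard repository layout: it builds the POM and sources-jar URLs from the coordinates in the link above, and extracts the `<scm><url>` element from a POM if one is declared. Whether this particular POM declares an scm element, or whether a sources jar was published for it, is not confirmed here; the helper names `artifact_url` and `scm_url` are hypothetical.

```python
import xml.etree.ElementTree as ET

CENTRAL = "https://repo1.maven.org/maven2"  # assumption: artifact is on Maven Central


def artifact_url(group_id, artifact_id, version, ext="pom", classifier=""):
    """Build a download URL using the standard Maven repository layout."""
    suffix = f"-{classifier}" if classifier else ""
    return "/".join([
        CENTRAL, group_id.replace(".", "/"), artifact_id, version,
        f"{artifact_id}-{version}{suffix}.{ext}",
    ])


def scm_url(pom_xml):
    """Return <scm><url> from a POM string, or None if it declares none."""
    root = ET.fromstring(pom_xml)
    # POMs usually use the namespace http://maven.apache.org/POM/4.0.0.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    node = root.find(f"{ns}scm/{ns}url")
    return node.text.strip() if node is not None and node.text else None


coords = ("org.spark-project.hive", "hive-exec", "1.2.1.spark2")
print(artifact_url(*coords))
print(artifact_url(*coords, ext="jar", classifier="sources"))
```

To use it, download the POM from the first URL and pass its text to `scm_url`. A `None` result means the POM itself declares no scm URL; in Maven, scm information can also be inherited from a parent POM.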