Spark Thrift Server is started with:

./sbin/start-thriftserver.sh --master yarn-client --hiveconf hive.server2.thrift.port=10001 --num-executors 4 --executor-cores 2 --executor-memory 4G --conf spark.scheduler.mode=FAIR

The following query is executed 20 times in parallel:

select distinct val2 from philips1 where key>=1000 and key<=1500

There is no issue at the backend Spark executors: the Spark jobs UI shows that all 20 queries are launched and complete with the same duration. All 20 queries are also received by Spark Thrift Server at the same time. But the Spark driver inside Spark Thrift Server looks overloaded, so the queries are not parsed and submitted to the executors at the same time. That is what shows up as the delay in query execution time at the client.

On Thu, Jun 23, 2016 at 11:12 PM, Michael Segel <msegel_had...@hotmail.com> wrote:

> Hi,
> There are a lot of moving parts and a lot of unknowns from your
> description.
> Besides the version stuff.
>
> How many executors, how many cores? How much memory?
> Are you persisting (memory and disk) or just caching (memory)
>
> During the execution… same tables… are you seeing a lot of shuffling of
> data for some queries and not others?
>
> It sounds like an interesting problem…
>
> On Jun 23, 2016, at 5:21 AM, Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
> Hi All,
>
> On submitting 20 parallel same SQL query to Spark Thrift Server, the
> query execution time for some queries are less than a second and some are
> more than 2 seconds. The Spark Thrift Server logs shows all 20 queries are
> submitted at same time 16/06/23 12:12:01 but the result schema are at
> different times.
>
> 16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query 'select distinct val2 from philips1 where key>=1000 and key<=1500
>
> 16/06/23 12:12:*02* INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2110)
> 16/06/23 12:12:*03* INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2182)
> 16/06/23 12:12:*04* INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2344)
> 16/06/23 12:12:*05* INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2362)
>
> There are sufficient executors running on YARN. The concurrency is
> affected by Single Driver. How to improve the concurrency and what are the
> best practices.
>
> Thanks,
> Prabhu Joseph
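
To put numbers on the gap visible in the quoted log excerpt, something like the Python sketch below could be used. It is only an illustration: the function name schema_delays is made up here, the timestamp layout (yy/MM/dd HH:mm:ss) is assumed from the lines quoted above, and the sample lines are those same lines with the *...* emphasis markers removed. Since the log does not tie a "Result Schema" line to a particular query, the sketch just measures each one against the first "Running query" timestamp.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the quoted excerpt:
#   yy/MM/dd HH:mm:ss INFO SparkExecuteStatementOperation: <event> ...
LINE_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO SparkExecuteStatementOperation: "
    r"(Running query|Result Schema)"
)


def schema_delays(log_lines):
    """Seconds from the first 'Running query' line to each later 'Result Schema' line."""
    start = None
    delays = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # ignore unrelated log lines
        ts = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        if match.group(2) == "Running query":
            if start is None:
                start = ts  # all 20 queries share this submission time
        elif start is not None:
            delays.append((ts - start).total_seconds())
    return delays


# Sample lines copied from the quoted message (emphasis markers removed).
sample = [
    "16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query "
    "'select distinct val2 from philips1 where key>=1000 and key<=1500",
    "16/06/23 12:12:02 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2110)",
    "16/06/23 12:12:03 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2182)",
    "16/06/23 12:12:04 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2344)",
    "16/06/23 12:12:05 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2362)",
]

print(schema_delays(sample))  # [1.0, 2.0, 3.0, 4.0]
```

On the four lines quoted above the result schemas arrive 1, 2, 3 and 4 seconds after submission, which matches the serialized behaviour described at the top of this mail. Running it over the full Thrift Server log would show whether the spread keeps growing across all 20 queries.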