The Spark Thrift Server is started with:

./sbin/start-thriftserver.sh --master yarn-client \
  --hiveconf hive.server2.thrift.port=10001 \
  --num-executors 4 --executor-cores 2 --executor-memory 4G \
  --conf spark.scheduler.mode=FAIR

The query below is executed 20 times in parallel:

select distinct val2 from philips1 where key>=1000 and key<=1500
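
For anyone who wants to reproduce the test, here is a minimal sketch of a
client that fires the same query N times at once and records per-query
wall-clock latency. The helper names (time_parallel, run_query_pyhive) are
made up for illustration, and the PyHive connection with host "localhost"
is an assumption about the client side, not what was actually used here.

```python
import time
from concurrent.futures import ThreadPoolExecutor

QUERY = "select distinct val2 from philips1 where key>=1000 and key<=1500"


def run_query_pyhive(sql, host="localhost", port=10001):
    """Run one query over a fresh HiveServer2 connection.

    Assumption: PyHive is installed and the Thrift Server accepts the
    default authentication on host:port. Imported lazily so the timing
    helper below can be used with any other client.
    """
    from pyhive import hive

    conn = hive.Connection(host=host, port=port)
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()


def time_parallel(run_query, sql, n=20):
    """Submit n copies of sql concurrently; return latency in seconds per query.

    Note: the measured time includes whatever run_query does, so with
    run_query_pyhive it also includes connection setup.
    """
    def timed(_):
        start = time.perf_counter()
        run_query(sql)
        return time.perf_counter() - start

    # One thread per query so that all n are in flight together.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(timed, range(n)))


# Usage against a live Thrift Server (not executed here):
#   latencies = time_parallel(run_query_pyhive, QUERY, n=20)
#   print(sorted(round(x, 2) for x in latencies))
```

Because run_query is passed in, the same harness works with a JDBC or
beeline based runner instead of PyHive.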

There is no issue at the backend Spark executors: the Spark jobs UI shows
that all 20 queries are launched and complete with the same duration, and
all 20 queries are received by the Spark Thrift Server at the same time.
But the Spark driver inside the Spark Thrift Server looks overloaded, so
the queries are not parsed and submitted to the executors at the same
time, and that shows up as a delay in query execution time at the client.
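
To put numbers on that spread, here is a rough sketch that reads Thrift
Server log lines and reports how long after the first "Running query" line
each "Result Schema" line appears. The function name schema_delays is
hypothetical, and the sample lines are modeled on the log excerpt in my
original message quoted below (without the emphasis asterisks); the regex
assumes that exact log layout.

```python
import re
from datetime import datetime

# Matches e.g. "16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query ..."
LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO "
    r"SparkExecuteStatementOperation: (Running query|Result Schema)"
)


def schema_delays(log_lines):
    """Seconds from the first 'Running query' line to each 'Result Schema' line."""
    first_submit = None
    delays = []
    for line in log_lines:
        m = LINE.match(line)
        if not m:
            continue  # ignore unrelated log lines
        ts = datetime.strptime(m.group(1), "%y/%m/%d %H:%M:%S")
        if m.group(2) == "Running query":
            if first_submit is None:
                first_submit = ts
        elif first_submit is not None:
            delays.append((ts - first_submit).total_seconds())
    return delays


sample = [
    "16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query "
    "'select distinct val2 from philips1 where key>=1000 and key<=1500",
    "16/06/23 12:12:02 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2110)",
    "16/06/23 12:12:03 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2182)",
    "16/06/23 12:12:04 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2344)",
    "16/06/23 12:12:05 INFO SparkExecuteStatementOperation: Result Schema: ArrayBuffer(val2#2362)",
]

print(schema_delays(sample))  # [1.0, 2.0, 3.0, 4.0]
```

The log timestamps only have one-second resolution, so this gives a coarse
picture of the serialization in the driver, not precise per-query timings.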





On Thu, Jun 23, 2016 at 11:12 PM, Michael Segel <msegel_had...@hotmail.com>
wrote:

> Hi,
> There are a lot of moving parts and a lot of unknowns from your
> description.
> Besides the version stuff.
>
> How many executors, how many cores? How much memory?
> Are you persisting (memory and disk) or just caching (memory)?
>
> During the execution… same tables… are you seeing a lot of shuffling of
> data for some queries and not others?
>
> It sounds like an interesting problem…
>
> On Jun 23, 2016, at 5:21 AM, Prabhu Joseph <prabhujose.ga...@gmail.com>
> wrote:
>
> Hi All,
>
>    On submitting the same SQL query 20 times in parallel to the Spark
> Thrift Server, the query execution time is less than a second for some
> queries and more than 2 seconds for others. The Spark Thrift Server logs
> show that all 20 queries are submitted at the same time, 16/06/23
> 12:12:01, but the result schemas are logged at different times.
>
> 16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query
> 'select distinct val2 from philips1 where key>=1000 and key<=1500
>
> 16/06/23 12:12:*02* INFO SparkExecuteStatementOperation: Result Schema:
> ArrayBuffer(val2#2110)
> 16/06/23 12:12:*03* INFO SparkExecuteStatementOperation: Result Schema:
> ArrayBuffer(val2#2182)
> 16/06/23 12:12:*04* INFO SparkExecuteStatementOperation: Result Schema:
> ArrayBuffer(val2#2344)
> 16/06/23 12:12:*05* INFO SparkExecuteStatementOperation: Result Schema:
> ArrayBuffer(val2#2362)
>
> There are sufficient executors running on YARN. The concurrency is
> limited by the single driver. How can the concurrency be improved, and
> what are the best practices?
>
> Thanks,
> Prabhu Joseph
>
>
>
