Re: Spark Thrift Server Concurrency

Michael Segel Thu, 23 Jun 2016 10:43:11 -0700

Hi, 
There are a lot of moving parts, and your description leaves a lot of unknowns 
besides the versions involved.


How many executors and how many cores? How much memory? 
Are you persisting (memory and disk) or just caching (memory)? 

During the execution (same tables), are you seeing a lot of shuffling of data 
for some queries and not others? 

It sounds like an interesting problem… 

> On Jun 23, 2016, at 5:21 AM, Prabhu Joseph <prabhujose.ga...@gmail.com> wrote:
> 
> Hi All,
> 
>    When I submit the same SQL query 20 times in parallel to Spark Thrift 
> Server, the execution time is less than a second for some queries and more 
> than 2 seconds for others. The Spark Thrift Server logs show that all 20 
> queries are submitted at the same time, 16/06/23 12:12:01, but the result 
> schemas are logged at different times.
> 
> 16/06/23 12:12:01 INFO SparkExecuteStatementOperation: Running query 'select 
> distinct val2 from philips1 where key>=1000 and key<=1500
> 
> 16/06/23 12:12:02 INFO SparkExecuteStatementOperation: Result Schema: 
> ArrayBuffer(val2#2110)
> 16/06/23 12:12:03 INFO SparkExecuteStatementOperation: Result Schema: 
> ArrayBuffer(val2#2182)
> 16/06/23 12:12:04 INFO SparkExecuteStatementOperation: Result Schema: 
> ArrayBuffer(val2#2344)
> 16/06/23 12:12:05 INFO SparkExecuteStatementOperation: Result Schema: 
> ArrayBuffer(val2#2362)
> 
> There are sufficient executors running on YARN. The concurrency is affected 
> by the single driver. How can I improve the concurrency, and what are the 
> best practices?
> 
> Thanks,
> Prabhu Joseph
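
One way to put numbers on the spread described in the quoted message is to time each of the 20 submissions from the client side. The following is only a minimal sketch under stated assumptions: `run_query` is a hypothetical stub (not a Spark or Hive API) that would need to be replaced with a real JDBC/ODBC call to the Thrift Server, and `timed` and `measure_concurrent` are illustrative helper names. It measures client-side wall-clock time only, so it cannot by itself show whether the delay comes from the driver, the scheduler, or the executors.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Query text taken from the log line quoted in the original message.
QUERY = "select distinct val2 from philips1 where key>=1000 and key<=1500"


def run_query(sql):
    """Hypothetical stand-in for a real client call to the Thrift Server."""
    time.sleep(0.01)  # placeholder work; replace with an actual query execution
    return []


def timed(execute, sql):
    """Return the wall-clock seconds taken by one call to execute(sql)."""
    start = time.perf_counter()
    execute(sql)
    return time.perf_counter() - start


def measure_concurrent(execute, sql, n=20):
    """Submit n copies of sql at once and return the per-query latencies."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(timed, execute, sql) for _ in range(n)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    latencies = sorted(measure_concurrent(run_query, QUERY))
    print("min %.3fs  median %.3fs  max %.3fs"
          % (latencies[0], latencies[len(latencies) // 2], latencies[-1]))
```

Comparing the minimum and maximum latency across repeated runs, alongside the executor, core, and memory settings asked about above, would give a clearer picture of where the queries are waiting.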
