[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288262#comment-16288262 ]

Xuefu Zhang commented on SPARK-22683:
-------------------------------------

Hi [~jcuquemelle], Thanks for working on this and bringing up the efficiency 
problem associated with dynamic allocation. We have also experienced a 
significant increase in resource consumption in our company when workloads 
were migrated from MR to Spark (via Hive). Thus, I believe there is a strong 
need to improve Spark's efficiency in addition to its performance.

While your proposal has its merits, I largely concur with Sean that it may 
address a particular workload rather than being universally applicable to the 
whole class of problems. Take MR as an example: it also allocates as many 
mappers/reducers as there are map or reduce tasks, yet it offers higher 
efficiency than Spark in many cases. The inefficiency associated with dynamic 
allocation has many aspects, such as executors idling out, bigger executors, 
and the many stages in a Spark job (rather than only 2 stages in MR). As 
there is a class of users conscious of resource consumption, especially as 
many move their workloads to the cloud, there is demand for a solution that 
is more generic for such users; the knobs involved are sketched below.
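
For context, here is a minimal sketch of the dynamic-allocation settings 
behind the behaviors mentioned above (the values are illustrative, not 
recommendations):

    import org.apache.spark.SparkConf

    // Illustrative settings only; the values are made up for this example.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      // External shuffle service, so executors can be released safely.
      .set("spark.shuffle.service.enabled", "true")
      // How long an executor may sit idle before it is released.
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
      // Executor size; bigger executors amplify waste when slots sit empty.
      .set("spark.executor.cores", "4")
      .set("spark.dynamicAllocation.minExecutors", "0")
      .set("spark.dynamicAllocation.maxExecutors", "500")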

I have been thinking about a proposal that introduces an MR-style resource 
allocation mechanism in parallel with dynamic allocation. Such a mechanism 
would be based on the MR model, but could be further enhanced to beat MR and 
be better adapted to Spark's execution model. This would be a great 
alternative to dynamic allocation; a rough sketch of the contrast follows.
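
For reference, the two allocation targets being contrasted can be sketched 
roughly as follows (hypothetical helper names; simplified to ignore min/max 
bounds and ramp-up):

    // MR: as many containers as there are map/reduce tasks, each container
    // released as soon as its single task completes.
    def mrTargetContainers(pendingTasks: Int): Int = pendingTasks

    // Spark dynamic allocation: enough executors for every pending task to
    // run at once, one task per slot.
    def sparkTargetExecutors(pendingTasks: Int, slotsPerExecutor: Int): Int =
      math.ceil(pendingTasks.toDouble / slotsPerExecutor).toInt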

While dynamic allocation is certainly performance-centric, the new allocation 
scheme could still offer a good performance improvement (compared to MR) 
while being efficiency-centric.

As a starting point, I'm going to create a JIRA and move the discussion of 
this proposal over there. You're welcome to share your thoughts and/or 
contribute.

Thanks.

> Allow tuning the number of dynamically allocated executors wrt task number
> --------------------------------------------------------------------------
>
>                 Key: SPARK-22683
>                 URL: https://issues.apache.org/jira/browse/SPARK-22683
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Julien Cuquemelle
>              Labels: pull-request-available
>
> Let's say an executor has spark.executor.cores / spark.task.cpus task slots.
> The current dynamic allocation policy allocates enough executors for each
> task slot to execute a single task, which minimizes latency but wastes
> resources when tasks are small relative to the executor allocation overhead.
> By adding a tasksPerExecutorSlot parameter, it becomes possible to specify
> how many tasks a single slot should ideally execute, to mitigate the
> overhead of executor allocation.
> PR: https://github.com/apache/spark/pull/19881
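
To illustrate the arithmetic in the description above (the numbers are made 
up, and the formula is my reading of the description, not taken from the PR):

    // Illustrative values, not from the ticket.
    val executorCores = 4    // spark.executor.cores
    val taskCpus = 1         // spark.task.cpus
    val slotsPerExecutor = executorCores / taskCpus  // 4 task slots

    val pendingTasks = 4000

    // Current policy: one task per slot => 4000 / 4 = 1000 executors.
    val currentTarget =
      math.ceil(pendingTasks.toDouble / slotsPerExecutor).toInt

    // With tasksPerExecutorSlot = 10, each slot is expected to run ~10
    // tasks, so the target drops to 4000 / (4 * 10) = 100 executors.
    val tasksPerExecutorSlot = 10
    val tunedTarget =
      math.ceil(
        pendingTasks.toDouble / (slotsPerExecutor * tasksPerExecutorSlot)
      ).toInt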


