[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288262#comment-16288262 ]
Xuefu Zhang commented on SPARK-22683:
-------------------------------------

Hi [~jcuquemelle],

Thanks for working on this and for bringing up the efficiency problem associated with dynamic allocation. A significant increase in resource consumption has also been experienced at our company when workloads are migrated from MR to Spark (via Hive). Thus, I believe there is a strong need to improve Spark's efficiency in addition to its performance.

While your proposal has merit, I largely concur with Sean that it addresses a particular workload rather than the whole class of problems, and so might not be universally applicable. Take MR as an example: it also allocates as many mappers/reducers as there are map or reduce tasks, yet it offers higher efficiency than Spark in many cases. The inefficiency associated with dynamic allocation has many aspects, such as executors idling out, bigger executors, and many stages in a Spark job (rather than only 2 stages in MR).

As there is a class of users conscious of resource consumption, especially as many move their workloads to the cloud, a more generic solution is needed for such users. I have been thinking about a proposal that introduces an MR-style resource allocation mechanism in parallel with dynamic allocation. Such a mechanism would start from the MR model but could be further enhanced to beat MR and to be better adapted to Spark's execution model. This would be a great alternative to dynamic allocation: while dynamic allocation is certainly performance-centric, the new scheme could still offer a good performance improvement (compared to MR) while being efficiency-centric.

As a starting point, I'm going to create a JIRA and move the discussion of this proposal over there. You're welcome to share your thoughts and/or contribute. Thanks.

> Allow tuning the number of dynamically allocated executors wrt task number
> --------------------------------------------------------------------------
>
>                 Key: SPARK-22683
>                 URL: https://issues.apache.org/jira/browse/SPARK-22683
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Julien Cuquemelle
>              Labels: pull-request-available
>
> Let's say an executor has spark.executor.cores / spark.task.cpus task slots.
> The current dynamic allocation policy allocates enough executors
> to have each task slot execute a single task, which minimizes latency
> but wastes resources when tasks are small relative to the executor
> allocation overhead.
> Adding a tasksPerExecutorSlot parameter makes it possible to specify how
> many tasks a single slot should ideally execute, to mitigate the overhead
> of executor allocation.
> PR: https://github.com/apache/spark/pull/19881
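The quoted description implies a simple sizing rule. For concreteness, here is a minimal Scala sketch of that rule as I read it; this is not the actual PR code, and the object and method names (ExecutorSizing, currentTarget, proposedTarget) are illustrative only:

object ExecutorSizing {
  // Slots per executor, per the issue description:
  // taskSlots = spark.executor.cores / spark.task.cpus
  def taskSlots(executorCores: Int, taskCpus: Int): Int =
    executorCores / taskCpus

  // Current dynamic-allocation target: one pending task per slot,
  // which minimizes latency.
  def currentTarget(pendingTasks: Int, slotsPerExecutor: Int): Int =
    math.ceil(pendingTasks.toDouble / slotsPerExecutor).toInt

  // Proposed target: each slot ideally runs tasksPerSlot tasks in
  // sequence, trading some latency for fewer executor allocations.
  def proposedTarget(pendingTasks: Int, slotsPerExecutor: Int,
                     tasksPerSlot: Int): Int =
    math.ceil(pendingTasks.toDouble / (slotsPerExecutor * tasksPerSlot)).toInt
}

For example, a stage with 4000 pending tasks on 4-core executors (spark.task.cpus = 1, so 4 slots each) currently targets ceil(4000 / 4) = 1000 executors; with tasksPerExecutorSlot = 6, the target would drop to ceil(4000 / 24) = 167.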