[ https://issues.apache.org/jira/browse/SPARK-4341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207885#comment-14207885 ]

Sean Owen commented on SPARK-4341:
----------------------------------

So I think some of this is already done by Spark. For example, the number of 
partitions is determined the same way Hadoop determines input splits, and that 
count carries through a pipeline of transformations.
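
As a minimal sketch (the HDFS path and app name are made up for illustration): 
input partitioning follows Hadoop's splits, and narrow transformations like 
map and filter keep that count down the pipeline.

    import org.apache.spark.{SparkConf, SparkContext}

    object PartitionDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PartitionDemo"))

        // Input partitions come from Hadoop input splits (roughly one per
        // HDFS block), just as MapReduce computes its map task count.
        val lines = sc.textFile("hdfs:///data/input.txt") // hypothetical path
        println(s"input partitions: ${lines.partitions.length}")

        // Narrow transformations preserve that count through the pipeline.
        val cleaned = lines.map(_.trim).filter(_.nonEmpty)
        println(s"after map/filter: ${cleaned.partitions.length}") // unchanged

        sc.stop()
      }
    }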

Some of this is not necessarily the right thing to do. For example, I could be 
running several transformations at once, and trying to match each one's 
parallelism to the number of executors may be inefficient: not only can it 
produce partitions that are excessively small or large, it may also require a 
shuffle, which is expensive.
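
To make that cost concrete, a rough sketch (reusing the lines RDD from the 
snippet above; the target counts are illustrative): changing the partition 
count to match available parallelism means either a full shuffle or an uneven 
narrow merge.

    // Forcing a partition count onto an RDD is not free:
    val reshuffled = lines.repartition(100) // full shuffle: every record moves
    val merged     = lines.coalesce(10)     // narrow merge, no shuffle, but the
                                            // resulting partitions can be
                                            // uneven or excessively large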

Finally, I think the issue of resource usage is better dealt with by 
increasing/decreasing the number of executors dynamically in response to demand 
or load, and there is already work in progress on that. So maybe that covers 
what you are thinking of already.
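
For reference, that work is dynamic allocation; a minimal sketch of the 
relevant configuration, assuming YARN with the external shuffle service 
enabled (the min/max values here are illustrative, not recommendations):

    import org.apache.spark.SparkConf

    // Dynamic allocation grows and shrinks the executor pool with load, so a
    // fixed --num-executors is no longer required. The external shuffle
    // service keeps shuffle output available when executors are removed.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")  // illustrative floor
      .set("spark.dynamicAllocation.maxExecutors", "50") // illustrative ceiling
      .set("spark.shuffle.service.enabled", "true")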

> Spark need to set num-executors automatically
> ---------------------------------------------
>
>                 Key: SPARK-4341
>                 URL: https://issues.apache.org/jira/browse/SPARK-4341
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Hong Shen
>
> A MapReduce job can set the number of map tasks automatically, but in Spark 
> we have to set num-executors, executor memory, and cores. It's difficult for 
> users to set these args, especially for users who want to use Spark SQL. So 
> when the user hasn't set num-executors, Spark should set num-executors 
> automatically according to the input partitions.
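
(For context, these are the settings the reporter means; a sketch with 
illustrative values, equivalent to spark-submit's --num-executors, 
--executor-memory and --executor-cores flags on YARN:)

    import org.apache.spark.SparkConf

    // The three knobs users must currently size by hand:
    val conf = new SparkConf()
      .set("spark.executor.instances", "10") // --num-executors
      .set("spark.executor.memory", "4g")    // --executor-memory
      .set("spark.executor.cores", "2")      // --executor-cores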


