[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127930#comment-16127930 ]

Amit Kumar commented on SPARK-20589:
------------------------------------

[~imranr] Like you said, we could restrict the number of executors for the 
whole pipeline, but that would make the throughput too slow or, worse, start 
causing OOM and shuffle errors.
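For reference, capping executors for the whole application looks roughly like 
this (a minimal sketch; the config values are illustrative, not 
recommendations):

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch: with dynamic allocation, maxExecutors bounds the executors
// for the entire application, so every stage (not just the rate-limited one)
// runs with at most maxExecutors * executor cores concurrent tasks.
// Dynamic allocation also requires the external shuffle service to be enabled.
val spark = SparkSession.builder()
  .appName("whole-pipeline-throttle")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.maxExecutors", "4")  // illustrative value
  .config("spark.executor.cores", "2")                  // => at most 8 concurrent tasks
  .getOrCreate()
{code}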
As for the other option, I'm doing exactly that: breaking one pipeline into 
multiple stages. But, as you can imagine, it makes the workflow code much 
longer and more complex than desired. Not only do we have to add the 
complexity of serializing intermediate data to HDFS, it also increases the 
total runtime.
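A rough sketch of that split-pipeline workaround (the paths, the service-call 
helper, and the partition count below are hypothetical placeholders):

{code:scala}
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical stand-in for the real client call to the rate-limited service.
def callExternalService(row: Row): Unit = { /* e.g. HTTP PUT or HBase put */ }

val spark = SparkSession.builder().appName("stage-2-throttled").getOrCreate()

// Stage 1 ran as a separate, fully parallel job and serialized its output to
// an intermediate HDFS location (hypothetical path).
val intermediate = spark.read.parquet("hdfs:///tmp/pipeline/intermediate")

// Stage 2: shrink the parallelism before touching the rate-limited service,
// so at most 8 tasks (hence 8 concurrent callers) run at a time.
intermediate
  .coalesce(8)
  .rdd
  .foreachPartition { rows =>
    rows.foreach(callExternalService)
  }
{code}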
Also, I don't agree that this is a rare use case. As [~mcnels1] also said, we 
will see this come up more and more as we start linking Apache Spark with 
external storage/querying solutions that have tighter QPS restrictions.



> Allow limiting task concurrency per stage
> -----------------------------------------
>
>                 Key: SPARK-20589
>                 URL: https://issues.apache.org/jira/browse/SPARK-20589
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks 
> per stage.  This is useful when your Spark job might be accessing another 
> service and you don't want to DOS that service.  For instance, Spark writing 
> to HBase or Spark doing HTTP PUTs on a service.  Many times you want to do 
> this without limiting the number of partitions.
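
For illustration, the partition-count workaround alluded to above looks 
roughly like this (all names are hypothetical); it shows why coupling 
concurrency to the number of partitions is often not what you want:

{code:scala}
import org.apache.spark.sql.{Row, SparkSession}

// Hypothetical stand-in for the rate-limited call (HTTP PUT, HBase put, ...).
def putToService(row: Row): Unit = { /* client call here */ }

val spark = SparkSession.builder().appName("partition-limited-write").getOrCreate()
val toPublish = spark.read.parquet("hdfs:///data/to-publish")  // hypothetical input

// Today the only stage-level knob is the partition count:
toPublish
  .repartition(8)   // the write stage now has exactly 8 tasks
  .rdd
  .foreachPartition { rows =>
    rows.foreach(putToService)
  }
// Drawback: concurrency and partition count are now the same number, so each
// of the 8 partitions must hold 1/8th of the data, which can be far too much
// for a single task. That coupling is what this issue asks to remove.
{code}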


