[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129289#comment-16129289 ]

Amit Kumar commented on SPARK-20589:
------------------------------------

As you said, adding a job boundary via code will be much easier than incurring the 
overhead of HDFS serialization/deserialization and writing multiple 
application-submit workflows.

Furthermore, the use case is the opposite of what you mentioned.  It is when we 
have 1000 concurrent tasks and from there want to go down to 20 concurrent 
tasks to write somewhere, without
* forcing a coalesce to 20 partitions, which could produce huge partitions and 
possible OOM and shuffle errors
* affecting the earlier parallelism (the 1000 concurrent tasks, etc.)

The point is that we don't want to reduce the number of partitions, since doing 
so either affects the earlier tasks or creates huge partitions. But for some 
stages in the pipeline we want to limit the number of active tasks at any given 
time.  Adding the boundary via simple code, as proposed by [~Dhruve Ashar], 
seems like a much simpler solution than breaking the pipeline into separate 
stages and running each with a different config.  We do have to wait for his 
complete solution to judge whether or not it is too complex, but if he can 
achieve the result, I think it will be beneficial for the community.
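
To make the use case concrete, here is a rough sketch. The paths, column name, 
and numbers are hypothetical, and the local property mentioned at the end is 
NOT an existing API; it only illustrates the kind of per-stage cap the proposal 
would enable:

{code:scala}
import org.apache.spark.sql.SparkSession

object ConcurrencyLimitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("concurrency-limit-sketch").getOrCreate()

    // Upstream stages want high parallelism, e.g. ~1000 partitions / tasks.
    val processed = spark.read
      .parquet("hdfs:///data/input")        // hypothetical input path
      .repartition(1000)
      .filter("value IS NOT NULL")          // placeholder transformation

    // Today's workaround: coalesce to 20 partitions so only 20 tasks hit the
    // downstream service at once. Each partition is now ~50x larger, which is
    // exactly the OOM / shuffle-error risk described above.
    processed
      .coalesce(20)
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/output")       // stand-in for the rate-limited sink

    // What this JIRA asks for instead (NOT an existing API): keep the 1000
    // small partitions, but cap how many of those tasks run concurrently in
    // the write stage, e.g. something along the lines of
    //   spark.sparkContext.setLocalProperty("spark.job.maxConcurrentTasks", "20")
    // so earlier parallelism and partition sizes are left untouched.

    spark.stop()
  }
}
{code}

The coalesce version works, but it couples the concurrency limit to the 
partition count; the proposed boundary decouples the two, which is the whole 
point of the request.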

> Allow limiting task concurrency per stage
> -----------------------------------------
>
>                 Key: SPARK-20589
>                 URL: https://issues.apache.org/jira/browse/SPARK-20589
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks 
> per stage.  This is useful when your Spark job might be accessing another 
> service and you don't want to DoS that service, for instance Spark writing 
> to HBase or Spark doing HTTP PUTs against a service.  Many times you want to 
> do this without limiting the number of partitions.


