[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129289#comment-16129289 ]
Amit Kumar commented on SPARK-20589:
------------------------------------

As you said, adding a job boundary via code would be much easier than incurring the overhead of HDFS serialization/deserialization and writing multiple application-submit workflows.

Furthermore, the use case is the opposite of what you mentioned. It arises when we have 1000 concurrent tasks and want to go from there to 20 concurrent tasks to write somewhere, without:
* Forcing a coalesce to 20 partitions, which could cause huge partitions and possible OOM and shuffle errors
* Affecting the earlier parallelism (the 1000 concurrent tasks, etc.)

The point is that we don't want to reduce the number of partitions, because doing so either affects earlier tasks or causes huge partitions. But for some stages in the pipeline we want to limit the number of active tasks at any given time.

Adding the boundary via simple code, as proposed by [~Dhruve Ashar], seems a much simpler solution than breaking the pipeline into different stages and running each with different configs. We do have to wait for his complete solution before passing judgement on whether it is too complex, but if he can achieve the result, I think it will be more beneficial for the community.

> Allow limiting task concurrency per stage
> -----------------------------------------
>
>                 Key: SPARK-20589
>                 URL: https://issues.apache.org/jira/browse/SPARK-20589
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks
> per stage. This is useful when your spark job might be accessing another
> service and you don't want to DOS that service. For instance Spark writing
> to hbase or Spark doing http puts on a service. Many times you want to do
> this without limiting the number of partitions.
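
To illustrate the distinction the comment draws between the number of partitions and the number of tasks active at once, here is a minimal pure-Python analogy. It is not Spark code and does not use any Spark API: `run_stage`, `num_partitions`, `max_concurrent`, and `work` are hypothetical names chosen for this sketch, and a thread pool stands in for the scheduler. The 1000 units of work stay small and separate, but at most 20 run at any given time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_stage(num_partitions, max_concurrent, work):
    """Run work(i) once per partition with at most max_concurrent active at once.

    Hypothetical stand-in for a per-stage concurrency limit; not a Spark API.
    Returns the per-partition results and the peak number of active tasks.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def task(i):
        nonlocal active, peak
        # Track how many tasks are running right now.
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return work(i)
        finally:
            with lock:
                active -= 1

    # The pool size caps concurrency; the partition count is left unchanged.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(task, range(num_partitions)))
    return results, peak


# 1000 small partitions, never more than 20 processed concurrently.
results, peak = run_stage(1000, 20, lambda i: i * 2)
print(len(results), peak <= 20)
```

By contrast, coalescing to 20 partitions would be analogous to calling `run_stage(20, 20, ...)` with each unit of work 50 times larger, which is the situation the comment wants to avoid.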