[ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127930#comment-16127930 ]
Amit Kumar commented on SPARK-20589:
------------------------------------

[~imranr] Like you said, we could restrict the number of executors for the whole pipeline, but that would make the throughput too slow or, worse, start causing OOM and shuffle errors. As for the other option, I'm doing exactly that: breaking one pipeline up into multiple stages. But, as you can imagine, it makes the workflow code much longer and more complex than desired. Not only do we have to add the extra complexity of serializing intermediate data to HDFS, it also increases the total run time. I also don't agree that this is a rare use case. As [~mcnels1] said, we will see it come up more and more as we start connecting Apache Spark to external storage/querying solutions that have tighter QPS restrictions.

> Allow limiting task concurrency per stage
> -----------------------------------------
>
>                 Key: SPARK-20589
>                 URL: https://issues.apache.org/jira/browse/SPARK-20589
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks
> per stage. This is useful when your Spark job might be accessing another
> service and you don't want to DOS that service, for instance Spark writing
> to HBase or Spark doing HTTP puts against a service. Many times you want to do
> this without limiting the number of partitions.
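For readers hitting the same limitation, here is a minimal Scala sketch of the split-pipeline workaround described in the comment above (not a built-in Spark feature). The input/intermediate paths, the maxConcurrentTasks value, and the pushToService helper are hypothetical placeholders; in practice pushToService would be the HBase write or HTTP put mentioned in the issue.

{code:scala}
import org.apache.spark.sql.{Row, SparkSession}

object ThrottledStageSketch {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("throttled-stage-sketch").getOrCreate()

    // Assumed cap dictated by the external service's QPS budget.
    val maxConcurrentTasks = 8

    // Stage 1: the heavy, fully parallel part of the pipeline.
    // Writing the result to HDFS serializes the intermediate data between stages.
    spark.read.parquet("/tmp/input")      // hypothetical input path
      .repartition(400)
      .write.mode("overwrite").parquet("/tmp/intermediate")

    // Stage 2: reload the intermediate data and coalesce so that at most
    // maxConcurrentTasks tasks talk to the external service at any one time.
    spark.read.parquet("/tmp/intermediate")
      .coalesce(maxConcurrentTasks)
      .rdd
      .foreachPartition { rows =>
        rows.foreach(pushToService)
      }

    spark.stop()
  }

  // Placeholder for the external sink, e.g. an HBase client or HTTP client call.
  private def pushToService(row: Row): Unit = {
    // build and send the request for this row
  }
}
{code}

Persisting the intermediate result to HDFS breaks the lineage, so the coalesce only caps the parallelism of the stage that calls the external service; the upstream stage keeps its full 400-way parallelism. That is exactly the extra complexity and extra run time the comment complains about, which is why a per-stage task-concurrency limit in the scheduler would be preferable.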