[jira] [Commented] (SPARK-20662) Block jobs that have greater than a configured number of tasks
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035897#comment-16035897 ] Sean Owen commented on SPARK-20662:
---
I still don't understand why YARN's capacity scheduler doesn't answer this. It shouldn't be reimplemented elsewhere, including Hive on Spark (HoS). I agree with [~vanzin].

> Block jobs that have greater than a configured number of tasks
> --------------------------------------------------------------
>
>                 Key: SPARK-20662
>                 URL: https://issues.apache.org/jira/browse/SPARK-20662
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.0, 2.0.0
>            Reporter: Xuefu Zhang
>
> In a shared cluster, it's desirable for an admin to block large Spark jobs.
> While there might not be a single metric defining the size of a job, the
> number of tasks is usually a good indicator. Thus, it would be useful for
> the Spark scheduler to block a job whose number of tasks reaches a configured
> limit. By default, the limit could be infinite, to retain the existing
> behavior.
> MapReduce has mapreduce.job.max.map and mapreduce.job.max.reduce, which
> block an MR job at submission time.
> The proposed configuration is spark.job.max.tasks with a default value of -1
> (infinite).

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035525#comment-16035525 ] Marcelo Vanzin commented on SPARK-20662:
bq. For multiple users in an enterprise deployment, it's good to provide admin knobs. In this case, an admin just wanted to block bad jobs.

Your definition of a bad job is the problem (well, one of the problems). "Number of tasks" is not an indication that a job is large; each task may be really small. Spark shouldn't be in the business of defining what a good or bad job is, and that doesn't mean it's targeted at single-user vs. multi-user environments. It's just something that needs to be controlled at a different layer.

If the admin is really worried about resource usage, he has control over the RM and shouldn't rely on applications behaving nicely to enforce those controls. Applications misbehave. Users mess with configuration. Those are all things outside of the admin's control.
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035519#comment-16035519 ] Xuefu Zhang commented on SPARK-20662:
I can understand the counterargument here if Spark is targeted at single-user cases. For multiple users in an enterprise deployment, it's good to provide admin knobs. In this case, an admin just wants to block bad jobs, and I don't think the RM meets that goal.

This is actually implemented in Hive on Spark. However, I thought it was generic and might be desirable for others as well. In addition, blocking a job at submission is better than killing it after it has started to run. If Spark doesn't think this is useful, then very well.
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035487#comment-16035487 ] Marcelo Vanzin commented on SPARK-20662:
BTW, if you really, really, really think this is a good idea and you really want it, you can write a listener that just cancels jobs, or kills the application, whenever a stage with more than x tasks is submitted. No need for any changes in Spark.
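The listener approach could be sketched roughly as follows; this is an illustration, not code from the ticket, and the class name and threshold value are made up:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Sketch: cancel any job containing a stage with more than
// `maxTasksPerStage` tasks. The threshold is illustrative only;
// it is not an actual Spark configuration.
class TaskLimitListener(sc: SparkContext, maxTasksPerStage: Int)
    extends SparkListener {

  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    // SparkListenerJobStart carries the StageInfo of every stage in the job,
    // so the check can happen before any task actually runs.
    val tooBig = jobStart.stageInfos.exists(_.numTasks > maxTasksPerStage)
    if (tooBig) {
      // Cancels just this job; calling sc.stop() here instead would
      // kill the whole application, as suggested in the comment above.
      sc.cancelJob(jobStart.jobId)
    }
  }
}

// Registration, e.g. during application setup:
// sc.addSparkListener(new TaskLimitListener(sc, maxTasksPerStage = 10000))
```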
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035481#comment-16035481 ] Sean Owen commented on SPARK-20662:
---
It's not equivalent to block the job, but why is that more desirable? Your use case is what resource queues, and things like the capacity scheduler, are for. Yes, you limit the amount of resources a person is entitled to for just that reason. A job that's blocked for being "too big" during busy hours may be fine to run off-hours, but this would mean the job is never runnable, ever. The capacity scheduler, in contrast, can let someone use resources when nobody else wants them but preempt when someone else needs them, so it doesn't really cost anyone else. It just doesn't seem like this is a wheel to reinvent in Spark. Possibly in its own standalone resource manager, but if you need functionality like this, you're not likely to get by with a standalone cluster anyway.
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035478#comment-16035478 ] Marcelo Vanzin commented on SPARK-20662:
bq. It's probably not a good idea to let one job take all resources while starving others.

I'm pretty sure that's why resource managers have queues. What you want here is a client-controlled, opt-in, application-level "nicety config" that tells it not to submit more than a limited number of tasks at a time. That control already exists: set a maximum number of executors for the app. Number of executors times number of cores per executor = max number of concurrently running tasks.
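The existing control described above can be expressed with standard Spark settings; the concrete values here are illustrative:

```scala
import org.apache.spark.SparkConf

// Cap the app's concurrent task slots via executor count and cores
// per executor: 10 executors * 4 cores = at most 40 running tasks.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.maxExecutors", "10")
  .set("spark.executor.cores", "4")
```

Note this bounds how many tasks run at once, not how many tasks a job may contain in total, which is the distinction the ticket is arguing over.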
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035462#comment-16035462 ] Xuefu Zhang commented on SPARK-20662:
[~lyc] I'm talking about mapreduce.job.max.map, which is the maximum number of map tasks that an MR job may have. If a submitted MR job contains more map tasks than that, it will be rejected. Similarly for mapreduce.job.max.reduce.

[~sowen], [~vanzin], I don't think blocking a (perhaps ridiculously) large job is equivalent to letting it run slowly forever. The use case I have is: while a YARN queue can be used to limit how many resources can be used, a queue can be shared by a team or by multiple applications, and it's probably not a good idea to let one job take all the resources while starving others. Secondly, many of the users who submit ridiculously large jobs have no idea what they are doing and don't even realize that their jobs are huge. Lastly, and more importantly, our application environment has a global timeout, beyond which a job will be killed. If a large job gets killed this way, significant resources are wasted. Blocking such a job at submission time helps preserve those resources.

BTW, if these scenarios don't apply to a user, there is nothing to worry about, because the default keeps the existing behavior.

In addition to spark.job.max.tasks, I'd also propose spark.stage.max.tasks, which limits the number of tasks any stage of a job may contain. The rationale is that spark.job.max.tasks alone tends to favor jobs with a small number of stages. With both, we can not only cover MR's mapreduce.job.max.map and mapreduce.job.max.reduce, but also control the overall size of a job.
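A tiny illustration of why the stage-level limit complements the job-level one; the helper function and the numbers are made up for this sketch, not proposed semantics from the ticket:

```scala
// Hypothetical admission check combining the two proposed limits.
// A value of -1 means "no limit", matching the proposed default.
def admit(stageTaskCounts: Seq[Int], maxStageTasks: Int, maxJobTasks: Int): Boolean = {
  val stageOk = maxStageTasks < 0 || stageTaskCounts.forall(_ <= maxStageTasks)
  val jobOk   = maxJobTasks < 0 || stageTaskCounts.sum <= maxJobTasks
  stageOk && jobOk
}

// A 3-stage job of 400 tasks each passes a per-stage limit of 500,
// but is still caught by a job-wide limit of 1000 (total = 1200).
admit(Seq(400, 400, 400), maxStageTasks = 500, maxJobTasks = 1000)  // false

// Conversely, a single-stage job of 900 tasks passes the job-wide
// limit but would be caught by a per-stage limit of 500.
admit(Seq(900), maxStageTasks = 500, maxJobTasks = 1000)  // false
```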
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034964#comment-16034964 ] Marcelo Vanzin commented on SPARK-20662:
Yeah, I don't really understand this request. It doesn't matter how many tasks a job creates; what really matters is how many resources the cluster manager allows the application to allocate. If a job has 1 million tasks but the cluster manager allocates a single vcpu for the job, it will take forever, but it won't really bog down the cluster.
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034334#comment-16034334 ] Sean Owen commented on SPARK-20662:
---
Isn't this better handled by the resource manager? For example, YARN already lets you cap these things in a bunch of ways, and the resource manager is a better place to manage, well, resources.
[ https://issues.apache.org/jira/browse/SPARK-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034282#comment-16034282 ] lyc commented on SPARK-20662:
Do you mean `mapreduce.job.running.map.limit`? That conf means "the maximum number of simultaneous map tasks per job; there is no limit if this value is 0 or negative." That is, it limits task concurrency: the behavior seems to be that the scheduler stops scheduling tasks when the job has that many running tasks, and resumes when some tasks are done. This could be done in `DAGScheduler`; I'd like to give it a try if the idea is accepted. cc @Marcelo Vanzin