[jira] [Commented] (HIVE-16552) Limit the number of tasks a Spark job may contain

Xuefu Zhang (JIRA) Mon, 01 May 2017 20:54:22 -0700

    [ https://issues.apache.org/jira/browse/HIVE-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992296#comment-15992296 ]


Xuefu Zhang commented on HIVE-16552:
------------------------------------

[~lirui], as mentioned in the description, the main use of this property is to 
block large/bad queries that take a lot of resources, such as those scanning a 
lot of partitions. YARN resource settings don't prevent users from submitting 
such a large query. MR has options like mapreduce.job.max.map, whereas Spark 
has no equivalent.

Large/bad queries not only run longer but also create a huge load on HS2 and 
HDFS. This option gives an admin a way to control such queries.

Regular users don't have to worry about this configuration; they just need to 
rewrite their blocked queries. It's advisable for an admin to blacklist this 
configuration so that users cannot override it in their own sessions.
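Blacklisting matters because otherwise a user could lift the limit for their own session, e.g. with SET hive.spark.job.max.tasks=-1; . In Hive an admin would typically do this by listing the property in hive.conf.restricted.list. The sketch below is only a hypothetical Java illustration of the idea: the RestrictedSessionConf class is an assumption, not Hive's API, and the admin value of 10000 is made up. Session-level SETs on restricted keys are refused, so the admin's value stays in force.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a session config that refuses runtime SETs on keys an
// admin has restricted. Not Hive's actual API.
public class RestrictedSessionConf {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> restricted;

    public RestrictedSessionConf(Map<String, String> adminValues, Set<String> restricted) {
        this.values.putAll(adminValues);          // values from the admin's site config
        this.restricted = Set.copyOf(restricted); // keys users may not change
    }

    // Mimics a user running "SET key=value;" in their session.
    public void set(String key, String value) {
        if (restricted.contains(key)) {
            throw new IllegalArgumentException("Cannot modify " + key + " at runtime");
        }
        values.put(key, value);
    }

    public String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        RestrictedSessionConf conf = new RestrictedSessionConf(
                Map.of("hive.spark.job.max.tasks", "10000"),
                Set.of("hive.spark.job.max.tasks"));
        try {
            conf.set("hive.spark.job.max.tasks", "-1"); // user tries to lift the limit
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(conf.get("hive.spark.job.max.tasks")); // prints 10000
    }
}
```

Keys that are not restricted can still be changed per session as usual.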

Also, for admins or regular users who don't have such a problem, the default 
value (-1, no limit) will just do.

Make sense?

> Limit the number of tasks a Spark job may contain
> -------------------------------------------------
>
>                 Key: HIVE-16552
>                 URL: https://issues.apache.org/jira/browse/HIVE-16552
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>    Affects Versions: 1.0.0, 2.0.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-16552.1.patch, HIVE-16552.patch
>
>
> It's commonly desirable to block bad and big queries that take a lot of YARN 
> resources. One approach, similar to mapreduce.job.max.map in MapReduce, is to 
> stop a query that invokes a Spark job containing too many tasks. The 
> proposal here is to introduce hive.spark.job.max.tasks with a default value 
> of -1 (no limit), which an admin can set to block queries that trigger too 
> many Spark tasks.
> Please note that this control knob applies to a single Spark job, though it's 
> possible that one query can trigger multiple Spark jobs (such as in the case 
> of map-join). Nevertheless, the proposed approach is still helpful.
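The per-job check proposed above can be sketched as follows. This is a minimal, hypothetical Java illustration assuming the limit is compared against the total number of tasks across all stages of one Spark job; SparkJobTaskLimit and its methods are illustrative names, not Hive's actual implementation, and the limit of 1000 and the task counts are made up. It also shows the caveat from the description: each job is checked on its own, so a query whose jobs together exceed the limit is not stopped as long as no single job does.

```java
import java.util.List;

// Hypothetical sketch of the per-job check behind hive.spark.job.max.tasks.
// Names are illustrative; this is not Hive's actual implementation.
public class SparkJobTaskLimit {
    private final int maxTasks; // -1 (the proposed default) means no limit

    public SparkJobTaskLimit(int maxTasks) {
        this.maxTasks = maxTasks;
    }

    // Total number of tasks in one Spark job, summed over its stages.
    public static long totalTasks(List<Integer> stageTaskCounts) {
        long total = 0;
        for (int n : stageTaskCounts) {
            total += n;
        }
        return total;
    }

    // True if this job should be stopped; any negative limit disables the check.
    public boolean exceedsLimit(List<Integer> stageTaskCounts) {
        return maxTasks >= 0 && totalTasks(stageTaskCounts) > maxTasks;
    }

    public static void main(String[] args) {
        SparkJobTaskLimit limit = new SparkJobTaskLimit(1000);
        // One query triggering two Spark jobs (e.g. with a map-join): each job is
        // checked on its own, so 600 + 600 = 1200 tasks in total is not blocked.
        List<List<Integer>> jobsOfOneQuery = List.of(List.of(400, 200), List.of(500, 100));
        for (List<Integer> job : jobsOfOneQuery) {
            System.out.println(limit.exceedsLimit(job)); // prints false
        }
        // A single job with 1100 tasks is over the limit.
        System.out.println(limit.exceedsLimit(List.of(800, 300))); // prints true
        // With the default of -1, nothing is blocked.
        System.out.println(new SparkJobTaskLimit(-1).exceedsLimit(List.of(800, 300))); // prints false
    }
}
```

Treating any negative value as "no limit" is a simplification made here; the proposal itself only specifies -1.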




