[ https://issues.apache.org/jira/browse/HIVE-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157762#comment-16157762 ]
Sergey Shelukhin commented on HIVE-17481:
-----------------------------------------

[~hagleitn] [~sseth] [~prasanth_j] [~harishjp] FYI, this is the umbrella JIRA.

> LLAP workload management (umbrella)
> -----------------------------------
>
>                 Key: HIVE-17481
>                 URL: https://issues.apache.org/jira/browse/HIVE-17481
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> This effort is intended to improve various aspects of cluster sharing for
> LLAP. Some of these improvements are applicable to non-LLAP queries and may
> later be extended to all queries. Administrators will be able to specify and
> apply workload management policies ("resource plans") that apply to the
> entire cluster, with only one resource plan active at a time. The policies
> will be created and modified using new Hive DDL statements.
> The policies will cover:
> * Dividing the cluster into a set of (optionally nested) query pools, each of
> which is allocated a fraction of the cluster, a configured query parallelism,
> a resource-sharing policy between queries, and potentially other settings
> such as priority.
> * Mapping incoming queries to pools based on the query's user, groups,
> explicit configuration, etc.
> * Specifying rules that perform actions on queries based on counter values
> (e.g. killing or moving queries).
> It will also be possible to switch policies on a live cluster, usually
> without affecting running queries, e.g. to change policies between daytime
> and nighttime usage patterns, and in other similar scenarios. The switches
> will be safe and atomic; versioning may eventually be supported.
> Some implementation details:
> * WM will only be supported in HS2 (for obvious reasons).
> * All LLAP query AMs will run in the "interactive" YARN queue and will be
> fungible between Hive pools.
> * We will use the concept of "guaranteed tasks" (also known as "ducks") to
> enforce cluster allocation without a central scheduler and without
> compromising throughput. Guaranteed tasks preempt other (speculative) tasks
> and are distributed from HS2 to AMs, and from AMs to tasks, in accordance
> with the percentage allocations in the policy. Each "duck" corresponds to a
> CPU resource on the cluster. The implementation will be isolated so that
> alternative implementations can be plugged in later.
> * In the future, we may consider improved task placement and late binding,
> similar to those described in the Sparrow paper, to work around potential
> hotspots and similar issues that the decentralized scheme does not avoid.
> * Only one HS2 will initially be supported, to avoid split-brain workload
> management. We will also implement (in a tangential set of work items)
> active-passive HS2 recovery. Eventually, we intend to switch to a full
> active-active HS2 configuration with a shared WM and Tez session pool (unlike
> the current setup, with two separate session pools).
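Two of the policy concepts in the description, mapping incoming queries to pools and rules that act on counter values, can be illustrated with a minimal Python sketch. This is not Hive's implementation or its DDL. The class names, the counter name `elapsed_ms`, the `KILL`/`MOVE` action strings, and the mapping precedence (explicit pool, then user, then group, then a default pool) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Query:
    query_id: str
    user: str
    groups: Set[str]
    pool: Optional[str] = None
    counters: Dict[str, int] = field(default_factory=dict)


@dataclass
class Trigger:
    counter: str                       # hypothetical counter name, e.g. "elapsed_ms"
    limit: int                         # the rule fires when the counter exceeds this
    action: str                        # hypothetical actions: "KILL" or "MOVE"
    target_pool: Optional[str] = None  # only used by "MOVE"


@dataclass
class ResourcePlan:
    user_mappings: Dict[str, str]      # user name -> pool name
    group_mappings: Dict[str, str]     # group name -> pool name
    default_pool: str
    triggers: Dict[str, List[Trigger]] = field(default_factory=dict)  # pool -> rules

    def map_query(self, q: Query, explicit_pool: Optional[str] = None) -> str:
        # Assumed precedence: explicit configuration, then user, then group, then default.
        if explicit_pool is not None:
            return explicit_pool
        if q.user in self.user_mappings:
            return self.user_mappings[q.user]
        for group in sorted(q.groups):  # sorted so the result is deterministic
            if group in self.group_mappings:
                return self.group_mappings[group]
        return self.default_pool

    def evaluate_triggers(self, q: Query) -> Optional[str]:
        # Fire the first rule of the query's current pool whose counter exceeds its limit.
        # "MOVE" reassigns the pool here; for "KILL" the caller is expected to kill the query.
        for t in self.triggers.get(q.pool, []):
            if q.counters.get(t.counter, 0) > t.limit:
                if t.action == "MOVE":
                    q.pool = t.target_pool
                return t.action
        return None


if __name__ == "__main__":
    plan = ResourcePlan(
        user_mappings={"etl_svc": "etl"},
        group_mappings={"analysts": "bi"},
        default_pool="default",
        triggers={"bi": [Trigger("elapsed_ms", 60_000, "MOVE", "etl")]},
    )
    q = Query("q1", "alice", {"analysts"})
    q.pool = plan.map_query(q)
    print(q.pool)  # → bi
    q.counters["elapsed_ms"] = 90_000
    print(plan.evaluate_triggers(q), q.pool)  # → MOVE etl
```

Returning the action instead of performing the kill inside the rule keeps policy evaluation separate from enforcement, which also makes it easy to evaluate the same rules against a newly activated resource plan.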
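The guaranteed-task ("duck") scheme can be pictured as integer apportionment at each level: HS2 splits the cluster's ducks among pools by their percentages, each pool's ducks are split among the query AMs running in it, and an AM treats that many of its tasks as guaranteed and the rest as speculative. The description does not say how fractional shares are rounded or how ducks are divided among queries within a pool, so this Python sketch assumes the largest-remainder method, an equal split among queries, sibling percentages that sum to 100, and unique pool names. All names (`apportion`, `Pool`, `distribute`, `mark_guaranteed`) are hypothetical.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Dict, List, Optional, Tuple


def apportion(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` whole ducks in proportion to non-negative `weights` (largest remainder)."""
    weight_sum = sum(weights.values())
    if total <= 0 or weight_sum <= 0:
        return {name: 0 for name in weights}
    exact = {n: Fraction(total * w, weight_sum) for n, w in weights.items()}
    alloc = {n: int(x) for n, x in exact.items()}  # int() floors non-negative Fractions
    leftover = total - sum(alloc.values())
    # Hand out the remaining ducks to the largest fractional parts (ties broken by name).
    for n in sorted(weights, key=lambda n: (alloc[n] - exact[n], n))[:leftover]:
        alloc[n] += 1
    return alloc


@dataclass
class Pool:
    name: str
    percent: int                                    # share of the parent's ducks
    children: List["Pool"] = field(default_factory=list)


def distribute(pools: List[Pool], ducks: int,
               out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """HS2 level (assumed): recursively split ducks down a nested pool tree to leaf pools."""
    out = {} if out is None else out
    if sum(p.percent for p in pools) != 100:
        raise ValueError("sibling pool percentages must sum to 100 in this sketch")
    shares = apportion(ducks, {p.name: p.percent for p in pools})
    for p in pools:
        if p.children:
            distribute(p.children, shares[p.name], out)
        else:
            out[p.name] = shares[p.name]
    return out


def mark_guaranteed(task_ids: List[str], ducks: int) -> Tuple[List[str], List[str]]:
    """AM level (assumed): the first `ducks` tasks are guaranteed, the rest speculative."""
    return task_ids[:ducks], task_ids[ducks:]


if __name__ == "__main__":
    plan = [
        Pool("bi", 60, [Pool("bi_dashboards", 50), Pool("bi_adhoc", 50)]),
        Pool("etl", 40),
    ]
    per_pool = distribute(plan, 10)
    print(per_pool)  # → {'bi_dashboards': 3, 'bi_adhoc': 3, 'etl': 4}
    per_query = apportion(per_pool["bi_adhoc"], {"q1": 1, "q2": 1})
    print(per_query)  # → {'q1': 2, 'q2': 1}
    print(mark_guaranteed(["t1", "t2", "t3"], per_query["q2"]))  # → (['t1'], ['t2', 't3'])
```

The largest-remainder step guarantees that the child counts always add up to exactly the parent's ducks, so no CPU slot is lost or double-counted through rounding as allocations flow from HS2 to AMs to tasks.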