[jira] [Updated] (HIVE-17481) LLAP workload management (umbrella)

Sergey Shelukhin (JIRA) Thu, 07 Sep 2017 15:12:35 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sergey Shelukhin updated HIVE-17481:
------------------------------------
    Description: 
This effort is intended to improve various aspects of cluster sharing for LLAP. 
Some of these are applicable to non-LLAP queries and may later be extended to 
all queries. Administrators will be able to specify and apply policies for 
workload management ("resource plans") that apply to the entire cluster, with 
only one resource plan being active at a time. The policies will be created and 
modified using new Hive DML statements. 
The policies will cover:
* Dividing the cluster into a set of (optionally nested) query pools, each 
allocated a fraction of the cluster, a fixed query parallelism, a 
resource-sharing policy between queries, and potentially other settings such 
as priority.
* Mapping the incoming queries into pools based on the query user, groups, 
explicit configuration, etc.
* Specifying rules that perform actions on queries based on counter values 
(e.g. killing or moving queries).
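
As a rough illustration of the three policy areas listed above, a resource plan 
could be modeled as below. This is only a sketch: every class, field, and 
counter name (Pool, Trigger, ResourcePlan, ELAPSED_TIME) is hypothetical, and 
the lookup precedence (explicit setting, then user, then group, then a default 
pool) is an assumption rather than something this issue specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pool:
    name: str
    alloc_fraction: float        # share of the parent pool (or of the cluster)
    query_parallelism: int       # max concurrent queries in this pool
    children: List["Pool"] = field(default_factory=list)


@dataclass
class Trigger:
    counter: str                 # hypothetical counter name
    threshold: int
    action: str                  # "KILL" or "MOVE"
    target_pool: Optional[str] = None


@dataclass
class ResourcePlan:
    name: str
    root_pools: List[Pool]
    user_mappings: Dict[str, str]
    group_mappings: Dict[str, str]
    default_pool: str
    triggers: List[Trigger] = field(default_factory=list)

    def pool_for(self, user: str, groups: List[str],
                 explicit: Optional[str] = None) -> str:
        # Assumed precedence: explicit configuration, user, group, default.
        if explicit is not None:
            return explicit
        if user in self.user_mappings:
            return self.user_mappings[user]
        for group in groups:
            if group in self.group_mappings:
                return self.group_mappings[group]
        return self.default_pool

    def actions_for(self, counters: Dict[str, int]) -> List[Tuple[str, Optional[str]]]:
        # Return the action of every trigger whose counter exceeds its threshold.
        return [(t.action, t.target_pool) for t in self.triggers
                if counters.get(t.counter, 0) > t.threshold]


def cluster_fraction(pools: List[Pool], parent: float = 1.0) -> Dict[str, float]:
    # Flatten nested pools into absolute fractions of the whole cluster.
    out: Dict[str, float] = {}
    for pool in pools:
        share = parent * pool.alloc_fraction
        out[pool.name] = share
        out.update(cluster_fraction(pool.children, share))
    return out


plan = ResourcePlan(
    name="daytime",
    root_pools=[
        Pool("bi", 0.75, 5, [Pool("bi.adhoc", 0.5, 3), Pool("bi.reports", 0.5, 2)]),
        Pool("etl", 0.25, 20),
    ],
    user_mappings={"alice": "bi.adhoc"},
    group_mappings={"etl_users": "etl"},
    default_pool="bi.reports",
    triggers=[Trigger("ELAPSED_TIME", 60000, "MOVE", "etl"),
              Trigger("ELAPSED_TIME", 600000, "KILL")],
)
```

In this sketch a nested pool's fraction is relative to its parent, so 
cluster_fraction(plan.root_pools) gives each pool's absolute share of the 
cluster.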
One would also be able to switch policies on a live cluster without (usually) 
affecting running queries, for example to change policies between daytime and 
nighttime usage patterns. The switches would be safe and atomic; versioning 
may eventually be supported.
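
One way to picture a safe, atomic switch is a registry that holds immutable 
plan snapshots and swaps a single "active" name under a lock. The sketch below 
assumes exactly that; PlanRegistry and its methods are hypothetical names, and 
the issue does not say how the switch will actually be implemented.

```python
import threading
from types import MappingProxyType


class PlanRegistry:
    """Holds read-only plans; exactly one is active at any time."""

    def __init__(self, plans, active):
        # Freeze each plan so a reader never observes a half-edited policy.
        self._plans = {name: MappingProxyType(dict(p)) for name, p in plans.items()}
        if active not in self._plans:
            raise KeyError(active)
        self._active = active
        self._lock = threading.Lock()

    def active_plan(self):
        # A query reads the active plan once, at admission time.
        with self._lock:
            return self._active, self._plans[self._active]

    def activate(self, name):
        # Validate first, then swap the single reference: all or nothing.
        with self._lock:
            if name not in self._plans:
                raise KeyError(name)
            previous, self._active = self._active, name
            return previous
```

A query admitted under the "daytime" plan keeps the snapshot it read, while 
queries admitted after activate("nighttime") see the new plan; a failed switch 
leaves the active plan unchanged.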

Some implementation details:
* Workload management (WM) will only be supported in HiveServer2 (HS2), for 
obvious reasons.
* All LLAP query AMs will run in the "interactive" YARN queue and will be 
fungible between Hive pools.
* We will use the concept of "guaranteed tasks" (also known as ducks) to 
enforce cluster allocation without a central scheduler and without compromising 
throughput. Guaranteed tasks preempt other (speculative) tasks and are 
distributed from HS2 to AMs, and from AMs to tasks, in accordance with the 
percentage allocations in the policy. Each "duck" corresponds to a CPU resource 
on the cluster. The implementation will be isolated so that it can be replaced 
with a different one later.
* In the future, we may consider improved task placement and late binding, 
similar to those described in the Sparrow paper, to work around potential 
hotspots and similar problems that the decentralized scheme does not avoid.
* Only one HS2 will initially be supported to avoid split-brain workload 
management. We will also implement (in a tangential set of work items) 
active-passive HS2 recovery. Eventually, we intend to switch to a full 
active-active HS2 configuration with a shared WM and Tez session pool (unlike 
the current case with two separate session pools). 
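
To make the "guaranteed tasks" item concrete, the sketch below splits a fixed 
number of ducks first across pools by their percentage allocation and then 
across the AMs in one pool. It is an illustration under assumptions: 
split_ducks is a hypothetical helper, rounding uses the largest-remainder 
method (chosen here only to keep the counts integral), and the equal split 
between AMs in a pool is assumed rather than taken from this issue.

```python
def split_ducks(total, weights):
    """Split `total` ducks across keys in proportion to `weights`.

    Uses largest-remainder rounding so the shares are whole numbers
    that always add up to `total`.
    """
    weight_sum = sum(weights.values())
    if total < 0 or weight_sum <= 0:
        raise ValueError("need total >= 0 and a positive weight sum")
    exact = {k: total * w / weight_sum for k, w in weights.items()}
    shares = {k: int(v) for k, v in exact.items()}   # round down first
    leftover = total - sum(shares.values())
    # Hand out the remaining ducks to the largest fractional parts;
    # ties go to the larger weight.
    order = sorted(weights, key=lambda k: (exact[k] - shares[k], weights[k]),
                   reverse=True)
    for k in order[:leftover]:
        shares[k] += 1
    return shares


# HS2 level: 10 ducks (one per CPU resource) across two pools, in percent.
per_pool = split_ducks(10, {"bi": 75, "etl": 25})

# Pool level: the "bi" share across three query AMs, weighted equally
# (an assumption; the policy could weight queries differently).
per_am = split_ducks(per_pool["bi"], {"am1": 1, "am2": 1, "am3": 1})
```

Tasks that hold a duck would run as guaranteed, and the remaining tasks would 
stay speculative and preemptible, as described in the list above.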





> LLAP workload management (umbrella)
> -----------------------------------
>
>                 Key: HIVE-17481
>                 URL: https://issues.apache.org/jira/browse/HIVE-17481
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
