[ https://issues.apache.org/jira/browse/HIVE-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256960#comment-14256960 ]

Chengxiang Li commented on HIVE-7768:
-------------------------------------

As SPARK-3174 is resolved, Hive on Spark can benefit from this feature, since it 
shares a SparkContext between queries within a given user session. I verified 
Spark's executor scaling with HoS on YARN in yarn-client mode, and it works as 
expected. Enabling auto executor scaling requires the following prerequisites 
(a consolidated configuration sketch follows the list):
# Build the Spark source, and add 
${spark.home}/network/yarn/target/${scala_version}/spark-${spark_version}-yarn-shuffle.jar 
to the YARN classpath.
# In yarn-site.xml, set yarn.nodemanager.aux-services=spark_shuffle and 
yarn.nodemanager.aux-services.spark_shuffle.class=org.apache.spark.network.yarn.YarnShuffleService, 
then start the YARN service.
# Set spark.shuffle.service.enabled=true, which the executor auto scaling 
feature requires.
# Set spark.dynamicAllocation.enabled=true, and use 
spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors 
to bound the executor count.
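
For reference, a minimal sketch of the configuration above. The XML fragment 
goes into yarn-site.xml on each NodeManager (if other aux services such as 
mapreduce_shuffle are already configured, append spark_shuffle to the existing 
value instead of replacing it):

{code:xml}
<!-- yarn-site.xml: register Spark's external shuffle service as a YARN aux service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
{code}

The Spark side can go into spark-defaults.conf on the Hive classpath (or as 
the equivalent spark.* entries in hive-site.xml); the executor counts below 
are illustrative values, not recommendations:

{code}
# enable the external shuffle service (required by dynamic allocation)
spark.shuffle.service.enabled         true
# enable executor auto scaling and bound the executor count
spark.dynamicAllocation.enabled       true
spark.dynamicAllocation.minExecutors  1
spark.dynamicAllocation.maxExecutors  20
{code}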

My verification went as follows (an illustrative session is sketched after the 
list):
# Run a query that launches the maximum number of executors. After the query 
finishes, the executor count drops back to the configured minimum after an 
interval (configured with spark.dynamicAllocation.executorIdleTimeout).
# Run the query again. During execution, the executor count grows 
exponentially toward the maximum after an interval (configured with 
spark.dynamicAllocation.schedulerBacklogTimeout).
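
To make the two intervals concrete, a hypothetical session could look like the 
sketch below. The table name and timeout values are made up for illustration 
(in the Spark version I tested, both timeouts are plain seconds), and the 
spark.* settings need to be in place before the SparkContext is created:

{code}
-- hypothetical Hive session; all names and values are illustrative
set hive.execution.engine=spark;
-- seconds an executor may sit idle before it is released
set spark.dynamicAllocation.executorIdleTimeout=60;
-- seconds of task backlog before more executors are requested
set spark.dynamicAllocation.schedulerBacklogTimeout=5;

-- 1. Run a heavy query: the executor count grows exponentially toward
--    maxExecutors while tasks stay backlogged past the backlog timeout.
select key, count(*) from big_table group by key;

-- 2. Leave the session idle afterwards: the executor count falls back
--    to minExecutors once executors pass the idle timeout.
{code}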

Spark uses the max executor number as the default, while it seems to make more 
sense for HoS to use the min executor number as the default; there is some 
discussion about this in SPARK-4585.
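
Assuming a Spark version that exposes it, the initial count can also be pinned 
explicitly rather than relying on the default; this is the knob discussed in 
SPARK-4585, so treat it as version-dependent:

{code}
# version-dependent: start dynamic allocation from an explicit initial
# count (e.g. the configured minimum) instead of the maximum
spark.dynamicAllocation.initialExecutors  1
{code}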

> Integrate with Spark executor scaling [Spark Branch]
> ----------------------------------------------------
>
>                 Key: HIVE-7768
>                 URL: https://issues.apache.org/jira/browse/HIVE-7768
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>            Assignee: Chengxiang Li
>            Priority: Critical
>
> Scenario:
> A user connects to Hive and runs a query on a small table. Our SC is sized for 
> that small table. They then run a query on a much larger table. We'll need to 
> "re-size" the SC which I don't think Spark supports today, so we need to 
> research what is available today in Spark and how Tez works.
> More details:
> Similar to Tez, it's likely our "SparkContext" is going to be long lived and 
> process many queries. Some queries will be large and some small. Additionally 
> the SC might be idle for long periods of time.
> In this JIRA we will research the following:
> * How Spark decides the number of slaves for a given RDD today
> * Given an SC, when you create a new RDD based on a much larger input dataset, 
> does the SC adjust?
> * How Tez increases/decreases the size of the running YARN application (set 
> of slaves)
> * How Tez handles scenarios when it has a running set of slaves in YARN and 
> requests more resources for a query and fails to get additional resources
> * How Tez decides to timeout idle slaves
> This will guide requirements we'll need from Spark.


