[ https://issues.apache.org/jira/browse/HIVE-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120039#comment-16120039 ]

Peter Vary commented on HIVE-17270:
-----------------------------------

Ok, I think I understand now what is happening.

When running the {{TestMiniSparkOnYarnCliDriver}} tests, we set the number of 
executor instances to 2 in {{data/conf/spark/yarn-client/hive-site.xml}}:
{code}
<property>
  <name>spark.executor.instances</name>
  <value>2</value>
</property>
{code}

When running this code against a {{Hadoop23Shims.MiniSparkShim}}, we create the 
cluster with {{new MiniSparkOnYARNCluster("sparkOnYarn")}}, which means the 
cluster is created with only 1 NodeManager. So no matter what we set in the 
configuration, we will have only 1 executor ({{SparkClientImpl.GetExecutorCountJob}} 
will return 1). Thus the resulting explain output will show 
1 executor * 2 cores = 2 reducers:
{code}
   Stage: Stage-1
     Spark
       Edges:
         Reducer 2 <- Map 1 (GROUP, 2)      <-- number of reducers = instances * cores
         Reducer 3 <- Reducer 2 (SORT, 1)
{code}

The good news is that the resulting output is consistent: it will always show 
{{2}}.
No change is strictly needed, but I would prefer to use a consistent 
configuration, so I propose one of the following:
# Change {{spark.executor.instances}} to 1 in the hive-site.xml (see the 
snippet after this list), or
# Change the cluster creation to have more nodes: {{new 
MiniSparkOnYARNCluster("sparkOnYarn", 1, 2)}}
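
For reference, the first option would be just a one-line value change to the 
same property of {{data/conf/spark/yarn-client/hive-site.xml}} quoted above:
{code}
<property>
  <name>spark.executor.instances</name>
  <value>1</value>
</property>
{code}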

I expect these changes will not affect the test outputs at all. I personally 
prefer the first option, since it requires fewer resources to run the tests, 
and I do not see the point of having more cores for every test.


Another change I propose: when we are not able to get the number of executors, 
we should fall back to the value provided by the configuration, so we can 
provide a better experience on the first run too. So instead of this:
{code:title=SparkClientImpl}
  public ObjectPair<Long, Integer> getMemoryAndCores() throws Exception {
    SparkConf sparkConf = hiveSparkClient.getSparkConf();
    int numExecutors = hiveSparkClient.getExecutorCount();
    // at start-up, we may be unable to get number of executors
    if (numExecutors <= 0) {
      return new ObjectPair<Long, Integer>(-1L, -1);
    }
[..]
  }
{code}

We should read the configuration:
{code}
  @Override
  public ObjectPair<Long, Integer> getMemoryAndCores() throws Exception {
    SparkConf sparkConf = hiveSparkClient.getSparkConf();
    int numExecutors = hiveSparkClient.getExecutorCount();
    // at start-up, we may be unable to get number of executors, use the configuration values in this case
    if (numExecutors <= 0) {
      numExecutors = sparkConf.getInt("spark.executor.instances", 1);
    }
[..]
  }
{code}
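
As a self-contained sketch of this fallback (the class and method names below 
are made up for illustration; only {{SparkConf}} and its {{getInt}} method are 
real Spark API):
{code:title=ExecutorCountFallback.java}
import org.apache.spark.SparkConf;

public class ExecutorCountFallback {

  // Use the reported executor count when available, otherwise fall back to
  // the configured spark.executor.instances (defaulting to 1).
  static int effectiveExecutorCount(SparkConf sparkConf, int reportedExecutors) {
    if (reportedExecutors <= 0) {
      // at start-up the reported count may not be available yet
      return sparkConf.getInt("spark.executor.instances", 1);
    }
    return reportedExecutors;
  }

  public static void main(String[] args) {
    // loadDefaults=false keeps the sketch independent of system properties
    SparkConf conf = new SparkConf(false).set("spark.executor.instances", "2");
    System.out.println(effectiveExecutorCount(conf, -1)); // 2 (from config)
    System.out.println(effectiveExecutorCount(conf, 1));  // 1 (reported)
  }
}
{code}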

What do you think about this [~stakiar], [~xuefuz], [~lirui]? If we agree, I 
can create the required patches / jiras etc.
And many thanks for your pointers! They helped me tremendously!

Peter

> Qtest results show wrong number of executors
> --------------------------------------------
>
>                 Key: HIVE-17270
>                 URL: https://issues.apache.org/jira/browse/HIVE-17270
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 3.0.0
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>
> The hive-site.xml shows that the TestMiniSparkOnYarnCliDriver uses 2 cores 
> and 2 executor instances to run the queries. See: 
> https://github.com/apache/hive/blob/master/data/conf/spark/yarn-client/hive-site.xml#L233
> When reading the log files for the query tests, I see the following:
> {code}
> 2017-08-08T07:41:03,315  INFO [0381325d-2c8c-46fb-ab51-423defaddd84 main] 
> session.SparkSession: Spark cluster current has executors: 1, total cores: 2, 
> memory per executor: 512M, memoryFraction: 0.4
> {code}
> See: 
> http://104.198.109.242/logs/PreCommit-HIVE-Build-6299/succeeded/171-TestMiniSparkOnYarnCliDriver-insert_overwrite_directory2.q-scriptfile1.q-vector_outer_join0.q-and-17-more/logs/hive.log
> When running the tests against a real cluster, I found that when running an 
> explain query for the first time I see 1 executor, but when running it a 
> second time I see 2 executors.
> Also, setting some Spark configuration on the cluster resets this behavior: 
> the first time I will see 1 executor, and the second time I will see 
> 2 executors again.


