[
https://issues.apache.org/jira/browse/HIVE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122192#comment-14122192
]
Hive QA commented on HIVE-7958:
-------------------------------
{color:red}Overall{color}: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12666605/HIVE-7958-spark.patch
{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 6291 tests
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_1
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_10
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_15
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_16
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_18
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_19
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_20
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_21
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_24
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_25
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_4
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_5
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_7
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_8
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_remove_9
{noformat}
Test results:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/testReport
Console output:
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/112/console
Test logs:
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-112/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12666605
> SparkWork generated by SparkCompiler may require multiple Spark jobs to run
> ---------------------------------------------------------------------------
>
> Key: HIVE-7958
> URL: https://issues.apache.org/jira/browse/HIVE-7958
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Priority: Critical
> Labels: Spark-M1
> Attachments: HIVE-7958-spark.patch
>
>
> A SparkWork instance currently may contain disjointed work graphs. For
> instance, union_remove_1.q may generated a plan like this:
> {code}
> Reduce2 <- Map 1
> Reduce4 <- Map 3
> {code}
> The SparkPlan instance generated from this work graph contains two result
> RDDs. When such plan is executed, we call .foreach() on the two RDDs
> sequentially, which results two Spark jobs, one after the other.
> While this works functionally, the performance will not be great as the Spark
> jobs are run sequentially rather than concurrently.
> Another side effect of this is that the corresponding SparkPlan instance is
> over-complicated.
> The are two potential approaches:
> 1. Let SparkCompiler generate a work that can be executed in ONE Spark job
> only. In above example, two Spark task should be generated.
> 2. Let SparkPlanGenerate generate multiple Spark plans and then SparkClient
> executes them concurrently.
> Approach #1 seems more reasonable and naturally fit to our architecture.
> Also, Hive's task execution framework already takes care of the task
> concurrency.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)