[ https://issues.apache.org/jira/browse/HIVE-15887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
KaiXu updated HIVE-15887: ------------------------- Description: when I run Hive queries on Spark, got below error in the console, after check the container's log, found it failed to connected to spark driver. I have set hive.spark.job.monitor.timeout=3600s, so the log said 'Job hasn't been submitted after 3601s', actually during this long-time period it's impossible no available resource, and also did not see any issue related to the network, so the cause is not clear from the message "Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc.". >From Hive's log, failed to get APP ID, so this might be the cause why the >driver did not start up. console log: Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00 Job hasn't been submitted after 3601s. Aborting it. Possible reasons include network issues, errors in remote driver or the cluster has no available resources, etc. Please check YARN or Spark driver's logs for further information. Status: SENT FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask container's log: 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip" } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 file: "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE) 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1486905599813_0046_000002 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set() 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable. 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver at 192.168.1.1:43656, retrying ... 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver! at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:569) at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:405) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71) at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747) at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:774) at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!) 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!) 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046 17/02/13 05:07:34 INFO util.ShutdownHookManager: Shutdown hook called hive's log: 2017-02-13T03:10:01,639 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:01 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T03:10:06,640 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:06 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T03:10:08,176 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] impl.RemoteSparkJobStatus: Failed to get APP ID. java.util.concurrent.TimeoutException at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.29.Final.jar:4.0.29.Final] at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:114) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60] at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60] at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[hadoop-common-2.7.1.jar:?] at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[hadoop-common-2.7.1.jar:?] 2017-02-13T03:10:11,641 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:11 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T03:10:16,643 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:16 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T04:11:07,354 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 04:11:07 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T04:11:09,706 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] impl.RemoteSparkJobStatus: Failed to get APP ID. java.util.concurrent.TimeoutException at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.29.Final.jar:4.0.29.Final] at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:132) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60] at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60] at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[hadoop-common-2.7.1.jar:?] at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[hadoop-common-2.7.1.jar:?] 2017-02-13T04:11:09,719 ERROR [c807cf48-301a-47b4-96df-495b2827d6ba main] ql.Driver: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.spark.SparkTask 2017-02-13T04:11:09,720 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] ql.Driver: Completed executing command(queryId=root_20170213030905_947e7eca-a134-4652-883c-10bded3c6218); Time taken: 3723.688 seconds 2017-02-13T04:11:09,800 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] conf.HiveConf: Using the default value passed in for log id: c807cf48-301a-47b4-96df-495b2827d6ba 2017-02-13T04:11:09,800 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] session.SessionState: Resetting thread name to main 2017-02-13T04:11:09,800 INFO [main] conf.HiveConf: Using the default value passed in for log id: c807cf48-301a-47b4-96df-495b2827d6ba 2017-02-13T04:11:12,355 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 04:11:12 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T04:11:17,356 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 04:11:17 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T04:11:19,811 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] client.SparkClientImpl: Timed out shutting down remote driver, interrupting... 2017-02-13T04:11:19,811 WARN [Driver] client.SparkClientImpl: Waiting thread interrupted, killing child process. 2017-02-13T04:11:19,839 INFO [main] session.SessionState: Deleted directory: /tmp/hive/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme hdfs 2017-02-13T04:11:19,839 INFO [main] session.SessionState: Deleted directory: /tmp/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme file was: {noformat} 2017-02-13T03:10:01,639 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:01 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T03:10:06,640 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:06 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T03:10:08,176 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] impl.RemoteSparkJobStatus: Failed to get APP ID. java.util.concurrent.TimeoutException at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.29.Final.jar:4.0.29.Final] at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:114) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60] at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60] at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[hadoop-common-2.7.1.jar:?] at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[hadoop-common-2.7.1.jar:?] 2017-02-13T03:10:11,641 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:11 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) 2017-02-13T03:10:16,643 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/13 03:10:16 INFO yarn.Client: Application report for application_1486905599813_0046 (state: ACCEPTED) {noformat} > could not get APP ID and cause failed to connect to spark driver on > yarn-client mode > ------------------------------------------------------------------------------------ > > Key: HIVE-15887 > URL: https://issues.apache.org/jira/browse/HIVE-15887 > Project: Hive > Issue Type: Bug > Components: Hive, Spark > Affects Versions: 2.2.0 > Environment: Hive2.2 > Spark2.0.2 > hadoop2.7.1 > Reporter: KaiXu > > when I run Hive queries on Spark, got below error in the console, after check > the container's log, found it failed to connected to spark driver. I have set > hive.spark.job.monitor.timeout=3600s, so the log said 'Job hasn't been > submitted after 3601s', actually during this long-time period it's impossible > no available resource, and also did not see any issue related to the network, > so the cause is not clear from the message "Possible reasons include network > issues, errors in remote driver or the cluster has no available resources, > etc.". > From Hive's log, failed to get APP ID, so this might be the cause why the > driver did not start up. > console log: > Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00 > Job hasn't been submitted after 3601s. Aborting it. > Possible reasons include network issues, errors in remote driver or the > cluster has no available resources, etc. > Please check YARN or Spark driver's logs for further information. > Status: SENT > FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask > container's log: > 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources > 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources > Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 > file: > "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip" > } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: > PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: > 8020 file: > "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" > } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE) > 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: > appattempt_1486905599813_0046_000002 > 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root > 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root > 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: > 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: > 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication > disabled; ui acls disabled; users with view permissions: Set(root); groups > with view permissions: Set(); users with modify permissions: Set(root); > groups with modify permissions: Set() > 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be > reachable. > 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver > at 192.168.1.1:43656, retrying ... > 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Uncaught exception: > org.apache.spark.SparkException: Failed to connect to driver! > at > org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:569) > at > org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:405) > at > org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247) > at > org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70) > at > org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747) > at > org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:774) > at > org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala) > 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Final app status: FAILED, > exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: > Failed to connect to driver!) > 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Unregistering > ApplicationMaster with FAILED (diag message: Uncaught exception: > org.apache.spark.SparkException: Failed to connect to driver!) > 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Deleting staging directory > hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046 > 17/02/13 05:07:34 INFO util.ShutdownHookManager: Shutdown hook called > hive's log: > 2017-02-13T03:10:01,639 INFO [stderr-redir-1] client.SparkClientImpl: > 17/02/13 03:10:01 INFO yarn.Client: Application report for > application_1486905599813_0046 (state: ACCEPTED) > 2017-02-13T03:10:06,640 INFO [stderr-redir-1] client.SparkClientImpl: > 17/02/13 03:10:06 INFO yarn.Client: Application report for > application_1486905599813_0046 (state: ACCEPTED) > 2017-02-13T03:10:08,176 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] > impl.RemoteSparkJobStatus: Failed to get APP ID. > java.util.concurrent.TimeoutException > at > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) > ~[netty-all-4.0.29.Final.jar:4.0.29.Final] > at > org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:114) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_60] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_60] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_60] > at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60] > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > ~[hadoop-common-2.7.1.jar:?] > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > ~[hadoop-common-2.7.1.jar:?] > 2017-02-13T03:10:11,641 INFO [stderr-redir-1] client.SparkClientImpl: > 17/02/13 03:10:11 INFO yarn.Client: Application report for > application_1486905599813_0046 (state: ACCEPTED) > 2017-02-13T03:10:16,643 INFO [stderr-redir-1] client.SparkClientImpl: > 17/02/13 03:10:16 INFO yarn.Client: Application report for > application_1486905599813_0046 (state: ACCEPTED) > 2017-02-13T04:11:07,354 INFO [stderr-redir-1] client.SparkClientImpl: > 17/02/13 04:11:07 INFO yarn.Client: Application report for > application_1486905599813_0046 (state: ACCEPTED) > 2017-02-13T04:11:09,706 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] > impl.RemoteSparkJobStatus: Failed to get APP ID. > java.util.concurrent.TimeoutException > at > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) > ~[netty-all-4.0.29.Final.jar:4.0.29.Final] > at > org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getAppID(RemoteSparkJobStatus.java:65) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:132) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2168) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1824) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1511) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) > ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) > ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_60] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_60] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_60] > at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60] > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > ~[hadoop-common-2.7.1.jar:?] > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > ~[hadoop-common-2.7.1.jar:?] > 2017-02-13T04:11:09,719 ERROR [c807cf48-301a-47b4-96df-495b2827d6ba main] > ql.Driver: FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask > 2017-02-13T04:11:09,720 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] > ql.Driver: Completed executing > command(queryId=root_20170213030905_947e7eca-a134-4652-883c-10bded3c6218); > Time taken: 3723.688 seconds > 2017-02-13T04:11:09,800 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] > conf.HiveConf: Using the default value passed in for log id: > c807cf48-301a-47b4-96df-495b2827d6ba > 2017-02-13T04:11:09,800 INFO [c807cf48-301a-47b4-96df-495b2827d6ba main] > session.SessionState: Resetting thread name to main > 2017-02-13T04:11:09,800 INFO [main] conf.HiveConf: Using the default value > passed in for log id: c807cf48-301a-47b4-96df-495b2827d6ba > 2017-02-13T04:11:12,355 INFO [stderr-redir-1] client.SparkClientImpl: > 17/02/13 04:11:12 INFO yarn.Client: Application report for > application_1486905599813_0046 (state: ACCEPTED) > 2017-02-13T04:11:17,356 INFO [stderr-redir-1] client.SparkClientImpl: > 17/02/13 04:11:17 INFO yarn.Client: Application report for > application_1486905599813_0046 (state: ACCEPTED) > 2017-02-13T04:11:19,811 WARN [c807cf48-301a-47b4-96df-495b2827d6ba main] > client.SparkClientImpl: Timed out shutting down remote driver, interrupting... > 2017-02-13T04:11:19,811 WARN [Driver] client.SparkClientImpl: Waiting thread > interrupted, killing child process. > 2017-02-13T04:11:19,839 INFO [main] session.SessionState: Deleted directory: > /tmp/hive/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme hdfs > 2017-02-13T04:11:19,839 INFO [main] session.SessionState: Deleted directory: > /tmp/root/c807cf48-301a-47b4-96df-495b2827d6ba on fs with scheme file -- This message was sent by Atlassian JIRA (v6.3.15#6346)