[
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863109#comment-15863109
]
KaiXu commented on SPARK-19569:
-------------------------------
It's not the IP address resolution issue (SPARK-5113), since 192.168.1.1 is the
client node (in yarn-client mode, the driver node).
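
One way to check this kind of failure from the container's host is a plain TCP
probe against the driver address the ApplicationMaster reports. The following
is only a minimal sketch: the helper name can_reach and the 3-second timeout
are assumptions made for illustration, not part of Spark or Hive, and the
driver port (43656 in the log below) changes on every run unless it is pinned.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical helper for manual diagnosis; it only tests that the port
    accepts a connection, not that a Spark driver is answering on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running can_reach("192.168.1.1", 43656) on the NodeManager host while the job
is still waiting would show whether the driver port is reachable from there. A
False result would point at routing or firewall rules, or at the driver
listening on a different interface, rather than at name resolution.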
> could not connect to spark driver on yarn-client mode
> ------------------------------------------------------
>
> Key: SPARK-19569
> URL: https://issues.apache.org/jira/browse/SPARK-19569
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.2
> Environment: hadoop2.7.1
> spark2.0.2
> hive2.2
> Reporter: KaiXu
>
> When I run Hive queries on Spark, I get the error below in the console. After
> checking the container's log, I found that it failed to connect to the Spark
> driver. I have set hive.spark.job.monitor.timeout=3600s, which is why the log
> says 'Job hasn't been submitted after 3601s'. It is impossible that no
> resources were available during such a long period, and I did not see any
> network-related issue either, so the cause is not clear from the message
> "Possible reasons include network issues, errors in remote driver or the
> cluster has no available resources, etc.".
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020
> file:
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
> } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility:
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port:
> 8020 file:
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip"
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId:
> appattempt_1486905599813_0046_000002
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to:
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to:
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(root); groups
> with view permissions: Set(); users with modify permissions: Set(root);
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Uncaught exception:
> org.apache.spark.SparkException: Failed to connect to driver!
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:569)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:405)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
> at
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747)
> at
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:774)
> at
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Final app status: FAILED,
> exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException:
> Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Unregistering
> ApplicationMaster with FAILED (diag message: Uncaught exception:
> org.apache.spark.SparkException: Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Deleting staging directory
> hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046
> 17/02/13 05:07:34 INFO util.ShutdownHookManager: Shutdown hook called
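
For reference, the time the ApplicationMaster spent retrying can be read off
the first and last timestamps in the container log above. The sketch below
does that arithmetic; the function name elapsed_seconds is hypothetical, and
the timestamp format is inferred from the log lines themselves.

```python
from datetime import datetime

# Timestamp layout used by the container log, e.g. "17/02/13 05:05:54".
LOG_TS_FORMAT = "%y/%m/%d %H:%M:%S"


def elapsed_seconds(first: str, last: str) -> float:
    """Return the number of seconds between two log timestamps."""
    start = datetime.strptime(first, LOG_TS_FORMAT)
    end = datetime.strptime(last, LOG_TS_FORMAT)
    return (end - start).total_seconds()


# "Waiting for Spark driver to be reachable" vs. "Uncaught exception".
print(elapsed_seconds("17/02/13 05:05:54", "17/02/13 05:07:34"))  # 100.0
```

The 100 seconds is consistent with the documented default of
spark.yarn.am.waitTime (100s), so the ApplicationMaster most likely gave up
once that wait expired, long before the 3600s Hive monitor timeout.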