[ 
https://issues.apache.org/jira/browse/SPARK-19569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863109#comment-15863109
 ] 

KaiXu commented on SPARK-19569:
-------------------------------

it's not the IP address resolution issue (SPARK-5113), since 192.168.1.1 is the 
client node(yarn-client the driver node) .

> could not  connect to spark driver on yarn-client mode
> ------------------------------------------------------
>
>                 Key: SPARK-19569
>                 URL: https://issues.apache.org/jira/browse/SPARK-19569
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.2
>         Environment: hadoop2.7.1
> spark2.0.2
> hive2.2
>            Reporter: KaiXu
>
> when I run Hive queries on Spark, got below error in the console, after check 
> the container's log, found it failed to connected to spark driver. I have set 
>  hive.spark.job.monitor.timeout=3600s, so the log said 'Job hasn't been 
> submitted after 3601s', actually during this long-time period it's impossible 
> no available resource, and also did not see any issue related to the network, 
> so the cause is not clear from the message "Possible reasons include network 
> issues, errors in remote driver or the cluster has no available resources, 
> etc.".
> console log:
> Starting Spark Job = e9ce42c8-ff20-4ac8-803f-7668678c2a00
> Job hasn't been submitted after 3601s. Aborting it.
> Possible reasons include network issues, errors in remote driver or the 
> cluster has no available resources, etc.
> Please check YARN or Spark driver's logs for further information.
> Status: SENT
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> container's log:
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Preparing Local resources
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Prepared Local resources 
> Map(__spark_libs__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 8020 
> file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_libs__6842484649003444330.zip"
>  } size: 153484072 timestamp: 1486926551130 type: ARCHIVE visibility: 
> PRIVATE, __spark_conf__ -> resource { scheme: "hdfs" host: "hsx-node1" port: 
> 8020 file: 
> "/user/root/.sparkStaging/application_1486905599813_0046/__spark_conf__.zip" 
> } size: 116245 timestamp: 1486926551318 type: ARCHIVE visibility: PRIVATE)
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: ApplicationAttemptId: 
> appattempt_1486905599813_0046_000002
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls to: root
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing view acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: Changing modify acls groups to: 
> 17/02/13 05:05:54 INFO spark.SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(root); groups 
> with view permissions: Set(); users  with modify permissions: Set(root); 
> groups with modify permissions: Set()
> 17/02/13 05:05:54 INFO yarn.ApplicationMaster: Waiting for Spark driver to be 
> reachable.
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:54 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:55 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:56 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:05:57 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Failed to connect to driver 
> at 192.168.1.1:43656, retrying ...
> 17/02/13 05:07:34 ERROR yarn.ApplicationMaster: Uncaught exception: 
> org.apache.spark.SparkException: Failed to connect to driver!
>       at 
> org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:569)
>       at 
> org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:405)
>       at 
> org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:247)
>       at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
>       at 
> org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747)
>       at 
> org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:774)
>       at 
> org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Final app status: FAILED, 
> exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: 
> Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Unregistering 
> ApplicationMaster with FAILED (diag message: Uncaught exception: 
> org.apache.spark.SparkException: Failed to connect to driver!)
> 17/02/13 05:07:34 INFO yarn.ApplicationMaster: Deleting staging directory 
> hdfs://hsx-node1:8020/user/root/.sparkStaging/application_1486905599813_0046
> 17/02/13 05:07:34 INFO util.ShutdownHookManager: Shutdown hook called



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to