[ 
https://issues.apache.org/jira/browse/HIVE-9323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270719#comment-14270719
 ] 

Chengxiang Li commented on HIVE-9323:
-------------------------------------

[~Szehon], I took a look at the hive log. The failure reason is quite strange 
and a little different from HIVE-9094. HIVE-9094 failed because getting the 
executor count timed out: the spark cluster launch time was longer than the 
spark client future timeout interval (5s, and 30s after HIVE-9094). This 
timeout failure happens because RemoteDriver does not respond in time (the 
spark client waits 10s for RemoteDriver to register).
From the hive.log, the RemoteDriver process is launched at 2015-01-08 
18:43:03,938:
{noformat}
2015-01-08 18:43:03,938 DEBUG [main]: client.SparkClientImpl 
(SparkClientImpl.java:startDriver(298)) - Running client driver with argv: 
/home/hiveptest/54.177.142.77-hiveptest-1/apache-svn-spark-source/itests/qtest-spark/../../itests/qtest-spark/target/spark/bin/spark-submit
 --properties-file 
/home/hiveptest/54.177.142.77-hiveptest-1/apache-svn-spark-source/itests/qtest-spark/target/tmp/spark-submit.1097041260552550316.properties
 --class org.apache.hive.spark.client.RemoteDriver 
/home/hiveptest/54.177.142.77-hiveptest-1/maven/org/apache/hive/hive-exec/0.15.0-SNAPSHOT/hive-exec-0.15.0-SNAPSHOT.jar
 --remote-host ip-10-228-130-250.us-west-1.compute.internal --remote-port 40406
{noformat}
In spark.log, RemoteDriver registers back with SparkClient at 2015-01-08 
18:43:13,891, about 10s after launch, which should be just past the 10s 
timeout interval.
{noformat}
2015-01-08 18:43:13,891 DEBUG [Driver-RPC-Handler-0]: rpc.RpcDispatcher 
(RpcDispatcher.java:registerRpc(185)) - [DriverProtocol] Registered outstanding 
rpc 0 (org.apache.hive.spark.client.rpc.Rpc$Hello).
{noformat}
The strange thing is that the RemoteDriver process is unusually slow: it is 
launched at 2015-01-08 18:43:03,938, but we get its first log output at 
2015-01-08 18:43:13,161, and RemoteDriver hardly does anything before this 
log line.
{noformat}
2015-01-08 18:43:13,161 INFO  [main]: client.RemoteDriver 
(RemoteDriver.java:<init>(118)) - Connecting to: 
ip-10-228-130-250.us-west-1.compute.internal:40406
{noformat}
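The gaps between the three timestamps quoted above can be checked with a short script. This is only a sketch: the helper name `elapsed_seconds` and the variable names are made up for illustration, and the timestamps are copied from the log excerpts in this comment.

```python
from datetime import datetime

# Timestamp layout used by the hive.log / spark.log lines quoted above.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


# Values copied from the log excerpts in this comment.
launched = "2015-01-08 18:43:03,938"          # spark-submit started (hive.log)
first_driver_log = "2015-01-08 18:43:13,161"  # first RemoteDriver log line
hello_registered = "2015-01-08 18:43:13,891"  # Hello rpc registered (spark.log)

startup_gap = elapsed_seconds(launched, first_driver_log)
register_gap = elapsed_seconds(launched, hello_registered)

print("launch -> first RemoteDriver log: %.3fs" % startup_gap)
print("launch -> Hello rpc registered:   %.3fs" % register_gap)
```

Almost all of the 10s window is spent before RemoteDriver writes its first log line, so the delay is in process startup rather than in the RPC handshake itself.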
I'm not sure why this happens, but it should be quite a rare case. We can 
check whether it happens again. Besides expanding the timeout interval, I 
don't have a good solution for this issue now.

> Merge from trunk to spark 1/8/2015
> ----------------------------------
>
>                 Key: HIVE-9323
>                 URL: https://issues.apache.org/jira/browse/HIVE-9323
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Szehon Ho
>            Assignee: Szehon Ho
>             Fix For: spark-branch
>
>         Attachments: HIVE-9323-spark.patch, HIVE-9323.2-spark.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
