[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857975#comment-15857975
 ] 

KaiXu commented on HIVE-15671:
------------------------------

this error occurs when several queries run at the same time with large data 
scale, in fact it would not occur when running the query separately, but it can 
frequently occur when running together again.

the connection is closed suddenly, seems to be killed manually.  
2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished   Stage-3_0: 
961(+383)/1520       Stage-4_0: 0/2021       Stage-5_0: 0/1009       Stage-6_0: 
0/1
Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC 
channel is closed.)'
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask

found only one ERROR in yarn application log, it seems the driver was closed 
but not know what caused it close, above comment is hive's log, any suggestions 
shall be appreciated!

17/02/08 09:51:00 INFO executor.Executor: Finished task 1492.0 in stage 3.0 
(TID 2168). 3294 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 556.0 in stage 3.0 (TID 
1587). 3312 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 1412.0 in stage 3.0 
(TID 2136). 3294 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 1236.0 in stage 3.0 
(TID 2007). 3294 bytes result sent to driver
17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded 
a shutdown
17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown 
(hsx-node1:42777) driver disconnected.
17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 
192.168.1.1:42777 disassociated! Shutting down.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
/mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a8167f0b-f3c3-458f-ad51-8a0f4bcda4f3
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory 
/mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-26cba445-66d2-4b78-a428-17881c92f0f6
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Remoting shut down.

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15671
>                 URL: https://issues.apache.org/jira/browse/HIVE-15671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future<Rpc> registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher, 
> config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns value for 
> *hive.spark.client.server.connect.timeout*, which is meant for timeout for 
> handshake between Hive client and remote Spark driver. Instead, the timeout 
> should be *hive.spark.client.connect.timeout*, which is for timeout for 
> remote Spark driver in connecting back to Hive client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to