[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857975#comment-15857975 ]
KaiXu commented on HIVE-15671:
------------------------------

This error occurs when several queries run at the same time at a large data scale. It does not occur when a query runs on its own, but it recurs frequently when the queries are run together again. The connection is closed suddenly, as if it had been killed manually.

2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The log above is from Hive. I found only one ERROR in the YARN application log, shown below. It seems the driver was shut down, but I do not know what caused it. Any suggestions would be appreciated!

17/02/08 09:51:00 INFO executor.Executor: Finished task 1492.0 in stage 3.0 (TID 2168). 3294 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 556.0 in stage 3.0 (TID 1587). 3312 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 1412.0 in stage 3.0 (TID 2136). 3294 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 1236.0 in stage 3.0 (TID 2007). 3294 bytes result sent to driver
17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a8167f0b-f3c3-458f-ad51-8a0f4bcda4f3
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-26cba445-66d2-4b78-a428-17881c92f0f6
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

> RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.1.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future<Rpc> registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher,
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of
> *hive.spark.client.server.connect.timeout*, which is the timeout for the
> handshake between the Hive client and the remote Spark driver. Instead, the
> timeout used here should be *hive.spark.client.connect.timeout*, which is the
> timeout for the remote Spark driver to connect back to the Hive client.
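
The distinction drawn in the issue description can be sketched in isolation. The following is a minimal, self-contained illustration, not Hive's actual code: the class `TimeoutSketch`, its two helper methods, and the millisecond values are all hypothetical. Only the two property names come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the RPC configuration; not part of Hive.
class TimeoutSketch {
    // Property names as quoted in the issue description.
    static final String HANDSHAKE_KEY = "hive.spark.client.server.connect.timeout";
    static final String CONNECT_KEY = "hive.spark.client.connect.timeout";

    // Behaviour reported in the issue: the handshake property is read
    // when waiting for the remote driver to connect back.
    static long reportedTimeoutMs(Map<String, Long> conf) {
        return conf.get(HANDSHAKE_KEY);
    }

    // Behaviour the description asks for: read the connection property instead.
    static long proposedTimeoutMs(Map<String, Long> conf) {
        return conf.get(CONNECT_KEY);
    }

    public static void main(String[] args) {
        Map<String, Long> conf = new HashMap<>();
        // Illustrative values only (assumed for this sketch).
        conf.put(HANDSHAKE_KEY, 90000L);
        conf.put(CONNECT_KEY, 1000L);

        System.out.println("reported: " + reportedTimeoutMs(conf) + " ms");
        System.out.println("proposed: " + proposedTimeoutMs(conf) + " ms");
    }
}
```

With two properties that can be set to very different values, reading the wrong one changes how long the server waits for the driver, which is why the description treats the mix-up as a bug rather than a naming issue.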