[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861114#comment-15861114 ]
Xuefu Zhang edited comment on HIVE-15671 at 2/10/17 11:10 AM:
--------------------------------------------------------------

Hi [~vanzin], to backtrack a little bit, I have a follow-up question about your comment.
{quote}
That's kinda hard to solve, because the server doesn't know which client connected until two things happen: first the driver has started, second the driver completed the SASL handshake to identify itself. A lot of things can go wrong in that time. There's already some code, IIRC, that fails the session if the spark-submit job dies with an error, but aside from that, it's kinda hard to do more.
{quote}
I was talking about the server detecting a driver problem after the driver has connected back to the server. I'm wondering which timeout applies when something goes wrong on the driver side, such as a long GC pause or a stalled connection between the server and the driver. If that timeout is also server.connect.timeout, it is rather long; we increased it to 10m in our case to accommodate a busy cluster. In the absence of a heartbeat mechanism, it doesn't seem to me that such a timeout exists.


> RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15671
>                 URL: https://issues.apache.org/jira/browse/HIVE-15671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future<Rpc> registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher,
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of *hive.spark.client.server.connect.timeout*, which is meant to be the timeout for the handshake between the Hive client and the remote Spark driver. The timeout used here should instead be *hive.spark.client.connect.timeout*, which is the timeout for the remote Spark driver to connect back to the Hive client.
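
For reference, a minimal sketch of what the corrected overload could look like, assuming {{RpcConfiguration}} exposes a {{getConnectTimeoutMs()}} accessor for *hive.spark.client.connect.timeout* as the counterpart of the {{getServerConnectTimeoutMs()}} call shown above:

{code}
  /**
   * Tells the RPC server to expect a connection from a new client.
   * ...
   */
  public Future<Rpc> registerClient(final String clientId, String secret,
      RpcDispatcher serverDispatcher) {
    // Sketch: pass the client connect timeout (hive.spark.client.connect.timeout),
    // i.e. how long the remote Spark driver has to connect back to the server,
    // instead of the server/client handshake timeout used previously.
    return registerClient(clientId, secret, serverDispatcher,
        config.getConnectTimeoutMs());
  }
{code}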