[ 
https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900703#comment-15900703
 ] 

Xuefu Zhang commented on HIVE-16071:
------------------------------------

Hi [~lirui], thanks for your input and the experiment. I think we are making 
progress toward a conclusion.
{quote}
If no SaslMessage is sent, Hive will still wait for 
hive.spark.client.server.connect.timeout, even if cancelTask closes the channel 
after 1s.
{quote}
I'm particularly concerned about cases where Hive takes longer than necessary 
to detect a problem and return the error to the user. In this case, Hive should 
know within 1s that the SASL handshake didn't complete. It doesn't make sense 
to make the user wait up to 1 hr to learn of the failure. (The 1 hr timeout is 
meant to accommodate resource availability, not connection establishment.)

{quote}
the cancelTask only closes the channel, it doesn't set failure to the Future.
{quote}
This is a good observation. Is this another bug that we should fix? That is, 
cancelTask should fail the Future so that Hive stops waiting immediately, 
rather than waiting until server.connect.timeout elapses.
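To make the idea concrete, here is a minimal sketch of the proposed behavior. It uses java.util.concurrent.CompletableFuture as a stand-in for the Netty Promise that RemoteDriver actually waits on, and a scheduled task standing in for cancelTask; the class name, the 100ms delay, and the commented-out channel close are illustrative, not the real Hive code:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CancelTaskSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for the handshake future the driver blocks on.
        CompletableFuture<Void> handshake = new CompletableFuture<>();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // Stand-in for cancelTask: in addition to closing the channel,
        // fail the future so any caller blocked on get() wakes up now
        // instead of waiting for server.connect.timeout.
        timer.schedule(() -> {
            handshake.completeExceptionally(
                new TimeoutException("SASL handshake did not complete in time"));
            // channel.close() would go here in the real code
        }, 100, TimeUnit.MILLISECONDS);

        long start = System.nanoTime();
        try {
            handshake.get(); // returns promptly because the future was failed
        } catch (ExecutionException e) {
            long waitedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("failed after ~" + waitedMs + " ms: "
                + e.getCause().getMessage());
        } finally {
            timer.shutdown();
        }
    }
}
```

With only the channel close (and no completeExceptionally), the get() call above would keep blocking, which is the behavior being questioned.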

Any further thoughts?

[~ctang.ma], to answer your question, I don't think we need another property. 
We should use client.connect.timeout, as it's also used on the driver side. If 
the default value is too low, we can bump it up.


> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
>                 Key: HIVE-16071
>                 URL: https://issues.apache.org/jira/browse/HIVE-16071
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650 
> (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
>  hive.spark.client.connect.timeout is the timeout when the spark remote 
> driver makes a socket connection (channel) to RPC server. But currently it is 
> also used by the remote driver for RPC client/server handshaking, which is 
> not right. Instead, hive.spark.client.server.connect.timeout should be used 
> and it has already been used by the RPCServer in the handshaking.
> An error like the following is usually caused by this issue, since the 
> default hive.spark.client.connect.timeout value (1000ms) used by the remote 
> driver for handshaking is a bit too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: 
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: 
> Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: 
> Client closed before SASL negotiation finished.
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>         at 
> org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>         at 
> org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL 
> negotiation finished.
>         at 
> org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>         at 
> org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
