Prabhu Joseph created ZEPPELIN-2822:
---------------------------------------

             Summary: Livy Interpreter should have a way to set Livy RSCConf RPC_CLIENT_HANDSHAKE_TIMEOUT
                 Key: ZEPPELIN-2822
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2822
             Project: Zeppelin
          Issue Type: Bug
          Components: livy-interpreter
    Affects Versions: 0.7.2
            Reporter: Prabhu Joseph


Consider a cluster where the node hosting NN1 (NameNode) and the Hive Metastore 
is down, but HA is configured for both services. Running a Livy script creates 
a new session, which waits for ipc.client.connect.timeout (20s) on each jar 
upload into HDFS:

{code}
17/07/31 13:59:29 INFO ContextLauncher: 17/07/31 13:59:39 INFO Client: Source and destination file systems are the same. Not copying hdfs://prabhu/hdp/apps/2.6.1.0-129/spark/spark-hdp-assembly.jar
17/07/31 13:59:49 INFO ContextLauncher: 17/07/31 13:59:49 INFO Client: Uploading resource file:/usr/hdp/current/livy-server/rsc-jars/livy-rsc-0.3.0.2.6.1.0-129.jar -> hdfs://prabhu/user/diasmi/.sparkStaging/application_1501501991083_0001/livy-rsc-0.3.0.2.6.1.0-129.jar
{code}

The session then waits another 5 seconds (hive.metastore.client.socket.timeout) 
before the connection attempt to the unavailable metastore fails over:

{code}
17/07/26 09:09:46 INFO ContextLauncher: 17/07/26 09:09:46 INFO metastore: Trying to connect to metastore with URI thrift://prabhu01:9083
17/07/26 09:09:51 INFO ContextLauncher: 17/07/26 09:09:51 WARN metastore: Failed to connect to the MetaStore Server...
17/07/26 09:09:51 INFO ContextLauncher: 17/07/26 09:09:51 INFO metastore: Trying to connect to metastore with URI thrift://prabhu02:9083
17/07/26 09:09:51 INFO ContextLauncher: 17/07/26 09:09:51 INFO metastore: Connected to metastore.
{code}
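The two stalls described above accumulate during session startup. The following 
is a rough sketch of that arithmetic, compared against Livy's 90-second 
handshake timeout (RPC_CLIENT_HANDSHAKE_TIMEOUT). The function names and the 
number of jar uploads are hypothetical and not taken from the logs, and the 
estimate ignores the normal startup time of the Spark context, so the real 
delay would be larger.

```python
# Timeout values quoted in this report.
IPC_CLIENT_CONNECT_TIMEOUT_S = 20   # ipc.client.connect.timeout
METASTORE_SOCKET_TIMEOUT_S = 5      # hive.metastore.client.socket.timeout
HANDSHAKE_TIMEOUT = "90s"           # RPC_CLIENT_HANDSHAKE_TIMEOUT default


def parse_seconds(value: str) -> int:
    """Parse a duration such as '90s' into seconds (only the 's' suffix is handled)."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported duration: {value!r}")
    return int(value[:-1])


def estimated_extra_delay(jar_uploads: int) -> int:
    """Extra seconds spent on failover: one IPC stall per jar upload plus one metastore stall."""
    return jar_uploads * IPC_CLIENT_CONNECT_TIMEOUT_S + METASTORE_SOCKET_TIMEOUT_S


def exceeds_handshake_timeout(jar_uploads: int) -> bool:
    """True if the failover stalls alone are longer than the handshake timeout."""
    return estimated_extra_delay(jar_uploads) > parse_seconds(HANDSHAKE_TIMEOUT)


# Hypothetical example: a session that uploads 5 jars.
print(estimated_extra_delay(5), exceeds_handshake_timeout(5))
```

Under these assumptions, a handful of jar uploads is enough for the failover 
stalls by themselves to use up the 90-second budget.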

Finally, session creation fails with the Livy server connect timeout. The 
default of 90 seconds is too low for this case, so Zeppelin needs a way to 
override this timeout configuration.

RPC_CLIENT_HANDSHAKE_TIMEOUT("server.connect.timeout", "90s")

{code}
17/07/31 14:00:51 ERROR RSCClient: Failed to connect to context.
java.util.concurrent.TimeoutException: Timed out waiting for context to start.
        at com.cloudera.livy.rsc.ContextLauncher.connectTimeout(ContextLauncher.java:133)
        at com.cloudera.livy.rsc.ContextLauncher.access$200(ContextLauncher.java:62)
        at com.cloudera.livy.rsc.ContextLauncher$2.run(ContextLauncher.java:121)
        at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)
{code}
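To illustrate the kind of override being requested, here is a minimal sketch 
of a session-creation request body that carries a larger handshake timeout in 
its conf map. The "livy.rsc." key prefix, the 300s value, and the helper name 
are assumptions made for illustration. Whether a given Livy version honors 
this key per session, and how the Zeppelin Livy interpreter would expose it, 
is not confirmed by this report.

```python
import json

# Assumption: RSCConf entries are addressed with a "livy.rsc." prefix, so the
# entry RPC_CLIENT_HANDSHAKE_TIMEOUT("server.connect.timeout", "90s") would be
# set through the key below.
RSC_CONF_PREFIX = "livy.rsc."
HANDSHAKE_TIMEOUT_KEY = RSC_CONF_PREFIX + "server.connect.timeout"


def build_session_request(kind: str = "spark", handshake_timeout: str = "300s") -> dict:
    """Build a hypothetical session-creation body with an overridden handshake timeout."""
    return {
        "kind": kind,
        "conf": {HANDSHAKE_TIMEOUT_KEY: handshake_timeout},
    }


# Print the JSON that an interpreter could send when creating the session.
print(json.dumps(build_session_request(), indent=2))
```

The point of the sketch is only that the timeout travels with the session 
configuration, so an interpreter-level property could be mapped onto it 
without changing the Livy server defaults.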



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
