[ https://issues.apache.org/jira/browse/HIVE-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248560#comment-14248560 ]

Marcelo Vanzin commented on HIVE-9094:
--------------------------------------

60s sounds reasonable. This initial timeout will always be hard to figure out, 
since launching the app will depend a lot on the cluster being used and a bunch 
of other things... :-/

Perhaps we could add some kind of "context status" event that the client side 
can listen to, and the driver side could periodically send updates... but that 
would probably still need some kind of timeout. Anyway, raising the timeout 
sounds fine for now.
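
To make that idea a bit more concrete, here is a rough sketch of a client-side wait driven by periodic "context status" events (hypothetical names and plain java.util.concurrent types, not an actual RSC API): the deadline resets whenever an update arrives, so a slow-but-alive launch doesn't trip the timeout, while a silent driver still does.

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ContextStatusSketch {
  // Hypothetical status events the driver could push while starting up.
  enum Status { STARTING, RUNNING }

  private final BlockingQueue<Status> statusEvents = new LinkedBlockingQueue<>();

  // Called from the RPC layer whenever the driver reports its status.
  void onStatusEvent(Status status) {
    statusEvents.offer(status);
  }

  // Wait until the driver reports RUNNING; fail only if *no* update arrives
  // within idleTimeoutSeconds, rather than using one absolute deadline.
  void awaitRunning(long idleTimeoutSeconds)
      throws InterruptedException, TimeoutException {
    while (true) {
      Status status = statusEvents.poll(idleTimeoutSeconds, TimeUnit.SECONDS);
      if (status == null) {
        throw new TimeoutException(
            "No status update from driver within " + idleTimeoutSeconds + "s");
      }
      if (status == Status.RUNNING) {
        return;
      }
      // STARTING: the driver is alive but not ready yet; keep waiting.
    }
  }
}
{code}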

> TimeoutException when trying get executor count from RSC [Spark Branch]
> -----------------------------------------------------------------------
>
>                 Key: HIVE-9094
>                 URL: https://issues.apache.org/jira/browse/HIVE-9094
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chengxiang Li
>
> In http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/532/testReport, join25.q failed because:
> {code}
> 2014-12-12 19:14:50,084 ERROR [main]: ql.Driver (SessionState.java:printError(838)) - FAILED: SemanticException Failed to get spark memory/core info: java.util.concurrent.TimeoutException
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get spark memory/core info: java.util.concurrent.TimeoutException
>         at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:120)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
>         at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:79)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:134)
>         at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:99)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10202)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>         at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:221)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:420)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:306)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1108)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1045)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1035)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:199)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:151)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:362)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:297)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:837)
>         at org.apache.hadoop.hive.cli.TestSparkCliDriver.runTest(TestSparkCliDriver.java:234)
>         at org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_join25(TestSparkCliDriver.java:162)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at junit.framework.TestCase.runTest(TestCase.java:176)
>         at junit.framework.TestCase.runBare(TestCase.java:141)
>         at junit.framework.TestResult$1.protect(TestResult.java:122)
>         at junit.framework.TestResult.runProtected(TestResult.java:142)
>         at junit.framework.TestResult.run(TestResult.java:125)
>         at junit.framework.TestCase.run(TestCase.java:129)
>         at junit.framework.TestSuite.runTest(TestSuite.java:255)
>         at junit.framework.TestSuite.run(TestSuite.java:250)
>         at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
>         at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> Caused by: java.util.concurrent.TimeoutException
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
>         at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
>         at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
>         at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:92)
>         at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77)
>         at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:118)
>         ... 43 more
> {code}
> The timeout was introduced in HIVE-9079; previously the driver might simply hang. 
> This seems to be a robustness issue in RSC. Hanging isn't good, but timing out isn't 
> good either unless there is some network issue, which doesn't seem to be the case 
> here. [~chengxiang li]/[~vanzin], could we get to the bottom of this? Increase the 
> timeout value if necessary.
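
If the short-term fix is just to raise the value, it may also be worth reading it from configuration rather than hard-coding it, so slower clusters can tune it. A minimal sketch of that bounded wait, assuming a property along the lines of hive.spark.client.future.timeout (the actual key and default may differ):

{code}
import java.util.Properties;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FutureTimeoutSketch {
  // Illustrative property name and default; the real patch may use different ones.
  static final String TIMEOUT_KEY = "hive.spark.client.future.timeout";
  static final long DEFAULT_TIMEOUT_SECONDS = 60;

  // Bounded wait on the remote job handle instead of an unbounded get(),
  // so a wedged driver fails the query instead of hanging it.
  static int getExecutorCount(Future<Integer> jobHandle, Properties conf)
      throws Exception {
    long timeoutSeconds = Long.parseLong(
        conf.getProperty(TIMEOUT_KEY, Long.toString(DEFAULT_TIMEOUT_SECONDS)));
    return jobHandle.get(timeoutSeconds, TimeUnit.SECONDS);
  }
}
{code}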


