Chao Sun created HIVE-10433:
-------------------------------

             Summary: Cancel connection when remote driver process exited with error code [Spark Branch]
                 Key: HIVE-10433
                 URL: https://issues.apache.org/jira/browse/HIVE-10433
             Project: Hive
          Issue Type: Bug
          Components: spark-branch
            Reporter: Chao Sun


Currently in Hive on Spark (HoS), after SparkClientImpl starts a remote driver 
process, it waits for that process to connect back. However, the process may 
fail and exit with an error code, in which case no connection is ever 
attempted. The HS2 process then keeps waiting for the connection until it 
times out. Worse, the user may have to wait for two timeout periods: one for 
SparkSetReducerParallelism and another for the actual Spark job.

We should cancel the timeout task and mark the promise as failed as soon as we 
know that the process has failed.
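
The proposed behavior could be sketched roughly as follows. This is only an
illustration using plain java.util.concurrent types, not the actual
SparkClientImpl code: the class DriverExitWatcher, the method failFastOnExit,
and the 90-second timeout are hypothetical names and values chosen for the
example, and a CompletableFuture stands in for the promise that the real
client uses.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DriverExitWatcher {

    /**
     * Hypothetical helper. When the child process exits with a non-zero code
     * before connecting back, cancel the pending timeout task and fail the
     * connection promise immediately instead of waiting for the timeout.
     *
     * exitCode  completes with the child's exit code; with a real Process on
     *           Java 9+ this could be process.onExit().thenApply(Process::exitValue)
     * connected stand-in for the promise completed when the driver connects back
     */
    public static <T> void failFastOnExit(CompletableFuture<Integer> exitCode,
                                          CompletableFuture<T> connected,
                                          ScheduledFuture<?> timeoutTask) {
        exitCode.thenAccept(code -> {
            if (code != 0 && !connected.isDone()) {
                // The timeout is no longer needed: we already know the outcome.
                timeoutTask.cancel(false);
                connected.completeExceptionally(new IllegalStateException(
                    "Remote driver exited with code " + code
                        + " before connecting back"));
            }
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<String> connected = new CompletableFuture<>();

        // Existing behavior: fail the promise only when the timeout fires.
        ScheduledFuture<?> timeoutTask = scheduler.schedule(() -> {
            connected.completeExceptionally(
                new TimeoutException("Timed out waiting for remote driver"));
        }, 90, TimeUnit.SECONDS);

        // Simulated child process that exits with error code 1.
        CompletableFuture<Integer> exitCode = new CompletableFuture<>();
        failFastOnExit(exitCode, connected, timeoutTask);
        exitCode.complete(1);

        System.out.println("promise failed: " + connected.isCompletedExceptionally());
        System.out.println("timeout cancelled: " + timeoutTask.isCancelled());
        scheduler.shutdownNow();
    }
}
```

In this sketch a zero exit code leaves both the promise and the timeout task
untouched, so a driver that exits cleanly is still handled by the existing
timeout path.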



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
