[ 
https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861036#comment-15861036
 ] 

Rui Li commented on HIVE-15860:
-------------------------------

A more targeted fix is to add the check only when the job has started and 
{{sparkJobStatus.getState()}} returns null. The SENT and QUEUED branches are 
covered by the monitor timeout. The SUCCEEDED and FAILED branches break out of 
the loop themselves. So we only need to worry about the STARTED branch.
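
The branch analysis above can be sketched roughly as follows. This is a 
simplified illustration, not the code in the attached patch: the class 
MonitorSketch, the HandleState enum, the poll-count timeout, the return codes 
and the driverAlive callback are all hypothetical stand-ins. In particular, 
what "the check" actually tests is an assumption here (a liveness probe on the 
remote driver).

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of the monitor loop; names are assumptions, not Hive's API.
final class MonitorSketch {
    // Stand-in for the job handle states named in the comment.
    enum HandleState { SENT, QUEUED, STARTED, SUCCEEDED, FAILED }

    /**
     * Polls until the job finishes, times out, or the driver is found dead.
     * Returns 0 = succeeded, 1 = failed, 2 = timed out before start,
     * 3 = started but the driver is gone (all codes are illustrative).
     * A real monitor would sleep between polls; that is omitted here.
     */
    static int monitor(Supplier<HandleState> handleState,
                       Supplier<String> sparkJobState,
                       BooleanSupplier driverAlive,
                       int maxWaitPolls) {
        int waitPolls = 0;
        while (true) {
            switch (handleState.get()) {
                case SENT:
                case QUEUED:
                    // Already bounded: modelled here as a poll-count timeout.
                    if (++waitPolls > maxWaitPolls) {
                        return 2;
                    }
                    break;
                case STARTED:
                    // The only unbounded branch: a null state means no job
                    // info has arrived yet, so verify the driver still exists.
                    if (sparkJobState.get() == null && !driverAlive.getAsBoolean()) {
                        return 3;
                    }
                    break;
                case SUCCEEDED:
                    return 0; // terminal state ends the loop
                case FAILED:
                    return 1; // terminal state ends the loop
            }
        }
    }
}
```

Without the check in the STARTED branch, a handle that stays in STARTED with a 
null job state would keep this loop spinning indefinitely, which matches the 
hang described in the issue.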

> RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
> -----------------------------------------------------------------
>
>                 Key: HIVE-15860
>                 URL: https://issues.apache.org/jira/browse/HIVE-15860
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-15860.1.patch
>
>
> This happens when RemoteDriver crashes between {{JobStarted}} and 
> {{JobSubmitted}}, e.g. when killed by {{kill -9}}. RemoteSparkJobMonitor 
> considers the job started, but it can't get the job info because it 
> hasn't received the JobId. The monitor then loops forever.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
