[ 
https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-18684:
--------------------------------
    Status: Open  (was: Patch Available)

> Race condition in RemoteSparkJobMonitor
> ---------------------------------------
>
>                 Key: HIVE-18684
>                 URL: https://issues.apache.org/jira/browse/HIVE-18684
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-18684.1.patch, HIVE-18684.2.patch, 
> HIVE-18684.3.patch
>
>
> There is a race condition in {{RemoteSparkJobMonitor}}. Sometimes the info in 
> {{RemoteSparkJobMonitor#startMonitor.STARTED}} gets printed out, sometimes it 
> doesn't. This can be easily verified by running a qtest on 
> {{TestMiniSparkOnYarnCliDriver}} and counting the number of times {{Query 
> Hive on Spark job}} is printed vs. the number of times {{Finished 
> successfully in}} gets printed.
> The issue is that {{RemoteSparkJobMonitor}} runs every one second, and checks 
> the state of {{JobHandle}}. Depending on the state, it prints out some 
> logging info. The content of the logs contain an implicit assumption that 
> logs in the {{STARTED}} state are printed before the logs in the 
> {{SUCCEEDED}} state. However, this isn't always the case. The state 
> transitions are driven by how long the remote Spark job takes to run, and it 
> it finishes within one second then the logs in the {{STARTED}} state never 
> printed.
> This can be confusing to users, and there is key debugging information that 
> is printed in the {{STARTED}} state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to