[jira] [Updated] (HIVE-15529) LLAP: TaskSchedulerService can get stuck when scheduleTask returns DELAYED_RESOURCES

2017-01-03 Thread Rajesh Balamohan (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15529:

Attachment: HIVE-15529.1.patch

Re-enabling a node was not being triggered correctly in 
{{NodeEnablerCallable}}. It looked up the {{ServiceInstance}} by node 
identity in {{DynamicServiceInstanceSet}}, which internally did not return 
the instance correctly, so the node was not re-enabled as expected.

The attached patch checks for "workerIdentity" in 
{{DynamicServiceInstanceSet}}. Tested on a small-scale cluster.
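
To illustrate the kind of lookup mismatch described above, here is a minimal 
sketch. It does not reproduce Hive's code: {{WorkerInstance}}, 
{{InstanceSetSketch}}, the registry key format and both lookup methods are 
hypothetical names chosen for this example, and the assumption that instances 
are stored under a key different from the worker identity is mine, not 
something stated in the patch description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in; this is NOT Hive's ServiceInstance.
final class WorkerInstance {
    final String registryKey;    // assumed internal storage key
    final String workerIdentity; // identity the scheduler knows the node by

    WorkerInstance(String registryKey, String workerIdentity) {
        this.registryKey = registryKey;
        this.workerIdentity = workerIdentity;
    }
}

// Hypothetical stand-in; this is NOT Hive's DynamicServiceInstanceSet.
final class InstanceSetSketch {
    private final Map<String, WorkerInstance> byRegistryKey = new LinkedHashMap<>();

    void add(WorkerInstance instance) {
        byRegistryKey.put(instance.registryKey, instance);
    }

    // Mismatched lookup: uses the worker identity as if it were the storage key,
    // so it finds nothing whenever the two values differ.
    Optional<WorkerInstance> lookupByKey(String workerIdentity) {
        return Optional.ofNullable(byRegistryKey.get(workerIdentity));
    }

    // Lookup that compares against each instance's workerIdentity field.
    Optional<WorkerInstance> lookupByWorkerIdentity(String workerIdentity) {
        return byRegistryKey.values().stream()
                .filter(instance -> instance.workerIdentity.equals(workerIdentity))
                .findFirst();
    }
}
```

In this sketch, a caller that only knows the worker identity gets an empty 
result from {{lookupByKey}} and the instance from {{lookupByWorkerIdentity}}.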


> LLAP: TaskSchedulerService can get stuck when scheduleTask returns 
> DELAYED_RESOURCES
> 
>
> Key: HIVE-15529
> URL: https://issues.apache.org/jira/browse/HIVE-15529
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Rajesh Balamohan
> Priority: Critical
> Attachments: HIVE-15529.1.patch
>
>
> An easy way to reproduce the issue:
> 1. Start the Hive CLI with "--hiveconf hive.execution.mode=llap".
> 2. Run a SQL script file (e.g., a script containing TPC-DS queries).
> 3. In the middle of the run, press Ctrl+C to interrupt the current job. 
> This should not exit the Hive CLI.
> 4. After some time, launch the same SQL script in the same CLI. It gets 
> stuck indefinitely (waiting to compute the splits).
> Even after the CLI is quit, the AM keeps running until it is explicitly 
> killed. 
> The issue seems to be in how 
> {{LlapTaskSchedulerService::schedulePendingTasks}} handles its loop when it 
> encounters {{DELAYED_RESOURCES}} during task scheduling. 
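
The stuck state described above can be sketched with a simplified model. This 
is not the code of {{LlapTaskSchedulerService}}: {{SchedulerSketch}}, its 
single {{nodeEnabled}} flag and the one-pass loop are assumptions made for 
illustration. Only the names {{schedulePendingTasks}}, {{scheduleTask}} and 
{{DELAYED_RESOURCES}} come from the issue text.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified result type; only DELAYED_RESOURCES is named in the issue.
enum ScheduleResult { SCHEDULED, DELAYED_RESOURCES }

// Hypothetical model of a scheduler with one node that may be disabled.
final class SchedulerSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean nodeEnabled;

    SchedulerSketch(boolean nodeEnabled) {
        this.nodeEnabled = nodeEnabled;
    }

    void submit(String task) {
        pending.addLast(task);
    }

    // Stands in for whatever re-enables a node after it was disabled.
    void reEnableNode() {
        nodeEnabled = true;
    }

    int pendingCount() {
        return pending.size();
    }

    private ScheduleResult scheduleTask(String task) {
        return nodeEnabled ? ScheduleResult.SCHEDULED : ScheduleResult.DELAYED_RESOURCES;
    }

    // One scheduling pass: stops at the first DELAYED_RESOURCES and leaves the
    // remaining tasks queued. Returns how many tasks were scheduled.
    int schedulePendingTasks() {
        int scheduled = 0;
        while (!pending.isEmpty()) {
            if (scheduleTask(pending.peekFirst()) == ScheduleResult.DELAYED_RESOURCES) {
                break; // progress now depends on the node being re-enabled
            }
            pending.removeFirst();
            scheduled++;
        }
        return scheduled;
    }
}
```

In this model, if {{reEnableNode}} is never called, every pass schedules 
nothing and the pending tasks stay queued indefinitely.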



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-15529) LLAP: TaskSchedulerService can get stuck when scheduleTask returns DELAYED_RESOURCES

2017-01-03 Thread Rajesh Balamohan (JIRA)

 [ https://issues.apache.org/jira/browse/HIVE-15529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HIVE-15529:

Assignee: Rajesh Balamohan
  Status: Patch Available  (was: Open)
