[ 
https://issues.apache.org/jira/browse/HIVE-16104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902341#comment-15902341
 ] 

Siddharth Seth edited comment on HIVE-16104 at 3/9/17 2:10 AM:
---------------------------------------------------------------

Thank you. There are still some parts which should not have changed... Anyway, actual 
review comments this time:

- Thread.sleep(PREEMPTION_KILL_GRACE_SLEEP_MS); - I think it is possible to 
replace this with a timed wait, similar to the way the main scheduling loop 
waits for a fragment to complete or get scheduled before it tries to schedule 
the next one. That way we won't actually be sleeping for the entire 100ms.
- Another nice-to-have would be to allow the element to go back into the 
queue, and to fetch the latest element from the queue - which could be at a 
higher priority. I don't think this is critical though.
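
The timed-wait idea could look roughly like the sketch below. This is an illustration only, not the actual LLAP scheduler code: the class, field, and method names (GraceWait, awaitKillGrace, signalTaskFinished) are made up, and the grace period is passed in as a parameter rather than read from PREEMPTION_KILL_GRACE_SLEEP_MS.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: instead of Thread.sleep(graceMs), wait on a condition that the
// task-completion path signals, so the scheduling loop wakes up as soon as
// the pre-empted task actually finishes.
public class GraceWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition taskFinished = lock.newCondition();
    private boolean finished = false;

    // Called from the scheduling loop in place of Thread.sleep(graceMs).
    // Returns true if the task finished within the grace period.
    public boolean awaitKillGrace(long graceMs) throws InterruptedException {
        lock.lock();
        try {
            long remainingNs = TimeUnit.MILLISECONDS.toNanos(graceMs);
            // Loop guards against spurious wakeups; awaitNanos returns the
            // estimated time remaining, so the deadline is preserved.
            while (!finished && remainingNs > 0) {
                remainingNs = taskFinished.awaitNanos(remainingNs);
            }
            return finished; // false => grace elapsed, escalate the kill
        } finally {
            lock.unlock();
        }
    }

    // Called from the task-completion callback.
    public void signalTaskFinished() {
        lock.lock();
        try {
            finished = true;
            taskFinished.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, the completion callback signals the condition, so the scheduler wakes immediately when the pre-empted task dies instead of always burning the full grace period.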

- In TaskRunnerCallable - killTask needs a minor change: shouldRunTask = false 
needs to be set independent of the (taskRunner) condition it currently sits 
under. Having a fragment wait outside the queue increases the possibility of an 
issue like this being hit. (The nice-to-have about putting the fragment back in 
the queue and waiting for the next schedule attempt would also limit the 
possibility of other such issues, if they exist.)
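
A minimal sketch of the suggested ordering - set the flag unconditionally, then interrupt only if a runner is actually live. The names mirror the ones in the comment (shouldRunTask, taskRunner), but this is illustrative, not the actual Hive source:

```java
// Hypothetical sketch of the killTask fix: mark the task not-runnable
// regardless of whether a runner thread exists yet, so a fragment killed
// while waiting outside the queue never starts running later.
public class TaskRunnerCallableSketch {
    private volatile boolean shouldRunTask = true;
    private volatile Thread taskRunner; // null while waiting outside the queue

    public void killTask() {
        shouldRunTask = false;          // set independent of runner state
        Thread runner = taskRunner;
        if (runner != null) {           // only interrupt a live runner
            runner.interrupt();
        }
    }

    // Checked by the run path before doing real work.
    public boolean shouldRun() {
        return shouldRunTask;
    }
}
```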

- Any chance of adding tests? The test class is set up with some controlled 
scheduling constructs to help with this. I suspect it'll need more work for the 
timed wait though.

- System.nanoTime - replace with the Clock instance already used in the class. 
(An injectable clock lets unit tests simulate time changes; it may not be 
plugged in yet.)
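
One way to make the timing testable, sketched with a hypothetical injectable clock. The real class presumably wires in the existing Hadoop-style Clock; the NanoClock interface and KillGraceTracker class here are illustrative only. The sketch also uses the primitive-long-with-sentinel option from the lastKillTimeNs nit:

```java
// Sketch: inject a clock instead of calling System.nanoTime() directly,
// so unit tests can advance time deterministically.
public class KillGraceTracker {
    public interface NanoClock { long nanoTime(); } // hypothetical seam

    private final NanoClock clock;
    private long lastKillTimeNs = -1; // -1 = unset (primitive-long option)

    public KillGraceTracker(NanoClock clock) { this.clock = clock; }

    public void recordKill() {
        lastKillTimeNs = clock.nanoTime();
    }

    // True once a kill was recorded and the grace period has elapsed.
    public boolean graceElapsed(long graceNs) {
        return lastKillTimeNs != -1
            && clock.nanoTime() - lastKillTimeNs >= graceNs;
    }
}
```

A test can then supply a fake clock (e.g. a lambda over a mutable long) and step it forward instead of sleeping.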

- Nit: lastKillTimeNs - boxed Long (null for unset) vs primitive long 
(-1/special value for unset)?

- Not required.
{code}
// TODO: this can all be replaced by a Thread with a Runnable and a catch block
{code}



> LLAP: preemption may be too aggressive if the pre-empted task doesn't die 
> immediately
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-16104
>                 URL: https://issues.apache.org/jira/browse/HIVE-16104
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16104.01.patch, HIVE-16104.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
