[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-03 Thread Mustafa Iman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050477#comment-17050477
 ] 

Mustafa Iman commented on HIVE-22966:
-

If two tasks are in the same job and their priorities are the same, does it 
really matter which one gets executed first?

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within the same vertex, it should pick the attempt
> with the longest wait time to avoid starvation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-03 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050680#comment-17050680
 ] 

Rajesh Balamohan commented on HIVE-22966:
-

It does, depending on the wait time. Wait time is used as a proxy for scheduling the
attempts. For example, in Q55 without the patch, the longest wait time of an attempt was
1430 ms with a running time of 528 ms (1900+ ms in total). With the patch, the longest
wait time of an attempt was 741 ms with a running time of 700 ms (roughly 1500 ms in
total). When an attempt gets scheduled affects the overall runtime of the vertex.
The patch reduces the starvation period for a task by comparing attempts fairly on wait time.
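Rajesh's point that pick order matters even at equal priority can be illustrated with a toy single-executor queue. Total work is identical under both policies, but the worst-case wait is not. The model, the abstract time units, and the newest-first policy used for contrast are hypothetical. They are not LLAP's scheduler and not the Q55 measurements.

```java
public class WaitTimeSimulation {
    /**
     * Toy model: one executor drains a queue of equal-priority tasks.
     * arrivals[i] is when task i is queued, runtimes[i] how long it runs
     * (abstract time units). With oldestFirst the longest-waiting queued task
     * runs next (FIFO); otherwise the most recently queued one runs (LIFO).
     * Returns the longest time any task spent waiting in the queue.
     */
    static long maxWait(long[] arrivals, long[] runtimes, boolean oldestFirst) {
        int n = arrivals.length;
        boolean[] done = new boolean[n];
        long now = 0;
        long maxWait = 0;
        int remaining = n;
        while (remaining > 0) {
            int pick = -1;
            for (int i = 0; i < n; i++) {
                if (done[i] || arrivals[i] > now) {
                    continue; // already finished, or not queued yet
                }
                if (pick == -1
                        || (oldestFirst ? arrivals[i] < arrivals[pick]
                                        : arrivals[i] > arrivals[pick])) {
                    pick = i;
                }
            }
            if (pick == -1) {
                // Executor idle: jump ahead to the next arrival.
                long next = Long.MAX_VALUE;
                for (int i = 0; i < n; i++) {
                    if (!done[i]) {
                        next = Math.min(next, arrivals[i]);
                    }
                }
                now = next;
                continue;
            }
            maxWait = Math.max(maxWait, now - arrivals[pick]);
            now += runtimes[pick];
            done[pick] = true;
            remaining--;
        }
        return maxWait;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 1, 2, 3, 4, 5};
        long[] runtimes = {2, 2, 2, 2, 2, 2};
        System.out.println("oldest first: " + maxWait(arrivals, runtimes, true));  // 5
        System.out.println("newest first: " + maxWait(arrivals, runtimes, false)); // 9
    }
}
```

Both runs finish all six tasks at time 12. Only the worst wait changes, which is the kind of starvation the patch targets.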



[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051598#comment-17051598
 ] 

Hive QA commented on HIVE-22966:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 24s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 13s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 45s{color} | {color:blue} llap-server in master has 90 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 14s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20951/dev-support/hive-personality.sh |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus/patch-asflicense-problems.txt |
| modules | C: llap-server U: llap-server |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.





[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-04 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051650#comment-17051650
 ] 

Hive QA commented on HIVE-22966:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995412/HIVE-22966.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18096 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20951/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20951/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20951/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995412 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-04 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051768#comment-17051768
 ] 

Gopal Vijayaraghavan commented on HIVE-22966:
-

LGTM - +1



[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-05 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052118#comment-17052118
 ] 

Panagiotis Garefalakis commented on HIVE-22966:
---

Just to be clear here: waiting time has nothing to do with starvation, as all
tasks within a vertex eventually complete no matter how long they wait.
Across vertices, one vertex cannot really affect the completion of another, as
their priorities are different (so there is no starvation there either).

In this patch we are using wait time as a proxy for a long-waiting task, which is
a really weak assumption: it might work for one query but not for another,
depending on the task runtime distributions. A better approach would be
to use input/split size stats to make more sophisticated decisions when prioritising
tasks, but that requires significantly more work.
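The size-based alternative suggested here could look roughly like the comparator below. It is a hypothetical sketch, not anything in Hive or Tez: the Attempt class, the inputBytes estimate and where it would come from, and the "lower priority value runs first" convention are all assumptions. Ordering equal-priority attempts by largest input first is the classic longest-processing-time-first heuristic, and it only helps if input size actually tracks runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SizeAwareOrdering {
    // Hypothetical attempt with an input-size estimate (e.g. from split or
    // shuffle statistics); none of these fields come from the LLAP code base.
    static final class Attempt {
        final String id;
        final int priority;        // assumption: lower value = scheduled earlier
        final long inputBytes;     // estimated input size
        final long enqueueTimeMs;  // final tie-breaker: longest wait first

        Attempt(String id, int priority, long inputBytes, long enqueueTimeMs) {
            this.id = id;
            this.priority = priority;
            this.inputBytes = inputBytes;
            this.enqueueTimeMs = enqueueTimeMs;
        }
    }

    // Same priority: largest estimated input first, so the likely long-tail
    // attempt starts early; equal sizes fall back to the longest wait.
    static final Comparator<Attempt> ORDER =
        Comparator.<Attempt>comparingInt(a -> a.priority)
                  .thenComparing(Comparator.<Attempt>comparingLong(a -> a.inputBytes).reversed())
                  .thenComparingLong(a -> a.enqueueTimeMs);

    // Returns attempt ids in the order this comparator would run them.
    static List<String> runOrder(List<Attempt> queued) {
        List<Attempt> sorted = new ArrayList<>(queued);
        sorted.sort(ORDER);
        List<String> ids = new ArrayList<>();
        for (Attempt a : sorted) {
            ids.add(a.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Attempt> queued = Arrays.asList(
            new Attempt("a", 2, 100_000_000L, 10),
            new Attempt("b", 2, 500_000_000L, 30),
            new Attempt("c", 2, 500_000_000L, 20),
            new Attempt("d", 1, 1_000_000L, 40));
        System.out.println(runOrder(queued)); // [d, c, b, a]
    }
}
```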

Other than that, this patch assigns resources to tasks in a fairer manner,
so it does not hurt; but keeping long-tail tasks from being executed
at the end requires more work.

+1 from me as well

> LLAP: Consider including waitTime for comparing attempts in same vertex
> ---
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within the same vertex, it should pick the attempt
> with the longest wait time to avoid starvation.





[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-05 Thread Rajesh Balamohan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052180#comment-17052180
 ] 

Rajesh Balamohan commented on HIVE-22966:
-

Starvation of tasks waiting for resources within the same vertex is what this
patch targets. As mentioned in the ticket, it is not trying to fix a
deadlock.

Tasks across vertices do not come into the picture, as DAG priority takes care of
them. Wait time was chosen intentionally because it is available in the existing
set of metrics.
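The ordering described here (DAG priority across vertices, wait time within a vertex) can be sketched as a two-level comparator. This is a minimal illustration, not the code in HIVE-22966.4.patch: the Attempt class, its fields, and the "lower priority value runs first" convention are assumptions. It also assumes that attempts sharing a priority belong to the same vertex, as the discussion above describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class WaitTimeOrdering {
    // Hypothetical model of a queued attempt; these fields are illustrative,
    // not the actual LLAP task-wrapper API.
    static final class Attempt {
        final String id;
        final int priority;        // assumption: lower value = scheduled earlier
        final long enqueueTimeMs;  // earlier enqueue time = longer wait so far

        Attempt(String id, int priority, long enqueueTimeMs) {
            this.id = id;
            this.priority = priority;
            this.enqueueTimeMs = enqueueTimeMs;
        }
    }

    // Priority decides across vertices; among equal priorities (assumed to be
    // attempts of the same vertex) the longest-waiting attempt goes first.
    static final Comparator<Attempt> ORDER =
        Comparator.<Attempt>comparingInt(a -> a.priority)
                  .thenComparingLong(a -> a.enqueueTimeMs);

    // Returns attempt ids in the order this comparator would run them.
    static List<String> runOrder(List<Attempt> queued) {
        List<Attempt> sorted = new ArrayList<>(queued);
        sorted.sort(ORDER);
        List<String> ids = new ArrayList<>();
        for (Attempt a : sorted) {
            ids.add(a.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Attempt> queued = Arrays.asList(
            new Attempt("map1_a3", 2, 300),
            new Attempt("map1_a1", 2, 100),
            new Attempt("red2_a1", 3, 50),
            new Attempt("map1_a2", 2, 200));
        System.out.println(runOrder(queued)); // [map1_a1, map1_a2, map1_a3, red2_a1]
    }
}
```

Note that red2_a1 has waited longest of all but still runs last: wait time only breaks ties, so it never overrides the priority order between vertices.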



[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-05 Thread Panagiotis Garefalakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052191#comment-17052191
 ] 

Panagiotis Garefalakis commented on HIVE-22966:
---

As long as the stage is making progress and tasks get resources, it's not
starvation. Anyhow, I believe the important thing to mention here is that it
does not cure the long-tail task issue, and we need to properly take care of it.



[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex

2020-03-05 Thread Gopal Vijayaraghavan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052380#comment-17052380
 ] 

Gopal Vijayaraghavan commented on HIVE-22966:
-

bq. even though this patch takes into account task aging, we do not cure the
long-tail task issue and we need to properly take care of it.

This entire patch is hiding in the shadow of the YARN FIFO assumptions built into
the long-tail task scheduling order code inside Tez.

https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/dag/library/vertexmanager/ShuffleVertexManager.java#L591

There is also a roughly equivalent version for the splits:

https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java#L501

So Tez explicitly starts the biggest splits and the most heavily skewed reducers
first, which mostly matters for query latency when there is a large number of
tasks and a small number of executors.

That is why this patch makes a difference: at the same priority, we get
FIFO back.
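The effect of keeping Tez's biggest-first submission order can be checked with a toy greedy list-scheduling model. If tasks reach the executors biggest-first and equal-priority ties are broken FIFO, the heaviest task starts early instead of becoming the long tail. The runtimes and executor count below are invented, and the model assumes runtimes are known and executors identical. It is not Tez or LLAP code.

```java
import java.util.PriorityQueue;

public class LongTailMakespan {
    /**
     * Greedy list scheduling: tasks start in the given order, each on
     * whichever of the identical executors frees up first.
     * Returns the makespan (time at which the last task finishes).
     */
    static long makespan(long[] runtimesInOrder, int executors) {
        PriorityQueue<Long> freeAt = new PriorityQueue<>();
        for (int i = 0; i < executors; i++) {
            freeAt.add(0L); // every executor is free at time 0
        }
        long end = 0;
        for (long r : runtimesInOrder) {
            long start = freeAt.poll(); // earliest available executor
            long finish = start + r;
            freeAt.add(finish);
            end = Math.max(end, finish);
        }
        return end;
    }

    public static void main(String[] args) {
        // Same six tasks on 2 executors, submitted in two different orders.
        System.out.println(makespan(new long[]{8, 3, 3, 2, 2, 2}, 2)); // 10: biggest first
        System.out.println(makespan(new long[]{2, 2, 2, 3, 3, 8}, 2)); // 13: biggest last
    }
}
```

With 20 units of work on 2 executors, the biggest-first order reaches the lower bound of 10, while starting the 8-unit task last leaves it running alone as a long tail.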
