[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050477#comment-17050477 ]

Mustafa Iman commented on HIVE-22966:
-------------------------------------

If two tasks are in the same job and their priorities are the same, does it really matter which one gets executed first?

> LLAP: Consider including waitTime for comparing attempts in same vertex
> -----------------------------------------------------------------------
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within same vertex, it should pick up the attempt
> with longest wait time to avoid starvation.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050680#comment-17050680 ]

Rajesh Balamohan commented on HIVE-22966:
-----------------------------------------

It does, depending on the wait time. Wait time is used as a proxy when scheduling the attempts. For example, in Q55 without the patch, the longest wait time of an attempt was 1430 ms with a running time of 528 ms (1900+ ms in total). With the patch, the longest wait time of an attempt was 741 ms with a running time of 700 ms (roughly 1500 ms in total). When an attempt gets scheduled affects the overall runtime of the vertex. The patch reduces the starvation period of a task by comparing attempts fairly on wait time.
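The wait-time tie-break discussed in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the `Attempt` class, its fields, and the rule "lower priority value runs first" are hypothetical stand-ins, and ordering by vertex id before wait time is an assumption made here so that the comparison stays a consistent total ordering.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a queued task attempt; not an LLAP class. */
final class Attempt {
    final String id;
    final int vertexId;
    final int priority;       // assumption: a lower value is scheduled earlier
    final long enqueueTimeMs; // when the attempt started waiting

    Attempt(String id, int vertexId, int priority, long enqueueTimeMs) {
        this.id = id;
        this.vertexId = vertexId;
        this.priority = priority;
        this.enqueueTimeMs = enqueueTimeMs;
    }

    long waitTimeMs(long nowMs) {
        return nowMs - enqueueTimeMs;
    }
}

final class WaitTimeOrdering {
    /** Negative result means {@code a} should be scheduled before {@code b}. */
    static int compare(Attempt a, Attempt b, long nowMs) {
        if (a.priority != b.priority) {
            return Integer.compare(a.priority, b.priority);
        }
        if (a.vertexId != b.vertexId) {
            // Assumption: group attempts by vertex so the ordering stays transitive.
            return Integer.compare(a.vertexId, b.vertexId);
        }
        // Same vertex, same priority: the attempt that has waited longest goes first.
        return Long.compare(b.waitTimeMs(nowMs), a.waitTimeMs(nowMs));
    }

    /** Returns attempt ids in the order this sketch would schedule them. */
    static List<String> schedulingOrder(List<Attempt> queued, long nowMs) {
        List<Attempt> sorted = new ArrayList<>(queued);
        sorted.sort((a, b) -> compare(a, b, nowMs));
        List<String> ids = new ArrayList<>();
        for (Attempt a : sorted) {
            ids.add(a.id);
        }
        return ids;
    }
}
```

With three attempts of one vertex enqueued at 100 ms, 50 ms and 80 ms, the sketch schedules the one enqueued at 50 ms first, because it has waited longest at any later point in time.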
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051598#comment-17051598 ]

Hive QA commented on HIVE-22966:
--------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 9m 41s | master passed |
| +1 | compile | 0m 24s | master passed |
| +1 | checkstyle | 0m 13s | master passed |
| 0 | findbugs | 0m 45s | llap-server in master has 90 extant Findbugs warnings. |
| +1 | javadoc | 0m 16s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 26s | the patch passed |
| +1 | compile | 0m 24s | the patch passed |
| +1 | javac | 0m 24s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 52s | the patch passed |
| +1 | javadoc | 0m 16s | the patch passed |
|| || || || Other Tests ||
| -1 | asflicense | 0m 15s | The patch generated 2 ASF License warnings. |
| | | 14m 14s | |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20951/dev-support/hive-personality.sh |
| git revision | master / deebfb6 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus/patch-asflicense-problems.txt |
| modules | C: llap-server U: llap-server |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20951/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051650#comment-17051650 ]

Hive QA commented on HIVE-22966:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995412/HIVE-22966.4.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

SUCCESS: +1 due to 18096 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20951/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20951/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20951/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995412 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051768#comment-17051768 ]

Gopal Vijayaraghavan commented on HIVE-22966:
---------------------------------------------

LGTM - +1
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052118#comment-17052118 ]

Panagiotis Garefalakis commented on HIVE-22966:
-----------------------------------------------

Just to be clear here: waiting time has nothing to do with starvation, as all tasks within a vertex eventually complete no matter how long they wait. Across vertices, a vertex cannot really affect the completion of another, as their priorities are different (so no starvation there either).

In this patch we are using wait time as a proxy of a long-waiting task, which is a really weak assumption. It might work for one query but not for another, depending on the task runtime distributions. A better approach would be to use input/split size stats to make more involved decisions when prioritising tasks, which requires significantly more work.

Other than that, with this patch we are assigning resources to tasks in a fairer manner, so it does not hurt. But avoiding long-tail tasks being executed at the end involves more work.

+1 from me as well

> LLAP: Consider including waitTime for comparing attempts in same vertex
> -----------------------------------------------------------------------
>
> Key: HIVE-22966
> URL: https://issues.apache.org/jira/browse/HIVE-22966
> Project: Hive
> Issue Type: Improvement
> Components: llap
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Fix For: 4.0.0
>
> Attachments: HIVE-22966.3.patch, HIVE-22966.4.patch
>
>
> When attempts are compared within same vertex, it should pick up the attempt
> with longest wait time to avoid starvation.
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052180#comment-17052180 ]

Rajesh Balamohan commented on HIVE-22966:
-----------------------------------------

Starvation of tasks waiting for resources within the same vertex is what this patch targets. As mentioned in the ticket, it is not a deadlock that the patch is trying to fix. Tasks across vertices do not come into the picture, as DAG priority takes care of them. Waiting time is used intentionally, based on the existing set of metrics available.
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052191#comment-17052191 ]

Panagiotis Garefalakis commented on HIVE-22966:
-----------------------------------------------

As long as the stage is making progress and tasks get resources, it's not starvation. Anyhow, I believe the important thing to mention here is that the patch does not cure the long-tail task issue, and we need to properly take care of it.
[jira] [Commented] (HIVE-22966) LLAP: Consider including waitTime for comparing attempts in same vertex
[ https://issues.apache.org/jira/browse/HIVE-22966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052380#comment-17052380 ]

Gopal Vijayaraghavan commented on HIVE-22966:
---------------------------------------------

bq. even though this patch takes into account task aging we do not cure the long-tail task issue and we need to properly take care of it.

This entire patch is hiding in the shadow of the YARN FIFO assumptions in the long-tail task scheduling order code inside Tez.

https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/dag/library/vertexmanager/ShuffleVertexManager.java#L591

There's a somewhat equivalent version for the splits as well:

https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/hadoop/MRInputHelpers.java#L501

So Tez explicitly picks the biggest splits and the most heavily skewed reducers to start first, which is mostly relevant for query latency when we have a large number of tasks and a low number of executors.

That is why this patch makes a difference: at the same priority, we get FIFO back.
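The interaction described in this comment, largest work submitted first plus FIFO among tasks of equal priority, can be illustrated with a small sketch. Everything here is hypothetical and not taken from Tez or LLAP: the `Task` class, the single shared priority value, and the submission sequence number used as the FIFO tie-break are assumptions for illustration only. A plain `java.util.PriorityQueue` does not preserve insertion order among equal elements, which is why the sketch needs an explicit tie-break.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class FifoTieBreakSketch {
    /** Hypothetical queued task; not a Tez or LLAP class. */
    static final class Task {
        final long inputBytes;
        final int priority; // assumption: a lower value is started earlier
        final long seq;     // submission order, used as the FIFO tie-break

        Task(long inputBytes, int priority, long seq) {
            this.inputBytes = inputBytes;
            this.priority = priority;
            this.seq = seq;
        }
    }

    /** Returns split sizes in the order the tasks would be started. */
    static List<Long> startOrder(List<Long> splitBytes) {
        // Step 1: submit the largest splits first.
        List<Long> submitted = new ArrayList<>(splitBytes);
        submitted.sort(Comparator.reverseOrder());

        // Step 2: all tasks share one priority here, so only the FIFO
        // tie-break keeps the submission order intact.
        PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparingInt((Task t) -> t.priority)
                      .thenComparingLong(t -> t.seq));
        long seq = 0;
        for (long bytes : submitted) {
            queue.add(new Task(bytes, 0, seq++));
        }

        List<Long> started = new ArrayList<>();
        while (!queue.isEmpty()) {
            started.add(queue.poll().inputBytes);
        }
        return started;
    }
}
```

In this sketch the task submitted earliest is also the one that has waited longest, so a longest-wait-first tie-break and a FIFO tie-break produce the same start order.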