[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400588#comment-17400588 ]

Hadoop QA commented on TEZ-3810:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} TEZ-3810 does not apply to master. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/TEZ/How+to+Contribute+to+Tez for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | TEZ-3810 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12902897/TEZ-3810.005.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-TEZ-Build/121/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

> TezCounter for idle time in shuffle phase
> -----------------------------------------
>
> Key: TEZ-3810
> URL: https://issues.apache.org/jira/browse/TEZ-3810
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ashwin Ramesh
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3810-001.patch, TEZ-3810.002.patch,
> TEZ-3810.003.patch, TEZ-3810.004.patch, TEZ-3810.005.patch
>
>
> A task attempt counter that tracks how much time was spent waiting for
> inputs in the shuffle phase. We can use this to quickly identify jobs that
> are wasting a lot of time on the grid with idle reducer tasks instead of
> shuffling/merging.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487696#comment-16487696 ]

TezQA commented on TEZ-3810:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12902897/TEZ-3810.005.patch
against master revision bf87a0f.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in :
  org.apache.tez.test.TestSecureShuffle

The following test timeouts occurred in :
  org.apache.tez.tests.TestExternalTezServicesErrors
  org.apache.tez.tests.TestExternalTezServices

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2810//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2810//console

This message is automatically generated.
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297208#comment-16297208 ]

TezQA commented on TEZ-3810:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12902897/TEZ-3810.005.patch
against master revision 4c378b4.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2705//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2705//console

This message is automatically generated.
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293134#comment-16293134 ]

Jason Lowe commented on TEZ-3810:
---------------------------------

IMHO the idle time should not include when individual fetchers are idle but rather when there is nothing to shuffle (i.e.: no fetchers are running at all). It's too hard to interpret the value if it includes time when at least one fetcher is idle, since there may only be a few inputs to fetch at times.

Measuring the time when no fetchers are running helps quantify the pure overhead of the task, where the task is doing nothing useful and making no progress during that idle time period. The same is not true if we're also counting cases where at least one fetcher is actively transferring data.
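The definition Jason Lowe proposes in the comment above (count idle time only while no fetcher is running at all) can be sketched roughly as follows. This is a minimal illustration and not code from any TEZ-3810 patch: the class `ShuffleIdleTracker`, its methods, and the injected millisecond clock are all hypothetical names chosen for the example.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration only; it is not part of Tez. */
class ShuffleIdleTracker {
  private final LongSupplier clockMs; // assumed monotonic clock, in milliseconds
  private int runningFetchers = 0;
  private long idleStartMs;           // meaningful only while runningFetchers == 0
  private long idleTotalMs = 0;

  ShuffleIdleTracker(LongSupplier clockMs) {
    this.clockMs = clockMs;
    // No fetcher is running yet, so the first idle period starts immediately.
    this.idleStartMs = clockMs.getAsLong();
  }

  /** Call when a fetcher begins transferring data. */
  synchronized void fetcherStarted() {
    if (runningFetchers == 0) {
      // The first fetcher to start closes the current idle period.
      idleTotalMs += clockMs.getAsLong() - idleStartMs;
    }
    runningFetchers++;
  }

  /** Call when a fetcher finishes. */
  synchronized void fetcherFinished() {
    if (runningFetchers == 0) {
      throw new IllegalStateException("no fetcher is running");
    }
    runningFetchers--;
    if (runningFetchers == 0) {
      // The last fetcher to finish opens a new idle period.
      idleStartMs = clockMs.getAsLong();
    }
  }

  /** Total idle time so far, including an idle period that is still open. */
  synchronized long idleMillis() {
    long total = idleTotalMs;
    if (runningFetchers == 0) {
      total += clockMs.getAsLong() - idleStartMs;
    }
    return total;
  }
}
```

With this accounting, a period in which one fetcher waits on a straggler while another is still transferring data adds nothing to the total, which matches the "pure overhead" reading of the counter. Passing the clock in keeps the sketch testable without sleeping.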
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293117#comment-16293117 ]

Kuhu Shukla commented on TEZ-3810:
----------------------------------

I think we need to step back and open up for discussion the question of what we really want to measure here.

1. What is defined as idle shuffle time?
a. Is it the time each fetcher has to wait for the input to be ready? OR
b. Is it the time that runningFetchers are zero and pending hosts is empty as well? That is, as long as one fetcher is running, the shuffle process in general is not taken to be idle. This gets tricky if one of, say, x outputs from a given host takes a long time to finish, since pendingHosts will be non-empty and runningFetchers would be zero after all other fetches complete.

There are benefits to tracking the time a single fetcher is idle, since it tells us more about the efficiency of thread assignment to map outputs, but it may inflate the value when time spent on other fetches is counted as idle time for a fetcher thread waiting on a skewed or straggler output.

Appreciate any thoughts by the community here. Thanks a lot!
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290112#comment-16290112 ]

Zhiyuan Yang commented on TEZ-3810:
-----------------------------------

I think this may not be necessary
{code}
} else if (idleStartTime != 0) {
  shuffleIdleTime.increment(Time.monotonicNow() - idleStartTime);
  idleStartTime = 0;
}
{code}
since the number of fetchers won't increase within this loop anyway.
{code}
while ((runningFetchers.size() >= numFetchers || pendingHosts.isEmpty())
    && numCompletedInputs.get() < numInputs) {
{code}

Also, the test makes this counter look like a timestamp, although the code works.
{code}
long startTime = inputContext.getCounters().findCounter(TaskCounter.SHUFFLE_IDLE_TIME).getValue();
long endTime = inputContext.getCounters().findCounter(TaskCounter.SHUFFLE_IDLE_TIME).getValue();
assertTrue("ShuffleIdleTime counter was: "+ (endTime - startTime) + "ms", endTime - startTime >= 5000);
{code}
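Zhiyuan Yang's remark about the test treating the counter like a timestamp could be addressed with naming that makes the delta explicit. The sketch below is only an assumption about what such a test helper might look like: `IdleCounterDeltaSketch`, its nested `Counter`, and `idleDeltaMillis` are hypothetical stand-ins, not Tez classes, and the real test would read `TaskCounter.SHUFFLE_IDLE_TIME` from the input context as in the quoted snippet.

```java
/** Hypothetical illustration; none of these types belong to Tez. */
class IdleCounterDeltaSketch {

  /** Minimal stand-in for a cumulative counter (assumed for this sketch). */
  static final class Counter {
    private long value;

    void increment(long delta) {
      value += delta;
    }

    long getValue() {
      return value;
    }
  }

  /** Returns how much the counter grew while the scenario ran. */
  static long idleDeltaMillis(Counter counter, Runnable scenario) {
    long idleBefore = counter.getValue(); // accumulated duration so far, not a timestamp
    scenario.run();
    long idleAfter = counter.getValue();
    return idleAfter - idleBefore;
  }
}
```

Naming the two reads `idleBefore` and `idleAfter` keeps the arithmetic identical to the quoted test while signalling that the counter holds an accumulated duration.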
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16287709#comment-16287709 ]

Kuhu Shukla commented on TEZ-3810:
----------------------------------

Test failure is unrelated. [~jlowe]/[~jeagles], request for review/comments. Thanks a lot!
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286672#comment-16286672 ]

TezQA commented on TEZ-3810:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12901554/TEZ-3810.004.patch
against master revision 4c378b4.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in :
  org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2701//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2701//console

This message is automatically generated.
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269123#comment-16269123 ]

TezQA commented on TEZ-3810:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12899640/TEZ-3810.003.patch
against master revision b61e55c.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in :
  org.apache.tez.runtime.library.common.shuffle.TestShuffleUtils
  org.apache.tez.dag.app.dag.impl.TestVertexManager
  org.apache.tez.dag.app.TestSpeculation
  org.apache.tez.mapreduce.output.TestMROutputLegacy
  org.apache.tez.mapreduce.output.TestMROutput
  org.apache.tez.mapreduce.output.TestMultiMROutput
  org.apache.hadoop.mapred.split.TestGroupedSplits
  org.apache.tez.test.TestTaskErrorsUsingLocalMode
  org.apache.tez.test.TestExceptionPropagation
  org.apache.tez.test.TestLocalMode
  org.apache.tez.client.TestTezClientUtils
  org.apache.tez.client.TestTezClient
  org.apache.tez.auxservices.TestShuffleHandler
  org.apache.tez.tests.TestExtServicesWithLocalMode

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2694//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2694//console

This message is automatically generated.
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268890#comment-16268890 ]

TezQA commented on TEZ-3810:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12899617/TEZ-3810.002.patch
against master revision b61e55c.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in :
  org.apache.tez.dag.app.dag.impl.TestVertexManager

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2693//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2693//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2693//console

This message is automatically generated.
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111340#comment-16111340 ]

TezQA commented on TEZ-3810:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12880063/TEZ-3810-001.patch
against master revision 614937c.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2597//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2597//console

This message is automatically generated.
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1661#comment-1661 ]

Ashwin Ramesh commented on TEZ-3810:

[~kshukla] Thanks for the review, will look at that immediately.
[jira] [Commented] (TEZ-3810) TezCounter for idle time in shuffle phase
[ https://issues.apache.org/jira/browse/TEZ-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1642#comment-1642 ]

Kuhu Shukla commented on TEZ-3810:
----------------------------------

Thanks [~aramesh2] for the patch! I haven't fully reviewed this patch, but I think we will need an equivalent change in ShuffleScheduler as well (which is the shuffle-r for the ordered case).