[jira] [Updated] (TEZ-2839) Tez UI: Use another kind of bar to represent dag killed/failed
[ https://issues.apache.org/jira/browse/TEZ-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2839: Attachment: 2015-09-17_1359.png > Tez UI: Use another kind of bar to represent dag killed/failed > -- > > Key: TEZ-2839 > URL: https://issues.apache.org/jira/browse/TEZ-2839 > Project: Apache Tez > Issue Type: Sub-task > Components: UI >Reporter: Jeff Zhang >Priority: Minor > Attachments: 2015-09-17_1359.png > > > Currently tez-ui uses a blue animated bar to indicate the progress of a DAG. > It would be better to use another kind of bar (a red one, without animation?) > when the DAG has failed or been killed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2839) Tez UI: Use another kind of bar to represent dag killed/failed
Jeff Zhang created TEZ-2839: --- Summary: Tez UI: Use another kind of bar to represent dag killed/failed Key: TEZ-2839 URL: https://issues.apache.org/jira/browse/TEZ-2839 Project: Apache Tez Issue Type: Sub-task Reporter: Jeff Zhang Priority: Minor Currently tez-ui uses a blue animated bar to indicate the progress of a DAG. It would be better to use another kind of bar (a red one, without animation?) when the DAG has failed or been killed.
[jira] [Updated] (TEZ-2838) Tez UI: Finished Time is not updated in real-time
[ https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2838: Summary: Tez UI: Finished Time is not updated in real-time (was: Tez UI: Finish Time and Duration is not available on DAG Details) > Tez UI: Finished Time is not updated in real-time > - > > Key: TEZ-2838 > URL: https://issues.apache.org/jira/browse/TEZ-2838 > Project: Apache Tez > Issue Type: Sub-task > Components: UI >Affects Versions: 0.8.1 >Reporter: Jeff Zhang >Priority: Minor > Attachments: 2015-09-17_1338.png > > > I have to refresh the page to see the finished time and duration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2838) Tez UI: Finished Time is not updated in real-time
[ https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2838: Description: I have to refresh the page to see the finished time and duration. Same for DAG/Vertex/Task/TaskAttempt was:I have to refresh the page to see the finished time and duration. > Tez UI: Finished Time is not updated in real-time > - > > Key: TEZ-2838 > URL: https://issues.apache.org/jira/browse/TEZ-2838 > Project: Apache Tez > Issue Type: Sub-task > Components: UI >Affects Versions: 0.8.1 >Reporter: Jeff Zhang >Priority: Minor > Attachments: 2015-09-17_1338.png > > > I have to refresh the page to see the finished time and duration. > Same for DAG/Vertex/Task/TaskAttempt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2838) Tez UI: Finish Time and Duration is not available on DAG Details
[ https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2838: Priority: Minor (was: Major) > Tez UI: Finish Time and Duration is not available on DAG Details > > > Key: TEZ-2838 > URL: https://issues.apache.org/jira/browse/TEZ-2838 > Project: Apache Tez > Issue Type: Sub-task > Components: UI >Affects Versions: 0.8.1 >Reporter: Jeff Zhang >Priority: Minor > Attachments: 2015-09-17_1338.png > > > I have to refresh the page to see the finished time and duration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2838) Tez UI: Finish Time and Duration is not available on DAG Details
Jeff Zhang created TEZ-2838: --- Summary: Tez UI: Finish Time and Duration is not available on DAG Details Key: TEZ-2838 URL: https://issues.apache.org/jira/browse/TEZ-2838 Project: Apache Tez Issue Type: Sub-task Affects Versions: 0.8.1 Reporter: Jeff Zhang Attachments: 2015-09-17_1338.png I have to refresh the page to see the finished time and duration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2838) Tez UI: Finish Time and Duration is not available on DAG Details
[ https://issues.apache.org/jira/browse/TEZ-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2838: Attachment: 2015-09-17_1338.png > Tez UI: Finish Time and Duration is not available on DAG Details > > > Key: TEZ-2838 > URL: https://issues.apache.org/jira/browse/TEZ-2838 > Project: Apache Tez > Issue Type: Sub-task > Components: UI >Affects Versions: 0.8.1 >Reporter: Jeff Zhang > Attachments: 2015-09-17_1338.png > > > I have to refresh the page to see the finished time and duration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2837) TEZ UI: First Task Start Time is not available
[ https://issues.apache.org/jira/browse/TEZ-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2837: Issue Type: Sub-task (was: Improvement) Parent: TEZ-2760 > TEZ UI: First Task Start Time is not available > -- > > Key: TEZ-2837 > URL: https://issues.apache.org/jira/browse/TEZ-2837 > Project: Apache Tez > Issue Type: Sub-task >Affects Versions: 0.8.1 >Reporter: Jeff Zhang >Priority: Minor > Attachments: 2015-09-17_1326.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2837) TEZ UI: First Task Start Time is not available
[ https://issues.apache.org/jira/browse/TEZ-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated TEZ-2837: Attachment: 2015-09-17_1326.png > TEZ UI: First Task Start Time is not available > -- > > Key: TEZ-2837 > URL: https://issues.apache.org/jira/browse/TEZ-2837 > Project: Apache Tez > Issue Type: Improvement >Affects Versions: 0.8.1 >Reporter: Jeff Zhang >Priority: Minor > Attachments: 2015-09-17_1326.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2837) TEZ UI: First Task Start Time is not available
Jeff Zhang created TEZ-2837: --- Summary: TEZ UI: First Task Start Time is not available Key: TEZ-2837 URL: https://issues.apache.org/jira/browse/TEZ-2837 Project: Apache Tez Issue Type: Improvement Affects Versions: 0.8.1 Reporter: Jeff Zhang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-814) Improve heuristic for determining a task has failed outputs
[ https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791544#comment-14791544 ] TezQA commented on TEZ-814: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756404/TEZ-814.1.patch against master revision 1a065b9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//console This message is automatically generated. > Improve heuristic for determining a task has failed outputs > --- > > Key: TEZ-814 > URL: https://issues.apache.org/jira/browse/TEZ-814 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 0.7.1 > > Attachments: TEZ-814.1.patch > > > Currently 25% of consumers need to report failure. However, we may not always > have that many error reports. E.g., this is the last consumer and the > source is lost. Or some consumers are cut off from the source. The job may > hang on those consumers waiting for a re-run.
Failed: TEZ-814 PreCommit Build #1146
Jira: https://issues.apache.org/jira/browse/TEZ-814 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1146/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 3456 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756404/TEZ-814.1.patch against master revision 1a065b9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1146//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. fe9df782fb7defa2d684dffef4d0e3e5d14ffe91 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #1144 Archived 53 artifacts Archive block size is 32768 Received 6 blocks and 3102498 bytes Compression is 6.0% Took 0.88 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-814) Improve heuristic for determining a task has failed outputs
[ https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-814: --- Fix Version/s: 0.7.1 > Improve heuristic for determining a task has failed outputs > --- > > Key: TEZ-814 > URL: https://issues.apache.org/jira/browse/TEZ-814 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha > Fix For: 0.7.1 > > Attachments: TEZ-814.1.patch > > > Currently 25% of consumers need to report failure. However, we may not always > have that many error reports. E.g., this is the last consumer and the > source is lost. Or some consumers are cut off from the source. The job may > hang on those consumers waiting for a re-run.
[jira] [Commented] (TEZ-814) Improve heuristic for determining a task has failed outputs
[ https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791456#comment-14791456 ] Bikas Saha commented on TEZ-814: The heuristics are mainly designed to prevent an inadvertent flurry of re-runs due to intermittent network issues. So we have the fraction and unique-failures-reported heuristics to verify that multiple readers are reporting the same failure. Regardless of these current and future heuristics, we need to ensure that jobs do not hang indefinitely because the heuristics fail to converge. So this patch adds a time-based deadline: if a consumer attempt reports a read error for a timespan exceeding a threshold (default 300s), then the producer attempt will be re-run. [~rajesh.balamohan] [~hitesh] Please review. > Improve heuristic for determining a task has failed outputs > --- > > Key: TEZ-814 > URL: https://issues.apache.org/jira/browse/TEZ-814 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha > Attachments: TEZ-814.1.patch > > > Currently 25% of consumers need to report failure. However, we may not always > have that many error reports. E.g., this is the last consumer and the > source is lost. Or some consumers are cut off from the source. The job may > hang on those consumers waiting for a re-run.
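The time-based deadline described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the TEZ-814 patch: the class `OutputFailureTracker`, its method names, and the way the fraction check is combined with the deadline are hypothetical. Only the 25% fraction and the 300s default come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and structure are illustrative, not taken from the Tez code base.
class OutputFailureTracker {
  private final int totalConsumers;
  private final double failureFraction; // 0.25 per the issue description
  private final long deadlineMillis;    // 300s default per the comment
  // Consumer id -> timestamp of that consumer's first read-error report.
  private final Map<Integer, Long> firstReportMillis = new HashMap<>();

  OutputFailureTracker(int totalConsumers, double failureFraction, long deadlineMillis) {
    this.totalConsumers = totalConsumers;
    this.failureFraction = failureFraction;
    this.deadlineMillis = deadlineMillis;
  }

  // Records a read error from one consumer and returns true if the producer should be re-run.
  boolean reportReadError(int consumerId, long nowMillis) {
    firstReportMillis.putIfAbsent(consumerId, nowMillis);
    long firstSeen = firstReportMillis.get(consumerId);
    // Existing-style heuristic: enough distinct consumers have reported a failure.
    boolean enoughReporters = firstReportMillis.size() >= failureFraction * totalConsumers;
    // Deadline: this consumer has been reporting errors for longer than the threshold.
    boolean deadlineExceeded = nowMillis - firstSeen > deadlineMillis;
    return enoughReporters || deadlineExceeded;
  }
}
```

With 8 consumers, a single consumer that keeps failing never reaches the 25% fraction on its own, but its reports trigger a re-run once they span more than 300s, which is the hang the deadline is meant to break.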
[jira] [Assigned] (TEZ-814) Improve heuristic for determining a task has failed outputs
[ https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha reassigned TEZ-814: -- Assignee: Bikas Saha > Improve heuristic for determining a task has failed outputs > --- > > Key: TEZ-814 > URL: https://issues.apache.org/jira/browse/TEZ-814 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha >Assignee: Bikas Saha > Attachments: TEZ-814.1.patch > > > Currently 25% of consumers need to report failure. However, we may not always > have that many error reports. E.g., this is the last consumer and the > source is lost. Or some consumers are cut off from the source. The job may > hang on those consumers waiting for a re-run.
[jira] [Updated] (TEZ-814) Improve heuristic for determining a task has failed outputs
[ https://issues.apache.org/jira/browse/TEZ-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-814: --- Attachment: TEZ-814.1.patch > Improve heuristic for determining a task has failed outputs > --- > > Key: TEZ-814 > URL: https://issues.apache.org/jira/browse/TEZ-814 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Bikas Saha > Attachments: TEZ-814.1.patch > > > Currently 25% of consumers need to report failure. However, we may not always > have that many error reports. E.g., this is the last consumer and the > source is lost. Or some consumers are cut off from the source. The job may > hang on those consumers waiting for a re-run.
[jira] [Commented] (TEZ-2836) Avoid setting framework/system counters for tasks running in threads
[ https://issues.apache.org/jira/browse/TEZ-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791435#comment-14791435 ] TezQA commented on TEZ-2836: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756368/TEZ-2836.1.txt against master revision 1a065b9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//console This message is automatically generated. > Avoid setting framework/system counters for tasks running in threads > > > Key: TEZ-2836 > URL: https://issues.apache.org/jira/browse/TEZ-2836 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-2836.1.txt > > > Counters like FileSystemCounters, GC_TIME, CPU_TIME etc - are computed > incorrectly in case of LocalMode, Uber, TestService and others where tasks > may execute in threads. (The values end up being a combination of what's > running in the process - which could be other tasks or the AM). 
> It's better not to set them for now, instead of reporting incorrect values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-2836 PreCommit Build #1145
Jira: https://issues.apache.org/jira/browse/TEZ-2836 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1145/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 3463 lines...] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12756368/TEZ-2836.1.txt against master revision 1a065b9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1145//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. b62c2cda1c24e7403d040eca8a7c84c190e368f5 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Sending artifact delta relative to PreCommit-TEZ-Build #1144 Archived 53 artifacts Archive block size is 32768 Received 10 blocks and 2937915 bytes Compression is 10.0% Took 4.2 sec [description-setter] Could not determine description. Recording test results Email was triggered for: Failure Sending email for trigger: Failure ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Resolved] (TEZ-2830) Backport TEZ-2774 to branch-0.7
[ https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved TEZ-2830. - Resolution: Fixed Fix Version/s: 0.7.1 > Backport TEZ-2774 to branch-0.7 > --- > > Key: TEZ-2830 > URL: https://issues.apache.org/jira/browse/TEZ-2830 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.7.1 > > Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2836) Avoid setting framework/system counters for tasks running in threads
[ https://issues.apache.org/jira/browse/TEZ-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2836: Attachment: TEZ-2836.1.txt [~rajesh.balamohan], [~hitesh] - please review. This disables the final updateCounters for local and uber mode, and in the test service. > Avoid setting framework/system counters for tasks running in threads > > > Key: TEZ-2836 > URL: https://issues.apache.org/jira/browse/TEZ-2836 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-2836.1.txt > > > Counters like FileSystemCounters, GC_TIME, CPU_TIME etc - are computed > incorrectly in case of LocalMode, Uber, TestService and others where tasks > may execute in threads. (The values end up being a combination of what's > running in the process - which could be other tasks or the AM). > It's better not to set them for now, instead of reporting incorrect values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2836) Avoid setting framework/system counters for tasks running in threads
Siddharth Seth created TEZ-2836: --- Summary: Avoid setting framework/system counters for tasks running in threads Key: TEZ-2836 URL: https://issues.apache.org/jira/browse/TEZ-2836 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth Counters like FileSystemCounters, GC_TIME, CPU_TIME etc - are computed incorrectly in case of LocalMode, Uber, TestService and others where tasks may execute in threads. (The values end up being a combination of what's running in the process - which could be other tasks or the AM). It's better not to set them for now, instead of reporting incorrect values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2830) Backport TEZ-2774 to branch-0.7
[ https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791209#comment-14791209 ] Siddharth Seth commented on TEZ-2830: - That's not relevant to branch-0.7, only for threaded execution of tasks. Thanks for taking a look. Committing. > Backport TEZ-2774 to branch-0.7 > --- > > Key: TEZ-2830 > URL: https://issues.apache.org/jira/browse/TEZ-2830 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2835) [Timeline ACLs] Session-level entities should not be tied to the dag's domain
Hitesh Shah created TEZ-2835: Summary: [Timeline ACLs] Session-level entities should not be tied to the dag's domain Key: TEZ-2835 URL: https://issues.apache.org/jira/browse/TEZ-2835 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Container events may be created either at session start or in a different DAG. Updates to the container entities, if done in a different DAG, will have ACL issues unless a common domain ACL for Timeline is used.
[jira] [Updated] (TEZ-2826) save the status for completed dags in a session
[ https://issues.apache.org/jira/browse/TEZ-2826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2826: - Description: Currently we store the list of completed DAGs. If we store the state of each DAG too, it would be helpful in the case of tez-ui, where the UI will query a completed DAG and show the up-to-date status for completed DAGs in a session. \cc [~zjffdu] was: Currently we store the list of completed DAGs. If we store the state of each DAG too, it would be helpful in the case of tez-ui, where the UI will query a completed DAG and show the up-to-date status for completed DAGs in a session. \cc [~jzhang] > save the status for completed dags in a session > --- > > Key: TEZ-2826 > URL: https://issues.apache.org/jira/browse/TEZ-2826 > Project: Apache Tez > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Hitesh Shah > > Currently we store the list of completed DAGs. If we store the state of each > DAG too, it would be helpful in the case of tez-ui, where the UI will query a > completed DAG and show the up-to-date status for completed DAGs in a session. > \cc [~zjffdu]
[jira] [Commented] (TEZ-2830) Backport TEZ-2774 to branch-0.7
[ https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791099#comment-14791099 ] Bikas Saha commented on TEZ-2830: - lgtm. Found one missing item; perhaps it's not relevant to 0.7.
{code}
-LOG.debug("ThreadId : " + id + ", name=" + threadInfo.getThreadName());
+if (LOG.isDebugEnabled()) {
+  LOG.debug("ThreadId : " + id + ", name=" + threadInfo.getThreadName());
+}
{code}
> Backport TEZ-2774 to branch-0.7 > --- > > Key: TEZ-2830 > URL: https://issues.apache.org/jira/browse/TEZ-2830 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt > >
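The point of the guard in the diff above is that the message string is only built when debug logging is enabled. A small self-contained sketch of the same pattern follows. It is an assumption-laden stand-in: it uses `java.util.logging` (`isLoggable(Level.FINE)`) rather than the `LOG.isDebugEnabled()` API shown in the diff, and the class `GuardedDebugLog` and its counter are hypothetical, added only to make the skipped work observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; not part of Tez.
class GuardedDebugLog {
  // Counts how often the (potentially expensive) message string is built.
  static int messagesBuilt = 0;

  static String buildMessage(long id, String name) {
    messagesBuilt++;
    return "ThreadId : " + id + ", name=" + name;
  }

  // Builds and logs the message only if FINE (debug-level) logging is enabled.
  static void logThread(Logger log, long id, String name) {
    if (log.isLoggable(Level.FINE)) {
      log.fine(buildMessage(id, name));
    }
  }
}
```

Without the guard, the concatenation would run on every call even when the result is discarded, which matters on hot paths such as per-thread or per-task loops.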
[jira] [Updated] (TEZ-2830) Backport TEZ-2774 to branch-0.7
[ https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2830: Attachment: TEZ-2830.2.txt Updated with the addendum to 2774 > Backport TEZ-2774 to branch-0.7 > --- > > Key: TEZ-2830 > URL: https://issues.apache.org/jira/browse/TEZ-2830 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-2830.1.txt, TEZ-2830.2.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2774) Reduce logging in the AM, and parts of the runtime
[ https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791022#comment-14791022 ] Bikas Saha commented on TEZ-2774: - Thanks! commit 1a065b9d87d84645363d0c65ae021a6a514169a8 Author: Bikas Saha Date: Wed Sep 16 12:50:38 2015 -0700 TEZ-2774. addendum to add a preemption periodic log > Reduce logging in the AM, and parts of the runtime > -- > > Key: TEZ-2774 > URL: https://issues.apache.org/jira/browse/TEZ-2774 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.8.1 > > Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, > TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2774) Reduce logging in the AM, and parts of the runtime
[ https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790974#comment-14790974 ] Siddharth Seth commented on TEZ-2774: - Looks fine. > Reduce logging in the AM, and parts of the runtime > -- > > Key: TEZ-2774 > URL: https://issues.apache.org/jira/browse/TEZ-2774 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.8.1 > > Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, > TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2774) Reduce logging in the AM, and parts of the runtime
[ https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated TEZ-2774: Attachment: TEZ-2774.addendum.patch Attaching an addendum patch that periodically logs in preemption-related code. The log wasn't removed in this JIRA, but TEZ-2834 showed that the absence of this log is a problem. Adding a periodicity to that logging would help. > Reduce logging in the AM, and parts of the runtime > -- > > Key: TEZ-2774 > URL: https://issues.apache.org/jira/browse/TEZ-2774 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.8.1 > > Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, > TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch > >
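One common way to add periodicity to a log statement, as the comment above suggests, is a small time gate that lets a message through at most once per interval. The sketch below is hypothetical and is not the contents of TEZ-2774.addendum.patch: the class `PeriodicLogGate`, its method, and the interval are illustrative assumptions.

```java
// Hypothetical helper; the actual addendum patch may implement periodic logging differently.
class PeriodicLogGate {
  private final long intervalMillis;
  private long lastLogMillis;
  private boolean loggedOnce = false;

  PeriodicLogGate(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  // Returns true if the caller should log now: on the first call,
  // and afterwards only when at least intervalMillis has elapsed since the last log.
  boolean shouldLog(long nowMillis) {
    if (!loggedOnce || nowMillis - lastLogMillis >= intervalMillis) {
      loggedOnce = true;
      lastLogMillis = nowMillis;
      return true;
    }
    return false;
  }
}
```

A caller in a frequently executed loop would wrap its log statement in `if (gate.shouldLog(System.currentTimeMillis())) { ... }`, which keeps the message visible for diagnosing hangs without flooding the log.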
[jira] [Comment Edited] (TEZ-2774) Reduce logging in the AM, and parts of the runtime
[ https://issues.apache.org/jira/browse/TEZ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790965#comment-14790965 ] Bikas Saha edited comment on TEZ-2774 at 9/16/15 7:09 PM: -- Attaching an addendum patch that periodically logs in preemption-related code. The log wasn't removed in this JIRA, but TEZ-2834 showed that the absence of this log is a problem. Adding a periodicity to that logging would help. [~sseth] Could you take a quick look at the addendum? was (Author: bikassaha): Attaching an addendum patch that periodically logs in preemption-related code. The log wasn't removed in this JIRA, but TEZ-2834 showed that the absence of this log is a problem. Adding a periodicity to that logging would help. > Reduce logging in the AM, and parts of the runtime > -- > > Key: TEZ-2774 > URL: https://issues.apache.org/jira/browse/TEZ-2774 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Fix For: 0.8.1 > > Attachments: TEZ-2774.1.txt, TEZ-2774.2.txt, TEZ-2774.3.txt, > TEZ-2774.4.patch, TEZ-2774.5.patch, TEZ-2774.addendum.patch > >
[jira] [Updated] (TEZ-2830) Backport TEZ-2774 to branch-0.7
[ https://issues.apache.org/jira/browse/TEZ-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2830: Attachment: TEZ-2830.1.txt [~bikassaha] - could you please scan through the backport for sanity. > Backport TEZ-2774 to branch-0.7 > --- > > Key: TEZ-2830 > URL: https://issues.apache.org/jira/browse/TEZ-2830 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-2830.1.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha reassigned TEZ-2834: --- Assignee: Bikas Saha > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan >Assignee: Bikas Saha > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2834: Assignee: Bikas Saha (was: Siddharth Seth) > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan >Assignee: Bikas Saha > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned TEZ-2834: --- Assignee: Siddharth Seth > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan >Assignee: Siddharth Seth > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-2834: Assignee: (was: Bikas Saha) > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790823#comment-14790823 ] Bikas Saha commented on TEZ-2834: - Was the cluster fully occupied when this was happening? My speculation is that the headroom reported by the RM was enough to cover this one task, so we were not preempting anything, yet we were not getting containers allocated to us either. > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790819#comment-14790819 ] Bikas Saha commented on TEZ-2834: - The preemption code logs only at debug level, so this issue cannot be debugged with the attached logs. > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790783#comment-14790783 ] Gopal V commented on TEZ-2834: -- [~bikassaha]: YARN-4149? That was fixed last night, it's not deployed on the cluster yet. > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790778#comment-14790778 ] Bikas Saha commented on TEZ-2834: - If this cluster has the latest YARN, then the AM logs can be downloaded separately using the new enhancements to the yarn logs command. > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2833) Dont create extra directory during ATS file download
[ https://issues.apache.org/jira/browse/TEZ-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790762#comment-14790762 ] Bikas Saha commented on TEZ-2833: - Couldn't understand the scenario :) The file names are already different. So not sure how having the extra folder helps. > Dont create extra directory during ATS file download > > > Key: TEZ-2833 > URL: https://issues.apache.org/jira/browse/TEZ-2833 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Assignee: Rajesh Balamohan > > The file name already has the dag id as a unique identifier. Placing it > inside another directory with the dag id seems unnecessary and can throw off > a user expecting the zip file in the user specified download dir. > /cc [~rajesh.balamohan] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
[ https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2732: - Affects Version/s: 0.5.0 0.6.0 0.7.0 0.8.0-alpha > DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers > --- > > Key: TEZ-2732 > URL: https://issues.apache.org/jira/browse/TEZ-2732 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.0, 0.6.0, 0.7.0 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Fix For: 0.7.1 > > Attachments: TEZ-2732.1.patch, TEZ-2732.branch-0.6-and-0.5.patch, > TEZ-2732.branch-0.7.patch > > > {noformat} > kvbuffer.length = 2146435072 (2047 MB) > Corner case: bufIndex=2026133899, kvbidx=523629312. > distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485 > newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would > overflow) > {noformat} > Would be good to restrict the max allowed sort buffer to 1800 instead of > 2047. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
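The arithmetic in the TEZ-2732 description can be checked with a minimal sketch like the one below. It is not the DefaultSorter code: the class and method names are hypothetical, and the only values used are the ones quoted in the ticket. The first argument of max(..) is elided in the ticket and is left out here; because max(x, step) is never smaller than step, the sum shown is a lower bound on the value the real expression would produce.

```java
public class SortBufferOverflowDemo {
    // Distance as written in the ticket: mod - i + j.
    static int distance(int mod, int i, int j) {
        return mod - i + j;
    }

    // 32-bit addition; Java wraps silently when the result exceeds Integer.MAX_VALUE.
    static int intSum(int a, int b) {
        return a + b;
    }

    // The same addition widened to 64 bits, to show the mathematically exact value.
    static long longSum(int a, int b) {
        return (long) a + b;
    }

    public static void main(String[] args) {
        int kvbufferLength = 2146435072; // 2047 MB, from the ticket
        int bufIndex = 2026133899;       // from the ticket
        int kvbidx = 523629312;          // from the ticket

        int distkvi = distance(kvbufferLength, bufIndex, kvbidx);

        // Inner min(...) term from the ticket. The elided first argument of
        // max(..) is not modelled; it can only make the step larger.
        int step = Math.min(distkvi / 2, 271128624);

        System.out.println("distkvi        = " + distkvi);
        System.out.println("step           = " + step);
        System.out.println("exact sum      = " + longSum(bufIndex, step));
        System.out.println("int sum        = " + intSum(bufIndex, step));
        System.out.println("Integer.MAX    = " + Integer.MAX_VALUE);
    }
}
```

With the ticket's numbers, the exact sum is 2297262523, which is larger than Integer.MAX_VALUE (2147483647), so the int addition wraps to a negative value. A negative position used as an array index would be consistent with the ArrayIndex exceptions named in the issue title.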
[jira] [Updated] (TEZ-2732) DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers
[ https://issues.apache.org/jira/browse/TEZ-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-2732: - Affects Version/s: (was: 0.8.0-alpha) > DefaultSorter throws ArrayIndex exceptions on 2047 Mb size sort buffers > --- > > Key: TEZ-2732 > URL: https://issues.apache.org/jira/browse/TEZ-2732 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.5.0, 0.6.0, 0.7.0 >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan > Fix For: 0.7.1 > > Attachments: TEZ-2732.1.patch, TEZ-2732.branch-0.6-and-0.5.patch, > TEZ-2732.branch-0.7.patch > > > {noformat} > kvbuffer.length = 2146435072 (2047 MB) > Corner case: bufIndex=2026133899, kvbidx=523629312. > distkvi = mod - i + j = 2146435072 - 2026133899 + 523629312 = 643930485 > newPos = (2026133899 + (max(.., min(643930485/2, 271128624))) (This would > overflow) > {noformat} > Would be good to restrict the max allowed sort buffer to 1800 instead of > 2047. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2834: -- Description: Will attach the DAG. Repro for reference: TPC-DS q_70 @ 30 TB scale. "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched slightly late. But before "Reducer 9" can get scheduled, slots are taken up by "Map 1", which is not preempted for running "Reducer 9". This is with 0.7.1 codebase. was: Will attach the DAG. Repro for reference: TPC-DS q_70 @ 30 TB scale. "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched slightly late. But before "Reducer 9" can get scheduled, slots are taken up by "Map 1", which is not preempted for running "Reducer 9". > tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". > This is with 0.7.1 codebase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-2834) tez app hangs at large scale (~30TB)
[ https://issues.apache.org/jira/browse/TEZ-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-2834: -- Attachment: application_1442254312093_0095.1.log.gz application_1442254312093_0095.2.log.gz DAG_view.png hive_view.png
Attaching the DAG, hive_view and app logs for reference. The app logs have been split into two files and uploaded separately because they are huge.
{noformat}
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: Creating 2 tasks for vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
2015-09-15 09:41:12,208 INFO [Dispatcher thread: Central] impl.VertexImpl: Directly initializing vertex: vertex_1442254312093_0095_1_05 [Reducer 9]
...
2015-09-15 09:43:25,493 INFO [Dispatcher thread: Central] impl.TaskAttemptImpl: attempt_1442254312093_0095_1_05_00_0 TaskAttempt Transitioned from NEW to START_WAIT due to event TA_SCHEDULE
2015-09-15 09:43:25,493 INFO [TaskSchedulerEventHandlerThread] rm.YarnTaskSchedulerService: Allocation request for task: attempt_1442254312093_0095_1_05_00_0 with request: Capability[]Priority[11] host: null rack: null
{noformat}
Reducer 9 does not transition any further after "NEW to START_WAIT due to event TA_SCHEDULE".
> tez app hangs at large scale (~30TB) > > > Key: TEZ-2834 > URL: https://issues.apache.org/jira/browse/TEZ-2834 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Rajesh Balamohan > Attachments: DAG_view.png, application_1442254312093_0095.1.log.gz, > application_1442254312093_0095.2.log.gz, hive_view.png > > > Will attach the DAG. > Repro for reference: TPC-DS q_70 @ 30 TB scale. > "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched > slightly late. But before "Reducer 9" can get scheduled, slots are taken up > by "Map 1", which is not preempted for running "Reducer 9". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-2834) tez app hangs at large scale (~30TB)
Rajesh Balamohan created TEZ-2834: - Summary: tez app hangs at large scale (~30TB) Key: TEZ-2834 URL: https://issues.apache.org/jira/browse/TEZ-2834 Project: Apache Tez Issue Type: Bug Affects Versions: 0.7.1 Reporter: Rajesh Balamohan Will attach the DAG. Repro for reference: TPC-DS q_70 @ 30 TB scale. "Map 7" completes in 2 waves. Output is very tiny, so reducer 8 gets launched slightly late. But before "Reducer 9" can get scheduled, slots are taken up by "Map 1", which is not preempted for running "Reducer 9". -- This message was sent by Atlassian JIRA (v6.3.4#6332)