[jira] [Created] (TEZ-3318) Tez UI: Polling is not restarted after RM recovery
Sreenath Somarajapuram created TEZ-3318: --- Summary: Tez UI: Polling is not restarted after RM recovery Key: TEZ-3318 URL: https://issues.apache.org/jira/browse/TEZ-3318 Project: Apache Tez Issue Type: Bug Reporter: Sreenath Somarajapuram Assignee: Sreenath Somarajapuram For a running DAG, we poll the AM to get progress and other realtime information. This communication happens via RM. If RM goes down, even after its recovery the polling is not re-established. Steps to repro: 1. Run a job 2. Go to DAG details page, and ensure that the progress is getting updated. 3. Stop RM, and ensure that error bar is getting displayed in the UI. 4. Start RM. 5. As soon as RM is online, the progress bar must get updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3318) Tez UI: Polling is not restarted after RM recovery
[ https://issues.apache.org/jira/browse/TEZ-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357256#comment-15357256 ] Hitesh Shah commented on TEZ-3318: -- I think there should be a limit on how many consecutive retries are done. Maybe, say, 10 mins in total at the very max? i.e. if polling every 10 seconds, the max number of retries should be 60? This counter should obviously be reset to 0 on the first successful call. > Tez UI: Polling is not restarted after RM recovery > -- > > Key: TEZ-3318 > URL: https://issues.apache.org/jira/browse/TEZ-3318 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > > For a running DAG, we poll the AM to get progress and other realtime > information. This communication happens via RM. If RM goes down, even after > its recovery the polling is not re-established. > Steps to repro: > 1. Run a job > 2. Go to DAG details page, and ensure that the progress is getting updated. > 3. Stop RM, and ensure that error bar is getting displayed in the UI. > 4. Start RM. > 5. As soon as RM is online, the progress bar must get updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
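The retry limit suggested in the comment above could be sketched roughly as follows. This is only an illustration in Java (the Tez UI itself is not written in Java), and `PollRetryBudget` and its method names are hypothetical, not part of any Tez patch. It assumes the limit is derived from a total retry window divided by the polling interval, and that the counter resets on the first successful call.

```java
/** Hypothetical helper: caps consecutive failed polls and resets on success. */
final class PollRetryBudget {
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    PollRetryBudget(long pollIntervalSeconds, long maxRetryWindowSeconds) {
        // e.g. a 600 s window with a 10 s interval allows 60 attempts
        this.maxConsecutiveFailures = (int) (maxRetryWindowSeconds / pollIntervalSeconds);
    }

    /** Record a successful poll: the counter goes back to 0. */
    void onSuccess() {
        consecutiveFailures = 0;
    }

    /** Record a failed poll; returns true if another retry is still allowed. */
    boolean onFailure() {
        consecutiveFailures++;
        return consecutiveFailures < maxConsecutiveFailures;
    }

    int maxConsecutiveFailures() {
        return maxConsecutiveFailures;
    }

    int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

With a 10 second interval and a 10 minute window, the helper gives up on the 60th consecutive failure, and any success in between starts the count again from 0.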
[jira] [Updated] (TEZ-3286) Allow clients to set processor reserved memory per vertex (instead of per container)
[ https://issues.apache.org/jira/browse/TEZ-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3286: - Attachment: TEZ-3286.2.patch addressing some of the comments. > Allow clients to set processor reserved memory per vertex (instead of per > container) > > > Key: TEZ-3286 > URL: https://issues.apache.org/jira/browse/TEZ-3286 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.3 >Reporter: Wei Zheng >Assignee: Hitesh Shah > Attachments: TEZ-3286.1.patch, TEZ-3286.2.patch > > > tez.task.scale.memory.reserve-fraction can be set by clients to control how > much memory is available to the processor. This value applies at a container > level though, instead of at a vertex level. > In case of a hash-join - the processor typically needs more memory. In case > of a Shuffle join - the processor may not need as much. In DAGs with a mix > of map joins and shuffle joins - setting this at a container level is > sub-optimal. > To a large extent this comes down to propagating vertex configs to the > container / task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
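The idea of propagating vertex configs to the task can be illustrated with a small lookup that prefers a vertex-level value and falls back to the container-level one. This is a sketch under stated assumptions, not the code in the attached patches: `VertexConfSketch`, `reserveFraction`, and the use of plain maps instead of Tez or Hadoop configuration objects are all hypothetical. Only the property name comes from the issue description.

```java
import java.util.Map;

/** Hypothetical illustration of a per-vertex override for a container-level setting. */
final class VertexConfSketch {
    // Property name taken from the issue description.
    static final String RESERVE_FRACTION_KEY = "tez.task.scale.memory.reserve-fraction";

    /**
     * Resolve the reserve fraction for a task: the vertex-level value wins,
     * then the container-level value, then the supplied default.
     */
    static double reserveFraction(Map<String, String> containerConf,
                                  Map<String, String> vertexConf,
                                  double defaultValue) {
        String value = vertexConf.get(RESERVE_FRACTION_KEY);
        if (value == null) {
            value = containerConf.get(RESERVE_FRACTION_KEY);
        }
        return value == null ? defaultValue : Double.parseDouble(value);
    }
}
```

In a DAG that mixes join types, a vertex doing a hash join could then carry a larger fraction than a shuffle-join vertex, while vertices without an override keep the container-level value.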
[jira] [Updated] (TEZ-3286) Allow clients to set processor reserved memory per vertex (instead of per container)
[ https://issues.apache.org/jira/browse/TEZ-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated TEZ-3286: - Attachment: TEZ-3286.3.patch Missed the config scope changes. > Allow clients to set processor reserved memory per vertex (instead of per > container) > > > Key: TEZ-3286 > URL: https://issues.apache.org/jira/browse/TEZ-3286 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.3 >Reporter: Wei Zheng >Assignee: Hitesh Shah > Attachments: TEZ-3286.1.patch, TEZ-3286.2.patch, TEZ-3286.3.patch > > > tez.task.scale.memory.reserve-fraction can be set by clients to control how > much memory is available to the processor. This value applies at a container > level though, instead of at a vertex level. > In case of a hash-join - the processor typically needs more memory. In case > of a Shuffle join - the processor may not need as much. In DAGs with a mix > of map joins and shuffle joins - setting this at a container level is > sub-optimal. > To a large extent this comes down to propagating vertex configs to the > container / task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3319) tez-history-parser should not have its own Version class
Hitesh Shah created TEZ-3319: Summary: tez-history-parser should not have its own Version class Key: TEZ-3319 URL: https://issues.apache.org/jira/browse/TEZ-3319 Project: Apache Tez Issue Type: Bug Reporter: Hitesh Shah Assignee: Rajesh Balamohan Priority: Critical This will hopefully restrict problems such as TEZ-3313 to a single implementation \cc [~rajesh.balamohan] [~ozawa] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3319) tez-history-parser should not have its own Version class
[ https://issues.apache.org/jira/browse/TEZ-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357433#comment-15357433 ] Hitesh Shah commented on TEZ-3319: -- [~rajesh.balamohan] would you mind creating jiras for any other code duplication in place today. > tez-history-parser should not have its own Version class > > > Key: TEZ-3319 > URL: https://issues.apache.org/jira/browse/TEZ-3319 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Rajesh Balamohan >Priority: Critical > > This will hopefully restrict problems such as TEZ-3313 to a single > implementation > \cc [~rajesh.balamohan] [~ozawa] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3319) tez-history-parser should not have its own Version class
[ https://issues.apache.org/jira/browse/TEZ-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357458#comment-15357458 ] Tsuyoshi Ozawa commented on TEZ-3319: - I agree with the solution :-) > tez-history-parser should not have its own Version class > > > Key: TEZ-3319 > URL: https://issues.apache.org/jira/browse/TEZ-3319 > Project: Apache Tez > Issue Type: Bug >Reporter: Hitesh Shah >Assignee: Rajesh Balamohan >Priority: Critical > > This will hopefully restrict problems such as TEZ-3313 to a single > implementation > \cc [~rajesh.balamohan] [~ozawa] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-3286 PreCommit Build #1820
Jira: https://issues.apache.org/jira/browse/TEZ-3286 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1820/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4136 lines...] [INFO] Tez ... SUCCESS [ 0.022 s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 58:41 min [INFO] Finished at: 2016-06-30T17:57:34+00:00 [INFO] Final Memory: 74M/1181M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12815501/TEZ-3286.3.patch against master revision 540eab0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1820//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1820//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. dcc42afb2da240f23adbe8e7bda434af3093275d logged out == == Finished build. == == Archiving artifacts [description-setter] Description set: TEZ-3286 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3286) Allow clients to set processor reserved memory per vertex (instead of per container)
[ https://issues.apache.org/jira/browse/TEZ-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357562#comment-15357562 ] TezQA commented on TEZ-3286: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12815501/TEZ-3286.3.patch against master revision 540eab0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1820//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1820//console This message is automatically generated. > Allow clients to set processor reserved memory per vertex (instead of per > container) > > > Key: TEZ-3286 > URL: https://issues.apache.org/jira/browse/TEZ-3286 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.3 >Reporter: Wei Zheng >Assignee: Hitesh Shah > Attachments: TEZ-3286.1.patch, TEZ-3286.2.patch, TEZ-3286.3.patch > > > tez.task.scale.memory.reserve-fraction can be set by clients to control how > much memory is available to the processor. This value applies at a container > level though, instead of at a vertex level. > In case of a hash-join - the processor typically needs more memory. In case > of a Shuffle join - the processor may not need as much. 
In DAGs with a mix > of map joins and shuffle joins - setting this at a container level is > sub-optimal. > To a large extent this comes down to propagating vertex configs to the > container / task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3318) Tez UI: Polling is not restarted after RM recovery
[ https://issues.apache.org/jira/browse/TEZ-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated TEZ-3318: Attachment: TEZ-3318.1.patch > Tez UI: Polling is not restarted after RM recovery > -- > > Key: TEZ-3318 > URL: https://issues.apache.org/jira/browse/TEZ-3318 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > Attachments: TEZ-3318.1.patch > > > For a running DAG, we poll the AM to get progress and other realtime > information. This communication happens via RM. If RM goes down, even after > its recovery the polling is not re-established. > Steps to repro: > 1. Run a job > 2. Go to DAG details page, and ensure that the progress is getting updated. > 3. Stop RM, and ensure that error bar is getting displayed in the UI. > 4. Start RM. > 5. As soon as RM is online, the progress bar must get updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3318) Tez UI: Polling is not restarted after RM recovery
[ https://issues.apache.org/jira/browse/TEZ-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreenath Somarajapuram updated TEZ-3318: Target Version/s: 0.9.0 > Tez UI: Polling is not restarted after RM recovery > -- > > Key: TEZ-3318 > URL: https://issues.apache.org/jira/browse/TEZ-3318 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > > For a running DAG, we poll the AM to get progress and other realtime > information. This communication happens via RM. If RM goes down, even after > its recovery the polling is not re-established. > Steps to repro: > 1. Run a job > 2. Go to DAG details page, and ensure that the progress is getting updated. > 3. Stop RM, and ensure that error bar is getting displayed in the UI. > 4. Start RM. > 5. As soon as RM is online, the progress bar must get updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3318) Tez UI: Polling is not restarted after RM recovery
[ https://issues.apache.org/jira/browse/TEZ-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357636#comment-15357636 ] Sreenath Somarajapuram commented on TEZ-3318: - [~hitesh] When polling fails, we don't retry the poll (from the AM). Instead, we do a page reload at double the interval, i.e. with a polling delay of 3 seconds, if RM is not reachable we do a page reload (from ATS) every 6 seconds until 1. RM is reachable or 2. the application is complete. Considering that, do we need this retry limit? Adding the limit is a small change though. > Tez UI: Polling is not restarted after RM recovery > -- > > Key: TEZ-3318 > URL: https://issues.apache.org/jira/browse/TEZ-3318 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > Attachments: TEZ-3318.1.patch > > > For a running DAG, we poll the AM to get progress and other realtime > information. This communication happens via RM. If RM goes down, even after > its recovery the polling is not re-established. > Steps to repro: > 1. Run a job > 2. Go to DAG details page, and ensure that the progress is getting updated. > 3. Stop RM, and ensure that error bar is getting displayed in the UI. > 4. Start RM. > 5. As soon as RM is online, the progress bar must get updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
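The behaviour described in this comment (poll the AM at the normal delay, fall back to an ATS page reload at double the delay while RM is unreachable, stop once the application is complete) might be modelled as a small decision function. This is an illustrative Java sketch, not the Tez UI code; `RefreshPlanner`, `Source`, and `Next` are hypothetical names.

```java
/** Hypothetical model of the refresh decision described in the comment. */
final class RefreshPlanner {
    enum Source { AM_POLL, ATS_RELOAD, NONE }

    /** What to do next and how long to wait before doing it. */
    static final class Next {
        final Source source;
        final long delayMillis;

        Next(Source source, long delayMillis) {
            this.source = source;
            this.delayMillis = delayMillis;
        }
    }

    static Next next(boolean rmReachable, boolean appComplete, long pollDelayMillis) {
        if (appComplete) {
            // Nothing left to refresh once the application has finished.
            return new Next(Source.NONE, 0L);
        }
        if (rmReachable) {
            // Normal case: poll the AM (through RM) at the regular delay.
            return new Next(Source.AM_POLL, pollDelayMillis);
        }
        // RM unreachable: reload from ATS at double the polling delay.
        return new Next(Source.ATS_RELOAD, 2 * pollDelayMillis);
    }
}
```

With a 3 second polling delay this yields a 6 second reload interval while RM is down, matching the numbers in the comment.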
[jira] [Assigned] (TEZ-3320) Java implementation of bitonic merge sort
[ https://issues.apache.org/jira/browse/TEZ-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa reassigned TEZ-3320: --- Assignee: Tsuyoshi Ozawa > Java implementation of bitonic merge sort > - > > Key: TEZ-3320 > URL: https://issues.apache.org/jira/browse/TEZ-3320 > Project: Apache Tez > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > > Pure java cache-aware bitonic merge sort without JNI can solve the bottleneck > of sort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3320) Java implementation of bitonic merge sort
Tsuyoshi Ozawa created TEZ-3320: --- Summary: Java implementation of bitonic merge sort Key: TEZ-3320 URL: https://issues.apache.org/jira/browse/TEZ-3320 Project: Apache Tez Issue Type: Improvement Reporter: Tsuyoshi Ozawa Pure java cache-aware bitonic merge sort without JNI can solve the bottleneck of sort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3314) Double counting input bytes in MultiMRInput
[ https://issues.apache.org/jira/browse/TEZ-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3314: Fix Version/s: 0.8.4 > Double counting input bytes in MultiMRInput > --- > > Key: TEZ-3314 > URL: https://issues.apache.org/jira/browse/TEZ-3314 > Project: Apache Tez > Issue Type: Bug >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Fix For: 0.9.0, 0.8.4 > > Attachments: TEZ-3314.0.patch > > > TEZ_INPUT_SPLIT_LENGTH is incremented twice if useNewAPI is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3291: Fix Version/s: 0.8.4 0.9.0 > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Fix For: 0.9.0, 0.8.4 > > Attachments: TEZ-3291.004.patch, TEZ-3291.2.patch, TEZ-3291.3.patch, > TEZ-3291.4.patch, TEZ-3291.5.patch, TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, the current split computation does not go through the > rack local and allow-small groups optimizations and ends up creating a small > number of splits. Depending on clusters this can end up creating long running > map jobs. > Example with hive: > == > 1. Inventory table in tpc-ds dataset is partitioned and is a relatively small > table. > 2. With query-22, hive requests with the original splits count as 52 and > overall length of splits themselves is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files to be processed in the split. In clusters with split locations, this > would have ended up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost" all splits get added to > a single group. This makes things a lot worse. > 4. Depending on the dataset and the format, this can be problematic. For > instance, file open calls and random seeks can be expensive in S3. > 5. In this case, 52 files have to be opened and processed by a single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have drastically reduced. 
> E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
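The arithmetic in the log above can be reproduced with a small sketch. It is a simplified model, not the actual `TezMapredSplitsGrouper` code: `SplitGroupingSketch`, `desiredSplits`, and the `skipMinClampForSingleLocation` flag are hypothetical names, and the clamp formula (total length divided by the minimum group length, plus one) is an assumption chosen because it matches the logged numbers. The flag models alternate option 3, where the recomputation is applied only when the splits have more than one distinct location.

```java
/** Hypothetical, simplified model of the min-size clamp seen in the log. */
final class SplitGroupingSketch {
    static int desiredSplits(long totalLength,
                             int requestedSplits,
                             long minLengthPerGroup,
                             int distinctLocations,
                             boolean skipMinClampForSingleLocation) {
        long lengthPerGroup = totalLength / requestedSplits;
        if (lengthPerGroup < minLengthPerGroup) {
            if (skipMinClampForSingleLocation && distinctLocations <= 1) {
                // Option 3: with a single location (e.g. "localhost" on S3),
                // keep the requested count instead of collapsing the groups.
                return requestedSplits;
            }
            // Assumed clamp: enough groups so each is at least the minimum size.
            return (int) (totalLength / minLengthPerGroup) + 1;
        }
        return requestedSplits;
    }
}
```

For the logged values (total length 12061817, 110 desired splits, minimum 16777216), the per-group length is 109652 and the clamp yields 1 split, as in the log. With the option 3 flag enabled and one distinct location, the requested count of 110 is kept instead.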
[jira] [Commented] (TEZ-3314) Double counting input bytes in MultiMRInput
[ https://issues.apache.org/jira/browse/TEZ-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357941#comment-15357941 ] Siddharth Seth commented on TEZ-3314: - Pulled into branch-0.8 as well. > Double counting input bytes in MultiMRInput > --- > > Key: TEZ-3314 > URL: https://issues.apache.org/jira/browse/TEZ-3314 > Project: Apache Tez > Issue Type: Bug >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Fix For: 0.9.0, 0.8.4 > > Attachments: TEZ-3314.0.patch > > > TEZ_INPUT_SPLIT_LENGTH is incremented twice if useNewAPI is set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa reassigned TEZ-3303: --- Assignee: Tsuyoshi Ozawa > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated TEZ-3303: Attachment: TEZ-3303.001.patch Attaching a patch to consume more precise partition stats. > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2962) Use per partition stats in shuffle vertex manager auto parallelism
[ https://issues.apache.org/jira/browse/TEZ-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357961#comment-15357961 ] Tsuyoshi Ozawa commented on TEZ-2962: - This can be done after TEZ-3303. > Use per partition stats in shuffle vertex manager auto parallelism > -- > > Key: TEZ-2962 > URL: https://issues.apache.org/jira/browse/TEZ-2962 > Project: Apache Tez > Issue Type: Bug >Reporter: Bikas Saha >Priority: Critical > > The original code used output size sent by completed tasks. Recently per > partition stats have been added that provide granular information. Using > partition stats may be more accurate and also remove the duplicate counting > of data size in partition stats and per task overall. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts
[ https://issues.apache.org/jira/browse/TEZ-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357988#comment-15357988 ] Siddharth Seth commented on TEZ-3293: - +1. Looks good. Thanks [~jlowe]. Apologies for the delay in the review. > Fetch failures can cause a shuffle hang waiting for memory merge that never > starts > -- > > Key: TEZ-3293 > URL: https://issues.apache.org/jira/browse/TEZ-3293 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1, 0.8.3 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3293.001.patch > > > Tez jobs can hang in shuffle waiting for a memory merge that never starts. > When a MapOutput is reserved it increments usedMemory but when it is > unreserved it decrements usedMemory _and_ commitMemory. If enough shuffle > failures occur of sufficient size then commitMemory may never reach the merge > threshold even after all outstanding transfers have committed and thus hang > the shuffle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
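The accounting problem in the issue description can be shown with a toy model. This is a deliberately simplified sketch, not the Tez shuffle code: only the counter names `usedMemory` and `commitMemory` come from the description, while `ShuffleMemoryModel`, its methods, and the `buggyUnreserve` switch are hypothetical. The non-buggy branch shows one plausible correction (unreserve only touches `usedMemory`), which may differ from what the attached patch does.

```java
/** Hypothetical toy model of the reserve/unreserve/commit counters. */
final class ShuffleMemoryModel {
    private final boolean buggyUnreserve;
    private long usedMemory = 0;
    private long commitMemory = 0;

    ShuffleMemoryModel(boolean buggyUnreserve) {
        this.buggyUnreserve = buggyUnreserve;
    }

    /** A fetch reserves memory for its output. */
    void reserve(long size) {
        usedMemory += size;
    }

    /** A failed fetch gives its reservation back. */
    void unreserve(long size) {
        usedMemory -= size;
        if (buggyUnreserve) {
            // The described bug: memory that was never committed is
            // subtracted from commitMemory as well.
            commitMemory -= size;
        }
    }

    /** A successful fetch commits its output. */
    void commit(long size) {
        commitMemory += size;
    }

    boolean mergeWouldStart(long mergeThreshold) {
        return commitMemory >= mergeThreshold;
    }

    long commitMemory() {
        return commitMemory;
    }
}
```

Consider two fetches of 60 bytes each and a merge threshold of 100: one fetch fails and is unreserved, the other commits, and the failed one is retried and commits. In the buggy model `commitMemory` ends at 60 even though 120 bytes were committed, so the merge never starts. With unreserve leaving `commitMemory` alone, it reaches 120 and the merge can begin.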
[jira] [Commented] (TEZ-3286) Allow clients to set processor reserved memory per vertex (instead of per container)
[ https://issues.apache.org/jira/browse/TEZ-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358029#comment-15358029 ] Siddharth Seth commented on TEZ-3286: - +1. Looks good. Thanks [~hitesh]. Will commit in a bit with a small change to add a timeout to the new tests. > Allow clients to set processor reserved memory per vertex (instead of per > container) > > > Key: TEZ-3286 > URL: https://issues.apache.org/jira/browse/TEZ-3286 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.3 >Reporter: Wei Zheng >Assignee: Hitesh Shah > Attachments: TEZ-3286.1.patch, TEZ-3286.2.patch, TEZ-3286.3.patch > > > tez.task.scale.memory.reserve-fraction can be set by clients to control how > much memory is available to the processor. This value applies at a container > level though, instead of at a vertex level. > In case of a hash-join - the processor typically needs more memory. In case > of a Shuffle join - the processor may not need as much. In DAGs with a mix > of map joins and shuffle joins - setting this at a container level is > sub-optimal. > To a large extent this comes down to propagating vertex configs to the > container / task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3286) Allow clients to set processor reserved memory per vertex (instead of per container)
[ https://issues.apache.org/jira/browse/TEZ-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3286: Attachment: TEZ-3286.3.withTestTimeout.txt > Allow clients to set processor reserved memory per vertex (instead of per > container) > > > Key: TEZ-3286 > URL: https://issues.apache.org/jira/browse/TEZ-3286 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.3 >Reporter: Wei Zheng >Assignee: Hitesh Shah > Attachments: TEZ-3286.1.patch, TEZ-3286.2.patch, TEZ-3286.3.patch, > TEZ-3286.3.withTestTimeout.txt > > > tez.task.scale.memory.reserve-fraction can be set by clients to control how > much memory is available to the processor. This value applies at a container > level though, instead of at a vertex level. > In case of a hash-join - the processor typically needs more memory. In case > of a Shuffle join - the processor may not need as much. In DAGs with a mix > of map joins and shuffle joins - setting this at a container level is > sub-optimal. > To a large extent this comes down to propagating vertex configs to the > container / task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (TEZ-3293) Fetch failures can cause a shuffle hang waiting for memory merge that never starts
[ https://issues.apache.org/jira/browse/TEZ-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3293: Attachment: TEZ-3293.001-branch-0.7.patch Patch for branch-0.7 > Fetch failures can cause a shuffle hang waiting for memory merge that never > starts > -- > > Key: TEZ-3293 > URL: https://issues.apache.org/jira/browse/TEZ-3293 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1, 0.8.3 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3293.001-branch-0.7.patch, TEZ-3293.001.patch > > > Tez jobs can hang in shuffle waiting for a memory merge that never starts. > When a MapOutput is reserved it increments usedMemory but when it is > unreserved it decrements usedMemory _and_ commitMemory. If enough shuffle > failures occur of sufficient size then commitMemory may never reach the merge > threshold even after all outstanding transfers have committed and thus hang > the shuffle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3321) Changes for 0.8.4 release
Siddharth Seth created TEZ-3321: --- Summary: Changes for 0.8.4 release Key: TEZ-3321 URL: https://issues.apache.org/jira/browse/TEZ-3321 Project: Apache Tez Issue Type: Task Reporter: Siddharth Seth Assignee: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: TEZ-3293 PreCommit Build #1823
Jira: https://issues.apache.org/jira/browse/TEZ-3293 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1823/ ### ## LAST 60 LINES OF THE CONSOLE ### Started by remote host 127.0.0.1 [EnvInject] - Loading node environment variables. Building remotely on H5 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs yahoo-not-h2) in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://git-wip-us.apache.org/repos/asf/tez.git > # timeout=10 Cleaning workspace > git rev-parse --verify HEAD # timeout=10 Resetting working tree > git reset --hard # timeout=10 > git clean -fdx # timeout=10 Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/tez.git > git --version # timeout=10 > git -c core.askpass=true fetch --tags --progress > https://git-wip-us.apache.org/repos/asf/tez.git > +refs/heads/*:refs/remotes/origin/* > git rev-parse refs/remotes/origin/master^{commit} # timeout=10 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10 Checking out Revision 3b08cbf907784de463c9e3c05147b5c6d681251d (refs/remotes/origin/master) > git config core.sparsecheckout # timeout=10 > git checkout -f 3b08cbf907784de463c9e3c05147b5c6d681251d > git rev-list 71bb2defe97e55e3bf7dbb299fe33ab8a667e7a1 # timeout=10 No emails were triggered. [PreCommit-TEZ-Build] $ /bin/bash /tmp/hudson2317049414711291726.sh Running in Jenkins mode == == Testing patch for TEZ-3293. == == HEAD is now at 3b08cbf TEZ-3286. Allow clients to set processor reserved memory per vertex (instead of per container). Contributed by Hitesh Shah. Previous HEAD position was 3b08cbf... TEZ-3286. Allow clients to set processor reserved memory per vertex (instead of per container). Contributed by Hitesh Shah. Switched to branch 'master' Your branch is behind 'origin/master' by 7 commits, and can be fast-forwarded. 
(use "git pull" to update your local branch) First, rewinding head to replay your work on top of it... Fast-forwarded master to 3b08cbf907784de463c9e3c05147b5c6d681251d. TEZ-3293 is not "Patch Available". Exiting. == == Finished build. == == Archiving artifacts ERROR: No artifacts found that match the file pattern "patchprocess/*.*". Configuration error? ERROR: 'patchprocess/*.*' doesn't match anything, but '*.*' does. Perhaps that's what you mean? Build step 'Archive the artifacts' changed build result to FAILURE [description-setter] Could not determine description. Recording test results ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error? Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## No tests ran.
[jira] [Updated] (TEZ-3321) Changes for 0.8.4 release
[ https://issues.apache.org/jira/browse/TEZ-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3321: Attachment: TEZ-3321.part1.txt Patch for branch-0.8 to update version and add section to CHANGES.txt > Changes for 0.8.4 release > - > > Key: TEZ-3321 > URL: https://issues.apache.org/jira/browse/TEZ-3321 > Project: Apache Tez > Issue Type: Task >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: TEZ-3321.part1.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3287) Have UnorderedPartitionedKVWriter honor tez.runtime.empty.partitions.info-via-events.enabled
[ https://issues.apache.org/jira/browse/TEZ-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358060#comment-15358060 ] Lalitha Viswanathan commented on TEZ-3287: -- Which version of tez source code is being used in the patch? I tried applying the patch manually in 0.8.3 version, compiled and deployed it in my cluster. Didn't get the "hive.tez.auto.reducer.parallelism=true" optimization working with shuffle hash join. Am I missing something? > Have UnorderedPartitionedKVWriter honor > tez.runtime.empty.partitions.info-via-events.enabled > > > Key: TEZ-3287 > URL: https://issues.apache.org/jira/browse/TEZ-3287 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3287.001.patch > > > The ordered partitioned output allows applications to specify if empty > partition stats should be included as part of DataMovementEvent via a > configuration. It seems unordered partitioned output should honor that > configuration as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-3303 PreCommit Build #1821
Jira: https://issues.apache.org/jira/browse/TEZ-3303 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1821/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4116 lines...] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 56:30 min [INFO] Finished at: 2016-06-30T23:47:20+00:00 [INFO] Final Memory: 72M/882M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12815554/TEZ-3303.001.patch against master revision ac9cfb9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1821//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1821//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. c34874fa138f25f5e30f4ae9950b579317b08081 logged out == == Finished build. == == Archiving artifacts Compressed 3.20 MB of artifacts by 27.4% relative to #1820 [description-setter] Description set: TEZ-3303 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358062#comment-15358062 ] TezQA commented on TEZ-3303: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12815554/TEZ-3303.001.patch against master revision ac9cfb9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1821//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1821//console This message is automatically generated. > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3287) Have UnorderedPartitionedKVWriter honor tez.runtime.empty.partitions.info-via-events.enabled
[ https://issues.apache.org/jira/browse/TEZ-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358068#comment-15358068 ] Tsuyoshi Ozawa commented on TEZ-3287: - [~lmv] thanks for taking a look! The patch targets master, not branch-0.8. After merging this into master, I can backport it to branch-0.8. > Have UnorderedPartitionedKVWriter honor > tez.runtime.empty.partitions.info-via-events.enabled > > > Key: TEZ-3287 > URL: https://issues.apache.org/jira/browse/TEZ-3287 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3287.001.patch > > > The ordered partitioned output allows applications to specify if empty > partition stats should be included as part of DataMovementEvent via a > configuration. It seems unordered partitioned output should honor that > configuration as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3287) Have UnorderedPartitionedKVWriter honor tez.runtime.empty.partitions.info-via-events.enabled
[ https://issues.apache.org/jira/browse/TEZ-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358068#comment-15358068 ] Tsuyoshi Ozawa edited comment on TEZ-3287 at 6/30/16 11:57 PM: --- [~lmv] thanks for taking a look! The patch targets master, not branch-0.8. After merging this into master, I can backport it to branch-0.8. BTW, I don't think "hive.tez.auto.reducer.parallelism=true" is related to this jira. Let me know if I'm wrong. was (Author: ozawa): [~lmv] thanks for taking a look! The patch targets master, not branch-0.8. After merging this into master, I can backport it to branch-0.8. > Have UnorderedPartitionedKVWriter honor > tez.runtime.empty.partitions.info-via-events.enabled > > > Key: TEZ-3287 > URL: https://issues.apache.org/jira/browse/TEZ-3287 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3287.001.patch > > > The ordered partitioned output allows applications to specify if empty > partition stats should be included as part of DataMovementEvent via a > configuration. It seems unordered partitioned output should honor that > configuration as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Success: TEZ-3286 PreCommit Build #1822
Jira: https://issues.apache.org/jira/browse/TEZ-3286 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1822/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 4136 lines...] [INFO] Tez ... SUCCESS [ 0.031 s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 55:12 min [INFO] Finished at: 2016-07-01T00:41:14+00:00 [INFO] Final Memory: 85M/1069M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12815571/TEZ-3286.3.withTestTimeout.txt against master revision 71bb2de. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1822//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1822//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. affaacd3419fd9fe75fec70d4bf26b324fc171da logged out == == Finished build. == == Archiving artifacts [description-setter] Description set: TEZ-3286 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3286) Allow clients to set processor reserved memory per vertex (instead of per container)
[ https://issues.apache.org/jira/browse/TEZ-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358113#comment-15358113 ] TezQA commented on TEZ-3286: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12815571/TEZ-3286.3.withTestTimeout.txt against master revision 71bb2de. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1822//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1822//console This message is automatically generated. > Allow clients to set processor reserved memory per vertex (instead of per > container) > > > Key: TEZ-3286 > URL: https://issues.apache.org/jira/browse/TEZ-3286 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.8.3 >Reporter: Wei Zheng >Assignee: Hitesh Shah > Fix For: 0.9.0, 0.8.4 > > Attachments: TEZ-3286.1.patch, TEZ-3286.2.patch, TEZ-3286.3.patch, > TEZ-3286.3.withTestTimeout.txt > > > tez.task.scale.memory.reserve-fraction can be set by clients to control how > much memory is available to the processor. This value applies at a container > level though, instead of at a vertex level. > In case of a hash-join - the processor typically needs more memory. In case > of a Shuffle join - the processor may not need as much. In DAGs with a mix > of map joins and shuffle joins - setting this at a container level is > sub-optimal. > To a large extent this comes down to propagating vertex configs to the > container / task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
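The TEZ-3286 description above asks for tez.task.scale.memory.reserve-fraction to be resolved per vertex rather than per container. A minimal sketch of that lookup order, assuming plain dictionaries stand in for the DAG-wide and vertex-level configs; the function name, the dictionaries, and the sample values 0.3 and 0.6 are hypothetical and are not taken from the patch:

```python
# Illustrative only: models "vertex config overrides DAG/container config".
# This is not Tez code; only the property name comes from the JIRA text.
RESERVE_FRACTION_KEY = "tez.task.scale.memory.reserve-fraction"


def effective_reserve_fraction(dag_conf, vertex_conf):
    """Return the reserve fraction for one vertex.

    A value set on the vertex wins; otherwise the DAG-wide value is used.
    Raises KeyError if neither level sets the property and ValueError if
    the value is not a fraction in [0, 1].
    """
    if RESERVE_FRACTION_KEY in vertex_conf:
        raw = vertex_conf[RESERVE_FRACTION_KEY]
    else:
        raw = dag_conf[RESERVE_FRACTION_KEY]
    fraction = float(raw)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"reserve fraction out of range: {fraction}")
    return fraction


# Hypothetical DAG: the hash-join vertex asks for more processor memory,
# the shuffle-join vertex falls back to the DAG-wide setting.
dag_conf = {RESERVE_FRACTION_KEY: "0.3"}
hash_join_vertex = {RESERVE_FRACTION_KEY: "0.6"}
shuffle_join_vertex = {}

hash_join_fraction = effective_reserve_fraction(dag_conf, hash_join_vertex)
shuffle_join_fraction = effective_reserve_fraction(dag_conf, shuffle_join_vertex)
```

With a lookup like this, a DAG that mixes map joins and shuffle joins no longer has to pick one value for every container.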
[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358196#comment-15358196 ] Tsuyoshi Ozawa commented on TEZ-3303: - [~sseth] could you take a look? > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (TEZ-3322) Add Apache license to the generated tez-configuration-template files
Siddharth Seth created TEZ-3322: --- Summary: Add Apache license to the generated tez-configuration-template files Key: TEZ-3322 URL: https://issues.apache.org/jira/browse/TEZ-3322 Project: Apache Tez Issue Type: Task Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3318) Tez UI: Polling is not restarted after RM recovery
[ https://issues.apache.org/jira/browse/TEZ-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358255#comment-15358255 ] Hitesh Shah commented on TEZ-3318: -- Does the polling interval reset back to 3 on any successful call? > Tez UI: Polling is not restarted after RM recovery > -- > > Key: TEZ-3318 > URL: https://issues.apache.org/jira/browse/TEZ-3318 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram >Assignee: Sreenath Somarajapuram > Attachments: TEZ-3318.1.patch > > > For a running DAG, we poll the AM to get progress and other realtime > information. This communication happens via RM. If RM goes down, even after > its recovery the polling is not re established. > Step to repro: > 1. Run a job > 2. Go to DAG details page, and ensure that the progress is getting updated. > 3. Stop RM, and ensure that error bar is getting displayed in the UI. > 4. Start RM. > 5. As soon as RM is online, the progress bar must get updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
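The retry behaviour discussed on TEZ-3318 (a cap on consecutive retries, a counter that resets on the first successful call, and an interval that returns to 3 after success) can be sketched as a small state object. This is an illustration under assumptions, not the Tez UI code: the thread does not say how the interval changes after a failure, so the doubling, the 60-unit ceiling, the unit of "3" (taken here as seconds), and every name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PollState:
    """Tracks the polling interval and consecutive failures (hypothetical)."""

    base_interval: float = 3.0   # "3" comes from the thread; seconds assumed
    max_interval: float = 60.0   # assumed ceiling for the backoff
    max_failures: int = 60       # example cap suggested earlier in the thread
    interval: float = 3.0
    failures: int = 0

    def on_success(self) -> float:
        """Reset the failure counter and interval; return the next delay."""
        self.failures = 0
        self.interval = self.base_interval
        return self.interval

    def on_failure(self) -> Optional[float]:
        """Back off after a failed poll; return None when retries run out."""
        self.failures += 1
        if self.failures >= self.max_failures:
            return None  # caller should stop polling and surface the error
        self.interval = min(self.interval * 2, self.max_interval)
        return self.interval


state = PollState()
first_retry = state.on_failure()      # interval doubles after a failure
second_retry = state.on_failure()
after_recovery = state.on_success()   # RM is back: interval and counter reset
```

Keeping the reset in on_success means a single good response after an RM restart puts the UI back on its normal polling cadence.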
[jira] [Commented] (TEZ-3303) Have ShuffleVertexManager consume more precise partition stats
[ https://issues.apache.org/jira/browse/TEZ-3303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358300#comment-15358300 ] Siddharth Seth commented on TEZ-3303: - [~ozawa] - I don't think the patch actually makes use of the stats. It needs to check which stats are set - and use that set appropriately. If I'm not mistaken the current patch only checks and reads detailed stats, but does nothing with them. cc [~mingma] - in case you'd like to review the patch when it's updated. > Have ShuffleVertexManager consume more precise partition stats > -- > > Key: TEZ-3303 > URL: https://issues.apache.org/jira/browse/TEZ-3303 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Tsuyoshi Ozawa > Attachments: TEZ-3303.001.patch > > > TEZ-3216 adds the support for more precise partition stats. > ShuffleVertexManager should be updated to consume the more precise partition > stats. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
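The review comment above asks the vertex manager to check which kind of partition stats is set and then actually use it. A rough sketch of that selection, assuming each source task reports per-partition sizes either as detailed values or as coarse estimates; the function, its arguments, and the sample numbers are hypothetical and do not mirror ShuffleVertexManager's real data structures:

```python
from typing import List, Optional, Sequence


def accumulate_partition_stats(
    totals: Sequence[int],
    detailed: Optional[Sequence[int]],
    coarse: Optional[Sequence[int]],
) -> List[int]:
    """Add one task's per-partition sizes to the running totals.

    Detailed stats are preferred when the task sent them; coarse stats are
    the fallback. If neither is present the totals are returned unchanged.
    """
    chosen = detailed if detailed is not None else coarse
    if chosen is None:
        return list(totals)
    if len(chosen) != len(totals):
        raise ValueError("partition count mismatch")
    return [total + size for total, size in zip(totals, chosen)]


# Hypothetical events from two source tasks of a two-partition output.
totals = [0, 0]
totals = accumulate_partition_stats(totals, detailed=[5, 7], coarse=[10, 10])
totals = accumulate_partition_stats(totals, detailed=None, coarse=[10, 10])
```

The point of the sketch is only that the chosen stats feed into the totals; reading the detailed values without adding them anywhere would be the "does nothing with them" case the comment describes.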