[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326738#comment-15326738 ] Jonathan Eagles commented on TEZ-3296:
--
bq. Now - (1*24*3) + 20*3 = 150 = (2*24*3) + 2*3
The formula is set up so that all vertices at distance _h_ from the root have a logically higher priority than all vertices at distance _h + 1_. In the example above, the calculation on the LHS should be 132, not 150.

> Tez job can hang if two vertices at the same root distance have different task requirements
> ---
> Key: TEZ-3296
> URL: https://issues.apache.org/jira/browse/TEZ-3296
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.1
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: TEZ-3296.001.patch
>
> When two vertices have the same distance from the root, Tez will schedule containers with the same priority. However, those vertices could have different task requirements and therefore different capabilities. As documented in YARN-314, YARN currently doesn't support requests for multiple sizes at the same priority. In practice this leads to one vertex's allocation requests clobbering the other's, which can result in the Tez AM waiting on containers it will never receive from the RM.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
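The band-separation argument in this comment can be sanity-checked numerically. The following is an illustrative sketch (plain Python, not Tez's actual DAG scheduler code) of the (Height*Total*3) + V*3 formula being discussed:

```python
def base_priority(distance, vertex_id, total_vertices):
    # Candidate formula from the discussion: (height * total * 3) + vertexId * 3.
    # Lower numeric value means logically higher priority.
    return (distance * total_vertices * 3) + vertex_id * 3

# The corrected example: the LHS works out to 132, not 150,
# so it does not collide with the RHS value of 150.
assert base_priority(1, 20, 24) == 132   # (1*24*3) + 20*3
assert base_priority(2, 2, 24) == 150    # (2*24*3) + 2*3

# Band separation: because vertexId < totalVertices, the largest value at
# distance h is h*total*3 + (total-1)*3, strictly below (h+1)*total*3.
total = 24
worst_at_h = max(base_priority(1, v, total) for v in range(total))
best_at_h_plus_1 = min(base_priority(2, v, total) for v in range(total))
assert worst_at_h < best_at_h_plus_1
```

Under this scheme, collisions between different root distances cannot occur as long as vertex ids stay below the total vertex count.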
[jira] [Updated] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3291:
--
Attachment: TEZ-3291.4.patch
Attaching a patch which explicitly checks whether all splits have "localhost" (for S3). Added an additional test case.

> Optimize splits grouping when locality information is not available
> ---
> Key: TEZ-3291
> URL: https://issues.apache.org/jira/browse/TEZ-3291
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Minor
> Attachments: TEZ-3291.2.patch, TEZ-3291.3.patch, TEZ-3291.4.patch, TEZ-3291.WIP.patch
>
> There are scenarios where splits might not contain location details. S3 is an example, where all splits would have "localhost" for the location details. In such cases, the current split computation does not go through the rack-local and allow-small-groups optimizations and ends up creating a small number of splits. Depending on the cluster, this can end up creating long-running map jobs.
> Example with Hive:
> ==
> 1. The inventory table in the TPC-DS dataset is partitioned and is a relatively small table.
> 2. With query-22, Hive requests an original splits count of 52, and the overall length of the splits themselves is around 12061817 bytes. {{tez.grouping.min-size}} was set to 16 MB.
> 3. In Tez splits grouping, this ends up creating a single split with 52+ files to be processed in that split. In clusters with split locations, this would have ended up with multiple splits, since {{allowSmallGroups}} would have kicked in. But in S3, since everything has "localhost", all splits get added to a single group. This makes things a lot worse.
> 4. Depending on the dataset and the format, this can be problematic. For instance, file open calls and random seeks can be expensive in S3.
> 5. In this case, 52 files have to be opened and processed by a single task in sequential fashion. Had they been processed by multiple tasks, the response time would have been drastically reduced.
> E.g. log details:
> {noformat}
> 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] |split.TezMapredSplitsGrouper|: Grouping splits in Tez
> 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total length: 12061817 Original splits: 52
> 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: 12061817 numOriginalSplits: 52 . Grouping by length: true count: false
> 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 splitsProcessed: 52
> {noformat}
> Alternate options:
> ==
> 1. Force Hadoop to provide bogus locations for S3. But it is not clear whether that would be accepted anytime soon. Ref: HADOOP-12878
> 2. Set {{tez.grouping.min-size}} to a very low value. But should the end user always be doing this on a query-by-query basis?
> 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute desiredNumSplits only when the number of distinct locations in the splits is > 1. This would force more splits to be generated.
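Alternate option 3 above can be sketched as follows. This is a hypothetical illustration (the function name and signature are invented for this sketch, not Tez's actual split grouper code): skip the desired-splits clamp when every split reports the same single, possibly fake, location.

```python
def adjust_desired_splits(desired_splits, total_length, min_length_per_group, locations):
    # If grouping to the desired count would produce groups below
    # tez.grouping.min-size, shrink the desired count -- but only when the
    # splits carry real locality info (more than one distinct location).
    # With a single fake location like "localhost" (S3), keep the original
    # desired count so that more, smaller groups are generated.
    length_per_group = total_length / desired_splits
    if length_per_group < min_length_per_group and len(set(locations)) > 1:
        return max(1, total_length // min_length_per_group)
    return desired_splits
```

With the numbers from the log above (52 splits, 12061817 bytes total, 16 MB min size) and a single "localhost" location, the desired count of 110 would be left alone instead of being collapsed to 1.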
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326714#comment-15326714 ] Gopal V commented on TEZ-3291:
--
The numDistinctLocations check worries me, since this impl leaks into HDFS runs as well. S3 and WASB return "localhost" for the hostnames (causing much damage with YARN container allocation), while all other impls provide actual locality information rather than a dummy entry; in particular, some use the actual "127.0.0.1" IP address instead of hostnames. The text entry of "localhost" could be special-cased, so that this change cannot impact HDFS installs.
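The special-casing suggested here could look roughly like this (hypothetical helper names, written as a standalone sketch rather than Tez code):

```python
# Dummy hostname that object stores such as S3/WASB report for every split.
FAKE_LOCATIONS = {"localhost"}

def has_real_locality(split_locations):
    # Only treat locality as real if at least one split names a host other
    # than the special-cased "localhost" placeholder. A cluster that really
    # reports "127.0.0.1" (an actual IP, not the placeholder text) is
    # therefore unaffected.
    return any(loc not in FAKE_LOCATIONS
               for locs in split_locations for loc in locs)
```

The grouping shortcut for location-less splits would then fire only when has_real_locality(...) is False, so HDFS installs that report genuine hostnames never take the new code path.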
[jira] [Updated] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3291:
--
Attachment: TEZ-3291.3.patch
Attaching a patch to address the review comments. S3 URLs can be explicitly checked in splits (by casting to FileSplit and checking getPath). But it is not clear whether we would want to restrict this to FileSplits only in the future.
[jira] [Assigned] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce
[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan reassigned TEZ-3297:
-
Assignee: Rajesh Balamohan

> Deadlock scenario in AM during ShuffleVertexManager auto reduce
> ---
> Key: TEZ-3297
> URL: https://issues.apache.org/jira/browse/TEZ-3297
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Rajesh Balamohan
> Priority: Critical
> Attachments: TEZ-3297.1.patch, TEZ-3297.2.branch-0.7.patch, TEZ-3297.2.patch, am_log, thread_dump
>
> Here is what's happening in the attached thread dump. App Pool thread #9 does the auto reduce on V2 and initializes the new edge manager; it holds the V2 write lock and wants the read lock of source vertex V1. At the same time, another App Pool thread, #2, schedules a task of V1 and gets the output spec, so it holds the V1 read lock and wants the V2 read lock. Also, the dispatcher thread wants the V1 write lock to begin the state machine transition. Since the dispatcher thread is at the head of the V1 ReadWriteLock queue, thread #9 cannot get the V1 read lock even though thread #2 is holding the V1 read lock. This is a circular wait scenario: #2 blocks the dispatcher, the dispatcher blocks #9, and #9 blocks #2.
> There is no problem with ReadWriteLock behavior in this case. Please see this Java bug report: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565.
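The circular wait described in this issue can be captured as a wait-for graph and checked mechanically. The sketch below is illustrative only (the thread names and the cycle finder are invented here; the blocking relationships come from the thread dump description):

```python
def find_cycle(waits_on):
    # waits_on maps thread -> the thread it is blocked behind.
    # Follow the edges from each thread; revisiting a node means deadlock.
    for start in waits_on:
        seen, cur = [], start
        while cur in waits_on:
            if cur in seen:
                return seen[seen.index(cur):]
            seen.append(cur)
            cur = waits_on[cur]
    return None

# From the thread dump: #2 blocks the dispatcher (dispatcher wants the V1
# write lock, #2 holds the V1 read lock), the dispatcher blocks #9 (it sits
# ahead of #9 in the V1 ReadWriteLock queue), and #9 blocks #2 (#9 holds
# the V2 write lock on the vertex #2 wants to read).
waits_on = {
    "dispatcher": "app-pool-2",
    "app-pool-9": "dispatcher",
    "app-pool-2": "app-pool-9",
}
assert find_cycle(waits_on) is not None
```

Note the deadlock needs no lock bug at all: a queued writer preventing new readers is the documented behavior the linked Java bug report discusses.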
[jira] [Updated] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce
[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3297:
--
Attachment: TEZ-3297.2.branch-0.7.patch
Thanks a lot [~sseth], [~bikassaha]. Will commit it shortly. [~jeagles] - Attaching a patch for branch-0.7 as well; will commit it to branch-0.7.
[jira] [Commented] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce
[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326688#comment-15326688 ] TezQA commented on TEZ-3297:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12809732/TEZ-3297.2.branch-0.7.patch against master revision 1d11ad2.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1793//console
This message is automatically generated.
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326689#comment-15326689 ] Bikas Saha commented on TEZ-3291:
-
The comment could be more explicit, like "this is a workaround for systems like S3 that pass the same fake hostname for all splits". The log could include newDesiredSplits and also the final value of desired splits, so that we get all the info in one log line.
Failed: TEZ-3297 PreCommit Build #1793
Jira: https://issues.apache.org/jira/browse/TEZ-3297
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1793/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 24 lines...]
== Testing patch for TEZ-3297. ==
HEAD is now at 1d11ad2 TEZ-3296. Tez fails to compile against hadoop 2.8 after MAPREDUCE-5870 (jeagles)
Previous HEAD position was 1d11ad2... TEZ-3296. Tez fails to compile against hadoop 2.8 after MAPREDUCE-5870 (jeagles)
Switched to branch 'master'
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch)
First, rewinding head to replay your work on top of it...
Fast-forwarded master to 1d11ad275548031c68b2b360f2b8b7111ecd91fd.
TEZ-3297 patch is being downloaded at Sun Jun 12 23:38:50 UTC 2016 from
http://issues.apache.org/jira/secure/attachment/12809732/TEZ-3297.2.branch-0.7.patch
The patch does not appear to apply with p0 to p2
PATCH APPLICATION FAILED
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12809732/TEZ-3297.2.branch-0.7.patch against master revision 1d11ad2.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1793//console
This message is automatically generated.
== Adding comment to Jira. ==
Comment added. f15871f38d47b02ff0ce71f1d18d1873987d847a logged out
== Finished build. ==
Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Description set: MAPREDUCE-5870
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

### FAILED TESTS (if any) ###
No tests ran.
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326684#comment-15326684 ] Bikas Saha commented on TEZ-3291:
-
Would the splits not have URLs with S3 in them? Wondering how the ORC split estimator works. If it casts the split into an ORCSplit and inspects internal members, then perhaps the S3 split could also be cast into the correct object to look at the URLs?
[jira] [Updated] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3291:
--
Attachment: TEZ-3291.2.patch
Thanks for the review [~bikassaha]. Attaching the revised patch. It is hard to find out whether "localhost" is genuinely local or coming from S3. The number of nodes in the cluster could serve as a hint (to rule out a single-node cluster), but that info would not be available in the split grouper. When {{lengthPerGroup > maxLengthPerGroup}}, it goes via the normal code path and gets more splits. It is also possible that a couple of groups could have more splits than others, but enforcing a maximum number of splits per group when "tez.grouping.by-length" is turned on would be tricky.
[jira] [Commented] (TEZ-3296) Tez job can hang if two vertices at the same root distance have different task requirements
[ https://issues.apache.org/jira/browse/TEZ-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326673#comment-15326673 ] Bikas Saha commented on TEZ-3296:
-
bq. Today each vertex uses a set of three priority values, the low, the high, and the mean of those two. (Oddly containers for high are never requested in practice, just the low and mean.)
The middle priority is the default. The lower value (higher priority) is for failed task reruns. The higher value (lower priority) was intended for speculative tasks but may have been missed being used for that.
Wondering why the app was hung. IIRC YARN keeps the higher resource request when there are multiple at the same priority, because that's the safer thing to do. So when 2 vertices have the same priority but different resources, we would expect to get containers for both, but with the higher resource value across the board. If the above is correct, then perhaps there is a bug in the task scheduler code that needs to get fixed, which we might miss if we change the vertex priorities to be unique as a workaround. The vertex priority change is good in its own right, but it would be good to make sure we don't have some pending bug in the task scheduler that may have other side effects. Could you please attach the task scheduler log for the job that hung, in case it has some clues?
On the patch itself, the formula looks like (Height*Total*3) + V*3. Now - (1*24*3) + 20*3 = 150 = (2*24*3) + 2*3. So we could still have collisions depending on the manner in which vertexIds get assigned, right? Unless we are currently getting lucky in the vId assignment such that vertices close to the root also happen to get low ids.
> Tez job can hang if two vertices at the same root distance have different > task requirements > --- > > Key: TEZ-3296 > URL: https://issues.apache.org/jira/browse/TEZ-3296 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1 >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Critical > Attachments: TEZ-3296.001.patch > > > When two vertices have the same distance from the root Tez will schedule > containers with the same priority. However those vertices could have > different task requirements and therefore different capabilities. As > documented in YARN-314, YARN currently doesn't support requests for multiple > sizes at the same priority. In practice this leads to one vertex allocation > requests clobbering the other, and that can result in a situation where the > Tez AM is waiting on containers it will never receive from the RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
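The priority formula debated in the comments above can be sketched concretely. This is a hedged illustration with hypothetical names, assuming the (Height*Total*3) + V*3 form and the 24-vertex example from the discussion; it is not the actual Tez scheduler code:

```java
// Hypothetical sketch of the base-priority formula discussed above:
// priority = (distanceFromRoot * totalVertices * 3) + (vertexIndex * 3).
// In YARN, a lower numeric value means a logically higher priority.
public class VertexPriority {
    static int basePriority(int distanceFromRoot, int vertexIndex, int totalVertices) {
        return (distanceFromRoot * totalVertices * 3) + (vertexIndex * 3);
    }

    public static void main(String[] args) {
        int total = 24;
        // The example from the thread: distance 1, vertex 20 gives 1*24*3 + 20*3 = 132,
        // while distance 2, vertex 2 gives 2*24*3 + 2*3 = 150 -- no collision, since the
        // largest value at distance h is h*total*3 + (total-1)*3, which is < (h+1)*total*3.
        System.out.println(basePriority(1, 20, total)); // 132
        System.out.println(basePriority(2, 2, total));  // 150
    }
}
```

With these numbers the two values differ (132 vs 150), so under this formula every vertex at root distance h outranks every vertex at distance h+1, matching the follow-up correction in the thread.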
[jira] [Created] (TEZ-3302) Add a version of processorContext.waitForAllInputsReady and waitForAnyInputReady with a timeout
Siddharth Seth created TEZ-3302: --- Summary: Add a version of processorContext.waitForAllInputsReady and waitForAnyInputReady with a timeout Key: TEZ-3302 URL: https://issues.apache.org/jira/browse/TEZ-3302 Project: Apache Tez Issue Type: Improvement Reporter: Siddharth Seth
[jira] [Updated] (TEZ-3302) Add a version of processorContext.waitForAllInputsReady and waitForAnyInputReady with a timeout
[ https://issues.apache.org/jira/browse/TEZ-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated TEZ-3302: Description: This is useful when a Processor needs to check on whether it has been aborted or not, and the interrupt that is sent in as part of the 'Task kill' process has been swallowed by some other entity. > Add a version of processorContext.waitForAllInputsReady and > waitForAnyInputReady with a timeout > --- > > Key: TEZ-3302 > URL: https://issues.apache.org/jira/browse/TEZ-3302 > Project: Apache Tez > Issue Type: Improvement >Reporter: Siddharth Seth > > This is useful when a Processor needs to check on whether it has been aborted > or not, and the interrupt that is sent in as part of the 'Task kill' process > has been swallowed by some other entity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
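A timed wait of the kind proposed here could be built on a standard java.util.concurrent condition wait. The sketch below is hypothetical (the real processorContext API, class, and method names may differ); it shows how a timeout lets the caller periodically re-check its abort state even if an interrupt was swallowed elsewhere:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a timed waitForAnyInputReady variant; not the actual Tez API.
public class TimedInputWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition inputReady = lock.newCondition();
    private boolean anyInputReady = false;

    // Returns true if an input became ready, false on timeout. A false return
    // gives the Processor a chance to check whether it has been aborted.
    public boolean waitForAnyInputReady(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (!anyInputReady) {
                if (nanos <= 0L) {
                    return false; // timed out; caller should re-check its abort state
                }
                nanos = inputReady.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Called by the input side when data becomes available.
    public void markInputReady() {
        lock.lock();
        try {
            anyInputReady = true;
            inputReady.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

The caller would loop: wait with a bounded timeout, and on each false return check an abort flag before waiting again.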
[jira] [Commented] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce
[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326660#comment-15326660 ] Bikas Saha commented on TEZ-3297: - Looking at the code further, it looks like the crucial change is not holding the vertex's own lock while trying to read the src/dest vertex lock. That makes sense, and this seems like a lock-ordering issue waiting to happen. Perhaps a quick scan for such nested locking is in order, in case not already done. The removal of the overall lock is fine since each internal method invocation like getTotalTasks() is already handling its own locking. lgtm. Moving VM-invoked sync calls onto the dispatcher is a good idea but would need the addition of new callbacks into the VM to notify them of completion of the requested vertex state change operation. Since most current VMs don't do much after changing parallelism, the change might be simpler to implement now. Not sure about Hive custom VMs. > Deadlock scenario in AM during ShuffleVertexManager auto reduce > --- > > Key: TEZ-3297 > URL: https://issues.apache.org/jira/browse/TEZ-3297 > Project: Apache Tez > Issue Type: Bug >Reporter: Zhiyuan Yang >Priority: Critical > Attachments: TEZ-3297.1.patch, TEZ-3297.2.patch, am_log, thread_dump > > > Here is what's happening in the attached thread dump. > App Pool thread #9 does the auto reduce on V2 and initializes the new edge > manager; it holds the V2 write lock and wants the read lock of source vertex V1. > At the same time, another App Pool thread #2 schedules a task of V1 and gets > the output spec, so it holds the V1 read lock and wants the V2 read lock. > Also, the dispatcher thread wants the V1 write lock to begin the state machine > transition. Since the dispatcher thread is at the head of the V1 ReadWriteLock queue, > thread #9 cannot get the V1 read lock even though thread #2 is holding the V1 read lock. > This is a circular lock scenario. #2 blocks the dispatcher, the dispatcher blocks #9, > and #9 blocks #2.
> There is no problem with ReadWriteLock behavior in this case. Please see this > java bug report, http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565.
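One textbook way to break a circular wait like the one described above is to impose a global lock order. The sketch below is purely illustrative (hypothetical names; the actual TEZ-3297 fix instead avoids holding a vertex's own lock while reading source/destination vertex locks), showing the pattern of always acquiring a pair of vertex locks in a fixed order:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch, not the Tez patch: avoid deadlock between two vertex
// locks by always acquiring them in a fixed global order (here: by vertex id),
// so no two threads can hold one lock of the pair while waiting on the other.
public class OrderedVertexLocks {
    static final class Vertex {
        final int id;
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Vertex(int id) { this.id = id; }
    }

    // Runs an action while holding both vertices' read locks, lower id first.
    static void withBothReadLocks(Vertex a, Vertex b, Runnable action) {
        Vertex first = a.id <= b.id ? a : b;
        Vertex second = a.id <= b.id ? b : a;
        first.lock.readLock().lock();
        try {
            second.lock.readLock().lock();
            try {
                action.run();
            } finally {
                second.lock.readLock().unlock();
            }
        } finally {
            first.lock.readLock().unlock();
        }
    }
}
```

Note that ordering alone would not have helped the dispatcher thread's fair-queue interaction cited above; that is why the actual discussion centers on not nesting a read of another vertex's lock under one's own write lock at all.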
[jira] [Commented] (TEZ-3216) Support for more precise partition stats in VertexManagerEvent
[ https://issues.apache.org/jira/browse/TEZ-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326650#comment-15326650 ] Bikas Saha commented on TEZ-3216: - /cc [~rajesh.balamohan] in case he is interested in this optimization. > Support for more precise partition stats in VertexManagerEvent > -- > > Key: TEZ-3216 > URL: https://issues.apache.org/jira/browse/TEZ-3216 > Project: Apache Tez > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: TEZ-3216.patch > > > Follow up on TEZ-3206 discussion, at least for some use cases, more accurate > partition stats will be useful for DataMovementEvent routing. Maybe we can > provide a config option to allow apps to choose the more accurate partition > stats over RoaringBitmap.
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326649#comment-15326649 ] Bikas Saha commented on TEZ-3291: - Why the numLoc=1 check only in the size < min case? A comment before the code explaining the above workaround would be useful. Also a log statement. This may affect single-node cases because numLoc=1 in that case too. Is there any way we can find out if the splits are coming from an S3-like source and use that information instead? E.g. something similar to splitSizeEstimator that can look at the split and return whether its locations are potentially fake. > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, current split computation does not go through the > rack-local and allow-small-groups optimizations and ends up creating a small > number of splits. Depending on the cluster, this can end up creating long-running > map jobs. > Example with hive: > == > 1. The inventory table in the tpc-ds dataset is partitioned and is a relatively small > table. > 2. With query-22, hive requests an original split count of 52, and the overall > length of the splits is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files to be processed in the split. In clusters with split locations, this > would have ended up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost", all splits get added to a > single group. This makes things a lot worse.
> 4. Depending on the dataset and the format, this can be problematic. For > instance, file open calls and random seeks can be expensive in S3. > 5. In this case, 52 files have to be opened and processed by single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have drastically reduced. > E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
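Alternate option 3 above can be sketched roughly as follows. This is a hypothetical helper with invented names, not the actual TEZ-3291 patch: the idea is to apply the min-size clamp on the desired split count only when the splits report more than one distinct location, so an all-"localhost" (e.g. S3) input keeps the finer grouping:

```java
// Hypothetical sketch of alternate option 3: only clamp the desired split
// count up to tez.grouping.min-size when the splits actually have more than
// one distinct location; otherwise the locations are likely fake.
public class SplitGrouping {
    static int desiredNumSplits(long totalLength, int desiredSplits,
                                long minGroupSize, int distinctLocations) {
        long lengthPerGroup = totalLength / desiredSplits;
        if (lengthPerGroup < minGroupSize && distinctLocations > 1) {
            // Existing behavior: grow groups until each meets the minimum size.
            return (int) Math.max(1, totalLength / minGroupSize);
        }
        // Single (likely bogus) location: keep the requested count so that
        // more groups, and hence more parallel tasks, are generated.
        return desiredSplits;
    }
}
```

Plugging in the numbers from the log in the description (12061817 bytes total, 110 desired splits, 16 MB min-size), the current path collapses to 1 split, while the single-location path would keep the requested 110 and let the grouper cap it at the 52 original splits.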
[jira] [Commented] (TEZ-3300) Tez UI: A wiki must be created with info about each page in Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326638#comment-15326638 ] Bikas Saha commented on TEZ-3300: - Could pages to the wiki be linked directly from the UI page for quick access? > Tez UI: A wiki must be created with info about each page in Tez UI > -- > > Key: TEZ-3300 > URL: https://issues.apache.org/jira/browse/TEZ-3300 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram > > - It would be a page under Tez confluence > - Must be flexible enough to support different versions of Tez UI, and give > context based help. > - Add a section on understanding various errors displayed in the error-bar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEZ-3300) Tez UI: A wiki must be created with info about each page in Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326638#comment-15326638 ] Bikas Saha edited comment on TEZ-3300 at 6/12/16 9:22 PM: -- Could pages to the wiki be linked directly from the corresponding UI pages for quick access? was (Author: bikassaha): Could pages to the wiki be linked directly from the UI page for quick access? > Tez UI: A wiki must be created with info about each page in Tez UI > -- > > Key: TEZ-3300 > URL: https://issues.apache.org/jira/browse/TEZ-3300 > Project: Apache Tez > Issue Type: Bug >Reporter: Sreenath Somarajapuram > > - It would be a page under Tez confluence > - Must be flexible enough to support different versions of Tez UI, and give > context based help. > - Add a section on understanding various errors displayed in the error-bar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-3291) Optimize splits grouping when locality information is not available
[ https://issues.apache.org/jira/browse/TEZ-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326302#comment-15326302 ] Rajesh Balamohan commented on TEZ-3291: --- It is ready for review [~bikassaha]. Haven't renamed the patch. > Optimize splits grouping when locality information is not available > --- > > Key: TEZ-3291 > URL: https://issues.apache.org/jira/browse/TEZ-3291 > Project: Apache Tez > Issue Type: Improvement >Reporter: Rajesh Balamohan >Priority: Minor > Attachments: TEZ-3291.WIP.patch > > > There are scenarios where splits might not contain the location details. S3 > is an example, where all splits would have "localhost" for the location > details. In such cases, current split computation does not go through the > rack-local and allow-small-groups optimizations and ends up creating a small > number of splits. Depending on the cluster, this can end up creating long-running > map jobs. > Example with hive: > == > 1. The inventory table in the tpc-ds dataset is partitioned and is a relatively small > table. > 2. With query-22, hive requests an original split count of 52, and the overall > length of the splits is around 12061817 bytes. > {{tez.grouping.min-size}} was set to 16 MB. > 3. In tez splits grouping, this ends up creating a single split with 52+ > files to be processed in the split. In clusters with split locations, this > would have ended up with multiple splits since {{allowSmallGroups}} would > have kicked in. > But in S3, since everything would have "localhost", all splits get added to a > single group. This makes things a lot worse. > 4. Depending on the dataset and the format, this can be problematic. For > instance, file open calls and random seeks can be expensive in S3. > 5. In this case, 52 files have to be opened and processed by a single task in > sequential fashion. Had it been processed by multiple tasks, response time > would have been drastically reduced.
> E.g log details > {noformat} > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Grouping splits in Tez > 2016-06-01 13:48:08,353 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired splits: 110 too large. Desired > splitLength: 109652 Min splitLength: 16777216 New desired splits: 1 Total > length: 12061817 Original splits: 52 > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Desired numSplits: 1 lengthPerGroup: 12061817 > numLocations: 1 numSplitsPerLocation: 52 numSplitsInGroup: 52 totalLength: > 12061817 numOriginalSplits: 52 . Grouping by length: true count: false > 2016-06-01 13:48:08,354 [INFO] [InputInitializer {Map 2} #0] > |split.TezMapredSplitsGrouper|: Number of splits desired: 1 created: 1 > splitsProcessed: 52 > {noformat} > Alternate options: > == > 1. Force Hadoop to provide bogus locations for S3. But not sure, if that > would be accepted anytime soon. Ref: HADOOP-12878 > 2. Set {{tez.grouping.min-size}} to very very low value. But should the end > user always be doing this on query to query basis? > 3. When {{(lengthPerGroup < "tez.grouping.min-size")}}, recompute > desiredNumSplits only when number of distinct locations in the splits is > 1. > This would force more number of splits to be generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)