[ https://issues.apache.org/jira/browse/TEZ-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578791#comment-15578791 ]
Siddharth Seth commented on TEZ-3269: ------------------------------------- Apologies for the long delay in the review. Mostly looks good. Would be a lot easier to review if this were split into smaller jiras... think it combines a bunch of things like long to int, with the core logic changes. Minor Stuff: - final where possible - e.g. PartitionsGroupingCalculator.sourceVertexInfo, all variables in FairEdgeConfiguration - This is a fairly complicated patch. Would be good to have some more documentation. - the ceil method - Within various conditions in compute and iterator - Obligatory rename request: getTotalStatsAtIndex to getCurrentlyKnownStatsAtIndex - this method will normally not return totalStats. - Nit: expectedTotalSourceTasksOutputSize / numOfPartitions; - can be done once outside the loop - onVertexStarted - Should this be split up a little more. It's possible for quite a bit to happen at the moment, before the "single vertex only" check is hit in FairShufflleVertexManager Question: - estimatePartitionSize.partitionstatSizeInMB is across all partitions. This ensures that averaging of stats based on output size isn't accidentally hit on a 0 sized partition? (Could break earlier from the loop) - In case of reduce_parallelism - this considers the partition size and may produce groups with different number of partitions to consume, which the current ShuffleVertexManager doesn't do yet? - Will the parallelism ever end up getting increased? Any thoughts on what it will take to move this to support multiple source vertices ? > Provide basic fair routing and scheduling functionality via custom > VertexManager and EdgeManager > ------------------------------------------------------------------------------------------------ > > Key: TEZ-3269 > URL: https://issues.apache.org/jira/browse/TEZ-3269 > Project: Apache Tez > Issue Type: Sub-task > Reporter: Ming Ma > Attachments: TEZ-3269-2.patch, TEZ-3269-3.patch, TEZ-3269.patch > > > With TEZ-3206 and TEZ-3216, we can build a custom VertexManager and > EdgeManager that uses partition stats to do fair routing as well as the > scheduling based on destination tasks’ dependency on source tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)