[ 
https://issues.apache.org/jira/browse/TEZ-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578791#comment-15578791
 ] 

Siddharth Seth commented on TEZ-3269:
-------------------------------------

Apologies for the long delay in the review.
Mostly looks good. Would be a lot easier to review if this were split into 
smaller jiras... think it combines a bunch of things like long to int, with the 
core logic changes.
Minor Stuff:
- final where possible - e.g. PartitionsGroupingCalculator.sourceVertexInfo, 
all variables in FairEdgeConfiguration
- This is a fairly complicated patch. Would be good to have some more 
documentation.
   - the ceil method
   - Within various conditions in compute and iterator
- Obligatory rename request: getTotalStatsAtIndex to 
getCurrentlyKnownStatsAtIndex - this method will normally not return totalStats.
- Nit: expectedTotalSourceTasksOutputSize / numOfPartitions; - can be done once 
outside the loop
- onVertexStarted - Should this be split up a little more. It's possible for 
quite a bit to happen at the moment, before the "single vertex only" check is 
hit in FairShufflleVertexManager

Question:
- estimatePartitionSize.partitionstatSizeInMB is across all partitions. This 
ensures that averaging of stats based on output size isn't accidentally hit on 
a 0 sized partition? (Could break earlier from the loop)
- In case of reduce_parallelism - this considers the partition size and may 
produce groups with different number of partitions to consume, which the 
current ShuffleVertexManager doesn't do yet?
- Will the parallelism ever end up getting increased?

Any thoughts on what it will take to move this to support multiple source 
vertices ?

> Provide basic fair routing and scheduling functionality via custom 
> VertexManager and EdgeManager
> ------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3269
>                 URL: https://issues.apache.org/jira/browse/TEZ-3269
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>         Attachments: TEZ-3269-2.patch, TEZ-3269-3.patch, TEZ-3269.patch
>
>
> With TEZ-3206 and TEZ-3216, we can build a custom VertexManager and 
> EdgeManager that uses partition stats to do fair routing as well as the 
> scheduling based on destination tasks’ dependency on source tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to