[ https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated TEZ-3274:
-----------------------------
    Attachment: TEZ-3274.004.patch

Attaching a completely new patch that adds the relevant slow start code from ShuffleVertexManager and ShuffleVertexManagerBase into RootInputVertexManager. It’s quite cumbersome and adds a lot of redundant code, but it allows MRInput + Broadcast Input vertices to benefit from slow start via separate configs. Additionally, I ported over the slow start unit test from TestShuffleVertexManager and fixed up some other tests that broke because of the new feature.

I tested the change on a 15-node cluster. It preserves the previous behavior by default (i.e. no slow start) and is tunable via three configs, tez.root-input-vertex-manager.(enable.slow-start,min-src-fraction,max-src-fraction). Slow start is disabled by default, which sets both the min and max to 0, causing all tasks to start immediately, just as they would in the previous ImmediateStart case. When slow start is enabled, it behaves just like the ShuffleVertexManager case, scheduling tasks linearly between the min/max values.

I tested this with a script that creates a DAG with an MRInput + Broadcast input downstream vertex.

{noformat}
-- Tab-separated values in the input files
A = LOAD '/tmp/data1' as (a, b, c);
B = LOAD '/tmp/data2' as (x, y, z);
C = GROUP A BY a;
D = JOIN B by x, C by group using 'replicated';
STORE D into '/tmp/output';
{noformat}

[~sseth], [~jeagles], [~rohini], [~jlowe], I would appreciate any and all comments!

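
For readers unfamiliar with slow start, the "scheduling tasks linearly between the min/max values" behavior described in the comment can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch (which is Java and lives in RootInputVertexManager). The function names fraction_to_schedule and tasks_to_schedule are hypothetical, the rounding is an assumption, and 0.25/0.75 in the usage note are example values only.

```python
def fraction_to_schedule(completed_fraction, min_fraction, max_fraction):
    """Return the fraction of a vertex's tasks that may be scheduled.

    completed_fraction: fraction of source tasks that have finished (0.0 to 1.0).
    min_fraction / max_fraction: stand-ins for the min-src-fraction and
    max-src-fraction settings (hypothetical parameter names).
    """
    if not 0.0 <= min_fraction <= max_fraction <= 1.0:
        raise ValueError("expected 0 <= min_fraction <= max_fraction <= 1")

    # Below the minimum, nothing is scheduled yet.
    if completed_fraction < min_fraction:
        return 0.0

    span = max_fraction - min_fraction
    # min == max (including the 0/0 "slow start disabled" case):
    # everything is released as soon as the threshold is reached.
    if span == 0:
        return 1.0

    # Linear ramp between min and max, capped at 1.0.
    return min(1.0, (completed_fraction - min_fraction) / span)


def tasks_to_schedule(total_tasks, completed_sources, total_sources,
                      min_fraction, max_fraction):
    """Number of tasks (out of total_tasks) allowed to be scheduled so far."""
    # With no source tasks there is nothing to wait for (assumption).
    if total_sources == 0:
        completed_fraction = 1.0
    else:
        completed_fraction = completed_sources / total_sources
    fraction = fraction_to_schedule(completed_fraction, min_fraction, max_fraction)
    # Round down so we never schedule more than the ramp allows (assumption).
    return int(fraction * total_tasks)
```

With min and max both 0, tasks_to_schedule(10, 0, 10, 0.0, 0.0) releases all 10 tasks before any source task finishes, which corresponds to the immediate-start default. With example values of 0.25 and 0.75, half of the tasks are released once half of the source tasks have completed.
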
> Vertex with MRInput and broadcast input does not respect slow start
> -------------------------------------------------------------------
>
>                 Key: TEZ-3274
>                 URL: https://issues.apache.org/jira/browse/TEZ-3274
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Eric Badger
>         Attachments: TEZ-3274.001.patch, TEZ-3274.002.patch,
> TEZ-3274.003.patch, TEZ-3274.004.patch
>
>
> Vertices with shuffle input and MRInput choose RootInputVertexManager (and
> not ShuffleVertexManager) and start containers and tasks immediately. In this
> scenario, resources can be wasted since they do not respect
> tez.shuffle-vertex-manager.min-src-fraction
> tez.shuffle-vertex-manager.max-src-fraction.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)