[ 
https://issues.apache.org/jira/browse/TEZ-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated TEZ-3274:
-----------------------------
    Attachment: TEZ-3274.004.patch

Attaching a completely new patch that adds the relevant slow start code from 
ShuffleVertexManager and ShuffleVertexManagerBase into RootInputVertexManager. 
It’s quite cumbersome and adds a lot of redundant code, but it allows MRInput + 
Broadcast Input vertices to benefit from slow start via separate configs. 
Additionally, I ported over the slow start unit test from 
TestShuffleVertexManager and fixed up some other tests that broke because of 
the new feature. 

I tested the change on a 15 node cluster. By default it preserves the previous 
behavior (i.e. no slow start), and it is tunable via 3 configs: 
tez.root-input-vertex-manager.enable.slow-start, 
tez.root-input-vertex-manager.min-src-fraction, and 
tez.root-input-vertex-manager.max-src-fraction. 
Slow start is disabled by default, which sets both the min and max to 0, 
causing all tasks to start immediately, just as they would in the previous 
ImmediateStart case. When slow start is enabled, it behaves just like the 
ShuffleVertexManager case, scheduling tasks linearly between the min/max 
values. I tested this with a Pig script that creates a DAG with an MRInput + 
Broadcast input downstream vertex.

{noformat}
-- Tab-separated values in the input files
A = LOAD '/tmp/data1' as (a, b, c);
B = LOAD '/tmp/data2' as (x, y, z);
C = GROUP A BY a;
D = JOIN B by x, C by group using 'replicated';
STORE D into '/tmp/output';
{noformat}
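
For anyone unfamiliar with slow start, the linear scheduling between the min 
and max fractions can be sketched roughly as follows. This is only an 
illustration and not the code in the patch: the class name SlowStartSketch, 
the method tasksToSchedule, and its parameters are hypothetical, and the 
rounding and bookkeeping in the real vertex managers may differ.

```java
// Illustrative sketch only; names here are hypothetical and not taken from the patch.
final class SlowStartSketch {

    /**
     * Returns how many of totalTasks may be scheduled once completedFraction
     * (0.0 to 1.0) of the source tasks have finished.
     */
    static int tasksToSchedule(boolean slowStartEnabled,
                               double minFraction,
                               double maxFraction,
                               double completedFraction,
                               int totalTasks) {
        // With slow start disabled, both bounds collapse to 0,
        // so every task becomes schedulable immediately.
        double min = slowStartEnabled ? minFraction : 0.0;
        double max = slowStartEnabled ? maxFraction : 0.0;

        // Nothing is scheduled until the min fraction of sources has completed.
        if (completedFraction < min) {
            return 0;
        }

        double range = max - min;
        double fractionToSchedule;
        if (range > 0) {
            // Scale linearly from 0 at min up to 1 at max.
            fractionToSchedule = (completedFraction - min) / range;
        } else {
            // min == max: schedule everything as soon as min is reached.
            fractionToSchedule = 1.0;
        }

        // Clamp to [0, 1] so completion past max never over-schedules.
        fractionToSchedule = Math.max(0.0, Math.min(1.0, fractionToSchedule));
        return (int) (totalTasks * fractionToSchedule);
    }
}
```

With min 0.25, max 0.75 and 10 downstream tasks, this sketch allows 5 tasks 
once half of the source tasks have completed and all 10 once three quarters 
have completed. With slow start disabled, all 10 are schedulable right away.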

[~sseth], [~jeagles], [~rohini], [~jlowe], I would appreciate any and all 
comments!

> Vertex with MRInput and broadcast input does not respect slow start
> -------------------------------------------------------------------
>
>                 Key: TEZ-3274
>                 URL: https://issues.apache.org/jira/browse/TEZ-3274
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Eric Badger
>         Attachments: TEZ-3274.001.patch, TEZ-3274.002.patch, 
> TEZ-3274.003.patch, TEZ-3274.004.patch
>
>
> Vertices with shuffle input and MRInput choose RootInputVertexManager (and 
> not ShuffleVertexManager) and start containers and tasks immediately. In this 
> scenario, resources can be wasted since such vertices do not respect 
> tez.shuffle-vertex-manager.min-src-fraction and 
> tez.shuffle-vertex-manager.max-src-fraction. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
