The scheduler isn't guaranteed to compute the branches in any particular order that maximizes parallelism. Imagine the case where m = n - 1: it may compute the first m branches in parallel, then has to complete the remaining nth branch with parallelism 1, for a total of roughly 2h rather than (n/m)h.
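Back-of-the-envelope, the total time for n identical branches of length h on m equal workers is ceil(n/m) * h, since the branches run in "waves" of at most m at a time. A minimal sketch of that arithmetic (the helper `makespan` is hypothetical, not an Airflow API; it assumes identical branch durations and a greedy scheduler):

```python
import math

def makespan(n_branches: int, m_workers: int, h: float) -> float:
    """Greedy makespan for n identical, independent branches of
    length h on m equal workers: ceil(n/m) waves of parallel work."""
    return math.ceil(n_branches / m_workers) * h

# With m = n - 1 (say n = 4, m = 3, h = 1 hour): the first wave runs
# 3 branches in parallel, the last branch runs alone in a second wave,
# so the total is 2h, not (4/3)h.
print(makespan(4, 3, 1.0))
```

In practice the real number also depends on executor slots, pool limits, and scheduler heartbeat, so treat this as a lower bound on wall-clock time.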
On Tue, Feb 5, 2019 at 7:20 AM soma dhavala <soma.dhav...@gmail.com> wrote:
> Imagine that you have a Celery (or another cluster) executor with "m"
> workers (with equal resources).
>
> Say I have the following DAG:
>
> [0]<—[1a]<—[1b]<—[1c]
> [0]<—[2a]<—[2b]<—[2c]
> ..
> [0]<—[n1]<—[n2]<—[n3]
>
> In the above, node [0] is the parent of "n" identical branches.
> Suppose [0]'s computational time is negligible, and each pipeline
> [a, b, c] takes, say, "h" hours.
> Then, will all "n" branches run in parallel (with a Celery executor
> having "m" workers, all with equal resources), leading to a total
> computation time of approximately "(n/m)h" hours to finish the DAG?
>
> I know a small DAG could be created to test the concept; I just want to
> check whether there is a theoretical answer here that other devs are
> aware of.
>
> thanks,
> -soma
>
> > On Feb 5, 2019, at 8:36 PM, Iván Robla Albarrán <ivanro...@gmail.com>
> > wrote:
> >
> > Hi,
> >
> > In the connection examples I see spark_default, which is a Spark-type
> > connection, and I want to create another Spark connection, but I can't
> > find Spark in the connection types.
> >
> > Images attached.
> >
> > Could you help me?
> >
> > Thanks!!!