The scheduler isn't guaranteed to schedule them in an order that
maximizes parallelism. Imagine the case where m = n - 1: it may run the
first m branches in parallel, but then it has to complete the remaining
branch with parallelism 1, so the wall-clock time is closer to 2h than
to (n/m)h.
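
For concreteness, here is a minimal sketch of the fan-out shape being
discussed (the dag_id, task ids, and n are illustrative, and
DummyOperator just stands in for the real work). How many branches
actually run at once is capped by the executor settings (parallelism,
dag_concurrency, and the number of Celery worker slots), not by the DAG
shape itself:

from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

n = 4  # number of identical branches (illustrative)

with DAG(
    dag_id="fan_out_example",
    start_date=datetime(2019, 2, 1),
    schedule_interval=None,
) as dag:
    root = DummyOperator(task_id="t0")
    for i in range(1, n + 1):
        a = DummyOperator(task_id="t%da" % i)
        b = DummyOperator(task_id="t%db" % i)
        c = DummyOperator(task_id="t%dc" % i)
        root >> a >> b >> c  # each branch runs a -> b -> c after [0]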

On Tue, Feb 5, 2019 at 7:20 AM soma dhavala <soma.dhav...@gmail.com> wrote:

> Imagine that you have celery (or another cluster) executor with “m”
> workers (with equal resources)
>
> Say I have the following DAG:
>
> [0] <- [1a] <- [1b] <- [1c]
> [0] <- [2a] <- [2b] <- [2c]
> ...
> [0] <- [na] <- [nb] <- [nc]
>
> In the above, node [0] is the parent of n identical branches.
> Suppose [0]'s computational time is negligible and each pipeline
> [a, b, c] takes, say, h hours.
> Will all n branches then run in parallel (given a Celery executor with
> m equally resourced workers), so that the whole DAG finishes in
> approximately (n/m)h hours?
>
> I know a small DAG could be created to test this; I just want to check
> whether there is a theoretical answer that other devs are aware of.
>
> thanks,
> -soma
>
> > On Feb 5, 2019, at 8:36 PM, Iván Robla Albarrán <ivanro...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > In the example connections, I see spark_default, which is a
> > Spark-type connection. I want to create another Spark connection,
> > but I can't find Spark in the connection type list.
> >
> > Images Attached
> >
> > Could you help me?
> >
> > Thanks!!!
> >
> >
>
>
