Imagine that you have a Celery (or another cluster) executor with "m" workers (all with equal resources).
Say I have the following DAG:

[0] <- [1a] <- [1b] <- [1c]
[0] <- [2a] <- [2b] <- [2c]
..
[0] <- [na] <- [nb] <- [nc]

In the above, the [0] node is the parent of "n" identical branches. Suppose [0]'s computational time is negligible and each pipeline [a, b, c] takes "h" hours. Will all "n" branches then run in parallel (given a Celery executor with "m" equal workers), leading to a total computation time of approximately (n/m)*h hours to finish the DAG?

I know a small DAG can be created to test the concept (a rough sketch is appended below the quoted message); I just want to check whether there is a theoretical answer here that other devs are already aware of.

thanks,
-soma

> On Feb 5, 2019, at 8:36 PM, Iván Robla Albarrán <[email protected]> wrote:
>
> Hi,
>
> In the connection examples I see spark_default, which is a Spark-type connection,
> and I want to create another Spark connection, but I can't find Spark among the
> connection types.
>
> Images attached.
>
> Could you help me?
>
> Thanks!!!
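P.S. For anyone who wants to test this empirically rather than theoretically, here is a minimal, non-authoritative sketch of the DAG shape described above. The dag_id, task names, and sleep durations are placeholders made up for illustration, and Airflow 1.10-era imports are assumed; the BashOperator sleeps stand in for the real h-hour pipelines.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.dummy_operator import DummyOperator

    N_BRANCHES = 4  # "n" in the question; each branch takes roughly "h" hours

    with DAG(dag_id="fan_out_example",
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:

        # [0]: the shared parent with negligible runtime
        root = DummyOperator(task_id="node_0")

        for i in range(1, N_BRANCHES + 1):
            # Each branch is a linear chain [ia] -> [ib] -> [ic]
            a = BashOperator(task_id="branch_{}_a".format(i), bash_command="sleep 60")
            b = BashOperator(task_id="branch_{}_b".format(i), bash_command="sleep 60")
            c = BashOperator(task_id="branch_{}_c".format(i), bash_command="sleep 60")
            root >> a >> b >> c

Note that whether the branches actually run side by side also depends on scheduler/executor settings such as core.parallelism, core.dag_concurrency, and celery.worker_concurrency. As a back-of-the-envelope example, with n = 8 branches and m = 4 equally sized workers each running one branch at a time, the estimate above would give roughly (8/4)*h = 2h hours.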
