Imagine that you have a Celery (or another cluster) executor with "m" workers (all with equal resources).
Say I have the following DAG:

[0] <- [1a] <- [1b] <- [1c]
[0] <- [2a] <- [2b] <- [2c]
..
[0] <- [na] <- [nb] <- [nc]

In the above, the [0] node is the parent of "n" identical branches. Suppose [0]'s computational time is negligible and each pipeline [a, b, c] takes "h" hours. Will all "n" branches then run in parallel (given a Celery executor with "m" equal workers), leading to a total computation time of approximately (n/m)*h hours to finish the DAG?

I know a small DAG can be created to test the concept (a rough sketch is appended below the quoted message); I just want to check whether there is a theoretical answer here that other devs are already aware of.

thanks,
-soma

> On Feb 5, 2019, at 8:36 PM, Iván Robla Albarrán <[email protected]> wrote:
>
> Hi,
>
> In the connection examples I see spark_default, which is a Spark-type connection,
> and I want to create another Spark connection, but I can't find Spark among the
> connection types.
>
> Images attached.
>
> Could you help me?
>
> Thanks!!!
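P.S. For anyone who wants to test this empirically rather than theoretically, here is a minimal, non-authoritative sketch of the DAG shape described above. The dag_id, task names, and sleep durations are placeholders made up for illustration, and Airflow 1.10-era imports are assumed; the BashOperator sleeps stand in for the real h-hour pipelines.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.dummy_operator import DummyOperator

    N_BRANCHES = 4  # "n" in the question; each branch takes roughly "h" hours

    with DAG(dag_id="fan_out_example",
             start_date=datetime(2019, 1, 1),
             schedule_interval=None) as dag:

        # [0]: the shared parent with negligible runtime
        root = DummyOperator(task_id="node_0")

        for i in range(1, N_BRANCHES + 1):
            # Each branch is a linear chain [ia] -> [ib] -> [ic]
            a = BashOperator(task_id="branch_{}_a".format(i), bash_command="sleep 60")
            b = BashOperator(task_id="branch_{}_b".format(i), bash_command="sleep 60")
            c = BashOperator(task_id="branch_{}_c".format(i), bash_command="sleep 60")
            root >> a >> b >> c

Note that whether the branches actually run side by side also depends on scheduler/executor settings such as core.parallelism, core.dag_concurrency, and celery.worker_concurrency. As a back-of-the-envelope example, with n = 8 branches and m = 4 equally sized workers each running one branch at a time, the estimate above would give roughly (8/4)*h = 2h hours.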
