Andrew - I guess I am not sure how the CheckOperator is implemented, but wouldn't it amount to the same thing, i.e. unnecessary polling? I imagine some process is kicked off somewhere and repeatedly polls to check whether A and B are both done writing their outcomes. I do not want to convert what is essentially a time dependency (and what I consider to be in the purview of the scheduler) into some sort of polling solution.
I am looking for a solution that respects the time dependencies of A and B and only runs them at their specified times. C, being a child of A and B, will run only on successful completion of the two. No task (sensor, check or any other poller) ever runs outside of this schedule. The scheduler itself might poll, but we are not launching new processes that mostly just sleep.

Ram.

On 2018/07/23 17:58:56, Andrew Maguire <andrewm4...@gmail.com> wrote:
> Maybe you could have A and B report their outcome somewhere and then use
> that output, read back in from somewhere, as a check operator in C.
>
> This is kinda reinventing the wheel a little bit though, as ideally there
> would be a way to keep all that state inside Airflow.
>
> I think what I suggest would work, but maybe a little hackish.
>
> On Mon, 23 Jul 2018, 14:33 srinivas.ramabhad...@gmail.com, <
> srinivas.ramabhad...@gmail.com> wrote:
>
> > Carl -
> >
> > Thanks, that definitely works, but it's non-ideal. If I had 100s of
> > jobs running throughout the day, a TimeSensor task (process) gets created
> > for each task at midnight even though a task may not be required to run
> > for a very long time (e.g. a whole bunch of tasks need to run at 20:00,
> > yet all of their time sensors are kicked off at 00:00). Worse still, if I
> > used a LocalExecutor with a pool size of 10, some jobs that need to run
> > early may not even get scheduled in favor of time sensors for tasks later
> > in the day, which only perform a sleep operation.
> >
> > Is there another way to do this? If not, is there at least another way
> > around the LocalExecutor problem?
> >
> > Ram.
> >
> >
> > On 2018/07/23 08:23:45, Carl Johan Gustavsson <carl.j.gustavs...@gmail.com>
> > wrote:
> > > Hi Ram,
> > >
> > > You can have a single DAG scheduled at 10 am, which starts A, then use
> > > a TimeSensor set to 11 am that B depends on, and then have C depend on
> > > A and B.
> > >
> > > Something like:
> > >
> > > a = BashOperator(‘a’, …)
> > >
> > > delay_b = TimeSensor(‘delay_b’, target_time=time(11, 0, 0), …)
> > > b = BashOperator(‘b’, …)
> > > b.set_upstream(delay_b)
> > >
> > > c = BashOperator(‘c’, …)
> > > c.set_upstream(a)
> > > c.set_upstream(b)
> > >
> > >
> > > / Carl Johan
> > >
> > > On 23 July 2018 at 02:18:00, srinivas.ramabhad...@gmail.com (
> > > srinivas.ramabhad...@gmail.com) wrote:
> > >
> > > Hi -
> > >
> > > I have recently started using Airflow version 1.9.0 and am having some
> > > difficulty setting up a very simple DAG. I have three tasks A, B and C.
> > > I'd like A to run every day at 10am and B at 11am. C depends on BOTH A
> > > and B running successfully.
> > >
> > > Initially, I decided to create one DAG, add all three tasks to it and
> > > set C as downstream of A and B. I then set the schedule_interval of the
> > > DAG to @daily. But this meant I couldn't run A and B at 10am and 11am
> > > respectively, since they are PythonOperators and tasks don't support
> > > schedule_interval (or, at least, it's deprecated syntax and gets ignored).
> > >
> > > I scratched that idea and then created A and B as separate DAGs,
> > > specifying the schedule interval with cron syntax: '00 10 * * *' for A
> > > and '00 11 * * *' for B. But now when I set C as a downstream of A and
> > > B, it complains that C can't belong to two different DAGs.
> > >
> > > How do I accomplish such a simple dependency structure?
> > >
> > > Ram.
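
P.S. For anyone following the thread, a self-contained version of Carl's single-DAG suggestion might look roughly like the sketch below. It assumes Airflow 1.9 import paths; the DAG id, start_date and bash commands are placeholders for illustration, not part of the original snippet.

    from datetime import datetime, time

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.sensors import TimeSensor

    default_args = {'owner': 'airflow', 'start_date': datetime(2018, 7, 1)}

    # One DAG, scheduled daily at 10:00.
    dag = DAG('a_b_c_example', default_args=default_args,
              schedule_interval='0 10 * * *')

    # A starts as soon as the 10:00 DAG run is triggered.
    a = BashOperator(task_id='a', bash_command='echo run_a', dag=dag)

    # delay_b holds B back until 11:00 -- this is the sleeping/polling
    # task that the rest of the thread objects to.
    delay_b = TimeSensor(task_id='delay_b', target_time=time(11, 0), dag=dag)
    b = BashOperator(task_id='b', bash_command='echo run_b', dag=dag)
    b.set_upstream(delay_b)

    # C runs only after both A and B have succeeded.
    c = BashOperator(task_id='c', bash_command='echo run_c', dag=dag)
    c.set_upstream(a)
    c.set_upstream(b)

This keeps C's dependency on A and B inside Airflow, at the cost of one TimeSensor slot per delayed task, which is exactly the drawback discussed above for large numbers of jobs and a small LocalExecutor pool.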