Yes, clearly the DAG runs can be in inconsistent states with related task instances and backfill processes. Here's a quick patch that helps a little: https://github.com/apache/incubator-airflow/pull/3433
After writing the quick patch above, I think this requires a bit more thought. The clear command is effectively a way to issue a "scheduler-driven backfill". Maybe we can deprecate clear and add a new "airflow backfill --scheduler", which would clear task instances and create/set DAG runs in the right state.

Max

On Tue, May 29, 2018 at 5:58 PM Ruiqin Yang <yrql...@gmail.com> wrote:

> This line
> <https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L935>
> is where the scheduler skips the backfill DAG runs. Regardless of the state
> the DAG run is in, tasks in a DAG run whose ID starts with 'backfill_' are
> not considered when scheduling.
>
> I agree with Dan Davydov's idea that we should at least have something like
> multiple DAG runs for one execution to distinguish different DAG runs like
> scheduled and backfilled. The situation Scott is facing here is not the
> only problem that the lack of multiple DAG runs has caused (e.g. manually
> triggering a task in the UI should also create a separate DAG run, otherwise
> the implementation logic is a bit weird).
>
> Cheers,
> Kevin Y
>
> On Tue, May 29, 2018 at 5:52 PM Scott Halgrim
> <scott.halg...@zapier.com.invalid> wrote:
>
> > Well I’ve gone ahead and run the UPDATE query now, so the scheduler is
> > picking up tasks.
> >
> > When I cleared the tasks, every DAG run that had a cleared task in it was
> > set to running. Because I’d backfilled them all they were all `backfill_`
> > dag runs. Inspection of various tasks via `task_failed_deps` indicated the
> > tasks had all their dependencies filled. After running the update query,
> > they’re all `scheduled__` dag runs.
> >
> > On May 29, 2018, 5:02 PM -0700, Maxime Beauchemin
> > <maximebeauche...@gmail.com>, wrote:
> > > While this may work it's clearly not the prescribed way to do this.
> > > Clearing should just work.
> > >
> > > I'm trying to understand why the scheduler is not picking up the cleared
> > > task.
> > > Clearing should remove the task instance state and set the state of
> > > the related DAG Run to running so that the scheduler picks those up.
> > > Perhaps there's a conflict between the backfill and scheduler-related DAG
> > > Runs? Which DAG runs are set to running? The backfill or scheduler-related
> > > ones?
> > >
> > > Originally when I introduced DAG runs, backfill was operating without any
> > > consideration related to DAG runs (DAG runs were a scheduler-specific
> > > construct). Later on Bolke added backfill-specific DAG runs and I'm not
> > > 100% sure how that works.
> > >
> > > Let's get to the bottom of this.
> > >
> > > Max
> > >
> > > On Fri, May 25, 2018 at 7:48 PM Ruiqin Yang <yrql...@gmail.com> wrote:
> > >
> > > > If you are sure the update query targets the desired rows, the behavior
> > > > should be the same.
> > > >
> > > > On Fri, May 25, 2018 at 4:23 PM, Scott Halgrim
> > > > <scott.halg...@zapier.com.invalid> wrote:
> > > >
> > > > > So far no ill effects from:
> > > > >
> > > > > update dag_run
> > > > > set run_id = concat('scheduled__', substring(run_id, 10, 19))
> > > > > where dag_id = 'daily'
> > > > > and execution_date > '2017-08-31' and execution_date < '2018-01-11'
> > > > > and run_id like 'backfill_%'
> > > > > order by execution_date;
> > > > >
> > > > > On May 25, 2018, 4:03 PM -0700, Scott Halgrim
> > > > > <scott.halg...@zapier.com>, wrote:
> > > > > > Oh wow, that will work? Thanks! Is there any reason for me not to just
> > > > > > run a mass UPDATE on those dag runs directly in the metadata database?
> > > > > >
> > > > > > On May 25, 2018, 4:01 PM -0700, Ruiqin Yang <yrql...@gmail.com>, wrote:
> > > > > > > Airflow is not going to schedule backfill DAG runs, which it
> > > > > > > identifies by the DAG run ID (it starts with 'backfill_').
> > > > > > > If you want the scheduler to schedule those tasks, you can click
> > > > > > > the DAG run and edit its name back to 'scheduled__<something>'.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kevin Y
> > > > > > >
> > > > > > > On Fri, May 25, 2018 at 3:53 PM, Scott Halgrim
> > > > > > > <scott.halg...@zapier.com.invalid> wrote:
> > > > > > >
> > > > > > > > I’ve got four months of dag runs that were scheduled dag runs, then I
> > > > > > > > backfilled them. And now when I clear a task from one of those the dag run
> > > > > > > > goes to “running,” but none of the tasks get scheduled (unless I manually
> > > > > > > > backfill each of them).
> > > > > > > >
> > > > > > > > What I really should have done here was just cleared a mid-dag task as
> > > > > > > > well as all downstream tasks for these dag runs, but, well, now I’m here
> > > > > > > > and I’m wondering what the best way to fix this is.
> > > > > > > >
> > > > > > > > Thanks!
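
For anyone reading this thread later: the UPDATE query quoted above rewrites each run_id by keeping the 19-character timestamp that follows the 'backfill_' prefix and putting 'scheduled__' in front of it. Below is a rough Python sketch of that same string transformation, which may be handy for checking which IDs the query would produce before touching the metadata database. The function name `to_scheduled_run_id` is made up for illustration and is not part of Airflow, and the sample run_id assumes an ISO timestamp with second precision.

```python
# Hypothetical helper, not an Airflow API: mirrors the string logic of
#   concat('scheduled__', substring(run_id, 10, 19))
# from the UPDATE query quoted in this thread.

BACKFILL_PREFIX = "backfill_"      # 9 characters, as matched by LIKE 'backfill_%'
SCHEDULED_PREFIX = "scheduled__"
TIMESTAMP_LENGTH = 19              # e.g. '2017-09-01T00:00:00' (assumed format)


def to_scheduled_run_id(run_id: str) -> str:
    """Return the run_id the UPDATE query would write for a backfill run."""
    if not run_id.startswith(BACKFILL_PREFIX):
        # The query's WHERE clause leaves non-backfill rows untouched.
        return run_id
    # SQL substring(run_id, 10, 19) is 1-indexed: skip 9 chars, keep 19.
    start = len(BACKFILL_PREFIX)
    timestamp = run_id[start:start + TIMESTAMP_LENGTH]
    return SCHEDULED_PREFIX + timestamp


if __name__ == "__main__":
    print(to_scheduled_run_id("backfill_2017-09-01T00:00:00"))
```

This only previews the renamed IDs. It does not talk to the database, and as Max points out above, editing run IDs directly is a workaround and not the prescribed way to get cleared tasks rescheduled.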