Hi Chris, I see. I switched to LocalExecutor and the scheduler is working as expected. Thanks a lot for your help!
Jason

On Tue, May 31, 2016 at 3:35 PM, Chris Riccomini <criccom...@apache.org> wrote:

> Hey Jason,
>
> The SequentialExecutor only ever runs one task at a time. It's meant for
> debugging purposes. Try switching to the LocalExecutor.
>
> Cheers,
> Chris
>
> On Tue, May 31, 2016 at 3:31 PM, Jason Chen <chingchien.c...@gmail.com>
> wrote:
>
>> Chris,
>> I am running SequentialExecutor.
>>
>> Thanks.
>> Jason
>>
>>
>> On Tue, May 31, 2016 at 1:36 PM, Chris Riccomini <criccom...@apache.org>
>> wrote:
>>
>>> Hey Jason,
>>>
>>> Are you running the SequentialExecutor? This is the default out-of-the-box
>>> executor.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Tue, May 31, 2016 at 12:59 PM, Jason Chen <chingchien.c...@gmail.com>
>>> wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> I made the changes and tried it out.
>>>> It does not seem to work as expected.
>>>> When a dag is running (a particular task inside that dag is taking
>>>> time), another task from another dag seems "blocked".
>>>>
>>>> My setting:
>>>> (1) airflow.cfg
>>>> max_active_runs_per_dag = 16
>>>> parallelism = 32
>>>> dag_concurrency = 16
>>>>
>>>> (2) Part of the python file for one dag (dag1) is below. Please note that
>>>> inside this DAG, the first task (task1) is a long running task
>>>>
>>>> dag1 = DAG('dag1', schedule_interval=timedelta(minutes=15),
>>>>            max_active_runs=1, default_args=args)
>>>>
>>>> Then, the tasks are running in the order...
>>>> task1 (long running) --> task2 --> task3
>>>> ...
>>>> (3) Part of the python file for another dag (dag2) is below.
>>>> dag2 = DAG('dag2', schedule_interval=timedelta(minutes=3),
>>>>            max_active_runs=1, default_args=args)
>>>> ...
>>>> Then, the tasks are running in the order...
>>>> taskA (short running task) --> taskB
>>>>
>>>> (4) Inside the upstart script file.
>>>> This is the main part of how I start the
>>>> airflow scheduler
>>>>
>>>> env SCHEDULER_RUNS=0
>>>> export SCHEDULER_RUNS
>>>>
>>>> script
>>>> exec >> ${AIRFLOW_HOME}/scheduler-log/airflow-scheduler.log 2>&1
>>>> exec usr/local/bin/airflow scheduler -n ${SCHEDULER_RUNS}
>>>> end script
>>>>
>>>> =========================
>>>>
>>>> What I observed is that
>>>> (a) task1 (of dag1) runs for about 20 mins, and during its running
>>>> time no other dag1 run is triggered. This is as expected.
>>>>
>>>> (b) taskA (of dag2) should be triggered to run every 3 mins. However,
>>>> it is NOT triggered if task1 of dag1 is running.
>>>> taskA seems to be queued/blocked and not run. It is executed after
>>>> task1 (of dag1) is done. So, it looks like it is dispatched into a "gap"
>>>> between task1 and task2 (of dag1). This does not look normal, as I expect
>>>> taskA (of dag2) to run no matter what happens in another dag (dag1).
>>>>
>>>>
>>>> Any suggestions?
>>>> Thanks.
>>>> Jason
>>>>
>>>>
>>>> On Tue, May 31, 2016 at 9:02 AM, Chris Riccomini <criccom...@apache.org>
>>>> wrote:
>>>>
>>>>> Hey Jason,
>>>>>
>>>>> The problem is max_active_runs_per_dag=1. Set it back to 16. You just need
>>>>> max_active_runs=1 for the individual DAGs. This will allow multiple
>>>>> (different) DAGs to run in parallel, but only one DAG of each type can run
>>>>> at the same time.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On Fri, May 27, 2016 at 11:42 PM, Jason Chen <chingchien.c...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi Chris,
>>>>> > Thanks for your reply. After setting it up, I observed how it works for
>>>>> > a couple of days.
>>>>> >
>>>>> > I tried to set max_active_runs=1 in the DAG
>>>>> > dag = DAG(...max_active_runs=1...) and it executed fine to avoid two runs
>>>>> > at the same time.
>>>>> > However, I noticed other dags (not the dag that is running) are also
>>>>> > "paused".
>>>>> > My understanding is that "max_active_runs" is basically
>>>>> > "max_active_runs_per_dag".
>>>>> > So, why can't another dag (different dag name) run at the same time as the
>>>>> > first dag?
>>>>> > I want the two dags to be able to run at the same time, with only
>>>>> > one active run inside each dag.
>>>>> > Thanks.
>>>>> >
>>>>> > Jason
>>>>> >
>>>>> > My other settings in airflow.cfg
>>>>> >
>>>>> > max_active_runs_per_dag=1
>>>>> > parallelism = 32
>>>>> > dag_concurrency = 16
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, May 16, 2016 at 8:57 PM, Chris Riccomini <criccom...@apache.org>
>>>>> > wrote:
>>>>> >
>>>>> > > Hey Jason,
>>>>> > >
>>>>> > > For (2), by default, task1 will start running again. You'll have two runs
>>>>> > > going at the same time. If you want to prevent this, you can set
>>>>> > > max_active_runs to 1 in your DAG.
>>>>> > >
>>>>> > > Cheers,
>>>>> > > Chris
>>>>> > >
>>>>> > > On Mon, May 16, 2016 at 1:09 PM, Jason Chen <chingchien.c...@gmail.com>
>>>>> > > wrote:
>>>>> > >
>>>>> > > > I have two questions
>>>>> > > >
>>>>> > > > (1) For the airflow UI: "Tree view", it lists the tasks along with the
>>>>> > > > time highlighted at the top (say, 08:30; 09:00, etc). What's the meaning
>>>>> > > > of that time? It does not look like the UTC time at which the task was
>>>>> > > > running. I know that overall, airflow uses UTC time.
>>>>> > > > (2) I have a DAG with two tasks: task1 --> task2
>>>>> > > > Task1 is running hourly and could sometimes take longer than one hour
>>>>> > > > to run.
>>>>> > > > In such a setup, task1 will be triggered hourly, so what happens if the
>>>>> > > > previous task1 is still running? Will the "new" task1 be queued?
>>>>> > > >
>>>>> > > > Thanks.
>>>>> > > > Jason
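
For anyone reading this thread later, the behavior resolved at the top (taskA of dag2 waiting until task1 of dag1 finishes under the SequentialExecutor) can be reproduced with a toy model. This is only a sketch and not Airflow's scheduler code: the function `start_times` and its `slots` parameter are hypothetical names, and taskA's 1-minute duration and minute-3 ready time are assumptions for illustration. Only task1's roughly 20-minute duration comes from the thread. One slot stands in for an executor that runs one task at a time, several slots for one that runs tasks in parallel.

```python
import heapq


def start_times(tasks, slots):
    """Return {task_name: start_minute} for a first-come, first-served executor.

    tasks: iterable of (name, ready_minute, duration_minutes).
    slots: number of tasks the executor can run at once (hypothetical knob;
           1 mimics a one-task-at-a-time executor).
    """
    free_at = [0] * slots          # minute at which each slot becomes free
    heapq.heapify(free_at)
    starts = {}
    for name, ready, duration in sorted(tasks, key=lambda t: t[1]):
        slot_free = heapq.heappop(free_at)   # earliest available slot
        start = max(ready, slot_free)        # wait for the slot if it is busy
        starts[name] = start
        heapq.heappush(free_at, start + duration)
    return starts


# task1 of dag1 runs ~20 minutes; taskA of dag2 becomes ready at minute 3 (assumed).
tasks = [("dag1.task1", 0, 20), ("dag2.taskA", 3, 1)]

print(start_times(tasks, slots=1))  # {'dag1.task1': 0, 'dag2.taskA': 20}
print(start_times(tasks, slots=2))  # {'dag1.task1': 0, 'dag2.taskA': 3}
```

With a single slot, taskA cannot start until minute 20, which matches the "gap" described in observation (b). With two slots it starts as soon as it is ready, which matches what was seen after switching to the LocalExecutor.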