Re: Failover in apache 1.8.0

Ruiqin Yang Fri, 20 Jul 2018 14:28:24 -0700

"scheduler lost track of it" means cases like the scheduler process got
killed. When the scheduler restarts, tasks in the SCHEDULED or QUEUED state
are reset to the NONE state.
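
To make the restart behaviour concrete, here is a minimal sketch in plain
Python. It is a toy model and not Airflow's actual code: the TaskInstance
class, the lowercase state strings and reset_orphaned_tasks are names made up
for illustration, and None stands in for the NONE state.

```python
from dataclasses import dataclass
from typing import List, Optional

# States that get reset on scheduler restart, per the description above.
# The string values are illustrative assumptions.
RESETTABLE_STATES = {"scheduled", "queued"}


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a task instance row in the metadata DB."""
    task_id: str
    state: Optional[str]  # None models the NONE state


def reset_orphaned_tasks(task_instances: List[TaskInstance]) -> List[TaskInstance]:
    """Reset SCHEDULED/QUEUED task instances to None and return the ones reset.

    Task instances in any other state (running, success, ...) are left alone.
    """
    reset = []
    for ti in task_instances:
        if ti.state in RESETTABLE_STATES:
            ti.state = None
            reset.append(ti)
    return reset
```

A task instance reset this way looks to the scheduler like one that was never
scheduled, so it can be picked up again on the next scheduling pass.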


As for the SLA, I think that delay is included. Here is the logic Airflow
uses to calculate SLA misses:
<https://github.com/apache/incubator-airflow/blob/284dbdb60ab1fec027dea4871e3013a4727f6041/airflow/jobs.py#L604-L739>
I think the SLA in Airflow is similar to what you describe in Quartz (e.g.
you can add an sla_miss_callback to your DAG). Here's the doc for it:
<https://airflow.apache.org/concepts.html?highlight=slas#slas>
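
To illustrate why the queue delay counts, here is a simplified sketch in plain
Python. It is not the linked Airflow logic: sla_deadline and is_sla_missed are
hypothetical helpers, and the assumption is that the SLA clock starts at the
scheduled time rather than when a worker picks the task up.

```python
from datetime import datetime, timedelta
from typing import Optional


def sla_deadline(scheduled_for: datetime, sla: timedelta) -> datetime:
    """Deadline measured from the scheduled time, not from worker pick-up."""
    return scheduled_for + sla


def is_sla_missed(
    scheduled_for: datetime,
    sla: timedelta,
    now: datetime,
    succeeded_at: Optional[datetime] = None,
) -> bool:
    """Return True if the task did not succeed before its SLA deadline.

    If the task has not succeeded yet, the current time is compared with the
    deadline, so time spent waiting in the worker queue counts against the SLA.
    """
    deadline = sla_deadline(scheduled_for, sla)
    if succeeded_at is not None:
        return succeeded_at > deadline
    return now > deadline
```

For example, a task scheduled for 12:00 with a 30 minute SLA that is still
sitting in the queue at 12:45 would count as a miss under this model, even
though it never started on a worker.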

Cheers,
Kevin Y

On Fri, Jul 20, 2018 at 1:49 PM Shubham Gupta <shubham180695...@gmail.com>
wrote:

> Also, is the delay between adding the task to the queue and the task
> starting on the worker not included in the task's SLA? Or does the SLA
> period begin once the task actually starts on the worker? Also, if the
> scheduler has to wait for a response from the worker for the final state of
> the task (success/failure), how can the scheduler lose track of the task?
>
> FYI, I am comparing Airflow with Quartz, which has mistrigger handling
> built in. A mistrigger in Quartz means that the task was not started within a
> pre-configured interval beginning from the scheduled start time. Isn't
> there something similar in Airflow?
>
> Regards
> Shubham Gupta
>
> On Fri, Jul 20, 2018 at 1:42 PM Shubham Gupta <shubham180695...@gmail.com>
> wrote:
>
> > Hi Ruiqin Yang,
> >
> > Can you please elaborate on what is meant by "and the scheduler lost
> > track of it" in your second paragraph? When can this happen? Also, what
> > is the default state when the scheduler restarts? Is it not *None*?
> >
> > Thanks for your quick reply.
> >
> > Regards
> > Shubham Gupta
> >
> >
> > On Fri, Jul 20, 2018 at 1:04 AM Ruiqin Yang <yrql...@gmail.com> wrote:
> >
> >> Hi Shubham,
> >>
> >> A worker running an Airflow task heartbeats regularly, which updates
> >> the task instance entry in the DB. The scheduler kills task instances
> >> that have gone without a heartbeat for a long time (these are called
> >> zombie tasks), and if the task has retries left it will try to
> >> reschedule it (given all trigger rules are satisfied).
> >>
> >> If the workers are under heavy load, the scheduler can still schedule
> >> tasks (putting them into the worker queue), and you just wait for the
> >> workers to pick the tasks up from the queue. If the tasks never get
> >> picked up and the scheduler loses track of them, their state is reset
> >> to NONE when the scheduler restarts; these are called orphan tasks.
> >>
> >> FYI, inside Airbnb, Alex Guziel (@saguziel <https://github.com/saguziel>)
> >> has a patch that requeues tasks if they don't get picked up by workers
> >> for a long time, and he plans to open source it.
> >>
> >> Cheers,
> >> Kevin Y
> >>
> >> On Fri, Jul 20, 2018 at 12:40 AM Shubham Gupta <shubham180695...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I would like to know what happens if a Celery worker running one of
> >> > the tasks crashes. Will the job be rescheduled?
> >> >
> >> > Also, if the scheduler is not able to schedule a task on time due to
> >> > heavy load on all workers, what will happen to the task?
> >> >
> >> > Regards
> >> > Shubham Gupta
> >> >
> >>
> >
>
