Re: Making Airflow Fault-Tolerant when running Airflow on Kubernetes

Kevin Lam Mon, 12 Nov 2018 10:16:20 -0800

Friendly ping :). Do you think you could elaborate on the fault tolerance a
bit, Daniel? Thanks for your help!


On Wed, Sep 12, 2018 at 5:35 PM Kevin Lam <ke...@fathomhealth.co> wrote:

> Hi Daniel,
>
> Thanks for the reply!
>
> No, we haven't looked too deeply into it. Can you elaborate a bit on how
> that works? With the KubernetesExecutor, if a DAG is in flight and part of
> Airflow goes down, will it be able to recover? How do Airflow workers
> reconnect to Pods that were in flight?
>
> On Wed, Sep 12, 2018 at 4:59 PM Daniel Imberman <daniel.imber...@gmail.com>
> wrote:
>
>> Hi Kevin,
>>
>> Have you looked into the KubernetesExecutor? We achieve fault tolerance
>> using the kubernetes resourceVersion to ensure that all state is
>> reproducible.
>>
>> On Wed, Sep 12, 2018 at 1:08 PM Kevin Lam <ke...@fathomhealth.co> wrote:
>>
>> > Hi all,
>> >
>> > We currently run Airflow as a Deployment in a kubernetes cluster. We also
>> > use a variant of KubernetesOperator to run our DAGs.
>> >
>> > We are investigating how best to make Airflow fault-tolerant, in part
>> > because we are looking at using preemptible VMs [1]. *Has there been much
>> > discussion about how to deploy Airflow in a fault-tolerant way? Are
>> > there any best practices? Ideally we'd like our kubernetes-hosted Airflow
>> > to support rolling updates for Docker image updates and also recover from
>> > components (worker, scheduler, web) going down temporarily, including when
>> > DAGs are in flight.*
>> >
>> > Any advice, ideas and/or feedback appreciated!
>> >
>> > [1] https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
>> >
>>
>
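
To illustrate the resourceVersion idea Daniel mentions above, here is a minimal sketch. It is not the KubernetesExecutor's actual code. It only shows the general pattern: record the version of the last event you handled, and after a restart ask for events newer than that version. Everything in it (SavedState, fake_watch, process_events, the sample log) is hypothetical, and the integer versions are a simplification, since real Kubernetes resourceVersion values are opaque strings that a client hands back to the API server rather than comparing itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# (resource_version, pod_name, phase). Integers are a simplification here;
# real Kubernetes resourceVersions are opaque strings.
Event = Tuple[int, str, str]


@dataclass
class SavedState:
    """State that is assumed to survive a restart (e.g. kept in a database)."""
    resource_version: int = 0
    pods: Dict[str, str] = field(default_factory=dict)


def fake_watch(log: List[Event], since: int) -> Iterator[Event]:
    """Hypothetical stand-in for a Kubernetes watch: yield events newer than `since`."""
    for event in log:
        if event[0] > since:
            yield event


def process_events(
    watch: Callable[[int], Iterable[Event]],
    state: SavedState,
    crash_after: Optional[int] = None,
) -> int:
    """Apply events to `state`, recording the version of each event handled.

    `crash_after` stops early to simulate the process going down mid-stream.
    Returns the number of events handled in this run.
    """
    handled = 0
    for version, pod, phase in watch(state.resource_version):
        state.pods[pod] = phase
        state.resource_version = version
        handled += 1
        if crash_after is not None and handled >= crash_after:
            break
    return handled


# Sample event log for two task pods.
log: List[Event] = [
    (1, "task-a", "Pending"),
    (2, "task-a", "Running"),
    (3, "task-b", "Pending"),
    (4, "task-a", "Succeeded"),
]

state = SavedState()
# First run "goes down" after two events.
first = process_events(lambda since: fake_watch(log, since), state, crash_after=2)
# Restart: the watch resumes from the saved version, so nothing is replayed or lost.
second = process_events(lambda since: fake_watch(log, since), state)
```

The point of the sketch is that the restarted run never needs to "reconnect" to a pod. It only needs the saved version and the pod state recorded so far, and the event stream fills in whatever happened while it was down.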
