Hi Kevin,

Have you looked into the KubernetesExecutor? We achieve fault tolerance
using the Kubernetes resourceVersion to ensure that all state is
reproducible.
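
A rough sketch of the idea, not the actual KubernetesExecutor code: the
watcher checkpoints the last resourceVersion it has handled, so after a
crash it resumes from that point instead of losing or replaying events.
Every name below (CheckpointStore, watch, run_watcher, crash_after) is
hypothetical, and the event stream is simulated in memory. In the real
Kubernetes API a resourceVersion is an opaque string handed back to the
server; integers are used here only to keep the simulation short.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Tuple

# (resource_version, pod_name, phase); a simplified, hypothetical event shape.
Event = Tuple[int, str, str]


@dataclass
class CheckpointStore:
    """Stand-in for durable storage (e.g. a database row) that survives restarts."""
    resource_version: int = 0


def watch(events: Iterable[Event], since: int) -> Iterator[Event]:
    """Mimic a watch call that only yields events newer than `since`."""
    for event in events:
        if event[0] > since:
            yield event


def run_watcher(
    events: Iterable[Event],
    store: CheckpointStore,
    state: Dict[str, str],
    crash_after: Optional[int] = None,
) -> None:
    """Apply events to `state`, checkpointing the version after each one."""
    stream = watch(events, store.resource_version)
    for handled, (version, pod, phase) in enumerate(stream):
        if crash_after is not None and handled == crash_after:
            raise RuntimeError("simulated scheduler crash")
        state[pod] = phase
        store.resource_version = version  # checkpoint only after the event is applied


events = [
    (101, "task-a", "Pending"),
    (102, "task-a", "Running"),
    (103, "task-b", "Running"),
    (104, "task-a", "Succeeded"),
]
store = CheckpointStore()
state: Dict[str, str] = {}  # stand-in for task state kept in the metadata database

try:
    run_watcher(events, store, state, crash_after=2)  # dies before handling 103
except RuntimeError:
    pass

run_watcher(events, store, state)  # restart: resumes after version 102
```

The checkpoint is written only after an event has been applied, so a
crash between the two steps causes that event to be handled again on
restart rather than skipped.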

On Wed, Sep 12, 2018 at 1:08 PM Kevin Lam <ke...@fathomhealth.co> wrote:

> Hi all,
>
> We currently run Airflow as a Deployment in a Kubernetes cluster. We also
> use a variant of KubernetesOperator to run our DAGs.
>
> We are investigating how best to make Airflow fault-tolerant, in part
> because we are looking into using preemptible VMs [1]. *Has there been much
> discussion about how to deploy Airflow in a fault-tolerant way? Are
> there any best practices? Ideally we'd like our Kubernetes-hosted Airflow
> to support rolling updates for Docker image updates and also recover from
> components (worker, scheduler, web) going down temporarily, including when
> DAGs are in flight.*
>
> Any advice, ideas and/or feedback appreciated!
>
> [1] https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
>
