I'm in favor of having a YAML-based option for Kubernetes. We've had to
subclass the Kubernetes operator internally because it doesn't do what we
need out of the box: for example, intercepting the object it creates right
before it submits it, so that we can patch in missing features. I think it
would make sense to make this a sibling class to the existing operator,
since it can reuse the same watching/submitting logic but accept YAML
instead. Using Airflow's existing templating system here would make sense
too, of course.
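To make the "intercept and patch" part concrete, this is roughly the hook we
bolted on via subclassing. It's an illustrative sketch only: the function
name, the service account, and the toleration values are hypothetical, not
part of any Airflow operator API.

```python
# Illustrative sketch of the "patch before submit" hook we added by
# subclassing. `pod_spec` stands in for the pod object the operator builds;
# all names and values here are hypothetical, not Airflow API.
def patch_pod_spec(pod_spec: dict) -> dict:
    """Inject fields the stock operator doesn't expose, just before submit."""
    spec = pod_spec.setdefault("spec", {})
    # Example: force a service account the operator has no parameter for.
    spec.setdefault("serviceAccountName", "airflow-worker")
    # Example: add a toleration so these pods land on dedicated nodes.
    spec.setdefault("tolerations", []).append(
        {"key": "dedicated", "operator": "Equal", "value": "airflow"}
    )
    return pod_spec
```

A YAML-accepting sibling operator would make most of this patching
unnecessary, since the full pod spec would be in our hands to begin with.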

However, what I'd really like to see is a Helm operator!

Airflow tasks often require temporary resources. Here's an example: we run
the same container in ~12 different configurations. Each of them requires
slightly different ConfigMaps. Right now we have to manage those ConfigMaps
out of band from Airflow, because Airflow has no way to create or update
them. That makes it easy to push new code to Airflow but forget to update
the ConfigMaps.

What would be ideal for us is to define the task _and_ its necessary
resources in a Helm chart (either in the same repo as the DAG, or pointing
to a semver tag). Then the operator would wait for the entire chart to
finish successfully, including creating and tearing down resources as
required.
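For a rough shape of what we have in mind from the DAG side: a hypothetical
sketch only (no such operator exists today), where a task plus its resources
is pinned to a chart version and the operator just drives the plain `helm`
CLI with `--wait` so success means the whole chart converged.

```python
# Hypothetical sketch: a chart-pinned task definition. The class and method
# names are invented for illustration; the flags mirror the real `helm` CLI.
from dataclasses import dataclass, field


@dataclass
class HelmChartTask:
    chart: str                 # chart path in the DAG repo, or a repo reference
    version: str               # semver tag, e.g. "1.4.2"
    values: dict = field(default_factory=dict)

    def helm_args(self, release: str) -> list:
        """Build the `helm upgrade --install --wait` invocation a chart-based
        operator would run, waiting on every resource in the chart."""
        args = ["helm", "upgrade", "--install", release,
                self.chart, "--version", self.version, "--wait"]
        for key, val in self.values.items():
            args += ["--set", f"{key}={val}"]
        return args
```

The same chart reference could then be invoked identically from other
systems, which is exactly the portability we're after.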

This would also help in scenarios where we want to run a task outside of
Airflow. Right now, a lot of our tasks are "baked into" the DAG and can't
be run without either going through Airflow, or manually copying config
options from the DAG code. Declaring a task as a resource, and then just
referencing that resource from Airflow, would allow us to also reference
that resource in other systems in our infrastructure and ensure that it
gets invoked in an identical way.

Unfortunately, Helm itself has no real concept of "one-off" tasks. We
started to build something like this in-house but ran into roadblocks: we
looked into hacks like storing task definitions in a CronJob, but I came to
the conclusion that a TaskTemplate CRD would be needed to support this kind
of workflow.

On Wed, Mar 6, 2019 at 10:06 AM [email protected] <[email protected]>
wrote:

> Hi,
>
> I would like to discuss parsing YAML for the Kubernetes worker
> configuration, instead of the current process of programmatically
> generating the YAML from the Pod and PodRequest Factory.
>
> *Motivation:*
>
> Kubernetes configuration is quite complex. Instead of using the
> configuration system offered natively by Kubernetes (YAML), the current
> method programmatically recreates this YAML file. Fully re-implementing
> the configuration in Airflow is taking a lot of time, and at the moment
> many features available through YAML configuration are not available in
> Airflow. Furthermore, as the Kubernetes API evolves, the Airflow codebase
> will have to change with it, and Airflow will be in a constant state of
> catching up with features it does not yet expose. This can all be solved
> by simply parsing the YAML file.
>
> *Idea:*
>
> Either pass in the YAML as a string or provide a path to the YAML file.
>
