*[Proposal]*
Create a new *syncer* command to sync dags from any remote folder, which
will be used as the initContainer command in KubernetesExecutor.
It is similar to the initdb command, but it copies the dags from the remote
folder before the dag runs.

*[Problem]*
Currently, only two ways of mounting dags in the pod created by the
kubernetes executor are supported: *GCS* and PersistentVolumeClaim (*PVC*).

When using the PVC option, it is difficult to update the dags in the
persistent volume correctly.
Normally, we run an rsync/cp command to copy the dags from a remote folder to
the volume, but a file can then be read while it is still being written,
which results in errors.
We encountered this issue in our production airflow, where files were read
and written at the same time. Link to the mailing list discussion:
https://mail-archives.apache.org/mod_mbox/airflow-dev/201908.mbox/browser

Reading dags from sources like S3/GCS is not natively supported in
airflow, so one has to write custom code to pull them from the remote
folder.

Since we cannot know within the pod when the dags will be updated, having
an initContainer copy the dags from the remote location when the pod is
instantiated is the better choice.
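
To make the initContainer route concrete, here is a rough sketch of what the
worker pod could look like, written as a plain Python dict. The helper name,
the volume name, the mount path and the container names are all hypothetical,
and "airflow syncer" is the command proposed in this mail, not something that
exists today. The init container and the worker share an emptyDir volume, so
the dags are fully copied before the worker container starts.

```python
# Hypothetical values, chosen only for illustration.
DAGS_VOLUME_NAME = "airflow-dags"
DAGS_MOUNT_PATH = "/usr/local/airflow/dags"


def build_worker_pod_spec(image):
    """Return a pod spec fragment whose init container syncs the dags first."""
    dags_mount = {"name": DAGS_VOLUME_NAME, "mountPath": DAGS_MOUNT_PATH}
    return {
        "initContainers": [
            {
                "name": "dags-syncer",
                "image": image,
                # Proposed command from this mail; it does not exist yet.
                "command": ["airflow", "syncer"],
                "volumeMounts": [dags_mount],
            }
        ],
        "containers": [
            {
                "name": "base",
                "image": image,
                # The worker sees the files the init container copied.
                "volumeMounts": [dags_mount],
            }
        ],
        # An emptyDir lives as long as the pod, so every pod gets a fresh copy.
        "volumes": [{"name": DAGS_VOLUME_NAME, "emptyDir": {}}],
    }
```

Because Kubernetes runs init containers to completion before starting the
main containers, the worker never reads a file that is still being written.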

*[Implementation]*
We can create a new command called syncer in order to sync dags from a
remote location.
We will pass *dags_syncer_conn_id* in airflow.cfg; the syncer uses this
connection, and *dags_syncer_folder* denotes the remote location from which
the files are copied.
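
As a minimal sketch of how the two settings might be read and used, assuming
both live in a [kubernetes] section of airflow.cfg (the section name and the
example values are my assumptions, as are the function names). To keep the
example runnable without any cloud credentials, the copy step reads from a
local directory that stands in for the remote folder; a real syncer would
list and download the files through the connection instead.

```python
import configparser
import shutil
from pathlib import Path

# Hypothetical airflow.cfg fragment; section name and values are assumptions.
EXAMPLE_CFG = """
[kubernetes]
dags_syncer_conn_id = my_s3_conn
dags_syncer_folder = s3://my-bucket/dags
"""


def read_syncer_settings(cfg_text):
    """Return (conn_id, remote_folder) parsed from airflow.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["kubernetes"]
    return section["dags_syncer_conn_id"], section["dags_syncer_folder"]


def sync_dags(source_dir, dags_dir):
    """Copy every file under source_dir into dags_dir.

    source_dir is a local stand-in for the remote folder. Returns the
    copied paths relative to source_dir, in sorted order.
    """
    source = Path(source_dir)
    target = Path(dags_dir)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dest = target / rel
        # Recreate sub-folders so nested dag packages keep their layout.
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied
```

The syncer command would call something like read_syncer_settings once at
startup and then run the copy a single time before exiting, which is all an
initContainer needs to do.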

In the initContainer we copy all contents from the

*[Benefit]*
a. In our pipelines that upload dags to airflow, we can simply write to the
remote folder and rest assured that the dags will be picked up by airflow.

b. We can provide an S3 or GCS folder as the source of dags, and we only
need to publish to this folder.
Airflow itself will take care of syncing dags from this remote folder, which
gives native support for pulling dags from these sources.

Please share your views/comments.

Regards,
Maulik
