Hi guys,

A related topic has been discussed recently via a separate email thread
(see 'How to add hooks for strong deployment consistency?
<https://lists.apache.org/thread.html/%3CCAB=riaaxkskea4a7vx0mzpp7jsh0kktp0whzkwgwdd1vr2s...@mail.gmail.com%3E>
')

The idea brought up by Maxime is to modify DagBag and implement a
DagFetcher abstraction, where the default is "FileSystemDagFetcher", but it
opens the door for "GitRepoDagFetcher", "ArtifactoryDagFetcher",
"TarballInS3DagFetcher", or, in this case, "HDFSDagFetcher", "S3DagFetcher",
and "GCSDagFetcher".

We are all in favor of this, but as far as I'm aware no one has taken
ownership of it yet. So if you (or anyone else) would like to work on this,
please create a JIRA and call it out :)
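
In the meantime, a minimal version of the workaround Chris describes below
might look something like this (the bucket and paths are placeholders, and
it assumes gsutil is available on the workers):

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {'owner': 'airflow', 'start_date': datetime(2018, 1, 1)}

dag = DAG(
    dag_id='sync_dags_from_gcs',
    default_args=default_args,
    schedule_interval=timedelta(minutes=5),
    catchup=False,
)

sync_dags = BashOperator(
    task_id='rsync_gcs_dags',
    # Mirror the remote DAG bucket into the local dags_folder.
    bash_command='gsutil -m rsync -r gs://my-dag-bucket/dags/ $AIRFLOW_HOME/dags/',
    dag=dag,
)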

Cheers,
Joy



On Thu, Mar 15, 2018 at 3:54 PM, Chris Fei <cfe...@gmail.com> wrote:

> Hi Diogo,
>
> This would be valuable for me as well; I'd love first-class support for
> hdfs://..., s3://..., gcs://..., etc. as a value for dags_folder. As a
> workaround, I deploy a maintenance DAG that periodically downloads other
> DAGs from GCS into my DAG folder. Not perfect, but gets the job done.
> Chris
>
> On Thu, Mar 15, 2018, at 6:32 PM, Diogo Franco wrote:
> > Hi all,
> >
> > I think that the ability to fill up the DagBag from remote locations
> > would be useful (in my use case, having the dags folder in HDFS would
> > greatly simplify the release process).
> >
> > Was there any discussion on this previously? I looked around briefly
> > but couldn't find it.
> >
> > Maybe the method **DagBag.collect_dags** in *airflow/models.py* could
> > delegate the walking part to specific methods based on the
> > *dags_folder* prefix, in a sort of plugin architecture. This would
> > allow the dags_folder to be defined like
> > hdfs://namenode/user/airflow/dags, or s3://...
> >
> > If this makes sense, I'd love to work on it.
> >
> > Cheers,
> > Diogo Franco
>
>
