Re: About the project support in Airflow

2018-04-26 Thread Taylor Edmiston
We've discussed internally something like having groups or "folders" for
DAGs in the UI.  Nothing functional on the backend, purely a front end
aesthetic.  Something like having DAGs named "foo/bar" and "foo/baz" would
be grouped like a tree visually in the UI:

- Group foo
  - DAG bar
  - DAG baz
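
A minimal sketch of that purely visual grouping, assuming DAG ids use "/" as the separator. The function names `group_dag_ids` and `render_tree` are hypothetical and not part of Airflow; nothing here touches the scheduler or the backend.

```python
from collections import defaultdict


def group_dag_ids(dag_ids, sep="/"):
    """Group DAG ids by the prefix before the first separator.

    Ids without a separator are collected under the empty-string group.
    """
    groups = defaultdict(list)
    for dag_id in sorted(dag_ids):
        prefix, found, name = dag_id.partition(sep)
        if found:
            groups[prefix].append(name)
        else:
            groups[""].append(dag_id)
    return dict(groups)


def render_tree(groups):
    """Render the groups as an indented, plain-text tree."""
    lines = []
    for group in sorted(groups):
        if group:
            lines.append(f"- Group {group}")
            lines.extend(f"  - DAG {name}" for name in groups[group])
        else:
            # Ungrouped DAGs are listed at the top level.
            lines.extend(f"- DAG {name}" for name in groups[group])
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree(group_dag_ids(["foo/bar", "foo/baz"])))
```

Because the grouping is derived from the DAG id alone, it needs no new metadata, which matches the "purely front end" idea.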

Is that what you're looking for?

Best,
Taylor

On Thu, Apr 26, 2018 at 1:51 AM 刘松(Cycle++开发组)  wrote:

> Hi Feng,
>
> Thanks for the information; I had indeed noticed this work as well.
>
> But if I understand correctly, it focuses on permissions (edit/read,
> etc.) on the DAG itself.
>
> The “project” concept is a kind of “group”, but it is more meaningful than
> a “tag”. So if we don’t want to support a “project” concept, is there any
> other solution for this requirement, or some consideration behind that?
>
> Many thanks for your help.
>
> Thanks,
> Song
>

Re: About the project support in Airflow

2018-04-25 Thread Cycle++开发组
Hi Feng,

Thanks for the information; I had indeed noticed this work as well.

But if I understand correctly, it focuses on permissions (edit/read, etc.)
on the DAG itself.

The “project” concept is a kind of “group”, but it is more meaningful than a
“tag”. So if we don’t want to support a “project” concept, is there any other
solution for this requirement, or some consideration behind that?

Many thanks for your help.

Thanks,
Song

On 26/04/2018, 12:28 PM, "Tao Feng"  wrote:

Hi Song,

Just a note that we are also working on DAG-level access on top of
RBAC (AIRFLOW-2267), which should provide DAG-level ACL functionality. The
WIP PR can be found at
https://github.com/apache/incubator-airflow/pull/3197


Re: About the project support in Airflow

2018-04-25 Thread Tao Feng
Hi Song,

Just a note that we are also working on DAG-level access on top of
RBAC (AIRFLOW-2267), which should provide DAG-level ACL functionality. The
WIP PR can be found at
https://github.com/apache/incubator-airflow/pull/3197


Re: About the project support in Airflow

2018-04-25 Thread Cycle++开发组
Hi James,

Yes, the “multi-user” feature is indeed one way to exhaust Airflow resources,
but it is important for letting Airflow work more like a service.
More work will be needed to make Airflow more scalable, but the direction
looks promising.

Thanks,
Song

On 26/04/2018, 3:05 AM, "James Meickle"  wrote:

Another reason you would want separated infrastructure is that there are a
lot of ways to exhaust Airflow resources or otherwise cause contention -
like having too many sensors or sub-DAGs using up all available tasks.

Doesn't seem like a great idea to push for having different teams with
co-tenancy until there is also per-team control over resource use...

On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) 
wrote:

> It seems that all the current approaches point to running multiple instances
> of Airflow, but a project concept is very natural, since one user might
> handle different types of tasks.
>
> Another point concerns multi-user support: one way to get it is also to
> deploy multiple instances, yet Airflow seems to provide multi-user
> functionality built in.
>
> So I am not convinced that multiple instances should be used just to support
> multiple projects.
>
> Thanks,
> Song
>
>
>
>
> On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" wrote:
>
>
> Looks neat Taylor!
>
> And regarding the original question, going off of what Maxime and Bolke
> said, at Pandora, it made more sense for us to have an instance per team
> since each team has its own system user for prod and the instance can run
> all processes as that user. Alternatively you could have a super user that
> can sudo as those other system users, and have many teams on a single
> instance but that is a security concern (what if one team sudo's as the
> other team and accidentally overwrites data - there is nothing stopping
> them from doing it). It depends what your org set up is, but let me know if
> there are any questions I can help with.
>
> Ace
>
>
> > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
> >
> > We use a similar approach like Bolke mentioned with running multiple
> > Airflow instances.
> >
> > I haven't read the Pandora article yet, but we have an Astronomer Open
> > Edition (fully open source) that bundles similar tools like Prometheus,
> > Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> > looking to get a setup like that up and running quickly.
> >
> > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
> > https://github.com/astronomerio/astronomer
> >
> > *Taylor Edmiston*
> >
> >
> >
> > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> >> Related blog post about multi-tenant Airflow deployment out of Pandora:
> >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
> >>
> >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
> >> wrote:
> >>
> >>> My suggestion would be to deploy airflow per project. You could even use
> >>> airflow to manage your ci/cd pipeline.
> >>>
> >>> B.
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauche...@gmail.com>
> >>>> wrote:
> >>>>
> >>>> People have been talking about namespacing DAGs in the past. I'd recommend
> >>>> using tags (many to many) instead of categories/projects (one to many).
> >>>>
> >>>> It should be fairly easy to add this feature. One question is whether tags
> >>>> are defined as code or in the UI/db only.
> >>>>
> >>>> Max
> >>>>
> >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Basically, DAGs are created for the purposes of a project, so if I have
> >>>>> many different projects, will Airflow support a project concept and
> >>>>> organize them separately?
> >>>>>
> >>>>> Is this a known requirement, or is there already a plan for it?
> >>>>>
> >>>>> Thanks,
> >>>>> Song
> >>>>>
> >>>
> >>
>
>
>
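
The difference between tags (many to many) and projects (one to many) raised in the quoted text above can be sketched with plain dictionaries. This is only an illustration under assumed data: the DAG ids, tag names, and the helper `dags_by_tag` are hypothetical, and nothing here is Airflow's API.

```python
from collections import defaultdict

# One-to-many "project" model: each DAG belongs to exactly one project.
dag_project = {
    "etl_daily": "billing",
    "train_model": "ml",
}

# Many-to-many "tag" model: each DAG may carry any number of tags,
# and each tag may be attached to any number of DAGs.
dag_tags = {
    "etl_daily": {"billing", "daily"},
    "train_model": {"ml", "daily"},
}


def dags_by_tag(tags_per_dag):
    """Invert a DAG -> tags mapping into a tag -> DAGs index."""
    index = defaultdict(set)
    for dag_id, tags in tags_per_dag.items():
        for tag in tags:
            index[tag].add(dag_id)
    return dict(index)


if __name__ == "__main__":
    index = dags_by_tag(dag_tags)
    # A single DAG can show up under several tags, which a
    # one-project-per-DAG model cannot express.
    print(sorted(index["daily"]))
```

A project model is the special case where every DAG has exactly one tag, so tags can emulate projects but not the other way around.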




Re: About the project support in Airflow

2018-04-25 Thread Cycle++开发组
Hi Taylor,

Yes, I know that this RBAC feature will be released in the 1.10 release.

# About multi-user support

But then why not deploy one instance of Airflow per user?
With this feature, don’t you think Airflow becomes more of a platform that
serves many different users?
Also, the multi-user case would exhaust Airflow resources more easily, if we
are talking about Airflow’s scalability.

# About multi-project support

You can see the “project” concept as a kind of logical grouping of DAGs that
lets them be organized in a more structured way.
I can’t see how it would hurt Airflow’s “scalability”; as I see it, it just
makes the user experience friendlier.

That is why I use the “multi-user support” case to question the suggestion of
using multiple instances for “multi-project”:
I think “multi-user” support is what pushes Airflow toward being more scalable,
whereas “multi-project” support just makes it more intuitive and user-friendly.

Thanks,
Song

On 26/04/2018, 4:50 AM, "Taylor Edmiston"  wrote:

Something else that might be relevant for your multi-user use case is the
new RBAC support that Joy Gao added.

https://github.com/apache/incubator-airflow/pull/3015


Re: About the project support in Airflow

2018-04-25 Thread Brian Greene
+1

Sent from a device with less than stellar autocorrect

> On Apr 25, 2018, at 12:04 PM, James Meickle  wrote:
> 
> Another reason you would want separated infrastructure is that there are a
> lot of ways to exhaust Airflow resources or otherwise cause contention -
> like having too many sensors or sub-DAGs using up all available tasks.
> 
> Doesn't seem like a great idea to push for having different teams with
> co-tenancy until there is also per-team control over resource use...
> 


Re: About the project support in Airflow

2018-04-25 Thread Taylor Edmiston
Something else that might be relevant for your multi-user use case is the
new RBAC support that Joy Gao added.

https://github.com/apache/incubator-airflow/pull/3015

*Taylor Edmiston*



On Wed, Apr 25, 2018 at 3:04 PM, James Meickle 
wrote:

> Another reason you would want separated infrastructure is that there are a
> lot of ways to exhaust Airflow resources or otherwise cause contention -
> like having too many sensors or sub-DAGs using up all available tasks.
>
> Doesn't seem like a great idea to push for having different teams with
> co-tenancy until there is also per-team control over resource use...
>


Re: About the project support in Airflow

2018-04-25 Thread James Meickle
Another reason you would want separated infrastructure is that there are a
lot of ways to exhaust Airflow resources or otherwise cause contention -
like having too many sensors or sub-DAGs using up all available tasks.

Doesn't seem like a great idea to push for having different teams with
co-tenancy until there is also per-team control over resource use...
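
To make "per-team control over resource use" concrete, here is a minimal sketch of a per-team cap on concurrently running tasks. The class `TeamSlots`, its methods, and the team names are hypothetical illustrations, not an Airflow feature or API; a real deployment would need this enforced by the scheduler.

```python
class TeamSlots:
    """Track running tasks per team and refuse work beyond a team's cap."""

    def __init__(self, limits):
        # limits: mapping of team name -> maximum concurrent tasks
        self.limits = dict(limits)
        self.in_use = {team: 0 for team in self.limits}

    def try_acquire(self, team):
        """Reserve one slot for the team; return False if it is at its cap."""
        # Unknown teams have a cap of 0, so they are always refused.
        if self.in_use.get(team, 0) >= self.limits.get(team, 0):
            return False
        self.in_use[team] += 1
        return True

    def release(self, team):
        """Give back one slot, never dropping below zero."""
        if self.in_use.get(team, 0) > 0:
            self.in_use[team] -= 1


if __name__ == "__main__":
    slots = TeamSlots({"analytics": 2, "ml": 1})
    # A team that starts too many sensors only blocks itself.
    print([slots.try_acquire("analytics") for _ in range(3)])
    print(slots.try_acquire("ml"))
```

With a cap like this, one team filling its own slots with long-running sensors would not starve another team sharing the same instance.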

On Tue, Apr 24, 2018 at 8:27 PM, 刘松(Cycle++开发组) 
wrote:

> It seems that all the current approach is pointing to multiple instance of
> airflow, but project concept is very nature since one user might to handle
> different type of tasks.
>
> Another thing about the multiple user support, one way is also to deploy
> multiple instance, but it seems that airflow is providing multiple user
> function builtin.
>
> So I can not be convinced that using multiple instance for multiple
> project purpose.
>
> Thanks,
> Song
>
>
>
>
> On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey"  > wrote:
>
>
> Looks neat Taylor!
>
> And regarding the original question, going off of what Maxime and Bolke
> said, at Pandora, it made more sense for us to have an instance per team
> since each team has its own system user for prod and the instance can run
> all processes as that user. Alternatively you could have a super user that
> can sudo as those other system users, and have many teams on a single
> instance but that is a security concern (what if one team sudo's as the
> other team and accidentally overwrites data - there is nothing stopping
> them from doing it). It depends what your org set up is, but let me know if
> there are any questions I can help with.
>
> Ace
>
>
> > On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
> >
> > We use a similar approach like Bolke mentioned with running multiple
> > Airflow instances.
> >
> > I haven't read the Pandora article yet, but we have an Astronomer Open
> > Edition (fully open source) that bundles similar tools like Prometheus,
> > Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> > looking to get a setup like that up and running quickly.
> >
> > https://github.com/astronomerio/astronomer/blob/master/examples/airflow-
> enterprise/docker-compose.yml
> > https://github.com/astronomerio/astronomer
> >
> > *Taylor Edmiston*
> >
> >
> >
> > On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> > maximebeauche...@gmail.com> wrote:
> >
> >> Related blog post about multi-tenant Airflow deployment out of Pandora:
> >> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
> >>
> >> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
> >> wrote:
> >>
> >>> My suggestion would be to deploy airflow per project. You could even use
> >>> airflow to manage your ci/cd pipeline.
> >>>
> >>> B.
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> >>>>
> >>>> People have been talking about namespacing DAGs in the past. I'd recommend
> >>>> using tags (many to many) instead of categories/projects (one to many).
> >>>>
> >>>> It should be fairly easy to add this feature. One question is whether tags
> >>>> are defined as code or in the UI/db only.
> >>>>
> >>>> Max
> >>>>
> >>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Basically the DAGs are created for a project purpose, so if I have many
> >>>>> different projects, will the Airflow support the Project concept and
> >>>>> organize them separately ?
> >>>>>
> >>>>> Is this a known requirement or any plan for this already ?
> >>>>>
> >>>>> Thanks,
> >>>>> Song
> >>>>>
> >>>
> >>
>
>
>


Re: About the project support in Airflow

2018-04-24 Thread Cycle++开发组
It seems that all the current approaches point to running multiple instances
of Airflow, but the project concept is very natural, since one user might need
to handle different types of tasks.

As for multi-user support, one option is also to deploy multiple instances,
but Airflow seems to provide multi-user functionality built in.

So I am not convinced that running multiple instances is the right way to
support multiple projects.

Thanks,
Song




On Wed, Apr 25, 2018 at 4:25 AM +0800, "Ace Haidrey" <acehaid...@gmail.com> wrote:


Looks neat Taylor!

And regarding the original question, going off of what Maxime and Bolke said, 
at Pandora, it made more sense for us to have an instance per team since each 
team has its own system user for prod and the instance can run all processes as 
that user. Alternatively you could have a super user that can sudo as those 
other system users, and have many teams on a single instance but that is a 
security concern (what if one team sudo's as the other team and accidentally 
overwrites data - there is nothing stopping them from doing it). It depends 
what your org set up is, but let me know if there are any questions I can help 
with.

Ace


> On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
>
> We use a similar approach like Bolke mentioned with running multiple
> Airflow instances.
>
> I haven't read the Pandora article yet, but we have an Astronomer Open
> Edition (fully open source) that bundles similar tools like Prometheus,
> Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> looking to get a setup like that up and running quickly.
>
> https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
> https://github.com/astronomerio/astronomer
>
> *Taylor Edmiston*
>
>
>
> On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
>> Related blog post about multi-tenant Airflow deployment out of Pandora:
>> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>>
>> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin
>> wrote:
>>
>>> My suggestion would be to deploy airflow per project. You could even use
>>> airflow to manage your ci/cd pipeline.
>>>
>>> B.
>>>
>>> Sent from my iPhone
>>>
>>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>>>
>>>> People have been talking about namespacing DAGs in the past. I'd recommend
>>>> using tags (many to many) instead of categories/projects (one to many).
>>>>
>>>> It should be fairly easy to add this feature. One question is whether tags
>>>> are defined as code or in the UI/db only.
>>>>
>>>> Max
>>>>
>>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Basically the DAGs are created for a project purpose, so if I have many
>>>>> different projects, will the Airflow support the Project concept and
>>>>> organize them separately ?
>>>>>
>>>>> Is this a known requirement or any plan for this already ?
>>>>>
>>>>> Thanks,
>>>>> Song
>>>>>
>>>
>>




Re: About the project support in Airflow

2018-04-24 Thread Ace Haidrey
Looks neat Taylor!

And regarding the original question, going off of what Maxime and Bolke said, 
at Pandora, it made more sense for us to have an instance per team since each 
team has its own system user for prod and the instance can run all processes as 
that user. Alternatively you could have a super user that can sudo as those 
other system users, and have many teams on a single instance but that is a 
security concern (what if one team sudo's as the other team and accidentally 
overwrites data - there is nothing stopping them from doing it). It depends 
what your org set up is, but let me know if there are any questions I can help 
with.

Ace


> On Apr 24, 2018, at 1:16 PM, Taylor Edmiston  wrote:
> 
> We use a similar approach like Bolke mentioned with running multiple
> Airflow instances.
> 
> I haven't read the Pandora article yet, but we have an Astronomer Open
> Edition (fully open source) that bundles similar tools like Prometheus,
> Grafana, Celery, etc with Airflow and a Docker Compose file if you're
> looking to get a setup like that up and running quickly.
> 
> https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
> https://github.com/astronomerio/astronomer
> 
> *Taylor Edmiston*
> 
> 
> 
> On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
> 
>> Related blog post about multi-tenant Airflow deployment out of Pandora:
>> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>> 
>> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin 
>> wrote:
>> 
>>> My suggestion would be to deploy airflow per project. You could even use
>>> airflow to manage your ci/cd pipeline.
>>> 
>>> B.
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
>>>>
>>>> People have been talking about namespacing DAGs in the past. I'd recommend
>>>> using tags (many to many) instead of categories/projects (one to many).
>>>>
>>>> It should be fairly easy to add this feature. One question is whether tags
>>>> are defined as code or in the UI/db only.
>>>>
>>>> Max
>>>>
>>>>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Basically the DAGs are created for a project purpose, so if I have many
>>>>> different projects, will the Airflow support the Project concept and
>>>>> organize them separately ?
>>>>>
>>>>> Is this a known requirement or any plan for this already ?
>>>>>
>>>>> Thanks,
>>>>> Song
>>>>>
>>> 
>> 



Re: About the project support in Airflow

2018-04-24 Thread Taylor Edmiston
We use a similar approach like Bolke mentioned with running multiple
Airflow instances.

I haven't read the Pandora article yet, but we have an Astronomer Open
Edition (fully open source) that bundles similar tools like Prometheus,
Grafana, Celery, etc with Airflow and a Docker Compose file if you're
looking to get a setup like that up and running quickly.

https://github.com/astronomerio/astronomer/blob/master/examples/airflow-enterprise/docker-compose.yml
https://github.com/astronomerio/astronomer

*Taylor Edmiston*



On Tue, Apr 24, 2018 at 3:30 PM, Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Related blog post about multi-tenant Airflow deployment out of Pandora:
> https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee
>
> On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin 
> wrote:
>
> > My suggestion would be to deploy airflow per project. You could even use
> > airflow to manage your ci/cd pipeline.
> >
> > B.
> >
> > Sent from my iPhone
> >
> > > On 24 Apr 2018, at 18:33, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> > > >
> > > > People have been talking about namespacing DAGs in the past. I'd recommend
> > > > using tags (many to many) instead of categories/projects (one to many).
> > > >
> > > > It should be fairly easy to add this feature. One question is whether tags
> > > > are defined as code or in the UI/db only.
> > > >
> > > > Max
> > > >
> > > >> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> Basically the DAGs are created for a project purpose, so if I have many
> > > >> different projects, will the Airflow support the Project concept and
> > > >> organize them separately ?
> > >>
> > >> Is this a known requirement or any plan for this already ?
> > >>
> > >> Thanks,
> > >> Song
> > >>
> >
>


Re: About the project support in Airflow

2018-04-24 Thread Maxime Beauchemin
Related blog post about multi-tenant Airflow deployment out of Pandora:
https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee

On Tue, Apr 24, 2018 at 10:20 AM, Bolke de Bruin  wrote:

> My suggestion would be to deploy airflow per project. You could even use
> airflow to manage your ci/cd pipeline.
>
> B.
>
> Sent from my iPhone
>
> > On 24 Apr 2018, at 18:33, Maxime Beauchemin wrote:
> >
> > People have been talking about namespacing DAGs in the past. I'd recommend
> > using tags (many to many) instead of categories/projects (one to many).
> >
> > It should be fairly easy to add this feature. One question is whether tags
> > are defined as code or in the UI/db only.
> >
> > Max
> >
> >> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu  wrote:
> >>
> >> Hi,
> >>
> >> Basically the DAGs are created for a project purpose, so if I have many
> >> different projects, will the Airflow support the Project concept and
> >> organize them separately ?
> >>
> >> Is this a known requirement or any plan for this already ?
> >>
> >> Thanks,
> >> Song
> >>
>


Re: About the project support in Airflow

2018-04-24 Thread Bolke de Bruin
My suggestion would be to deploy airflow per project. You could even use 
airflow to manage your ci/cd pipeline. 

B.

Sent from my iPhone

> On 24 Apr 2018, at 18:33, Maxime Beauchemin  
> wrote:
> 
> People have been talking about namespacing DAGs in the past. I'd recommend
> using tags (many to many) instead of categories/projects (one to many).
> 
> It should be fairly easy to add this feature. One question is whether tags
> are defined as code or in the UI/db only.
> 
> Max
> 
>> On Tue, Apr 24, 2018 at 1:48 AM, Song Liu  wrote:
>> 
>> Hi,
>> 
>> Basically the DAGs are created for a project purpose, so if I have many
>> different projects, will the Airflow support the Project concept and
>> organize them separately ?
>> 
>> Is this a known requirement or any plan for this already ?
>> 
>> Thanks,
>> Song
>> 


Re: About the project support in Airflow

2018-04-24 Thread Maxime Beauchemin
People have been talking about namespacing DAGs in the past. I'd recommend
using tags (many to many) instead of categories/projects (one to many).

It should be fairly easy to add this feature. One question is whether tags
are defined as code or in the UI/db only.

Max
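
To illustrate the tags-versus-projects distinction above, here is a minimal
sketch in plain Python. It is not Airflow code and does not reflect any
Airflow API: the DAG ids, project names, tag names, and both helper functions
are hypothetical and exist only to show how the two relationships differ.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical data, invented for illustration only.
# One-to-many ("project"): each DAG belongs to exactly one project.
dag_project: Dict[str, str] = {
    "ingest_events": "analytics",
    "train_model": "ml",
    "daily_report": "analytics",
}

# Many-to-many ("tags"): each DAG may carry any number of tags.
dag_tags: Dict[str, Set[str]] = {
    "ingest_events": {"analytics", "ml"},
    "train_model": {"ml"},
    "daily_report": {"analytics", "finance"},
}


def dags_by_project(mapping: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert a DAG -> project mapping into project -> set of DAG ids."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for dag_id, project in mapping.items():
        groups[project].add(dag_id)
    return dict(groups)


def dags_by_tag(mapping: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert a DAG -> tags mapping into tag -> set of DAG ids."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for dag_id, tags in mapping.items():
        for tag in tags:
            groups[tag].add(dag_id)
    return dict(groups)
```

With projects, "ingest_events" can only be filed under one group; with tags,
the same DAG is listed under both "analytics" and "ml". Whether such a mapping
lives in the DAG file or only in the UI/db is the open question raised above,
and the sketch takes no position on it.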

On Tue, Apr 24, 2018 at 1:48 AM, Song Liu  wrote:

> Hi,
>
> Basically the DAGs are created for a project purpose, so if I have many
> different projects, will the Airflow support the Project concept and
> organize them separately ?
>
> Is this a known requirement or any plan for this already ?
>
> Thanks,
> Song
>