Re: [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of Airflow components

2024-04-05 Thread Mehta, Shubham
@Gabe - Thanks for chiming in on this important discussion. I hope other users 
who are looking for a multi-tenant deployment do the same. As Niko mentioned, 
the proposal is not to have a separate database per tenant, but rather to 
provide isolation through other means while still giving our users the benefit 
of shared resources. I think what you mentioned, "this allows for efficient use 
of resources while respecting the security needs of many organizations," is the 
key here. Some of us think that RBAC isolation is enough, which I believe is 
not true: RBAC isolation only guarantees UI separation and does nothing from a 
security point of view.

As an exercise to get more citable points on this, I went over last year's 
use-case sessions during the Airflow summit [1]. A few things I noted from 
there:
1. All users used the word "multi-tenant" to describe their setup for 
supporting multiple teams, not "multi-team" or anything else. In the end, this 
feature is meant to serve the needs of our users, not maintainers, and we 
should focus on naming that helps users understand it easily and intuitively. 
Additionally, as previously stated, the community has been using the term 
"multi-tenant" in many of our discussions [2, 3] and AIPs that have been widely 
shared and reviewed. Going back now, especially when it has not been made clear 
what a "true" multi-tenant system would look like or whether we will ever 
implement it, seems futile.

2. A few companies, such as Salesforce and Wealthsimple, talked about security 
aspects that are only possible with some form of database isolation. I'm not 
saying this is a must for every company; Snap called out during their talk (and 
during the recent virtual town hall) that database isolation is not a hard 
requirement for them.

I would recommend that others who would like more user perspective review the 
presentations from last year's Airflow Summit.

Based on this, I think some level of database isolation should be considered in 
the scope of this AIP. That said, during implementation we can always break it 
into different parts and do RBAC first to ease the development process. In the 
end, I hope we make the decision based on what's right for our users.

1. https://airflowsummit.org/sessions/2023/
2. https://github.com/apache/airflow/discussions/26602
3. https://github.com/apache/airflow/issues/317


Thanks
Shubham

Re: [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of Airflow components

2024-04-05 Thread Oliveira, Niko
Thanks for the reply Gabe! I'm glad you're interested in the topic and weighing 
in for your company.


> @Niko – If I misstated your intent, I hope you will clarify that for me. 
> Thanks!

The intent (DB isolation being critical) was there, but the approach you 
attributed to me is not actually what I'm proposing:

> 1.  A separate db per tenant, as Niko suggests.

I'm not suggesting a separate DB per tenant. AIP-67 and AIP-44 together already 
provide a solution for isolating users from each other at the DB level 
(essentially adding the DB API layer and then putting access control on that 
API). I'm just advocating that we _keep_ that in the scope of the AIP-67 
proposal, because I think that level of isolation is critical.

Cheers,
Niko




From: Gabe Schenz 
Sent: Friday, April 5, 2024 2:55:08 PM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] DRAFT AIP-67 Multi-tenant 
deployment of Airflow components

CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.



AVERTISSEMENT: Ce courrier électronique provient d’un expéditeur externe. Ne 
cliquez sur aucun lien et n’ouvrez aucune pièce jointe si vous ne pouvez pas 
confirmer l’identité de l’expéditeur et si vous n’êtes pas certain que le 
contenu ne présente aucun risque.



The outcome that Niko brings up is that any tenants should not be able to 
interact with metadata that is from other tenants within the Airflow 
environment.  If we focus on that as an outcome, what are the various options 
which would support that goal?


  1.  A separate db per tenant, as Niko suggests.
  2.  A separate schema within the metadata database
  3.  Applying row-based access control policies within the metadata db.
  4.  …?

For #3, postgres supports row-level security[1] which could be a sufficient 
means of separation between tenants.  I feel that an approach like this allows 
for efficient use of resources while respecting the security needs of many 
organizations. This seems simpler from an implementation perspective as well.

As a data engineer in an enterprise that has separate Airflow environments for 
each team, I feel that we would greatly benefit from the ability to have a 
single set of resources to manage while allowing tenants to have their own 
separate secrets (connections, variables) and their own access policies (object 
storage and other resources) within our cloud provider’s platform.

@Niko – If I misstated your intent, I hope you will clarify that for me. Thanks!

[1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html

Cheers,
Gabe

--
Gabe Schenz


From: Oliveira, Niko 
Date: Thursday, March 28, 2024 at 12:38 PM
To: dev@airflow.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of 
Airflow components
This Message originated outside your organization.

Hey folks, just some thoughts on the topics below:

1) I'm not too fussed about the naming. There has been many years of us 
branding this multitenancy (talks, townhalls, email chains, etc), so a lot of 
our users are already familiar with this name. I'm not sure we'll benefit much 
by changing it, rather we'll mostly just confuse people. But also a more 
accurate name is never a bad thing either.

2) I'm not sold on reducing the scope of the AIP in this way. I think the DB 
isolation is one of the most important pieces. We have spoken with many users 
and customers of MWAA who's top requirement of anything multi tenant is that in 
now way can users query or modify DB records for other tenants. You could put 
XCOM, variables, secrets, connections in different backends of course, but the 
serialized dags are still in the DB never mind dag run history and other 
related bits.

A 2 phased approach like Jarek mentioned could be okay (or in parallel if we 
have the people power to do so), but I'd still like to vote on the whole pack 
as it is rather than reduce the scope in this way.

Just my 2c


From: Jarek Potiuk 
Sent: Wednesday, March 27, 2024 6:04:14 AM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] DRAFT AIP-67 Multi-tenant 
deployment of Airflow components

CAUTION: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.



AVERTISSEMENT: Ce courrier électronique provient d’un expéditeur externe. Ne 
cliquez sur aucun lien et n’ouvrez aucune pièce jointe si vous ne pouvez pas 
confirmer l’identité de l’expéditeur et si vous n’êtes pas certain que le 
contenu ne présente aucun risque.



Hey Ash (and others who already commented). I hoped, that writing the
proposal down will finally make us all discuss (and eventually converge) on
what "multi-tenancy" means for Airflow and I am glad we have different

Re: [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of Airflow components

2024-04-05 Thread Gabe Schenz
The outcome that Niko brings up is that tenants should not be able to interact 
with metadata belonging to other tenants within the Airflow environment. If we 
focus on that as an outcome, what are the various options that would support 
that goal?


  1.  A separate DB per tenant, as Niko suggests.
  2.  A separate schema per tenant within the metadata database.
  3.  Applying row-based access control policies within the metadata DB.
  4.  …?

For #3, Postgres supports row-level security [1], which could be a sufficient 
means of separation between tenants. I feel that an approach like this allows 
for efficient use of resources while respecting the security needs of many 
organizations. It also seems simpler from an implementation perspective.
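As a rough illustration of what option #3 buys: with a row-level policy in place, every query is implicitly scoped to the current tenant, with no per-query changes in application code. The sketch below only mimics that effect using the Python standard library (real Postgres RLS is declarative, as the comment shows); the `dag_run`/`tenant_id` schema is a hypothetical stand-in:

```python
import sqlite3

# Toy emulation of option #3 (row-level policies), using stdlib sqlite3 so it
# runs anywhere. Real Postgres RLS is declarative, roughly:
#   ALTER TABLE dag_run ENABLE ROW LEVEL SECURITY;
#   CREATE POLICY tenant_rows ON dag_run
#       USING (tenant_id = current_setting('app.tenant'));
# Here we only mimic the effect: every query is forced through the tenant
# predicate. The dag_run/tenant_id schema is hypothetical, not Airflow's.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_run (tenant_id TEXT, dag_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO dag_run VALUES (?, ?, ?)",
    [("team-a", "etl_a", "success"), ("team-b", "etl_b", "failed")],
)

def runs_for(tenant: str) -> list:
    # The WHERE tenant_id = ? clause plays the role of the RLS USING clause:
    # callers cannot see rows outside their own tenant.
    return conn.execute(
        "SELECT dag_id, state FROM dag_run WHERE tenant_id = ?", (tenant,)
    ).fetchall()
```

With real RLS the database enforces the predicate even for ad-hoc queries, which is stronger than enforcing it in application code.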

As a data engineer in an enterprise that has separate Airflow environments for 
each team, I feel that we would greatly benefit from the ability to have a 
single set of resources to manage while allowing tenants to have their own 
separate secrets (connections, variables) and their own access policies (object 
storage and other resources) within our cloud provider’s platform.

@Niko – If I misstated your intent, I hope you will clarify that for me. Thanks!

[1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html

Cheers,
Gabe

--
Gabe Schenz


From: Oliveira, Niko 
Date: Thursday, March 28, 2024 at 12:38 PM
To: dev@airflow.apache.org 
Subject: [EXTERNAL] Re: [DISCUSS] DRAFT AIP-67 Multi-tenant deployment of 
Airflow components

Hey folks, just some thoughts on the topics below:

1) I'm not too fussed about the naming. We have spent many years branding this 
as multi-tenancy (talks, townhalls, email chains, etc.), so a lot of our users 
are already familiar with the name. I'm not sure we'll benefit much by changing 
it; rather, we'll mostly just confuse people. But a more accurate name is never 
a bad thing either.

2) I'm not sold on reducing the scope of the AIP in this way. I think the DB 
isolation is one of the most important pieces. We have spoken with many users 
and customers of MWAA whose top requirement for anything multi-tenant is that 
in no way can users query or modify DB records belonging to other tenants. You 
could put XCom, variables, secrets, and connections in different backends, of 
course, but the serialized DAGs are still in the DB, never mind DAG run history 
and other related bits.

A two-phased approach like Jarek mentioned could be okay (or in parallel if we 
have the people power to do so), but I'd still like to vote on the whole 
package as it is rather than reduce the scope in this way.

Just my 2c


From: Jarek Potiuk 
Sent: Wednesday, March 27, 2024 6:04:14 AM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] DRAFT AIP-67 Multi-tenant 
deployment of Airflow components




Hey Ash (and others who already commented). I hoped that writing the
proposal down would finally make us all discuss (and eventually converge on)
what "multi-tenancy" means for Airflow, and I am glad we have different
opinions - both about the naming and the scope of the change.

I think those are two things that are worth discussing a bit separately:

1) "Naming" - Is "multi-tenancy" the right name for "team separation"? I
think that one deserves something like a poll or eventually a vote to see
what the general perception of "tenancy", and particularly "multi-tenancy",
is. Over the last few weeks I've been talking to a number of parties
(including some offline talks) and it seems that "multi-tenancy" has a
completely different connotation for people depending on where they come
from. For most maintainers, what the proposal describes is "not
multi-tenancy"; for those who would like to serve different customers, it's
also not "multi-tenancy"; but for most of the "platform teams" I spoke to
(those who manage an internal deployment of Airflow for a big organisation
with multiple teams/departments), this is precisely multi-tenancy.

I am absolutely happy if we eventually use a different name; I am not at all
attached to multi-tenancy as a name, and I am happy with, for example,
"multi-team airflow deployment". I actually started to like that name, as it
more explicitly explains what the proposal is (and it still keeps the
multi-"t..."). And since there are so many people who get confused about
it, I think I would run a quick poll during the voting period where I'd add
"and how would you like it to be named" - and I can 

Re: [VOTE] Release Airflow 2.9.0 from 2.9.0rc2

2024-04-05 Thread Jed Cunningham
+1 (binding)

Checked reproducibility, signatures, checksums, licences. Used it with the
helm chart with a few different configs. All looks good!


Re: [DISCUSS] Proposal for adding Telemetry via Scarf

2024-04-05 Thread Michał Modras
My 2 cents: it must be possible to opt out, and preferably it should be
possible to deploy Airflow instances without bundling the telemetry library
dependencies. Other than that, I don't mind it being e.g. an optional provider.

On Wed, Apr 3, 2024, 22:42 Hussein Awala wrote:

> > I'd like to propose, that we start with collecting simple data with
> limited access: to all the PMC members. We can always expand it to
> Committers and then expand further to make it invite-only or setup
> exporting it to a DB like Postgres and have a publicly viewable dashboard.
>
> Looks like a good plan; we can discuss the export format when we decide to
> do it.
>
> On Wed, Apr 3, 2024 at 7:59 PM Kaxil Naik  wrote:
>
> > Yup, exactly.
> >
> > I believe this would definitely help us take early and informed
> decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped
> us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> ),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >>
> >
> >
> > Btw clarifying my own stance on the below; and let me know what you
> think @Hussein
> > Awala  : I'd like to propose, that we start with
> > collecting simple data with limited access: to all the PMC members. We
> can
> > always expand it to Committers and then expand further to make it
> > invite-only or setup exporting it to a DB like Postgres and have a
> > publicly viewable dashboard. It would be similar to an iterative software
> > development approach, since this will be the first time for us, as
> Airflow
> > PMC, to add such telemetry. This is of course just my opinion though :)
> >
> > Regarding the data, like I had mentioned in the email and I am glad
> others
> >> including you are on the same page that the data will be shared with all
> >> PMC members. The point about sharing it via website and newsletter was
> for
> >> the community — Airflow users. I don’t think anyone in the community
> (apart
> >> from the PMC members) would need raw data. And even if they need it, I’d
> >> say they should put effort and contribute to the Airflow project and
> become
> >> PMC members.
> >> To be clear: this telemetry data should help us, as Airflow PMC, to
> steer
> >> some of the decision making based on this data similar to how only PMC
> has
> >> a binding vote on the releases. [1] and this is similar to how Apache
> >> Superset does it too.
> >> [1]
> >> https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >
> >
> > On Wed, 3 Apr 2024 at 12:03, Pankaj Koti wrote:
> >
> >> +1 to introduce this.
> >>
> >> I believe this would definitely help us take early and informed
> decisions.
> >> E.g. Had we had this earlier, I believe it would have definitely helped
> us
> >> more for our past discussions like whether we should continue supporting
> >> MsSQL(https://lists.apache.org/thread/r06j306hldg03g2my1pd4nyjxg78b3h4
> ),
> >> similarly about the DaskExecutor (
> >> https://lists.apache.org/thread/ptwjf5g87lyl5476krt91bzfrm96pnb1), etc.
> >>
> >>
> >> Best regards,
> >>
> >> *Pankaj Koti*
> >> Senior Software Engineer (Airflow OSS Engineering team)
> >> Location: Pune, Maharashtra, India
> >> Timezone: Indian Standard Time (IST)
> >> Phone: +91 9730079985
> >>
> >>
> >> On Wed, Apr 3, 2024 at 2:44 PM Kaxil Naik  wrote:
> >>
> >> > Yup, I had added a link to scarf docs in the original email that
> >> referenced
> >> > opting out and we should even add an Airflow config that puts all
> >> config in
> >> > a single place. Without it we can’t be compliant to all the policies
> >> even
> >> > if we collectively ignore or are unaware of the importance of it.
> >> >
> >> > Regarding the data, like I had mentioned in the email and I am glad
> >> others
> >> > including you are on the same page that the data will be shared with
> all
> >> > PMC members. The point about sharing it via website and newsletter was
> >> for
> >> > the community — Airflow users. I don’t think anyone in the community
> >> (apart
> >> > from the PMC members) would need raw data. And even if they need it,
> I’d
> >> > say they should put effort and contribute to the Airflow project and
> >> become
> >> > PMC members.
> >> >
> >> > To be clear: this telemetry data should help us, as Airflow PMC, to
> >> steer
> >> > some of the decision making based on this data similar to how only PMC
> >> has
> >> > a binding vote on the releases. [1] and this is similar to how Apache
> >> > Superset does it too.
> >> >
> >> > [1]
> >> > https://www.apache.org/dev/pmc.html#what-is-a-pmc
> >> >
> >> >
> >> >
> >> > On Wed, 3 Apr 2024 at 00:05, Hussein Awala  wrote:
> >> >
> >> > > I mentioned opting out just to confirm its importance, and after
> >> checking
> >> 

Re: [DISCUSS] Consider disabling self-hosted runners for commiter PRs

2024-04-05 Thread Wei Lee
+1 for this. I have not yet had much chance to experience the job failures, but 
it won't harm us to test this out. Plus, it saves some of the cost.

Best,
Wei


Re: [DISCUSS] Consider disabling self-hosted runners for commiter PRs

2024-04-05 Thread Jarek Potiuk
Seeing no big "no's" - I will prepare and run the experiment - starting
some time next week, after we get 2.9.0 out - I do not want to break
anything there. In the meantime, preparatory PR to add "use self-hosted
runners" label is out https://github.com/apache/airflow/pull/38779

On Fri, Apr 5, 2024 at 4:21 PM Bishundeo, Rajeshwar
 wrote:

> +1 with trying this out. I agree with keeping the canary builds
> self-hosted in order to validate the usage for the PRs.
>
> -- Rajesh
>
>
> From: Jarek Potiuk 
> Reply-To: "dev@airflow.apache.org" 
> Date: Friday, April 5, 2024 at 8:36 AM
> To: "dev@airflow.apache.org" 
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Consider disabling
> self-hosted runners for commiter PRs
>
>
>
>
> Yeah. Valid concerns Hussein.
>
> And I am happy to share some more information on that. I did not want to
> put all of that in the original email, but I see that might be interesting
> for you and possibly others.
>
> I am closely following the numbers now. One of the reasons I am doing /
> proposing it now is that, finally (after almost 3 years of waiting), we
> have access to some metrics that we can check. As of last week I
> got access to the ASF metrics (
> https://issues.apache.org/jira/browse/INFRA-25662).
>
> I have access to "organisation"-level information. Infra does not want to
> open it to everyone - even to every member - but since I got very active
> and have been helping with a number of issues, I was granted access as an
> exception. I also saw a small dashboard that INFRA is preparing to open to
> everyone once they sort out the access, where we will be able to see
> per-project usage.
>
> Some stats that I can share (they asked not to share too much).
>
> From what I looked at I can tell that we are right now (the whole ASF
> organisation) safely below the total capacity. With a large margin - enough
> to handle spikes, but of course the growth of usage is there and if
> uncontrolled - we can again reach the same situation that triggered getting
> self-hosted runners a few years ago.
>
> Luckily - INFRA is getting it under control this time (and metrics will help).
> In the last INFRA newsletter, they announced some limitations that will
> apply to projects (effective as of the end of April) - so once those are
> followed, we should be "safe" from being impacted by others (i.e. the
> noisy-neighbour effect). Some of the projects (not Airflow (!)) were
> exceeding those so far and they will be capped - they will need to optimize
> their builds eventually.
>
> Those are the rules:
>
> * All workflows MUST have a job concurrency level less than or equal to
> 20. This means a workflow cannot have more than 20 jobs running at the same
> time across all matrices.
> * All workflows SHOULD have a job concurrency level less than or equal to
> 15. Just because 20 is the max, doesn't mean you should strive for 20.
> * The average number of minutes a project uses per calendar week MUST NOT
> exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200
> hours).
> * The average number of minutes a project uses in any consecutive five-day
> period MUST NOT exceed the equivalent of 30 full-time runners (216,000
> minutes, or 3,600 hours).
> * Projects whose builds consistently cross the maximum use limits will
> lose their access to GitHub Actions until they fix their build
> configurations.
>
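The runner-equivalent caps quoted above are easy to sanity-check with quick arithmetic (one "full-time runner" is a machine busy 24 hours a day):

```python
# Sanity-check the INFRA caps: convert "N full-time runners" into minutes.
MIN_PER_DAY = 24 * 60  # 1,440 minutes per runner per day

weekly_cap = 25 * MIN_PER_DAY * 7    # 25 runners over a calendar week
five_day_cap = 30 * MIN_PER_DAY * 5  # 30 runners over any 5-day window

print(weekly_cap, weekly_cap / 60)      # 252000 minutes, 4200.0 hours
print(five_day_cap, five_day_cap / 60)  # 216000 minutes, 3600.0 hours
```

The five-day figure matches the quoted 216,000 minutes / 3,600 hours exactly; the weekly figure comes out at 252,000 minutes, so the quoted 250,000 appears to be a rounded number.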
> Those numbers on their own do not tell much, but we can easily see what
> they mean when we put them side-by-side with "our" current numbers.
>
> * Currently - with all the "public" usage - we are at 8 full-time runners.
> This is after some of the changes I've done: with the recent changes I
> already moved a lot of the non-essential build components that do not
> require a lot of parallelism to public runners.
> * The 20/15 jobs limit is a bit artificial (not really enforceable at the
> workflow level) - but in our case, since I optimized most PRs to run just a
> subset of the tests, the average will be way below that. Whether you are a
> committer or not, regular PRs run a far smaller subset of the jobs than a
> full "canary" build. And for canary builds we should stay - at least for
> now - with self-hosted runners.
>
> Some back-of-the-envelope calculations of what might happen when we
> switch to "public" for everyone:
>
> Unfortunately, until we enable the experiment, I do not have an easy way
> to distinguish the "canary" runs from the "committer" runs, so these are
> partly guesses. But our self-hosted build time vs. public build time is ~ 20% more
> 

Re: [DISCUSS] Consider disabling self-hosted runners for commiter PRs

2024-04-05 Thread Bishundeo, Rajeshwar
+1 with trying this out. I agree with keeping the canary builds self-hosted in 
order to validate the usage for the PRs.

-- Rajesh


Re: [DISCUSS] Consider disabling self-hosted runners for committer PRs

2024-04-05 Thread Jarek Potiuk
Yeah. Valid concerns Hussein.

And I am happy to share some more information on that. I did not want to
put all of that in the original email, but I see that might be interesting
for you and possibly others.

I am closely following the numbers now. One of the reasons I am proposing
this now is that, after almost three years of waiting, we finally have
access to some metrics that we can check. As of last week I got access to
the ASF metrics (https://issues.apache.org/jira/browse/INFRA-25662).

I have access to "organisation"-level information. Infra does not want to
open it to everyone - not even to every member - but since I have been very
active and helping with a number of issues, I was granted access as an
exception. I also saw a small dashboard that INFRA is preparing to open to
everyone once they sort out access, where we will be able to see
per-project usage.

Here are some stats I can share (they asked me not to share too much):

From what I have looked at, I can tell that right now we (the whole ASF
organisation) are safely below the total capacity - with a large margin,
enough to handle spikes. But usage keeps growing, and if uncontrolled we
could again reach the situation that triggered getting self-hosted runners
a few years ago.

Luckily, INFRA is getting it under control this time (and metrics will
help). In the last INFRA newsletter, they announced some limitations that
will apply to projects (effective as of the end of April) - so once those
are followed, we should be "safe" from being impacted by others (i.e. the
noisy-neighbour effect). Some projects (not Airflow!) have been exceeding
those limits so far and will be capped - they will need to optimize their
builds eventually.

Those are the rules:

* All workflows MUST have a job concurrency level less than or equal to 20.
This means a workflow cannot have more than 20 jobs running at the same
time across all matrices.
* All workflows SHOULD have a job concurrency level less than or equal to
15. Just because 20 is the max, doesn't mean you should strive for 20.
* The average number of minutes a project uses per calendar week MUST NOT
exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200
hours).
* The average number of minutes a project uses in any consecutive five-day
period MUST NOT exceed the equivalent of 30 full-time runners (216,000
minutes, or 3,600 hours).
* Projects whose builds consistently cross the maximum use limits will lose
their access to GitHub Actions until they fix their build configurations.
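As a rough, unofficial illustration of how these caps compose: the constants
below are copied from the rules as quoted, while the checker function and the
sample figures are my own (hypothetical, not an INFRA tool):

```python
# Back-of-envelope checker for the INFRA usage caps quoted above.
# NOT an official INFRA tool - just an illustration of the rules.

WEEKLY_CAP_MINUTES = 250_000    # hard weekly cap (~25 full-time runners)
FIVE_DAY_CAP_MINUTES = 216_000  # hard cap for any 5 consecutive days (~30 FT runners)
MAX_CONCURRENT_JOBS = 20        # hard per-workflow job concurrency cap
SOFT_CONCURRENT_JOBS = 15       # recommended per-workflow job concurrency

def within_caps(weekly_minutes: int, five_day_minutes: int, peak_jobs: int) -> bool:
    """True if the given usage stays inside all of the hard limits."""
    return (weekly_minutes <= WEEKLY_CAP_MINUTES
            and five_day_minutes <= FIVE_DAY_CAP_MINUTES
            and peak_jobs <= MAX_CONCURRENT_JOBS)

# One "full-time runner" is a runner busy 24/7, i.e. 1440 minutes/day.
# Airflow's ~8 FT runners of public usage (mentioned below) translate to:
airflow_weekly = 8 * 7 * 1440    # 80,640 minutes/week
airflow_five_day = 8 * 5 * 1440  # 57,600 minutes in 5 days
print(within_caps(airflow_weekly, airflow_five_day, peak_jobs=SOFT_CONCURRENT_JOBS))
```

At ~8 full-time runners of usage, Airflow is comfortably inside every one of
the quoted limits.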

Those numbers on their own do not tell much, but we can easily see what
they mean when we put them side by side with "our" current numbers.

* Currently - with all the "public" usage - we are at 8 full-time runners.
This is after some of the changes I have already made: I recently moved a
lot of the non-essential build components that do not require much
parallelism to public runners.
* The 20/15 jobs limit is a bit artificial (not really enforceable at the
workflow level) - but in our case, since I optimized most PRs to run just a
subset of the tests, the average will be way below that. Whether you are a
committer or not, regular PRs run a far smaller subset of jobs than a full
"canary" build. And for canary builds we should stay - at least for now -
with self-hosted runners.

Some of the back-of-the envelope calculations of what might happen when we
switch to "public" for everyone:

Unfortunately, until we enable the experiment, I do not have an easy way to
distinguish the "canary" from the "committer" runs, so these are guesses.
But our self-hosted build time is ~20% more than our public build time
(100,000 minutes vs. 80,000 minutes this month) - see the attached
screenshot for the current month.
As you can see, image builds were already moved to public runners for
everyone about two weeks ago, so that will not change.

Taking into account that the self-hosted runners are ~1.7x faster, this
means we currently use ~2x more self-hosted time than public. We can assume
that 50% of that is committer PRs and "canary" builds are the second half
(this sounds safe because canary builds use far more resources, even if
committers run many more PRs than merges).
So by moving committer builds to public runners, we will likely increase
our public time 2x (from 8 FT runners to 16 FT runners) - way below the
25-FT-runner cap from INFRA. Even if we moved all canary builds there, we
should be at most at ~24 FTs, which is still below the limits but
dangerously close. That is why I want to keep canary builds self-hosted
until we get some clarity on the impact of moving the PRs.

We will see the final numbers when we move, but I think we are pretty safe
within the limits.
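The projection above can be sketched numerically. All inputs are the
approximate figures from this email (the 1.7x speed factor, ~100,000 vs.
~80,000 minutes, 8 full-time runners), so the outputs are rough estimates
rather than measurements:

```python
# Rough projection of public-runner usage if committer (and later canary)
# builds move off self-hosted runners. Inputs are the approximate figures
# quoted in the email above.

SPEEDUP = 1.7                  # self-hosted runners are ~1.7x faster than public
self_hosted_minutes = 100_000  # this month's self-hosted minutes (approx.)
public_minutes = 80_000        # this month's public minutes (approx.)
current_public_ft = 8          # current public usage, in full-time runners

# The same self-hosted work would take ~1.7x longer on public runners,
# so express it in public-runner minutes first.
self_hosted_as_public = self_hosted_minutes * SPEEDUP  # ~170,000
ratio = self_hosted_as_public / public_minutes         # ~2.1 ("~2x" above)

# Assume committer PRs and canary builds each account for half of that.
after_committers_move = current_public_ft * (1 + ratio / 2)
after_everything_moves = current_public_ft * (1 + ratio)

print(ratio, after_committers_move, after_everything_moves)
```

This lands at roughly 16 and 25 FT runners - the same ballpark as the ~16
and ~24 figures in the email, and right at the 25-FT weekly cap in the
everything-moves case, which is exactly why keeping canary builds
self-hosted for now is the cautious choice.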

J.



Re: [DISCUSS] Consider disabling self-hosted runners for committer PRs

2024-04-05 Thread Aritra Basu
I'm +0. Definitely don't see any issue with seeing the changes.

--
Regards,
Aritra Basu


Re: [DISCUSS] Consider disabling self-hosted runners for committer PRs

2024-04-05 Thread Hussein Awala
Although 900 runners seems like a lot, they are shared among the Apache
organization's 2.2k repositories. Of course, only a few of them are active
(let's say 50), and some of those use an external CI tool for big jobs
(e.g. Kafka uses Jenkins, Hudi uses Azure Pipelines), but we have other
very active repositories based entirely on GHA - for example Iceberg,
Spark, Superset, ...

I haven't found the ASF runners metrics dashboard to check the max
concurrency and the max queued time during peak hours, but I'm sure that
moving Airflow committers' CI jobs to public runners will put some pressure
on these runners, especially since these committers are the most active
contributors to Airflow, and the 35 self-hosted runners (with 8 CPUs and 64
GB RAM) are used almost all the time - so we can say that we will need
around 70 ASF runners to run the same jobs.
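The ~70-runner figure can be read as a simple CPU-based equivalence. The 2:1
conversion below is my own assumption, derived from the quoted machine sizes
(8-CPU self-hosted vs. 4-CPU public), not something stated by INFRA:

```python
# Hypothetical capacity conversion behind the "~70 runners" estimate.
# Assumes one 8-CPU self-hosted runner does roughly the work of two
# 4-CPU public runners (a CPU-count heuristic, not a benchmark).

SELF_HOSTED_RUNNERS = 35
SELF_HOSTED_CPUS = 8
PUBLIC_RUNNER_CPUS = 4

public_equivalents = SELF_HOSTED_RUNNERS * (SELF_HOSTED_CPUS // PUBLIC_RUNNER_CPUS)
print(public_equivalents)  # 70 public runners to match fully-busy self-hosted capacity
```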

There is no harm in testing and deciding after 2-3 weeks.

We also need to find a way to let the infra team help us solve the
connectivity problem with the ARC runners.

+1 for testing what you propose.


Re: [DISCUSS] Consider disabling self-hosted runners for committer PRs

2024-04-05 Thread Amogh Desai
+1 I like the idea.
Looking forward to seeing the difference.

Thanks & Regards,
Amogh Desai


On Fri, Apr 5, 2024 at 3:54 AM Ferruzzi, Dennis 
wrote:

> Interested in seeing the difference, +1
>
>
>  - ferruzzi
>
>
> 
> From: Oliveira, Niko 
> Sent: Thursday, April 4, 2024 2:00 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Consider disabling
> self-hosted runners for committer PRs
>
>
> +1, I'd love to see this as well.
>
> In the past, stability and long queue times of PR builds have been very
> frustrating. I'm not 100% sure this is due to using self hosted runners,
> since 35 queue depth (to my mind) should be plenty. But something about
> that setup has never seemed quite right to me with queuing. Switching to
> public runners for a while to experiment would be great to see if it
> improves.
>
> 
> From: Pankaj Koti 
> Sent: Thursday, April 4, 2024 12:41:02 PM
> To: dev@airflow.apache.org
> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Consider disabling
> self-hosted runners for committer PRs
>
>
> +1 from me to this idea.
>
> Sounds very reasonable to me.
> At times, my experience has been better with public runners instead of
> self-hosted runners :)
>
> And as already mentioned in the discussion, I think having the ability to
> apply the label "use-self-hosted-runners" at critical times would be nice
> to have too.
>
>
> On Fri, 5 Apr 2024, 00:50 Jarek Potiuk,  wrote:
>
> > Hello everyone,
> >
> > TL;DR With some recent changes in GitHub Actions, and the fact that the
> > ASF has a lot of donated runners available for all builds, I think we
> > could experiment with disabling "self-hosted" runners for committer
> > builds.
> >
> > Our self-hosted runners have been extremely helpful (and we should again
> > thank Amazon and Astronomer for donating credits/money for those) back
> > when the GitHub public runners were far less powerful and we had fewer
> > of them available for ASF projects. This saved us a LOT of trouble when
> > there was contention between ASF projects.
> >
> > But as of recently both limitations have been largely removed:
> >
> > * ASF has 900 public runners donated by GitHub to all projects
> > * Those public runners now (as of January) have, for open-source
> > projects, 4 CPUs and 16 GB of memory -
> >
> >
> https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/
> >
> >
> > While they are not as powerful as our self-hosted runners, the
> > parallelism we utilise brings those builds into not-that-bad shape
> > compared to self-hosted runners. Typical times for the complete set of
> > tests are now ~20 min on public runners and ~14 min on self-hosted ones.
> >
> > But this is not the only factor - I think committers experience "Job
> > failed" on self-hosted runners much more often than non-committers (the
> > stability of our solution is not the best, and we are using cheaper spot
> > instances). Plus, we limit the total number of self-hosted runners (35)
> > - so if several committers submit a few PRs while a canary build is
> > running, the jobs will wait until runners are available.
> >
> > And of course it costs the credits/money of sponsors which we could use
> for
> > other things.
> >
> > I have recently gained access to GitHub Actions metrics - and while the
> > ASF is keeping an eye on things and has started limiting the number of
> > parallel jobs that project workflows can run, it looks like even if all
> > committer runs are added to the public runners, we will still generate
> > far lower usage than the limits allow, and far lower than some other
> > projects (which I will not name here). I have access to the metrics, so
> > I can monitor our usage and react.
> >
> > I think possibly - if we switch committers to "public" runners by
> > default - the experience will not be much worse for them (and sometimes
> > even better -

Re: [VOTE] Release Airflow 2.9.0 from 2.9.0rc2

2024-04-05 Thread Amogh Desai
+1 (non-binding)

Tested a few example DAGs and verified that my changes work as expected.
It looks good to me.

Thanks & Regards,
Amogh Desai


On Fri, Apr 5, 2024 at 4:04 AM Jarek Potiuk  wrote:

> +1 (binding) - checked reproducibility, signatures, checksums, licences -
> all good. Installed it, ran a few DAGs, clicked through a number of
> screens. Also verified the final package - it looks good with the right
> FAB >=1.0.2 dependency.
>
> On Thu, Apr 4, 2024 at 11:25 PM Ephraim Anierobi <
> ephraimanier...@apache.org>
> wrote:
>
> > Hey fellow Airflowers,
> >
> > I have cut Airflow 2.9.0rc2. This email is calling a vote on the release,
> > which will last at least 52 hours, from Thursday, April 4, 2024, at 9:00
> pm
> > UTC
> > until Sunday, April 7, 2024, at 1:00 am UTC
> > <
> >
> https://www.timeanddate.com/worldclock/fixedtime.html?msg=8=20240407T0100=1440
> > >,
> > and until 3 binding +1 votes have been received.
> >
> > Consider this my (binding) +1.
> >
> > Airflow 2.9.0rc2 is available at:
> > https://dist.apache.org/repos/dist/dev/airflow/2.9.0rc2/
> >
> > *apache-airflow-2.9.0-source.tar.gz* is a source release that comes with
> > INSTALL instructions.
> > *apache-airflow-2.9.0.tar.gz* is the binary Python "sdist" release.
> > *apache_airflow-2.9.0-py3-none-any.whl* is the binary Python wheel
> "binary"
> > release.
> >
> > Public keys are available at:
> > https://dist.apache.org/repos/dist/release/airflow/KEYS
> >
> > Please vote accordingly:
> >
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove with the reason
> >
> > Only votes from PMC members are binding, but all members of the community
> > are encouraged to test the release and vote with "(non-binding)".
> >
> > The test procedure for PMC members is described in:
> >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_AIRFLOW.md#verify-the-release-candidate-by-pmc-members
> >
> > The test procedure for Contributors who would like to test this RC is
> > described in:
> >
> >
> https://github.com/apache/airflow/blob/main/dev/README_RELEASE_AIRFLOW.md#verify-the-release-candidate-by-contributors
> >
> >
> > Please note that the version number excludes the `rcX` string, so it's
> now
> > simply 2.9.0. This will allow us to rename the artifact without modifying
> > the artifact checksums when we actually release.
> >
> > Release Notes:
> > https://github.com/apache/airflow/blob/2.9.0rc2/RELEASE_NOTES.rst
> >
> > For information on what goes into a release please see:
> >
> >
> https://github.com/apache/airflow/blob/main/dev/WHAT_GOES_INTO_THE_NEXT_RELEASE.md
> >
> > Changes since 2.9.0rc1:
> >
> > *Bug Fixes*:
> > - Fix decryption of trigger kwargs when downgrading (#38743)
> > - Fix grid header rendering (#38720)
> >
> > *Doc-only Change*:
> > - Improve timetable documentation (#38505)
> > - Reorder OpenAPI Spec tags alphabetically (#38717)
> >
> > Cheers,
> > Ephraim
> >
>