Great work on this, I know how much time you dedicated to it.

Regards,
Kaxil

On Tue, Feb 9, 2021 at 1:40 PM Ash Berlin-Taylor <[email protected]> wrote:

> Hi everyone.
>
> After a good two weeks of playing whack-a-mole with bugs, I have finally
> merged https://github.com/apache/airflow/pull/13730 which means that
> *some* builds now run on machines under our control.
>
> The biggest differences this will make are that 1) we won't be stuck in a
> queue behind other ASF projects waiting for our "slot", and 2) builds
> should also be a bit faster now due to running most of the build on tmpfs.
>
> I will do a more in-depth write-up soon, but the rough architecture is:
>
> - A GitHub application that receives events and, whenever* a check-run is
>   created, posts to:
> - An AWS Lambda function (via API Gateway) that checks whether there is
>   an idle runner already
> - An ASG that configures r5a.xlarge instances with tmpfs in "interesting"
>   places (docker store, tmp dirs etc.)
> - Some clever processes on the instance that set/clear ScaleInProtection
>   so that running jobs don't get killed, and emit a custom CloudWatch
>   metric
> - A CloudWatch alarm to scale down the ASG when nodes are idle
> - A paid-for Docker Hub user on these machines to avoid hitting pull
>   limits.
>
> The major downside is that, due to security concerns, builds for
> non-committers/PMC members still run on the public queue. However, the
> "build image" step for everyone now runs on our machines, so everyone
> should benefit a bit.
>
> I do expect a bit of fallout from this, so I will be monitoring the
> Actions queue, but if there are any problems or issues let me know (here,
> or on Slack).
>
> -ash
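
For readers following the thread, the Lambda step in the architecture quoted above can be sketched roughly as below. This is only an illustration of the decision "is there an idle runner already, or do we need another instance?", not the code from the PR. The names `Runner`, `desired_capacity`, `current` and `max_size` are hypothetical, and the sketch assumes the GitHub webhook payload carries an "action" field and a "check_run" object. A real handler would also have to list runners and update the ASG through the relevant APIs, which is left out here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Runner:
    """A self-hosted runner as seen by the (hypothetical) handler."""
    name: str
    busy: bool


def desired_capacity(event: dict, runners: List[Runner],
                     current: int, max_size: int) -> int:
    """Return the ASG capacity to request after a webhook event.

    Assumption: only newly created check runs should trigger scaling.
    """
    # Ignore anything that is not a "check_run created" event.
    if event.get("action") != "created" or "check_run" not in event:
        return current
    # An idle runner can pick the job up, so no new instance is needed.
    if any(not r.busy for r in runners):
        return current
    # All runners are busy: ask for one more, capped at the ASG maximum.
    return min(current + 1, max_size)


if __name__ == "__main__":
    event = {"action": "created", "check_run": {"name": "build"}}
    busy_fleet = [Runner("r1", busy=True), Runner("r2", busy=True)]
    print(desired_capacity(event, busy_fleet, current=2, max_size=10))
```

Keeping the decision in a pure function like this means it can be tested without touching AWS; scaling down is left to the CloudWatch alarm described in the quoted message.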
