Sounds like a new ASF project: K8S-CI-AAS? :P   Joking aside, I'm all for 
consolidating effort if they have a solution that works for us and are willing 
to share the fruits of their labour.


________________________________
From: Jarek Potiuk <[email protected]>
Sent: Sunday, December 18, 2022 8:03 AM
To: [email protected]
Subject: [EXTERNAL] [PROPOSAL] Switching our CI runners to K8S controller 
open-sourced for Apache Arrow



Hello everyone,

TL;DR: I wanted to make a proposal to move our CI runners from our own "custom" 
implementation developed mostly by Ash and based on VMs to a newly released 
Auto-scaling K8S controller that was developed for Apache Arrow by Voltron Data.

I was in contact with Jacob Wujciak, who led the effort in Arrow, and we also 
discussed it at the latest ASF build meeting (BTW, Jacob was just approved as 
an Arrow committer). I think they have a solid and proven solution that is very 
well documented and works together with the ASF GitHub application that was 
implemented to distribute the ephemeral tokens needed to run the runners. We 
would likely keep using Ash's runner for security, but this can be easily done 
in the solution from Voltron Data.

Why would we want to do it?

We have wanted to switch away from our implementation for quite some time 
already, as what we have is somewhat brittle and rather complex. It includes 
multiple AWS-specific technologies, and it is our own code that we have to 
maintain in https://github.com/apache/airflow-ci-infra. In fact, our use of 
AWS-specific technologies was one of the reasons we could not easily use Google 
Cloud Platform credits for CI, even when they were offered to us in the past.

I am afraid only Ash knows most of the ins and outs of the scaling code (though 
both Kaxil and I were able to fix some things, and I added a lot to the 
Packer-based installation).

While the current solution is very stable, we sometimes get "job not started" 
problems and sometimes have to manually "push" the auto-scaling to work. A 
K8S-based auto-scaling controller is as good as it gets, and we have a good 
relationship with the Arrow team and Jacob, so we can expect decent help and 
cooperation. They will also implement it in a setup very similar to ours (with 
ASF tokens), so our use case will be handled well. Also, choosing a K8S 
controller makes it easy to move between clouds, or even to run on multiple 
clouds.

The discussion about it on the Arrow devlist:

https://lists.apache.org/thread/mskpqwpdq65t1wpj4f5klfq9217ljodw

If this seems like a good idea, I will likely work on it around the end of the 
year. If anyone would like to help with it, I will be more than happy for 
others to join me. Volunteers are most welcome, so that we have more hands and 
eyes knowledgeable about the setup.

J.



