Some update on my GitLab experiences so far.

TL;DR: I think the POC has shown that we can fairly easily replicate the CI in GitLab + Kubernetes. I think I can say it generally works. I can plug it in for master/v1-10-test builds in the main Airflow project for a few weeks to see how it is doing (while I am on holidays), and once we see it running and get the support for PRs from GitLab, we can switch to it.

What do you think? Should I call a vote or just try to set it up?

Some details: I managed to get fully working builds in GitLab CI + Kubernetes, without the Kubernetes-specific tests yet, but this should be rather easy with kind (looking at it next):

- Working example here - you can take a look and compare the UI and how it is to navigate, compared to Travis etc.: https://gitlab.com/Jarek.Potiuk/airflow/pipelines/74625817
- Per job it is a bit slower than Travis so far (still around 35 minutes in total), but I plan to optimise it further. I can play with memory/CPU settings of individual workers (got some reasonable values now), and I can use local SSD disk as Docker storage/logs/etc.
- I got an approval for a 72 vCPU quota (up from the initial 24) - that should let us run 3 builds in parallel, independently from each other.
- I managed to get preemptible nodes working (we have a built-in retry mechanism in GitLab to handle system failures like that).
- Current spending with > 120 builds is 40 USD. We should be way below 500 USD/month according to my back-of-the-envelope calculations. Likely well below.
- The current setup does not use GCR as cache and Kaniko as I originally planned. GCR would require custom authentication (and easy-to-steal secrets) and Kaniko does not yet handle multi-stage builds well (cache does not work: https://github.com/GoogleContainerTools/kaniko/issues/682). I updated https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-23+Migrate+out+of+Travis+CI to reflect that.
- We only use GCR as a mirror of DockerHub, so that we can have reliable downloads that do not depend on DockerHub's stability (it has problems sometimes).
- All in all, it's GCP-independent. It could be run in any Kubernetes cluster (some optimisations, like local volume mounting for the Docker engine, might have GCP-specific assumptions, but should be generally replicable).
- You can take a look at the current source code in https://github.com/potiuk/airflow/commits/test-gitlab-ci
- There will be some updates (I will get rid of the custom builder Docker, simplify it a bit and implement Kubernetes tests) - it's mostly some cleanups + removal of Travis-specific variables + gitlab.ci yaml with job definitions.

J.

On Wed, Jul 31, 2019 at 10:57 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> So GitLab is already working on automatically running builds for PRs :).
>
> Kamil got involved and will be our advocate on it:
> https://gitlab.com/gitlab-org/gitlab-ce/issues/65139
>
> J.
>
> Principal Software Engineer
> Phone: +48660796129
>
> On Fri, Jul 26, 2019, 18:12 Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
>> Update: I added an appropriate comment in the GitLab CI issue about PRs and
>> we are getting the attention of Jason Lenny - Director of Product Management @
>> GitLab. Let's hope they prioritise it quickly enough.
>>
>> Speaking of potential complexity/maintenance - in order to alleviate any
>> maintenance worries, I am thinking about setting up the whole system on GitLab
>> CI + GKE and running it in parallel to Travis for quite some time (even
>> months) so that we can switch at any time. Then we will be able to tune
>> it according to real use cases and compare the experience of both systems.
>>
>> Also, I am going on holidays in two weeks and I will make sure that there
>> will be someone with GitLab + Kubernetes experience (from my company) who
>> can take over and make sure there will be no problems. However, I am quite
>> confident :D that nothing is going to happen while I am away. I would also
>> invite any committers who would like to join the project and
>> GitLab instance (once I set up the POC) to learn and see how easy it is and how
>> maintenance-free it is going to be.
>>
>> J.
>>
>> On Fri, Jul 26, 2019 at 2:56 PM Kamil Breguła <kamil.breg...@polidea.com>
>> wrote:
>>
>>> GKE and its own CI will allow us to solve other problems - building
>>> and publishing documentation from the master branch. Currently,
>>> building is done using the RTD service. Unfortunately, our project is
>>> too large and often the documentation is not built properly.
>>> https://readthedocs.org/projects/airflow/builds/
>>> We should think about another way to build documentation. In the ideal
>>> world, building documentation should use the same environment as
>>> checking documentation on CI. Adding this step to Travis can further
>>> reduce our development opportunities.
>>> Discussion on Slack about it:
>>> https://apache-airflow.slack.com/archives/CJ1LVREHX/p1561756652021900
>>>
>>> It is worth thinking about the fact that our project will soon have a
>>> website and our documentation will also be available in many
>>> languages. Currently, talks are taking place with the design studio
>>> and developers who can make these websites ;-)
>>>
>>> https://lists.apache.org/thread.html/982c7baa06742ad722f2baa0db53ad99aea6c26b14b7d6d4aa522677@%3Cdev.airflow.apache.org%3E
>>> We should provide an environment that will allow you to build a
>>> website and documentation. At best, these tasks should be combined. I
>>> hope that we will be able to create a website that will be a real
>>> support for the community on current events, so it will be updated
>>> frequently.
>>>
>>> It seems to me that the project will grow. If we now have problems
>>> with Travis, then the significance of these problems in the future can
>>> only grow. Now we have a chance to provide a stable infrastructure for
>>> the project for a long time.
>>>
>>> I would like to share another situation which was not pleasant for me.
>>> Recently I wanted to send >10 PRs, but because of Travis, I had to wait
>>> for the weekend to send changes.
>>> If I had sent my changes during the week,
>>> I would have blocked the queue for a few hours. Although I did it over the
>>> weekend, I got the message that the queue is blocked on Travis by my
>>> jobs.
>>>
>>> On Tue, Jul 23, 2019 at 6:12 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>> wrote:
>>> >
>>> > Hello Everyone,
>>> >
>>> > I prepared a short doc where I described the general architecture of the
>>> > solution I imagine we can deploy fairly quickly - having GitLab CI support
>>> > and Google-provided funding for GCP resources.
>>> >
>>> > I am going to start working on a Proof-Of-Concept soon, but before I start
>>> > doing it, I would like to get some comments and opinions on the proposed
>>> > approach. I discussed the basic approach with my friend Kamil, who works at
>>> > GitLab and is a CI maintainer, and this is what we think will be
>>> > achievable in a fairly short time.
>>> >
>>> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-23+Migrate+out+of+Travis+CI
>>> >
>>> > I am happy to discuss details and make changes to the proposal - we can
>>> > discuss it here or as comments in the document.
>>> >
>>> > Let's see what people think about it, and if we get to some consensus we
>>> > might want to cast a vote (or maybe go via lazy consensus, as this is
>>> > something we should have rather quickly).
>>> >
>>> > Looking forward to your comments!
>>> >
>>> > J.
>>> >
>>> > --
>>> >
>>> > Jarek Potiuk
>>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> >
>>> > M: +48 660 796 129 <+48660796129>
>>> > [image: Polidea] <https://www.polidea.com/>
>>>
>>

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer
M: +48 660 796 129 <+48660796129>
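
The back-of-the-envelope numbers from the top of this message can be sanity-checked with a short Python sketch. It uses only the figures quoted there (40 USD for a bit more than 120 builds, a 500 USD/month ceiling, a 72 vCPU quota shared by 3 parallel builds). The function and key names are hypothetical, and it assumes the cost per build stays roughly constant and the quota is split evenly, neither of which the message itself claims.

```python
def estimate_ci_budget(spend_usd, builds, monthly_budget_usd, vcpu_quota, parallel_builds):
    """Rough CI cost figures.

    Assumption (not from the message): cost per build stays constant
    and the vCPU quota is split evenly between parallel builds.
    """
    # The message says "> 120 builds", so this is an upper bound on cost per build.
    cost_per_build = spend_usd / builds
    # Same as budget / cost_per_build, kept in integers to avoid float rounding.
    builds_within_budget = monthly_budget_usd * builds // spend_usd
    # Even split of the vCPU quota across builds running at the same time.
    vcpus_per_build = vcpu_quota // parallel_builds
    return {
        "cost_per_build_usd": round(cost_per_build, 2),
        "builds_within_budget": builds_within_budget,
        "vcpus_per_build": vcpus_per_build,
    }


# Figures quoted in the message; all inputs are whole numbers.
estimate = estimate_ci_budget(
    spend_usd=40,
    builds=120,
    monthly_budget_usd=500,
    vcpu_quota=72,
    parallel_builds=3,
)
print(estimate)
```

With these inputs, a build costs at most roughly 0.33 USD, so 500 USD/month would cover at least 1500 builds, and each of the 3 parallel builds gets 24 vCPUs. The actual monthly build volume is not stated in the thread, so this only shows how much headroom the 500 USD figure leaves.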