Hi Jarek. sounds good to me. I am happy to help you as much as I can with it.
BR Tobiasz On Thu, Oct 22, 2020 at 9:06 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote: > *TLDR; I thought about it a bit and I have a proposal on how to solve it > even better - one that can be implemented over the weekend (I volunteer :) > ) and that can be very easily shared and adopted by the other ASF projects > so that we all collectively decrease the strain on Github Actions. * > > This is in parallel to our efforts on having self-hosted workers of > course, but I think it will be needed anyway. Let me put it in a bit of > context > > *Problem statement: * > > * the root cause of the problem is that we are competing with many other > projects of ASF for the 180 jobs. I have started the discussion in > bui...@apache.org about this: > https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E > and it's clear all ASF projects using GA have the same problem and compete > against each other for the jobs. > * if we decrease the strain on our side, this is not solving the problem > long term. We keep on doing it already, and we already decrease a lot of > strain, but other projects from ASF increase their strain in the meantime > (Apache Beam, Skywalking, and few other projects are becoming heavy GitHub > Actions users). > * in all the projects that I looked at, we have the same root cause. > Matrix strategy of builds causes enormous strain on Github Actions if the > whole matrix is run for all PRs. We are going to make it works sustainably > only if we come up with an easy solution, that can be applied to all those > projects. > * I think the comment-based PR triggering process is complex and > cumbersome to follow. It puts a LOT more effort on the committers because > they not only have to review and comment on the PRs but also make decisions > that those PRs are ready for "full build". This is a lot of unnecessary > effort and complicated process that many of the ASF projects will not like > to adopt > > *Proposed Solution:* > > *Add an easy way to limit the matrix strategy to one "default" combo for > PRs THAT HAVE NOT BEEN APPROVED BY COMMITTERS YET* > > This can be easily done with Github Actions workflows - no need to write a > bot for this. > > *Some details:* > > * Custom GitHub Action (generic) that checks if the PRs are approved by > Committers (and no "disapprovals"). The action would produce an output -> > "Approved", "Not approved". The output could be used to determine the > matrix strategy scope (in our case we already have support for dynamic > matrix strategy that I added a few weeks ago - so it's just a matter of > wiring the output in). > * Very small workflow with the same GithubAction run on > "pull_request_target" event. That workflow would effective "observe" the > PR, and when the status changes from "not approved" to "approved", it > triggers a PR build (with the "full matrix strategy" this time because the > PR will be already approved). This seems to be entirely possible. This > "pull_request_target" workflow - similarly to "workflow_run" runs with a > "write" access token and uses a "main branch" workflow version and it could > easily trigger a rerun of the last PR build in such case. > > *Benefits:* > > - I think I could write such an action over the coming weekend (happy to > collaborate with anyone on that). I will first search if someone has done > something similar of course because maybe it can be done faster this way, > but I am quite confident after writing my > https://github.com/potiuk/cancel-workflow-runs which we already use to > limit the strain that it is doable in a day/two > - no need to change the process we have - we continue working as we did > and simply "approved" PRs will be the full matrix strategy ones but the > "not-approved-ones" will run a limited version of the checks. > - no way to accidentally submit a breaking PR - when the committer > approves the PR that has not been approved before, the PR build will be > re-run with the "full matrix strategy" and not mergeable until it finishes > - last-but-not-least: we can propose (and help) other ASF projects to use > the action in their own GitHub Actions. It will not be changing anyone's > process - which makes it super-easy to adopt and I can even turn it into a > "recommended solution" by Apache Infra - similarly as Airflow's CI > architecture is a recommended solution already for the integration of GA > with DockerHub > https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub > > WDYT? > > J > > > > On Fri, Oct 2, 2020 at 7:59 PM Jarek Potiuk <jarek.pot...@polidea.com> > wrote: > >> I think it's a good idea, but I'd augment it a bit. A better option >> will be to run all test types but >> for only one chosen combination of "Python + DB type + DB version". >> >> I often don't even look at the PR until the tests pass and this would be >> much better this way. >> And often people have slower/small machines so they submit the PR to see >> if they have not >> broken any other tests. This is much, much easier than doing it locally - >> because then in one >> "fire&forget" you can run static, doc, unit tests, integration, and >> Kubernetes ones,. And it's a valid >> thing. >> >> Also, we have to make sure that such PR does not become "Green" before >> all the tests are run. >> This might be rather problematic as Github does not yet have a " manual" >> Approval step in >> Github Actions (it's coming in Q4: >> https://github.com/github/roadmap/issues/99). >> >> We have many tests and already we hit a bug a few times, where not all >> tests have yet started >> and we've merged such PR. I can imagine it will happen more and more >> often if all PRs will >> only run a subset of tests. It will be very easy to make that mistake >> because even if we run a subset of >> those tests, we have so many jobs that you cannot see them all in the >> GitHub UI. >> >> So we will have to have a check that fails the PR but marks it somehow as >> "Ready for review" for example adding >> a label "Ready for review" when the subset of tests succeeds. >> >> Also, this might not be needed (or less important) if we implement: >> https://github.com/apache/airflow/issues/10507 "Selective Tests" >> for which I have an open PR. They will give much bigger improvements - >> because, in the vast majority of cases, the tests will take >> very little time - giving feedback >> about relevant tests in a few minutes rather than half-an-hour. We can >> also combine those two. >> >> It seems that I managed to finish some of the stuff that I thought will >> take more time, so I might come back to it next week >> if it goes as well as I planned. >> >> J. >> >> >> >> On Fri, Oct 2, 2020 at 3:57 AM Daniel Imberman <daniel.imber...@gmail.com> >> wrote: >> >>> I’m not too worried about that. I think people would learn pretty >>> quickly. It hasn’t been an issue for the kubernetes community so I can’t >>> imagine it being an issue for us. End-of-day, we only have a limited amount >>> of compute power and this will increase the speed we merge the PR’s that >>> have passed basic code quality checks. >>> >>> >>> On Thu, Oct 1, 2020 at 2:19 PM, Tomasz Urbaszek <turbas...@apache.org> >>> wrote: >>> >>> I think I can agree. Especially with flaky tests, some contributors may >>> be confused that some of the tests don't work on CI but work locally... >>> >>> Checking the code quality is good first step. Once there's a review we >>> can start tests on CI. >>> >>> On the other hand, I can see people asking for starting the tests or >>> being even more confused why some PRs have more CI builds than others... >>> >>> Cheers, >>> Tomek >>> >>> On Thu, Oct 1, 2020 at 10:29 PM Daniel Imberman < >>> daniel.imber...@gmail.com> wrote: >>> >>>> Hello all, >>>> >>>> With the recent uptick in airflow contribution and pull requests, I >>>> have a proposal that I hope will ensure that we do not find ourselves in a >>>> CI backlog hell. I noticed that on the Kubernetes project, pull requests do >>>> not run integration test until a committer submits a "ready to test" >>>> command to the CI bot. This step can prevent draft PRs or un-reviewed PRs >>>> from taking github CI resources. It is worth noting that with breeze's >>>> docker based testing system, users have the exact same testing capabilities >>>> locally as they would on our CI. >>>> >>>> I propose that we allow unverified PR's to run basic and static tests, >>>> but not perform the full test suite or integration test without first being >>>> reviewed. >>>> >>>> What does everyone think? >>>> >>> >> >> -- >> >> Jarek Potiuk >> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >> M: +48 660 796 129 <+48660796129> >> [image: Polidea] <https://www.polidea.com/> >> >> > > -- > > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/> > >