+1000! via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.51&pv=10.15.6&source=email_footer_2] On Fri, Oct 23, 2020 at 8:22 AM, Jarek Potiuk <jarek.pot...@polidea.com> wrote: Thanks Tobiasz :). fantastic. I prepared a very short and simple design doc https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit# [https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#] where we can collaborate. I also added you as collaborator to https://github.com/potiuk/get-workflow-origin [https://github.com/potiuk/get-workflow-origin] that we already use, and I think you can update the "get workflow origin" plugin to include status of the PR in the output of the action (ore maybe we find out that we already have what we need in GitHub context). I will take a look at finding out how/if we can trigger the "full build" automatically when approval status changes from "Not approved" to "Approved". J.
On Thu, Oct 22, 2020 at 7:20 PM Tobiasz Kędzierski < tobiasz.kedzier...@polidea.com [tobiasz.kedzier...@polidea.com] > wrote: Hi Jarek. sounds good to me. I am happy to help you as much as I can with it. BR Tobiasz On Thu, Oct 22, 2020 at 9:06 AM Jarek Potiuk < jarek.pot...@polidea.com [jarek.pot...@polidea.com] > wrote: TLDR; I thought about it a bit and I have a proposal on how to solve it even better - one that can be implemented over the weekend (I volunteer :) ) and that can be very easily shared and adopted by the other ASF projects so that we all collectively decrease the strain on Github Actions. This is in parallel to our efforts on having self-hosted workers of course, but I think it will be needed anyway. Let me put it in a bit of context Problem statement: * the root cause of the problem is that we are competing with many other projects of ASF for the 180 jobs. I have started the discussion in bui...@apache.org [bui...@apache.org] about this: https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E [https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E] and it's clear all ASF projects using GA have the same problem and compete against each other for the jobs. * if we decrease the strain on our side, this is not solving the problem long term. We keep on doing it already, and we already decrease a lot of strain, but other projects from ASF increase their strain in the meantime (Apache Beam, Skywalking, and few other projects are becoming heavy GitHub Actions users). * in all the projects that I looked at, we have the same root cause. Matrix strategy of builds causes enormous strain on Github Actions if the whole matrix is run for all PRs. We are going to make it works sustainably only if we come up with an easy solution, that can be applied to all those projects. * I think the comment-based PR triggering process is complex and cumbersome to follow. It puts a LOT more effort on the committers because they not only have to review and comment on the PRs but also make decisions that those PRs are ready for "full build". This is a lot of unnecessary effort and complicated process that many of the ASF projects will not like to adopt Proposed Solution: Add an easy way to limit the matrix strategy to one "default" combo for PRs THAT HAVE NOT BEEN APPROVED BY COMMITTERS YET This can be easily done with Github Actions workflows - no need to write a bot for this. Some details: * Custom GitHub Action (generic) that checks if the PRs are approved by Committers (and no "disapprovals"). The action would produce an output -> "Approved", "Not approved". The output could be used to determine the matrix strategy scope (in our case we already have support for dynamic matrix strategy that I added a few weeks ago - so it's just a matter of wiring the output in). * Very small workflow with the same GithubAction run on "pull_request_target" event. That workflow would effective "observe" the PR, and when the status changes from "not approved" to "approved", it triggers a PR build (with the "full matrix strategy" this time because the PR will be already approved). This seems to be entirely possible. This "pull_request_target" workflow - similarly to "workflow_run" runs with a "write" access token and uses a "main branch" workflow version and it could easily trigger a rerun of the last PR build in such case. Benefits: - I think I could write such an action over the coming weekend (happy to collaborate with anyone on that). I will first search if someone has done something similar of course because maybe it can be done faster this way, but I am quite confident after writing my https://github.com/potiuk/cancel-workflow-runs [https://github.com/potiuk/cancel-workflow-runs] which we already use to limit the strain that it is doable in a day/two - no need to change the process we have - we continue working as we did and simply "approved" PRs will be the full matrix strategy ones but the "not-approved-ones" will run a limited version of the checks. - no way to accidentally submit a breaking PR - when the committer approves the PR that has not been approved before, the PR build will be re-run with the "full matrix strategy" and not mergeable until it finishes - last-but-not-least: we can propose (and help) other ASF projects to use the action in their own GitHub Actions. It will not be changing anyone's process - which makes it super-easy to adopt and I can even turn it into a "recommended solution" by Apache Infra - similarly as Airflow's CI architecture is a recommended solution already for the integration of GA with DockerHub https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub [https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub] WDYT? J On Fri, Oct 2, 2020 at 7:59 PM Jarek Potiuk < jarek.pot...@polidea.com [jarek.pot...@polidea.com] > wrote: I think it's a good idea, but I'd augment it a bit. A better option will be to run all test types but for only one chosen combination of "Python + DB type + DB version". I often don't even look at the PR until the tests pass and this would be much better this way. And often people have slower/small machines so they submit the PR to see if they have not broken any other tests. This is much, much easier than doing it locally - because then in one "fire&forget" you can run static, doc, unit tests, integration, and Kubernetes ones,. And it's a valid thing. Also, we have to make sure that such PR does not become "Green" before all the tests are run. This might be rather problematic as Github does not yet have a " manual" Approval step in Github Actions (it's coming in Q4: https://github.com/github/roadmap/issues/99 [https://github.com/github/roadmap/issues/99] ). We have many tests and already we hit a bug a few times, where not all tests have yet started and we've merged such PR. I can imagine it will happen more and more often if all PRs will only run a subset of tests. It will be very easy to make that mistake because even if we run a subset of those tests, we have so many jobs that you cannot see them all in the GitHub UI. So we will have to have a check that fails the PR but marks it somehow as "Ready for review" for example adding a label "Ready for review" when the subset of tests succeeds. Also, this might not be needed (or less important) if we implement: https://github.com/apache/airflow/issues/10507 [https://github.com/apache/airflow/issues/10507] "Selective Tests" for which I have an open PR. They will give much bigger improvements - because, in the vast majority of cases, the tests will take very little time - giving feedback about relevant tests in a few minutes rather than half-an-hour. We can also combine those two. It seems that I managed to finish some of the stuff that I thought will take more time, so I might come back to it next week if it goes as well as I planned. J. On Fri, Oct 2, 2020 at 3:57 AM Daniel Imberman < daniel.imber...@gmail.com [daniel.imber...@gmail.com] > wrote: I’m not too worried about that. I think people would learn pretty quickly. It hasn’t been an issue for the kubernetes community so I can’t imagine it being an issue for us. End-of-day, we only have a limited amount of compute power and this will increase the speed we merge the PR’s that have passed basic code quality checks. On Thu, Oct 1, 2020 at 2:19 PM, Tomasz Urbaszek < turbas...@apache.org [turbas...@apache.org] > wrote: I think I can agree. Especially with flaky tests, some contributors may be confused that some of the tests don't work on CI but work locally... Checking the code quality is good first step. Once there's a review we can start tests on CI. On the other hand, I can see people asking for starting the tests or being even more confused why some PRs have more CI builds than others... Cheers, Tomek On Thu, Oct 1, 2020 at 10:29 PM Daniel Imberman < daniel.imber...@gmail.com [daniel.imber...@gmail.com] > wrote: Hello all, With the recent uptick in airflow contribution and pull requests, I have a proposal that I hope will ensure that we do not find ourselves in a CI backlog hell. I noticed that on the Kubernetes project, pull requests do not run integration test until a committer submits a "ready to test" command to the CI bot. This step can prevent draft PRs or un-reviewed PRs from taking github CI resources. It is worth noting that with breeze's docker based testing system, users have the exact same testing capabilities locally as they would on our CI. I propose that we allow unverified PR's to run basic and static tests, but not perform the full test suite or integration test without first being reviewed. What does everyone think? -- Jarek Potiuk Polidea [https://www.polidea.com/] | Principal Software Engineer M: +48 660 796 129 [tel:+48660796129] [https://www.polidea.com/] -- Jarek Potiuk Polidea [https://www.polidea.com/] | Principal Software Engineer M: +48 660 796 129 [tel:+48660796129] [https://www.polidea.com/] -- Jarek Potiuk Polidea [https://www.polidea.com/] | Principal Software Engineer M: +48 660 796 129 [tel:+48660796129] [https://www.polidea.com/]