Re: Add /ready-to-test bot to airflow CI

Daniel Imberman Fri, 23 Oct 2020 16:10:29 -0700

+1000!

via Newton Mail 
[https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.51&pv=10.15.6&source=email_footer_2]
On Fri, Oct 23, 2020 at 8:22 AM, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
Thanks Tobiasz :). fantastic.
I prepared a very short and simple design doc 
https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#
 
[https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#]
 where we can collaborate.
I also added you as collaborator to 
https://github.com/potiuk/get-workflow-origin 
[https://github.com/potiuk/get-workflow-origin] that we already use, and I 
think you can update the "get workflow origin" plugin to include status of the 
PR in the output of the action (ore maybe we find out that we already have what 
we need in GitHub context).
I will take a look at finding out how/if we can trigger the "full build" 
automatically when approval status changes from "Not approved" to "Approved".
J.

On Thu, Oct 22, 2020 at 7:20 PM Tobiasz Kędzierski <
tobiasz.kedzier...@polidea.com [tobiasz.kedzier...@polidea.com] > wrote:
Hi Jarek.

sounds good to me. I am happy to help you as much as I can with it.

BR
Tobiasz
On Thu, Oct 22, 2020 at 9:06 AM Jarek Potiuk < jarek.pot...@polidea.com
[jarek.pot...@polidea.com] > wrote:
TLDR; I thought about it a bit and I have a proposal on how to solve it even
better - one that can be implemented over the weekend (I volunteer :) ) and
that can be very easily shared and adopted by the other ASF projects so that we
all collectively decrease the strain on Github Actions.
This is in parallel to our efforts on having self-hosted workers of course, but
I think it will be needed anyway. Let me put it in a bit of context
Problem statement:
* the root cause of the problem is that we are competing with many other
projects of ASF for the 180 jobs. I have started the discussion in
bui...@apache.org [bui...@apache.org] about this:
https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E

[https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E]
and it's clear all ASF projects using GA have the same problem and compete
against each other for the jobs. * if we decrease the strain on our side, this
is not solving the problem long term. We keep on doing it already, and we
already decrease a lot of strain, but other projects from ASF increase their
strain in the meantime (Apache Beam, Skywalking, and few other projects are
becoming heavy GitHub Actions users). * in all the projects that I looked at,
we have the same root cause. Matrix strategy of builds causes enormous strain
on Github Actions if the whole matrix is run for all PRs. We are going to make
it works sustainably only if we come up with an easy solution, that can be
applied to all those projects. * I think the comment-based PR triggering
process is complex and cumbersome to follow. It puts a LOT more effort on the
committers because they not only have to review and comment on the PRs but also
make decisions that those PRs are ready for "full build". This is a lot of
unnecessary effort and complicated process that many of the ASF projects will
not like to adopt
Proposed Solution:
Add an easy way to limit the matrix strategy to one "default" combo for PRs
THAT HAVE NOT BEEN APPROVED BY COMMITTERS YET
This can be easily done with Github Actions workflows - no need to write a bot
for this.
Some details:
* Custom GitHub Action (generic) that checks if the PRs are approved by
Committers (and no "disapprovals"). The action would produce an output ->
"Approved", "Not approved". The output could be used to determine the matrix
strategy scope (in our case we already have support for dynamic matrix strategy
that I added a few weeks ago - so it's just a matter of wiring the output in).
* Very small workflow with the same GithubAction run on "pull_request_target"
event. That workflow would effective "observe" the PR, and when the status
changes from "not approved" to "approved", it triggers a PR build (with the
"full matrix strategy" this time because the PR will be already approved). This
seems to be entirely possible. This "pull_request_target" workflow - similarly
to "workflow_run" runs with a "write" access token and uses a "main branch"
workflow version and it could easily trigger a rerun of the last PR build in
such case.
Benefits:
- I think I could write such an action over the coming weekend (happy to
collaborate with anyone on that). I will first search if someone has done
something similar of course because maybe it can be done faster this way, but I
am quite confident after writing my
https://github.com/potiuk/cancel-workflow-runs
[https://github.com/potiuk/cancel-workflow-runs] which we already use to limit
the strain that it is doable in a day/two
- no need to change the process we have - we continue working as we did and
simply "approved" PRs will be the full matrix strategy ones but the
"not-approved-ones" will run a limited version of the checks. - no way to
accidentally submit a breaking PR - when the committer approves the PR that has
not been approved before, the PR build will be re-run with the "full matrix
strategy" and not mergeable until it finishes - last-but-not-least: we can
propose (and help) other ASF projects to use the action in their own GitHub
Actions. It will not be changing anyone's process - which makes it super-easy
to adopt and I can even turn it into a "recommended solution" by Apache Infra -
similarly as Airflow's CI architecture is a recommended solution already for
the integration of GA with DockerHub
https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub
[https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub]
WDYT?
J

On Fri, Oct 2, 2020 at 7:59 PM Jarek Potiuk < jarek.pot...@polidea.com
[jarek.pot...@polidea.com] > wrote:
I think it's a good idea, but I'd augment it a bit. A better option will be to
run all test types but for only one chosen combination of "Python + DB type +
DB version".
I often don't even look at the PR until the tests pass and this would be much
better this way. And often people have slower/small machines so they submit the
PR to see if they have not broken any other tests. This is much, much easier
than doing it locally - because then in one "fire&forget" you can run static,
doc, unit tests, integration, and Kubernetes ones,. And it's a valid thing.
Also, we have to make sure that such PR does not become "Green" before all the
tests are run. This might be rather problematic as Github does not yet have a "
manual" Approval step in Github Actions (it's coming in Q4:
https://github.com/github/roadmap/issues/99
[https://github.com/github/roadmap/issues/99] ).
We have many tests and already we hit a bug a few times, where not all tests
have yet started and we've merged such PR. I can imagine it will happen more
and more often if all PRs will only run a subset of tests. It will be very easy
to make that mistake because even if we run a subset of those tests, we have so
many jobs that you cannot see them all in the GitHub UI.
So we will have to have a check that fails the PR but marks it somehow as
"Ready for review" for example adding a label "Ready for review" when the
subset of tests succeeds.
Also, this might not be needed (or less important) if we implement:
https://github.com/apache/airflow/issues/10507
[https://github.com/apache/airflow/issues/10507] "Selective Tests" for which I
have an open PR. They will give much bigger improvements - because, in the vast
majority of cases, the tests will take very little time - giving feedback about
relevant tests in a few minutes rather than half-an-hour. We can also combine
those two.
It seems that I managed to finish some of the stuff that I thought will take
more time, so I might come back to it next week if it goes as well as I planned.
J.

On Fri, Oct 2, 2020 at 3:57 AM Daniel Imberman < daniel.imber...@gmail.com
[daniel.imber...@gmail.com] > wrote:
I’m not too worried about that. I think people would learn pretty quickly. It
hasn’t been an issue for the kubernetes community so I can’t imagine it being
an issue for us. End-of-day, we only have a limited amount of compute power and
this will increase the speed we merge the PR’s that have passed basic code
quality checks.

On Thu, Oct 1, 2020 at 2:19 PM, Tomasz Urbaszek < turbas...@apache.org
[turbas...@apache.org] > wrote:
I think I can agree. Especially with flaky tests, some contributors may be
confused that some of the tests don't work on CI but work locally...
Checking the code quality is good first step. Once there's a review we can
start tests on CI.
On the other hand, I can see people asking for starting the tests or being even
more confused why some PRs have more CI builds than others...
Cheers, Tomek
On Thu, Oct 1, 2020 at 10:29 PM Daniel Imberman < daniel.imber...@gmail.com
[daniel.imber...@gmail.com] > wrote:
Hello all,
With the recent uptick in airflow contribution and pull requests, I have a
proposal that I hope will ensure that we do not find ourselves in a CI backlog
hell. I noticed that on the Kubernetes project, pull requests do not run
integration test until a committer submits a "ready to test" command to the CI
bot. This step can prevent draft PRs or un-reviewed PRs from taking github CI
resources. It is worth noting that with breeze's docker based testing system,
users have the exact same testing capabilities locally as they would on our CI.
I propose that we allow unverified PR's to run basic and static tests, but not
perform the full test suite or integration test without first being reviewed.
What does everyone think?

--
Jarek Potiuk
Polidea [https://www.polidea.com/] | Principal Software Engineer

M: +48 660 796 129 [tel:+48660796129]
[https://www.polidea.com/]

--
Jarek Potiuk
Polidea [https://www.polidea.com/] | Principal Software Engineer

M: +48 660 796 129 [tel:+48660796129]
[https://www.polidea.com/]

--
Jarek Potiuk
Polidea [https://www.polidea.com/] | Principal Software Engineer

M: +48 660 796 129 [tel:+48660796129]
[https://www.polidea.com/]

Re: Add /ready-to-test bot to airflow CI

Reply via email to