Re: Add /ready-to-test bot to airflow CI

Tobiasz Kędzierski Thu, 22 Oct 2020 10:20:13 -0700

Hi Jarek.

sounds good to me. I am happy to help you as much as I can with it.


BR
Tobiasz

On Thu, Oct 22, 2020 at 9:06 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> *TLDR; I thought about it a bit and I have a proposal on how to solve it
> even better - one that can be implemented over the weekend (I volunteer :)
> ) and that can be very easily shared and adopted by the other ASF projects
> so that we all collectively decrease the strain on Github Actions. *
>
> This is in parallel to our efforts on having self-hosted workers of
> course, but I think it will be needed anyway. Let me put it in a bit of
> context
>
> *Problem statement: *
>
> * the root cause of the problem is that we are competing with many other
> projects of ASF for the 180 jobs. I have started the discussion in
> bui...@apache.org about this:
> https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E
> and it's clear all ASF projects using GA have the same problem and compete
> against each other for the jobs.
> * if we decrease the strain on our side, this is not solving the problem
> long term. We keep on doing it already, and we already decrease a lot of
> strain, but other projects from ASF increase their strain in the meantime
> (Apache Beam, Skywalking, and few other projects are becoming heavy GitHub
> Actions users).
> * in all the projects that I looked at, we have the same root cause.
> Matrix strategy of builds causes enormous strain on Github Actions if the
> whole matrix is run for all PRs. We are going to make it works sustainably
> only if we come up with an easy solution, that can be applied to all those
> projects.
> * I think the comment-based PR triggering process is complex and
> cumbersome to follow. It puts a LOT more effort on the committers because
> they not only have to review and comment on the PRs but also make decisions
> that those PRs are ready for "full build". This is a lot of unnecessary
> effort and complicated process that many of the ASF projects will not like
> to adopt
>
> *Proposed Solution:*
>
> *Add an easy way to limit the matrix strategy to one "default" combo for
> PRs THAT HAVE NOT BEEN APPROVED BY COMMITTERS YET*
>
> This can be easily done with Github Actions workflows - no need to write a
> bot for this.
>
> *Some details:*
>
> * Custom GitHub Action (generic) that checks if the PRs are approved by
> Committers (and no "disapprovals"). The action would produce an output ->
> "Approved", "Not approved". The output could be used to determine the
> matrix strategy scope (in our case we already have support for dynamic
> matrix strategy that I added a few weeks ago - so it's just a matter of
> wiring the output in).
> * Very small workflow with the same GithubAction run on
> "pull_request_target" event. That workflow would effective "observe" the
> PR, and when the status changes from "not approved" to "approved", it
> triggers a PR build (with the "full matrix strategy" this time because the
> PR will be already approved). This seems to be entirely possible. This
> "pull_request_target" workflow - similarly to "workflow_run" runs with a
> "write" access token and uses a "main branch" workflow version and it could
> easily trigger a rerun of the last PR build in such case.
>
> *Benefits:*
>
> - I think I could write such an action over the coming weekend (happy to
> collaborate with anyone on that). I will first search if someone has done
> something similar of course because maybe it can be done faster this way,
> but I am quite confident after writing my
> https://github.com/potiuk/cancel-workflow-runs which we already use to
> limit the strain that it is doable in a day/two
> - no need to change the process we have - we continue working as we did
> and simply "approved" PRs will be the full matrix strategy ones but the
> "not-approved-ones" will run a limited version of the checks.
> - no way to accidentally submit a breaking PR - when the committer
> approves the PR that has not been approved before, the PR build will be
> re-run with the "full matrix strategy" and not mergeable until it finishes
> - last-but-not-least: we can propose (and help) other ASF projects to use
> the action in their own GitHub Actions. It will not be changing anyone's
> process - which makes it super-easy to adopt and I can even turn it into a
> "recommended solution" by Apache Infra - similarly as Airflow's CI
> architecture is a recommended solution already for the integration of GA
> with DockerHub
> https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub
>
> WDYT?
>
> J
>
>
>
> On Fri, Oct 2, 2020 at 7:59 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
>> I think it's a good idea, but  I'd augment it a bit. A  better option
>> will be to run all test types but
>> for only one chosen combination of "Python + DB type + DB version".
>>
>> I often don't even look at the PR until the tests pass and this would be
>> much better this way.
>> And often people have slower/small machines so they submit the PR to see
>> if they have not
>> broken any other tests. This is much, much easier than doing it locally -
>> because then in one
>> "fire&forget" you can run static, doc, unit tests, integration, and
>> Kubernetes ones,. And it's a valid
>> thing.
>>
>> Also, we have to make sure that such PR does not become "Green" before
>> all the tests are run.
>>  This might be rather problematic as Github does not yet have a " manual"
>> Approval step in
>> Github Actions (it's coming in Q4:
>> https://github.com/github/roadmap/issues/99).
>>
>> We have many tests and already we hit a bug a few times, where not all
>> tests have yet started
>> and we've merged such PR. I can imagine it will happen more and more
>> often if all PRs will
>> only run a subset of tests. It will be very easy to make that mistake
>> because even if we run a subset of
>> those tests, we have so many jobs that you cannot see them all in the
>> GitHub UI.
>>
>> So we will have to have a check that fails the PR but marks it somehow as
>> "Ready for review" for example adding
>> a label "Ready for review" when the subset of tests succeeds.
>>
>> Also, this might not be needed (or less important) if we implement:
>> https://github.com/apache/airflow/issues/10507 "Selective Tests"
>> for which I have an open PR. They will give much bigger improvements -
>> because, in the vast majority of cases, the tests will take
>> very little time - giving feedback
>> about relevant tests in a few minutes rather than half-an-hour. We can
>> also combine those two.
>>
>> It seems that I managed to finish some of the stuff that I thought will
>> take more time, so I might come back to it next week
>> if it goes as well as I planned.
>>
>> J.
>>
>>
>>
>> On Fri, Oct 2, 2020 at 3:57 AM Daniel Imberman <daniel.imber...@gmail.com>
>> wrote:
>>
>>> I’m not too worried about that. I think people would learn pretty
>>> quickly. It hasn’t been an issue for the kubernetes community so I can’t
>>> imagine it being an issue for us. End-of-day, we only have a limited amount
>>> of compute power and this will increase the speed we merge the PR’s that
>>> have passed basic code quality checks.
>>>
>>>
>>> On Thu, Oct 1, 2020 at 2:19 PM, Tomasz Urbaszek <turbas...@apache.org>
>>> wrote:
>>>
>>> I think I can agree. Especially with flaky tests, some contributors may
>>> be confused that some of the tests don't work on CI but work locally...
>>>
>>> Checking the code quality is good first step. Once there's a review we
>>> can start tests on CI.
>>>
>>> On the other hand, I can see people asking for starting the tests or
>>> being even more confused why some PRs have more CI builds than others...
>>>
>>> Cheers,
>>> Tomek
>>>
>>> On Thu, Oct 1, 2020 at 10:29 PM Daniel Imberman <
>>> daniel.imber...@gmail.com> wrote:
>>>
>>>> Hello all,
>>>>
>>>> With the recent uptick in airflow contribution and pull requests, I
>>>> have a proposal that I hope will ensure that we do not find ourselves in a
>>>> CI backlog hell. I noticed that on the Kubernetes project, pull requests do
>>>> not run integration test until a committer submits a "ready to test"
>>>> command to the CI bot. This step can prevent draft PRs or un-reviewed PRs
>>>> from taking github CI resources. It is worth noting that with breeze's
>>>> docker based testing system, users have the exact same testing capabilities
>>>> locally as they would on our CI.
>>>>
>>>> I propose that we allow unverified PR's to run basic and static tests,
>>>> but not perform the full test suite or integration test without first being
>>>> reviewed.
>>>>
>>>> What does everyone think?
>>>>
>>>
>>
>> --
>>
>> Jarek Potiuk
>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>
>> M: +48 660 796 129 <+48660796129>
>> [image: Polidea] <https://www.polidea.com/>
>>
>>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

Re: Add /ready-to-test bot to airflow CI

Reply via email to