+1000!

via Newton Mail 
[https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.51&pv=10.15.6&source=email_footer_2]
On Fri, Oct 23, 2020 at 8:22 AM, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
Thanks Tobiasz :). fantastic.
I prepared a very short and simple design doc 
https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#
 
[https://docs.google.com/document/d/16rwyCfyDpKWN-DrLYbhjU0B1D58T1RFYan5ltmw4DQg/edit#]
 where we can collaborate.
I also added you as collaborator to 
https://github.com/potiuk/get-workflow-origin 
[https://github.com/potiuk/get-workflow-origin] that we already use, and I 
think you can update the "get workflow origin" plugin to include status of the 
PR in the output of the action (ore maybe we find out that we already have what 
we need in GitHub context).
I will take a look at finding out how/if we can trigger the "full build" 
automatically when approval status changes from "Not approved" to "Approved".
J.



On Thu, Oct 22, 2020 at 7:20 PM Tobiasz Kędzierski < 
tobiasz.kedzier...@polidea.com [tobiasz.kedzier...@polidea.com] > wrote:
Hi Jarek.

sounds good to me. I am happy to help you as much as I can with it.

BR
Tobiasz
On Thu, Oct 22, 2020 at 9:06 AM Jarek Potiuk < jarek.pot...@polidea.com 
[jarek.pot...@polidea.com] > wrote:
TLDR; I thought about it a bit and I have a proposal on how to solve it even 
better - one that can be implemented over the weekend (I volunteer :) ) and 
that can be very easily shared and adopted by the other ASF projects so that we 
all collectively decrease the strain on Github Actions.
This is in parallel to our efforts on having self-hosted workers of course, but 
I think it will be needed anyway. Let me put it in a bit of context
Problem statement:
* the root cause of the problem is that we are competing with many other 
projects of ASF for the 180 jobs. I have started the discussion in 
bui...@apache.org [bui...@apache.org] about this: 
https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E
 
[https://lists.apache.org/thread.html/r1708881f52adbdae722afb8fea16b23325b739b254b60890e72375e1%40%3Cbuilds.apache.org%3E]
 and it's clear all ASF projects using GA have the same problem and compete 
against each other for the jobs. * if we decrease the strain on our side, this 
is not solving the problem long term. We keep on doing it already, and we 
already decrease a lot of strain, but other projects from ASF increase their 
strain in the meantime (Apache Beam, Skywalking, and few other projects are 
becoming heavy GitHub Actions users). * in all the projects that I looked at, 
we have the same root cause. Matrix strategy of builds causes enormous strain 
on Github Actions if the whole matrix is run for all PRs. We are going to make 
it works sustainably only if we come up with an easy solution, that can be 
applied to all those projects. * I think the comment-based PR triggering 
process is complex and cumbersome to follow. It puts a LOT more effort on the 
committers because they not only have to review and comment on the PRs but also 
make decisions that those PRs are ready for "full build". This is a lot of 
unnecessary effort and complicated process that many of the ASF projects will 
not like to adopt
Proposed Solution:
Add an easy way to limit the matrix strategy to one "default" combo for PRs 
THAT HAVE NOT BEEN APPROVED BY COMMITTERS YET
This can be easily done with Github Actions workflows - no need to write a bot 
for this.
Some details:
* Custom GitHub Action (generic) that checks if the PRs are approved by 
Committers (and no "disapprovals"). The action would produce an output -> 
"Approved", "Not approved". The output could be used to determine the matrix 
strategy scope (in our case we already have support for dynamic matrix strategy 
that I added a few weeks ago - so it's just a matter of wiring the output in). 
* Very small workflow with the same GithubAction run on "pull_request_target" 
event. That workflow would effective "observe" the PR, and when the status 
changes from "not approved" to "approved", it triggers a PR build (with the 
"full matrix strategy" this time because the PR will be already approved). This 
seems to be entirely possible. This "pull_request_target" workflow - similarly 
to "workflow_run" runs with a "write" access token and uses a "main branch" 
workflow version and it could easily trigger a rerun of the last PR build in 
such case.
Benefits:
- I think I could write such an action over the coming weekend (happy to 
collaborate with anyone on that). I will first search if someone has done 
something similar of course because maybe it can be done faster this way, but I 
am quite confident after writing my 
https://github.com/potiuk/cancel-workflow-runs 
[https://github.com/potiuk/cancel-workflow-runs] which we already use to limit 
the strain that it is doable in a day/two
- no need to change the process we have - we continue working as we did and 
simply "approved" PRs will be the full matrix strategy ones but the 
"not-approved-ones" will run a limited version of the checks. - no way to 
accidentally submit a breaking PR - when the committer approves the PR that has 
not been approved before, the PR build will be re-run with the "full matrix 
strategy" and not mergeable until it finishes - last-but-not-least: we can 
propose (and help) other ASF projects to use the action in their own GitHub 
Actions. It will not be changing anyone's process - which makes it super-easy 
to adopt and I can even turn it into a "recommended solution" by Apache Infra - 
similarly as Airflow's CI architecture is a recommended solution already for 
the integration of GA with DockerHub 
https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub 
[https://cwiki.apache.org/confluence/display/INFRA/Github+Actions+to+DockerHub]
WDYT?
J


On Fri, Oct 2, 2020 at 7:59 PM Jarek Potiuk < jarek.pot...@polidea.com 
[jarek.pot...@polidea.com] > wrote:
I think it's a good idea, but I'd augment it a bit. A better option will be to 
run all test types but for only one chosen combination of "Python + DB type + 
DB version".
I often don't even look at the PR until the tests pass and this would be much 
better this way. And often people have slower/small machines so they submit the 
PR to see if they have not broken any other tests. This is much, much easier 
than doing it locally - because then in one "fire&forget" you can run static, 
doc, unit tests, integration, and Kubernetes ones,. And it's a valid thing.
Also, we have to make sure that such PR does not become "Green" before all the 
tests are run. This might be rather problematic as Github does not yet have a " 
manual" Approval step in Github Actions (it's coming in Q4: 
https://github.com/github/roadmap/issues/99 
[https://github.com/github/roadmap/issues/99] ).
We have many tests and already we hit a bug a few times, where not all tests 
have yet started and we've merged such PR. I can imagine it will happen more 
and more often if all PRs will only run a subset of tests. It will be very easy 
to make that mistake because even if we run a subset of those tests, we have so 
many jobs that you cannot see them all in the GitHub UI.
So we will have to have a check that fails the PR but marks it somehow as 
"Ready for review" for example adding a label "Ready for review" when the 
subset of tests succeeds.
Also, this might not be needed (or less important) if we implement: 
https://github.com/apache/airflow/issues/10507 
[https://github.com/apache/airflow/issues/10507] "Selective Tests" for which I 
have an open PR. They will give much bigger improvements - because, in the vast 
majority of cases, the tests will take very little time - giving feedback about 
relevant tests in a few minutes rather than half-an-hour. We can also combine 
those two.
It seems that I managed to finish some of the stuff that I thought will take 
more time, so I might come back to it next week if it goes as well as I planned.
J.


On Fri, Oct 2, 2020 at 3:57 AM Daniel Imberman < daniel.imber...@gmail.com 
[daniel.imber...@gmail.com] > wrote:
I’m not too worried about that. I think people would learn pretty quickly. It 
hasn’t been an issue for the kubernetes community so I can’t imagine it being 
an issue for us. End-of-day, we only have a limited amount of compute power and 
this will increase the speed we merge the PR’s that have passed basic code 
quality checks.

On Thu, Oct 1, 2020 at 2:19 PM, Tomasz Urbaszek < turbas...@apache.org 
[turbas...@apache.org] > wrote:
I think I can agree. Especially with flaky tests, some contributors may be 
confused that some of the tests don't work on CI but work locally...
Checking the code quality is good first step. Once there's a review we can 
start tests on CI.
On the other hand, I can see people asking for starting the tests or being even 
more confused why some PRs have more CI builds than others...
Cheers, Tomek
On Thu, Oct 1, 2020 at 10:29 PM Daniel Imberman < daniel.imber...@gmail.com 
[daniel.imber...@gmail.com] > wrote:
Hello all,
With the recent uptick in airflow contribution and pull requests, I have a 
proposal that I hope will ensure that we do not find ourselves in a CI backlog 
hell. I noticed that on the Kubernetes project, pull requests do not run 
integration test until a committer submits a "ready to test" command to the CI 
bot. This step can prevent draft PRs or un-reviewed PRs from taking github CI 
resources. It is worth noting that with breeze's docker based testing system, 
users have the exact same testing capabilities locally as they would on our CI.
I propose that we allow unverified PR's to run basic and static tests, but not 
perform the full test suite or integration test without first being reviewed.
What does everyone think?

--
   Jarek Potiuk                                                       
   Polidea [https://www.polidea.com/] | Principal Software Engineer   

M: +48 660 796 129 [tel:+48660796129]   
[https://www.polidea.com/]              




--
   Jarek Potiuk                                                       
   Polidea [https://www.polidea.com/] | Principal Software Engineer   

M: +48 660 796 129 [tel:+48660796129]   
[https://www.polidea.com/]              




--
   Jarek Potiuk                                                       
   Polidea [https://www.polidea.com/] | Principal Software Engineer   

M: +48 660 796 129 [tel:+48660796129]   
[https://www.polidea.com/]

Reply via email to