I believe this problem cannot really be handled by any single project, but I
have a proposal.

I looked at the common pattern we have in the ASF projects and I think
there is a way that we can help each other.

I think most of the problems come from the many submitted PRs that run a
matrix of tests before committers even have time to take a look at them. We
discussed how to approach this, and I think I have a proposal that all the
ASF projects can adopt: something that will be easy to implement and will
not impact the process we have. I would love to hear your thoughts
about it - before I start implementing it :).

My proposal is to create a GitHub Action that makes it possible to run only
a subset of the test "matrix" for PRs that are not yet approved by committers.
This should be possible using the current GitHub Actions workflows and API.
It boils down to:
* if the PR is not approved, only a subset of the matrix (the default value
for each matrix component) is run
* the committers can see the "green" mark of the tests passing and do a review
* once the PR gets approved, a new "full matrix" check is triggered
automatically
* all subsequent pushes to the approved PR run the "full matrix" check
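
To make the selection rule above concrete, here is a minimal sketch in Python
of the matrix-selection logic only. The function name select_matrix, the
dictionary layout, and the convention that the first value listed for each
component is its "default" are assumptions made for illustration; finding out
whether a PR is approved (via the GitHub API) is deliberately left out.

```python
from itertools import product


def select_matrix(matrix, approved):
    """Return the list of job configurations to run for a PR.

    matrix   -- dict mapping a component name to its list of values;
                the first value is treated as the default (an assumption).
    approved -- True once a committer has approved the PR.
    """
    if not approved:
        # Not approved yet: a single job using each component's default value.
        return [{name: values[0] for name, values in matrix.items()}]

    # Approved: the full cross product of all component values.
    names = list(matrix)
    return [
        dict(zip(names, combo))
        for combo in product(*(matrix[name] for name in names))
    ]


if __name__ == "__main__":
    # Hypothetical matrix, not taken from any real workflow file.
    example = {
        "python": ["3.6", "3.7", "3.8"],
        "backend": ["postgres", "mysql", "sqlite"],
    }
    print(len(select_matrix(example, approved=False)), "job(s) before approval")
    print(len(select_matrix(example, approved=True)), "job(s) after approval")
```

With a 3 x 3 matrix like the hypothetical one above, an unapproved PR runs one
job instead of nine, which is where the saving would come from.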

I think that might significantly reduce the strain on the GA jobs we run, and
it should fit very naturally into the typical PR workflow for ASF projects.
But I am only guessing for now, so I would love to hear what you think.

I am willing (together with my colleagues) to implement this action and add
it to Apache Airflow to try it out. Together with the
"cancel-workflow-action" that I developed and that we deployed at Apache
Airflow and Apache Beam, I think it might help to keep the CI "pressure" much
lower - regardless of whether any of the projects manage to get sponsors for
their credits. I think I can have a working Action/implementation done over
the weekend.

More details about the proposal here:
https://lists.apache.org/thread.html/r6f6f1420aa6346c9f81bf9d9fff8816e860e49224eb02e25d856c249%40%3Cdev.airflow.apache.org%3E

J.

On Mon, Oct 19, 2020 at 5:28 PM Jarek Potiuk <[email protected]>
wrote:

> Yep. We still continuously optimize it and we are reaching out to get
> funding for self-hosted runners. And I think it would be great to see that
> happening. I am happy to help anyone who needs some help there - I have
> already been helping Apache Beam with their GitHub Actions settings.
>
> On Mon, Oct 19, 2020 at 6:12 AM Greg Stein <[email protected]> wrote:
>
>> This is some great news, Jarek.
>>
>> Given that GitHub build minutes are shared, we need more of this kind of
>> work from our many communities.
>>
>> Thanks,
>> Greg
>> InfraAdmin, ASF
>>
>>
>> On Sun, Oct 18, 2020 at 2:32 PM Jarek Potiuk <[email protected]>
>> wrote:
>>
>> > Hello Allen,
>> >
>> > I'd really love to give Yetus a try and see how it can actually make
>> > our approach better.
>> >
>> > I just merged the change I planned (we finally got to it), which
>> > implements the final optimisation that you mentioned. In the case of a
>> > single .md file change we got the build time down to about 1 minute,
>> > most of it being GitHub Actions "workflow" overhead.
>> >
>> > We got the incremental pre-commit tests down to ~25s.
>> >
>> > Build here: https://github.com/potiuk/airflow/pull/128/checks. As you
>> > can see here:
>> >
>> > https://github.com/potiuk/airflow/pull/128/checks?check_run_id=1268353637#step:7:98
>> >
>> > in this case we run only the relevant static checks:
>> >
>> >    - "No-tabs checker"
>> >    - "Add license for all md files"
>> >    - "Add TOC for md files."
>> >    - "Check for merge conflicts"
>> >    - "Detect Private Key"
>> >    - "Fix End of Files"
>> >    - "Trim Trailing Whitespace"
>> >    - "Check for language that we do not accept as community"
>> >
>> > All the other checks, image building, and all the extra checks are
>> > skipped (automatically, as pre-commit determined them irrelevant).
>> >
>> > All this, while we keep really comprehensive tests and optimisation of
>> > image building for all the "serious steps". I tried to explain the
>> > philosophy and some basic assumptions behind our CI in
>> > https://github.com/apache/airflow/blob/master/CI.rst#ci-environment -
>> > and I'd love to try to see how this plays together with the Yetus tool.
>> >
>> > Would it be possible to work together with the Yetus team on trying to
>> > see how it can help to further optimise and possibly simplify the setup
>> > we have? I'd love to get some cooperation on those. I am nearly done
>> > with all the optimisations I planned, and we have for years (long before
>> > my tenure) been among the top-3 Apache projects when it comes to CI-time
>> > use, so that might be a good one if we can pull together some
>> > improvements.
>> >
>> >
>> > J.
>> >
>> >
>> >
>> > On Wed, Oct 14, 2020 at 4:41 PM Jarek Potiuk <[email protected]>
>> > wrote:
>> >
>> > > Exactly - > dialectic vs. dislectic for example.
>> > >
>> > > On Wed, Oct 14, 2020 at 4:40 PM Jarek Potiuk <
>> [email protected]>
>> > > wrote:
>> > >
>> > >> And really sorry about yatus vs. yetus - I am slightly dialectic and
>> > >> when things are not in the dictionary, I tend to make many mistakes.
>> > >> I hope it's not something that people can take as a sign of being
>> > >> "worse", but if you felt offended by that - apologies.
>> > >>
>> > >>
>> > >>
>> > >> On Wed, Oct 14, 2020 at 4:34 PM Jarek Potiuk <
>> [email protected]>
>> > >> wrote:
>> > >>
>> > >>> Hey Allen,
>> > >>>
>> > >>> I would be super happy if you could help us to do it properly at
>> > >>> Airflow - would you like to work with us and get a yatus
>> > >>> configuration that would work for us? I am super happy to try it.
>> > >>> Maybe you could open a PR with some basic yatus implementation to
>> > >>> start with and we could work together to get it simplified? I would
>> > >>> love to learn how to do it.
>> > >>>
>> > >>> J
>> > >>>
>> > >>>
>> > >>> On Wed, Oct 14, 2020 at 3:37 PM Allen Wittenauer
>> > >>> <[email protected]> wrote:
>> > >>>
>> > >>>>
>> > >>>>
>> > >>>> > On Oct 13, 2020, at 11:04 PM, Jarek Potiuk <
>> > [email protected]>
>> > >>>> wrote:
>> > >>>> > This is a logic that we have to implement regardless - whether
>> > >>>> > we use yatus or pre-commit (please correct me if I am wrong).
>> > >>>>
>> > >>>>         I'm not sure about yatus, but for yetus, for the most part,
>> > >>>> yes, one would likely need to implement custom rules in the
>> > >>>> personality to exactly duplicate the overly complicated and
>> > >>>> over-engineered airflow setup.  The big difference is that one
>> > >>>> wouldn't be starting from scratch. The difference engine is already
>> > >>>> there. The file filter is already there. Full build vs. PR handling
>> > >>>> is already there. etc etc etc
>> > >>>>
>> > >>>> > For all others, this is not a big issue because in total all
>> > >>>> > other pre-commits take 2-3 minutes at best. And if we find that
>> > >>>> > we need to optimize it further we can simply disable the
>> > >>>> > '--all-files' switch for pre-commit and they will only run on the
>> > >>>> > latest commit-changed files (pre-commit will only run the tests
>> > >>>> > related to those changed files). But since they are pretty fast
>> > >>>> > (except pylint/mypy/flake8) we think running them all, for now,
>> > >>>> > is not a problem.
>> > >>>>
>> > >>>>         That's what everyone thinks until they start aggregating
>> > >>>> the time across all changes...
>> > >>>>
>> > >>>>
>> > >>>
>> > >>> --
>> > >>>
>> > >>> Jarek Potiuk
>> > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > >>>
>> > >>> M: +48 660 796 129 <+48660796129>
>> > >>> [image: Polidea] <https://www.polidea.com/>
>> > >>>
>> > >>>
>> > >>
>> > >
>> >
>>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
