I tend to disagree with this approach. I think I can generally distill what
you said to 3 points, which I'll address below:

1. Every workflow needs a clean worker, and Apache has a limited number of
those. As a result, queue times can build up.

This is a problem, but it's one that should be resolved by introducing our
own self-hosted runners. (Jarek, I know you're actually involved in that
effort). Any significant push towards GHA is going to rely on that. Once we
have self-hosted runners, we won't necessarily need a clean machine every
time, and we also won't need to worry about capacity issues like we do now.

2. There are security issues with running workflows that are based on the
master branch.

This is sorta true - this is basically only a problem if we pass a
non-locked down secret into a step that runs code outside of our GH
workflow configuration (and is either running on a PR or on master with
some git checkout magic to target a different ref). That's not a trivial
risk, so we need to be careful, but it's also not one that's solved by
using probot - if our end goal is to kick off CI runs that use user code,
this is just a problem that we're going to have to deal with. It's
basically just one of those problems that Open Source CI is always going to
have (and one we already have with Jenkins) - how much do you restrict
resources available to a fork? As a rule, it is a good idea for any code we
want to trigger on arbitrary users' comments to not pass in secrets to that
code, or to explicitly limit the scope of them.

Kicking off workflows on master is not generally a problem because any code
they run will be vetted by the normal PR process.

3. Probot is a better option.

I'd argue Probot is just a different option, with different tradeoffs (and
I'd say is overall worse for our use case). Here are some pros and cons I
see of using GHA over ProBot

Pros:
- All config lives in the repo and can be managed without involving Infra
(you already called this out).
- We're already using GHA for some of our CI, this limits the number of
tools we're using.
- Lots of built in, *and* *supported*, functionality exists for Actions -
technically there are a lot of Probot apps out there that do similar things
to GHA, but most of them are deprecated or not actively maintained.
Speaking from experience, GH had a team of >100 working on building
out/maintaining Actions, and virtually nobody working on Probot related
things. A bunch of Probot apps were also deprecated when GitHub Actions v2
was launched. It's also worth calling out that *most *of the things you
have your Airflow Probot app doing exist already as actions.
- Relatedly, I expect future improvements to come to GHA, not Probot.
- Actions logging shows up in the repo where anyone can view it. Probot
requires a separate logging service.

Cons:
- Probot is a little snappier, even with private runners
- Maintaining state is easier with probot (not impossible with GHA, but
requires being kinda hacky).

To me, those cons don't justify partially moving off of GHA to ProBot.

Thanks,
Danny

On Fri, Jul 15, 2022 at 3:00 AM Jarek Potiuk <[email protected]> wrote:

> My 3 cents.
>
> We've been playing with similar approaches in Apache Airflow and I
> think Github Actions Workflows are not a good idea for this kind of
> behaviour. Github Action workflows are really "heavy-weight" in many
> ways, you should really think of them to be spinned to actually do
> some processing - building, compiling, processing. They have a lot of
> overhead and problems for "quick" actions like triggering something
> based on an event.
>
> There are ways to use "master" workflows - we do that in Airflow, it
> is possible to have a master workflow that checks out the code from
> the branch that the event came from. It has a number of security
> implications though, so it's nothing but complex and dangerous to use
> (because your master workflow can get dangerous write permissions and
> if you are not careful and execute PR-provided code, this might get
> exploited by bots opening PRs and making comments to your repo). But
> even if you solve the problems with "master-only"  you will hit the
> problem that there are 150 workers total for All Apache projects and
> every workflow needs a "clean worker" - which means that just to
> trigger an action on comment you will sometimes have to wait in a
> queue for available workers.
>
> I think for that kind of "triggering" and commenting, we have much
> better experience with Github Apps - which even if they require
> installing by Infra, provide a much better "snappiness" and you can do
> more with them
> https://docs.github.com/en/developers/apps/managing-github-apps/installing-github-apps
> - they react on webhooks coming from GitHub. It's a bit more hassle to
> develop and install but the effect is much better IMHO for the kind of
> behaviour you want. In Airflow we have Boring Cyborg which we
> developed years ago and it is rock-solid for this kind of quick
> actions https://github.com/kaxil/boring-cyborg  (based on Probot).
>
> I think Github Apps are kinda "forgotten" but when Github Actions were
> added - the main reason is that you had to develop it outside your
> repo, where GitHub Actions are mostly "in-repo" and you can "easily"
> make a change in a PR, but this is deceptive a bit. The "simple" PR
> workflows are fine, but when you feel the need of "write" access to
> your repo things get very complex, very quickly.
>
> I hope this helps.
>
> J.
>
> On Fri, Jul 15, 2022 at 4:41 AM Danny McCormick via dev
> <[email protected]> wrote:
> >
> > I don't think using trigger comments is generally super common, so no. I
> do still think it's a useful feature to have though, and part of GHA's
> approach is that everything is intentionally extensible and feature light
> so that you can tailor it to your org's needs.
> >
> > Actually, digging in further, it does look like the approach I
> recommended might not work out - from
> https://github.community/t/on-issue-comment-events-are-not-triggering-workflows/16784/13
> it looks like issue_comment always triggers on master unfortunately.
> >
> > Maybe a ]better approach here would be to use an action like
> https://github.com/peter-evans/repository-dispatch to trigger workflows
> on issue comment. That would allow us to have a single workflow that reads
> the issue comment and triggers a workflow as needed. The only downside is
> that it would rely on an external token that would need to be rotated on a
> yearly basis. It also puts all the issue triggering logic in a single place
> which is nice.
> >
> >
> > On Thu, Jul 14, 2022 at 5:07 PM Kenneth Knowles <[email protected]> wrote:
> >>
> >> Is this an idiomatic way to trigger GHA?
> >>
> >> On Thu, Jul 14, 2022 at 1:36 PM Danny McCormick via dev <
> [email protected]> wrote:
> >>>
> >>> Hey Fer,
> >>>
> >>> I'm not 100% sure I follow what you're trying to do, but one approach
> you could take is to gate everything off of an if like:
> >>>
> >>> ` if {{ (github.event.issue.pull_request && github.event.comment &&
> contains(github.event.comment.body, "PHRASE")) || !github.event.comment }}
> >>>
> >>> Basically, that's doing: `if (<this is a PR> && <this is a comment> &&
> <the comment contains the phrase>) || <this is not an issue comment>`
> >>>
> >>> I haven't tested it so it might be a little off syntactically, but I
> believe that should generally do what you're trying to do. FWIW, you might
> find it helpful to dump the context with an action like this in a test repo
> - that will show you exactly what is available in the github context for
> each kind of event so that you can use them accordingly.
> >>>
> >>> Thanks,
> >>> Danny
> >>>
> >>> On Thu, Jul 14, 2022 at 3:20 PM Fer Morales Martinez <
> [email protected]> wrote:
> >>>>
> >>>> Hello everyone!
> >>>>
> >>>> As part of the migration of the precommit and postcommit jobs from
> jenkins over to github actions, we're trying to implement the trigger
> phrase functionality.
> >>>> Our first approach was to use issue_comment
> >>>> One problem we noticed after testing is that it looks like
> issue_comment is mutually exclusive with the other events. For example,
> given the following flow
> >>>>
> >>>> name: Java Tests
> >>>> on:
> >>>>   schedule:
> >>>>     - cron: '10 2 * * *'
> >>>>   push:
> >>>>     branches: ['master', 'release-*']
> >>>>   pull_request:
> >>>>     branches: ['master', 'release-*']
> >>>>     paths: ['sdks/java/**', 'model/**', 'runners/**',
> 'examples/java/**',
> >>>>             'examples/kotlin/**', 'release/**', 'buildSrc/**']
> >>>>   issue_comment:
> >>>>   types: [created]
> >>>>
> >>>> jobs:
> >>>>   pr_commented:
> >>>>       # This job only runs for pull request comments
> >>>>       name: PR comment
> >>>>         if: ${{ github.event.issue.pull_request }}
> >>>>     runs-on: ubuntu-latest
> >>>>     steps:
> >>>>        - run: |
> >>>>               echo A comment on PR $NUMBER
> >>>>         env:
> >>>>            NUMBER: ${{ github.event.issue.number }}
> >>>>
> >>>>   issue_commented:
> >>>>     # This job only runs for issue comments
> >>>>     name: Issue comment
> >>>>     if: ${{ !github.event.issue.pull_request }}
> >>>>     runs-on: ubuntu-latest
> >>>>     steps:
> >>>>       - run: |
> >>>>           echo A comment on issue $NUMBER
> >>>>         env:
> >>>>           NUMBER: ${{ github.event.issue.number }}
> >>>>
> >>>> the flow will get kicked off but upon validating the if conditional,
> neither job will run since the github.event.issue.pull_request parameter is
> not present.
> >>>>
> >>>> Another caveat is that we'd have to populate with an if statement
> every job we wanted to execute by way of a trigger phrase.
> >>>>
> >>>> Does someone have experience with this kind of work? Has someone
> implemented trigger phrases in github actions?
> >>>>
> >>>> Thanks!
> >>>>
> >>>>
> >>>> This email and its contents (including any attachments) are being
> sent to
> >>>> you on the condition of confidentiality and may be protected by legal
> >>>> privilege. Access to this email by anyone other than the intended
> recipient
> >>>> is unauthorized. If you are not the intended recipient, please
> immediately
> >>>> notify the sender by replying to this message and delete the material
> >>>> immediately from your system. Any further use, dissemination,
> distribution
> >>>> or reproduction of this email is strictly prohibited. Further, no
> >>>> representation is made with respect to any content contained in this
> email.
>

Reply via email to