Hi Tiago,

thank you for the reply.

Two clarifications:
- I expect sheriffs to rotate on a monthly or two-week basis. The idea
is to share the burden of something that is typically not much fun.
- I do not want to create CI superexperts, but rather to spread the
knowledge across the whole team. Having people look into failing builds and
start to understand why they fail seems a sensible starting point.

For the PR checks see my other email - basically I agree and I hope we can
act on that starting next week.

Regards

Paolo

On Thu, Aug 1, 2024 at 4:58 PM Tiago Bento <[email protected]> wrote:

> Thanks Paolo for starting this conversation. Let me bring a little bit
> of my perspective to it.
>
> Although I agree that having people "dedicated" to the quality and
> stability of our CI and other automations would be better than what we
> have today, having our builds break so often that we need a system in
> place to deal with them is a symptom of other problems, IMHO.
>
> The complexity of our CI systems and automations discourages most
> people from getting involved. Without the system itself changing and
> being more approachable, having "build sheriffs" will only make the
> separation between "development" and "CI" bigger, and we'll be reliant
> on a small group of people who'll become solely responsible for either
> fixing stuff other people broke, or chasing them to fix it. When
> inevitably these experts can't or simply don't want to contribute to
> this area of the community anymore, we're in big trouble.
>
> My opinion is that we could try and concentrate our efforts to reduce
> the barrier of entry to maintaining the CI and automations we have,
> while putting a system in place that will naturally have each one of
> us know at least the basics of how the CI and automations work.
>
> From my experience maintaining `kie-tools`, a few things help reach
> that point:
> 1. Having local builds be as similar as possible to CI builds. No
> fancy commands or profiles that only run on CI.
> 2. Red PRs can't be merged. Ever. If your PR became red for "unrelated
> reasons", you then become responsible for fixing the "unrelated issue",
> helping everyone else not face the same problem.
> 3. Having a CI system with the least amount of abstractions possible.
> Less CI code == less cognitive load == smaller barrier of entry.
>
> Moving away from Jenkins for PR checks and concentrating on GitHub
> Actions is, IMHO, already a great step in that direction.
>
> I hope I could bring something positive to the discussion.
>
> Thanks!
>
> Regards,
>
> Tiago Bento
>
> On Thu, Aug 1, 2024 at 10:08 AM Gabriele Cardosi
> <[email protected]> wrote:
> >
> > Thanks for the clarification, Paolo!
> >
> > On Thu, Aug 1, 2024 at 3:46 PM Paolo Bizzarri
> > <[email protected]> wrote:
> >
> > > Hi Gabriele,
> > >
> > > it is a mix of various stuff.
> > >
> > > For example, take the various issues that I reported in the analysis
> > > done for the 10.x branch. Most of them apply equally to the main branch.
> > >
> > > For example
> > >
> > > https://ci-builds.apache.org/job/KIE/job/kogito/job/main/job/tools/job/kogito-clean-old-nightly-images/
> > >
> > > Now this is probably a build that should simply be deleted, but it is
> > > still always red, and we need someone to look at it, decide that yes, we
> > > need to get rid of it, create a corresponding kie issue and follow it
> > > through.
> > >
> > > Another example:
> > >
> > > https://ci-builds.apache.org/job/KIE/job/kogito/job/10.0.x/job/nightly/job/kogito-examples.build-and-deploy/17/
> > >
> > > This test has been failing almost every day in the last few days.
> > > Either we need to make it a little more stable, or get rid of it.
> > >
> > > And so on.
> > >
> > > The goal of the sheriff is to keep the top level folder in good
> > > health, and that means that all the underlying jobs are healthy.
> > >
> > > I hope this clarifies my proposal.
> > >
> > > Regards
> > >
> > > Paolo
> > >
> > >
> > >
> > > On Thu, Aug 1, 2024 at 3:18 PM Gabriele Cardosi <
> > > [email protected]>
> > > wrote:
> > >
> > > > Hi Paolo,
> > > > could you explain exactly what you mean by "builds are often broken"?
> > > > Could you give an example and, for that example, say what the
> > > > "sheriff" should do to manage it? (Sorry, I just need to understand
> > > > what you are referring to.)
> > > >
> > > > Thanks!
> > > >
> > > > On Thu, Aug 1, 2024 at 3:09 PM Paolo Bizzarri
> > > > <[email protected]> wrote:
> > > >
> > > > > Hello kie mates,
> > > > >
> > > > > please find my proposal in the following.
> > > > >
> > > > > PROBLEM
> > > > > - builds are often broken and they stay broken for a long time.
> > > > > There seems to be no clear definition of who should take care of this.
> > > > > CONTEXT
> > > > > - fixing builds is slow, annoying and typically more a job of
> > > > > chasing someone else than fixing it yourself. So it quickly becomes
> > > > > wearing.
> > > > >
> > > > > PROPOSED SOLUTION
> > > > > - identify a number of build sheriffs who look at the various builds,
> > > > > open the relevant issues for tracking and chase other devs and
> > > > > contributors to fix the issues themselves. The sheriffs are not
> > > > > supposed to fix everything by themselves, but instead to keep the
> > > > > attention of other developers on the status of the builds.
> > > > > I suggest we have three sheriffs who stay around for one month and
> > > > > then pass the token to someone else: one for drools and optaplanner,
> > > > > one for kogito, one for kie-tools.
> > > > >
> > > > > Let me know your ideas and feedback.
> > > > >
> > > > > Regards
> > > > >
> > > > > Paolo
> > > > >
> > > >
> > >
>
