Sounds like the conversation has evolved a bit more than the original proposal. Francisco, would you mind creating a new reply with the update and consensus we've achieved so far, or at least as you understand it? Thank you.
On 2024/08/01 17:20:22 Francisco Javier Tirado Sarti wrote:
> Hi Tiago,
> I fully agree with that procedure (and I think skipping the issue for small
> ones, like the one I just opened, is also the way to go)
>
> On Thu, Aug 1, 2024 at 7:13 PM Tiago Bento <[email protected]> wrote:
>
> > Paolo, I understand your proposal, I'm just concerned that the build
> > sheriff role keeps rotating around the same people, as not everyone is
> > available to volunteer and/or act as such. I know it's not your
> > intention, but it's something that can happen. We can't expect all
> > contributors to express interest in being the build sheriff for a
> > month, nor that this will be something we can maintain running
> > sustainably.
> >
> > Francisco, filing a new issue and fixing the problem on a separate PR
> > is indeed the way to go, IMHO. What we usually do on `kie-tools` is:
> > 1. Send a PR with a change.
> > 2. Observe red PR checks, unrelated to the changes introduced.
> > 3. Open an issue and send a separate PR targeting the same branch of
> > the original PR, fixing the problem on the PR checks.
> > 4. Review and merge this second PR, closing the new issue.
> > 5. Retrigger PR checks on the original PR.
> > 6. Observe a green build, review and merge it normally.
> >
> > * Sometimes we skip opening an issue, if the effort to fix it is small
> > enough, and we can use the PR description to provide enough context
> > for reviewers and watchers of the repo.
> >
> > The important thing, IMHO, is that the original PR doesn't get merged
> > before the unrelated issue on PR checks is fixed. Otherwise we open a
> > credit line that allows us to fall into tech debt :) My view has
> > always been that we need to collectively cherish our CI, PR checks and
> > automations, seeing those as the canonical way to build our software.
> > But it's been really hard to cherish something so distant from our
> > day-to-day work, especially when we all can, to some extent, continue
> > operating the same way we've been for the last who knows how many
> > years, somewhat ignoring the systems we currently have :/
> >
> > On Thu, Aug 1, 2024 at 11:54 AM Francisco Javier Tirado Sarti
> > <[email protected]> wrote:
> > >
> > > I forgot to mention that another topic that is difficult to fix but easy to
> > > discuss is whether the Jenkins machines executing the tests are properly
> > > dimensioned for the tests we are executing.
> > > For example, in the previous PR, the timeout to start up the Keycloak
> > > Quarkus instance IT test was increased to 2 minutes, because the default of
> > > 1 minute does not seem to be enough for downloading and running the
> > > Keycloak image.
> > > The CI is already taking ages.
> > > Either we increase our HW resources for testing, or we start reducing our
> > > test scope.
> > >
> > > On Thu, Aug 1, 2024 at 5:42 PM Francisco Javier Tirado Sarti <
> > > [email protected]> wrote:
> > >
> > > > By the way I opened
> > > > https://github.com/apache/incubator-kie-kogito-examples/pull/1991 for
> > > > fixing
> > > > https://ci-builds.apache.org/job/KIE/job/kogito/job/10.0.x/job/nightly/job/kogito-examples.build-and-deploy/17/
> > > > So it can be said that I acted as sheriff, but since I'm weak and cannot
> > > > hold the pressure, I pass the torch (or the star) to the next one ;)
> > > >
> > > > On Thu, Aug 1, 2024 at 5:37 PM Francisco Javier Tirado Sarti <
> > > > [email protected]> wrote:
> > > >
> > > >> Hi Tiago,
> > > >> About point 2, when the issue blocking the merge is really unrelated,
> > > >> wouldn't it be a better approach to open a separate issue to fix the
> > > >> unrelated issue?
> > > >> I think we agree that it is better for tracking (so you do not see an
> > > >> unrelated change in a PR history) and will avoid the undesired situation of
> > > >> two developers trying to fix the same unrelated issue from two simultaneous
> > > >> PRs (one of the two eventually has to trigger the rebase and realize the
> > > >> broken test is already fixed, but still, there are fewer chances of them
> > > >> working on the same problem if there is an issue in the issue list)
> > > >>
> > > >> On Thu, Aug 1, 2024 at 4:58 PM Tiago Bento <[email protected]> wrote:
> > > >>
> > > >>> Thanks Paolo for starting this conversation. Let me bring a little bit
> > > >>> of my perspective to it.
> > > >>>
> > > >>> Although I agree that having people "dedicated" to the quality and
> > > >>> stability of our CI and other automations would be better than what we
> > > >>> have today, having our builds break so often that we need a system in
> > > >>> place to deal with them is a symptom of other problems, IMHO.
> > > >>>
> > > >>> The complexity of our CI systems and automations is discouraging for
> > > >>> most people to get involved. Without the system itself changing and
> > > >>> being more approachable, having "build sheriffs" will only make the
> > > >>> separation between "development" and "CI" bigger, and we'll be reliant
> > > >>> on a small group of people who'll become solely responsible for either
> > > >>> fixing stuff other people broke, or chasing them to fix it. When
> > > >>> inevitably these experts can't or simply don't want to contribute to
> > > >>> this area of the community anymore, we're in big trouble.
> > > >>>
> > > >>> My opinion is that we could try and concentrate our efforts to reduce
> > > >>> the barrier of entry to maintaining the CI and automations we have,
> > > >>> while putting a system in place that will naturally have each one of
> > > >>> us know at least the basics of how the CI and automations work.
> > > >>>
> > > >>> From my experience maintaining `kie-tools`, a few things help reach
> > > >>> that point:
> > > >>> 1. Having local builds be as similar as possible to CI builds. No
> > > >>> fancy commands or profiles that only run on CI.
> > > >>> 2. Red PRs can't be merged. Ever. If your PR became red for "unrelated
> > > >>> reasons", you then become responsible to fix the "unrelated issue",
> > > >>> helping everyone else not face the same problem.
> > > >>> 3. Having a CI system with the least amount of abstractions possible.
> > > >>> Less CI code == less cognitive load == smaller barrier of entry.
> > > >>>
> > > >>> Moving away from Jenkins for PR checks and concentrating on GitHub
> > > >>> Actions is, IMHO, already a great step in that direction.
> > > >>>
> > > >>> I hope I could bring something positive to the discussion.
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>> Regards,
> > > >>>
> > > >>> Tiago Bento
> > > >>>
> > > >>> On Thu, Aug 1, 2024 at 10:08 AM Gabriele Cardosi
> > > >>> <[email protected]> wrote:
> > > >>> >
> > > >>> > Thanks for the clarification, Paolo!
> > > >>> >
> > > >>> > On Thu, Aug 1, 2024 at 3:46 PM Paolo Bizzarri <[email protected]> wrote:
> > > >>> >
> > > >>> > > Hi Gabriele,
> > > >>> > >
> > > >>> > > it is a mix of various stuff.
> > > >>> > >
> > > >>> > > For example, take the various issues that I reported in the analysis done
> > > >>> > > for 10.x branch. Most of them apply just the same for the main branch.
> > > >>> > >
> > > >>> > > For example
> > > >>> > >
> > > >>> > > https://ci-builds.apache.org/job/KIE/job/kogito/job/main/job/tools/job/kogito-clean-old-nightly-images/
> > > >>> > >
> > > >>> > > Now this is probably a build that has to be just deleted - but still it is
> > > >>> > > always red, and we need someone that looks at it and decides that yes, we
> > > >>> > > need to get rid of it, creates a corresponding kie issue and goes after it.
> > > >>> > >
> > > >>> > > Another example:
> > > >>> > >
> > > >>> > > https://ci-builds.apache.org/job/KIE/job/kogito/job/10.0.x/job/nightly/job/kogito-examples.build-and-deploy/17/
> > > >>> > >
> > > >>> > > This test has been failing almost every day in the last few days. Either we
> > > >>> > > need to make it a little more stable, or get rid of it.
> > > >>> > >
> > > >>> > > And so on.
> > > >>> > >
> > > >>> > > The goal of the sheriff is to keep the top level folder in good health, and
> > > >>> > > that means that all the underlying jobs are healthy.
> > > >>> > >
> > > >>> > > I hope this clarifies my proposal.
> > > >>> > >
> > > >>> > > Regards
> > > >>> > >
> > > >>> > > Paolo
> > > >>> > >
> > > >>> > > On Thu, Aug 1, 2024 at 3:18 PM Gabriele Cardosi <[email protected]> wrote:
> > > >>> > >
> > > >>> > > > Hi Paolo,
> > > >>> > > > could you explain exactly what you mean by "builds are often broken"?
> > > >>> > > > Could you give an example of such and, in the example, what the "sheriff"
> > > >>> > > > should do to manage it? (Sorry, I just need to understand what you are
> > > >>> > > > referring to)
> > > >>> > > >
> > > >>> > > > Thanks!
> > > >>> > > >
> > > >>> > > > On Thu, Aug 1, 2024 at 3:09 PM Paolo Bizzarri <[email protected]> wrote:
> > > >>> > > >
> > > >>> > > > > Hello kie mates,
> > > >>> > > > >
> > > >>> > > > > please find my proposal in the following.
> > > >>> > > > >
> > > >>> > > > > PROBLEM
> > > >>> > > > > - builds are often broken and they stay broken for a long time. There
> > > >>> > > > > seems to be no clear definition of who should take care of this
> > > >>> > > > >
> > > >>> > > > > CONTEXT
> > > >>> > > > > - fixing builds is slow, annoying and typically is more a job of chasing
> > > >>> > > > > someone else than fixing it yourself. So it quickly becomes wearing.
> > > >>> > > > >
> > > >>> > > > > PROPOSED SOLUTION
> > > >>> > > > > - identify a number of build sheriffs that look at the various builds, open
> > > >>> > > > > the relevant issues for tracking and chase other devs and contributors to
> > > >>> > > > > fix the issues themselves. The sheriffs are not supposed to fix everything
> > > >>> > > > > by themselves, but instead to keep the attention of other developers on the
> > > >>> > > > > status of the builds.
> > > >>> > > > > I suggest we have three sheriffs, that stay around for one month and then
> > > >>> > > > > pass the token to someone else: one for drools and optaplanner, one for
> > > >>> > > > > kogito, one for kie-tools.
> > > >>> > > > >
> > > >>> > > > > Let me know your ideas and feedback.
> > > >>> > > > >
> > > >>> > > > > Regards
> > > >>> > > > >
> > > >>> > > > > Paolo
> > > >>> > > > >
> > > >>>
> > > >>> ---------------------------------------------------------------------
> > > >>> To unsubscribe, e-mail: [email protected]
> > > >>> For additional commands, e-mail: [email protected]
