Hi Tiago,

I fully agree with that procedure (and I think skipping the issue for small fixes, like the one I just opened, is also the way to go).

On Thu, Aug 1, 2024 at 7:13 PM Tiago Bento wrote:
> Paolo, I understand your proposal; I'm just concerned that the build sheriff role would keep rotating among the same people, as not everyone is available to volunteer and/or act as such. I know it's not your intention, but it's something that can happen. We can't expect all contributors to express interest in being the build sheriff for a month, nor that this will be something we can keep running sustainably.
>
> Francisco, filing a new issue and fixing the problem in a separate PR is indeed the way to go, IMHO. What we usually do on `kie-tools` is:
> 1. Send a PR with a change.
> 2. Observe red PR checks unrelated to the changes introduced.
> 3. Open an issue and send a separate PR targeting the same branch as the original PR, fixing the problem in the PR checks.
> 4. Review and merge this second PR, closing the new issue.
> 5. Retrigger the PR checks on the original PR.
> 6. Observe a green build, then review and merge it normally.
>
> * Sometimes we skip opening an issue if the fix is small enough, and we use the PR description to provide enough context for reviewers and watchers of the repo.
>
> The important thing, IMHO, is that the original PR doesn't get merged before the unrelated issue in the PR checks is fixed. Otherwise we open a line of credit that lets us fall into tech debt :) My view has always been that we need to collectively cherish our CI, PR checks and automations, seeing them as the canonical way to build our software.
> But it's been really hard to cherish something so distant from our day-to-day work, especially when we all can, to some extent, keep operating the same way we have for who knows how many years, somewhat ignoring the systems we currently have :/
>
> On Thu, Aug 1, 2024 at 11:54 AM Francisco Javier Tirado Sarti wrote:
> >
> > I forgot to mention another topic that is difficult to fix but easy to discuss: whether the Jenkins machines executing the tests are properly sized for the tests we are running.
> > For example, in the previous PR, the startup timeout for the Keycloak Quarkus instance in the IT test was increased to 2 minutes, because the default of 1 minute does not seem to be enough to download and run the Keycloak image.
> > The CI is already taking ages.
> > Either we increase our HW resources for testing, or we start reducing our test scope.
> >
> > On Thu, Aug 1, 2024 at 5:42 PM Francisco Javier Tirado Sarti wrote:
> >
> > > By the way, I opened https://github.com/apache/incubator-kie-kogito-examples/pull/1991 for fixing
> > > https://ci-builds.apache.org/job/KIE/job/kogito/job/10.0.x/job/nightly/job/kogito-examples.build-and-deploy/17/
> > > So it can be said that I acted as sheriff, but since I'm weak and cannot hold the pressure, I pass the torch (or the star) to the next one ;)
> > >
> > >
> > > On Thu, Aug 1, 2024 at 5:37 PM Francisco Javier Tirado Sarti wrote:
> > >
> > >> Hi Tiago,
> > >> About point 2, when the issue blocking the merge is really unrelated, wouldn't it be a better approach to open a separate issue to fix the unrelated problem?
> > >> I think we agree that it is better for tracking (so you do not see an unrelated change in a PR's history), and it avoids the undesired situation of two developers trying to fix the same unrelated issue from two simultaneous PRs (one of the two eventually has to rebase and realize the broken test is already fixed, but still, there is less chance of them working on the same problem if there is an issue in the issue list).
> > >>
> > >>
> > >> On Thu, Aug 1, 2024 at 4:58 PM Tiago Bento wrote:
> > >>
> > >>> Thanks Paolo for starting this conversation. Let me bring a little bit of my perspective to it.
> > >>>
> > >>> Although I agree that having people "dedicated" to the quality and stability of our CI and other automations would be better than what we have today, having our builds break so often that we need a system in place to deal with them is a symptom of other problems, IMHO.
> > >>>
> > >>> The complexity of our CI systems and automations discourages most people from getting involved. Unless the system itself changes and becomes more approachable, having "build sheriffs" will only widen the separation between "development" and "CI", and we'll be reliant on a small group of people who'll become solely responsible for either fixing stuff other people broke, or chasing them to fix it. When these experts inevitably can't or simply don't want to contribute to this area of the community anymore, we're in big trouble.
> > >>>
> > >>> My opinion is that we should concentrate our efforts on lowering the barrier to entry for maintaining the CI and automations we have, while putting a system in place that will naturally have each one of us know at least the basics of how the CI and automations work.
> > >>>
> > >>> From my experience maintaining `kie-tools`, a few things help in reaching that point:
> > >>> 1. Having local builds be as similar as possible to CI builds. No fancy commands or profiles that only run on CI.
> > >>> 2. Red PRs can't be merged. Ever. If your PR turns red for "unrelated reasons", you then become responsible for fixing the "unrelated issue", helping everyone else avoid the same problem.
> > >>> 3. Having a CI system with as few abstractions as possible. Less CI code == less cognitive load == lower barrier to entry.
> > >>>
> > >>> Moving away from Jenkins for PR checks and concentrating on GitHub Actions is, IMHO, already a great step in that direction.
> > >>>
> > >>> I hope I could bring something positive to the discussion.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> Regards,
> > >>>
> > >>> Tiago Bento
> > >>>
> > >>> On Thu, Aug 1, 2024 at 10:08 AM Gabriele Cardosi wrote:
> > >>> >
> > >>> > Thanks for the clarification, Paolo!
> > >>> >
> > >>> > On Thu, Aug 1, 2024 at 15:46 Paolo Bizzarri wrote:
> > >>> >
> > >>> > > Hi Gabriele,
> > >>> > >
> > >>> > > it is a mix of various stuff.
> > >>> > >
> > >>> > > For example, take the various issues that I reported in the analysis done for the 10.x branch. Most of them apply just the same to the main branch.
> > >>> > >
> > >>> > > For example
> > >>> > >
> > >>> > > https://ci-builds.apache.org/job/KIE/job/kogito/job/main/job/tools/job/kogito-clean-old-nightly-images/
> > >>> > >
> > >>> > > Now, this is probably a build that should just be deleted, but it is still always red, and we need someone who looks at it and decides that yes, we need to get rid of it, creates a corresponding kie issue and goes after it.
> > >>> > >
> > >>> > > Another example:
> > >>> > >
> > >>> > > https://ci-builds.apache.org/job/KIE/job/kogito/job/10.0.x/job/nightly/job/kogito-examples.build-and-deploy/17/
> > >>> > >
> > >>> > > This test has been failing almost every day for the last few days. Either we need to make it a little more stable, or get rid of it.
> > >>> > >
> > >>> > > And so on.
> > >>> > >
> > >>> > > The goal of the sheriff is to keep the top-level folder in good health, and that means keeping all the underlying jobs healthy.
> > >>> > >
> > >>> > > I hope this clarifies my proposal.
> > >>> > >
> > >>> > > Regards
> > >>> > >
> > >>> > > Paolo
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> > > On Thu, Aug 1, 2024 at 3:18 PM Gabriele Cardosi wrote:
> > >>> > >
> > >>> > > > Hi Paolo,
> > >>> > > > could you explain exactly what you mean by "builds are often broken"? Could you give an example of this and, in that example, what the "sheriff" should do to manage it? (Sorry, I just need to understand what you are referring to.)
> > >>> > > >
> > >>> > > > Thanks!
> > >>> > > >
> > >>> > > > On Thu, Aug 1, 2024 at 15:09 Paolo Bizzarri wrote:
> > >>> > > >
> > >>> > > > > Hello kie mates,
> > >>> > > > >
> > >>> > > > > please find my proposal below.
> > >>> > > > >
> > >>> > > > > PROBLEM
> > >>> > > > > - builds are often broken and they stay broken for a long time. There seems to be no clear definition of who should take care of this.
> > >>> > > > >
> > >>> > > > > CONTEXT
> > >>> > > > > - fixing builds is slow, annoying and typically more a job of chasing someone else than fixing it yourself. So it quickly becomes wearing.
> > >>> > > > >
> > >>> > > > > PROPOSED SOLUTION
> > >>> > > > > - identify a number of build sheriffs who look at the various builds, open the relevant issues for tracking, and chase other devs and contributors to fix the issues themselves. The sheriffs are not supposed to fix everything by themselves, but rather to keep other developers' attention on the status of the builds.
> > >>> > > > > I suggest we have three sheriffs who serve for one month and then pass the token to someone else: one for drools and optaplanner, one for kogito, one for kie-tools.
> > >>> > > > >
> > >>> > > > > Let me know your ideas and feedback.
> > >>> > > > >
> > >>> > > > > Regards
> > >>> > > > >
> > >>> > > > > Paolo
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: [email protected]
> > >>> For additional commands, e-mail: [email protected]
> > >>>
