What does the GoCD server think the status of that job and stage is? What does "the pipeline crashed" mean? If the stage is shown as passed by the GoCD server, what downstream problem did this cause? Did subsequent stages or pipelines not trigger correctly?
The error looks like your agent had some kind of problem talking to the server or reporting its status. If that's the case, then there is potentially a chicken-and-egg problem here that might prevent reporting at the level of scope you suggest, depending on the root cause of the issue:

- If the agent couldn't talk to the server to report its status and the error was not recoverable by the agent, then you'd probably need to monitor agents for such connectivity errors. Agents do expose a health API <https://docs.gocd.org/24.4.0/advanced_usage/agent-health-check-api.html> which reports their ability to connect to the server. This could be monitored externally, but would not have "pipeline/stage/job" scope.
- If the GoCD server itself had a problem preventing it from correctly updating the status from the agent, it would depend on what the cause of that error was and whether it happened within the scope of a stage/job. If the stage/job was left in an indeterminate state, there'd potentially be a similar problem with knowing how to report the status at the pipeline/stage's scope. The server has its own internal error reporting/tracking (the one that drives the red errors/warnings in the UI, and which also has its own API for external consumption <https://api.gocd.org/current/#server-health-messages>), but we'd need to know what the root cause was and whether it triggered such an error/warning.

-Chad

On Thu, Nov 21, 2024 at 5:26 PM 'Hans Dampf' via go-cd <[email protected]> wrote:

> Hi,
>
> [image: fail.png]
>
> We ran into this problem tonight. The stage passed, but then the pipeline crashed. The crash itself is not the main problem.
>
> The main problem is that we have an extra fail stage configured with mail and enriched information. My guess is this fail stage did not get triggered because the previous stage succeeded.
>
> I found this part in the documentation, but I'm not sure whether it would have worked in this case.
> https://docs.gocd.org/current/configuration/dev_notifications.html
>
> If not, then there should be a way to catch these events outside the stage and job level, but still within the pipeline, and generate an alert.
>
> Regards
>
> --
> You received this message because you are subscribed to the Google Groups "go-cd" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion visit https://groups.google.com/d/msgid/go-cd/e4dd0034-8c49-4ac4-9e19-6d193d73c20fn%40googlegroups.com.
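
For anyone wanting to try the external monitoring mentioned in the reply, here is a rough Python sketch of polling the two endpoints linked there. Treat it as an illustration under assumptions rather than a tested setup: the server URL is a placeholder, the agent URL assumes the health endpoint on its default local port as described in the linked agent docs, authentication is left out, and the `level` field name is taken from my reading of the server health messages API docs. The helper names (`agent_is_connected`, `fetch_server_health_messages`, `select_alerts`) are made up for this example.

```python
import json
import urllib.error
import urllib.request

# Assumed values; adjust for your installation.
SERVER_URL = "https://ci.example.com/go"  # hypothetical server base URL
AGENT_HEALTH_URL = "http://localhost:8152/health/latest/isConnectedToServer"


def agent_is_connected(url=AGENT_HEALTH_URL, timeout=5):
    """Return True if the agent health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Error responses (HTTPError) and connection failures both land here.
        return False


def fetch_server_health_messages(server_url=SERVER_URL, timeout=10):
    """Fetch the server health messages as a list of dicts (no auth sent)."""
    req = urllib.request.Request(
        f"{server_url}/api/server_health_messages",
        headers={"Accept": "application/vnd.go.cd.v1+json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def select_alerts(messages, levels=("ERROR",)):
    """Keep only messages whose 'level' is one of the given levels."""
    return [m for m in messages if m.get("level") in levels]


if __name__ == "__main__":
    if not agent_is_connected():
        print("agent cannot reach the server (or health endpoint unreachable)")
    for msg in select_alerts(fetch_server_health_messages()):
        print(msg.get("level"), msg.get("message"))
```

As noted in the reply, neither check has pipeline/stage/job scope; this would only tell you that an agent or the server is unhealthy, which you would then have to correlate with the affected run yourself.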
