What does the GoCD server think the status of that job and stage is? What
does "the pipeline crashed" mean? If the stage is shown as passed by the
GoCD server, what downstream problem did this cause? Did subsequent stages
or pipelines not trigger correctly?

The error looks like your agent had some kind of problem talking to the
server or reporting its status. If that's the case, there is potentially a
chicken-and-egg problem here that might prevent reporting at the scope you
suggest, depending on the root cause of the issue:

   - if the agent couldn't talk to the server to report its status and the
   error was not recoverable by the agent, then you'd probably need to monitor
   agents for such connectivity errors. Agents do have a health API exposed
   <https://docs.gocd.org/24.4.0/advanced_usage/agent-health-check-api.html>
   which reports their ability to connect to the server. This could be
   monitored externally, but would not have "pipeline/stage/job" scope.
   - if the GoCD server itself had a problem preventing it from correctly
   updating the status from the agent, it would depend on what the cause of
   that error was and whether it happened within the scope of a stage/job. If
   the stage/job was left in an indeterminate state, there'd potentially be a
   similar problem with knowing how to report the status at the
   pipeline/stage's scope. The server has its own internal error
   reporting/tracking (the one that drives the red errors/warnings in the UI,
   and which also has its own API for external consumption
   <https://api.gocd.org/current/#server-health-messages>), but we'd need to
   know what the root cause was and whether it triggered such an error/warning.
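
As a rough illustration of monitoring both of those APIs externally, here is
a minimal Python sketch. Treat it as a starting point under assumptions, not
a tested integration: the agent URL (port 8152, path
/health/latest/isConnectedToServer), the server URL (port 8153), the
"application/vnd.go.cd.v1+json" Accept header and the "level" field in the
response are my reading of the linked docs, so please verify them against
your GoCD version. The function names are made up for this example.

```python
import json
import urllib.error
import urllib.request

# Assumed defaults taken from the linked docs; adjust to your installation.
AGENT_HEALTH_URL = "http://localhost:8152/health/latest/isConnectedToServer"
SERVER_HEALTH_URL = "http://localhost:8153/go/api/server_health_messages"


def agent_is_connected(url=AGENT_HEALTH_URL, timeout=5):
    """Return True if the agent's health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-2xx responses (HTTPError), refused connections, timeouts.
        return False


def fetch_server_health(url=SERVER_HEALTH_URL, token=None, timeout=5):
    """Fetch the server health messages as a list of dicts."""
    headers = {"Accept": "application/vnd.go.cd.v1+json"}
    if token:
        # Assumes a personal access token is used for authentication.
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def select_alerts(messages, levels=("ERROR",)):
    """Keep only messages whose 'level' is one of the given levels."""
    return [m for m in messages if m.get("level", "").upper() in levels]
```

You could run agent_is_connected() on each agent host and
select_alerts(fetch_server_health()) against the server from cron or your
existing monitoring, and send mail when either reports a problem. Note that
neither check knows which pipeline/stage/job was affected, which is the
scope limitation described above.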


-Chad

On Thu, Nov 21, 2024 at 5:26 PM 'Hans Dampf' via go-cd <
[email protected]> wrote:

> Hi,
>
> [image: fail.png]
>
> We ran into this problem tonight. The stage passed, but then the pipeline
> crashed. The crash itself is not the main problem.
>
> The main problem is we have an extra fail-stage configured by mail and
> enriched information.  My guess is this fail stage did not get triggered
> because the previous stage succeeded.
>
> I found this part in the documentation, but I'm not sure if it would have
> worked in this case.
> https://docs.gocd.org/current/configuration/dev_notifications.html
>
> If not, then there should be a way to catch these events outside the stage
> and job level but still in the pipeline and generate an alert.
>
> Regards
>
> --
> You received this message because you are subscribed to the Google Groups
> "go-cd" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/d/msgid/go-cd/e4dd0034-8c49-4ac4-9e19-6d193d73c20fn%40googlegroups.com
> <https://groups.google.com/d/msgid/go-cd/e4dd0034-8c49-4ac4-9e19-6d193d73c20fn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

