Re: Ops experience: monitoring [mail processing - mailbox event processing] for distributed James product
+1

On Wed, 12 Feb 2020 at 12:01, Tellier Benoit wrote:
> Then an admin might miss the original log if it falls outside their
> browsing window.
>
> However, I agree the logging could be done at a lower pace:
> - Check every minute, and log immediately upon a status change
> - Otherwise, re-log the current status every 30 minutes

To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
Re: Ops experience: monitoring [mail processing - mailbox event processing] for distributed James product
Then an admin might miss the original log if it falls outside their browsing window.

However, I agree the logging could be done at a lower pace:
- Check every minute, and log immediately upon a status change
- Otherwise, re-log the current status every 30 minutes

On 12/02/2020 17:48, Antoine Duprat wrote:
> Wouldn't it be more logical to log only status changes?
>
> I mean, if you are in a degraded state, you will log the same thing
> every minute unless you have fixed the issue.
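
As an illustration of the policy proposed above (check every minute, log at once on a status change, otherwise re-log every 30 minutes), here is a minimal sketch. All names (`Status`, `HealthLogPolicy`, `onCheck`) are hypothetical and are not taken from the James code base. The WARN and ERROR levels follow the suggestion made earlier in the thread. Logging a healthy status at INFO is an assumption of this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical types for illustration only; not part of the James code base.
enum Status { HEALTHY, DEGRADED, DOWN }

final class HealthLogPolicy {
    private final Duration relogInterval;
    private Status lastLogged;     // null until something has been logged
    private Instant lastLoggedAt;  // null until something has been logged

    HealthLogPolicy(Duration relogInterval) {
        this.relogInterval = relogInterval;
    }

    // Called on every periodic check (for example once a minute).
    // Returns the log level to use, or empty when nothing should be logged.
    Optional<String> onCheck(Status current, Instant now) {
        boolean changed = current != lastLogged;
        boolean relogDue = lastLoggedAt != null
            && Duration.between(lastLoggedAt, now).compareTo(relogInterval) >= 0;
        if (!changed && !relogDue) {
            return Optional.empty();
        }
        lastLogged = current;
        lastLoggedAt = now;
        return Optional.of(levelFor(current));
    }

    // WARN for degraded and ERROR for down, as suggested in the thread.
    // INFO for healthy is an assumption of this sketch.
    static String levelFor(Status status) {
        switch (status) {
            case DOWN: return "ERROR";
            case DEGRADED: return "WARN";
            default: return "INFO";
        }
    }
}
```

A scheduler would call `onCheck` once a minute with the aggregated status and the current time, and write a log line only when a level is returned. Keeping the decision separate from the clock and the logger makes the policy easy to unit test.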
Re: Ops experience: monitoring [mail processing - mailbox event processing] for distributed James product
Wouldn't it be more logical to log only status changes?

I mean, if you are in a degraded state, you will log the same thing every minute unless you have fixed the issue.

On Wed, 12 Feb 2020 at 11:43, Tellier Benoit wrote:
> +1
>
> We should make this happen.
Re: Ops experience: monitoring [mail processing - mailbox event processing] for distributed James product
+1

We should make this happen.

On 12/02/2020 17:29, Matthieu Baechler wrote:
> I just want to add, on that matter, that I already proposed to have a
> timer that logs health state to WARN when status is `degraded` and
> ERROR when status is `down` on a sensible time interval (like once a
> minute) and that would be enabled in our default configuration.
>
> That way the logs, which are the first and most basic tool any admin
> looks at, would give you that very important information.
Re: Ops experience: monitoring [mail processing - mailbox event processing] for distributed James product
On Wed, 2020-02-12 at 16:27 +0700, Tellier Benoit wrote:
> - Through grafana, the admin will have the information directly
>   available. Nowadays, health checks require her to execute the
>   health check via webadmin. Requiring more actions is generally the
>   best way to have none of them taken.

I just want to add, on that matter, that I already proposed to have a timer that logs the health state at WARN when the status is `degraded` and at ERROR when the status is `down`, at a sensible time interval (like once a minute), and that it would be enabled in our default configuration.

That way the logs, which are the first and most basic tool any admin looks at, would give you that very important information.

Cheers,

--
Matthieu Baechler
Ops experience: monitoring [mail processing - mailbox event processing] for distributed James product
Hello all,

Recently, as part of our work documenting Administration Procedures for the Distributed Guice James product, we have been reflecting on how to conduct monitoring, which led to some good discussions.

Currently, monitoring of `mailbox event processing` and `mail processing` can be achieved via logs (i.e. ERROR log review, etc.).

However, relying on logs requires a correct Kibana configuration, which in turn requires good information in the logs. In addition:

- It makes retries and final attempts non-trivial to distinguish.
- Admins generally monitor logs using a time window. Events older than this time window are ignored.

We can think of several mechanisms to improve this situation:

- Having, for instance, a health check such as MailboxEventProcessingHealthCheck ensuring that the dead-letter is empty, or returning "degraded" otherwise.
- Having a metric displayed on a board. For the dead-letter example, a boolean text field can be enough.

While interesting, the health check option has received the following criticisms so far:

- A perfectly behaving James server might report some failed processing entries (for example on some borderline EML parsing), leading to a degraded status for an otherwise perfectly working James server (for both the mail processing and mailbox processing cases).
- Through Grafana, the admin will have the information directly available. Nowadays, health checks require her to execute the health check via webadmin. Requiring more actions is generally the best way to have none of them taken.

We would be very interested in feedback on this topic, in order to get a friendlier admin experience.

Best regards,

Benoit
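
To make the health check idea from the message above concrete, here is a minimal sketch of a check that reports "degraded" whenever the dead-letter is not empty. The names `CheckStatus` and `DeadLetterHealthCheck`, and the `LongSupplier` used to read the dead-letter size, are hypothetical. The sketch deliberately does not use James's own health check API or its dead-letter storage, so it only illustrates the decision rule.

```java
import java.util.function.LongSupplier;

// Hypothetical types for illustration only; not the James health check API.
enum CheckStatus { HEALTHY, DEGRADED }

final class DeadLetterHealthCheck {
    // Supplies the current number of entries in the dead-letter (assumed to be cheap to read).
    private final LongSupplier deadLetterSize;

    DeadLetterHealthCheck(LongSupplier deadLetterSize) {
        this.deadLetterSize = deadLetterSize;
    }

    // Healthy only when the dead-letter is empty, degraded otherwise.
    CheckStatus check() {
        return deadLetterSize.getAsLong() == 0 ? CheckStatus.HEALTHY : CheckStatus.DEGRADED;
    }
}
```

With this rule, a single failed entry is enough to report a degraded status, which is exactly the behaviour the first criticism in the message is about.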