Re: [DISCUSS] log guidelines

2023-12-06 Thread Daan Hoogland
Thanks João,

On Tue, Dec 5, 2023 at 8:21 PM João Jandre Paraquetti
 wrote:

...

> However, when the error happens because of system degradation, there are
> times when having a stacktrace can save hours of debugging, as the error
> might be hard to reproduce. Also, if the error is occurring in a
> production environment, changing the log level to debug means restarting
> ACS's components, possibly leading to downtime and/or more errors. The
> second issue will be solved with the log4j2 PR, but the first one can
> still be a pain.

The second issue is already implemented in the current system (though
tuning to a higher level does not work as flawlessly as as going to
debug or trace)

The problem I have with the second reason is that if it occurs it will
occur a lot usually.
A compromise could be adding a second error log to be used for those
stacktraces. What do you think?
Also, How would you word it instead, João?

> On 12/4/23 10:51, Daan Hoogland wrote:
...
> > Fatal:
> > - include a reason why the system can not function anymore,
> > - be followed by a System.exit() call,
> > - include a stack trace if it is reasonable to expect this would give
> > properly helpful information.
> > I did a quick search and we have 2 fatal calls in the system. Neither
> > is followed by a System.exit() and I think they can be replaced by
> > error()/debug() combinations
> >
> > Error:
> > - include a reason why the system can not execute a user request or
> > why part of the system is degraded.
> > - should *not* include a stack trace
> >
> > Warn:
> > - include a reason why the system might not behave as the user is
> > expecting. (configuration or integration missing for example,)
> > - should *not* include a stack trace
> >
> > Info:
> > - is for informationals to the operator like resource lifecycle
> > messages or deployment planning considderations
> > - should *not* include a stack trace
> >
> > Debug:
> > -  information to be turned on when trouble shooting or developing.
> > This information should not be needed for normal operation
> > - if a thrown exception is caught and not rethrown, the stack trace of
> > such exception
> >
> > Trace:
> > - whatever you feel helps you during development of a
> > feature/enhancement/fix/improvement that is not needed in any kind of
> > operational way.

-- 
Daan


Re: [DISCUSS] log guidelines

2023-12-05 Thread João Jandre Paraquetti

Hello all,

As someone who has worked a lot on logging in this project, I feel like 
a set of log guidelines are a good addition. I generally agree with 
Daan's ideas; however, I think that the error level was oversimplified. 
As Daan already said, there are two main reasons for error logs: "the 
system can not execute a user request or part of the system is degraded".


When the system cannot execute a user request, a simple message 
explaining why it's not possible to execute the request is often enough. 
However, when the error happens because of system degradation, there are 
times when having a stacktrace can save hours of debugging, as the error 
might be hard to reproduce. Also, if the error is occurring in a 
production environment, changing the log level to debug means restarting 
ACS's components, possibly leading to downtime and/or more errors. The 
second issue will be solved with the log4j2 PR, but the first one can 
still be a pain.


In summary, a hard rule would not address all situations, but a 
guideline or good practices guide would be a great addition to ACS.


Best regards,
João Jandre


On 12/4/23 10:51, Daan Hoogland wrote:

There have been some discussions on github PRs about the subject of
proper logging and I want to use this to start a discussion and come
to some guidelines that hopefully everybody can agree to.

My idea is the following

Fatal:
- include a reason why the system can not function anymore,
- be followed by a System.exit() call,
- include a stack trace if it is reasonable to expect this would give
properly helpful information.
I did a quick search and we have 2 fatal calls in the system. Neither
is followed by a System.exit() and I think they can be replaced by
error()/debug() combinations

Error:
- include a reason why the system can not execute a user request or
why part of the system is degraded.
- should *not* include a stack trace

Warn:
- include a reason why the system might not behave as the user is
expecting. (configuration or integration missing for example,)
- should *not* include a stack trace

Info:
- is for informationals to the operator like resource lifecycle
messages or deployment planning considderations
- should *not* include a stack trace

Debug:
-  information to be turned on when trouble shooting or developing.
This information should not be needed for normal operation
- if a thrown exception is caught and not rethrown, the stack trace of
such exception

Trace:
- whatever you feel helps you during development of a
feature/enhancement/fix/improvement that is not needed in any kind of
operational way.

As I think the two fatal()s we have are not genuine, I think stack
traces are only for Debug and Trace level, where Debug can be
temporarily turned on in production and Trace is fro developers
exclusively,

I realise the current state of the system is far from this behaviour,
but i like to think we can grow in the direction as I describe and I
hope we can refine this to a generic guideline to be used in reviews.




[DISCUSS] log guidelines

2023-12-04 Thread Daan Hoogland
There have been some discussions on github PRs about the subject of
proper logging and I want to use this to start a discussion and come
to some guidelines that hopefully everybody can agree to.

My idea is the following

Fatal:
- include a reason why the system can not function anymore,
- be followed by a System.exit() call,
- include a stack trace if it is reasonable to expect this would give
properly helpful information.
I did a quick search and we have 2 fatal calls in the system. Neither
is followed by a System.exit() and I think they can be replaced by
error()/debug() combinations

Error:
- include a reason why the system can not execute a user request or
why part of the system is degraded.
- should *not* include a stack trace

Warn:
- include a reason why the system might not behave as the user is
expecting. (configuration or integration missing for example,)
- should *not* include a stack trace

Info:
- is for informationals to the operator like resource lifecycle
messages or deployment planning considderations
- should *not* include a stack trace

Debug:
-  information to be turned on when trouble shooting or developing.
This information should not be needed for normal operation
- if a thrown exception is caught and not rethrown, the stack trace of
such exception

Trace:
- whatever you feel helps you during development of a
feature/enhancement/fix/improvement that is not needed in any kind of
operational way.

As I think the two fatal()s we have are not genuine, I think stack
traces are only for Debug and Trace level, where Debug can be
temporarily turned on in production and Trace is fro developers
exclusively,

I realise the current state of the system is far from this behaviour,
but i like to think we can grow in the direction as I describe and I
hope we can refine this to a generic guideline to be used in reviews.


-- 
Daan