Re: [DISCUSS] log guidelines
Thanks João, On Tue, Dec 5, 2023 at 8:21 PM João Jandre Paraquetti wrote: ... > However, when the error happens because of system degradation, there are > times when having a stacktrace can save hours of debugging, as the error > might be hard to reproduce. Also, if the error is occurring in a > production environment, changing the log level to debug means restarting > ACS's components, possibly leading to downtime and/or more errors. The > second issue will be solved with the log4j2 PR, but the first one can > still be a pain. The second issue is already implemented in the current system (though tuning to a higher level does not work as flawlessly as as going to debug or trace) The problem I have with the second reason is that if it occurs it will occur a lot usually. A compromise could be adding a second error log to be used for those stacktraces. What do you think? Also, How would you word it instead, João? > On 12/4/23 10:51, Daan Hoogland wrote: ... > > Fatal: > > - include a reason why the system can not function anymore, > > - be followed by a System.exit() call, > > - include a stack trace if it is reasonable to expect this would give > > properly helpful information. > > I did a quick search and we have 2 fatal calls in the system. Neither > > is followed by a System.exit() and I think they can be replaced by > > error()/debug() combinations > > > > Error: > > - include a reason why the system can not execute a user request or > > why part of the system is degraded. > > - should *not* include a stack trace > > > > Warn: > > - include a reason why the system might not behave as the user is > > expecting. (configuration or integration missing for example,) > > - should *not* include a stack trace > > > > Info: > > - is for informationals to the operator like resource lifecycle > > messages or deployment planning considderations > > - should *not* include a stack trace > > > > Debug: > > - information to be turned on when trouble shooting or developing. > > This information should not be needed for normal operation > > - if a thrown exception is caught and not rethrown, the stack trace of > > such exception > > > > Trace: > > - whatever you feel helps you during development of a > > feature/enhancement/fix/improvement that is not needed in any kind of > > operational way. -- Daan
Re: [DISCUSS] log guidelines
Hello all, As someone who has worked a lot on logging in this project, I feel like a set of log guidelines are a good addition. I generally agree with Daan's ideas; however, I think that the error level was oversimplified. As Daan already said, there are two main reasons for error logs: "the system can not execute a user request or part of the system is degraded". When the system cannot execute a user request, a simple message explaining why it's not possible to execute the request is often enough. However, when the error happens because of system degradation, there are times when having a stacktrace can save hours of debugging, as the error might be hard to reproduce. Also, if the error is occurring in a production environment, changing the log level to debug means restarting ACS's components, possibly leading to downtime and/or more errors. The second issue will be solved with the log4j2 PR, but the first one can still be a pain. In summary, a hard rule would not address all situations, but a guideline or good practices guide would be a great addition to ACS. Best regards, João Jandre On 12/4/23 10:51, Daan Hoogland wrote: There have been some discussions on github PRs about the subject of proper logging and I want to use this to start a discussion and come to some guidelines that hopefully everybody can agree to. My idea is the following Fatal: - include a reason why the system can not function anymore, - be followed by a System.exit() call, - include a stack trace if it is reasonable to expect this would give properly helpful information. I did a quick search and we have 2 fatal calls in the system. Neither is followed by a System.exit() and I think they can be replaced by error()/debug() combinations Error: - include a reason why the system can not execute a user request or why part of the system is degraded. - should *not* include a stack trace Warn: - include a reason why the system might not behave as the user is expecting. (configuration or integration missing for example,) - should *not* include a stack trace Info: - is for informationals to the operator like resource lifecycle messages or deployment planning considderations - should *not* include a stack trace Debug: - information to be turned on when trouble shooting or developing. This information should not be needed for normal operation - if a thrown exception is caught and not rethrown, the stack trace of such exception Trace: - whatever you feel helps you during development of a feature/enhancement/fix/improvement that is not needed in any kind of operational way. As I think the two fatal()s we have are not genuine, I think stack traces are only for Debug and Trace level, where Debug can be temporarily turned on in production and Trace is fro developers exclusively, I realise the current state of the system is far from this behaviour, but i like to think we can grow in the direction as I describe and I hope we can refine this to a generic guideline to be used in reviews.
[DISCUSS] log guidelines
There have been some discussions on github PRs about the subject of proper logging and I want to use this to start a discussion and come to some guidelines that hopefully everybody can agree to. My idea is the following Fatal: - include a reason why the system can not function anymore, - be followed by a System.exit() call, - include a stack trace if it is reasonable to expect this would give properly helpful information. I did a quick search and we have 2 fatal calls in the system. Neither is followed by a System.exit() and I think they can be replaced by error()/debug() combinations Error: - include a reason why the system can not execute a user request or why part of the system is degraded. - should *not* include a stack trace Warn: - include a reason why the system might not behave as the user is expecting. (configuration or integration missing for example,) - should *not* include a stack trace Info: - is for informationals to the operator like resource lifecycle messages or deployment planning considderations - should *not* include a stack trace Debug: - information to be turned on when trouble shooting or developing. This information should not be needed for normal operation - if a thrown exception is caught and not rethrown, the stack trace of such exception Trace: - whatever you feel helps you during development of a feature/enhancement/fix/improvement that is not needed in any kind of operational way. As I think the two fatal()s we have are not genuine, I think stack traces are only for Debug and Trace level, where Debug can be temporarily turned on in production and Trace is fro developers exclusively, I realise the current state of the system is far from this behaviour, but i like to think we can grow in the direction as I describe and I hope we can refine this to a generic guideline to be used in reviews. -- Daan