zentol commented on a change in pull request #6996: [FLINK-10715] Change reporter log level URL: https://github.com/apache/flink/pull/6996#discussion_r230336292
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java ########## @@ -426,7 +426,12 @@ public void run() { try { reporter.report(); } catch (Throwable t) { - LOG.warn("Error while reporting metrics", t); + if (LOG.isDebugEnabled()) { + LOG.debug("Error while reporting metrics {}", t); + } + else { + LOG.warn("Error while reporting metrics", t.getMessage()); Review comment: I'd rather be on the safe side and log the stack-trace in case it is not caused by us. Whether a report failing or not being critical depends on the context and user. For example, a reporter failing consistently over a large period of time should show up in the logs in a manner that allows the user to make make a conscious decision whether the cluster has to be restarted or not. However in the worst case you get a `null` error message, which is entirely unhelpful for making said decision. If we want to go for temporary band-aids then iterate over all reporters and catch `ConcurrentModificationExceptions`. Then you don't have these somewhat-expected intermittent exceptions caused by us anymore. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services