zentol commented on a change in pull request #6996: [FLINK-10715] Change 
reporter log level
URL: https://github.com/apache/flink/pull/6996#discussion_r230336292
 
 

 ##########
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java
 ##########
 @@ -426,7 +426,12 @@ public void run() {
                        try {
                                reporter.report();
                        } catch (Throwable t) {
-                               LOG.warn("Error while reporting metrics", t);
+                               if (LOG.isDebugEnabled()) {
+                                       LOG.debug("Error while reporting 
metrics {}", t);
+                               }
+                               else {
+                                       LOG.warn("Error while reporting 
metrics", t.getMessage());
 
 Review comment:
   I'd rather be on the safe side and log the stack-trace in case it is not 
caused by us. Whether a report failing or not being critical depends on the 
context and user.
   
   For example, a reporter failing consistently over a large period of time 
should show up in the logs in a manner that allows the user to make make a 
conscious decision whether the cluster has to be restarted or not. However in 
the worst case you get a `null` error message, which is entirely unhelpful for 
making said decision.
   
   If we want to go for temporary band-aids then iterate over all reporters and 
catch `ConcurrentModificationExceptions`. Then you don't have these 
somewhat-expected intermittent exceptions caused by us anymore.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to