lonerzzz commented on issue #11042: FLINK-15744 Some TaskManager Task exceptions are logged as info
URL: https://github.com/apache/flink/pull/11042#issuecomment-586757018

@zentol @aljoscha Upon reading issue #5399, it did not seem that any firm position was taken on the matter. The reference to setting JobManager output to log at the info level assumes an ability to recover. This is not true in all cases. I have encountered two situations in which recovery either does not occur or occurs slowly:

1) Job submission failure - there are many errors from which a submission will not recover without manual intervention. If JobManager output is forced to the info level, then wherever jobs are regularly submitted the JobManager must always run with info level logging, or the errors will not be visible.

2) Rebalancing errors - in several situations that I have encountered, where the number of task slots is close to the number of tasks, a transient infrastructure error can leave jobs stuck awaiting deployment and rebalancing for very long periods. While recovery may happen, it can take a while, and a warning would at least allow operations staff to take manual action to correct things rather than finding out that a job in a pipeline is not processing because it is awaiting resources.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services