When an external metric server configured to use TCP is unreachable, the storage and VM indicators of the cluster nodes in the UI turn gray. This happens because a failed connection attempt currently raises an unhandled exception, which aborts the status update flow. As the connection attempts happen at the beginning of the update process, status information is then not broadcast within the system or across the cluster. After five minutes without updates, the frontend marks the indicators as gray.

To catch connection errors, wrap connection establishment in an eval
block. The implementation ensures that other connections to external
metric servers are still established, even if one fails.

Signed-off-by: Lukas Sichert <[email protected]>
---
Notes:
    changes from v1 to v2:
    -add the SafeSyslog import required for syslog()
    -correct bug ID: #4911 -> #4130
    -move the push operation outside the eval block as suggested by
     Thomas

    Regarding catching the errors at a higher level:
    Since this function is called for each plugin in turn, not catching
    the error here would mean that not all the plugins are checked.

 PVE/ExtMetric.pm | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/PVE/ExtMetric.pm b/PVE/ExtMetric.pm
index ebc2817b..18815efd 100644
--- a/PVE/ExtMetric.pm
+++ b/PVE/ExtMetric.pm
@@ -7,6 +7,7 @@ use PVE::Status::Plugin;
 use PVE::Status::Graphite;
 use PVE::Status::InfluxDB;
 use PVE::Status::OpenTelemetry;
+use PVE::SafeSyslog;
 
 PVE::Status::Graphite->register();
 PVE::Status::InfluxDB->register();
@@ -52,8 +53,12 @@ sub transactions_start {
         $cfg,
         sub {
             my ($plugin, $id, $plugin_config) = @_;
-
-            my $connection = $plugin->_connect($plugin_config, $id);
+
+            my $connection = eval { $plugin->_connect($plugin_config, $id);};
+            if (my $err = $@) {
+                syslog( "warning", "connection for plugin '$id' failed: $err");
+                return;
+            }
             push @$transactions, {
-- 
2.47.3
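
For illustration only (not part of the patch): a minimal, self-contained sketch of the per-plugin eval pattern described in the commit message. The names fake_connect() and start_transactions(), the plugin IDs, and the warnings array are hypothetical stand-ins; the real code calls $plugin->_connect() and syslog() as shown in the diff.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $plugin->_connect(): dies when the server
# is marked unreachable, otherwise returns a dummy connection handle.
sub fake_connect {
    my ($id, $reachable) = @_;
    die "connect to '$id' failed: Connection refused\n" if !$reachable;
    return "conn-$id";
}

# Mirrors the shape of the patched callback: a failing plugin is
# reported and skipped, the remaining plugins still get a transaction.
sub start_transactions {
    my ($plugins) = @_;
    my ($transactions, $warnings) = ([], []);

    for my $id (sort keys %$plugins) {
        my $connection = eval { fake_connect($id, $plugins->{$id}) };
        if (my $err = $@) {
            # the patch logs via syslog("warning", ...); this sketch
            # collects the message instead so it runs anywhere
            push @$warnings, "connection for plugin '$id' failed: $err";
            next;
        }
        # kept outside the eval, as in v2 of the patch
        push @$transactions, { id => $id, connection => $connection };
    }

    return ($transactions, $warnings);
}

my ($transactions, $warnings) = start_transactions({
    graphite => 1,
    influxdb => 0,    # simulated unreachable TCP server
    otel     => 1,
});

print "established: ", join(", ", map { $_->{id} } @$transactions), "\n";
print "warning: $_" for @$warnings;
```

With the eval in place, the simulated influxdb failure only produces a warning, while graphite and otel are still set up. Without it, the die would propagate out of the loop and no transaction would be started at all.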
