jmsperu opened a new issue, #12679:
URL: https://github.com/apache/cloudstack/issues/12679
### problem
`NASBackupProvider.syncBackupStorageStats()` crashes with a
`NullPointerException` when
`ResourceManager.findOneRandomRunningHostByHypervisor()` returns `null`. This
happens when no KVM host in the zone has `status=Up` at the exact moment the
`BackupSyncTask` runs (e.g., during management server startup, brief agent
disconnections, or host state transitions).

The NPE aborts the entire `BackupSyncTask` run at every sync interval
(default 300s), flooding the management server log with stack traces
and preventing backup storage stats from being updated.
## Stack Trace

    ERROR [o.a.c.b.B.BackupSyncTask] Error trying to run backup-sync background task due to: [Cannot invoke "com.cloud.host.Host.getId()" because "host" is null].
    java.lang.NullPointerException: Cannot invoke "com.cloud.host.Host.getId()" because "host" is null
        at org.apache.cloudstack.backup.NASBackupProvider.syncBackupStorageStats(NASBackupProvider.java:544)
        at org.apache.cloudstack.backup.BackupManagerImpl$BackupSyncTask.runInContext(BackupManagerImpl.java:1947)

## Affected Code
File: `plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java`

    @Override
    public void syncBackupStorageStats(Long zoneId) {
        final List<BackupRepository> repositories =
                backupRepositoryDao.listByZoneAndProvider(zoneId, getName());
        final Host host =
                resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId);
        // host can be null here, but there is no null check before it is used:
        for (final BackupRepository repository : repositories) {
            ...
            answer = (BackupStorageStatsAnswer) agentManager.send(host.getId(), command); // NPE
            ...
        }
    }

`findOneRandomRunningHostByHypervisor` in `ResourceManagerImpl` returns `null`
when no matching host is found:

    if (CollectionUtils.isEmpty(hosts)) {
        return null;
    }

The same pattern also exists in `deleteBackup()` (line ~450), where the host
can be `null` when the VM is removed and no running KVM host is available.
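
The failure mode can be reproduced in isolation. The following is a minimal sketch, not CloudStack code: `Host`, `HostFinder`, and `NullHostRepro` are hypothetical stand-ins for `com.cloud.host.Host`, the `ResourceManager` lookup, and the provider. It only illustrates that dereferencing the result of a lookup that may return `null` throws inside the repository loop.

```java
import java.util.List;

public class NullHostRepro {
    /** Hypothetical stand-in for com.cloud.host.Host. */
    interface Host {
        long getId();
    }

    /** Hypothetical stand-in for the ResourceManager lookup; may return null. */
    interface HostFinder {
        Host findRunningKvmHost(long zoneId);
    }

    /** Mirrors the unguarded loop: the host is dereferenced without a null check. */
    static int syncUnguarded(HostFinder finder, List<String> repositories, long zoneId) {
        final Host host = finder.findRunningKvmHost(zoneId);
        int sent = 0;
        for (String repository : repositories) {
            host.getId(); // throws NullPointerException when host is null
            sent++;
        }
        return sent;
    }

    public static void main(String[] args) {
        // Simulates the moment when no KVM host in the zone has status=Up.
        HostFinder noHostUp = zoneId -> null;
        try {
            syncUnguarded(noHostUp, List.of("nfs-repo-1"), 1L);
        } catch (NullPointerException e) {
            System.out.println("NPE reproduced: " + e.getMessage());
        }
    }
}
```

In this sketch the exception only occurs when the repository list is non-empty, because the dereference sits inside the loop, which matches the structure of the affected method shown above.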
## Suggested Fix
Add a null check after `findOneRandomRunningHostByHypervisor`, log a
warning, and return early:

    @Override
    public void syncBackupStorageStats(Long zoneId) {
        final List<BackupRepository> repositories =
                backupRepositoryDao.listByZoneAndProvider(zoneId, getName());
        if (repositories.isEmpty()) {
            return;
        }
        final Host host =
                resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM, zoneId);
        if (host == null) {
            logger.warn("Unable to find a running KVM host in zone {} to sync backup storage stats", zoneId);
            return;
        }
        for (final BackupRepository repository : repositories) {
            ...
        }
    }

And similarly for `deleteBackup()`:

    Host host = vm != null ? getVMHypervisorHost(vm) :
            resourceManager.findOneRandomRunningHostByHypervisor(HypervisorType.KVM,
                    Long.valueOf(backup.getZoneId()));
    if (host == null) {
        throw new CloudRuntimeException("Unable to find a running KVM host to process backup deletion");
    }

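
The two guards behave differently on purpose: the periodic sync skips the cycle, while the user-initiated delete fails with a descriptive error. Below is a self-contained sketch of that behavior under stated assumptions: `Host`, `HostFinder`, and `GuardedHostSketch` are hypothetical stand-ins, `IllegalStateException` substitutes for `CloudRuntimeException`, and `System.err` substitutes for the logger. It is not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedHostSketch {
    /** Hypothetical stand-in for com.cloud.host.Host. */
    interface Host {
        long getId();
    }

    /** Hypothetical stand-in for the ResourceManager lookup; may return null. */
    interface HostFinder {
        Host findRunningKvmHost(long zoneId);
    }

    /** Periodic sync: returns the host id used per repository, or an empty list if skipped. */
    static List<Long> syncGuarded(HostFinder finder, List<String> repositories, long zoneId) {
        final List<Long> sentTo = new ArrayList<>();
        if (repositories.isEmpty()) {
            return sentTo; // nothing to sync, so no host lookup is needed
        }
        final Host host = finder.findRunningKvmHost(zoneId);
        if (host == null) {
            // Stand-in for logger.warn(...): skip this cycle and retry on the next one.
            System.err.println("No running KVM host in zone " + zoneId + "; skipping stats sync");
            return sentTo;
        }
        for (String repository : repositories) {
            sentTo.add(host.getId()); // stand-in for sending the stats command to the host
        }
        return sentTo;
    }

    /** Delete path: a missing host is an error the caller needs to see. */
    static long hostForDelete(HostFinder finder, long zoneId) {
        final Host host = finder.findRunningKvmHost(zoneId);
        if (host == null) {
            // Stand-in for CloudRuntimeException.
            throw new IllegalStateException("Unable to find a running KVM host to process backup deletion");
        }
        return host.getId();
    }

    public static void main(String[] args) {
        HostFinder noHostUp = zoneId -> null;
        System.out.println("Commands sent: " + syncGuarded(noHostUp, List.of("nfs-repo-1"), 1L).size());
        try {
            hostForDelete(noHostUp, 1L);
        } catch (IllegalStateException e) {
            System.out.println("Delete rejected: " + e.getMessage());
        }
    }
}
```

Returning early from the sync keeps the background task alive, so stats are refreshed on the next cycle once a host is `Up` again.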
## Environment
- CloudStack version: 4.22.0.0
- Hypervisor: KVM
- Backup provider: NAS (NFS)
- OS: Ubuntu 24.04, Java 21
## How to Reproduce
1. Configure NAS backup provider with an NFS backup repository
2. Assign backup offerings to VMs
3. Restart cloudstack-management (or wait for a transient host disconnect)
4. Observe `management-server.log`: the NPE fires every `backup.framework.sync.interval` seconds
## Impact
- `BackupSyncTask` fails completely on every cycle, so backup storage capacity
stats are never updated
- Log spam (one full stack trace every 5 minutes)
- No data loss, but backup monitoring/reporting is degraded