Hanarion opened a new pull request, #12133:

URL: https://github.com/apache/cloudstack/pull/12133
### Description

Fixes: #12122

This PR resolves the random backup failures observed when using a CIFS (SMB) backup repository with NAS backup.

The original issue describes how backups appear to complete (files transferred, `File remaining` = 0), yet the job ends in status FAILED because the subsequent `sync` + `umount` step fails: the mount point remains busy and cannot be unmounted cleanly.

#### What was happening:

After the data copy, the script issues `sync`, but because CIFS does not always flush and close all filesystem handles immediately, the mount remains busy. The subsequent `umount $mount_point` then fails ("target is busy"), the mount and its directory are left behind, and the job is marked as failed even though the backup data is present. The issue is intermittent ("sometimes it fails, sometimes it doesn't") because it depends on timing/race conditions in CIFS.

#### What this PR implements:

- Adds a polling loop (e.g., using `fuser -m <mount_point>`) with a timeout, to wait for any active handles on the mount to be released before attempting `umount`.
- If the mount is still busy when the timeout is reached, the script prints an error message and still attempts the `umount` (it may succeed if the handles are released in the meantime).
- Ensures that `umount` is also triggered for backups of stopped VMs.

### Types of changes

- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [X] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] Build/CI
- [ ] Test (unit or integration test code)

### Feature/Enhancement Scale or Bug Severity

#### Feature/Enhancement Scale

- [ ] Major
- [X] Minor

#### Bug Severity

- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [X] Minor
- [ ] Trivial

### Screenshots (if appropriate):

### How Has This Been Tested?
I ran multiple tests by calling the script directly and checking the return code while keeping the mount busy so that `umount` is blocked:

```bash
[root@compute01 ~]# /usr/bin/bash /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/nasbackup.sh -o backup -v i-12-606-VM -t cifs -s '/XXXXXX.XXXXX/XXX' -m 'vers=3.0,username=XXXXXX,password=XXXXXX' -p 'i-12-606-VM/test' -q false -d ''
Job type: Completed
Operation: Backup
Time elapsed: 32208 ms
File processed: 23.000 GiB
File remaining: 0.000 B
File total: 23.000 GiB
2770737887
Timeout for unmounting reached: still busy
Warning: failed to unmount /tmp/csbackup.weorL, skipping rmdir
umount error message: umount: /tmp/csbackup.weorL: target is busy.
[root@compute01 ~]# echo $?
0
[root@compute01 ~]# grep -i unmount /var/log/cloudstack/agent/agent.log
2025-11-25 13-42-17> Warning: failed to unmount /tmp/csbackup.weorL, error: umount: /tmp/csbackup.weorL: target is busy.
```

#### How did you try to break this feature and the system with this change?

This change should not break anything: it only fixes the wrong return code when `umount` fails, and adds more details to stdout and the logs.
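For reviewers, here is a minimal sketch of the wait-then-unmount logic described under *What this PR implements*. It is not the actual `nasbackup.sh` diff: the function names `wait_for_mount_idle` and `cleanup_mount`, the 1-second poll interval and the default 30-second timeout are hypothetical. It only relies on `fuser -m` exiting 0 while at least one process is accessing the filesystem mounted at the given path, and on the messages shown in the test output above.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper names, poll interval and default timeout.

# wait_for_mount_idle MOUNT_POINT [TIMEOUT_SECONDS]
# Returns 0 once no process accesses the mounted filesystem, 1 on timeout.
# Note: if fuser is not installed, the command fails and the mount is
# treated as idle immediately.
wait_for_mount_idle() {
  local mount_point=$1 timeout=${2:-30} elapsed=0
  # fuser -m exits 0 while at least one process uses the filesystem
  while fuser -m "$mount_point" >/dev/null 2>&1; do
    if (( elapsed >= timeout )); then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# cleanup_mount MOUNT_POINT [TIMEOUT_SECONDS]
# Flushes, waits for handles to clear, then unmounts. An unmount failure is
# reported but does not change the exit status, since the backup data is
# already on the repository.
cleanup_mount() {
  local mount_point=$1 timeout=${2:-30} err
  sync
  if ! wait_for_mount_idle "$mount_point" "$timeout"; then
    echo "Timeout for unmounting reached: still busy"
  fi
  # Try anyway: the last handle may be released after the final check
  if err=$(umount "$mount_point" 2>&1); then
    rmdir "$mount_point"
  else
    echo "Warning: failed to unmount $mount_point, skipping rmdir"
    echo "umount error message: $err"
    # The real script also writes this to the agent log; omitted here
  fi
  return 0
}
```

One way to reproduce the busy case by hand is to leave a shell whose working directory is inside the mount point: `fuser -m` then reports that process and `umount` fails with "target is busy", which exercises the timeout branch.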
