Hanarion opened a new pull request, #12133:
URL: https://github.com/apache/cloudstack/pull/12133

   ### Description
   Fixes: #12122
   
   This PR resolves the random backup failures observed when using a CIFS (SMB) backup repository with NAS backup. As described in the original issue, backups appear to complete (files transferred, "File remaining" = 0), but the job ends with status FAILED because the subsequent sync + umount step fails: the mount point remains busy and cannot be unmounted cleanly.
   
   #### What was happening:
   After the data copy, the script issues `sync`, but because CIFS doesn't always flush and close all filesystem handles immediately, the mount can remain busy.
   
   The script's `umount $mount_point` call then fails ("target is busy"), so the mount and its directory are left behind, leaving resources dangling and causing the job to fail even though the backup data is present.
   
   The issue is intermittent ("sometimes it fails, sometimes it doesn't") due to timing/race conditions with CIFS.
   
   #### What this PR implements:
   Adds a polling loop (e.g., using `fuser -m <mount_point>`) with a timeout that waits for any active handles on the mount to clear before attempting `umount`.
   
   If the mount is still busy when the timeout is reached, an error message is printed and `umount` is attempted anyway (it may still succeed).
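
   A minimal sketch of the wait-then-unmount logic described above, in the same Bash as the script. The function names (`is_mount_busy`, `wait_for_mount_idle`, `unmount_when_idle`) and the 30-second timeout / 1-second interval defaults are assumptions for illustration, not the values or code used in the PR. It relies only on `fuser -m` exiting 0 while at least one process is accessing the filesystem.

   ```bash
   # Hypothetical sketch, not the code from nasbackup.sh: poll with fuser -m
   # until no process holds files open on the mount, then unmount.

   # is_mount_busy MOUNT_POINT
   # fuser -m exits 0 if at least one process accesses the filesystem that
   # contains MOUNT_POINT, non-zero otherwise (including when fuser is missing).
   is_mount_busy() {
     fuser -m "$1" >/dev/null 2>&1
   }

   # wait_for_mount_idle MOUNT_POINT [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
   # Returns 0 once the mount is idle, 1 if it is still busy after the timeout.
   # The 30 s / 1 s defaults are assumptions; values must be whole seconds.
   wait_for_mount_idle() {
     local mount_point=$1
     local timeout=${2:-30}
     local interval=${3:-1}
     local waited=0
     (( interval >= 1 )) || interval=1   # an interval of 0 would loop forever
     while is_mount_busy "$mount_point"; do
       if (( waited >= timeout )); then
         echo "Timeout for unmounting reached: still busy"
         return 1
       fi
       sleep "$interval"
       waited=$(( waited + interval ))
     done
     return 0
   }

   # unmount_when_idle MOUNT_POINT
   # Flush, wait for open handles to go away, then try umount even if the wait
   # timed out, since it may still succeed. Returns umount's exit status.
   unmount_when_idle() {
     local mount_point=$1
     sync
     wait_for_mount_idle "$mount_point" || true
     umount "$mount_point"
   }
   ```

   In this sketch `unmount_when_idle "$mount_point"` would be called after the data copy. Keeping the busy check in its own function also makes the loop easy to exercise with a stub instead of a real CIFS mount.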
   
   We also ensure that `umount` is triggered for backups of stopped VMs.
   
   ### Types of changes
   
   - [ ] Breaking change (fix or feature that would cause existing functionality to change)
   - [ ] New feature (non-breaking change which adds functionality)
   - [X] Bug fix (non-breaking change which fixes an issue)
   - [ ] Enhancement (improves an existing feature and functionality)
   - [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
   - [ ] Build/CI
   - [ ] Test (unit or integration test code)
   
   ### Feature/Enhancement Scale or Bug Severity
   
   #### Feature/Enhancement Scale
   
   - [ ] Major
   - [X] Minor
   
   #### Bug Severity
   
   - [ ] BLOCKER
   - [ ] Critical
   - [ ] Major
   - [X] Minor
   - [ ] Trivial
   
   ### Screenshots (if appropriate):
   
   ### How Has This Been Tested?
   
   I ran multiple tests by calling the script directly and checking the return code while deliberately blocking the umount:
   ```bash
   [root@compute01 ~]# /usr/bin/bash /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/nasbackup.sh \
       -o backup -v i-12-606-VM -t cifs -s '/XXXXXX.XXXXX/XXX' \
       -m 'vers=3.0,username=XXXXXX,password=XXXXXX' -p 'i-12-606-VM/test' -q false -d ''
   
   Job type:         Completed   
   Operation:        Backup      
   Time elapsed:     32208        ms
   File processed:   23.000 GiB
   File remaining:   0.000 B
   File total:       23.000 GiB
   
   2770737887
   Timeout for unmounting reached: still busy
   Warning: failed to unmount /tmp/csbackup.weorL, skipping rmdir
   umount error message: umount: /tmp/csbackup.weorL: target is busy.
   [root@compute01 ~]# echo $?
   0
   [root@compute01 ~]# grep -i unmount /var/log/cloudstack/agent/agent.log
   2025-11-25 13-42-17> Warning: failed to unmount /tmp/csbackup.weorL, error: umount: /tmp/csbackup.weorL: target is busy.
   
   ```
   
   #### How did you try to break this feature and the system with this change?
   
   This change should not break anything: it simply fixes the wrong return code when `umount` fails, and adds more details to stdout and the logs.
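
   The return-code behaviour can be sketched the same way. This is a hypothetical illustration: the `LOG_FILE` variable and the `log_msg` and `safe_unmount` helpers are not taken from the script, while the messages and timestamp format are copied from the test output above.

   ```bash
   # Hypothetical sketch: report an unmount failure without failing the job.
   # LOG_FILE, log_msg and safe_unmount are illustrative names only.
   LOG_FILE=${LOG_FILE:-/var/log/cloudstack/agent/agent.log}

   # log_msg MESSAGE...: append "YYYY-MM-DD HH-MM-SS> MESSAGE" to LOG_FILE.
   log_msg() {
     printf '%s> %s\n' "$(date '+%Y-%m-%d %H-%M-%S')" "$*" >>"$LOG_FILE"
   }

   # safe_unmount MOUNT_POINT
   # On success, remove the now-empty mount directory. On failure, keep the
   # directory, print and log umount's error, and still return 0 so that a
   # backup whose data was fully copied is not reported as failed.
   safe_unmount() {
     local mount_point=$1
     local err
     if err=$(umount "$mount_point" 2>&1); then
       rmdir "$mount_point"
     else
       echo "Warning: failed to unmount $mount_point, skipping rmdir"
       echo "umount error message: $err"
       log_msg "Warning: failed to unmount $mount_point, error: $err"
     fi
     return 0
   }
   ```

   Returning 0 on an unmount failure trades a possibly dangling mount for a correct job status; the warning in stdout and the agent log is what makes the leftover mount visible to an operator.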
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.