This has all the earmarks of a race condition because it is totally intermittent. It succeeds maybe 80% of the time.
I am using rsync to back up a Linux system to a pair of thumb drives, both of which appear to be healthy. The mounting process goes as follows:

    # Combine two 256 GB drives into one 512 GB drive.
    mount /rsnapshot1
    mount /rsnapshot2
    mhddfs /rsnapshot1,/rsnapshot2 /var/cache/rsnapshot -o mlimit=100M

If one then runs df, the combined file system is there:

    # df -h /var/cache/rsnapshot
    Filesystem               Size  Used Avail Use% Mounted on
    /rsnapshot1;/rsnapshot2  463G  173G  267G  40% /var/cache/rsnapshot

That all works as it should; one can run rsnapshot and get a backup of today's file system. The /etc/rsnapshot.conf file is set to call the mount script before rsync runs and the umount script after it finishes:

    cmd_preexec    /usr/local/etc/mtbkmedia
    # Specify the path to a script (and any optional arguments) to run right
    # after rsnapshot syncs files
    cmd_postexec   /usr/local/etc/umbkmedia

My problem may be with how I am unmounting everything, so here is umbkmedia (sketches of the more defensive mount and unmount scripts I am considering appear at the end of this message):

    #!/bin/sh
    umount /var/cache/rsnapshot /rsnapshot2 /rsnapshot1
    exit 0

Normally this simply works and /var/cache/rsnapshot ends up empty, but when one of these intermittent explosions happens, I receive the following:

    Date: Tue, 11 Sep 2018 00:06:23 -0500
    From: root@wb5agz (Cron Daemon)
    Subject: Cron <root@wb5agz> /usr/local/etc/daily_backup
    From root@wb5agz Tue Sep 11 00:06:24 2018

    /bin/rm: cannot remove '/var/cache/rsnapshot/halfday.1/wb5agz/home/usr/lib/i386-linux-gnu': Transport endpoint is not connected
    /bin/rm: cannot remove '/var/cache/rsnapshot/halfday.1/wb5agz/home/usr/lib/libgpgme-pth.so.11': Transport endpoint is not connected

That is the beginning of what was, today, a 152-line message in which every error message ends in "Transport endpoint is not connected".

When I discover one of these crashes, I re-run the script as root and it usually runs perfectly the second time, defying the definition of madness (doing the same thing and expecting different results); the different result is usually a proper backup. Today I manually re-ran the backup and this time it actually failed from the command line as well, with the same error messages for each file mentioned. The spew frequently highlights a different set of directories each time.

If I look at the two drives afterward, they are fine, except that the most recent backup is missing: rsync saw the errors, so I am left with the last good backup. After the big spew, an ls /var/cache/rsnapshot also failed with "Transport endpoint is not connected".

I have tried

    umount /rsnapshot2 /rsnapshot1 /var/cache/rsnapshot

as well as

    umount /var/cache/rsnapshot /rsnapshot2 /rsnapshot1

thinking that the order might make a difference, but I have gotten as many good runs with either order. If one looks in /var/log/syslog, one sees the two drives being mounted with no errors, and no errors are reported if you watch it happen.

Are there any ideas on how to do the umount so as to ensure that all the inodes are in the state they should be in before the umount is done? Normally, umount blocks until everything has settled and then succeeds.

I have been chasing this rabbit for quite a while now, and it can sometimes go weeks without a spew, just long enough to think that the last rejiggering of the unmount order, or some other futile rearranging of the Titanic's deck chairs, actually made a difference.

Any constructive ideas are appreciated. If I left the drives mounted all the time there would be no spew, but since these are backup drives, having them mounted all the time is quite risky.

Martin McCormick WB5AGZ
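
P.S. For reference, here is roughly what I have in mind for hardening the mount side. The three commands are exactly the ones shown above; the set -e line and the mountpoint check are additions I am considering, not what mtbkmedia does today:

    #!/bin/sh
    # Sketch of a hardened pre-exec (mount) script.  The three commands
    # are the same ones shown above; the checks are additions I am
    # considering so that a failed mount stops things before rsync
    # starts writing.

    set -e                    # abort on the first command that fails

    mount /rsnapshot1
    mount /rsnapshot2
    mhddfs /rsnapshot1,/rsnapshot2 /var/cache/rsnapshot -o mlimit=100M

    # Confirm the union file system actually came up.
    mountpoint -q /var/cache/rsnapshot || exit 1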
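
And here is the kind of umbkmedia replacement I am considering, untested so far. It assumes that, since mhddfs is a FUSE file system, fusermount -u is a legitimate way to detach the union layer; the sync, the fallback to plain umount, and the mountpoint checks are my own additions:

    #!/bin/sh
    # Candidate replacement for umbkmedia -- untested sketch.
    # Flush dirty data, detach the mhddfs (FUSE) layer first, then
    # unmount the two member drives, and fail loudly instead of always
    # exiting 0 so the cron mail shows which step went wrong.

    sync                                   # flush dirty pages first

    # mhddfs is FUSE-based, so try fusermount -u; fall back to umount.
    if ! fusermount -u /var/cache/rsnapshot 2>/dev/null; then
        umount /var/cache/rsnapshot || exit 1
    fi

    umount /rsnapshot2 || exit 1
    umount /rsnapshot1 || exit 1

    # Sanity check: make sure nothing is still mounted.
    for mp in /var/cache/rsnapshot /rsnapshot2 /rsnapshot1; do
        if mountpoint -q "$mp"; then
            echo "still mounted: $mp" >&2
            exit 1
        fi
    done
    exit 0

Does that ordering (sync, detach the FUSE layer, then the member drives) actually guarantee everything has settled before the umount runs, or is there a better way?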