Re: [Bacula-users] SD stalled?!
Mandi! Josh Fisher In chel di` si favelave... > Try checking vchanger functions directly. Josh, sorry. Bacula and vchanger works as expected. dir and SD communicate correctly, and sd (seems to) work correctly with vchanger. The trouble arose 'sometime', typically when the remote branch user does not change the RDX cassette on scheduled time, or after some weeks of usage of the system, and appear 'spot', sometime on one site, sometime on other. Because i've setup a script that manage and eject the cassette, i'm simply thinking of restarting bacula-sd when there's a cassette change. But this seems strange to me... probably there's something i've missed... For example, what mean that a volume is 'reserved'?! Three's some way to 'unreserve' a volume? Thanks. ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD stalled?!
On 12/19/22 04:55, Marco Gaiarin wrote: Mandi! Josh Fisher In chel di` si favelave... Does the firewall on sdpve2.sd.lnf.it allow connections on TCP 9103? Is there an SD process on sdpve2.sd.lnf.it listening on port 9103? You can try to 'telnet sdpve2.sd.lnf.it 9103' command from the machine that bacula-dir runs on to see if it is connecting. Check that the SD password set in bacul-dir.conf matches the one in bacula-sd.conf. Something is preventing bacula-dir from connecting to bacula-sd. Comunication between dir and SD works as expected, there's no firewall or other connection troubles in between. I can send mount or umount command, simply they are ignored: Try checking vchanger functions directly. Use the -u and -g flags to run vchanger as the same user and group that bacula-sd runs as. As root, try: vchanger -u bacula -g tape /path/to/vchanger.conf load 5 /dev/null 0 The 3rd positional argument is the slot number, the 5th the drive number. It could be a permission issue. When vchanger is invoked by bacula-sd, it will run as the same user and group as bacula-sd. So the vchanger work directory needs to be writable by that user. Also, the filesystem that the backup is being written to needs to be writable by the user that bacula-sd runs as. Device status: Autochanger "RDXAutochanger" with devices: "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) "RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1) "RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2) Device File: "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) is not open. Device is being initialized. Drive 0 is not loaded. == Device File: "RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1) is not open. Slot 6 was last loaded in drive 1. == Device File: "RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2) is not open. Drive 2 is not loaded. == Used Volume status: Reserved volume: VIPVE2RDX_0002_0003 on File device "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) Reader=0 writers=0 reserves=2 volinuse=1 worm=0 *umount storage=VIPVE2RDX Automatically selected Catalog: BaculaLNF Using Catalog "BaculaLNF" Enter autochanger drive[0]: 3901 Device ""RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0)" is already unmounted. Seems that simply the SD is on a 'unknown' state: they pretend to have a volume 'reserved' (what mean?) and do nothing. After a restart of the SD, current (stalling) job got canceled, but if i rerun it, thay work as expected... ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD stalled?!
Mandi! Josh Fisher In chel di` si favelave... > Does the firewall on sdpve2.sd.lnf.it allow connections on TCP 9103? Is > there an SD process on sdpve2.sd.lnf.it listening on port 9103? You can try > to 'telnet sdpve2.sd.lnf.it 9103' command from the machine that bacula-dir > runs on to see if it is connecting. Check that the SD password set in > bacul-dir.conf matches the one in bacula-sd.conf. Something is preventing > bacula-dir from connecting to bacula-sd. Comunication between dir and SD works as expected, there's no firewall or other connection troubles in between. I can send mount or umount command, simply they are ignored: Device status: Autochanger "RDXAutochanger" with devices: "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) "RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1) "RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2) Device File: "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) is not open. Device is being initialized. Drive 0 is not loaded. == Device File: "RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1) is not open. Slot 6 was last loaded in drive 1. == Device File: "RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2) is not open. Drive 2 is not loaded. == Used Volume status: Reserved volume: VIPVE2RDX_0002_0003 on File device "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) Reader=0 writers=0 reserves=2 volinuse=1 worm=0 *umount storage=VIPVE2RDX Automatically selected Catalog: BaculaLNF Using Catalog "BaculaLNF" Enter autochanger drive[0]: 3901 Device ""RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0)" is already unmounted. Seems that simply the SD is on a 'unknown' state: they pretend to have a volume 'reserved' (what mean?) and do nothing. After a restart of the SD, current (stalling) job got canceled, but if i rerun it, thay work as expected... ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD stalled?!
On 12/15/22 04:37, Marco Gaiarin wrote: Mandi! Josh Fisher via Bacula-users In chel di` si favelave... Also, you have tried using 'umount' and 'update slots' in bconsole, but did you try the 'mount' command? It is the mount command that would cause bacula-dir to choose a volume and invoke vchanger to load it. After restarting the SD in logs i got: 15-Dec 10:30 sdinny-fd JobId 2318: Fatal error: job.c:2004 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data: vol_mgr.c:420 Cannot reserve Volume=SDPVE2RDX_0001_0003 because drive is busy with Volume=SDPVE2RDX_0002_0004 (JobId=0). 15-Dec 10:30 lnfbacula-dir JobId 0: Fatal error: authenticate.c:123 Director unable to authenticate with Storage daemon at "sdpve2.sd.lnf.it:9103". Possible causes: Passwords or names not the same or Maximum Concurrent Jobs exceeded on the SD or SD networking messed up (restart daemon). For help, please see: http://www.bacula.org/rel-manual/en/problems/Bacula_Frequently_Asked_Que.html Does the firewall on sdpve2.sd.lnf.it allow connections on TCP 9103? Is there an SD process on sdpve2.sd.lnf.it listening on port 9103? You can try to 'telnet sdpve2.sd.lnf.it 9103' command from the machine that bacula-dir runs on to see if it is connecting. Check that the SD password set in bacul-dir.conf matches the one in bacula-sd.conf. Something is preventing bacula-dir from connecting to bacula-sd. ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD stalled?!
Mandi! Josh Fisher via Bacula-users In chel di` si favelave... > I don't see that Dir is reporting any errors invoking vchanger, no > timeouts or vchanger errors, but are there any vchanger processes still > running? The vchanger processes should be very short lived. In No, there's no vchanger, bconsole or other bacula binary running, apart the SD (and a fd). > Are you sure that the filesystem is mounted at the mount point pointed > to by /var/spool/vchanger/SDPVE2RDX/0 ? Versions of vchanger before > 1.0.3 used the nohup command in udev scripts which does not work as > expected when invoked by udev and can cause the udev auto-mounting to fail. > > Another problem with versions before 1.0.3 is that the locking used to > serializes concurrent vchanger processes had a race condition that could > prevent a vchanger instance from running and cause a LOAD or UNLOAD > command to fail, although that should be logged as a timeout error by > bacula-dir. As a diagnostic aid, you can turn off this behavior in the > vchanger config by setting bconsole="". That will prevent vchanger from > invoking bconsole at all and eliminate the possibility of the race > condition. Vchanger is version 1.0.3, and effectively no, there's no link to the loaded media, that seems coherent anyway: root@sdpve2:~# df | grep vchanger /dev/sdc1 1921798404 222569068 1601537480 13% /mnt/vchanger/49a2d36c-d620-426c-90de-de8c009aeeeb root@sdpve2:~# ls -la /var/spool/vchanger/SDPVE2RDX/ total 2 drwxr-x--- 2 bacula tape4 Dec 12 06:37 . drwxr-xr-x 3 bacula bacula 3 Jul 12 16:17 .. -rw-r- 1 bacula bacula 47 Dec 12 14:37 bay_state-1 -rw-r- 1 bacula tape 17 Dec 12 14:37 dynamic.conf > Also, you have tried using 'umount' and 'update slots' in bconsole, but > did you try the 'mount' command? It is the mount command that would > cause bacula-dir to choose a volume and invoke vchanger to load it. If i try on 'drive 0' i got: *mount storage=SDPVE2RDX Automatically selected Catalog: BaculaLNF Using Catalog "BaculaLNF" Enter autochanger drive[0]: Enter autochanger slot: 3001 Device ""RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0)" is doing acquire. on drive 1: *mount storage=SDPVE2RDX Automatically selected Catalog: BaculaLNF Using Catalog "BaculaLNF" Enter autochanger drive[0]: 1 Enter autochanger slot: 3906 File device ""RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1)" is always mounted. * *umount storage=SDPVE2RDX Enter autochanger drive[0]: 1 3901 Device ""RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1)" is already unmounted. * *mount storage=SDPVE2RDX Enter autochanger drive[0]: 1 Enter autochanger slot: 3906 File device ""RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1)" is always mounted. Now i need to restart the SD... -- Qualche saggio scriveva che l'unico sistema economico funzionante gli pareva il libero mercato in cui il governo tiene la pistola puntata alla tempia delle corporations. Peccato che il governo sia più solito offrire modiche quantità di vaselina al consumatore, occasionalmente... (Emanuele Pucciarelli) ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD stalled?!
Mandi! Josh Fisher via Bacula-users In chel di` si favelave... > Also, you have tried using 'umount' and 'update slots' in bconsole, but > did you try the 'mount' command? It is the mount command that would > cause bacula-dir to choose a volume and invoke vchanger to load it. After restarting the SD in logs i got: 15-Dec 10:30 sdinny-fd JobId 2318: Fatal error: job.c:2004 Bad response to Append Data command. Wanted 3000 OK data, got 3903 Error append data: vol_mgr.c:420 Cannot reserve Volume=SDPVE2RDX_0001_0003 because drive is busy with Volume=SDPVE2RDX_0002_0004 (JobId=0). 15-Dec 10:30 lnfbacula-dir JobId 0: Fatal error: authenticate.c:123 Director unable to authenticate with Storage daemon at "sdpve2.sd.lnf.it:9103". Possible causes: Passwords or names not the same or Maximum Concurrent Jobs exceeded on the SD or SD networking messed up (restart daemon). For help, please see: http://www.bacula.org/rel-manual/en/problems/Bacula_Frequently_Asked_Que.html running jobs failed of course, but a subsequent manual run: 15-Dec 10:32 lnfbacula-dir JobId 2350: Start Backup JobId 2350, Job=SDPVE2-VMs.2022-12-15_10.32.49_46 15-Dec 10:32 lnfbacula-dir JobId 2350: Using Device "RDXStorage0" to write. 15-Dec 10:32 sdpve2-fd JobId 2350: DIR and FD clocks differ by -9 seconds, FD automatically compensating. 15-Dec 10:32 sdpve2-sd JobId 2350: 3304 Issuing autochanger "load Volume SDPVE2RDX_0001_0003, Slot 4, Drive 0" command. 15-Dec 10:32 sdpve2-sd JobId 2350: 3305 Autochanger "load Volume SDPVE2RDX_0001_0003, Slot 4, Drive 0", status is OK. 15-Dec 10:32 sdpve2-sd JobId 2350: Recycled volume "SDPVE2RDX_0001_0003" on File device "RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0), all previous data lost. and backup is running... -- Solo una sana e consapevole libidine salva il giovane dagli SCOUT e dall'Azione Cattolica! (Zucchero, circa O;-) ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD stalled?!
On 12/12/22 10:47, Marco Gaiarin wrote: I'm using 'vchanger' script defining some virtual changer. I don't see that Dir is reporting any errors invoking vchanger, no timeouts or vchanger errors, but are there any vchanger processes still running? The vchanger processes should be very short lived. In Are you sure that the filesystem is mounted at the mount point pointed to by /var/spool/vchanger/SDPVE2RDX/0 ? Versions of vchanger before 1.0.3 used the nohup command in udev scripts which does not work as expected when invoked by udev and can cause the udev auto-mounting to fail. Another problem with versions before 1.0.3 is that the locking used to serializes concurrent vchanger processes had a race condition that could prevent a vchanger instance from running and cause a LOAD or UNLOAD command to fail, although that should be logged as a timeout error by bacula-dir. As a diagnostic aid, you can turn off this behavior in the vchanger config by setting bconsole="". That will prevent vchanger from invoking bconsole at all and eliminate the possibility of the race condition. Also, you have tried using 'umount' and 'update slots' in bconsole, but did you try the 'mount' command? It is the mount command that would cause bacula-dir to choose a volume and invoke vchanger to load it. Sometimes, and i've looked at but found nothing in logs, the SD 'stalled'; typical situation: *status storage=SDPVE2RDX Connecting to Storage daemon SDPVE2RDX at sdpve2.sd.lnf.it:9103 sdpve2-sd Version: 9.4.2 (04 February 2019) x86_64-pc-linux-gnu debian 10.5 Daemon started 19-Sep-22 13:01. Jobs: run=90, running=1. Heap: heap=139,264 smbytes=799,681 max_bytes=1,701,085 bufs=209 max_bufs=375 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0 newbsr=0 Res: ndevices=3 nautochgr=1 Running Jobs: Writing: Incremental Backup job Sdinny JobId=2263 Volume="" pool="SDPVE2RDXPool" device="RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=5 Writing: Full Backup job SDPVE2-VMs JobId=2274 Volume="" pool="SDPVE2RDXPool" device="RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=5 fd=7 Jobs waiting to reserve a drive: Terminated Jobs: JobId LevelFiles Bytes Status FinishedName === 2120 Incr 7,388168.4 M OK 01-Dec-22 23:01 Sdinny 2137 Full620,898179.7 G OK 02-Dec-22 22:07 Sdinny 2147 Incr94993.80 M OK 03-Dec-22 23:02 Sdinny 2158 Full 646.26 G OK 04-Dec-22 02:05 SDPVE2-VMs 2168 Incr54293.40 M OK 04-Dec-22 23:02 Sdinny 2185 Incr 8,063227.5 M OK 05-Dec-22 23:02 Sdinny 2202 Incr 8,497257.1 M OK 06-Dec-22 23:06 Sdinny 2219 Incr 9,638228.3 M OK 07-Dec-22 23:02 Sdinny 2236 Incr98693.80 M OK 08-Dec-22 23:02 Sdinny 2253 Full 0 0 Error09-Dec-22 20:01 Sdinny Device status: Autochanger "RDXAutochanger" with devices: "RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0) "RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1) "RDXStorage2" (/var/spool/vchanger/SDPVE2RDX/2) Device File: "RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0) is not open. Device is being initialized. Drive 0 is not loaded. == Device File: "RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1) is not open. Drive 1 is not loaded. == Device File: "RDXStorage2" (/var/spool/vchanger/SDPVE2RDX/2) is not open. Drive 2 is not loaded. == Used Volume status: Reserved volume: SDPVE2RDX_0002_0004 on File device "RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0) Reader=0 writers=0 reserves=2 volinuse=1 worm=0 Attr spooling: 0 active jobs, 0 bytes; 86 total jobs, 178,259,171 max bytes. eg, there are jobs stalled waiting a volume, in director: Running Jobs: Console connected at 12-Dec-22 14:31 JobId Type Level Files Bytes Name Status == 2263 Back Incr 0 0 Sdinnyis running 2274 Back Full 0 0 SDPVE2-VMsis running but seems that the virtual changer is still on the 'Reserved volume: SDPVE2RDX_0002_0004'. If i try to 'umount', 'update slots', ... nothing changed. Volumes status seems OK: *list media pool=SDPVE2RDXPool Using Catalog "BaculaLNF" +-+-+---+-+-+--+--+-+--+---+---+-+--+-+---+ | mediaid | volumename | volstatus |