Re: [Bacula-users] SD stalled?!

2022-12-19 Thread Marco Gaiarin
Mandi! Josh Fisher
  In chel di` si favelave...

> Try checking vchanger functions directly.

Josh, sorry.


Bacula and vchanger works as expected. dir and SD communicate correctly, and
sd (seems to) work correctly with vchanger.

The trouble arose 'sometime', typically when the remote branch user does not
change the RDX cassette on scheduled time, or after some weeks of usage of
the system, and appear 'spot', sometime on one site, sometime on other.


Because i've setup a script that manage and eject the cassette, i'm simply
thinking of restarting bacula-sd when there's a cassette change.

But this seems strange to me... probably there's something i've missed...


For example, what mean that a volume is 'reserved'?! Three's some way to
'unreserve' a volume?


Thanks.



___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] SD stalled?!

2022-12-19 Thread Josh Fisher via Bacula-users


On 12/19/22 04:55, Marco Gaiarin wrote:

Mandi! Josh Fisher
   In chel di` si favelave...


Does the firewall on sdpve2.sd.lnf.it allow connections on TCP 9103? Is
there an SD process on sdpve2.sd.lnf.it listening on port 9103? You can try
to 'telnet sdpve2.sd.lnf.it 9103' command from the machine that bacula-dir
runs on to see if it is connecting. Check that the SD password set in
bacul-dir.conf matches the one in bacula-sd.conf. Something is preventing
bacula-dir from connecting to bacula-sd.

Comunication between dir and SD works as expected, there's no firewall or
other connection troubles in between.

I can send mount or umount command, simply they are ignored:



Try checking vchanger functions directly. Use the -u and -g flags to run 
vchanger as the same user and group that bacula-sd runs as. As root, try:


        vchanger -u bacula -g tape /path/to/vchanger.conf load 5 
/dev/null 0


The 3rd positional argument is the slot number, the 5th the drive number.

It could be a permission issue. When vchanger is invoked by bacula-sd, 
it will run as the same user and group as bacula-sd. So the vchanger 
work directory needs to be writable by that user.


Also, the filesystem that the backup is being written to needs to be 
writable by the user that bacula-sd runs as.





  Device status:
  Autochanger "RDXAutochanger" with devices:
"RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0)
"RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1)
"RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2)

  Device File: "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) is not open.
Device is being initialized.
Drive 0 is not loaded.
  ==

  Device File: "RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1) is not open.
Slot 6 was last loaded in drive 1.
  ==

  Device File: "RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2) is not open.
Drive 2 is not loaded.
  ==
  

  Used Volume status:
  Reserved volume: VIPVE2RDX_0002_0003 on File device "RDXStorage0" 
(/var/spool/vchanger/VIPVE2RDX/0)
 Reader=0 writers=0 reserves=2 volinuse=1 worm=0



  *umount storage=VIPVE2RDX
  Automatically selected Catalog: BaculaLNF
  Using Catalog "BaculaLNF"
  Enter autochanger drive[0]:
  3901 Device ""RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0)" is already 
unmounted.


Seems that simply the SD is on a 'unknown' state: they pretend to have a
volume 'reserved' (what mean?) and do nothing.

After a restart of the SD, current (stalling) job got canceled, but if i
rerun it, thay work as expected...



___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users



___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] SD stalled?!

2022-12-19 Thread Marco Gaiarin
Mandi! Josh Fisher
  In chel di` si favelave...

> Does the firewall on sdpve2.sd.lnf.it allow connections on TCP 9103? Is
> there an SD process on sdpve2.sd.lnf.it listening on port 9103? You can try
> to 'telnet sdpve2.sd.lnf.it 9103' command from the machine that bacula-dir
> runs on to see if it is connecting. Check that the SD password set in
> bacul-dir.conf matches the one in bacula-sd.conf. Something is preventing
> bacula-dir from connecting to bacula-sd.

Comunication between dir and SD works as expected, there's no firewall or
other connection troubles in between.

I can send mount or umount command, simply they are ignored:

 Device status:
 Autochanger "RDXAutochanger" with devices:
   "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0)
   "RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1)
   "RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2)

 Device File: "RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0) is not open.
   Device is being initialized.
   Drive 0 is not loaded.
 ==

 Device File: "RDXStorage1" (/var/spool/vchanger/VIPVE2RDX/1) is not open.
   Slot 6 was last loaded in drive 1.
 ==

 Device File: "RDXStorage2" (/var/spool/vchanger/VIPVE2RDX/2) is not open.
   Drive 2 is not loaded.
 ==
 

 Used Volume status:
 Reserved volume: VIPVE2RDX_0002_0003 on File device "RDXStorage0" 
(/var/spool/vchanger/VIPVE2RDX/0)
Reader=0 writers=0 reserves=2 volinuse=1 worm=0 



 *umount storage=VIPVE2RDX
 Automatically selected Catalog: BaculaLNF
 Using Catalog "BaculaLNF"
 Enter autochanger drive[0]: 
 3901 Device ""RDXStorage0" (/var/spool/vchanger/VIPVE2RDX/0)" is already 
unmounted.


Seems that simply the SD is on a 'unknown' state: they pretend to have a
volume 'reserved' (what mean?) and do nothing.

After a restart of the SD, current (stalling) job got canceled, but if i
rerun it, thay work as expected...



___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] SD stalled?!

2022-12-16 Thread Josh Fisher via Bacula-users

On 12/15/22 04:37, Marco Gaiarin wrote:

Mandi! Josh Fisher via Bacula-users
   In chel di` si favelave...


Also, you have tried using 'umount' and 'update slots' in bconsole, but
did you try the 'mount' command? It is the mount command that would
cause bacula-dir to choose a volume and invoke vchanger to load it.

After restarting the SD in logs i got:

  15-Dec 10:30 sdinny-fd JobId 2318: Fatal error: job.c:2004 Bad response to 
Append Data command. Wanted 3000 OK data, got 3903 Error append data: 
vol_mgr.c:420 Cannot reserve Volume=SDPVE2RDX_0001_0003 because drive is busy 
with Volume=SDPVE2RDX_0002_0004 (JobId=0).
  15-Dec 10:30 lnfbacula-dir JobId 0: Fatal error: authenticate.c:123 Director unable to 
authenticate with Storage daemon at "sdpve2.sd.lnf.it:9103". Possible causes:
  Passwords or names not the same or
  Maximum Concurrent Jobs exceeded on the SD or
  SD networking messed up (restart daemon).
  For help, please see: 
http://www.bacula.org/rel-manual/en/problems/Bacula_Frequently_Asked_Que.html


Does the firewall on sdpve2.sd.lnf.it allow connections on TCP 9103? Is 
there an SD process on sdpve2.sd.lnf.it listening on port 9103? You can 
try to 'telnet sdpve2.sd.lnf.it 9103' command from the machine that 
bacula-dir runs on to see if it is connecting. Check that the SD 
password set in bacul-dir.conf matches the one in bacula-sd.conf. 
Something is preventing bacula-dir from connecting to bacula-sd.





___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] SD stalled?!

2022-12-15 Thread Marco Gaiarin
Mandi! Josh Fisher via Bacula-users
  In chel di` si favelave...

> I don't see that Dir is reporting any errors invoking vchanger, no 
> timeouts or vchanger errors, but are there any vchanger processes still 
> running? The vchanger processes should be very short lived. In

No, there's no vchanger, bconsole or other bacula binary running, apart the
SD (and a fd).


> Are you sure that the filesystem is mounted at the mount point pointed 
> to by /var/spool/vchanger/SDPVE2RDX/0 ? Versions of vchanger before 
> 1.0.3 used the nohup command in udev scripts which does not work as 
> expected when invoked by udev and can cause the udev auto-mounting to fail.
> 
> Another problem with versions before 1.0.3 is that the locking used to 
> serializes concurrent vchanger processes had a race condition that could 
> prevent a vchanger instance from running and cause a LOAD or UNLOAD 
> command to fail, although that should be logged as a timeout error by 
> bacula-dir. As a diagnostic aid, you can turn off this behavior in the 
> vchanger config by setting bconsole="". That will prevent vchanger from 
> invoking bconsole at all and eliminate the possibility of the race 
> condition.

Vchanger is version 1.0.3, and effectively no, there's no link to the loaded
media, that seems coherent anyway:

 root@sdpve2:~# df | grep vchanger
 /dev/sdc1   1921798404 222569068 1601537480  13% 
/mnt/vchanger/49a2d36c-d620-426c-90de-de8c009aeeeb
 root@sdpve2:~# ls -la /var/spool/vchanger/SDPVE2RDX/
 total 2
 drwxr-x--- 2 bacula tape4 Dec 12 06:37 .
 drwxr-xr-x 3 bacula bacula  3 Jul 12 16:17 ..
 -rw-r- 1 bacula bacula 47 Dec 12 14:37 bay_state-1
 -rw-r- 1 bacula tape   17 Dec 12 14:37 dynamic.conf


> Also, you have tried using 'umount' and 'update slots' in bconsole, but 
> did you try the 'mount' command? It is the mount command that would 
> cause bacula-dir to choose a volume and invoke vchanger to load it.

If i try on 'drive 0' i got:

 *mount storage=SDPVE2RDX 
 Automatically selected Catalog: BaculaLNF
 Using Catalog "BaculaLNF"
 Enter autochanger drive[0]: 
 Enter autochanger slot: 
 3001 Device ""RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0)" is doing acquire.

on drive 1:

 *mount storage=SDPVE2RDX
 Automatically selected Catalog: BaculaLNF
 Using Catalog "BaculaLNF"
 Enter autochanger drive[0]: 1
 Enter autochanger slot: 
 3906 File device ""RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1)" is always 
mounted.
 * 
 *umount storage=SDPVE2RDX
 Enter autochanger drive[0]: 1
 3901 Device ""RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1)" is already 
unmounted.
 *
 *mount storage=SDPVE2RDX
 Enter autochanger drive[0]: 1
 Enter autochanger slot: 
 3906 File device ""RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1)" is always 
mounted.


Now i need to restart the SD...

-- 
  Qualche saggio scriveva che l'unico sistema economico funzionante
  gli pareva il libero mercato in cui il governo tiene la pistola puntata
  alla tempia delle corporations.  Peccato che il governo sia più solito
  offrire modiche quantità di vaselina al consumatore, occasionalmente...
(Emanuele Pucciarelli)




___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] SD stalled?!

2022-12-15 Thread Marco Gaiarin
Mandi! Josh Fisher via Bacula-users
  In chel di` si favelave...

> Also, you have tried using 'umount' and 'update slots' in bconsole, but 
> did you try the 'mount' command? It is the mount command that would 
> cause bacula-dir to choose a volume and invoke vchanger to load it.

After restarting the SD in logs i got:

 15-Dec 10:30 sdinny-fd JobId 2318: Fatal error: job.c:2004 Bad response to 
Append Data command. Wanted 3000 OK data, got 3903 Error append data: 
vol_mgr.c:420 Cannot reserve Volume=SDPVE2RDX_0001_0003 because drive is busy 
with Volume=SDPVE2RDX_0002_0004 (JobId=0).
 15-Dec 10:30 lnfbacula-dir JobId 0: Fatal error: authenticate.c:123 Director 
unable to authenticate with Storage daemon at "sdpve2.sd.lnf.it:9103". Possible 
causes:
 Passwords or names not the same or
 Maximum Concurrent Jobs exceeded on the SD or
 SD networking messed up (restart daemon).
 For help, please see: 
http://www.bacula.org/rel-manual/en/problems/Bacula_Frequently_Asked_Que.html

running jobs failed of course, but a subsequent manual run:

 15-Dec 10:32 lnfbacula-dir JobId 2350: Start Backup JobId 2350, 
Job=SDPVE2-VMs.2022-12-15_10.32.49_46
 15-Dec 10:32 lnfbacula-dir JobId 2350: Using Device "RDXStorage0" to write.
 15-Dec 10:32 sdpve2-fd JobId 2350: DIR and FD clocks differ by -9 seconds, FD 
automatically compensating.
 15-Dec 10:32 sdpve2-sd JobId 2350: 3304 Issuing autochanger "load Volume 
SDPVE2RDX_0001_0003, Slot 4, Drive 0" command.
 15-Dec 10:32 sdpve2-sd JobId 2350: 3305 Autochanger "load Volume 
SDPVE2RDX_0001_0003, Slot 4, Drive 0", status is OK.
 15-Dec 10:32 sdpve2-sd JobId 2350: Recycled volume "SDPVE2RDX_0001_0003" on 
File device "RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0), all previous data 
lost.

and backup is running...

-- 
  Solo una sana e consapevole libidine
  salva il giovane dagli SCOUT e dall'Azione Cattolica!
(Zucchero, circa O;-)




___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] SD stalled?!

2022-12-13 Thread Josh Fisher via Bacula-users



On 12/12/22 10:47, Marco Gaiarin wrote:

I'm using 'vchanger' script defining some virtual changer.


I don't see that Dir is reporting any errors invoking vchanger, no 
timeouts or vchanger errors, but are there any vchanger processes still 
running? The vchanger processes should be very short lived. In


Are you sure that the filesystem is mounted at the mount point pointed 
to by /var/spool/vchanger/SDPVE2RDX/0 ? Versions of vchanger before 
1.0.3 used the nohup command in udev scripts which does not work as 
expected when invoked by udev and can cause the udev auto-mounting to fail.


Another problem with versions before 1.0.3 is that the locking used to 
serializes concurrent vchanger processes had a race condition that could 
prevent a vchanger instance from running and cause a LOAD or UNLOAD 
command to fail, although that should be logged as a timeout error by 
bacula-dir. As a diagnostic aid, you can turn off this behavior in the 
vchanger config by setting bconsole="". That will prevent vchanger from 
invoking bconsole at all and eliminate the possibility of the race 
condition.


Also, you have tried using 'umount' and 'update slots' in bconsole, but 
did you try the 'mount' command? It is the mount command that would 
cause bacula-dir to choose a volume and invoke vchanger to load it.





Sometimes, and i've looked at but found nothing in logs, the SD 'stalled';
typical situation:

*status storage=SDPVE2RDX
Connecting to Storage daemon SDPVE2RDX at sdpve2.sd.lnf.it:9103

sdpve2-sd Version: 9.4.2 (04 February 2019) x86_64-pc-linux-gnu debian 10.5
Daemon started 19-Sep-22 13:01. Jobs: run=90, running=1.
  Heap: heap=139,264 smbytes=799,681 max_bytes=1,701,085 bufs=209 max_bufs=375
  Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0 newbsr=0
  Res: ndevices=3 nautochgr=1

Running Jobs:
Writing: Incremental Backup job Sdinny JobId=2263 Volume=""
 pool="SDPVE2RDXPool" device="RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0)
 spooling=0 despooling=0 despool_wait=0
 Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=5
Writing: Full Backup job SDPVE2-VMs JobId=2274 Volume=""
 pool="SDPVE2RDXPool" device="RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0)
 spooling=0 despooling=0 despool_wait=0
 Files=0 Bytes=0 AveBytes/sec=0 LastBytes/sec=0
 FDReadSeqNo=6 in_msg=6 out_msg=5 fd=7


Jobs waiting to reserve a drive:


Terminated Jobs:
  JobId  LevelFiles  Bytes   Status   FinishedName
===
   2120  Incr  7,388168.4 M  OK   01-Dec-22 23:01 Sdinny
   2137  Full620,898179.7 G  OK   02-Dec-22 22:07 Sdinny
   2147  Incr94993.80 M  OK   03-Dec-22 23:02 Sdinny
   2158  Full  646.26 G  OK   04-Dec-22 02:05 SDPVE2-VMs
   2168  Incr54293.40 M  OK   04-Dec-22 23:02 Sdinny
   2185  Incr  8,063227.5 M  OK   05-Dec-22 23:02 Sdinny
   2202  Incr  8,497257.1 M  OK   06-Dec-22 23:06 Sdinny
   2219  Incr  9,638228.3 M  OK   07-Dec-22 23:02 Sdinny
   2236  Incr98693.80 M  OK   08-Dec-22 23:02 Sdinny
   2253  Full  0 0   Error09-Dec-22 20:01 Sdinny


Device status:
Autochanger "RDXAutochanger" with devices:
"RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0)
"RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1)
"RDXStorage2" (/var/spool/vchanger/SDPVE2RDX/2)

Device File: "RDXStorage0" (/var/spool/vchanger/SDPVE2RDX/0) is not open.
Device is being initialized.
Drive 0 is not loaded.
==

Device File: "RDXStorage1" (/var/spool/vchanger/SDPVE2RDX/1) is not open.
Drive 1 is not loaded.
==

Device File: "RDXStorage2" (/var/spool/vchanger/SDPVE2RDX/2) is not open.
Drive 2 is not loaded.
==


Used Volume status:
Reserved volume: SDPVE2RDX_0002_0004 on File device "RDXStorage0" 
(/var/spool/vchanger/SDPVE2RDX/0)
 Reader=0 writers=0 reserves=2 volinuse=1 worm=0


Attr spooling: 0 active jobs, 0 bytes; 86 total jobs, 178,259,171 max bytes.



eg, there are jobs stalled waiting a volume, in director:

Running Jobs:
Console connected at 12-Dec-22 14:31
  JobId  Type Level Files Bytes  Name  Status
==
   2263  Back Incr  0 0  Sdinnyis running
   2274  Back Full  0 0  SDPVE2-VMsis running


but seems that the virtual changer is still on the 'Reserved volume:
SDPVE2RDX_0002_0004'.

If i try to 'umount', 'update slots', ... nothing changed. Volumes status
seems OK:

*list media pool=SDPVE2RDXPool
Using Catalog "BaculaLNF"
+-+-+---+-+-+--+--+-+--+---+---+-+--+-+---+
| mediaid | volumename  | volstatus |