Kern Sibbald schrieb:
Hmmm.  I think I am going to pack it in and take a year's vacation ...

Damn :-)

You _did_ notice Richard Knights Mail, right?

By the way - I have left the backup server in the state I described. Logging is turned off because my limited disk-space would soon result in more trouble, but I *can* still connect to the DIR and issue commands... with temporarily turning on debug logging, this might give some clues. Apart from that, I'm seriously considering installing gdb on that machine...

Arno

On Monday 12 December 2005 20:21, Arno Lehmann wrote:

Hi,

Arno Lehmann schrieb:

Hi,

Kern Sibbald schrieb:

...


Anyway, I would really appreciate it if you could try the 1.38.3
beta.  Please use the released tar file rather than the CVS -- the
HEAD CVS is now slightly behind the 1.38 branch in terms of fixes ...

I'll download it in a few minutes...


Although I haven't been able to reproduce the pool problems you are
having, I have been working on it nevertheless.  When I was able to
duplicate the two drive problem of not running jobs simultaneously on
both drives, I took the time to review the reservation system in
detail, and I found a few places where there *may* have been problems.

I should see if the version 1.38.3beta fixes my problem on Tuesday -
that's still my jobs-to-different-pools-day. I'll report what happens.
And, of course, I've got one run of straight and usual backups before
that, although I don't think there will be any problems tomorrow.

Unfortunately, I have to report something rather unexpected. The device
reservation system works a little too good ;-)

I started my bachup server this morning (or rather, it started itself).
All tape drives Bacula itself loads were empty, magazines were filled
with the correct volumes. Basically, the same situation I have 5 out of
seven days for about a year now, with many versions of Bacula.

Jobs were scheduled as usual and got triggered as usual, too.

After that, not much progress since this morning, 8:20 (almost 12 hours

ago):

12-Dec 11:21 goblin-sd: Job DracheStd.2005-12-12_08.20.00 waiting to
reserve a device. 12-Dec 11:26 goblin-sd: Job Goblin.2005-12-12_08.25.00
waiting to reserve a device. 12-Dec 11:26 goblin-sd: Job
Ork.2005-12-12_08.25.03 waiting to reserve a device. 12-Dec 11:26
goblin-sd: Job ElfSys.2005-12-12_08.25.01 waiting to reserve a device.
12-Dec 11:26 goblin-sd: Job ElfHome.2005-12-12_08.25.02 waiting to
reserve a device. 12-Dec 15:21 goblin-sd: Job
DracheStd.2005-12-12_08.20.00 waiting to reserve a device. 12-Dec 15:26
goblin-sd: Job Goblin.2005-12-12_08.25.00 waiting to reserve a device.
12-Dec 15:26 goblin-sd: Job Ork.2005-12-12_08.25.03 waiting to reserve a
device. 12-Dec 15:26 goblin-sd: Job ElfSys.2005-12-12_08.25.01 waiting to
reserve a device. 12-Dec 15:26 goblin-sd: Job ElfHome.2005-12-12_08.25.02
waiting to reserve a device.

these are the latest messages.

Or, in other words, since I upgraded to 1.38.3 beta, nothing has been
backed up at all.

The current director status is


#sta dir
goblin-dir Version: 1.38.3 (09 December 2005) i586-pc-linux-gnu suse 8.1
Daemon started 12-Dec-05 07:32, 1 Job run since started.

Scheduled Jobs:
Level          Type     Pri  Scheduled          Name               Volume
=========================================================================
========== Incremental Backup 10 13-Dec-05 08:20 DracheStd DLT-IV-0063 Incremental Backup 10 13-Dec-05 08:20 BeowulfStd DLT-IV-0063 Differential Backup 10 13-Dec-05 08:25 Goblin DLT-IV-0063 Differential Backup 10 13-Dec-05 08:25 ElfSys DLT-IV-0063 Differential Backup 10 13-Dec-05 08:25 ElfHome DLT-IV-0063 Differential Backup 10 13-Dec-05 08:25 Ork *unknown* Differential Backup 30 13-Dec-05 08:25 GoblinDB DLT-IV-0063 Differential Backup 10 13-Dec-05 08:30 BackupMail QIC-525-0046 Full Backup 100 13-Dec-05 08:33 BackupCatalog DAT-120-0006 Admin 999 13-Dec-05 09:00 Shutdown
====

Running Jobs:
JobId Level   Name                       Status
======================================================================
 4443 Increme  DracheStd.2005-12-12_08.20.00 is waiting on Storage HPDAT
 4445 Increme  Goblin.2005-12-12_08.25.00 is waiting on Storage HPDAT
 4446 Increme  ElfSys.2005-12-12_08.25.01 is waiting on Storage HPDAT
 4447 Increme  ElfHome.2005-12-12_08.25.02 is waiting on Storage HPDAT
 4448 Increme  Ork.2005-12-12_08.25.03 is waiting on Storage HPDAT
 4449 Increme  GoblinDB.2005-12-12_08.25.04 is waiting for higher
priority jobs to finish 4450 Differe  BackupMail.2005-12-12_08.30.00 is
waiting on max Client jobs 4451 Full    BackupCatalog.2005-12-12_08.34.00
is waiting execution 4452         Shutdown.2005-12-12_09.00.00 is waiting
execution ====

Terminated Jobs:
JobId  Level     Files      Bytes     Status   Finished        Name
========================================================================
 4434  Incr        152      5,453,504 OK       11-Dec-05 08:54 Goblin
 4435  Incr        277    323,115,310 OK       11-Dec-05 08:59 ElfSys
 4437  Incr        880    302,413,714 OK       11-Dec-05 08:59 Ork
 4439  Diff        872     17,081,058 OK       11-Dec-05 09:01
BackupMail 4432  Incr      1,082  1,506,827,769 OK       11-Dec-05 09:47
DracheStd 4438  Incr          2        755,036 OK       11-Dec-05 09:47
GoblinDB 4440  Full         16  1,704,566,707 OK       11-Dec-05 10:31
BackupCatalog 4441                0              0 OK       11-Dec-05
10:37 Shutdown 4442                0              0 OK       11-Dec-05
23:58 Shutdown 4444  Incr          0              0 Error    12-Dec-05
08:20 BeowulfStd

====

The last job mentioned failed because the run before script I use found
the host to be down and exited with a non-zero status. This is expected
and usual behaviour in case of the windows workstations I back up.

The SD status:

#setdebug level=200 sd=HPDAT
Connecting to Storage daemon HPDAT at goblin:9103
3000 OK setdebug=200
#sta sd=HPDAT
Connecting to Storage daemon HPDAT at goblin:9103

goblin-sd Version: 1.38.3 (09 December 2005) i586-pc-linux-gnu suse 8.1
Daemon started 12-Dec-05 07:32, 0 Jobs run since started.
Heap: bytes=50,665 max_bytes=116,676 bufs=241 max_bufs=249

Running Jobs:
No Jobs running.
====

Terminated Jobs:
JobId  Level   Files          Bytes Status   Finished        Name
======================================================================
 4422  Incr          0              0 OK       10-Dec-05 10:31 DracheStd
 4430  Full         16  1,704,173,700 OK       10-Dec-05 11:07
BackupCatalog 4436  Incr          6     81,815,937 OK       11-Dec-05
08:54 ElfHome 4434  Incr        152      5,467,497 OK       11-Dec-05
08:54 Goblin 4435  Incr        277    323,149,789 OK       11-Dec-05
08:59 ElfSys 4437  Incr        880    302,514,999 OK       11-Dec-05
08:59 Ork 4439  Diff        872     17,180,372 OK       11-Dec-05 09:01
BackupMail 4432  Incr      1,082  1,507,075,010 OK       11-Dec-05 09:47
DracheStd 4438  Incr          2        755,274 OK       11-Dec-05 09:47
GoblinDB 4440  Full         16  1,704,568,641 OK       11-Dec-05 10:31
BackupCatalog ====

Device status:
Autochanger "HP DAT" with devices:
  "HP DAT 0" (/dev/nst1)
Autochanger "FastStor 4000" with devices:
  "Quantum DLT 0" (/dev/nst2)
Device "GoblinFile" (/var/bacula/) is not open or does not exist.
Configured device capabilities:
EOF BSR BSF FSR FSF EOM !REM RACCESS AUTOMOUNT LABEL !ANONVOLS
!ALWAYSOPEN Device state:
!OPENED !TAPE !LABEL MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT
!MOUNTED num_writers=0 JobStatus=C block=0

Device parameters:
Archive name: /var/bacula/ Device name: GoblinFile
File=0 block=0
Min block=0 Max block=0
Device "Wangtek 6525" (/dev/nst0) is mounted with Volume "QIC-525-0046"
Configured device capabilities:
EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS
ALWAYSOPEN Device state:
OPENED TAPE LABEL MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT
!MOUNTED num_writers=0 JobStatus=C block=0

Device parameters:
Archive name: /dev/nst0 Device name: Wangtek 6525
File=0 block=0
Min block=1024 Max block=1024
   Total Bytes Read=0 Blocks Read=0 Bytes/block=0
   Positioned at File=0 Block=0
Device "HP DAT 0" (/dev/nst1) open but no Bacula volume is mounted.
   Drive 0 is not loaded.
Configured device capabilities:
EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS
ALWAYSOPEN Device state:
OPENED TAPE !LABEL MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT
!MOUNTED num_writers=0 JobStatus=C block=0

Device parameters:
Archive name: /dev/nst1 Device name: HP DAT 0
File=0 block=0
Min block=0 Max block=0
   Total Bytes Read=0 Blocks Read=0 Bytes/block=0
   Positioned at File=0 Block=0
Device "Quantum DLT 0" (/dev/nst2) open but no Bacula volume is mounted.
   Drive 0 is not loaded.
Configured device capabilities:
EOF BSR BSF FSR FSF EOM REM !RACCESS AUTOMOUNT !LABEL !ANONVOLS
ALWAYSOPEN Device state:
OPENED TAPE !LABEL MALLOC !APPEND !READ !EOT !WEOT !EOF !NEXTVOL !SHORT
!MOUNTED num_writers=0 JobStatus=C block=0

Device parameters:
Archive name: /dev/nst2 Device name: Quantum DLT 0
File=0 block=0
Min block=0 Max block=0
   Total Bytes Read=0 Blocks Read=0 Bytes/block=0
   Positioned at File=0 Block=0
====

In Use Volume status:
QIC-525-0046 on device "Wangtek 6525" (/dev/nst0)
====

In other words, the SD seems to be sure it doesn't have to do anything.

This morning, when I noticed that behaviour, I used mtx to load a tape

into drive HPDAT and issued the proper mount command. I do the same now:

#mount HPDAT
3905 Device "HP DAT 0" (/dev/nst1) open but no Bacula volume is mounted.
If this is not a blank tape, try unmounting and remounting the Volume.
#

This is hardly true - that same tape was ok up till yesterday. Apart

from that, the SD doesn't even try to read the tape:

goblin:~ # tapeinfo -f /dev/sg4
Product Type: Tape Drive
Vendor ID: 'HP      '
Product ID: 'C1553A          '
Revision: '9503'
Attached Changer: No
MinBlock:1
MaxBlock:16777215
SCSI ID: 5
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: Not Loaded
Density Code: 0x24
BlockSize: 0
DataCompEnabled: yes
DataCompCapable: yes
DataDeCompEnabled: yes
CompType: 0x20
DeCompType: 0x0
BOP: yes
Block Position: 0
goblin:~ #

Or, in other words, the tape which is loaded is still at BOT which
indicates that the SD didn't actually try to read it.

The debug log is filled with many thousand lines mostly created by
reserve.c in case of the mount command.

Any clues?

Arno


Also, if your problem originated from failed jobs where the Dir was
unable to contact a client, then it was most likely related to bug #
508, which is now fixed.

No, I don't think that's the problem.


Oh, yes, I have now corrected the documentation for 1.38.3 for the
Maximum Changer Wait specification.

Great. Thanks.

Arno



--
IT-Service Lehmann                    [EMAIL PROTECTED]
Arno Lehmann                  http://www.its-lehmann.de


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Reply via email to