Hi!
Thank you Kern! This debug flag gave me a lot of information and also a hint to solve the problem!

I have two Storage resources (both with File as media type) specified in the Director's configuration file: one for local file storage and one for the storage daemon on the second server. When doing a restore, the Director suggested the first of the two (the local one), and even though I manually changed the restore job to use the other SD, the job failed. The debug output made it clear that the file daemon tried to connect to the wrong storage daemon to get the files.
I commented out the local file storage and restarted the daemons. I could now do a restore without any problems!
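
A minimal sketch of the setup described above, as it might have looked in bacula-dir.conf before the local storage was commented out. The name LocalFile, the device name LocalFileStorage, and the address and password placeholders are assumptions made up for illustration; only the resource named File appears in the config excerpt quoted further down.

```
# Sketch only: anything marked "hypothetical" or "placeholder" is not from the real config.
Storage {
  Name = LocalFile                      # hypothetical name for the local file storage
  Address = address-of-the-director-host   # placeholder
  SDPort = 9103
  Password = "local-sd-password"        # placeholder
  Device = "LocalFileStorage"           # hypothetical device name
  Media Type = "File"                   # same media type as the resource below
}
Storage {
  Name = File                           # the storage on pippin-sd
  Address = my-ip-for-the-storage-daemon-server
  SDPort = 9103
  Password = "mypassword"
  Device = "FileStorage"
  Media Type = "File"                   # identical media type in both resources
}
```

With both resources sharing one media type, the Director offered the first one for the restore, which matches the behaviour seen in the debug output.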

Is this a known issue? How come the file daemon tries to access the local storage daemon even though I changed "storage" to another SD when specifying the restore?
I didn't see this problem before my upgrade. Could it be a bug in the new release or could I have a problem in the config somewhere? Is anyone else running several storage daemons with bacula 1.38.2?

Best regards, Jonas Mixter


On 2005-12-14 23:15, Kern Sibbald wrote:
Hello Jonas,

Using an IP address is fine.

Running with a -d400 on the SD and the FD would probably give you a much 
better idea what is going wrong ...


On Wednesday 14 December 2005 23:12, Jonas Mixter wrote:
  
On 2005-12-14 22:23, Kern Sibbald wrote:
    
On Wednesday 14 December 2005 22:13, Jonas Mixter wrote:
      
On 2005-12-14 17:55, Attila Fülöp wrote:
        
Jonas Mixter wrote:
          
On 2005-12-14 10:29, Jonas Mixter wrote:
            
Hi!
Yesterday I upgraded my director, fd and sd from bacula 1.36.2 (that
comes with Debian sarge) to 1.38.2.
I have one machine running the director (with a connected
tape-station) and one machine with a lot of disks and a storage
daemon (this should later on be placed elsewhere, hence the
separation of the daemons). The director is named merry-dir and the
storage daemon is named pippin-sd. Before the upgrade I could backup
for example the catalog to a file storage on the machine with just
the sd. I could also restore files from
the backup.

After the upgrade, I could backup just as before. The job exits OK and I
could see that the files that hold the backup are growing. But I
cannot restore...
When running a restore I could mark the files I want and the job is
visible underneath "Running Jobs" in bconsole. The job is "waiting for
Client merry-fd to connect to Storage File" and after a while the job
times out.

Here's the output when trying to restore a file to merry-fd (running
on the same server as the director):

14-Dec 09:03 merry-dir: Start Restore Job
RestoreFiles.2005-12-14_09.03.22
14-Dec 09:13 merry-fd: RestoreFiles.2005-12-14_09.03.22 Fatal error:
Authorization key rejected by Storage daemon.
Please see
http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors
for help.
14-Dec 09:13 merry-fd: RestoreFiles.2005-12-14_09.03.22 Fatal error:
Failed to authenticate Storage daemon.
14-Dec 09:13 merry-dir: RestoreFiles.2005-12-14_09.03.22 Fatal error:
Socket error on Storage command: ERR=No data available
14-Dec 09:13 merry-dir: RestoreFiles.2005-12-14_09.03.22 Error: Bacula
1.38.2 (20Nov05): 14-Dec-2005 09:13:30
JobId:                  512
Job:                    RestoreFiles.2005-12-14_09.03.22
Client:                 merry-fd
Start time:             14-Dec-2005 09:03:25
End time:               14-Dec-2005 09:13:30
Files Expected:         1
Files Restored:         0
Bytes Restored:         0
Rate:                   0.0 KB/s
FD Errors:              0
FD termination status:
SD termination status:  Waiting on FD
Termination:            *** Restore Error ***

14-Dec 09:13 merry-dir: RestoreFiles.2005-12-14_09.03.22 Error: Bacula
1.38.2 (20Nov05): 14-Dec-2005 09:13:30
JobId:                  512
Job:                    RestoreFiles.2005-12-14_09.03.22
Client:                 merry-fd
Start time:             14-Dec-2005 09:03:25
End time:               14-Dec-2005 09:13:30
Files Expected:         1
Files Restored:         0
Bytes Restored:         0
Rate:                   0.0 KB/s
FD Errors:              1
FD termination status:
SD termination status:  Waiting on FD
Termination:            *** Restore Error ***


Note the job fails twice... Is that really correct?
I could backup and restore without any problems to a local file on the
server running the director (tested only once though).
I could check the status of the storage daemon from the director, and
also do backups so the passwords should be OK, right?

I've also seen this error sometimes "14-Dec 09:43 merry-sd: Job
RestoreFiles.2005-12-14_09.43.50 waiting to reserve a device." but don't
really know the meaning. In a previous post I saw that Kern asked
about the /lib/tls directory when presented with this error. I do
have /lib/tls, but start all my daemons with the start script that
bacula provides. In that script the "LD_ASSUME_KERNEL=2.4.19" variable
is exported (if running kernel 2.4). Is /lib/tls a problem with
kernel 2.6 too?
I use kernel 2.4.27 on the server with the director, and 2.6.8 on the sd
server. /lib/tls is there on both servers, which are running
Debian sarge.
Should really "merry-sd" be involved? I've selected the storage on
pippin-sd when running "restore" in bconsole. Could it be that the
director/fd is trying to restore from the wrong sd? Shouldn't the job
then fail with a "volume not found" error, ask me to mount a volume,
or similar?

If I try to cancel a job, the director from time to time complains it
cannot find the job even though it's listed under running jobs.
*cancel
Automatically selected Job: JobId=515
Job=RestoreFiles.2005-12-14_10.18.20
Confirm cancel (yes/no): yes
3902 Job RestoreFiles.2005-12-14_10.18.20 not found.

The job is marked "has been canceled" in bconsole anyway. I don't
remember if I got errors like this before the upgrade.

What could be wrong in my setup? I've been banging my head against the
wall for quite a few hours now and I'm out of ideas.

Best regards, Jonas Mixter
              
Hi!
I've been doing some more tests today.
There seem to be no problems at all restoring from backups stored on
tape.
If I copy the entire backup file from the server running only the
sd to the server running bacula-dir, I can restore files
without any problems.
Does anyone have any suggestions on where to continue troubleshooting?

Best regards, Jonas Mixter
            
Authorization key rejected by Storage daemon.
Please see http://www.bacula.org/rel-manual/faq.html#AuthorizationErrors
for help.
14-Dec 09:13 merry-fd: RestoreFiles.2005-12-14_09.03.22 Fatal error:
Failed to authenticate Storage daemon.

Either you have a password mismatch or some firewall in between.
          
Hi!
Thank you for your answer. I'm afraid this is not the problem though,
even if it seems obvious.
There is no firewall between (or on) the machines. They are connected to
the very same network switch and I have no iptables or similar activated.
The passwords seem to match when I review the config files and I have no
problem using "status storage" or running backups _to_ this storage
daemon. I shouldn't be able to do that if the passwords weren't
matching, right? I only get this problem when trying to restore _from_
the storage daemon.
It also seems more like a timeout (10 minutes between the start of the
job and the error) than an actual authorization error to me. Am I wrong?
        
Most likely the error message is correct, but for a slightly more subtle
reason -- I suspect that either your Director and the Client don't resolve
the SD address to the same IP or more likely the SD has crashed.
      
Hi Kern.
Thank you for taking time to review my problem.
I have an IP-address specified in my bacula-dir.conf so I don't think
that there should be any resolve problem. The server running the
storage daemon has no entry in the DNS-system at the moment. Could that
be a problem even though I specify an IP as the address for the server?

The storage daemon must be up and running since I could do a second
backup right after the failed restore. (Just tested to be sure.)
Could I start any of the daemons in some kind of debug mode? Would that
do me any good?
I've included the parts of my configuration files that I think are
relevant here below. Could anyone on the list spot an error? (The
commented out lines about different ports are for a future stunnel, but
at the moment I use standard bacula ports for the sd.)

Best regards, Jonas Mixter

From the storage daemon:

Storage {                             # definition of myself
  Name = pippin-sd
  #SDPort = 59103                  # Director's port
  SDPort = 9103                  # Director's port
  WorkingDirectory = "/var/bacula/working"
  Pid Directory = "/var/run"
  Maximum Concurrent Jobs = 20
}
Director {
  Name = merry-dir
  Password = "mypassword"
}
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /backup
  LabelMedia = yes;                   # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;               # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}

From the Director's config file:

# Definition of file storage device
Storage {
  Name = File
  Address = my-ip-for-the-storage-daemon-server
  #Address = merry.jamtport.se # There is an stunnel here that encrypts the traffic and sends it to pippin
  SDPort = 9103
  #SDPort = 9104
  Password = "mypassword, same as above"   # password for Storage daemon
  Device = "FileStorage"            # must be same as Device in Storage daemon
  Media Type = "File"               # must be same as MediaType in Storage daemon
}
    

  
