Re: [Bacula-users] Incomplete backup - due to bsock error

2017-09-19 Thread Jerry Lowry
The reading side is the same system.  It is a copy job set up to copy the
daily backups to the offsite backup disk.
The attachment is Bacula JobId 35202.

jerry

Re: [Bacula-users] Incomplete backup - due to bsock error

2017-09-19 Thread Martin Simmons
The email below is from the writing side of the copy job and the message:

13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from Storage 
daemon:kilchis:9103: ERR=Connection reset by peer

shows that the connection to the reading side of the job was closed
unexpectedly from the reading end.
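
To illustrate the point (this is not Bacula code, just a minimal sketch): even
when both ends run on the same host, the data travels over a real TCP
connection on the loopback interface, so one end can still see "Connection
reset by peer" if the other end aborts. The function name below is
hypothetical, and the SO_LINGER trick is only a convenient way to force a
reset for the demonstration; the struct layout assumes Linux or another
Unix-like system.

```python
import socket
import struct
import threading


def reset_by_peer_demo() -> str:
    """Return the exception name seen when a loopback peer aborts the connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def reading_side() -> None:
        conn, _ = listener.accept()
        # Linger enabled with a zero timeout makes close() send a TCP RST
        # instead of a normal FIN, i.e. an abrupt, unexpected close.
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
        conn.close()

    peer = threading.Thread(target=reading_side)
    peer.start()
    writing_side = socket.create_connection(("127.0.0.1", port))
    peer.join()  # the peer has aborted the connection by now
    try:
        writing_side.recv(1024)
        return "no error"
    except ConnectionResetError as exc:
        return type(exc).__name__
    finally:
        writing_side.close()
        listener.close()


if __name__ == "__main__":
    print(reset_by_peer_demo())
```

The end that stays open is the one that reports the reset, which is why the
log from the other (reading) side is the one worth looking at.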

Do you have the corresponding email from the reading side?  It will have a
different JobId (but should mention JobId 35203) and should start with
something like "Using Device ... to read."

__Martin


> On Mon, 18 Sep 2017 13:42:19 -0700, Jerry Lowry said:
> 
> Martin,
> Here is the complete email that was sent just before the "Copy Error"
> message:
> 
> 12-Sep 15:09 kilchis-dir JobId 35203: Using Device "MidSwap" to write.
> 12-Sep 15:09 kilchis JobId 35203: Volume "homeMS-200" previously written, 
> moving to end of data.
> 12-Sep 15:27 kilchis JobId 35203: End of medium on Volume "homeMS-200" 
> Bytes=1,932,735,274,146 Blocks=29,959,317 at 12-Sep-2017 15:27.
> 12-Sep 15:28 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is 
> waiting. Cannot find any appendable volumes.
> Please use the "label" command to create a new Volume for:
> Storage:  "MidSwap" (/MidSwap)
> Pool: OffsiteMid
> Media type:   File
> 12-Sep 15:36 kilchis JobId 35203: Wrote label to prelabeled Volume 
> "homeMS-201" on File device "MidSwap" (/MidSwap)
> 12-Sep 15:36 kilchis JobId 35203: New volume "homeMS-201" mounted on device 
> "MidSwap" (/MidSwap) at 12-Sep-2017 15:36.
> 12-Sep 19:54 kilchis JobId 35203: End of medium on Volume "homeMS-201" 
> Bytes=1,932,735,281,790 Blocks=29,959,315 at 12-Sep-2017 19:54.
> 12-Sep 19:54 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is 
> waiting. Cannot find any appendable volumes.
> Please use the "label" command to create a new Volume for:
> Storage:  "MidSwap" (/MidSwap)
> Pool: OffsiteMid
> Media type:   File
> 12-Sep 20:57 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is 
> waiting. Cannot find any appendable volumes.
> Please use the "label" command to create a new Volume for:
> Storage:  "MidSwap" (/MidSwap)
> Pool: OffsiteMid
> Media type:   File
> 12-Sep 23:03 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is 
> waiting. Cannot find any appendable volumes.
> Please use the "label" command to create a new Volume for:
> Storage:  "MidSwap" (/MidSwap)
> Pool: OffsiteMid
> Media type:   File
> 13-Sep 03:15 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is 
> waiting. Cannot find any appendable volumes.
> Please use the "label" command to create a new Volume for:
> Storage:  "MidSwap" (/MidSwap)
> Pool: OffsiteMid
> Media type:   File
> 13-Sep 08:23 kilchis JobId 35203: Wrote label to prelabeled Volume 
> "homeMS-202" on File device "MidSwap" (/MidSwap)
> 13-Sep 08:23 kilchis JobId 35203: New volume "homeMS-202" mounted on device 
> "MidSwap" (/MidSwap) at 13-Sep-2017 08:23.
> 13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from Storage 
> daemon:kilchis:9103: ERR=Connection reset by peer
> 13-Sep 08:43 kilchis JobId 35203: Fatal error: append.c:271 Network error 
> reading from FD. ERR=Connection reset by peer
> 13-Sep 08:43 kilchis JobId 35203: Elapsed time=04:56:15, Transfer rate=125.6 
> M Bytes/second
> 13-Sep 08:43 kilchis JobId 35203: Sending spooled attrs to the Director. 
> Despooling 1,533,148,574 bytes ...
> 
> I don't have the job log. Interestingly, I did not have any problems with
> this or any other copy job before I upgraded.  I went from 5.2.13 to 9.0.3
> of Bacula and latest version of MySql to Mariadb.  Not saying that this is
> a problem, because I have 5 other copy jobs that work without error still.
> This one just happens to be the biggest one.
> 
> thanks,
> jerry
> 
> On Mon, Sep 18, 2017 at 7:55 AM, Martin Simmons wrote:
> 
> > A copy job will communicate using TCP between the Bacula daemons.  A bsock
> > error could indicate that bacula-sd closed the connection unexpectedly, and
> > I would expect media errors to be logged.
> >
> > Your syslog did include some I/O errors.  Are they caused by something
> > else?
> >
> > Do you have the complete job log (from the Bacula log, not the syslog)?
> >
> > __Martin
> >
> >
> > > On Wed, 13 Sep 2017 09:35:07 -0700, Jerry Lowry said:
> > >
> > > Kern,
> > > My Offsite Backup just failed again on the same drive, different disk. It
> > > failed with the same bsock error.  If the backup is working on the same
> > > system using the copy function, how far out of the network stack does it
> > > go?  My thinking is it does not get out of the application layer.  Is this
> > > right?  Why would I get a bsock error?
> > >
> > > I have taken a look at the SMART data for the disk and they seem to be
> > > running okay. I am getting some sector relocation errors, would that cause
> > > the bsock error during a remap

Re: [Bacula-users] No Job status returned from FD. Backup fails

2017-09-19 Thread Can Şirin

Hi,

It depends entirely on the number of file entries. My jobs have almost 30M files
x 8 parallel jobs, and it takes about 20 hours to despool attributes. I
recommend checking the database configuration. If it is Postgres, you
should check the checkpoint interval. Also check the link below.

http://bacula.us/tuning/
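
As a rough back-of-the-envelope check on those numbers (a sketch only,
assuming one catalog record per file and that the 20 hours covers all 8 jobs
together; the function name is made up for illustration):

```python
def attribute_records_per_second(files_per_job: int, jobs: int, hours: float) -> float:
    """Average catalog insert rate implied by a despooling run."""
    return files_per_job * jobs / (hours * 3600)


rate = attribute_records_per_second(30_000_000, 8, 20)
print(f"{rate:.0f} attribute records per second")
```

Comparing that figure with what the database sustains on its own gives a
hint as to whether the catalog is the bottleneck.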

Can

Quoting Matthias Koch-Schirrmeister:

> After a long and apparently successful backup job, I am getting this:
>
> Standard-Device Elapsed time=25:57:52, Transfer rate=1.463 M Bytes/second
> Sending spooled attrs to the Director. Despooling 431,639,287 bytes ...
>
> It's been about an hour now that the job stopped writing to the volume.
> The job should now finish. I wonder if this is normal?
>
> Heartbeat Interval has been set to 1200 btw... 600 apparently wasn't
> enough.
>
> Matthias


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] No Job status returned from FD. Backup fails

2017-09-19 Thread Matthias Koch-Schirrmeister
After a long and apparently successful backup job, I am getting this:

Standard-Device Elapsed time=25:57:52, Transfer rate=1.463 M Bytes/second
Sending spooled attrs to the Director. Despooling 431,639,287 bytes ...
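
For scale, the summary line above can be turned into an approximate data
volume. This is only a sketch: the helper names are hypothetical, and it
assumes "M Bytes" means 10^6 bytes.

```python
import re

SUMMARY = "Standard-Device Elapsed time=25:57:52, Transfer rate=1.463 M Bytes/second"


def elapsed_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def approx_bytes_written(line: str) -> float:
    """Estimate bytes written as elapsed time multiplied by the reported rate."""
    match = re.search(
        r"Elapsed time=(\d+:\d+:\d+), Transfer rate=([\d.]+) M Bytes/second", line
    )
    if match is None:
        raise ValueError("not a job summary line")
    # Assumption: 1 M Byte = 1,000,000 bytes.
    return elapsed_seconds(match.group(1)) * float(match.group(2)) * 1_000_000


print(f"{approx_bytes_written(SUMMARY) / 1e9:.1f} GB")
```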


It's been about an hour now that the job stopped writing to the volume.
The job should now finish. I wonder if this is normal?

Heartbeat Interval has been set to 1200 btw... 600 apparently wasn't enough.

Matthias