Yes, kilchis is a bona fide hardware server. The only VMs I have are test
systems running on my desktop.

There are 2 copy jobs on this system. This particular job is the one that
typically runs long enough that it will need a new volume during the
night.  The other one will if it is run late in the day and the current
volume does not have very much space left on it. The other daily backup
jobs will wait until the copy job is finished, but there is nothing else
running on the system that utilizes the network except for VNC traffic.
This problem happened two weeks in a row, and this last week it worked just
fine.  The one thing that is different is that I dropped all of the current
backup files and purged them from the DB, then created new files to back up
to.  I am just wondering if one of the files was being written to a
questionable sector on disk.  There is nothing in the logs, and SMART does
not give any details on that.

I think I will call it a fluke and keep a watch on it in the future.
Thanks!
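
In case it helps anyone who wants to keep a similar watch, here is a minimal
Python sketch that scans log lines for the error signatures quoted further
down in this thread (the bsock "Connection reset by peer" error and the kernel
I/O and EXT4 errors). The function name `scan`, the pattern names, and the
regular expressions are illustrative assumptions of mine, not anything Bacula
provides; adjust them to the log format on your system.

```python
import re

# Patterns modelled on the log lines quoted in this thread.
# The names and regular expressions are illustrative assumptions.
PATTERNS = {
    "bsock_reset": re.compile(r"bsock\.c:\d+ .*ERR=Connection reset by peer"),
    "buffer_io_error": re.compile(r"Buffer I/O error on device \w+"),
    "ext4_error": re.compile(r"EXT4-fs error \(device \w+\)"),
}


def scan(lines):
    """Return a dict mapping each pattern name to the list of matching lines."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits
```

Passing an open log file (or any iterable of lines) to `scan` returns the
matching lines grouped by pattern. Which file to read depends on the
installation and is not stated in this thread.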

On Fri, Sep 22, 2017 at 10:27 AM, Martin Simmons <mar...@lispworks.com>
wrote:

> That's odd -- the reading side looks normal to me until the error is
> detected.
>
> Also, "Connection reset by peer" doesn't normally occur when connected to
> the current machine.
>
> Is kilchis a real computer (not a VM)?
>
> Is this the only copy job that waits overnight for someone to label a new
> volume?
>
> Maybe something happens overnight on the system that causes networking to
> be disrupted in some subtle way, causing "Connection reset by peer" when the
> connection is closed cleanly?
>
> __Martin
>
>
> >>>>> On Tue, 19 Sep 2017 15:31:46 -0700, Jerry Lowry said:
> >
> > The reading side is the same system.  It is a copy job set up to back up
> > daily backups to the offsite backup disk.
> > The attachment is the Bacula JobId 35202.
> >
> > jerry
> >
> > On Tue, Sep 19, 2017 at 10:08 AM, Martin Simmons <mar...@lispworks.com>
> > wrote:
> >
> > > The email below is from the writing side of the copy job and the message:
> > >
> > > 13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from
> > > Storage daemon:kilchis:9103: ERR=Connection reset by peer
> > >
> > > shows that the connection to the reading side of the job was closed
> > > unexpectedly from the reading end.
> > >
> > > Do you have the corresponding email from the reading side?  It will have a
> > > different JobId (but should mention JobId 35203) and should start with
> > > something like "Using Device ... to read."
> > >
> > > __Martin
> > >
> > >
> > > >>>>> On Mon, 18 Sep 2017 13:42:19 -0700, Jerry Lowry said:
> > > >
> > > > Martin,
> > > > Here is the complete email that was sent just before the "Copy Error"
> > > > message:
> > > >
> > > > 12-Sep 15:09 kilchis-dir JobId 35203: Using Device "MidSwap" to write.
> > > > 12-Sep 15:09 kilchis JobId 35203: Volume "homeMS-200" previously written, moving to end of data.
> > > > 12-Sep 15:27 kilchis JobId 35203: End of medium on Volume "homeMS-200" Bytes=1,932,735,274,146 Blocks=29,959,317 at 12-Sep-2017 15:27.
> > > > 12-Sep 15:28 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > > Please use the "label" command to create a new Volume for:
> > > >     Storage:      "MidSwap" (/MidSwap)
> > > >     Pool:         OffsiteMid
> > > >     Media type:   File
> > > > 12-Sep 15:36 kilchis JobId 35203: Wrote label to prelabeled Volume "homeMS-201" on File device "MidSwap" (/MidSwap)
> > > > 12-Sep 15:36 kilchis JobId 35203: New volume "homeMS-201" mounted on device "MidSwap" (/MidSwap) at 12-Sep-2017 15:36.
> > > > 12-Sep 19:54 kilchis JobId 35203: End of medium on Volume "homeMS-201" Bytes=1,932,735,281,790 Blocks=29,959,315 at 12-Sep-2017 19:54.
> > > > 12-Sep 19:54 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > > Please use the "label" command to create a new Volume for:
> > > >     Storage:      "MidSwap" (/MidSwap)
> > > >     Pool:         OffsiteMid
> > > >     Media type:   File
> > > > 12-Sep 20:57 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > > Please use the "label" command to create a new Volume for:
> > > >     Storage:      "MidSwap" (/MidSwap)
> > > >     Pool:         OffsiteMid
> > > >     Media type:   File
> > > > 12-Sep 23:03 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > > Please use the "label" command to create a new Volume for:
> > > >     Storage:      "MidSwap" (/MidSwap)
> > > >     Pool:         OffsiteMid
> > > >     Media type:   File
> > > > 13-Sep 03:15 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > > Please use the "label" command to create a new Volume for:
> > > >     Storage:      "MidSwap" (/MidSwap)
> > > >     Pool:         OffsiteMid
> > > >     Media type:   File
> > > > 13-Sep 08:23 kilchis JobId 35203: Wrote label to prelabeled Volume "homeMS-202" on File device "MidSwap" (/MidSwap)
> > > > 13-Sep 08:23 kilchis JobId 35203: New volume "homeMS-202" mounted on device "MidSwap" (/MidSwap) at 13-Sep-2017 08:23.
> > > > 13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from Storage daemon:kilchis:9103: ERR=Connection reset by peer
> > > > 13-Sep 08:43 kilchis JobId 35203: Fatal error: append.c:271 Network error reading from FD. ERR=Connection reset by peer
> > > > 13-Sep 08:43 kilchis JobId 35203: Elapsed time=04:56:15, Transfer rate=125.6 M Bytes/second
> > > > 13-Sep 08:43 kilchis JobId 35203: Sending spooled attrs to the Director. Despooling 1,533,148,574 bytes ...
> > > >
> > > > I don't have the job log. Interestingly, I did not have any problems with
> > > > this or any other copy job before I upgraded.  I went from 5.2.13 to 9.0.3
> > > > of Bacula and latest version of MySQL to MariaDB.  Not saying that this is
> > > > a problem, because I have 5 other copy jobs that still work without error.
> > > > This one just happens to be the biggest one.
> > > >
> > > > thanks,
> > > > jerry
> > > >
> > > > On Mon, Sep 18, 2017 at 7:55 AM, Martin Simmons <mar...@lispworks.com>
> > > > wrote:
> > > >
> > > > > A copy job will communicate using TCP between the Bacula daemons.  A bsock
> > > > > error could indicate that bacula-sd closed the connection unexpectedly and I
> > > > > would expect media errors to be logged.
> > > > >
> > > > > Your syslog did include some I/O errors.  Are they caused by something
> > > > > else?
> > > > >
> > > > > Do you have the complete job log (from the Bacula log, not the syslog)?
> > > > >
> > > > > __Martin
> > > > >
> > > > >
> > > > > >>>>> On Wed, 13 Sep 2017 09:35:07 -0700, Jerry Lowry said:
> > > > > >
> > > > > > Kern,
> > > > > > My Offsite Backup just failed again on the same drive, different disk. It
> > > > > > failed with the same bsock error.  If the backup is working on the same
> > > > > > system using the copy function, how far out of the network stack does it
> > > > > > go?  My thinking is it does not get out of the application layer.  Is this
> > > > > > right?  Why would I get a bsock error?
> > > > > >
> > > > > > I have taken a look at the SMART data for the disk and it seems to be
> > > > > > running okay. I am getting some sector relocation errors; would that cause
> > > > > > the bsock error during a remap?  This procedure has been running flawlessly
> > > > > > for many years (except for human error).  I am wondering if I should
> > > > > > delete the present disk files and let Bacula recreate new ones.
> > > > > >
> > > > > > thanks for your help!
> > > > > >
> > > > > > jerry
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 6, 2017 at 11:26 PM, Kern Sibbald <k...@sibbald.com> wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > If the job is marked as Incomplete in the catalog ("I" I think), then you
> > > > > > > can simply restart it and it should pick up where it left off.  If not, you
> > > > > > > must run it again from the beginning.
> > > > > > >
> > > > > > > If you are switching devices when one is full during a Job, it is unlikely
> > > > > > > you can restore that job when it terminates. I recommend carefully testing
> > > > > > > restores on your system.
> > > > > > >
> > > > > > > Best regards,
> > > > > > >
> > > > > > > Kern
> > > > > > >
> > > > > > > On 09/06/2017 05:38 PM, Jerry Lowry wrote:
> > > > > > >
> > > > > > > List,
> > > > > > > I am running Bacula 9.0.3, MariaDB 12.2.8 on CentOS 6.9.  I got notice
> > > > > > > last night that my Offsite backup failed due to a bsock error.  My offsite
> > > > > > > drives are attached to an ATTO RAID card which gives me hot swap
> > > > > > > capability. This configuration works great as it allows me to hot swap a
> > > > > > > drive when it fills up with a new drive to continue with.  The problem is
> > > > > > > included below. The backup that I was doing is to the OffsiteMid drive
> > > > > > > which is mounted as /dev/sde. Is there a way to restart this backup job or
> > > > > > > am I left with an incomplete backup going forward?
> > > > > > >
> > > > > > > thanks for your help,
> > > > > > >
> > > > > > > jerry
> > > > > > >
> > > > > > >
> > > > > > > Sep  5 08:46:01 kilchis bat[4339]: bsock.c:147 Unable to connect to Director daemon on kilchis:9101. ERR=Connection refused
> > > > > > > Sep  5 10:37:20 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteTop
> > > > > > > Sep  5 10:39:06 kilchis kernel: scsi 5:0:1:0: Direct-Access     ATTO     OffsiteTop00     0001 PQ: 0 ANSI: 5
> > > > > > > Sep  5 10:39:06 kilchis kernel: sd 5:0:1:0: Attached scsi generic sg6 type 0
> > > > > > > Sep  5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Write Protect is off
> > > > > > > Sep  5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > > > > Sep  5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 10:39:06 kilchis kernel: sdd: unknown partition table
> > > > > > > Sep  5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Attached SCSI disk
> > > > > > > Sep  5 10:39:35 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 10:39:35 kilchis kernel: sdd:
> > > > > > > Sep  5 10:44:54 kilchis kernel: EXT4-fs (sdd): mounted filesystem with ordered data mode. Opts:
> > > > > > > Sep  5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > > Sep  5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > > Sep  5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > > Sep  5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > > Sep  5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > > Sep  5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > > Sep  5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > > Sep  5 13:45:48 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteMid
> > > > > > > Sep  5 13:45:53 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteTop
> > > > > > > Sep  5 13:47:52 kilchis kernel: scsi 5:0:1:0: Direct-Access     ATTO     OffsiteMid00     0001 PQ: 0 ANSI: 5
> > > > > > > Sep  5 13:47:52 kilchis kernel: sd 5:0:1:0: Attached scsi generic sg6 type 0
> > > > > > > Sep  5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Write Protect is off
> > > > > > > Sep  5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > > > > Sep  5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 13:47:52 kilchis kernel: sde: unknown partition table
> > > > > > > Sep  5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Attached SCSI disk
> > > > > > > Sep  5 13:48:01 kilchis kernel: EXT4-fs error (device sdd): __ext4_get_inode_loc: unable to read inode block - inode=2, block=1057
> > > > > > > Sep  5 13:48:01 kilchis kernel: Buffer I/O error on device sdd, logical block 0
> > > > > > > Sep  5 13:48:01 kilchis kernel: lost page write due to I/O error on sdd
> > > > > > > Sep  5 13:48:01 kilchis kernel: EXT4-fs error (device sdd) in ext4_reserve_inode_write: IO failure
> > > > > > > Sep  5 13:48:01 kilchis kernel: EXT4-fs (sdd): previous I/O error to superblock detected
> > > > > > > Sep  5 13:48:01 kilchis kernel: Buffer I/O error on device sdd, logical block 0
> > > > > > > Sep  5 13:48:01 kilchis kernel: lost page write due to I/O error on sdd
> > > > > > > Sep  5 13:48:06 kilchis kernel: Aborting journal on device sdd-8.
> > > > > > > Sep  5 13:48:06 kilchis kernel: Buffer I/O error on device sdd, logical block 243826688
> > > > > > > Sep  5 13:48:06 kilchis kernel: lost page write due to I/O error on sdd
> > > > > > > Sep  5 13:48:06 kilchis kernel: JBD2: I/O error detected when updating journal superblock for sdd-8.
> > > > > > > Sep  5 13:48:08 kilchis kernel: EXT4-fs error (device sdd): ext4_put_super: Couldn't clean up the journal
> > > > > > > Sep  5 13:48:08 kilchis kernel: EXT4-fs (sdd): Remounting filesystem read-only
> > > > > > > Sep  5 13:48:44 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > > Sep  5 13:48:44 kilchis kernel: sde:
> > > > > > > Sep  5 13:54:05 kilchis kernel: EXT4-fs (sde): mounted filesystem with ordered data mode. Opts:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ------------------------------------------------------------
> > > > > ------------------
> > > > > > > Check out the vibrant tech community on one of the world's most
> > > > > > > engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Bacula-users mailing listBacula-users@lists.
> > > sourceforge.nethttps://
> > > > > lists.sourceforge.net/lists/listinfo/bacula-users
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users