That's odd -- the reading side looks normal to me until the error is detected.
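On how far a same-host connection travels: even with both ends on one machine, the daemons talk over a real TCP socket through the kernel, so a reset is still possible if one end aborts the connection. Below is a minimal Python sketch of that. It is not Bacula code, the function name `provoke_reset_on_loopback` is made up for illustration, and it assumes a Linux host, where SO_LINGER with a zero timeout makes close() send a RST instead of a FIN.

```python
import errno
import socket
import struct
import threading


def provoke_reset_on_loopback():
    """Return the errno the reader sees when a local peer aborts the connection.

    Hypothetical helper for illustration only; assumes Linux socket semantics.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def abort_peer():
        conn, _ = listener.accept()
        # l_onoff=1, l_linger=0: close() discards the connection and sends RST.
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
        conn.close()

    worker = threading.Thread(target=abort_peer)
    worker.start()
    client = socket.create_connection(("127.0.0.1", port))
    worker.join()  # the peer has aborted by the time we try to read
    try:
        client.recv(1024)
        return None  # no error seen (not expected on Linux)
    except OSError as exc:
        return exc.errno
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset_on_loopback()
    print(code, errno.errorcode.get(code))
```

On Linux the returned value should match `errno.ECONNRESET`, the errno behind the "Connection reset by peer" text in the job log. A clean close (FIN) would instead make `recv` return an empty result, which is why a reset points at one end dropping the connection abruptly.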
Also, "Connection reset by peer" doesn't normally occur when connected to the
current machine.

Is kilchis a real computer (not a VM)?

Is this the only copy job that waits overnight for someone to label a new
volume? Maybe something happens overnight on the system that causes networking
to be disrupted in some subtle way, causing "Connection reset by peer" when the
connection is closed cleanly?

__Martin


>>>>> On Tue, 19 Sep 2017 15:31:46 -0700, Jerry Lowry said:
>
> The reading side is the same system. It is a copy job setup to backup
> daily backups to the offsite backup disk.
> The attachment is the bacula jobid 35202.
>
> jerry
>
> On Tue, Sep 19, 2017 at 10:08 AM, Martin Simmons <mar...@lispworks.com>
> wrote:
>
> > The email below is from the writing side of the copy job and the message:
> >
> > 13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from Storage daemon:kilchis:9103: ERR=Connection reset by peer
> >
> > shows that the connection to the reading side of the job was closed
> > unexpectedly from the reading end.
> >
> > Do you have the corresponding email from the reading side? It will have a
> > different JobId (but should mention JobId 35203) and should start with
> > something like "Using Device ... to read."
> >
> > __Martin
> >
> >
> > >>>>> On Mon, 18 Sep 2017 13:42:19 -0700, Jerry Lowry said:
> > >
> > > Martin,
> > > Here is the complete email that was sent just before the "Copy Error"
> > > message:
> > >
> > > 12-Sep 15:09 kilchis-dir JobId 35203: Using Device "MidSwap" to write.
> > > 12-Sep 15:09 kilchis JobId 35203: Volume "homeMS-200" previously written, moving to end of data.
> > > 12-Sep 15:27 kilchis JobId 35203: End of medium on Volume "homeMS-200" Bytes=1,932,735,274,146 Blocks=29,959,317 at 12-Sep-2017 15:27.
> > > 12-Sep 15:28 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > Please use the "label" command to create a new Volume for:
> > >     Storage: "MidSwap" (/MidSwap)
> > >     Pool: OffsiteMid
> > >     Media type: File
> > > 12-Sep 15:36 kilchis JobId 35203: Wrote label to prelabeled Volume "homeMS-201" on File device "MidSwap" (/MidSwap)
> > > 12-Sep 15:36 kilchis JobId 35203: New volume "homeMS-201" mounted on device "MidSwap" (/MidSwap) at 12-Sep-2017 15:36.
> > > 12-Sep 19:54 kilchis JobId 35203: End of medium on Volume "homeMS-201" Bytes=1,932,735,281,790 Blocks=29,959,315 at 12-Sep-2017 19:54.
> > > 12-Sep 19:54 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > Please use the "label" command to create a new Volume for:
> > >     Storage: "MidSwap" (/MidSwap)
> > >     Pool: OffsiteMid
> > >     Media type: File
> > > 12-Sep 20:57 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > Please use the "label" command to create a new Volume for:
> > >     Storage: "MidSwap" (/MidSwap)
> > >     Pool: OffsiteMid
> > >     Media type: File
> > > 12-Sep 23:03 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > Please use the "label" command to create a new Volume for:
> > >     Storage: "MidSwap" (/MidSwap)
> > >     Pool: OffsiteMid
> > >     Media type: File
> > > 13-Sep 03:15 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > > Please use the "label" command to create a new Volume for:
> > >     Storage: "MidSwap" (/MidSwap)
> > >     Pool: OffsiteMid
> > >     Media type: File
> > > 13-Sep 08:23 kilchis JobId 35203: Wrote label to prelabeled Volume "homeMS-202" on File device "MidSwap" (/MidSwap)
> > > 13-Sep 08:23 kilchis JobId 35203: New volume "homeMS-202" mounted on device "MidSwap" (/MidSwap) at 13-Sep-2017 08:23.
> > > 13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from Storage daemon:kilchis:9103: ERR=Connection reset by peer
> > > 13-Sep 08:43 kilchis JobId 35203: Fatal error: append.c:271 Network error reading from FD. ERR=Connection reset by peer
> > > 13-Sep 08:43 kilchis JobId 35203: Elapsed time=04:56:15, Transfer rate=125.6 M Bytes/second
> > > 13-Sep 08:43 kilchis JobId 35203: Sending spooled attrs to the Director. Despooling 1,533,148,574 bytes ...
> > >
> > > I don't have the job log. Interestingly, I did not have any problems with
> > > this or any other copy job before I upgraded. I went from 5.2.13 to 9.0.3
> > > of Bacula and latest version of MySql to Mariadb. Not saying that this is
> > > a problem, because I have 5 other copy jobs that work without error still.
> > > This one just happens to be the biggest one.
> > >
> > > thanks,
> > > jerry
> > >
> > > On Mon, Sep 18, 2017 at 7:55 AM, Martin Simmons <mar...@lispworks.com>
> > > wrote:
> > >
> > > > A copy job will communicate using TCP between the Bacula daemons. A bsock
> > > > error could indicate that bacula-sd closed the connection unexpectedly and I
> > > > would expect media errors to be logged.
> > > >
> > > > Your syslog did include some I/O errors. Are they caused by something
> > > > else?
> > > >
> > > > Do you have the complete job log (from the Bacula log, not the syslog)?
> > > >
> > > > __Martin
> > > >
> > > >
> > > > >>>>> On Wed, 13 Sep 2017 09:35:07 -0700, Jerry Lowry said:
> > > > >
> > > > > Kern,
> > > > > My Offsite Backup just failed again on the same drive, different disk. It
> > > > > failed with the same bsock error. If the backup is working on the same
> > > > > system using the copy function, how far out of the network stack does it
> > > > > go? My thinking is it does not get out of the application layer. Is this
> > > > > right? Why would I get a bsock error?
> > > > >
> > > > > I have taken a look at the smart data for the disk and they seem to be
> > > > > running okay. I am getting some sector relocation errors, would that cause
> > > > > the bsock error during a remap? This procedure has been running flawlessly
> > > > > for many years (except for human error). I am wondering if I should
> > > > > delete the present disk files and let bacula recreate new ones.
> > > > >
> > > > > thanks for your help!
> > > > >
> > > > > jerry
> > > > >
> > > > >
> > > > > On Wed, Sep 6, 2017 at 11:26 PM, Kern Sibbald <k...@sibbald.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > If the job is marked as Incomplete in the catalog ("I" I think), then you
> > > > > > can simply restart it and it should pick up where it left off. If not, you
> > > > > > must run it again from the beginning.
> > > > > >
> > > > > > If you are switching devices when one is full during a Job, it is unlikely
> > > > > > you can restore that job when it terminates. I recommend carefully testing
> > > > > > restores on your system.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Kern
> > > > > >
> > > > > > On 09/06/2017 05:38 PM, Jerry Lowry wrote:
> > > > > >
> > > > > > List,
> > > > > > I am running bacula 9.0.3, Mariadb 12.2.8 on Centos 6.9. I got notice
> > > > > > last night that my Offsite backup failed due to a bsock error. My offsite
> > > > > > drives are attached to an ATTO raid card which gives me hot swap
> > > > > > capability. This configuration works great as it allows me to hot swap a
> > > > > > drive when it fills up with a new drive to continue with. The problem is
> > > > > > included below. The backup that I was doing is to the OffsiteMid drive
> > > > > > which is mounted as /dev/sde. Is there a way to restart this backup job, or
> > > > > > am I left with an incomplete backup going forward?
> > > > > >
> > > > > > thanks for your help,
> > > > > >
> > > > > > jerry
> > > > > >
> > > > > >
> > > > > > Sep 5 08:46:01 kilchis bat[4339]: bsock.c:147 Unable to connect to Director daemon on kilchis:9101. ERR=Connection refused
> > > > > > Sep 5 10:37:20 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteTop
> > > > > > Sep 5 10:39:06 kilchis kernel: scsi 5:0:1:0: Direct-Access ATTO OffsiteTop00 0001 PQ: 0 ANSI: 5
> > > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: Attached scsi generic sg6 type 0
> > > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Write Protect is off
> > > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 10:39:06 kilchis kernel: sdd: unknown partition table
> > > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Attached SCSI disk
> > > > > > Sep 5 10:39:35 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 10:39:35 kilchis kernel: sdd:
> > > > > > Sep 5 10:44:54 kilchis kernel: EXT4-fs (sdd): mounted filesystem with ordered data mode. Opts:
> > > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > > Sep 5 13:45:48 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteMid
> > > > > > Sep 5 13:45:53 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteTop
> > > > > > Sep 5 13:47:52 kilchis kernel: scsi 5:0:1:0: Direct-Access ATTO OffsiteMid00 0001 PQ: 0 ANSI: 5
> > > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: Attached scsi generic sg6 type 0
> > > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Write Protect is off
> > > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 13:47:52 kilchis kernel: sde: unknown partition table
> > > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Attached SCSI disk
> > > > > > Sep 5 13:48:01 kilchis kernel: EXT4-fs error (device sdd): __ext4_get_inode_loc: unable to read inode block - inode=2, block=1057
> > > > > > Sep 5 13:48:01 kilchis kernel: Buffer I/O error on device sdd, logical block 0
> > > > > > Sep 5 13:48:01 kilchis kernel: lost page write due to I/O error on sdd
> > > > > > Sep 5 13:48:01 kilchis kernel: EXT4-fs error (device sdd) in ext4_reserve_inode_write: IO failure
> > > > > > Sep 5 13:48:01 kilchis kernel: EXT4-fs (sdd): previous I/O error to superblock detected
> > > > > > Sep 5 13:48:01 kilchis kernel: Buffer I/O error on device sdd, logical block 0
> > > > > > Sep 5 13:48:01 kilchis kernel: lost page write due to I/O error on sdd
> > > > > > Sep 5 13:48:06 kilchis kernel: Aborting journal on device sdd-8.
> > > > > > Sep 5 13:48:06 kilchis kernel: Buffer I/O error on device sdd, logical block 243826688
> > > > > > Sep 5 13:48:06 kilchis kernel: lost page write due to I/O error on sdd
> > > > > > Sep 5 13:48:06 kilchis kernel: JBD2: I/O error detected when updating journal superblock for sdd-8.
> > > > > > Sep 5 13:48:08 kilchis kernel: EXT4-fs error (device sdd): ext4_put_super: Couldn't clean up the journal
> > > > > > Sep 5 13:48:08 kilchis kernel: EXT4-fs (sdd): Remounting filesystem read-only
> > > > > > Sep 5 13:48:44 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > > Sep 5 13:48:44 kilchis kernel: sde:
> > > > > > Sep 5 13:54:05 kilchis kernel: EXT4-fs (sde): mounted filesystem with ordered data mode. Opts:
> > > > > >
> > > > > > _______________________________________________
> > > > > > Bacula-users mailing list
> > > > > > Bacula-users@lists.sourceforge.net
> > > > > > https://lists.sourceforge.net/lists/listinfo/bacula-users