follow up.... Cacti shows that swap started growing this morning and reached it's maximum when the job failed...
Ralf Gross schrieb: > Hi, > > bacula 3.0.3 SD+ DIR, 2.4.4 FD, Debian Lenny, psql 8.4 > > The backup job 19429 was running for nearly two days and then failed while > changing the LTO3 tape. The job failed two times now. No messages in syslog. > > The message "ERR=Datei oder Verzeichnis nicht gefunden" means "ERR=file or > directory not found" > > I deleted the 06D132L3 tape in the bacula catalog, erased the bacula label > with > mt and labled it again. No problem while loading/unloading or writing the > lable. > > The strage thing is that this error blocked an other job (19427) that was > running at the same time but on a different storage daemon! The other SD just > stoped writing. The status was still running but no activity on the SD side. > > Any ideas? > > [...] > 10-Feb 16:17 VU0EA003-sd JobId 19427: Despooling elapsed time = 00:03:33, > Transfer rate = 100.8 M bytes/second > 10-Feb 16:17 VU0EA003-sd JobId 19427: Spooling dat ... > 10-Feb 16:18 VUMEM004-sd JobId 19429: Despooling elapsed time = 00:05:49, > Transfer rate = 61.53 M bytes/second > 10-Feb 16:18 VUMEM004-sd JobId 19429: Spooling data again ... > 10-Feb 16:26 VUMEM004-sd JobId 19429: User specified spool size reached. > 10-Feb 16:26 VUMEM004-sd JobId 19429: Writing spooled data to Volume. > Despooling 21,474,877,357 bytes ... > 10-Feb 16:31 VUMEM004-sd JobId 19429: End of Volume "06D142L3" at 575:9470 on > device "LTO3" (/dev/ULTRIUM-TD3). Write of 64512 bytes got -1. > 10-Feb 16:31 VUMEM004-sd JobId 19429: Re-read of last block succeeded. > 10-Feb 16:31 VUMEM004-sd JobId 19429: End of medium on Volume "06D142L3" > Bytes=575,574,128,640 Blocks=8,921,969 at 10-Feb-2010 16:31. > 10-Feb 16:31 VUMEM004-sd JobId 19429: 3307 Issuing autochanger "unload slot > 23, drive 0" command. > 10-Feb 16:31 VUMEM004-sd JobId 19429: 3995 Bad autochanger "unload slot 23, > drive 0": ERR=Datei oder Verzeichnis nicht gefunden > Results= > 10-Feb 16:31 VUMEM004-dir JobId 19429: There are no more Jobs associated with > Volume "06D132L3". Marking it purged. > 10-Feb 16:31 VUMEM004-dir JobId 19429: All records pruned from Volume > "06D132L3"; marking it "Purged" > 10-Feb 16:31 VUMEM004-dir JobId 19429: Recycled volume "06D132L3" > 10-Feb 16:31 VUMEM004-sd JobId 19429: 3301 Issuing autochanger "loaded? drive > 0" command. > 10-Feb 16:31 VUMEM004-sd JobId 19429: 3991 Bad autochanger "loaded? drive 0" > command: ERR=Datei oder Verzeichnis nicht gefunden. > Results= > 10-Feb 16:31 VUMEM004-sd JobId 19429: 3301 Issuing autochanger "loaded? drive > 0" command. > 10-Feb 16:31 VUMEM004-sd JobId 19429: 3991 Bad autochanger "loaded? drive 0" > command: ERR=Datei oder Verzeichnis nicht gefunden. > Results= > 10-Feb 16:31 VUMEM004-sd JobId 19429: 3304 Issuing autochanger "load slot 13, > drive 0" command. > 10-Feb 16:31 VUMEM004-sd JobId 19429: Fatal error: 3992 Bad autochanger "load > slot 13, drive 0": ERR=Datei oder Verzeichnis nicht gefunden. > Results= > 10-Feb 16:31 VUMEM004-sd JobId 19429: Fatal error: spool.c:302 Fatal append > error on device "LTO3" (/dev/ULTRIUM-TD3): ERR= > 10-Feb 16:31 VUMEM004-sd JobId 19429: Despooling elapsed time = 00:04:37, > Transfer rate = 77.52 M bytes/second > 10-Feb 16:32 VUMEM004-sd JobId 19429: Fatal error: fd_cmds.c:177 FD command > not found: <8F>~<8D><D7>_<96><F0><DC>ee<90>5l<ED><FF>^\<EF>=<FB><F8><9A> > ESC<8E><AC><C2><C1>^\z<F9><8C><CE>`<9D>R<D4><F8>`<EA>=<BE><FA>#<F0>I^X?q+<A3><EB>^N̖<FF><E1><D7>cۈ<F6>#w^C+kg^K<CF>U<BD>{<9B>uLj<9E><F1><96>^^<D6>>a > <E7><84>^^<B7>m<AC>^^^Q<BA>y<99>)N<9F><FA>^C9<A2><D0>sc<D8>3kX<9F><A6>I<A4>+O\š<B4>%<D1><FC> > ^U^R<^KJ<DD><E6>^G3<DF>:̢<B9><C4>˨<BC><E9><81><D2>̿NJ > ^V<E1>^<A7>^P<9D>^H<83><B9><C0>A^Gtٴ<C4>rw<AD>젼RD<C1><F3><E0><E2>^Z<C9>^?r҃^Z<C0>^H^Mu9<E5>6><8E><F6><FA><FC><E2>\<99>^?<E9>x`14<E8>w<B2>Q,<A0>:骘:6 > <DD>^?^XQ4^]<F9><86>>^Y<D2>^\^]<D3><FE><8F>^E<B5>q<B8><B3>^G<8C><86>@<C0><C5>cESCo<FA><CD>^As1f<B7><C2>i{<A1><8D><F6>v<AD><BF>^A<B7><90>)-<8C>^V^C > <EA> > <DE><FA><A6>^AE^F<87>]^V<9E><C0><B6>jb2<D4>$6`Ӆ<CE><D2>V<94><9F>H"<F4><82>5<93>]C<F0>0X^L<8A><98><FB>\<AC>ѵ<FD><90>h"|<8F>/<A7>^OD<CD>^UG5>^D{ > <CE><F7>~<F9>GZk<80><CC><F8><92><BE><AB>o^L<93><8D>h<F1>d>D<E9><B7>-<C1><8C>^]<8B>3^O<82>^_<DB>^\U<AA><88><F3><88><EB>^\<F6>^S&f<98><9D>S#<D7><FC> > <99><BF>):11^_^Ol > 2'^_<DB><C8><EB> > <D7><E7><8C><CE><E2>nx<8E> > <B3>^UC<8A>^G^B<D9><F6>M}PZp<C0>9^T<A0><C6>r]<EF><8A> > <AE>AS<85><D7>j/G^W<84><C4>C<BB><C6><F3><CB><E6>!...@t^m^u<80>]o<F2><E6>^D}<CF>*^T%<87>M<FB>^Vy<C2>d<FE>X~t^O^ACI<C9>G^Zf<A6><AC><D7>1\3H<8E>}y<B7><B7>ސIM<D7>ϱ<BE>yQ<B7><D7>p<D3>U؟<E7><A8>Ц<A8>ky<F9>)<AB><FF><E1><EB>K1<A5><D5><F5>^Kd<D2>Ἦ<FE><D7>`G<9C><EB><AF>v<FA> > 10-Feb 16:32 VUMEM004-sd JobId 19429: Job write elapsed time = 47:21:16, > Transfer rate = 19.25 M bytes/second > 10-Feb 16:32 VUMEM004-sd JobId 19429: Fatal error: fd_cmds.c:166 Command > error with FD, hanging up. Append data error. > > 10-Feb 16:32 VU0EM003 JobId 19429: Fatal error: backup.c:892 Network send > error to SD. ERR=Connection reset by peer > 10-Feb 16:32 VUMEM004-dir JobId 19429: Error: Bacula VUMEM004-dir 3.0.3 > (18Oct09): 10-Feb-2010 16:32:20 > Build OS: x86_64-pc-linux-gnu debian 4.0 > JobId: 19429 > Job: VU0EM003.2010-02-08_16.26.19_08 > Backup Level: Full > Client: "VU0EM003" 2.4.4 (28Dec08) > x86_64-pc-linux-gnu,debian,4.0 > FileSet: "VU0EM003" 2007-06-12 00:05:01 > Pool: "Full" (From Job FullPool override) > Catalog: "MyCatalog" (From Client resource) > Storage: "NEC-T40A_B-Net" (From user selection) > Scheduled time: 08-Feb-2010 16:25:56 > Start time: 08-Feb-2010 16:56:49 > End time: 10-Feb-2010 16:32:20 > Elapsed time: 1 day 23 hours 35 mins 31 secs > Priority: 10 > FD Files Written: 6,645,663 > SD Files Written: 6,645,660 > FD Bytes Written: 3,280,593,534,704 (3.280 TB) > SD Bytes Written: 3,282,303,930,997 (3.282 TB) > Rate: 19147.7 KB/s > Software Compression: None > VSS: no > Encryption: no > Accurate: no > Volume name(s): > 06D139L3|06D123L3|06D145L3|06D144L3|06D138L3|06D142L3 > Volume Session Id: 1 > Volume Session Time: 1265641290 > Last Volume Bytes: 1 (1 B) > Non-fatal FD errors: 0 > SD Errors: 0 > FD termination status: Error > SD termination status: Error > > > > 10-Feb 16:22 VU0EA003-sd JobId 19427: User specified spool size reached. > 10-Feb 16:22 VU0EA003-sd JobId 19427: Writing spooled data to Volume. > Despooling 21,475,033,052 bytes ... > 10-Feb 16:26 VU0EA003-sd JobId 19427: Despooling elapsed time = 00:03:26, > Transfer rate = 104.2 M bytes/second > 10-Feb 16:26 VU0EA003-sd JobId 19427: Spooling data again ... > 10-Feb 16:30 VU0EA003-sd JobId 19427: User specified spool size reached. > 10-Feb 16:30 VU0EA003-sd JobId 19427: Writing spooled data to Volume. > Despooling 21,475,033,052 bytes ... > 10-Feb 16:33 VU0EA003-sd JobId 19427: Despooling elapsed time = 00:03:44, > Transfer rate = 95.87 M bytes/second > > ---- nothing more from JobId 19427 ---- > > restart of director > > > Ralf > > ------------------------------------------------------------------------------ > SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, > Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW > http://p.sf.net/sfu/solaris-dev2dev > _______________________________________________ > Bacula-users mailing list > Bacula-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bacula-users ------------------------------------------------------------------------------ SOLARIS 10 is the OS for Data Centers - provides features such as DTrace, Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW http://p.sf.net/sfu/solaris-dev2dev _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users