Re: y didn't amanda report this as an error?

2003-09-24 Thread Jon LaBadie
On Wed, Sep 24, 2003 at 01:54:49PM -0500, Deb Baddorf wrote:
> From a client  machine,  the admin sent me this:
> 
> Sep 24 02:45:32 daesrv /kernel: pid 7638 (gzip), uid 2: exited on signal 11 
> (core dumped)
> 
> The above message shows gzip crashed on daesrv last night.  It crashed
> because there is a hardware problem on that machine, but since it was
> probably part of an amanda backup that did not work as expected, I wanted
> to be sure amanda had reported something about it to you.   -client admin
> 
> Amanda herself had reported a strange error in her mail report:
> 
> daesrv.fna /usr lev 0 STRANGE
> .
> | DUMP: 33.76% done, finished in 1:20
> ? sendbackup: index tee cannot write [Broken pipe]

Note the problem was in making the index, not the backup.

> | DUMP: Broken pipe
> | DUMP: The ENTIRE dump is aborted.
> ? index returned 1
> ??error [/sbin/dump returned 3, compress got signal 11]? dumper: strange 
> [missing size line from sendbackup]
> ? dumper: strange [missing end line from sendbackup]
> \
> 
> 
> But it appears that she went ahead and stored the partial data on tape
> anyway,   and considered this a good level 0 backup.   (admin config due
> shows the next level 0 is 7 days away)
> 
> daesrv.fnal.gov /usr 0 0 3605024 -- 47:40 1260.7 12:35 4773.9
> 
> Why doesn't amanda recognize this as a failure?
> Am I missing something that I should have noticed?
> Or am I reading it wrong (the fact that "due" implies a level 0 was done)?

Did your report show it was "taped"?  If so, I suspect the backup is ok,
but using amrecover with the index will be suspect/problematic.

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road    (609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Re: y didn't amanda report this as an error?

2003-09-24 Thread Deb Baddorf
At 03:36 PM 9/24/2003 -0400, Jon LaBadie wrote:
> On Wed, Sep 24, 2003 at 01:54:49PM -0500, Deb Baddorf wrote:
> From a client  machine,  the admin sent me this:
>
> Sep 24 02:45:32 daesrv /kernel: pid 7638 (gzip), uid 2: exited on signal 11
> (core dumped)
>
> The above message shows gzip crashed on daesrv last night.  It crashed
> because there is a hardware problem on that machine, but since it was
> probably part of an amanda backup that did not work as expected, I wanted
> to be sure amanda had reported something about it to you.   -client admin
>
> Amanda herself had reported a strange error in her mail report:
>
> daesrv.fna /usr lev 0 STRANGE
> .
> | DUMP: 33.76% done, finished in 1:20
> ? sendbackup: index tee cannot write [Broken pipe]

> Note the problem was in making the index, not the backup.

Well, but the client was doing its own compressing.  So when the
gzip process failed, the whole backup failed, at only 33% finished.
I just did a test amrestore (true, amrecover wouldn't touch it) and
got about 1/3 of the data that ought to be on that disk.
So I think it really did fail, but it was registered as a successful
level 0 backup.  :-(
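
To illustrate the point above (a compressor that dies mid-stream leaves a
truncated image, so every stage of the pipeline has to be checked), here is
a minimal Python sketch.  It is not Amanda's code; the function names
run_pipeline, describe and pipeline_ok are made up for the example, and the
filter output is discarded where a real backup would write it to a file.

```python
import signal
import subprocess
import sys


def run_pipeline(producer_cmd, filter_cmd):
    """Run `producer_cmd | filter_cmd` and return both exit statuses."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    # Output is thrown away here; a real backup would send it to a file.
    filt = subprocess.Popen(filter_cmd, stdin=producer.stdout,
                            stdout=subprocess.DEVNULL)
    # Close our copy of the pipe so the producer gets a broken pipe
    # if the filter dies early, instead of blocking forever.
    producer.stdout.close()
    filter_status = filt.wait()
    producer_status = producer.wait()
    return producer_status, filter_status


def describe(status):
    """Turn a Popen return code into text; negative means killed by a signal."""
    if status < 0:
        return "got signal %d (%s)" % (-status, signal.Signals(-status).name)
    return "returned %d" % status


def pipeline_ok(statuses):
    """Only trust the output if every stage exited with status 0."""
    return all(status == 0 for status in statuses)
```

With a real dump program and gzip as the two commands, a compressor killed
by signal 11 would show up as a filter status of -11 on a POSIX system, and
pipeline_ok would return False no matter how much data was already written.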

> | DUMP: Broken pipe
> | DUMP: The ENTIRE dump is aborted.
> ? index returned 1
> ??error [/sbin/dump returned 3, compress got signal 11]? dumper: strange
> [missing size line from sendbackup]
> ? dumper: strange [missing end line from sendbackup]
> \
>
>
> But it appears that she went ahead and stored the partial data on tape
> anyway,   and considered this a good level 0 backup.   (admin config due
> shows the next level 0 is 7 days away)
>
> daesrv.fnal.gov /usr 0 0 3605024 -- 47:40 1260.7 12:35 4773.9
>
> Why doesn't amanda recognize this as a failure?
> Am I missing something that I should have noticed?
> Or am I reading it wrong (the fact that "due" implies a level 0 was done)?
> Did your report show it was "taped"?  If so, I suspect the backup is ok,
> but using amrecover with the index will be suspect/problematic.



Re: y didn't amanda report this as an error?

2003-09-24 Thread Jon LaBadie
On Wed, Sep 24, 2003 at 05:30:59PM -0500, Deb Baddorf wrote:
> At 03:36 PM 9/24/2003 -0400, Jon LaBadie wrote:
> >On Wed, Sep 24, 2003 at 01:54:49PM -0500, Deb Baddorf wrote:
> >> From a client  machine,  the admin sent me this:
> >>
> >> Sep 24 02:45:32 daesrv /kernel: pid 7638 (gzip), uid 2: exited on signal 11
> >> (core dumped)
> >>
> >> The above message shows gzip crashed on daesrv last night.  It crashed
> >> because there is a hardware problem on that machine, but since it was
> >> probably part of an amanda backup that did not work as expected, I wanted
> >> to be sure amanda had reported something about it to you.   -client admin
> >>
> >> Amanda herself had reported a strange error in her mail report:
> >>
> >> daesrv.fna /usr lev 0 STRANGE
> >> .
> >> | DUMP: 33.76% done, finished in 1:20
> >> ? sendbackup: index tee cannot write [Broken pipe]
> >
> >Note the problem was in making the index, not the backup.
> 
> Well, but the client was doing its own compressing.  So when the
> gzip process failed, the whole backup failed, at only 33% finished.
> I just did a test amrestore (true, amrecover wouldn't touch it) and
> got about 1/3 of the data that ought to be on that disk.
> So I think it really did fail, but it was registered as a successful
> level 0 backup.  :-(

Certainly sounds like a situation that amanda should not have
recorded as a valid level 0.  Has anyone else noted this?
There have been several reports showing failed pipes in
the index stream for various reasons.  I wonder if those also
were reported as valid dumps.
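
For anyone wanting to check old reports for the same symptom, a rough Python
sketch follows.  It assumes only the status wording quoted in this thread
("PROGRAM returned N" and "PROGRAM got signal N"); the name failed_stages is
made up for the example, and the pattern is not a full parser for Amanda's
report format.

```python
import re

# Matches "<program> returned <N>" and "<program> got signal <N>".
# The program name stops at whitespace, brackets and commas so that the
# surrounding "[...]" in the report line is not swallowed.
STATUS_RE = re.compile(r"([^\s\[\],]+) (got signal|returned) (\d+)")


def failed_stages(report_line):
    """Return (program, kind, number) for each stage that did not exit 0."""
    failures = []
    for program, kind, number in STATUS_RE.findall(report_line):
        # Any signal is a failure; a "returned" status fails if non-zero.
        if kind == "got signal" or int(number) != 0:
            failures.append((program, kind, int(number)))
    return failures
```

Any line for which failed_stages returns a non-empty list is worth comparing
against what "amadmin CONFIG due" says about the same disk.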

-- 
Jon H. LaBadie  [EMAIL PROTECTED]
 JG Computing
 4455 Province Line Road    (609) 252-0159
 Princeton, NJ  08540-4322  (609) 683-7220 (fax)


Re: y didn't amanda report this as an error?

2003-09-24 Thread Phil Homewood
Jon LaBadie wrote:
> There have been several reports showing failed pipes in
> the index stream for various reasons.  I wonder if those also
> were reported as valid dumps.

Now that you mention it, I have had this, a couple of times in the
last week. Am still trying to debug it, but:

sendbackup: start [hammer:/home level 0]
sendbackup: info BACKUP=/bin/tar
sendbackup: info RECOVER_CMD=/bin/gzip -dc |/bin/tar -f... -
sendbackup: info COMPRESS_SUFFIX=.gz
sendbackup: info end
? sendbackup: index tee cannot write [Broken pipe]  
? index returned 1
??error [/bin/tar got signal 13, compress got signal 11]? dumper: strange [missing size line from sendbackup]
? dumper: strange [missing end line from sendbackup] 

[...]

hammer   /home   0   0   1568   --   0:08 184.4   0:04 442.5

The "listed incremental dir" shows:

-rw-------   1 backup   backup  0 Sep 23 23:20 hammer_home_0.new

The filesystem in question is some 13 GB. Apparently the compress
process is SEGVing, but I'm not seeing a core anywhere. Amanda
version is 2.4.4-2 (Debian package), server and client are the
same machine. Not seeing this on any other boxes, and I have
another Debian box with a very similar configuration working
fine.
-- 
Phil Homewood, Systems Janitor, http://www.SnapGear.com
[EMAIL PROTECTED] Ph: +61 7 3435 2810 Fx: +61 7 3891 3630
SnapGear - Custom Embedded Solutions and Security Appliances


Re: y didn't amanda report this as an error?

2003-09-25 Thread Jean-Louis Martineau
Hi Deb,

Which release of amanda are you using?

amanda-2.4.4p1 will report a failed dump for this kind of error and
reschedule a level 0 for the next day.

That was fixed on 2003-04-26.

Jean-Louis

-- 
Jean-Louis Martineau email: [EMAIL PROTECTED] 
Departement IRO, Universite de Montreal
C.P. 6128, Succ. CENTRE-VILLE    Tel: (514) 343-6111 ext. 3529
Montreal, Canada, H3C 3J7        Fax: (514) 343-5834


Re: y didn't amanda report this as an error?

2003-09-25 Thread Deb Baddorf
At 09:39 AM 9/25/2003 -0400, Jean-Louis Martineau wrote:
> Hi Deb,
>
> Which release of amanda are you using?

The server is running Amanda-2.4.3 on FreeBSD 4.7-RELEASE-p3.
The client is running Amanda-2.4.3b4 on FreeBSD 4.8-RELEASE i386.

> amanda-2.4.4p1 will report a failed dump for this kind of error and
> reschedule a level 0 for the next day.

"amadmin CONFIG due NODE DISK" indicated that the next level 0 was not
due for another 7 days.  (I forced one.)

> That was fixed on 2003-04-26.
>
> Jean-Louis

---
Deb Baddorf [EMAIL PROTECTED]  840-2289
"Nobody told me that living happily ever after would be such hard work ..."
S. White




Re: y didn't amanda report this as an error?

2003-09-25 Thread Phil Homewood
Phil Homewood wrote:
> Now that you mention it, I have had this, a couple of times in the
> last week. Am still trying to debug it, but:
> 
> ??error [/bin/tar got signal 13, compress got signal 11]? dumper: strange [missing size line from sendbackup]

Turns out this also appears to be bad hardware, in case anyone's
collecting responses.

> hammer   /home   0   0   1568   --   0:08 184.4   0:04 442.5

[left in to show that amanda still considers this a successful dump]
-- 
Phil Homewood, Systems Janitor, http://www.SnapGear.com
[EMAIL PROTECTED] Ph: +61 7 3435 2810 Fx: +61 7 3891 3630
SnapGear - Custom Embedded Solutions and Security Appliances


Re: y didn't amanda report this as an error?

2003-09-28 Thread Sven Rudolph
Jon LaBadie <[EMAIL PROTECTED]> writes:

> On Wed, Sep 24, 2003 at 01:54:49PM -0500, Deb Baddorf wrote:
> > From a client  machine,  the admin sent me this:
> > 
> > Sep 24 02:45:32 daesrv /kernel: pid 7638 (gzip), uid 2: exited on signal 11 
> > (core dumped)
> > 
> > The above message shows gzip crashed on daesrv last night.  It crashed
> > because there is a hardware problem on that machine, but since it was
> > probably part of an amanda backup that did not work as expected, I wanted
> > to be sure amanda had reported something about it to you.   -client admin
> > 
> > Amanda herself had reported a strange error in her mail report:
> > 
> > daesrv.fna /usr lev 0 STRANGE
> > .
> > | DUMP: 33.76% done, finished in 1:20
> > ? sendbackup: index tee cannot write [Broken pipe]
> 
> Note the problem was in making the index, not the backup.

But this is not related to gzip. The gzip for the index always runs
on the server.

Instead this sounds like /tmp being full, probably on the client.
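
A quick way to test the "/tmp full" theory is to look at the free space on
the filesystem in question.  The Python sketch below assumes nothing about
Amanda itself; the names free_fraction and is_nearly_full and the 5%
threshold are arbitrary choices for the example.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_nearly_full(path, min_free_fraction=0.05):
    """True if less than `min_free_fraction` of the filesystem is free."""
    return free_fraction(path) < min_free_fraction
```

Running is_nearly_full("/tmp") on the client shortly after a failed dump
would show whether space is still tight, though a filesystem that filled up
during the run may have been cleaned out again by then.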

> > | DUMP: Broken pipe
> > | DUMP: The ENTIRE dump is aborted.
> > ? index returned 1
> > ??error [/sbin/dump returned 3, compress got signal 11]? dumper: strange 
> > [missing size line from sendbackup]
> > ? dumper: strange [missing end line from sendbackup]
> > \

And this is a result of the failed gzip, as already mentioned.

Sven