Re: Problem with amandad?

2004-11-25 Thread Eric Siegerman
On Thu, Nov 25, 2004 at 06:39:58PM +0100, Sören Edzen wrote:
> amanda  6163  0.0  0.3  2296  844 ?  Ss   18:29   0:00 amandad
> amanda  6165  0.0  0.0     0    0 ?  Z    18:29   0:00 [amandad] <defunct>
>                                                                   ___^___
> 
> Does <defunct> mean that there's something wrong with amandad, or
> could there be some other problem?

<defunct> means that the process has exited, but its parent
process hasn't yet collected its exit status.  Such "undead"
processes are called "zombies".  See:
http://en.wikipedia.org/wiki/Zombie_process
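The behaviour described above is easy to reproduce. Below is a minimal Python sketch. It is Linux-only because it reads /proc, and `proc_state` and `zombie_demo` are illustrative names that are not part of Amanda. The sketch forks a child that exits immediately and observes it in state Z while the parent has not yet called waitpid(). It then checks that the entry disappears once the exit status has been collected. In the amandad case, the process that should be collecting the status is the zombie's parent. The ps listing above does not show parent PIDs, but `ps -o pid,ppid,stat,args` would.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The line looks like "pid (comm) state ..."; comm may itself contain
    # spaces or parentheses, so take the character after the LAST ") ".
    return data[data.rindex(")") + 2]


def zombie_demo(exit_code=3, timeout=5.0):
    """Fork a child that exits at once, observe it as a zombie, then reap it.

    Returns (state_before_reaping, child_exit_status, gone_after_reaping).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately

    # Parent: deliberately do NOT call waitpid() yet, so the dead child
    # stays in the process table in state 'Z' (shown by ps as <defunct>).
    deadline = time.monotonic() + timeout
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    state_before = proc_state(pid)

    # Collecting the exit status is what lets the kernel discard the entry.
    _, status = os.waitpid(pid, 0)
    gone = not os.path.exists(f"/proc/{pid}")
    return state_before, os.WEXITSTATUS(status), gone


if __name__ == "__main__":
    print(zombie_demo())
```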

--

|  | /\
|-_|/  >   Eric Siegerman, Toronto, Ont.        [EMAIL PROTECTED]
|  |  /
The animal that coils in a circle is the serpent; that's why so
many cults and myths of the serpent exist, because it's hard to
represent the return of the sun by the coiling of a hippopotamus.
- Umberto Eco, "Foucault's Pendulum"


Re: Problems with HP SureStore VS80e (DLT1)

2004-11-25 Thread Andreas Haumer

Hi!

Many thanks for your reports!
It's interesting to see that those drives seem to die quite often.
I didn't expect that!

Martin Hepworth wrote:
>
> Dave Ewart wrote:
>
[...]
>>
>> Hmmm - curious.  My notes show that our drive keeled over in May (not 10
>> months ago, as I said above), although it did survive the hot summer of
>> 2003 :-)
>>
>> Perhaps I should move ours into the air-conditioned room ...
>>
>> Dave.
>> --
>
>
> Dave
>
> May 10 WAS the summer :-)
>
> More seriously, England was thundery according to the BBC, but I can't find a
> specific diary for Oxford.
>
> Anyway, I just noticed they tended to die on hot days rather than cool
> ones. We've not had any trouble since they've all been in rooms where the
> temperature is below 25C and fairly constant.
>
Just noticed that both of you guys live in England.
Now I don't know what your definition of "summer" is... ;-)

I don't think that temperature is a problem in our case.
While the office where the drive is located has no air
conditioning, it is generally quite cool (a typical
late-19th-century Viennese building).

Anyway, in the afternoon I had a call with an HP support
guy. He asked me to run HP's "ltt" software against
the drive to get more diagnostics. But alas, it seems the
drive is not responding anymore. Replacement time, again...

The HP support guy also told me that the active SCSI bus
termination on the VS80e drives is very sensitive and is
one of the main reasons these drives die. Thunderstorms
and electrical induction on the SCSI bus might have a negative
influence. I'm not sure whether the symptoms I see support this
theory, though.

- andreas

--
Andreas Haumer | mailto:[EMAIL PROTECTED]
*x Software + Systeme  | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71



Re: Problem with amandad?

2004-11-25 Thread Martin Hepworth
Hi,
Have a look in the /tmp/amanda directory on the client and see if any of
the debug files there give you any further information.
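Below is a minimal Python sketch of that check. It lists the most recently modified debug files and prints the end of each one. The directory comes from the advice above. The `*.debug` pattern, which would match files such as amandad.*.debug or selfcheck.*.debug, is an assumption about how this Amanda version names its client debug logs. Adjust the pattern if the names differ.

```python
import glob
import os


def newest_debug_files(debug_dir="/tmp/amanda", pattern="*.debug", count=5):
    """Return up to `count` paths in `debug_dir` matching `pattern`, newest first."""
    paths = glob.glob(os.path.join(debug_dir, pattern))
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths[:count]


def tail(path, lines=20):
    """Return the last `lines` lines of a text file (undecodable bytes replaced)."""
    with open(path, errors="replace") as f:
        return f.readlines()[-lines:]


if __name__ == "__main__":
    # Print the end of each recent debug file, like running `tail` on each one.
    for path in newest_debug_files():
        print(f"==> {path} <==")
        print("".join(tail(path)), end="")
```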

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
Sören Edzen wrote:
> Hi!
> Amanda had been working for me for a few weeks when today amcheck reported
> this error with amandad:
> [snip]
> Amanda Backup Client Hosts Check
>
> WARNING: zampo: selfcheck reply timed out.
> Client check: 1 host checked in 90.220 seconds, 1 problem found
> (brought to you by Amanda 2.4.4p4)
> -
> [/snip]
> When I checked for amandad with ps, I got the following:
> amanda  6163  0.0  0.3  2296  844 ?  Ss   18:29   0:00 amandad
> amanda  6165  0.0  0.0     0    0 ?  Z    18:29   0:00 [amandad] <defunct>
>                                                                   ___^___
> Does <defunct> mean that there's something wrong with amandad, or
> could there be some other problem?

**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.
This footnote confirms that this email message has been swept
for the presence of computer viruses and is believed to be clean.
**


Problem with amandad?

2004-11-25 Thread Sören Edzen
Hi!
Amanda had been working for me for a few weeks when today amcheck reported
this error with amandad:

[snip]
Amanda Backup Client Hosts Check

WARNING: zampo: selfcheck reply timed out.
Client check: 1 host checked in 90.220 seconds, 1 problem found

(brought to you by Amanda 2.4.4p4)
-
[/snip]

When I checked for amandad with ps, I got the following:

amanda  6163  0.0  0.3  2296  844 ?  Ss   18:29   0:00 amandad
amanda  6165  0.0  0.0     0    0 ?  Z    18:29   0:00 [amandad] <defunct>
                                                                  ___^___

Does <defunct> mean that there's something wrong with amandad, or
could there be some other problem?

-- 
Sören Edzen, Sjöfartsgatan 22A, 97437 Luleå, Sweden
Phone: 0920-255133, Cell: 070-6531975
mailto: [EMAIL PROTECTED]


Re: Problems with HP SureStore VS80e (DLT1)

2004-11-25 Thread Martin Hepworth
Dave Ewart wrote:
> On Thursday, 25.11.2004 at 13:57 +, Martin Hepworth wrote:
>
>>>> The situation quickly became worse and at some point the tape drive
>>>> refused to eject the tape cartridge. We called HP support and
>>>> they replaced the drive without problems. With the new drive, the
>>>> nightly backup again started to work fine.
>>>>
>>>> After about 2 weeks, the same errors occurred again. The nightly
>>>> backup runs began to fail up to the point where the drive refused to
>>>> eject the cartridge. So, HP replaced the drive again and we started
>>>> to use our third drive in about 7 months of usage.
>>>
>>> Interesting.  Same thing happened to us with a Tandberg DLT1 vs80
>>> drive, which I believe is fundamentally the same as the HP model.
>>>
>>> Same symptoms as the above, but we've only had one 'dead' drive.  I
>>> assumed it was Just One Of Those Things.  The first drive lasted for
>>> about 12 months and the second has been going fine for about 10
>>> months so far.
>>>
>>> Dave.
>>> --
>>> Dave Ewart
>>
>> And here - it seems to be heat related. Of the drives that have died
>> (2x DLT1s and 1x VS80, all HP), all died on hot days in rooms without
>> temperature control.
>
> Hmmm - curious.  My notes show that our drive keeled over in May (not 10
> months ago, as I said above), although it did survive the hot summer of
> 2003 :-)
>
> Perhaps I should move ours into the air-conditioned room ...
>
> Dave.
> --
Dave

May 10 WAS the summer :-)

More seriously, England was thundery according to the BBC, but I can't find a
specific diary for Oxford.

Anyway, I just noticed they tended to die on hot days rather than cool
ones. We've not had any trouble since they've all been in rooms where the
temperature is below 25C and fairly constant.

Only a few months to go before all your kit gets a nice home anyway:
temperature controlled, smooth AC power!

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Re: Problems with HP SureStore VS80e (DLT1)

2004-11-25 Thread Dave Ewart

On Thursday, 25.11.2004 at 13:57 +, Martin Hepworth wrote:

> >>The situation quickly became worse and at some point the tape drive
> >>refused to eject the tape cartridge. We called HP support and
> >>they replaced the drive without problems. With the new drive, the
> >>nightly backup again started to work fine.
> >>
> >>After about 2 weeks, the same errors occurred again. The nightly
> >>backup runs began to fail up to the point where the drive refused to
> >>eject the cartridge. So, HP replaced the drive again and we started
> >>to use our third drive in about 7 months of usage.
> >
> >Interesting.  Same thing happened to us with a Tandberg DLT1 vs80
> >drive, which I believe is fundamentally the same as the HP model.
> >
> >Same symptoms as the above, but we've only had one 'dead' drive.  I
> >assumed it was Just One Of Those Things.  The first drive lasted for
> >about 12 months and the second has been going fine for about 10
> >months so far.
> >
> >Dave.
> >
> >-- 
> >Dave Ewart
> 
> And here - it seems to be heat related. Of the drives that have died
> (2x DLT1s and 1x VS80, all HP), all died on hot days in rooms without
> temperature control.

Hmmm - curious.  My notes show that our drive keeled over in May (not 10
months ago, as I said above), although it did survive the hot summer of
2003 :-)

Perhaps I should move ours into the air-conditioned room ...

Dave.
-- 
Dave Ewart
[EMAIL PROTECTED]
Computing Manager, Epidemiology Unit, Oxford
Cancer Research UK
PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370



Re: can't compile amanda solaris

2004-11-25 Thread Paul Bijnens
Nico Bouthoorn wrote:
> Excuse me, maybe this is a FAQ, but I couldn't find it. I'm trying to
> compile amanda 2.4.4p4 on a Solaris 8 box with gcc 3.4.2.
> I've configured it with: ./configure --with-user=amanda --with-group=disk
> The compile stops at:
>
> bash-2.05# make
> [...SNIP...]
> false cru .libs/libamanda.a  alloc.o amflock.o clock.o debug.o dgram.o 
> [...SNIP...]
> make: *** [all-recursive] Error 1

The command "false" should have been "/usr/ccs/bin/ar".
Make sure you have /usr/ccs/bin in your PATH before running configure.
Or maybe a saved config from a different architecture was lying around
and ./configure kept those hints.  Run "make distclean", followed by
"./configure --with..." again.
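Below is a small pre-flight sketch in Python. The function names are hypothetical. Before re-running ./configure, it checks the two things suggested above. First, it checks whether `ar` can be found with /usr/ccs/bin searched first. Second, it checks whether a cached configure result is still lying around. Treating config.cache as the file that carries such hints is an assumption. It is where autoconf keeps cached answers when caching is enabled.

```python
import os
import shutil


def find_ar(path=None, extra_dirs=("/usr/ccs/bin",)):
    """Return the full path of `ar`, searching `extra_dirs` before `path`.

    `path` defaults to $PATH; searching /usr/ccs/bin first mimics putting it
    at the front of PATH before running ./configure on Solaris.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    dirs = [d for d in list(extra_dirs) + path.split(os.pathsep) if d]
    return shutil.which("ar", path=os.pathsep.join(dirs))


def stale_config_cache(srcdir):
    """True if a leftover config.cache (cached configure answers) is present."""
    return os.path.exists(os.path.join(srcdir, "config.cache"))


if __name__ == "__main__":
    ar = find_ar()
    print("ar:", ar or "NOT FOUND - configure may substitute 'false'")
    if stale_config_cache("."):
        print("config.cache present - consider 'make distclean' before ./configure")
```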
--
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/                  email:  [EMAIL PROTECTED]
***
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...*
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out  *
***



Re: Problems with HP SureStore VS80e (DLT1)

2004-11-25 Thread Martin Hepworth
Dave Ewart wrote:
> On Thursday, 25.11.2004 at 14:22 +0100, Andreas Haumer wrote:
>
>> [...]
>> The situation quickly became worse and at some point the tape drive
>> refused to eject the tape cartridge. We called HP support and they
>> replaced the drive without problems. With the new drive, the nightly
>> backup again started to work fine.
>>
>> After about 2 weeks, the same errors occurred again. The nightly backup
>> runs began to fail up to the point where the drive refused to eject
>> the cartridge. So, HP replaced the drive again and we started to use
>> our third drive in about 7 months of usage.
>> [...]
>
> Interesting.  Same thing happened to us with a Tandberg DLT1 vs80 drive,
> which I believe is fundamentally the same as the HP model.
>
> Same symptoms as the above, but we've only had one 'dead' drive.  I
> assumed it was Just One Of Those Things.  The first drive lasted for
> about 12 months and the second has been going fine for about 10 months
> so far.
>
> Dave.
> --
> Dave Ewart

And here - it seems to be heat related. Of the drives that have died
(2x DLT1s and 1x VS80, all HP), all died on hot days in rooms without
temperature control.

--
Martin Hepworth
Senior Systems Administrator
Solid State Logic Ltd
tel: +44 (0)1865 842300


Re: Problems with HP SureStore VS80e (DLT1)

2004-11-25 Thread Dave Ewart

On Thursday, 25.11.2004 at 14:22 +0100, Andreas Haumer wrote:

> [...]
> 
> The situation quickly became worse and at some point the tape drive
> refused to eject the tape cartridge. We called HP support and they
> replaced the drive without problems. With the new drive, the nightly
> backup again started to work fine.
> 
> After about 2 weeks, the same errors occurred again. The nightly backup
> runs began to fail up to the point where the drive refused to eject
> the cartridge. So, HP replaced the drive again and we started to use
> our third drive in about 7 months of usage.
> 
> [...]

Interesting.  Same thing happened to us with a Tandberg DLT1 vs80 drive,
which I believe is fundamentally the same as the HP model.

Same symptoms as the above, but we've only had one 'dead' drive.  I
assumed it was Just One Of Those Things.  The first drive lasted for
about 12 months and the second has been going fine for about 10 months
so far.

Dave.

-- 
Dave Ewart
[EMAIL PROTECTED]
Computing Manager, Epidemiology Unit, Oxford
Cancer Research UK
PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370



can't compile amanda solaris

2004-11-25 Thread Nico Bouthoorn
Hi All,
Excuse me, maybe this is a FAQ, but I couldn't find it. I'm trying to compile
amanda 2.4.4p4 on a Solaris 8 box with gcc 3.4.2.
I've configured it with: ./configure --with-user=amanda --with-group=disk
The compile stops at:

bash-2.05# make
Making all in config
make[1]: Entering directory `/var/tmp/amanda-2.4.4p4/config'
make  all-am
make[2]: Entering directory `/var/tmp/amanda-2.4.4p4/config'
make[2]: Leaving directory `/var/tmp/amanda-2.4.4p4/config'
make[1]: Leaving directory `/var/tmp/amanda-2.4.4p4/config'
Making all in common-src
make[1]: Entering directory `/var/tmp/amanda-2.4.4p4/common-src'
/bin/bash ../libtool --mode=link gcc  -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 
-g -O2 -o libamanda.la -rpath /usr/local/lib -release 2.4.4p4 alloc.lo 
amflock.lo clock.lo debug.lo dgram.lo error.lo file.lo fileheader.lo 
amfeatures.lo match.lo protocol.lo regcomp.lo regerror.lo regexec.lo regfree.lo 
security.lo statfs.lo stream.lo token.lo util.lo versuff.lo version.lo 
pipespawn.lo sl.lo  -lgen -lm -ltermcap -lsocket -lnsl -lintl
false cru .libs/libamanda.a  alloc.o amflock.o clock.o debug.o dgram.o error.o 
file.o fileheader.o amfeatures.o match.o protocol.o regcomp.o regerror.o 
regexec.o regfree.o security.o statfs.o stream.o token.o util.o versuff.o 
version.o pipespawn.o sl.o
make[1]: *** [libamanda.la] Error 1
make[1]: Leaving directory `/var/tmp/amanda-2.4.4p4/common-src'
make: *** [all-recursive] Error 1

Anyone?
Thanks,
Nico


Problems with HP SureStore VS80e (DLT1)

2004-11-25 Thread Andreas Haumer

Hi!

I have nasty problems with an HP SureStore VS80e DLT1 drive
that Amanda uses as its backup tape drive, and I'm
running out of ideas...

The situation: in March 2004 I installed a new HP
SureStore VS80e DLT1 backup drive to be used by Amanda-2.4.4p2
to back up a Linux file and mail server on a regular basis.
We decided to do full backups on every run, so each run
writes about 40-50GB of data to tape.

The tape drive is connected to the Linux fileserver.
The fileserver has a dual U320 Fusion-MPT SCSI controller:

02:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)
02:04.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)

On the first channel there are four Seagate ST336607LC drives
in a software RAID5 configuration; the DLT1 drive is connected
to the second SCSI channel and is the only device there. The
drive uses its built-in active termination to terminate the
external SCSI bus.

The software RAID5 delivers a throughput of about 70MB/s on
sequential writes and about 90MB/s on sequential reads. The system
has run rock solid since March 2004 and delivers good performance
as a fileserver (1GB of RAM, single Intel Xeon 2.4GHz CPU with HT).

I tried to optimize the tape throughput by increasing the buffer size,
so I set the buffer of the Linux st driver to 512k using the
following option in /etc/modules.conf:

options st buffer_kbs=512

This should ensure that the SCSI tape driver has enough buffer
space to store the data before it gets transferred to the drive.

I use the following Amanda tapetype definition to get a tape
block size of 128kB. The HP docs say that the tape block size
should be at least 64kB, so I decided to use 128kB (I compiled
Amanda myself with the "--with-maxtapeblocksize=1024"
configure option).

define tapetype DLT1 {
    comment "DLT1 VS80"
    length 80 gbytes
    filemark 1 byte
    speed 4 mbytes
    blocksize 128 kbytes
}

In amanda.conf I set tapebufs to 64 to have Amanda allocate
a buffer for the "taper" of about 2MB, or about
16 tape blocks.
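Here is a quick check of that arithmetic as a Python sketch. The 32 kB per buffer is not stated anywhere above. It is simply what "64 buffers, about 2MB" implies, so treat it as an assumption.

```python
def taper_buffer(tapebufs, buf_kb=32, blocksize_kb=128):
    """Return (total taper buffer in kB, number of whole tape blocks it holds).

    buf_kb=32 is an assumption: it is the per-buffer size implied by
    "64 buffers ~ 2MB", not a value read from Amanda itself.
    """
    total_kb = tapebufs * buf_kb
    return total_kb, total_kb // blocksize_kb


total_kb, blocks = taper_buffer(64)
print(f"{total_kb} kB (~{total_kb // 1024} MB) = {blocks} blocks of 128 kB")
```

With tapebufs set to 64, this prints "2048 kB (~2 MB) = 16 blocks of 128 kB", which matches the figures above.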

We have about 20 DLT1 cartridges which get rotated manually
and we insert the cleaning cartridge every Friday. No cartridge
is older than 9 months.

We use the drive's hardware compression, and depending
on the file types, about 50GB seem to fit on a DLT1 tape
(40GB native).

We also have a holding disk on the RAID5 array which is used
by Amanda to store the backups of the mail- and fileserver
before they get dumped to tape.

We currently have 18 DLEs in our disklist, and the average
throughput for the "taper" is about 3.5 MB/s, with a minimum
of about 2.9MB/s and a maximum of about 5MB/s, depending on
the size and type of the DLE. I think these are typical values
for a mixture of typical files (e-mail, Word and Excel documents,
PDF files, JPEG images, user profiles, etc.).
We have DDS4 drives at other installations which show the same
values, and we also have LTO drives which show a throughput of
about 8-10MB/s. According to the drives' specifications,
this is what I'd expect.
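As a cross-check, the average taper rate can be recomputed from the amstatus figures for the typical run shown below: 45892830k taped, with the taper busy for 3:39:41. The Python sketch below assumes Amanda's size units are binary (1 gbyte = 1024 * 1024 kB). Under that assumption, it also reproduces the 54.71% that amstatus reports for tape 1. This suggests that the percentage is measured against the 80 gbytes `length` in the tapetype rather than against the cartridge's real capacity.

```python
def hms_to_seconds(hms):
    """Convert an amstatus-style 'H:MM:SS' duration to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def taper_rate_kbps(taped_kb, taper_busy):
    """Average taper throughput in kB/s over the time the taper was busy."""
    return taped_kb / hms_to_seconds(taper_busy)


def tape_fill_percent(taped_kb, length_gb):
    """Percentage of a tapetype `length` used, assuming 1 gbyte = 1024**2 kB."""
    return 100 * taped_kb / (length_gb * 1024 * 1024)


rate = taper_rate_kbps(45892830, "3:39:41")
print(f"{rate:.0f} kB/s, about {rate / 1024:.2f} MB/s")
print(f"tape 1: {tape_fill_percent(45892830, 80):.2f}%")
```

With these numbers, it prints 3482 kB/s (about 3.40 MB/s), which is in line with the "about 3.5 MB/s" quoted above, and a tape fill of 54.71%.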

"amstatus" for a typical backup run shows the following:

SUMMARY          part       real  estimated
                            size       size
partition       :  18
estimated       :  18             45960630k
flush           :   0         0k
failed          :   0                    0k           (  0.00%)
wait for dumping:   0                    0k           (  0.00%)
dumping to tape :   0                    0k           (  0.00%)
dumping         :   0         0k         0k (  0.00%) (  0.00%)
dumped          :  18  45892830k  45960630k ( 99.85%) ( 99.85%)
wait for writing:   0         0k         0k (  0.00%) (  0.00%)
wait to flush   :   0         0k         0k (100.00%) (  0.00%)
writing to tape :   0         0k         0k (  0.00%) (  0.00%)
failed to tape  :   0         0k         0k (  0.00%) (  0.00%)
taped           :  18  45892830k  45960630k ( 99.85%) ( 99.85%)
  tape 1        :  18  45892830k  45960630k ( 54.71%) MO1
10 dumpers idle : not-idle
taper idle
network free kps:   215040
holding space   :  47440472k (100.00%)
 dumper0 busy   :  1:09:35  ( 29.51%)
 dumper1 busy   :  0:33:52  ( 14.36%)
 dumper2 busy   :  0:04:25  (  1.87%)
 dumper3 busy   :  0:00:41  (  0.29%)
 dumper4 busy   :  0:00:32  (  0.23%)
 dumper5 busy   :  0:02:42  (  1.14%)
 dumper6 busy   :  0:03:13  (  1.37%)
   taper busy   :  3:39:41  ( 93.15%)
 0 dumpers busy :  2:46:13  ( 70.48%)            not-idle:  2:46:13  (100.00%)
 1 dumper busy  :  0:35:43  ( 15.15%)  client-constrained:  0:32:22  ( 90.66%)
                                               start-wait:  0:01:59  (  5.60%)
                                                 not-idle:  0:01:20  (  3.75%)
 2 dumpers busy :  0:29:28  ( 12.50%)  client-constrained:  0:28:43  ( 97.46%)
                                               start-wait:  0:00:44  (  2.54%)
 3 dumpers busy :  0:00:41