Re: Hardware compression and dump size problems

2007-01-04 Thread Joel Coltoff

Toomas Aas wrote:
BTW, I'm using a DLT1 drive (40 GB native) with hardware compression. My 
typical backup run is ca 40 GB, but after the new year, when Amanda 
autoflushed all the christmas-time dumps from the holding disk, it hit 
EOT at approximately 52 GB. Your data is different from mine, of course, 
but I do think that being able to tape this 53128750 kB dump with 40 GB 
tape drive and hardware compression cannot be taken for granted.




For whatever value it has, here are the reports from our system from 2006.
We don't hit EOT all that often on our DLT-4000. You can see there is
quite a range of dumpsizes there. Most are below your size and I wouldn't
bet on it fitting. If I had to guess I'd say about half of this usage is
ascii.

  taper: tape Daily002 kb 50196096 fm 33 writing file: No space left on device
  taper: tape Daily001 kb 55124512 fm 36 writing file: No space left on device
  taper: tape Daily009 kb 50036448 fm 51 writing file: No space left on device
  taper: tape Daily002 kb 51181408 fm 49 writing file: No space left on device
  taper: tape Daily008 kb 50634176 fm 48 writing file: No space left on device
  taper: tape Daily001 kb 49143424 fm 20 writing file: No space left on device
  taper: tape Daily007 kb 45569632 fm 46 writing file: No space left on device
  taper: tape Daily010 kb 61835808 fm 43 writing file: No space left on device
  taper: tape Daily001 kb 48237696 fm 50 writing file: No space left on device
  taper: tape Daily005 kb 57189056 fm 42 writing file: No space left on device
  taper: tape Daily007 kb 61271168 fm 50 writing file: No space left on device
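
To set these figures next to the 53128750 kB dump from the quoted message, here is a small Python sketch. It is not part of Amanda, and the helper name kb_written is made up for illustration. It pulls the kb figure out of each taper line above and counts how many tapes accepted at least that much before reporting EOT.

```python
import re

# The taper lines from the report above, copied verbatim.
TAPER_LINES = """\
taper: tape Daily002 kb 50196096 fm 33 writing file: No space left on device
taper: tape Daily001 kb 55124512 fm 36 writing file: No space left on device
taper: tape Daily009 kb 50036448 fm 51 writing file: No space left on device
taper: tape Daily002 kb 51181408 fm 49 writing file: No space left on device
taper: tape Daily008 kb 50634176 fm 48 writing file: No space left on device
taper: tape Daily001 kb 49143424 fm 20 writing file: No space left on device
taper: tape Daily007 kb 45569632 fm 46 writing file: No space left on device
taper: tape Daily010 kb 61835808 fm 43 writing file: No space left on device
taper: tape Daily001 kb 48237696 fm 50 writing file: No space left on device
taper: tape Daily005 kb 57189056 fm 42 writing file: No space left on device
taper: tape Daily007 kb 61271168 fm 50 writing file: No space left on device
"""

DUMP_KB = 53128750  # size of the dump discussed in the quoted message


def kb_written(lines):
    """Extract the 'kb N' figure from each taper EOT line (hypothetical helper)."""
    return [int(m.group(1)) for m in re.finditer(r"\bkb (\d+)\b", lines)]


sizes = kb_written(TAPER_LINES)
big_enough = [kb for kb in sizes if kb >= DUMP_KB]

print(f"tapes: {len(sizes)}, smallest: {min(sizes)} kB, largest: {max(sizes)} kB")
print(f"tapes that took at least {DUMP_KB} kB: {len(big_enough)} of {len(sizes)}")
```

On these numbers, 4 of the 11 tapes got past 53128750 kB before hitting EOT, which matches "most are below your size".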

- Joel





Re: odd behavior after adding DLE

2006-12-22 Thread Joel Coltoff

Gene Heskett wrote:

Generally speaking, when amanda sets something up to do, it uses a set of 
defaults derived from previous stanzas.  My not too well educated guess 
is that if none have been established, the really empty options list is 
the problem.


I'm not convinced that the use of two '/services' strings in the disklist 
is 100% kosher either.  From perusing mine, the first string is the alias 
or the FQDN of the machine to be backed up.


That was one idea I had, but even with a symbolic name, Services, it had
problems. The tradeoff for me is matching up names on the printed reports
with files on the systems. I tend not to use a symbolic name (the second field
in my case) if I don't have to. I simplified the DLE to look like

cluster1-node1.wmi.com /services global 1 local

and had the same problem. Since this relies on only 1 dumptype entry there
shouldn't be an ordering problem. The dumptypes come after all the other
options.
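
For readers following the field discussion, here is a rough Python sketch of how a one-line disklist entry breaks into fields. It assumes the documented field order hostname, diskname, optional diskdevice, dumptype, optional spindle, optional interface. The function split_dle and its known_dumptypes argument are hypothetical; passing in the dumptype names is a simplification used here to decide whether the third token is a diskdevice, and the sketch does not handle the inline { } form.

```python
def split_dle(line, known_dumptypes):
    """Split a one-line disklist entry into named fields (illustrative only).

    Assumed order: hostname diskname [diskdevice] dumptype [spindle [interface]]
    """
    tokens = line.split()
    if len(tokens) < 3:
        raise ValueError("a DLE needs at least hostname, diskname and dumptype")

    entry = {"hostname": tokens[0], "diskname": tokens[1], "diskdevice": None}
    rest = tokens[2:]

    # If the third token is not a known dumptype, treat it as the diskdevice.
    if rest[0] not in known_dumptypes:
        entry["diskdevice"] = rest[0]
        rest = rest[1:]
    if not rest:
        raise ValueError("missing dumptype")

    entry["dumptype"] = rest[0]
    entry["spindle"] = int(rest[1]) if len(rest) > 1 else None
    entry["interface"] = rest[2] if len(rest) > 2 else None
    return entry


dumptypes = {"global", "system-files"}
simplified = split_dle("cluster1-node1.wmi.com /services global 1 local", dumptypes)
print(simplified)
```

Under this reading the simplified entry has no diskdevice at all, so the second '/services' string is out of the picture for that test.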

Where the use of {} is brought into play, see this stanza in your disklist 
if you just added to the default disklist:


What do you mean by the default disklist?

So by using '/services' twice, technically I believe it's correct, but I'd 
be wary of unwanted interactions just to be my usual somewhat paranoid 
self.  But I think the real error is in the order of the dumptype defines 
in your amanda.conf.


I moved the dumptype define for this DLE to the bottom of amanda.conf and
the problem was still there. When I added strategy skip to the dumptype
define the problem went away. I don't claim this proves my config files
are correct but I feel that's the hint I am getting. Also, if there were
an issue with the order of entries in amanda.conf then moving the DLE in
the disklist file shouldn't make a difference.

I do have more data points. The backup ran ok last night with that DLE
at the bottom of the disklist file. I also added a duplicate DLE using a
cname for the host. Both completed without problems. There are also
other entries that were disabled with strategy skip. If I remove this
line from the dumptype I see the same problem.

I didn't expect these issues when we moved to our new servers. Then again,
I should know better. I've been down this road before and there is always
a surprise waiting for you.

- Joel



Re: odd behavior after adding DLE

2006-12-21 Thread Joel Coltoff

More info on my problem.

There is a positional dependency on where the DLE goes.
I can't just make it the second DLE. It's probably pointless
to determine where it needs to go. The problem is somewhere
else.

If I change the hostname of the DLE from the server host to
another host in the network there is no problem.

The server host is running drbd and lvm.

Output of df:

   *
   *
/dev/drbd0 7052464 65884   6628336   1% /services
   *

Perhaps drbd is getting in the way.

I'll see what happens tonight when the backup runs. I've added
the DLE a second time but used a cname we have for the host.

- Joel




odd behavior after adding DLE

2006-12-21 Thread Joel Coltoff

Hi,

I added a new DLE today

cluster1-node1.wmi.com /services /services {
system-files
} 1 local

Along with the dumptype in amanda.conf (a work in progress)

define dumptype global {
comment Global definitions
index yes
}
define dumptype system-files {
program GNUTAR
global
comment High priority
priority high
}
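
As a way to see what the system-files dumptype ends up containing, here is a simplified Python model of dumptype inheritance. It is not Amanda's parser. It assumes that a bare dumptype name inside a definition pulls in that dumptype's settings at that point, and that later settings override earlier ones. The function resolve and the data layout are made up for illustration.

```python
def resolve(name, dumptypes):
    """Flatten a dumptype into its effective settings (simplified model)."""
    effective = {}
    for item in dumptypes[name]:
        if isinstance(item, str):
            # A bare name includes another dumptype's settings at this point.
            effective.update(resolve(item, dumptypes))
        else:
            key, value = item
            effective[key] = value  # later settings win
    return effective


# The two definitions from amanda.conf above, in their original order.
DUMPTYPES = {
    "global": [("comment", "Global definitions"), ("index", "yes")],
    "system-files": [
        ("program", "GNUTAR"),
        "global",
        ("comment", "High priority"),
        ("priority", "high"),
    ],
}

effective = resolve("system-files", DUMPTYPES)
print(effective)
```

In this model the comment from global is replaced by "High priority", and index yes is inherited unchanged.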

The DLE was right at the top of the disklist file.

If I run amadmin Daily disklist I get some of the
expected output plus the following:

*** glibc detected *** double free or corruption (fasttop): 0x09af6698 ***
Abort

If I move the new DLE elsewhere in the file I see no hint that there
is a problem.

amanda version: 2.5.1p1

uname -a:
Linux cluster1-node1.wmi.com 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006
i686 i686 i386 GNU/Linux

gcc -v:
  gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)

Any ideas what is causing this? Is there something that gets missed
when the DLE moves?

I've looked a bit at the forums list at zmanda and others have seen this when
doing dumps. I'm guessing all we do here is some parsing and perhaps some
high-level checking, i.e. it shouldn't matter which version of tar I use (1.13.25).

Thanks,
 - Joel





Question on bandwidth limits

2006-10-27 Thread Joel Coltoff

A recent thread on Loong backups got me to look at my configuration. It
always seemed to me that it should run a bit faster than it does. I wasn't too
concerned given that it started at 9:00 PM and was done by 2:00 AM. It never
got in the way of things. We are moving to a much larger server and I'd like
to resolve this for the flood of new files we'll have. We are running 2.4.4p2.

I've been trying different numbers in my interface setup and they don't seem
to have any effect. This is what I have in my amanda.conf file. Most of the
DLEs are on the host that runs amanda. I don't have a setting for netusage
so I get the default. When the dump is done amstatus reports:

network free kps: 10700

   define interface local {
   comment a local disk
   use 1 kbps
   }

   define interface ethernet {
   comment 10 Mbps ethernet
   use 400 kbps
  }

My disklist looks like this

phoenix.wmi.com /export/cdburn   project-files   2   local
phoenix.wmi.com /export/cac      project-files   2   local
phoenix.wmi.com /export/opt      project-files   2   local
phoenix.wmi.com /export/project  project-files   2   local

goliath.wmi.com /users /users {
   user-files
   exclude ./jc517/lightvr/*/*.o
   exclude append ./jc517/lightvr/*/*.bz2
} 1 ethernet

goliath.wmi.com /export /export {
   user-files
   include  ./wumpus ./plover ./uclibc ./vendor
} 1 ethernet

Finally, here is the tail of amstatus

network free kps:  6700
holding space   :  10119040k ( 99.92%)
dumper0 busy   :  4:20:38  ( 94.20%)
dumper1 busy   :  0:02:16  (  0.82%)
  taper busy   :  3:31:28  ( 76.43%)
0 dumpers busy :  0:14:44  (  5.33%)  no-diskspace:  0:14:44  (100.00%)
1 dumper busy  :  4:19:40  ( 93.85%)  no-bandwidth:  2:45:21  ( 63.68%)
                                          not-idle:  1:34:19  ( 36.32%)
2 dumpers busy :  0:02:16  (  0.82%)  no-bandwidth:  0:02:16  (100.00%)
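
The reason percentages in this output read as fractions of the "N dumpers busy" line they belong to, not of the whole run. A short Python check of the "1 dumper busy" line, using only the times shown above (the helper name seconds is made up):

```python
def seconds(hms):
    """Convert an H:MM:SS string from amstatus into seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


one_dumper_busy = seconds("4:19:40")
no_bandwidth = seconds("2:45:21")
not_idle = seconds("1:34:19")

no_bandwidth_pct = f"{100 * no_bandwidth / one_dumper_busy:.2f}"
not_idle_pct = f"{100 * not_idle / one_dumper_busy:.2f}"

print(f"no-bandwidth: {no_bandwidth_pct}% of the time exactly 1 dumper was busy")
print(f"not-idle:     {not_idle_pct}% of the time exactly 1 dumper was busy")
```

The two reasons add up to the full 4:19:40, so for nearly two thirds of the time that only one dumper was running, the reason recorded for not starting another was no-bandwidth.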


If I run amstatus I'll see no-bandwidth associated with 1 dumper busy more
often than not. What's a reasonable number to use so that I have more than
1 dumper running at a time? I guess the real question is: should a single
dump saturate connections to the localhost?

Thanks
 - Joel




Re: sudden backup failure - solved

2004-03-10 Thread Joel Coltoff
On Wed, 10 Mar 2004, Geoff Swavley wrote:

 sounds like your .amandahosts file has been fiddled with or is missing
 (failing that the user amanda .rhosts file).


The problem was unrelated to any amanda files. I needed to restart
xinetd. There was no service listening on the amanda udp port. I've
no idea how this happened.
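
A quick way to spot this kind of failure is to send a datagram to the port and see whether the OS reports a refusal, which is what the "recvfrom() failed: Connection refused" lines in the amcheck debug file from the original report were showing. The Python sketch below is an illustration, not an Amanda tool. The function name is made up, and port 10080 is an assumption based on the conventional amanda entry in /etc/services, so check your own setup. A timeout only means nothing was refused; it does not prove amandad answered.

```python
import socket


def udp_port_refused(host, port, timeout=1.0):
    """Return True if a UDP probe to host:port is actively refused.

    A refusal (ICMP port unreachable) suggests nothing is listening.
    A timeout returns False: something may be listening, or the
    refusal may have been filtered along the way.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # lets the OS report errors on this socket
        try:
            sock.send(b"probe")
            sock.recv(1024)
        except ConnectionRefusedError:
            return True
        except socket.timeout:
            return False
        return False


if __name__ == "__main__":
    # 10080 is assumed to be the amanda UDP port; adjust if yours differs.
    if udp_port_refused("localhost", 10080):
        print("refused: nothing seems to be listening (is xinetd running?)")
    else:
        print("no refusal seen")
```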

This, on the other hand, is strictly an amanda issue.

% amadmin Daily due
   *
Overdue 12481 days: cac0.wmi.com:/export/home/ag
   *
%

I suspect the two problems are related. The last dump that ran had
problems on that DLE

  driver: cac0.wmi.com /export/home/ag 0 [dump to tape failed, will try again]

After that nothing on that system worked.
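
One possible reading of the 12481-day figure, offered as a guess and not a diagnosis: counting that many days back from the date of this message lands in the first week of January 1970, close to the Unix epoch. That would be consistent with Amanda having no usable date for the last full dump of that DLE. The Python sketch below just does the date arithmetic; it assumes amadmin was run on the day the message was sent.

```python
from datetime import date, timedelta

report_date = date(2004, 3, 10)  # assumption: amadmin was run on this day
overdue_days = 12481             # from the amadmin output above

due_date = report_date - timedelta(days=overdue_days)
days_since_epoch = (report_date - date(1970, 1, 1)).days

print(f"implied due date: {due_date}")
print(f"days from 1970-01-01 to {report_date}: {days_since_epoch}")
```

The gap of a few days between the two figures would fit a due date of epoch plus a short dump cycle, but the dump cycle in use is not stated here.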

-- 
Joel Coltoff

How often I found where I should be going only by setting out for
somewhere else.
-- R. Buckminster Fuller



sudden backup failure

2004-03-09 Thread Joel Coltoff
Hi,

My backups have been working fine for quite some time. In
the past few days I've had failures and don't know where to
start looking. If I run amcheck -s Daily the output looks
ok. If I run it without the '-s' this is my amcheck debug file.

amcheck: debug 1 pid 3709 ruid 200 euid 0: start at Tue Mar  9 15:19:58 2004
amcheck: dgram_bind: socket bound to 0.0.0.0.875
amcheck-clients: dgram_recv: recvfrom() failed: Connection refused
amcheck-clients: time 0.072: no feature set from host dpburn2.wmi.com
amcheck-clients: time 0.117: no feature set from host dpburn3.wmi.com
amcheck-clients: time 0.205: no feature set from host wmi0.wmi.com
amcheck-clients: time 0.258: no feature set from host yoda.wmi.com
changer: got exit: 0 str: 71 99 1
changer_query: changer return was 99 1
changer_query: searchable = 0
changer_find: looking for Daily003 changer is searchable = 0
changer: got exit: 0 str: 71 /dev/nst0
amcheck-clients: dgram_recv: recvfrom() failed: Connection refused
amcheck-clients: dgram_recv: recvfrom() failed: Connection refused
amcheck: pid 3709 finish time Tue Mar  9 15:20:28 2004

I still get incomplete dumps on the local host. If you need more
info I can post it. Any hints/insights/pointers would be greatly
appreciated.

Thanks.

-- 
Joel Coltoff

I was gratified to be able to answer promptly, and I did. I said I
didn't know. -- Mark Twain



going from 1 tape to 2

2004-01-30 Thread Joel Coltoff
Hi,

We've gotten to the point where Amanda now needs multiple tapes.
Our other systems crossed that point years ago. I've split the big
filesystem into two DLEs that I'll dump with tar. I've fixed the
amanda.conf file to account for all the new tapes and set tpchanger
to chg-manual. We are running 2.4.4p2.

I'm asking this question because I hate surprises. The existing tapes
are Daily001 ... Daily013. I've got 13 blank tapes to add to the pool.
I assume I need to label these myself. I'd like each dump to use
consecutive tapes (Daily001  Daily002). I doubt this will happen. What
should I expect to see? In general how will I know which is the next
tape it wants? Do I get this from the report of the previous dump or
does the changer tell me which tape to insert? Are there any surprises
in store for me? I'm hoping this is a quick and easy change but such
a beast is rare.
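
For the labeling step, a small Python sketch that prints one amlabel command per new tape. It assumes the 13 new tapes simply continue the existing numbering (Daily014 through Daily026) and that the configuration is named Daily; both are assumptions, and the helper new_labels is made up. It only prints the commands, it does not run them.

```python
def new_labels(prefix, first, count, width=3):
    """Build zero-padded tape labels such as Daily014 (hypothetical helper)."""
    return [f"{prefix}{n:0{width}d}" for n in range(first, first + count)]


# Assumption: existing tapes are Daily001..Daily013, so new ones start at 14.
labels = new_labels("Daily", 14, 13)

for label in labels:
    # Run each command by hand with the matching blank tape loaded.
    print(f"amlabel Daily {label}")
```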

Thanks.

-- 
Joel Coltoff

It is appallingly obvious that our technology exceeds our humanity.
-- Albert E. Einstein