long timeouts for very small DLEs
Hi,

Both client and server are at 3.3.0. I now see this all the time on this particular server-client pair: amanda schedules, say, an incremental level 1 (a few MB usually) for a DLE, and when the chunker seems to have done its job, it just sits there doing nothing, sometimes for hours, before it completes. I have DLEs of a few GB that complete faster than a 50MB one... Suffice to say, this completely disrupts the backups. The amanda runs now take more than 2 days to complete (51 DLEs in total) :(

Any help welcomed. Thanks, jf

Here are a few logfiles. I can provide more info if needed.

--- amstatus:

gustav:/raid/dagher3 152m dumping 52m ( 99.97%) (15:23:32)

--- amdump logfile shows:

pondering gustav:/raid/dagher3... next_level0 6 last_level 1
(not due for a full dump, picking an incr level)
gustav /raid/dagher3 pri 2 lev 1 nsize 54160 csize 54160
DUMP gustav 9efefbff0f /raid/dagher3 2027201101 2 1 2011:11:24:1:11:1 54160 54160 143 378 "Skipping: a full is not planned, so can't dump in degraded mode"
driver: send-cmd time 69151.276 to chunker1: PORT-WRITE 00-00070 /holddisk/top/2027201101/gustav._raid_dagher3.1 gustav 9efefbff0f /raid/dagher3 1 2011:11:24:1:11:1 9007199254740896 APPLICATION 54240 |;auth=BSDTCP;index;
driver: send-cmd time 69151.282 to dumper1: PORT-DUMP 00-00070 36699 gustav 9efefbff0f /raid/dagher3 NODEVICE 1 2011:11:24:1:11:1 amgtar "" "" "" "" BSDTCP AMANDA 127.0.0.1:36700 |" BSDTCP\n YES\n YES\n AMANDA\n \n amgtar\n\n ignore\n :_Directory_is_new$\n :_Directory_has_been_renamed\n :_file_changed_as_we_read_it$\n :_socket_ignored$\n :_Cannot_stat:_No_such_file_or_directory$\n \n\n check-device\n NO\n\n \n"
driver: state time 69151.282 free kps: 1023574 space: 3896191960 taper: idle idle-dumpers: 9 qlen tapeq: 1 runq: 16 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 69151.282 if default: free 1023574
driver: hdisk-state time 69151.282 hdisk 0: free 3896191960 dumpers 3

--- chunker debug file: chunker.2028152332.debug

/var/tmp/amanda/server/top/chunker.2028152332.debug:Mon Nov 28 15:23:32 2011: chunker: getcmd: PORT-WRITE 00-00070 /holddisk/top/2027201101/gustav._raid_dagher3.1 gustav 9efefbff0f /raid/dagher3 1 2011:11:24:1:11:1 9007199254740896 APPLICATION 54240 |;auth=BSDTCP;index;
/var/tmp/amanda/server/top/chunker.2028152332.debug:Mon Nov 28 15:23:32 2011: chunker: Building type FILE header of 32768-32768 bytes with name='gustav' disk='/raid/dagher3' dumplevel=1 and blocksize=32768

--- client amandad debug file: amandad.2028152755.debug

[...]
Mon Nov 28 15:27:55 2011: amandad: sending ACK pkt: < >
Mon Nov 28 15:27:55 2011: amandad: tcpm_send_token: data is still flowing
Mon Nov 28 15:27:55 2011: amandad: security_streaminit(stream=0x235cd00, driver=0x7f1440cfe380 (BSDTCP))
Mon Nov 28 15:27:55 2011: amandad: security_streaminit(stream=0x2364d60, driver=0x7f1440cfe380 (BSDTCP))
Mon Nov 28 15:27:55 2011: amandad: security_streaminit(stream=0x236cdc0, driver=0x7f1440cfe380 (BSDTCP))
Mon Nov 28 15:27:55 2011: amandad: sending REP pkt: < CONNECT DATA 49 MESG 48 INDEX 47 OPTIONS features=9efefbff0f; >
Mon Nov 28 15:27:55 2011: amandad: received ACK pkt: < >
Mon Nov 28 15:27:55 2011: amandad: opening security stream for fd 50
Mon Nov 28 15:27:55 2011: amandad: opening security stream for fd 51
Mon Nov 28 15:27:55 2011: amandad: opening security stream for fd 52
Mon Nov 28 15:27:55 2011: amandad: security_close(handle=0x23438c0, driver=0x7f1440cfe380 (BSDTCP))
Mon Nov 28 15:27:55 2011: amandad: security_stream_close(0x2343a90)
Mon Nov 28 15:31:00 2011: amandad: tcpm_send_token: data is still flowing
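
Note the gap of more than three minutes between the 15:27:55 entries and the next "tcpm_send_token: data is still flowing" at 15:31:00; that is where everything just sits. One way to check whether any bytes are actually moving during such a stall is to watch the amandad connection from the server while a small DLE is stuck "dumping". A minimal sketch, assuming the standard amanda port 10080 and an interface named eth0 (both may differ on your setup):

  # With bsdtcp auth the DATA/MESG/INDEX streams are multiplexed over the
  # single TCP connection to the client's amandad, so a stalled dump should
  # show up as a silent connection here. 'eth0' and port 10080 are
  # assumptions; substitute your server's interface and configured port.
  sudo tcpdump -n -i eth0 host gustav and tcp port 10080

If the capture goes completely quiet while amstatus still reports the DLE as dumping, the holdup is likely in the data path or on the client rather than on the holding disk.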
Can anyone explain this log error message?
Our setup:

- AMANDA server: Debian 'Stable' release, 2.6.1p2-3 direct from Debian packages;
- AMANDA clients: various, Debian 'Stable' with the same version as above.

We upgraded from Debian 'oldstable' to 'stable' in the past few months, and since upgrading the client we occasionally see the following on one of our servers, named 'athena', relating to the /home volume, which is currently approximately 1.7TB in size. For example, from this morning:

FAILURE DUMP SUMMARY:
  athena /home lev 0 FAILED "/usr/lib/amanda/runtar exited with status 1: see /var/log/amanda/client/Daily/sendsize.2018234504.debug"

The pertinent part of /var/log/amanda/client/Daily/sendsize.2018234504.debug is:

1321659908.212265: sendsize: waiting for any estimate child: 1 running
1321659908.212332: sendsize: calculating for amname /home, dirname /home, spindle -1 GNUTAR
1321659908.212399: sendsize: getting size via gnutar for /home level 0
1321659908.213588: sendsize: pipespawnv: stdoutfd is 3
1321659908.213715: sendsize: Spawning "/usr/lib/amanda/runtar runtar Daily /bin/tar --create --file /dev/null --numeric-owner --directory /home --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/athena_home_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._home.2018234508.exclude ." in pipeline
1321660065.551101: sendsize: Total bytes written: 1831488286720 (1.7TiB, 11GiB/s)
1321660065.551698: sendsize: .
1321660065.551733: sendsize: estimate time for /home level 0: 157.338
1321660065.551742: sendsize: estimate size for /home level 0: 1788562780 KB
1321660065.551754: sendsize: waiting for runtar "/home" child
1321660065.570903: sendsize: after runtar /home wait
1321660065.592913: sendsize: getting size via gnutar for /home level 4
1321660065.701616: sendsize: pipespawnv: stdoutfd is 3
1321660065.711458: sendsize: Spawning "/usr/lib/amanda/runtar runtar Daily /bin/tar --create --file /dev/null --numeric-owner --directory /home --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/athena_home_4.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._home.2018234745.exclude ." in pipeline
1321660183.361325: sendsize: Total bytes written: 8443156480 (7.9GiB, 69MiB/s)
1321660183.361910: sendsize: .
1321660183.361935: sendsize: estimate time for /home level 4: 117.660
1321660183.361944: sendsize: estimate size for /home level 4: 8245270 KB
1321660183.361952: sendsize: waiting for runtar "/home" child
1321660183.387790: sendsize: after runtar /home wait
1321660183.409511: sendsize: getting size via gnutar for /home level 5
1321660183.513086: sendsize: pipespawnv: stdoutfd is 3
1321660183.513274: sendsize: Spawning "/usr/lib/amanda/runtar runtar Daily /bin/tar --create --file /dev/null --numeric-owner --directory /home --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/athena_home_5.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._home.2018234943.exclude ." in pipeline
1321660292.407967: sendsize: /bin/tar: ./heather/CEU_Work/csrc/bbquaire/mwlv_cron/nutauto.chk: File removed before we read it
1321660298.633046: sendsize: Total bytes written: 4815687680 (4.5GiB, 41MiB/s)
1321660298.633493: sendsize: .
1321660298.633518: sendsize: estimate time for /home level 5: 115.120
1321660298.633526: sendsize: estimate size for /home level 5: 4702820 KB
1321660298.633533: sendsize: waiting for runtar "/home" child
1321660298.659697: sendsize: after runtar /home wait
1321660298.681181: sendsize: errmsg is /usr/lib/amanda/runtar exited with status 1: see /var/log/amanda/client/Daily/sendsize.2018234504.debug
1321660298.681237: sendsize: done with amname /home dirname /home spindle -1

I note that there were changes on the filesystem during the backup process: this is normal (cron jobs etc.) and has previously caused nothing more than a 'STRANGE' note in the report.

I also note that the error relates to the level 5 estimate, but that a level 0 backup was actually carried out, apparently successfully:

athena /home 0 1788562460 532736530 29.8 746:27 11894.8 137:06 64766.4

I believe this is the appropriate runtar.*.debug corresponding to the above, based on the timestamps:

1321660183.516316: runtar: pid 2400 ruid 34 euid 0 version 2.6.1p2: start at Fri Nov 18 23:49:43 2011
1321660183.516378: runtar: version 2.6.1p2
1321660183.519739: runtar: /bin/tar version: tar (GNU tar) 1.23
1321660183.519872: runtar: config: Daily
1321660183.520622: runtar: pid 2400 ruid 0 euid 0 version 2.6.1p2: rename at Fri Nov 18
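
For what it's worth, runtar appears to pass /bin/tar's exit status straight through, and GNU tar documents exit status 1 as the non-fatal "some files differ" case (which covers a file being removed or changed while tar reads it) and status 2 as a fatal error. The "File removed before we read it" message during the level 5 estimate would therefore account for the status-1 failure even though the level 0 dump itself succeeded. A minimal sketch of checking this by hand; the tar invocation is a shortened, illustrative version of the estimate command in the sendsize log above, not Amanda's actual code:

  # GNU tar exit codes: 0 = success, 1 = "some files differ" (non-fatal,
  # e.g. a file removed or changed during the run), 2 = fatal error.
  tar --create --file /dev/null --one-file-system --totals --directory /home .
  case $? in
    0) echo "estimate ran clean" ;;
    1) echo "non-fatal: the filesystem changed while tar was reading it" ;;
    2) echo "fatal tar error" ;;
  esac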