Hi,

I'm running into an occasional problem where a dumper would run w/ 100% CPU
for days doesn't seem to stop.  The only fix I have is to manually kill the
dumper.  I'm running amanda-backup_server-3.3.3-1.rhel5

-------------
Here's the dumper.<DATE>.debug:

Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_getdriver(name=ssh) returns 0x2aca5c8fe520
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_handleinit(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13f96eb0, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: exec: /usr/bin/ssh -x -o
BatchMode=yes -o PreferredAuthentications=publickey -l amandabackup -i
/var/lib/amanda/.ssh/id_rsa
colo1pchef01.company.com/usr/libexec/amanda/amandad -auth=ssh
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: tcpm_send_token: data is
still flowing
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper: got response:
----
CONNECT DATA 499999 MESG 499998 INDEX 499997
OPTIONS features=ffffffff9efefbffffffffff0f;

----

Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13f9ef10, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13fa6f70, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_streaminit(stream=0x13faefd0, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_close(handle=0x13f8e960, driver=0x2aca5c8fe520 (SSH))
Sun Feb 16 23:16:34 2014: thd-0x13f128e0: dumper:
security_stream_close(0x13f96eb0)
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: execute: /usr/bin/gzip
--best
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Building type FILE header
of 32768-32768 bytes with name='colo1pchef01' disk='/var' dumplevel=2 and
blocksize=32768
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: Sending data to
127.0.0.1:3809

Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: make_socket opening
socket with family 2
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connect_port: Try  port
1024: available - Success
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: connected to
127.0.0.1:3809
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: our side is 0.0.0.0:1024
Sun Feb 16 23:16:35 2014: thd-0x13f128e0: dumper: try_socksize: send buffer
size is 65536
Sun Feb 16 23:16:45 2014: thd-0x13f128e0: dumper: stream_read_callback:
data is still flowing
Sun Feb 16 23:16:56 2014: thd-0x13f128e0: dumper: stream_read_callback:
data is still flowing


afterwards, the log line "dumper: stream_read_callback: data is still
flowing" would keep repeating about every 10 seconds.
--------------
Here's the process list, stuck dumper has PID 29924:

[root@backup1:/var/log/amanda/server/CompanyDaily]# ps -ef|grep "^511"
511       1905 29921  0 Feb16 ?        00:00:11 chunker1 CompanyDaily
511       1906 29924  0 Feb16 ?        00:01:21 [ssh] <defunct>
511       1907 29924  0 Feb16 ?        00:00:00 /usr/bin/gzip --best
511       3786 29928  0 Feb16 ?        00:00:00 [ssh] <defunct>
511       3788 29925  0 Feb16 ?        00:00:00 [ssh] <defunct>
511      29882  4670  0 Feb16 ?        00:00:00 crond
511      29886 29882  0 Feb16 ?        00:00:00 /usr/bin/perl
/usr/sbin/amdump CompanyDaily
511      29918 29882  0 Feb16 ?        00:00:00 /usr/sbin/sendmail
-FCronDaemon -i -odi -oem -oi -t
511      29919 29918  0 Feb16 ?        00:00:00 /usr/sbin/postdrop -r
511      29921 29886  0 Feb16 ?        00:00:04 /usr/libexec/amanda/driver
CompanyDaily
511      29922 29921  0 Feb16 ?        00:05:02 /usr/bin/perl
/usr/libexec/amanda/taper CompanyDaily
511      29923 29921  0 Feb16 ?        00:01:09 dumper0 CompanyDaily
511      29924 29921 99 Feb16 ?        2-10:58:00 dumper1 CompanyDaily
511      29925 29921  0 Feb16 ?        00:00:20 dumper2 CompanyDaily
511      29926 29921  0 Feb16 ?        00:04:06 dumper3 CompanyDaily
511      29927 29921  0 Feb16 ?        00:02:27 dumper4 CompanyDaily
511      29928 29921  0 Feb16 ?        00:02:21 dumper5 CompanyDaily

--
heres' the pstree output via pstree -lpa :


  |-crond,4670
  |   `-crond,29882
  |       |-amdump,29886 /usr/sbin/amdump LithiumDaily
  |       |   `-driver,29921 LithiumDaily
  |       |       |-chunker,1905 LithiumDaily
  |       |       |-dumper,29923 LithiumDaily
  |       |       |-dumper,29924 LithiumDaily
  |       |       |   |-gzip,1907 --best
  |       |       |   `-(ssh,1906)
  |       |       |-dumper,29925 LithiumDaily
  |       |       |   `-(ssh,3788)
  |       |       |-dumper,29926 LithiumDaily
  |       |       |-dumper,29927 LithiumDaily
  |       |       |-dumper,29928 LithiumDaily
  |       |       |   `-(ssh,3786)
  |       |       `-taper,29922 /usr/libexec/amanda/taper LithiumDaily
  |       `-sendmail,29918 -FCronDaemon -i -odi -oem -oi -t
  |           `-postdrop,29919 -r

----
Here's the netstat output for those amanda processes:

unix  3      [ ]         STREAM     CONNECTED     223556345 1905/chunker1
unix  3      [ ]         STREAM     CONNECTED     223556344 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529657 29928/dumper5
unix  3      [ ]         STREAM     CONNECTED     223529656 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529655 29927/dumper4
unix  3      [ ]         STREAM     CONNECTED     223529654 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529653 29926/dumper3
unix  3      [ ]         STREAM     CONNECTED     223529652 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529651 29925/dumper2
unix  3      [ ]         STREAM     CONNECTED     223529650 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529649 29924/dumper1
unix  3      [ ]         STREAM     CONNECTED     223529648 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529647 29923/dumper0
unix  3      [ ]         STREAM     CONNECTED     223529646 29921/driver
unix  3      [ ]         STREAM     CONNECTED     223529645 29922/perl
unix  3      [ ]         STREAM     CONNECTED     223529644 29921/driver
unix  2      [ ]         DGRAM                    223529619 29919/postdrop
unix  3      [ ]         STREAM     CONNECTED     223529616 29918/sendmail
unix  3      [ ]         STREAM     CONNECTED     223529615 29919/postdrop
unix  2      [ ]         DGRAM                    223529609 29918/sendmail

--------

Any ideas on why it's stuck?

thx
m

Reply via email to