--On Saturday, June 04, 2005 16:08:32 +0200 Thorsten Jungeblut <[EMAIL PROTECTED]> wrote:
> Hi,
>
> since a few weeks, my amdump fails unreproducible.
> After that, sometimes a few zombie-processes remain. eg. gzip or dumper.
> After that, every command i issue (amstatus, amcheck, ...), hangs and keeps
> in "uninterruptible sleep".
> Then system load stays at a very high load (7 or higher) although, actually
> doing nothing.
> The only way to clean up the system, is to reboot, sometimes only hard-reset.

I'm guessing nobody replied because it really isn't an Amanda question.
You are getting a kernel error while trying to update the journal on
some filesystem (ext3 uses a journal, for example). Since the system
is unable to complete the write, all processes trying to access the
filesystem will hang in a 'D' state until the write completes (which it
probably never will).
The problem may be related to the dm-crypt module, or it could be the
actual disk, bad RAM, a bug in the particular kernel version you're
running, or a bad disk controller chip.

Some things I would try (in this order) to see if it goes away:
1. Run memtest86+ on the machine for at least one pass.
2. Update your kernel (2.6.11 is the stable version, or try 2.6.12rc5
   if you like the latest).
3. Convert your encrypted filesystem to a plain one.
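
To confirm which processes are stuck before rebooting, here is a minimal sketch in Python. It is my own illustration, not something from the original report, and it assumes a Linux /proc where the field after the parenthesised command name in /proc/<pid>/stat is the process state ('D' meaning uninterruptible sleep). The function names parse_state and uninterruptible are hypothetical.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file is unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

If this keeps listing dumper, driver, or kjournald, that is consistent with the hung journal write described above. It only reads /proc, so it should not itself block on the affected filesystem.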

Frank

>
> I suppose, it has something to do with the filesystem (crypted, using
> dm-crypt - don't know, if its important):
> Every time amdump fails, i get the following error in /var/spool/messages:
>
> Jun 4 15:15:36 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc dm_mod
> Jun 4 15:15:36 little kernel: CPU: 0
> Jun 4 15:15:36 little kernel: EIP: 0060:[<c01ba74d>] Not tainted VLI
> Jun 4 15:15:36 little kernel: EFLAGS: 00010286 (2.6.11.11)
> Jun 4 15:15:36 little kernel: EIP is at journal_commit_transaction+0x1cd/0xf00
> Jun 4 15:15:36 little kernel: eax: 81910fdd ebx: 922cb828 ecx: 00000000 edx: ec272000
> Jun 4 15:15:36 little kernel: esi: e56c24e0 edi: cd8217ac ebp: 0000000d esp: ec273de4
> Jun 4 15:15:36 little kernel: ds: 007b es: 007b ss: 0068
> Jun 4 15:15:36 little kernel: Process kjournald (pid: 881, threadinfo=ec272000 task=eda82560)
> Jun 4 15:15:36 little kernel: Stack: ec273e5c 00000040 ec273e5c 00001130 d013bf9c ec272000 ec272000 00000000
> Jun 4 15:15:36 little kernel: 00000000 00000000 00000000 cd57523c cd57544c 00001130 00000000 eda82560
> Jun 4 15:15:36 little kernel: c0123ce0 ec273e48 ec273e48 00000001 00000086 00000001 00000000 eda82560
> Jun 4 15:15:36 little kernel: Call Trace:
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c01bd4a1>] kjournald+0xc1/0x1f0
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:15:36 little kernel: [<c01022be>] ret_from_fork+0x6/0x14
> Jun 4 15:15:36 little kernel: [<c01bd3c0>] commit_timeout+0x0/0x10
> Jun 4 15:15:36 little kernel: [<c01bd3e0>] kjournald+0x0/0x1f0
> Jun 4 15:15:36 little kernel: [<c01006b1>] kernel_thread_helper+0x5/0x14
> Jun 4 15:15:36 little kernel: Code: c7 44 24 28 00 00 00 00 31 ed e8 af 25 15 00 8b 46 20 85 c0 74 64 ba 00 e0 ff ff 21 e2 89 54 24 14 89 c7 8b 40 1c 89 46 20 8b 1f <8b> 03 a8 04 0f 84 91 0b 00 00 8b 84 24 8c 01 00 00 89 5c 24 04
> Jun 4 15:18:08 little kernel: <1>Unable to handle kernel paging request at virtual address 6b4a6d16
> Jun 4 15:18:08 little kernel: c01ba26f
> Jun 4 15:18:08 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc dm_mod
> Jun 4 15:18:08 little kernel: CPU: 0
> Jun 4 15:18:08 little kernel: EIP: 0060:[<c01ba26f>] Not tainted VLI
> Jun 4 15:18:08 little kernel: EFLAGS: 00010286 (2.6.11.11)
> Jun 4 15:18:08 little kernel: EIP is at __journal_file_buffer+0x13f/0x230
> Jun 4 15:18:08 little kernel: eax: 6b4a6cfa ebx: d299717c ecx: 00000000 edx: cd8217ac
> Jun 4 15:18:08 little kernel: esi: 00000001 edi: dd9a3780 ebp: c2ee00bc esp: dddefc6c
> Jun 4 15:18:08 little kernel: ds: 007b es: 007b ss: 0068
> Jun 4 15:18:08 little kernel: Process dumper (pid: 3759, threadinfo=dddee000 task=d1331540)
> Jun 4 15:18:08 little kernel: Stack: 00000000 c01bf10e 00001000 00000000 c11bd880 00000000 dd9a3780 c17de6c0
> Jun 4 15:18:08 little kernel: edd852b8 c2ee00bc d299717c c01b94fe d299717c dd9a3760 00000001 00000001
> Jun 4 15:18:08 little kernel: db7f7cb0 00000000 00001000 edd852b8 c2ee00bc 00001000 c01a9b23 edd852b8
> Jun 4 15:18:08 little kernel: Call Trace:
> Jun 4 15:18:08 little kernel: [<c01bf10e>] journal_add_journal_head+0xae/0xc0
> Jun 4 15:18:08 little kernel: [<c01b94fe>] journal_dirty_data+0xee/0x160
> Jun 4 15:18:08 little kernel: [<c01a9b23>] ext3_journal_dirty_data+0x23/0x70
> Jun 4 15:18:08 little kernel: [<c01a9938>] walk_page_buffers+0x68/0x70
> Jun 4 15:18:08 little kernel: [<c01a9c51>] ext3_ordered_commit_write+0x61/0xf0
> Jun 4 15:18:08 little kernel: [<c01a9b00>] ext3_journal_dirty_data+0x0/0x70
> Jun 4 15:18:08 little kernel: [<c012c149>] generic_file_buffered_write+0x229/0x5f0
> Jun 4 15:18:08 little kernel: [<c015eb82>] inode_update_time+0x52/0xe0
> Jun 4 15:18:08 little kernel: [<c012c7dd>] __generic_file_aio_write_nolock+0x2cd/0x500
> Jun 4 15:18:08 little kernel: [<c02929ea>] sock_common_recvmsg+0x5a/0x80
> Jun 4 15:18:08 little kernel: [<c028f525>] sock_aio_read+0xf5/0x110
> Jun 4 15:18:08 little kernel: [<c012ccc2>] generic_file_aio_write+0x72/0xe0
> Jun 4 15:18:08 little kernel: [<c01a73b4>] ext3_file_write+0x44/0xd0
> Jun 4 15:18:08 little kernel: [<c014669e>] do_sync_write+0xbe/0xf0
> Jun 4 15:18:08 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:18:08 little kernel: [<c0158264>] sys_select+0x234/0x4d0
> Jun 4 15:18:08 little kernel: [<c014676f>] vfs_write+0x9f/0x120
> Jun 4 15:18:08 little kernel: [<c01468c1>] sys_write+0x51/0x80
> Jun 4 15:18:08 little kernel: [<c01023af>] syscall_call+0x7/0xb
> Jun 4 15:18:08 little kernel: Code: 21 89 5b 20 89 5b 1c 89 18 89 73 08 8b 44 24 14 85 c0 0f 84 64 ff ff ff 0f ba 6d 00 12 e9 5a ff ff ff 8b 42 20 89 53 1c 89 43 20 <89> 58 1c 89 5a 20 eb d6 ff 47 10 83 c7 1c eb b8 83 c7 24 eb b3
> Jun 4 15:18:08 little kernel: <1>Unable to handle kernel paging request at virtual address b67ee005
> Jun 4 15:18:08 little kernel: c01ba26f
> Jun 4 15:18:08 little kernel: Modules linked in: w83781d i2c_sensor i2c_dev i2c_core aes_i586 dm_crypt sd_mod ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc dm_mod
> Jun 4 15:18:08 little kernel: CPU: 0
> Jun 4 15:18:08 little kernel: EIP: 0060:[<c01ba26f>] Not tainted VLI
> Jun 4 15:18:08 little kernel: EFLAGS: 00010286 (2.6.11.11)
> Jun 4 15:18:08 little kernel: EIP is at __journal_file_buffer+0x13f/0x230
> Jun 4 15:18:08 little kernel: eax: b67edfe9 ebx: d299714c ecx: 00000000 edx: cd8217ac
> Jun 4 15:18:08 little kernel: esi: 00000001 edi: dd9a3780 ebp: cc90323c esp: e32cdc6c
> Jun 4 15:18:08 little kernel: ds: 007b es: 007b ss: 0068
> Jun 4 15:18:08 little kernel: Process driver (pid: 3753, threadinfo=e32cc000 task=dfd8d060)
> Jun 4 15:18:08 little kernel: Stack: c156e1e0 c01bf10e c01a9494 edd852a4 cc2c6a54 00000000 dd9a3780 c17de6c0
> Jun 4 15:18:08 little kernel: edd852a4 cc90323c d299714c c01b94fe d299714c dd9a3760 00000001 00000001
> Jun 4 15:18:08 little kernel: 00001000 00000000 00001000 edd852a4 cc90323c 00001000 c01a9b23 edd852a4
> Jun 4 15:18:08 little kernel: Call Trace:
> Jun 4 15:18:08 little kernel: [<c01bf10e>] journal_add_journal_head+0xae/0xc0
> Jun 4 15:18:08 little kernel: [<c01a9494>] ext3_get_block+0x54/0xa0
> Jun 4 15:18:08 little kernel: [<c01b94fe>] journal_dirty_data+0xee/0x160
> Jun 4 15:18:08 little kernel: [<c01a9b23>] ext3_journal_dirty_data+0x23/0x70
> Jun 4 15:18:08 little kernel: [<c01a9938>] walk_page_buffers+0x68/0x70
> Jun 4 15:18:08 little kernel: [<c01a9c51>] ext3_ordered_commit_write+0x61/0xf0
> Jun 4 15:18:08 little kernel: [<c01a9b00>] ext3_journal_dirty_data+0x0/0x70
> Jun 4 15:18:08 little kernel: [<c012c149>] generic_file_buffered_write+0x229/0x5f0
> Jun 4 15:18:08 little kernel: [<c015ebe3>] inode_update_time+0xb3/0xe0
> Jun 4 15:18:08 little kernel: [<c012c7dd>] __generic_file_aio_write_nolock+0x2cd/0x500
> Jun 4 15:18:08 little kernel: [<c012af5e>] __generic_file_aio_read+0x1be/0x1f0
> Jun 4 15:18:08 little kernel: [<c012ccc2>] generic_file_aio_write+0x72/0xe0
> Jun 4 15:18:08 little kernel: [<c0152e90>] do_lookup+0x30/0xb0
> Jun 4 15:18:08 little kernel: [<c01a73b4>] ext3_file_write+0x44/0xd0
> Jun 4 15:18:08 little kernel: [<c014669e>] do_sync_write+0xbe/0xf0
> Jun 4 15:18:08 little kernel: [<c01540f9>] may_open+0x59/0x1e0
> Jun 4 15:18:08 little kernel: [<c0154325>] open_namei+0xa5/0x5c0
> Jun 4 15:18:08 little kernel: [<c0145a8e>] dentry_open+0xce/0x180
> Jun 4 15:18:08 little kernel: [<c0123ce0>] autoremove_wake_function+0x0/0x60
> Jun 4 15:18:08 little kernel: [<c014676f>] vfs_write+0x9f/0x120
> Jun 4 15:18:08 little kernel: [<c01468c1>] sys_write+0x51/0x80
> Jun 4 15:18:08 little kernel: [<c01023af>] syscall_call+0x7/0xb
> Jun 4 15:18:08 little kernel: Code: 21 89 5b 20 89 5b 1c 89 18 89 73 08 8b 44 24 14 85 c0 0f 84 64 ff ff ff 0f ba 6d 00 12 e9 5a ff ff ff 8b 42 20 89 53 1c 89 43 20 <89> 58 1c 89 5a 20 eb d6 ff 47 10 83 c7 1c eb b8 83 c7 24 eb b3
>
>
>
> I'm using Debian-testing,
>
> little:~# uname -a
> Linux little 2.6.11.11 #1 Fri Jun 3 13:25:57 CEST 2005 i686 GNU/Linux
>
> build: VERSION="Amanda-2.4.4p3"
> BUILT_DATE="Wed Aug 18 13:06:52 MDT 2004"
> BUILT_MACH="Linux rover 2.6.7 #1 Fri Jul 23 21:53:49 MDT 2004 i686 GNU/Linux
>
>
>
> Does anyone has an idea, whats going wrong here?
>
> Tnx for help
> Thorsten

--
Frank Smith                                [EMAIL PROTECTED]
Sr. Systems Administrator                  Voice: 512-374-4673
Hoover's Online                            Fax: 512-374-4501