Re: [2.6.21.1] totally hanging altough not crashed

Folkert van Heusden Fri, 20 Jul 2007 14:04:55 -0700

> > One of my systems running 2.6.21.1 on a P4 with HT and 2GB of ram
> > occasionally crashes. Not with an oops or panic, it just suddenly stops
> > doing anything. It still lets you ping and it still forwards ip traffic
> > but you can't login: not via ssh and not on the console - pressing enter
> > brings the cursor to the next line and that's it. It is NOT swapping at
> > all, in fact alt+sysrq+m says that there's still plenty of memory and
> > swap available:
> 
> Lots of processes are in uninterruptible wait at
> start_this_handle+0x208/0x365. Can you find out what line of code
> that matches? (There are a few of those waits in that function.)


I did some more investigating and found:

[EMAIL PROTECTED]:~$ grep -A 1 "Call Trace" www/ast.txt | grep -v -e "^--" -v 
-e "Call Trace:" | cut -d " "  -f 2- | genstats | more
    1   358     54.82%     1.75  [<c120f326>] schedule_timeout+0x8c/0x8e
    2    88     13.48%     7.42  [<c10c23a0>] start_this_handle+0x208/0x365
    3    63      9.65%    10.10  [<c120f2e0>] schedule_timeout+0x46/0x8e
    4    36      5.51%    17.69  [<c1020868>] do_wait+0x2c4/0x396
    5    36      5.51%    16.81  [<c1098cbd>] inotify_read+0x8d/0x1b5
    6    17      2.60%    37.59  [<c1074f52>] pipe_wait+0x8a/0xab
    7    13      1.99%     2.69  [<c102d22e>] worker_thread+0x130/0x165
    8    11      1.68%    55.73  [<c1210307>] do_nanosleep+0x42/0x70
    9     7      1.07%    92.29  [<c120f5e9>] __mutex_lock_slowpath+0xac/0x28c
   10     7      1.07%    90.71  [<c101f9fc>] do_exit+0x253/0x428
   11     2      0.31%   145.50  [<c1138101>] write_chan+0x15b/0x1d6
   12     2      0.31%     3.50  [<c104de2a>] watchdog+0x47/0x55
   13     2      0.31%     3.00  [<c10224f6>] ksoftirqd+0x85/0x98
   14     2      0.31%     2.50  [<c1018e44>] migration_thread+0x8a/0x10f
   15     1      0.15%   617.00  [<c10c7d56>] log_wait_commit+0xb5/0x121
   16     1      0.15%   562.00  [<c1029e46>] sys_pause+0x14/0x1b
   17     1      0.15%   225.00  [<f8a58ae1>] schluffen+0xad/0xaf [zaptel]
   18     1      0.15%    63.00  [<c1029efc>] sys_rt_sigsuspend+0xaf/0xcf
   19     1      0.15%    23.00  [<c10c75ff>] kjournald+0x1fd/0x207
   20     1      0.15%    22.00  [<c10c4b6f>] 
journal_commit_transaction+0x242/0xcd3
   21     1      0.15%    17.00  [<c105ab1b>] kswapd+0xf7/0x10b
   22     1      0.15%    16.00  [<c1191647>] serio_thread+0xfa/0xff
   23     1      0.15%    15.00  [<c11730e7>] hub_thread+0xe6/0xe8

(please ignore the fourth column)

As I'm not entirely sure that this an innocent to be in for a kernel I
put this point into gdb:

(gdb) p schedule_timeout
$1 = {long int (long int)} 0xc121a8b4 <schedule_timeout>
(gdb) l *0xC120F3B2
0xc120f3b2 is in __xfrm_state_insert (net/xfrm/xfrm_hash.h:49).
44      static inline unsigned __xfrm_src_hash(xfrm_address_t *daddr,
45                                             xfrm_address_t *saddr,
46                                             unsigned short family,
47                                             unsigned int hmask)
48      {
49              unsigned int h = family;
50              switch (family) {
51              case AF_INET:
52                      h ^= __xfrm4_daddr_saddr_hash(daddr, saddr);
53                      break;

I very much hope it is of any help.


Folkert van Heusden

-- 
www.vanheusden.com/multitail - win een vlaai van multivlaai! zorg
ervoor dat multitail opgenomen wordt in Fedora Core, AIX, Solaris of
HP/UX en win een vlaai naar keuze
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [2.6.21.1] totally hanging altough not crashed

Reply via email to