5.3-STABLE frozen on heavy network load

2004-11-17 Thread Lukas Ertl
Hi,

I'm seeing complete freezes on a 5.3-STABLE SMP (with HTT) kernel from
Fri Nov 12.  The machine is acting as a newsserver, thus it has heavy
network and disk load.

With the help of MP_WATCHDOG I was able to get a backtrace:

Watchdog timer: 2
Watchdog timer: 1
Watchdog timer: 0
Watchdog firing!
NMI ... going to debugger
[thread 100305]
Stopped at  slab_zalloc+0x2c:   setz    %al
db> where
slab_zalloc(c0c1fb00,2,c0c1fb00,c0c1fb00,0) at slab_zalloc+0x2c
uma_zone_slab(c0c1fb00,2) at uma_zone_slab+0xd0
uma_zalloc_internal(c0c1fb00,c39b1100,2,0,c39b1100) at uma_zalloc_internal+0x4d
uma_zalloc_arg(c0c1fb00,c39b1100,2) at uma_zalloc_arg+0x2f8
mb_init_pack(c39b1100,100,2) at mb_init_pack+0x1d
uma_zalloc_internal(c0c1fc60,e93d2c74,2,11a8,0) at uma_zalloc_internal+0xe3
uma_zalloc_arg(c0c1fc60,e93d2c74,2) at uma_zalloc_arg+0x2f8
sosend(c391f510,0,c3f3ad80,0,0) at sosend+0x33d
soo_write(c54caae4,c3f3ad80,c3858c80,0,c40897d0) at soo_write+0x62
writev(c40897d0,e93d2d14,3,c,202) at writev+0xc6
syscall(2f,2f,2f,1,1) at syscall+0x283
Xint0x80_syscall() at Xint0x80_syscall+0x1f
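The countdown above (2, 1, 0, then an NMI into the debugger) follows the usual watchdog pattern. Periodic work that only runs while the system is making progress keeps re-arming a counter. An independent observer counts it down and raises the alarm once it reaches zero. The following is a minimal Python model of that pattern. It assumes a threshold of 3 ticks so that it reproduces the messages in the log. It is a sketch, not the FreeBSD MP_WATCHDOG implementation, and the class and method names are hypothetical.

```python
class Watchdog:
    """Hypothetical countdown watchdog model (not FreeBSD's MP_WATCHDOG code)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.timer = threshold
        self.log = []

    def rearm(self):
        # Done by periodic work that only gets to run while the system is healthy.
        self.timer = self.threshold

    def tick(self):
        # Done once per interval by the independent observer.
        # Returns True when the counter reaches zero and the watchdog fires.
        self.timer -= 1
        self.log.append(f"Watchdog timer: {self.timer}")
        if self.timer <= 0:
            self.log.append("Watchdog firing!")
            return True
        return False


wd = Watchdog(threshold=3)

# Healthy system: the counter is re-armed before every observer tick.
for _ in range(5):
    wd.rearm()
    assert not wd.tick()

# Wedged system: one last re-arm, then nothing re-arms the counter again.
wd.rearm()
wd.log.clear()
while not wd.tick():
    pass
print("\n".join(wd.log))
```

The design only works if the observer keeps running when everything else is stuck. That is what made the backtrace possible here: the NMI caught thread 100305 inside slab_zalloc.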

The kgdb debug log can be found at
http://people.freebsd.org/~le/debug.log.  The coredump and the
kernel are still available if I should send more info.

Thanks,
le
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 5.3-STABLE frozen on heavy network load

2004-11-17 Thread Robert Watson

On Wed, 17 Nov 2004, Lukas Ertl wrote:

> I'm seeing complete freezes on a 5.3-STABLE SMP (with HTT) kernel from
> Fri Nov 12.  The machine is acting as a newsserver, thus it has heavy
> network and disk load.

Do you know if the freeze happens with 5.3-RELEASE as released?

If you set 'debug.mpsafenet=0', do the freezes keep happening? 

What happens if you run with INVARIANTS on?

Is the system too slow with WITNESS to run your workload?  If not, it
might be quite helpful to see information on the locks held, etc., such as
'show locks' for each interesting network-related thread.

Could you send dmesg output? 

Do you have an estimate of how long it takes to go from boot to hang?

> With the help of MP_WATCHDOG I was able to get a backtrace:
> [...]
> kernel are still available if I should send more info.

If/when this recurs, could I get you to run the following commands in DDB,
and send output:

- ps
- show lockedvnods
- show pcpu
- show pcpu X, for each valid value of X (0 ... maxcpus-1)
- do trace on each thread active on a CPU
- do trace on any network device driver ithread, on the netisr, and any
  other thread that appears to be involved in network activity

Using the current core, could you go to frame #29, and print *td,
*td->td_proc, *uio, *active_cred, and *fp.  Go to frame #28 and print *so.
If possible, please keep this dump around; I may also ask you to inspect
*so_pcb once we know what to cast it to (given that it's a news server,
it could well be TCP, in which case *(struct inpcb *)so->so_pcb, as well as
the tcpcb reached through that).

Unfortunately, a complete freeze could be the result of a number of potential
problems in many different areas of the system.  I'm hoping that the ps
and trace output will hint at whether it's caused by the network stack
or some other bit of the system (such as the file system code -- look out
for lots of processes in getblk + lots of locked vnodes).
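The file-system heuristic above (many processes sleeping in getblk, plus many locked vnodes) is easy to check mechanically once the DDB ps output has been saved as text. This is a minimal sketch. It assumes each process line contains its wait message as a separate whitespace-delimited word. The function name and the sample lines are made up for illustration and are not output from this machine.

```python
from collections import Counter


def tally_wait_messages(ps_lines, interesting=("getblk",)):
    """Count how many process lines mention each interesting wait message.

    Hypothetical helper: assumes the wait message is a separate
    whitespace-delimited token in each saved ps line.
    """
    counts = Counter()
    for line in ps_lines:
        tokens = set(line.split())
        for wmesg in interesting:
            if wmesg in tokens:
                counts[wmesg] += 1
    return counts


# Made-up lines in a simplified layout (pid, command, wait message).
sample = [
    "  612  daemon  getblk",
    "  613  daemon  getblk",
    "  700  daemon  sbwait",
    "  701  daemon  getblk",
]
print(tally_wait_messages(sample))  # Counter({'getblk': 3})
```

A large getblk count together with a long 'show lockedvnods' listing would point at the file system. A small one shifts suspicion back toward the network stack.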

Oh, one more thing that would be useful: if you compile with
BREAK_TO_DEBUGGER, are you able to get into the debugger using a console
break or a serial break?  If so, which?  I assume that because you're
using MP_WATCHDOG, you can't, but it's worth asking.  Right now, syscons
requires Giant, so if you can get into the debugger via the serial link
but not syscons, it will suggest something is spinning with Giant.
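The inference in the paragraph above can be summarized as a small decision function. This sketches only that reasoning. It assumes BREAK_TO_DEBUGGER is compiled in and, as the paragraph implies, that syscons needs Giant while the serial break path does not. The function name and the returned hints are hypothetical.

```python
def interpret_break_result(serial_break_works: bool, console_break_works: bool) -> str:
    """Turn the outcome of the two break attempts into a rough hint.

    Hypothetical helper: syscons needs Giant, so a console break that
    fails while a serial break succeeds suggests a thread spinning with
    Giant held.
    """
    if console_break_works:
        # syscons could acquire Giant, so Giant itself is not wedged.
        return "Giant is available; look elsewhere (ps and per-thread traces)"
    if serial_break_works:
        return "suspect a thread spinning with Giant held"
    # Neither path reaches the debugger; an NMI-based watchdog is what remains.
    return "no break path works; fall back to an NMI watchdog such as MP_WATCHDOG"


for serial, console in [(True, False), (True, True), (False, False)]:
    print(serial, console, "->", interpret_break_result(serial, console))
```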

Thanks!

Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
[EMAIL PROTECTED]  Principal Research Scientist, McAfee Research





Re: 5.3-STABLE frozen on heavy network load

2004-11-17 Thread Lukas Ertl
On Wed, 17 Nov 2004 10:22:14 +0000 (GMT), Robert Watson
[EMAIL PROTECTED] wrote:
>
> On Wed, 17 Nov 2004, Lukas Ertl wrote:
>
> > I'm seeing complete freezes on a 5.3-STABLE SMP (with HTT) kernel from
> > Fri Nov 12.  The machine is acting as a newsserver, thus it has heavy
> > network and disk load.
>
> Do you know if the freeze happens with 5.3-RELEASE as released?

No, as I went directly from some 5-CURRENT to RELENG_5.

> If you set 'debug.mpsafenet=0', do the freezes keep happening?
>
> What happens if you run with INVARIANTS on?

I'll check that.

> Is the system too slow with WITNESS to run your workload?

Unfortunately, yes.
>
> Could you send dmesg output?

Can be found at http://people.freebsd.org/~le/newscore.dmesg.

> Do you have an estimate of how long it takes to go from boot to hang?

Somewhere between one or two days and one or two weeks.

> If/when this recurs, could I get you to run the following commands in DDB,
> and send output:
>
> - ps
> - show lockedvnods
> - show pcpu
> - show pcpu X, for each valid value of X (0 ... maxcpus-1)
> - do trace on each thread active on a CPU
> - do trace on any network device driver ithread, on the netisr, and any
>   other thread that appears to be involved in network activity

OK, will do.
>
> Using the current core, could you go to frame #29, and print *td,
> *td->td_proc, *uio, *active_cred, and *fp.  Go to frame #28 and print *so.
> If possible, please keep this dump around; I may also ask you to inspect
> *so_pcb once we know what to cast it to (given that it's a news server,
> it could well be TCP, in which case *(struct inpcb *)so->so_pcb, as well as
> the tcpcb reached through that).

Can be found at http://people.freebsd.org/~le/debug2.log.

> Oh, one more thing that would be useful: if you compile with
> BREAK_TO_DEBUGGER, are you able to get into the debugger using a console
> break or a serial break?  If so, which?  I assume that because you're
> using MP_WATCHDOG, you can't, but it's worth asking.  Right now, syscons
> requires Giant, so if you can get into the debugger via the serial link
> but not syscons, it will suggest something is spinning with Giant.

Unfortunately, I don't have a serial link available. MP_WATCHDOG was
my last resort to get at some info.

Hope that helps,
thanks,
le