2.4.4: Kernel crash, possibly tcp related

2001-04-29 Ralf Nyren


Greetings,

A possibly TCP-related bug causes a kernel crash, and it can be triggered
by an unprivileged user.

Kernel 2.4.4, no patches applied.

The problem appeared when performing some network-performance tests with a
program called tcpblast. tcpblast has an option to set its "block size".
The block size is the size of the buffer passed to the write function.
The problem appears when this value is set to 40481 or higher, for example:
$ tcpblast -d0 -s 40481 another_host 9000
With this block size, the following message is printed repeatedly:
tcp/udpblast send:: No such file or directory
Trying the same command with a 2.2.18 kernel gave:
tcp/udpblast send:: Bad address
The first part is from tcpblast; the second is printed via perror.
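
The double colon comes from perror(3): it prints its argument, then ": ", then the message for the current errno, so a prefix that already ends in a colon gives "send::". The sketch below is hypothetical and not taken from tcpblast's source. It assumes the program writes blocks of `blocksize` bytes with write(2) and reports failures with perror("tcp/udpblast send:"); the names `blast` and `perror_text` are illustrative only.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes into `out` the text perror(prefix) would print for error `err`
 * (without the trailing newline): the prefix, ": ", then strerror(err). */
static void perror_text(char *out, size_t n, const char *prefix, int err)
{
    snprintf(out, n, "%s: %s", prefix, strerror(err));
}

/* Hypothetical sketch, not tcpblast's source: write `count` blocks of
 * `blocksize` bytes to `fd`, continuing after short writes. On error it
 * reports via perror("tcp/udpblast send:"), which produces the
 * "send::" double colon. Returns total bytes written, or -1 on error. */
static long blast(int fd, size_t blocksize, int count)
{
    char *buf = calloc(1, blocksize);   /* contents do not matter */
    long total = 0;

    if (buf == NULL)
        return -1;
    for (int i = 0; i < count; i++) {
        size_t off = 0;
        while (off < blocksize) {
            ssize_t n = write(fd, buf + off, blocksize - off);
            if (n < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted: retry the write */
                perror("tcp/udpblast send:");
                free(buf);
                return -1;
            }
            off += (size_t)n;
        }
        total += (long)blocksize;
    }
    free(buf);
    return total;
}
```

For instance, blast(sock, 40481, 1000) would send 1000 blocks of the size used in the command above, where `sock` is assumed to be an already connected TCP socket.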

Well, if the machine then has "some" other work running, a kernel
crash occurs (note that this only applies to 2.4.4; 2.2.18 didn't
seem to have the problem):

KERNEL: assertion (!skb_queue_empty(&sk->write_queue)) failed at tcp_timer.c(327):tcp_retransmit_timer
Unable to handle kernel NULL pointer dereference...
.
.
.
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

After that the machine is completely locked up; neither VT switching nor
Ctrl+Scroll Lock etc. works.
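
The log shows the assertion failing in tcp_retransmit_timer and then a NULL pointer dereference, which suggests the assertion only reports the problem and lets execution continue, so code that presumably expects a non-empty write queue runs anyway. As a user-space illustration only (the macro SOFT_ASSERT and the toy queue are hypothetical, not kernel code), a log-and-continue assertion printing a message of the same shape looks like this; unlike assert(3), it does not stop the program.

```c
#include <stdio.h>

/* Hypothetical counter of failed soft assertions. */
static int soft_assert_failures;

/* Log-and-continue assertion: prints
 * "assertion (expr) failed at file(line):func" and keeps running. */
#define SOFT_ASSERT(x)                                                  \
    do {                                                                \
        if (!(x)) {                                                     \
            soft_assert_failures++;                                     \
            fprintf(stderr, "assertion (" #x ") failed at %s(%d):%s\n", \
                    __FILE__, __LINE__, __func__);                      \
        }                                                               \
    } while (0)

/* Toy singly linked "queue" used only for this illustration. */
struct node {
    int val;
    struct node *next;
};

/* Returns the head value, or -1 if the queue is empty. */
static int head_value_or_minus1(struct node *queue)
{
    SOFT_ASSERT(queue != NULL);
    /* Execution continues after a failed SOFT_ASSERT; without this
     * explicit check the next statement would dereference NULL. */
    if (queue == NULL)
        return -1;
    return queue->val;
}
```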


The most efficient way I found to produce "some load" to trigger the bug while running
tcpblast was to use a simple forkbomb:
#include <unistd.h>
int main(void) { while (1) fork(); }

If you need more information, just ask.

regards,
/Ralf Nyrén


System information:

cat /proc/version
Linux version 2.4.4 (plumbum@client2) (gcc version 2.95.2 20000220 (Debian GNU/Linux)) #4 Sat Apr 28 15:47:17 CEST 2001

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 3
model name      : Pentium II (Klamath)
stepping        : 4
cpu MHz         : 232.349
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov mmx
bogomips        : 463.66

cat /proc/modules
vfat                    8688   0 (unused)
fat                    30272   0 [vfat]

cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
4000-403f : Intel Corporation 82371AB PIIX4 ACPI
5000-501f : Intel Corporation 82371AB PIIX4 ACPI
6400-641f : Intel Corporation 82371AB PIIX4 USB
6800-687f : VIA Technologies, Inc. VT86C100A [Rhine 10/100]
  6800-687f : via-rhine
e000-efff : PCI Bus #01
  e000-e0ff : ATI Technologies Inc 3D Rage LT Pro AGP-133
f000-f00f : Intel Corporation 82371AB PIIX4 IDE
  f000-f007 : ide0
  f008-f00f : ide1

cat /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-03ffffff : System RAM
  00100000-001d160b : Kernel code
  001d160c-0021a957 : Kernel data
a8000000-afffffff : PCI Bus #01
d8000000-dfffffff : PCI Bus #01
  d8000000-d8ffffff : ATI Technologies Inc 3D Rage LT Pro AGP-133
  d9000000-d9000fff : ATI Technologies Inc 3D Rage LT Pro AGP-133
e0000000-e3ffffff : Intel Corporation 440LX/EX - 82443LX/EX Host bridge
e4000000-e4ffffff : 3Dfx Interactive, Inc. Voodoo 2
e5000000-e500007f : VIA Technologies, Inc. VT86C100A [Rhine 10/100]
  e5000000-e500007f : via-rhine
ffff0000-ffffffff : reserved



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Ralf Nyren




On Sun, 29 Apr 2001, David S. Miller wrote:

[snip]
>
> Anyways, I just tried to reproduce Ralf's problem on two of my
> machines.  One was an SMP sparc64 system, and the other was my
> uniprocessor Athlon.
>
> What kind of machine are you reproducing this on Ralf?  I'm not
> even getting the very strange errors from tcpblast on the command
> line, it is functioning perfectly fine and sending a stream of
> data to the other machine.  Are you doing something weird like
> making the remote machine the local machine in your tcpblast run?
>
> Later,
> David S. Miller
> [EMAIL PROTECTED]
>


Sorry for not including a reference to the software. I used the
tcpblast program from Debian (unstable). It can be found in the
netdiag package:
http://ftp.debian.org/debian/dists/woody/main/source/net/netdiag_0.7.orig.tar.gz

Since this problem seemed a bit hard to reproduce, I tested it on another
machine too. It needed some more load, but it eventually crashed.
This machine is a PII 400 MHz with 128 MB RAM, a 440BX/ZX chipset, PIIX
and a 3c905B network card.
For more information like .config, System.map, ver_linux etc see:
http://www.educ.umu.se/~plumbum/kernel/panic_2.4.4_20010430/

Regarding the strange error message "tcp/udpblast send:: No such file or
directory": both the precompiled binary and one compiled from source
produced it, although I noticed that the minimum block size triggering the
message changed from 40481 to 39841. Probably some compile-time difference :)

Am I making the remote machine the local machine? No, I send from my
machine to another one, both with 100 Mbps network connections.

Reproduction procedure:
./tcpblast -d0 -s 20 _another_host_ 9000
./forkbomb
wait...

The so-called "forkbomb" shouldn't really be necessary; any heavy load
that exercises the scheduler, memory and swap seems to do the trick.
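
Since any heavy load on the scheduler, memory and swap seems to be enough, a bounded alternative to the forkbomb may be easier to control and stop. A minimal sketch, assuming a POSIX system; the function make_load and its parameters are hypothetical, and how many children or megabytes it takes to trigger the crash is not known.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bounded load generator (not from the original report).
 * Forks up to `nchildren` processes; each allocates `mb` megabytes and
 * writes one byte per page, so the kernel must back every page with RAM
 * (or swap, if RAM runs short). The parent then reaps every child and
 * returns how many exited with status 0. */
static int make_load(int nchildren, size_t mb)
{
    int ok = 0;
    int status;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                  /* fork failed (e.g. process limit): stop */
        if (pid == 0) {             /* child */
            size_t len = mb << 20;
            long page = sysconf(_SC_PAGESIZE);
            volatile char *p = malloc(len);
            if (p == NULL || page <= 0)
                _exit(1);
            for (size_t off = 0; off < len; off += (size_t)page)
                p[off] = 1;         /* volatile store: cannot be optimized out */
            _exit(0);
        }
    }
    /* Parent: reap every child, counting successful ones. */
    while (wait(&status) > 0)
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    return ok;
}
```

For example, make_load(64, 8) forks 64 children that each touch 8 MB. Unlike `while (1) fork();`, it stops forking after a fixed count and reaps its children, so it can be rerun in a loop alongside tcpblast.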

I hope this information is helpful.

regards,
/Ralf
