2.4.2pre3 panic with nfsd

2001-02-13 Thread Brian Grossman


This is a crash from nfsd on 2.4.2pre3.  Cpuinfo is below the ksymoops
output, followed by top's state at the time of the crash.

Exported disks are 3 partitions of one scsi disk device.
All three filesystems are ext2.
SCSI adapter is BusLogic.
Two tulip network cards.
Two heavily used, write-mostly NFS clients.

gcc version 2.95.2 20000220 (Debian GNU/Linux)

Brian


No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Reading Oops report from the terminal
c011484a
*pde = 
Oops: 
CPU:0
EIP:0010:[]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010883
eax: 72747334   ebx: 7274733c   ecx: 0001   edx: 0001
esi: cfdae750   edi: c6db3ce0   ebp: df355c78   esp: df355c58
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 366, stackpage=df355000)
Stack: cfdae680 cfdae750 d7b18b60 72747334 c6db3ce4 0001 0282 0001
    c020abbb  d7b18b60 cc670e60 0018 0020 
   100210ac 0003 d7b1ce27 cc670e74 c01f5a97 d7b18b60 0020 d7b18b60
Call Trace: [] [] [] [] [] 
[] []
   [] [] [] [] [] [] 
[] []
   [] [] [] [] [] [] 
[] []
   [] [] []
Code: 8b 48 04 8b 1b 8b 01 85 45 fc 74 6a 31 c0 9c 5e fa f0 fe 0d

>>EIP; c011484a <__wake_up+3e/cc>   <=
Trace; c020abbb 
Trace; c01f5a97 
Trace; c01f5e11 
Trace; c01eb70e 
Trace; c011ac2c 
Trace; c010a9b5 
Trace; c01090ec 
Trace; c0116f82 
Trace; c01bd0eb 
Trace; c01bc2ad 
Trace; c017fd31 
Trace; c011afac <__run_task_queue+60/74>
Trace; c01341bd <__wait_on_buffer+5d/94>
Trace; c0135539 
Trace; c01536aa 
Trace; c0147d95 
Trace; c0147ff9 
Trace; c015447b 
Trace; c013ec6d 
Trace; c013ecff 
Trace; c016b15b 
Trace; c0168e6c 
Trace; c01687e3 
Trace; c0224f55 
Trace; c016860d 
Trace; c01075a4 
Code;  c011484a <__wake_up+3e/cc>
 <_EIP>:
Code;  c011484a <__wake_up+3e/cc>   <=
   0:   8b 48 04              mov    0x4(%eax),%ecx   <=
Code;  c011484d <__wake_up+41/cc>
   3:   8b 1b                 mov    (%ebx),%ebx
Code;  c011484f <__wake_up+43/cc>
   5:   8b 01                 mov    (%ecx),%eax
Code;  c0114851 <__wake_up+45/cc>
   7:   85 45 fc              test   %eax,0xfffffffc(%ebp)
Code;  c0114854 <__wake_up+48/cc>
   a:   74 6a                 je     76 <_EIP+0x76> c01148c0 <__wake_up+b4/cc>
Code;  c0114856 <__wake_up+4a/cc>
   c:   31 c0                 xor    %eax,%eax
Code;  c0114858 <__wake_up+4c/cc>
   e:   9c                    pushf
Code;  c0114859 <__wake_up+4d/cc>
   f:   5e                    pop    %esi
Code;  c011485a <__wake_up+4e/cc>
  10:   fa                    cli
Code;  c011485b <__wake_up+4f/cc>
  11:   f0 fe 0d 00 00 00 00  lock decb 0x0

Kernel panic: Aiee, killing interrupt handler!

2 warnings issued.  Results may not be reliable.
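One detail in the register dump stands out: eax and ebx (0x72747334 and 0x7274733c) are not kernel addresses, and read as little-endian bytes they are printable ASCII ("4str" and "<str"). Since the faulting instruction is mov 0x4(%eax),%ecx, __wake_up appears to have followed a pointer that looks overwritten with text. The sketch below only does that decoding; the PAGE_OFFSET value assumes the default i386 3G/1G split, and "memory corruption" is a hypothesis suggested by the bytes, not a diagnosis.

```python
import struct

PAGE_OFFSET = 0xC0000000  # assumption: default i386 3G/1G split in 2.4

def le_bytes(value: int) -> bytes:
    """The 4 bytes of a 32-bit register as they sit in i386 (little-endian) memory."""
    return struct.pack("<I", value)

def looks_like_ascii(value: int) -> bool:
    """True if every byte of the value is printable ASCII (0x20..0x7e)."""
    return all(0x20 <= b <= 0x7E for b in le_bytes(value))

eax, ebx = 0x72747334, 0x7274733C  # values from the register dump

for name, reg in (("eax", eax), ("ebx", ebx)):
    print(f"{name}=0x{reg:08x} bytes={le_bytes(reg)!r} ascii={looks_like_ascii(reg)}")

# The faulting instruction is `mov 0x4(%eax),%ecx`, so the bad address is eax + 4.
fault_addr = eax + 4
print(f"fault address 0x{fault_addr:08x}, kernel address: {fault_addr >= PAGE_OFFSET}")
```

A real kernel text address such as the EIP (0xc011484a) fails the ASCII test, which is what makes the two register values look suspicious.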





CPUINFO
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 5
model name  : Pentium II (Deschutes)
stepping: 2
cpu MHz : 451.031
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips: 897.84

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 5
model name  : Pentium II (Deschutes)
stepping: 2
cpu MHz : 451.031
cache size  : 512 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips: 901.12



TOP
  4:12am  up 1 day, 11:39,  3 users,  load average: 10.69, 11.49, 11.66
460 processes: 458 sleeping, 1 running, 0 zombie, 1 stopped
CPU states:  7.6% user, 17.4% system,  0.0% nice, 174.6% idle
CPU0 states:  4.1% user,  8.4% system,  0.0% nice, 86.4% idle
CPU1 states:  3.2% user,  8.2% system,  0.0% nice, 88.0% idle
Mem:   513104K av,  508720K used,    4384K free,   0K shrd,   54020K buff
Swap:  130748K av,   0K used,  130748K free   62104K cached

  PID USER PRI  NI  SIZE  RSS SHARE LC STAT %CPU %MEM   TIME COMMAND
 1340 root  15   0  1256 1256   732  0 R 7.3  0.2 126:06 top
  420 root  18   0   580  580   456  1 S 3.4  0.1  95:56 ipstat
24500 qmailusr  11   0   400  400   328  1 S 1.5  0.0   0:00 qmail-pop3d
  364 root  11   0     0    0     0  0 SW  0.7  0.0   9:26 nfsd
  370 root  11   0     0    0     0  0 SW  0.7  0.0   9:27 nfsd
  609 root   0   0   604  604   516  1 S 0.7  0.1  12:29 tcpserver
23430 www   10   0  3008 3008  2792  1 S 0.7  0.5   0:00 httpd
23761 www    9   0  3028 3028  2784  1 S 0.7  0.5   0:00 httpd
24495 qmailusr  11   0   400  

tcp stalls with 2.4 (but not 2.2)

2001-02-26 Thread Brian Grossman


I'm seeing stalls sending packets to some clients.  I see this problem
under 2.4 (2.4.1 and 2.4.1ac17) but not under 2.2.17.

My theory is there is an ICMP black hole between my server and some of its
clients.  Is there a tool to pinpoint that black hole if it exists?

Can anyone suggest another cause or a direction for investigation?

Why does this affect 2.4 but not 2.2?

The characteristics I've discovered so far:

From strace of the server process, each write to the network is
preceded by a select on the output fd.  The select waits for a
long time, after which the write succeeds.
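The select-then-write pattern from the strace can be timed directly. A minimal sketch, assuming the server can log a number before each write (the helper name is hypothetical): select() reports a TCP socket writable only once enough send-buffer space has freed up, so a long wait means the buffer stayed full of data the peer had not yet acknowledged, which fits segments being dropped or delayed somewhere on the path.

```python
import select
import socket
import time

def time_until_writable(sock: socket.socket, timeout: float):
    """Wait in select() for sock to become writable, as the server does.

    Returns the seconds spent waiting, or None if `timeout` expired first.
    """
    start = time.monotonic()
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        return None
    return time.monotonic() - start
```

Logging this value per client would show whether the multi-minute waits line up with particular peers (for example the dial-up clients) rather than with the server as a whole.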

The packets are received by the client a couple of minutes after my
server sends them.

The clients I have tested with are win98 and winNT.

The router for both 2.4 and 2.2 servers is running 2.2.18 with
ipvs (ipvs-1.0.2-2.2.18).

That router does not block any ICMP.

The behavior occurs on the 2.4 machine whether the packets are
routed directly or are mangled by ipvs.

I've tried the same machine with both 2.4 and 2.2, as well as
another machine with just 2.2.  2.2 works.  2.4 doesn't.

Both of my servers and the router I mentioned have two tulip
network cards.

The clients I've tested with are behind a modem through EarthLink.
Another client that I suspect has the same problem is behind a modem
through Juno.

I've tried adjusting both /proc/sys/net/ipv4/route/min_adv_mss and
/proc/sys/net/ipv4/route/min_pmtu downward.  Do these require an
ifconfig down/up to take effect?
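When adjusting these knobs it helps to record the values before and after each change. A small sketch that reads them from /proc/sys; the `root` parameter exists only so it can be pointed elsewhere for testing. Whether a new value applies to destinations already in the route cache without flushing it (for example via net/ipv4/route/flush) is an assumption to test, not something this sketch settles.

```python
from pathlib import Path

# The two knobs from the post, relative to /proc/sys.
ROUTE_SYSCTLS = ("net/ipv4/route/min_pmtu", "net/ipv4/route/min_adv_mss")

def read_sysctls(names=ROUTE_SYSCTLS, root="/proc/sys"):
    """Return {name: int} for each sysctl that exists under `root`.

    Missing entries are skipped, since not every kernel has every knob.
    """
    values = {}
    for name in names:
        path = Path(root) / name
        if path.is_file():
            values[name] = int(path.read_text().split()[0])
    return values
```

For example, `print(read_sysctls())` before and after writing a new value shows whether the write took.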

Thanks for any help anyone can supply.

Brian
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: tcp stalls with 2.4 (but not 2.2)

2001-02-26 Thread Brian Grossman


> > I'm seeing stalls sending packets to some clients.  I see this problem
> > under 2.4 (2.4.1 and 2.4.1ac17) but not under 2.2.17.
> 
> compiled in ECN support? SYNcookies?  try disabling through /proc
> tcp or udp? if udp check /proc/net/ipv4/ip_udpdloose or such

CONFIG_INET_ECN is not set in .config.
CONFIG_SYN_COOKIES is set, but tcp_syncookies is set to 0.

> > My theory is there is an ICMP black hole between my server and some of its
> > clients.  Is there a tool to pinpoint that black hole if it exists?
> 
> ping is your friend.  -s lets you set size of packet. (to
> check for fragmentation) use tcpdump to capture
> a trace of this or a tcp session.

> email trace to me private if you want.

Does ping set the don't-fragment (DF) bit?

ping -s 1500 to the router immediately before the client's known IP address
works fine.  I'll get the owner of the client to help out later and send
those results with tcpdump to you privately.
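A -s 1500 probe is a 1528-byte IP packet (1500 bytes of data, 8-byte ICMP header, 20-byte IP header), so on a 1500-byte Ethernet path it goes out fragmented, and its success says little about whether ICMP "fragmentation needed" messages make it back. A sketch of a DF probe search: it binary-searches for the largest payload that gets through with fragmentation forbidden. The ping_df helper assumes Linux iputils ping with -c, -W and -M do (prohibit fragmentation); the host name in the usage line is hypothetical.

```python
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def packet_size(payload: int) -> int:
    """IP packet size produced by `ping -s payload`."""
    return payload + IP_HEADER + ICMP_HEADER

def ping_df(host: str, payload: int) -> bool:
    """Send one echo with fragmentation forbidden (assumes iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def largest_payload(probe, low=0, high=1472):
    """Largest payload in [low, high] for which probe(payload) succeeds.

    Returns None if even `low` fails; assumes success is monotonic in size.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

Usage: `size = largest_payload(lambda n: ping_df("client.example", n))`, then `packet_size(size)` is the path MTU as seen by DF probes. A black hole shows up as probes above some size timing out silently instead of failing with a local "message too long" error.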

Brian



ext2: block > big ?

2001-02-11 Thread Brian Grossman


What does a message like 'ext2: block > big' indicate?

This was kernel 2.2.18aa2.

The machine was completely unresponsive when I got there.  There were a
bunch of 'block > big' messages on the screen, but no oops.

In my grogginess, I didn't have the sense to copy down the whole message,
but it did also mention the device (8,9?).  The major 8 SCSI devices in use
are three partitions of one disk: two 15GB and one 50GB.
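As far as I can tell from the 2.2 ext2 sources, the warning comes from a bounds check in ext2_bmap() (fs/ext2/inode.c) that rejects a logical block number beyond what an inode's 12 direct pointers plus single, double and triple indirect blocks can address. The sketch below computes that limit; it illustrates the check and is not the kernel code. Since the limit is far beyond any file on these partitions, a burst of these warnings suggests bogus block numbers (for example from on-disk or in-memory corruption) rather than a genuinely huge file; that reading is a hypothesis.

```python
EXT2_NDIR_BLOCKS = 12  # direct block pointers in an ext2 inode

def ext2_max_logical_blocks(block_size: int) -> int:
    """Logical blocks addressable via direct, single, double and triple
    indirect pointers, with 4-byte block numbers."""
    n = block_size // 4  # pointers per indirect block
    return EXT2_NDIR_BLOCKS + n + n**2 + n**3

def is_block_too_big(block: int, block_size: int) -> bool:
    """Illustrative version of the check assumed to print 'block > big'."""
    return block >= ext2_max_logical_blocks(block_size)
```

For 1 KB blocks the limit is 16843020 logical blocks; for 4 KB blocks it is 1074791436.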

Brian



sysinfo.sharedram not accounted for on i386 ?

2001-02-11 Thread Brian Grossman


On i386, sysinfo.sharedram is not accounted for, leading /proc/meminfo to
always report MemShared as 0.  Is this the intended behavior?
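To watch the field from userspace, a minimal parser for the "Key: value kB" lines of /proc/meminfo. 2.4 also prints a legacy byte-count table (the "total: used: ..." header and "Mem:"/"Swap:" rows) at the top; this sketch skips those and keeps only the kB lines.

```python
def parse_meminfo(text: str) -> dict:
    """Parse the 'Key:  value kB' lines of /proc/meminfo into {key: int}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields
```

Usage: `parse_meminfo(open("/proc/meminfo").read()).get("MemShared")`.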

Brian



Re: sysinfo.sharedram not accounted for on i386 ?

2001-02-12 Thread Brian Grossman


> On Mon, Feb 12, 2001 at 12:05:03AM -0700, Brian Grossman wrote:
> > On i386, sysinfo.sharedram is not accounted for, leading /proc/meminfo to
> > always report MemShared as 0.  Is this the intended behavior?
> 
> Yes.

Thanks.  Is there a preferred way of getting the same information
free(1) reported under 2.2?

I've written a script to derive it from /proc/[0-9]*/statm, but that seems
like an awkward approach.  A related question: is the page size stored in
/proc somewhere?
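A sketch of the statm approach described above. It sums the third statm field ("share", in pages) across processes; pages shared by several processes are counted once per process, so this is not the quantity 2.2 reported as MemShared and is best treated as a rough indicator. For the page size, userspace can ask the C library (getpagesize(), sysconf(_SC_PAGESIZE)) rather than /proc; the sketch uses Python's os.sysconf wrapper.

```python
import glob
import os

def read_all_statm(pattern="/proc/[0-9]*/statm"):
    """Return the contents of every readable /proc/<pid>/statm."""
    contents = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                contents.append(f.read())
        except OSError:  # the process exited between glob() and open()
            pass
    return contents

def shared_kb(statm_contents, page_size=None):
    """Sum the third statm field (shared pages) and convert to kB."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")  # page size in bytes
    pages = sum(int(text.split()[2]) for text in statm_contents if text.strip())
    return pages * page_size // 1024
```

Usage: `print(shared_kb(read_all_statm()), "kB")`.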

Is there a discussion of this somewhere?  I couldn't find one when I
searched the linux-kernel archives.

Brian