Hi!

I have a problem with my server: it hangs every 3-14 days with different
kernel oops messages. I already posted about this issue to this mailing
list on 30 Sep 2004. Since then I have tried a lot of different
things:
- tried different kernels (currently using 2.6.10 vanilla-sources)
- configured netconsole to catch kernel oops messages on a second server
- filed a number of bug reports in the kernel bugzilla:
  -- invalid operand. EIP is at schedule_timeout+0x35/0xb7
  http://bugme.osdl.org/show_bug.cgi?id=4085
  -- udp queue Recv-Q overloaded because socket was not closed
  http://bugme.osdl.org/show_bug.cgi?id=4086
  -- oops in e1000
  http://bugme.osdl.org/show_bug.cgi?id=4088
  -- Recursive die() failure
  http://bugme.osdl.org/show_bug.cgi?id=4096
  -- REISERFS: panic: journal-601, buffer write failed
  http://bugme.osdl.org/show_bug.cgi?id=4101
  -- invalid operand: 0000 at include/linux/netdevice.h:879
  http://bugme.osdl.org/show_bug.cgi?id=4111
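
For anyone who wants to repeat the netconsole setup: netconsole sends kernel
messages as plain UDP datagrams, so the receiving side only needs a UDP
listener. The following is a minimal sketch, not the setup actually used on
these servers. The port (6666, which is netconsole's usual default target
port), the log file name and the function names are all assumptions made for
illustration.

```python
import socket


def format_record(data, sender):
    """Turn one netconsole datagram into a log line prefixed with the sender IP."""
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    return f"{sender[0]}: {text}"


def listen(port=6666, host="0.0.0.0", logfile="netconsole.log"):
    """Receive netconsole datagrams forever and append them to logfile.

    port and logfile are assumed values; they must match whatever the
    sending machine's netconsole was configured with.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock, \
            open(logfile, "a") as out:
        sock.bind((host, port))
        while True:
            data, sender = sock.recvfrom(65535)
            out.write(format_record(data, sender) + "\n")
            out.flush()  # flush at once: the sender may hang right after the oops
```

Calling `listen()` on the second server blocks and keeps appending to the log
file. Flushing after every datagram matters here, because the last lines before
a hang are the ones worth keeping.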

In short, the kernel just hangs from time to time with different errors. This
has been going on for the last 1.5 years across 2 different hosting providers,
3 different servers (so it probably isn't a hardware issue) and different
kernel versions.

The only "unusual" thing running on my servers is a set of Perl scripts that do
web spidering 24x7 using a lot of non-blocking sockets (data mining of
about 100 special sites). Right now I have only two ideas about why my server hangs:
1) probably some race condition bug in the kernel related to non-blocking I/O
2) a hacker attack   :-/  (I don't really believe this)
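
To make the workload concrete, here is a minimal sketch of the access pattern
behind idea 1: many non-blocking connects started at once, then select() used
to wait for them to finish. It is written in Python as a stand-in, not taken
from the real Perl scripts; the function name and the timeout are assumptions,
and it relies on Linux semantics where a finished connect (successful or not)
makes the socket writable.

```python
import errno
import select
import socket
import time


def connect_all(addresses, timeout=5.0):
    """Start a non-blocking connect to each (host, port), then wait with select().

    Returns a list of (address, error) pairs, where error is 0 on success
    or an errno value on failure or timeout.
    """
    pending = {}
    results = []
    for addr in addresses:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        err = sock.connect_ex(addr)  # normally EINPROGRESS on a non-blocking socket
        if err in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            pending[sock] = addr
        else:
            results.append((addr, err))
            sock.close()

    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        _, writable, _ = select.select([], list(pending), [], remaining)
        for sock in writable:
            addr = pending.pop(sock)
            # SO_ERROR holds the final result of the asynchronous connect.
            results.append((addr, sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)))
            sock.close()

    # Anything still pending did not finish in time.
    for sock, addr in pending.items():
        results.append((addr, errno.ETIMEDOUT))
        sock.close()
    return results
```

Running something like this in a loop against a test listener is one way to
check whether the hangs can be triggered without the full spider, although
nothing here proves that non-blocking I/O is the actual cause.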

Searching Google for the same errors doesn't help: sometimes I find a
single unanswered post about a similar issue (usually with
squid) in some forum and nothing more.

Can anybody help me with this @#$??? :-( Any ideas what to do or to check?

Just now I noticed a non-fatal kernel oops in the logs. It looks like
sometimes a critical error happens and hangs the server, while sometimes
only a non-critical error happens; usually a critical error follows
1-2 days after a non-critical one. Here is the log:

2005-02-26_05:20:53.46432 kern.alert: Unable to handle kernel paging request at virtual address bf155a80
2005-02-26_05:20:53.68198 kern.alert:  printing eip:
2005-02-26_05:20:53.68201 kern.warn: bf155a80
2005-02-26_05:20:53.68202 kern.alert: *pde = 00000000
2005-02-26_05:20:53.68203 kern.alert: Oops: 0000 [#1]
2005-02-26_05:20:53.68204 kern.warn: CPU:    0
2005-02-26_05:20:53.68205 kern.warn: EIP:    0060:[<bf155a80>]    Not tainted VLI
2005-02-26_05:20:53.68206 kern.warn: EFLAGS: 00010246   (2.6.10)
2005-02-26_05:20:53.68210 kern.warn: EIP is at 0xbf155a80
2005-02-26_05:20:53.68212 kern.warn: eax: 00000000   ebx: 00000000   ecx: 00000000   edx: f7087f40
2005-02-26_05:20:53.68213 kern.warn: esi: 00000000   edi: 00000000   ebp: bffffd38   esp: f7087eec
2005-02-26_05:20:53.68214 kern.warn: ds: 007b   es: 007b   ss: 0068
2005-02-26_05:20:53.68215 kern.warn: Process apache2 (pid: 888, threadinfo=f7086000 task=f7045580)
2005-02-26_05:20:53.68216 kern.warn: Stack: c0155d01 f7087f40 f7087f90 00000001 f7045580 ed3ba0cc f7045624 c1b145ac
2005-02-26_05:20:53.68217 kern.warn:        c18d007b c18d007b 00000296 00000000 f7086000 c0114b61 ffffffff 00000007
2005-02-26_05:20:53.68218 kern.warn:        f721b0a0 c1bc0a20 000003e8 00000000 00000000 00000246 00000000 f7045580
2005-02-26_05:20:53.68219 kern.warn: Call Trace:
2005-02-26_05:20:53.68219 kern.warn:  [<c0155d01>] do_select+0x41/0x2b0
2005-02-26_05:20:53.68220 kern.warn:  [<c0114b61>] do_wait+0x1c1/0x470
2005-02-26_05:20:53.68221 kern.warn:  [<c01562ab>] sys_select+0x2fb/0x530
2005-02-26_05:20:53.68223 kern.warn:  [<c0114f25>] sys_waitpid+0x25/0x29
2005-02-26_05:20:53.68224 kern.warn:  [<c01022e3>] syscall_call+0x7/0xb
2005-02-26_05:20:53.68225 kern.warn: Code:  Bad EIP value.
2005-02-26_05:20:53.68226 kern.warn:  <7>IN=eth0 OUT= MAC=00:30:48:42:63:fc:00:d0:02:49:64:00:08:00 SRC=212.31.242.103 DST=XXX.XXX.XXX.XXX LEN=40 TOS=0x00 PREC=0x00 TTL=48 ID=6892 DF PROTO=TCP SPT=3039 DPT=443 WINDOW=0 RES=0x00 RST URGP=0
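
Since every oops so far has shown a different error, it can help to pull the
call trace out of each captured log and compare the symbols side by side. The
following is a minimal sketch that assumes the `[<address>] symbol+0xoffset/0xsize`
line format seen in the log above; the function name and the regular expression
are illustrative, not part of any kernel tool.

```python
import re

# Matches trace entries such as "[<c0155d01>] do_select+0x41/0x2b0".
TRACE_RE = re.compile(
    r"\[<([0-9a-f]{8,16})>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_call_trace(log_text):
    """Return (symbol, address, offset, size) tuples for every trace entry found."""
    return [
        (symbol, int(addr, 16), int(offset, 16), int(size, 16))
        for addr, symbol, offset, size in TRACE_RE.findall(log_text)
    ]


sample = """\
kern.warn: EIP:    0060:[<bf155a80>]    Not tainted VLI
kern.warn:  [<c0155d01>] do_select+0x41/0x2b0
kern.warn:  [<c0114b61>] do_wait+0x1c1/0x470
kern.warn:  [<c01562ab>] sys_select+0x2fb/0x530
"""
symbols = [entry[0] for entry in parse_call_trace(sample)]
print(symbols)  # ['do_select', 'do_wait', 'sys_select']
```

The EIP line is skipped on purpose because it carries no symbol. Collecting the
symbol lists from several oopses makes it easier to see whether the same code
paths (here select and wait) keep showing up.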

P.S. A lot of information about my hardware/software can be found in the first
bug report in the kernel bugzilla (http://bugme.osdl.org/show_bug.cgi?id=4085).

-- 
                        WBR, Alex.
