Hi!

I have a problem with my server: it hangs every 3-14 days with different kernel oops messages. I already posted about this issue to this mailing list on 30 Sep 2004. Since then I have tried a lot of different things:
- tried different kernels (currently using 2.6.10 vanilla-sources)
- configured netconsole to catch kernel oops messages on a second server
- filed a number of bug reports in the kernel bugzilla:
-- invalid operand. EIP is at schedule_timeout+0x35/0xb7
   http://bugme.osdl.org/show_bug.cgi?id=4085
-- udp queue Recv-Q overloaded because socket was not closed
   http://bugme.osdl.org/show_bug.cgi?id=4086
-- oops in e1000
   http://bugme.osdl.org/show_bug.cgi?id=4088
-- Recursive die() failure
   http://bugme.osdl.org/show_bug.cgi?id=4096
-- REISERFS: panic: journal-601, buffer write failed
   http://bugme.osdl.org/show_bug.cgi?id=4101
-- invalid operand: 0000 at include/linux/netdevice.h:879
   http://bugme.osdl.org/show_bug.cgi?id=4111

In short, the kernel just hangs from time to time with different errors. This has been going on for the last 1.5 years, on 2 different hosting providers, 3 different servers (so it probably isn't a hardware issue) and different kernel versions. The only "unusual" thing running on my servers is a set of perl scripts that do web spidering 24x7 using a lot of non-blocking sockets (data mining of about 100 special sites).

Right now I have only two ideas why my server hangs:
1) probably some race condition bug in the kernel related to non-blocking IO
2) a hacker attack :-/ (I don't really believe in this)

Searching Google for the same errors doesn't help: sometimes I find a single unanswered post about a similar issue (usually it happens with squid) in some forum, and nothing more.

Can anybody help me with this @#$??? :-( Any ideas what to do or what to check?

Right now I've noticed a non-fatal kernel oops in the logs. It looks like sometimes a critical error happens and hangs the server, while other times a non-critical error happens; usually a critical error follows 1-2 days after a non-critical one.

Here is the log:

2005-02-26_05:20:53.46432 kern.alert: Unable to handle kernel paging request at virtual address bf155a80
2005-02-26_05:20:53.68198 kern.alert: printing eip:
2005-02-26_05:20:53.68201 kern.warn: bf155a80
2005-02-26_05:20:53.68202 kern.alert: *pde = 00000000
2005-02-26_05:20:53.68203 kern.alert: Oops: 0000 [#1]
2005-02-26_05:20:53.68204 kern.warn: CPU: 0
2005-02-26_05:20:53.68205 kern.warn: EIP: 0060:[<bf155a80>] Not tainted VLI
2005-02-26_05:20:53.68206 kern.warn: EFLAGS: 00010246 (2.6.10)
2005-02-26_05:20:53.68210 kern.warn: EIP is at 0xbf155a80
2005-02-26_05:20:53.68212 kern.warn: eax: 00000000 ebx: 00000000 ecx: 00000000 edx: f7087f40
2005-02-26_05:20:53.68213 kern.warn: esi: 00000000 edi: 00000000 ebp: bffffd38 esp: f7087eec
2005-02-26_05:20:53.68214 kern.warn: ds: 007b es: 007b ss: 0068
2005-02-26_05:20:53.68215 kern.warn: Process apache2 (pid: 888, threadinfo=f7086000 task=f7045580)
2005-02-26_05:20:53.68216 kern.warn: Stack: c0155d01 f7087f40 f7087f90 00000001 f7045580 ed3ba0cc f7045624 c1b145ac
2005-02-26_05:20:53.68217 kern.warn: c18d007b c18d007b 00000296 00000000 f7086000 c0114b61 ffffffff 00000007
2005-02-26_05:20:53.68218 kern.warn: f721b0a0 c1bc0a20 000003e8 00000000 00000000 00000246 00000000 f7045580
2005-02-26_05:20:53.68219 kern.warn: Call Trace:
2005-02-26_05:20:53.68219 kern.warn: [<c0155d01>] do_select+0x41/0x2b0
2005-02-26_05:20:53.68220 kern.warn: [<c0114b61>] do_wait+0x1c1/0x470
2005-02-26_05:20:53.68221 kern.warn: [<c01562ab>] sys_select+0x2fb/0x530
2005-02-26_05:20:53.68223 kern.warn: [<c0114f25>] sys_waitpid+0x25/0x29
2005-02-26_05:20:53.68224 kern.warn: [<c01022e3>] syscall_call+0x7/0xb
2005-02-26_05:20:53.68225 kern.warn: Code: Bad EIP value.
2005-02-26_05:20:53.68226 kern.warn: <7>IN=eth0 OUT= MAC=00:30:48:42:63:fc:00:d0:02:49:64:00:08:00 SRC=212.31.242.103 DST=XXX.XXX.XXX.XXX LEN=40 TOS=0x00 PREC=0x00 TTL=48 ID=6892 DF PROTO=TCP SPT=3039 DPT=443 WINDOW=0 RES=0x00 RST URGP=0

P.S. A lot of information about my hardware and software is in the first bug report in the kernel bugzilla (http://bugme.osdl.org/show_bug.cgi?id=4085).

-- 
WBR, Alex.
