Have you considered hardware failure? Since the errors are "random" and
not reproducible, this seems the most likely cause.

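As a rough first check, it may help to scan the netconsole logs for oopses
whose EIP lies outside the kernel's address range. The oops quoted below is
one such case ("EIP is at 0xbf155a80"), which suggests a jump through a
corrupted pointer. That is consistent with bad RAM, but equally with a
memory-corrupting kernel bug, so treat it only as a hint. The Python sketch
below assumes a 32-bit x86 kernel with the default 3G/1G split (kernel
addresses at or above 0xC0000000); the names wild_eips and PAGE_OFFSET are
chosen here for illustration and do not come from any existing tool.

```python
import re

# Assumption: 32-bit x86 kernel with the default 3G/1G split, so kernel
# code is expected at or above this address.
PAGE_OFFSET = 0xC0000000

# Matches only raw addresses, e.g. "EIP is at 0xbf155a80"; symbolic forms
# such as "EIP is at schedule_timeout+0x35/0xb7" are deliberately skipped.
EIP_RE = re.compile(r"EIP is at 0x([0-9a-fA-F]+)")


def wild_eips(lines):
    """Return the EIP values (as ints) that fall below PAGE_OFFSET."""
    hits = []
    for line in lines:
        match = EIP_RE.search(line)
        if match:
            eip = int(match.group(1), 16)
            if eip < PAGE_OFFSET:
                hits.append(eip)
    return hits


if __name__ == "__main__":
    sample = [
        "2005-02-26_05:20:53.68210 kern.warn: EIP is at 0xbf155a80",
        "kern.warn: EIP is at schedule_timeout+0x35/0xb7",
    ]
    for eip in wild_eips(sample):
        print(f"EIP outside kernel range: {eip:#010x}")
```

If most oopses across the three servers show such wild addresses, a memory
test (memtest86, for example) on each machine would be a sensible next step.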

On Sun, 27 Feb 2005 01:40:01 +0200, Alex Efros <[EMAIL PROTECTED]> wrote:
> Hi!
> 
> I have a problem with my server: it hangs every 3-14 days with different
> kernel oops messages. I already posted a message about this issue to this
> mailing list on 30 Sep 2004. Since then I have tried a lot of different
> things:
> - tried different kernels (currently using 2.6.10 vanilla-sources)
> - configured netconsole to catch kernel oops messages on a second server
> - filed a number of bug reports in the kernel bugzilla:
>   -- invalid operand. EIP is at schedule_timeout+0x35/0xb7
>   http://bugme.osdl.org/show_bug.cgi?id=4085
>   -- udp queue Recv-Q overloaded because socket was not closed
>   http://bugme.osdl.org/show_bug.cgi?id=4086
>   -- oops in e1000
>   http://bugme.osdl.org/show_bug.cgi?id=4088
>   -- Recursive die() failure
>   http://bugme.osdl.org/show_bug.cgi?id=4096
>   -- REISERFS: panic: journal-601, buffer write failed
>   http://bugme.osdl.org/show_bug.cgi?id=4101
>   -- invalid operand: 0000 at include/linux/netdevice.h:879
>   http://bugme.osdl.org/show_bug.cgi?id=4111
> 
> In short, the kernel just hangs from time to time with different errors.
> This has been going on for the last 1.5 years, on 2 different hosting
> providers, 3 different servers (so it probably isn't a hardware issue) and
> several kernel versions.
> 
> The only "unusual" thing running on my servers is a set of Perl scripts that
> do web spidering 24x7 using a lot of non-blocking sockets (data mining of
> about 100 special sites). Right now I have only two ideas why my server hangs:
> 1) probably some race condition bug in the kernel related to non-blocking I/O
> 2) a hacker attack   :-/  (I don't really believe this)
> 
> Searching Google for the same errors doesn't help: at best I find a
> single unanswered post about a similar issue (usually involving
> squid) in some forum and nothing more.
> 
> Can anybody help me with this @#$??? :-( Any ideas what to do or to check?
> 
> Right now I've noticed a non-fatal kernel oops in the logs. (It looks like
> sometimes a critical error happens that hangs the server, while sometimes a
> non-critical error happens; usually a critical error follows 1-2 days
> after a non-critical one.) Here is the log:
> 
> 2005-02-26_05:20:53.46432 kern.alert: Unable to handle kernel paging request at virtual address bf155a80
> 2005-02-26_05:20:53.68198 kern.alert:  printing eip:
> 2005-02-26_05:20:53.68201 kern.warn: bf155a80
> 2005-02-26_05:20:53.68202 kern.alert: *pde = 00000000
> 2005-02-26_05:20:53.68203 kern.alert: Oops: 0000 [#1]
> 2005-02-26_05:20:53.68204 kern.warn: CPU:    0
> 2005-02-26_05:20:53.68205 kern.warn: EIP:    0060:[<bf155a80>]    Not tainted VLI
> 2005-02-26_05:20:53.68206 kern.warn: EFLAGS: 00010246   (2.6.10)
> 2005-02-26_05:20:53.68210 kern.warn: EIP is at 0xbf155a80
> 2005-02-26_05:20:53.68212 kern.warn: eax: 00000000   ebx: 00000000   ecx: 00000000   edx: f7087f40
> 2005-02-26_05:20:53.68213 kern.warn: esi: 00000000   edi: 00000000   ebp: bffffd38   esp: f7087eec
> 2005-02-26_05:20:53.68214 kern.warn: ds: 007b   es: 007b   ss: 0068
> 2005-02-26_05:20:53.68215 kern.warn: Process apache2 (pid: 888, threadinfo=f7086000 task=f7045580)
> 2005-02-26_05:20:53.68216 kern.warn: Stack: c0155d01 f7087f40 f7087f90 00000001 f7045580 ed3ba0cc f7045624 c1b145ac
> 2005-02-26_05:20:53.68217 kern.warn:        c18d007b c18d007b 00000296 00000000 f7086000 c0114b61 ffffffff 00000007
> 2005-02-26_05:20:53.68218 kern.warn:        f721b0a0 c1bc0a20 000003e8 00000000 00000000 00000246 00000000 f7045580
> 2005-02-26_05:20:53.68219 kern.warn: Call Trace:
> 2005-02-26_05:20:53.68219 kern.warn:  [<c0155d01>] do_select+0x41/0x2b0
> 2005-02-26_05:20:53.68220 kern.warn:  [<c0114b61>] do_wait+0x1c1/0x470
> 2005-02-26_05:20:53.68221 kern.warn:  [<c01562ab>] sys_select+0x2fb/0x530
> 2005-02-26_05:20:53.68223 kern.warn:  [<c0114f25>] sys_waitpid+0x25/0x29
> 2005-02-26_05:20:53.68224 kern.warn:  [<c01022e3>] syscall_call+0x7/0xb
> 2005-02-26_05:20:53.68225 kern.warn: Code:  Bad EIP value.
> 2005-02-26_05:20:53.68226 kern.warn:  <7>IN=eth0 OUT= MAC=00:30:48:42:63:fc:00:d0:02:49:64:00:08:00 SRC=212.31.242.103 DST=XXX.XXX.XXX.XXX LEN=40 TOS=0x00 PREC=0x00 TTL=48 ID=6892 DF PROTO=TCP SPT=3039 DPT=443 WINDOW=0 RES=0x00 RST URGP=0
> 
> P.S. A lot of information about my hardware/software can be found in the first
> bug report in the kernel bugzilla (http://bugme.osdl.org/show_bug.cgi?id=4085).
> 
> --
>                         WBR, Alex.
>
