Re: Oracle 8I & Kernel 2.4.3 : Sane ?

2001-04-03 Stephen Clouse

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, Apr 02, 2001 at 02:12:40PM +0200, Yann Dupont wrote:
> Is oracle 8.1.5 + Kernel 2.4.3 a sane combination ?
> In general is oracle + Kernel 2.4 working ?

I can't speak for 2.4.3, but we have been running our development server (Oracle
8.1.6) on 2.4.2 since the day it was released, with no problems whatsoever.

I'd recommend checking the Oracle docs to find out what is wrong with your
rollback segments.  I highly doubt this is Linux's fault.

- -- 
Stephen Clouse <[EMAIL PROTECTED]>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOsmDOwOGqGs0PadnEQI0qQCdFS+PLvff8YxstOUAB33gSoyRsfkAoKeP
n87LAwm5FrYIjFG8/WXh0IEh
=LCx9
-----END PGP SIGNATURE-----
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: Bug in the file attributes ?

2001-03-29 Stephen Clouse

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Mar 29, 2001 at 08:20:32PM +, Xavier Ordoquy wrote:
> This is in the user home directory.
> Since the file is read only for the user, it should not be able to remove
> it. Moreover, the user can't write to test.
> So I think this is a bug.

You have failed to RTFM.  There is no bug here.

http://www.linuxdoc.org/FAQ/Linux-FAQ/x1955.html#AEN2242
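
The behaviour being reported follows from standard Unix semantics: removing a file is an operation on the directory that contains it, so it is governed by the directory's write (and search) permission, not by the file's own mode bits. The Python sketch below demonstrates both halves of that in a throwaway temporary directory; the file name `test` and the function names are illustrative only. Root bypasses these permission checks, so the second check only means something for an ordinary user, and sticky directories such as /tmp add an ownership rule that this sketch does not cover.

```python
import os
import stat
import tempfile


def remove_readonly_file() -> bool:
    """Delete a read-only file from a writable directory; True if it is gone."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "test")
        with open(path, "w") as f:
            f.write("data\n")
        # r--r--r--: nobody may modify the file's contents...
        os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
        # ...but unlink() edits the directory, which is still writable.
        os.remove(path)
        return not os.path.exists(path)


def remove_from_readonly_dir() -> bool:
    """Try to delete a writable file from an r-x directory; True if refused."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "test")
        open(path, "w").close()
        os.chmod(d, 0o555)  # r-xr-xr-x: entries can be listed, not added or removed
        try:
            os.remove(path)
            refused = False
        except PermissionError:
            refused = True
        finally:
            os.chmod(d, 0o755)  # let TemporaryDirectory clean up afterwards
        return refused


print(remove_readonly_file())      # True
print(remove_from_readonly_dir())  # True for a non-root user
```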

- -- 
Stephen Clouse <[EMAIL PROTECTED]>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOsOC1gOGqGs0PadnEQJtVwCgm23nRu0O14SwWvxjZDulld8m24YAn2vb
yHGvzJR10oC1dabikTezfX+3
=TlMz
-----END PGP SIGNATURE-----



Re: [PATCH] Prevent OOM from killing init

2001-03-24 Stephen Clouse

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Mar 24, 2001 at 09:45:01PM -0800, Stephen Satchell wrote:
> If you have a mission-critical application running on your box, add it to 
> the inittab file with the RESPAWN attribute.  That way, OOM killer kills 
> it, init notices it, and init restarts your server.

Ah, that's great for simple daemons.  Now tell me how to help an app like this
one (Oracle, in this example):

oracle      89  0.0  0.4 41076 1776 ?        S    Mar22   0:00 ora_pmon_slash
oracle      91  0.0  0.6 40676 2620 ?        S    Mar22   0:00 ora_dbw0_slash
oracle      93  0.0  0.4 40544 1788 ?        S    Mar22   0:00 ora_lgwr_slash
oracle      95  0.0  0.4 40544 1744 ?        S    Mar22   0:00 ora_ckpt_slash
oracle      97  0.0  1.1 40556 4404 ?        S    Mar22   0:00 ora_smon_slash
oracle      99  0.0  0.5 40536 2188 ?        S    Mar22   0:00 ora_reco_slash
oracle     101  0.0  0.4 40656 1756 ?        S    Mar22   0:00 ora_arc0_slash

In this example, when oom_kill reaps one of these autonomous threads, Oracle 
opts to crash and burn.  Database corruption is almost guaranteed.

In reality, I'm sure any daemon (threaded or not) that works heavily with disk
files is likely to wreck itself and its data if it gets SIGKILLed for no
reason.  And in our environment, there is no reason for it to get SIGKILLed.

I'm going to severely hurt the first person that says such a program should be
*expecting* random untrappable annihilation of its threads.  (And what happens
when the master process *is* the target?)

- -- 
Stephen Clouse <[EMAIL PROTECTED]>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOr2XDgOGqGs0PadnEQK0rACfQELDid11+m90bS/DrGyrsHW45ZEAn19G
mL3fSCdi2TeHDxGLA8uXT8l5
=oQPV
-----END PGP SIGNATURE-----



Re: [PATCH] Prevent OOM from killing init

2001-03-22 Stephen Clouse

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Mar 23, 2002 at 01:33:50AM +0100, Martin Dalecki wrote:
> AMEN! TO THIS!
> Uptime of a process is a much better measure for a killing candidate
> than its size.

Thing is, if you take a good look at mm/oom_kill.c, it *does* take start time
into account, as well as CPU time.  The problem is that a process (like Oracle,
in our case) using ludicrous amounts of memory can still rank at the top of the
list, even with the time-based reduction factors, because total VM is the
starting number in the equation for determining what to kill.  Oracle or
whatnot sitting at 80 MB for a day or two will still find a way to outrank the
newly-started 1 MB shell process whose malloc triggered oom_kill in the first
place.

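To make that concrete, here is a rough Python model of the 2.4-era heuristic as described: total VM is the starting number, which is then divided down by CPU time and run time. The specific divisors (square root of CPU seconds, fourth root of run minutes, a further division by 4 for root processes) are recalled from the 2.4 source rather than stated in this message. The sketch also uses exact integer square roots where the kernel uses its own approximation, so check mm/oom_kill.c in your own tree before trusting the numbers. The Oracle inputs come from the ps listing in the 2001-03-24 message (41076 KB VSZ, 0:00 CPU, started about two days earlier). The 4 KB page size is an assumption.

```python
from dataclasses import dataclass
from math import isqrt

PAGE_KB = 4  # assumption: 4 KB pages, as on i386


@dataclass
class Proc:
    name: str
    total_vm_kb: int   # VSZ column from ps
    cpu_seconds: int   # user + system CPU time
    run_minutes: int   # wall-clock time since the process started
    is_root: bool = False


def isqrt_at_least_1(x: int) -> int:
    # Never return 0, so the result is always safe to divide by.
    return max(isqrt(x), 1)


def badness(p: Proc) -> int:
    points = p.total_vm_kb // PAGE_KB                   # size is the starting number
    points //= isqrt_at_least_1(p.cpu_seconds)          # divided by sqrt(CPU time)
    points //= isqrt_at_least_1(isqrt(p.run_minutes))   # and by 4th root of run time
    if p.is_root:
        points //= 4                                    # root processes are favoured
    return points


oracle = Proc("ora_pmon_slash", total_vm_kb=41076, cpu_seconds=0, run_minutes=2 * 24 * 60)
shell = Proc("new shell", total_vm_kb=1024, cpu_seconds=0, run_minutes=0)
print(badness(oracle), badness(shell))  # 1467 256
```

Even after two days of uptime, each idle Oracle background process scores several times higher than the brand-new shell, because the time factors only shave the VM figure down rather than override it.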
If anything, time needs to be a hard criterion for sorting the final list, not
merely a variable in the equation that ends up tied to vmsize.

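Here is a minimal sketch of that proposal. It is purely illustrative and does not claim that any kernel of the era works this way. Age is the primary sort key, so the youngest process is always chosen, and VM size only breaks ties between processes of equal age. The `nightly_backup` entry and its numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    total_vm_kb: int
    run_minutes: int


def pick_victim(cands: List[Candidate]) -> Candidate:
    """Youngest process is chosen; VM size only breaks ties between equals in age."""
    return min(cands, key=lambda c: (c.run_minutes, -c.total_vm_kb))


procs = [
    Candidate("ora_pmon_slash", total_vm_kb=41076, run_minutes=2 * 24 * 60),
    Candidate("nightly_backup", total_vm_kb=60000, run_minutes=5),  # hypothetical
    Candidate("new shell", total_vm_kb=1024, run_minutes=0),
]
print(pick_victim(procs).name)  # new shell
```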
This is why the production database boxen aren't running 2.4 yet.  I can control
Oracle's usage very finely (since it uses a fixed memory pool preallocated at
startup), but if something else fires up on there (like the nightly backup and
maintenance routine) and needs just a pinch more memory
than what's available -- ick.  2.2.x doesn't appear to enforce new memory 
allocation with a sniper rifle -- the new process just suffers a pleasant ("Out
of memory!") or violent (SIGSEGV) death.

- -- 
Stephen Clouse <[EMAIL PROTECTED]>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOrqW3wOGqGs0PadnEQLZUwCfWTr8HwAChQamWWvWWzZcX5DZ8PAAnROB
Ja25OAQu3W1h7Ck0SU/TfKj8
=VlQt
-----END PGP SIGNATURE-----



Re: [PATCH] Prevent OOM from killing init

2001-03-22 Stephen Clouse

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Mar 22, 2001 at 12:47:27PM +0100, Guest section DW wrote:
> Last week I installed SuSE 7.1 somewhere.
> During the install: "VM: killing process rpm",
> leaving the installer rather confused.
> (An empty machine, 256MB, 144MB swap, I think 2.2.18.)
> 
> Last month I had a computer algebra process running for a week.
> Killed. But this computation was the only task this machine had.
> Its sole reason of existence.
> Too bad - zero information out of a week's computation.
> (I think 2.4.0.)
> 
> Clearly, Linux cannot be reliable if any process can be killed
> at any moment. I am not happy at all with my recent experiences.

Really, the whole oom_kill process seems bass-ackwards to me.  I can't logically
justify annihilating large-VM processes that have been running for days or
weeks instead of simply returning ENOMEM to a process that has just started up.

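From userspace, one way to get the "just return ENOMEM" behaviour for a particular program is a per-process address-space cap (RLIMIT_AS). Once the cap is reached, further allocations fail inside the requesting process instead of leaving the decision to oom_kill later. This is a minimal Python sketch, assuming Linux and a 64-bit interpreter. The 1 GB cap and 4 GB request are arbitrary illustrative numbers, and the child's "Out of memory!" message is just its chosen way of failing gracefully.

```python
import subprocess
import sys

# Child program: cap its own address space, then ask for more than the cap.
CHILD = """
import resource, sys
cap = 1024 ** 3                                 # 1 GB of address space
resource.setrlimit(resource.RLIMIT_AS, (cap, cap))
try:
    buf = bytearray(4 * 1024 ** 3)              # 4 GB: cannot fit under the cap
except MemoryError:                             # malloc() failed with ENOMEM
    print("Out of memory!")
    sys.exit(1)
print("allocated")
"""


def run_capped_child():
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True, text=True, timeout=60)
    return result.returncode, result.stdout.strip()


print(run_capped_child())  # (1, 'Out of memory!')
```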
We run Oracle on a development box here, and it's always the first to get the
axe (a non-root process using 70-80 MB VM).  Whenever someone's test code runs
away with memory, I usually spend the rest of the day getting intimate with the
backup files, since SIGKILLing random Oracle processes, as you might have
guessed, tends to wreck the entire database.

It would be nice to give immunity to certain uids, or better yet, just turn the
damn thing off entirely.  I've already hacked that in...errr, out.
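
Per-uid immunity amounts to a filter in front of victim selection. In the hypothetical sketch below, the `EXEMPT_UIDS` set, uid 500 for the Oracle account, and the `(pid, uid, score)` values are all made up for illustration. Exempt processes are never chosen. If nothing eligible remains, the caller is told so, and that is where failing the allocation with ENOMEM would come in.

```python
from typing import Iterable, Optional, Tuple

Proc = Tuple[int, int, int]  # (pid, uid, badness score), hypothetical layout

# Hypothetical policy: processes owned by these uids are never OOM-killed.
EXEMPT_UIDS = frozenset({500})  # e.g. the account Oracle runs under


def choose_victim(procs: Iterable[Proc], exempt: frozenset = EXEMPT_UIDS) -> Optional[int]:
    """Return the pid with the highest score among non-exempt processes."""
    eligible = [p for p in procs if p[1] not in exempt]
    if not eligible:
        return None  # nothing may be killed; fail the allocation instead
    return max(eligible, key=lambda p: p[2])[0]


procs = [(89, 500, 1467), (4242, 1000, 256), (4243, 1000, 12)]
print(choose_victim(procs))  # 4242
```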

- -- 
Stephen Clouse <[EMAIL PROTECTED]>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOrpgbgOGqGs0PadnEQLp5QCfZMwtDZRNwYQ6RJX0MJ8lRVHTj3YAoNlt
pFWT2i+2y+Yze/6EYy9V0oaE
=QIrK
-----END PGP SIGNATURE-----



2.4.0 oops bdflush

2001-01-13 Stephen Clouse

We have a development SMP machine which runs a myriad of server applications for
development purposes -- Apache, Oracle, several others.  Under 2.4.0 the machine
locks up, seemingly at random.  Usually it simply stops responding without
fanfare: oddly enough, you can still switch consoles with Alt+F?, but typing
gets no response and all network services have gone silent.  However, on the
most recent failure I was lucky enough to find that it had managed to spit out
a kernel oops message before biting it, which I have (hopefully) decoded
(properly):

root@fs1:/usr/src/linux.2.4.0# ksymoops -v /usr/src/linux.2.4.0/vmlinux -m \
 /usr/src/linux.2.4.0/System.map -o /lib/modules/2.4.0/
ksymoops 2.3.7 on i686 2.2.18.  Options used
 -v /usr/src/linux.2.4.0/vmlinux (specified)
 -k /proc/ksyms (default)
 -l /proc/modules (default)
 -o /lib/modules/2.4.0/ (specified)
 -m /usr/src/linux.2.4.0/System.map (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Reading Oops report from the terminal
invalid operand: 
CPU:0
EIP:0010:[c012c37e]
EFLAGS: 00010296
eax: 001c   ebx: c1068518   ecx:    edx: 0026
esi: c10684fc   edi: 021c   ebp: 0001   esp: c14f9fa4
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 5, stackpage=c14f9000)
Stack: c01f9865 c01f9a24 0299 c14f8000 c01fb96e  0008e000 
   0004 0040  0110 c01369c2 0007  00010f00
   cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc
Call Trace: [c01369c2] [c0107507]
Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24
invalid operand: 
CPU:0
EIP:0010:[c012c37e]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010296
eax: 001c   ebx: c1068518   ecx:    edx: 0026
esi: c10684fc   edi: 021c   ebp: 0001   esp: c14f9fa4
ds: 0018   es: 0018   ss: 0018
Process bdflush (pid: 5, stackpage=c14f9000)
Stack: c01f9865 c01f9a24 0299 c14f8000 c01fb96e  0008e000 
   0004 0040  0110 c01369c2 0007  00010f00
   cff93f84 cff93fd0 0008e000 c0107507 cff93fbc cff93fbc cff93fbc
Call Trace: [c01369c2] [c0107507]
Code: 0f 0b 83 c4 0c 90 8b 46 14 85 c0 75 19 68 99 02 00 00 68 24

>>EIP; c012c37e <page_launder+716/868>   <=====
Trace; c01369c2 <bdflush+96/dc>
Trace; c0107507 <kernel_thread+23/30>
Code;  c012c37e <page_launder+716/868>
 <_EIP>:
Code;  c012c37e <page_launder+716/868>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c012c380 <page_launder+718/868>
   2:   83 c4 0c                  addl   $0xc,%esp
Code;  c012c383 <page_launder+71b/868>
   5:   90                        nop
Code;  c012c384 <page_launder+71c/868>
   6:   8b 46 14                  movl   0x14(%esi),%eax
Code;  c012c387 <page_launder+71f/868>
   9:   85 c0                     testl  %eax,%eax
Code;  c012c389 <page_launder+721/868>
   b:   75 19                     jne    26 <_EIP+0x26> c012c3a4 <page_launder+73c/868>
Code;  c012c38b <page_launder+723/868>
   d:   68 99 02 00 00            pushl  $0x299
Code;  c012c390 <page_launder+728/868>
  12:   68 24 00 00 00            pushl  $0x24

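For anyone puzzling over the decoded lines: ksymoops turns each raw address into `symbol+offset/size` using the symbol table passed with `-m` (System.map). The offset is the distance from the start of the enclosing function and the size is that function's length, both in hex. The small Python sketch below reproduces that lookup in the same printed form. The three start addresses are back-computed from the decoded output above (address minus offset), not read from a real System.map, and the lookup is an illustration rather than ksymoops' actual code.

```python
from bisect import bisect_right

# (start address, size, name), sorted by start. Start = address - offset from
# the decode: c0107507 - 0x23, c012c37e - 0x716, c01369c2 - 0x96.
SYMBOLS = [
    (0xC01074E4, 0x30, "kernel_thread"),
    (0xC012BC68, 0x868, "page_launder"),
    (0xC013692C, 0xDC, "bdflush"),
]
STARTS = [start for start, _, _ in SYMBOLS]


def resolve(addr: int) -> str:
    """Return name+offset/size in hex, or the bare address if no symbol covers it."""
    i = bisect_right(STARTS, addr) - 1  # last symbol starting at or below addr
    if i < 0:
        return f"{addr:08x}"
    start, size, name = SYMBOLS[i]
    if addr >= start + size:
        return f"{addr:08x}"
    return f"{name}+{addr - start:x}/{size:x}"


for a in (0xC012C37E, 0xC01369C2, 0xC0107507, 0xC012C3A4):
    print(f"{a:08x} <{resolve(a)}>")
```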
This machine has been running flawlessly on 2.2.18 for weeks now, which seems to
preclude a hardware issue.  And since I've been personally running 2.4.0 on my
uniprocessor machine since day one without incident, I suspect some bizarre
interaction in SMP-land.  But I'm hardly a kernel programmer.

Unfortunately I can't find exact specs on the machine; it's a Dell Precision 420,
most likely built with the hardware du jour about six months ago.  The config
options used are below:

CONFIG_X86=y
CONFIG_ISA=y
CONFIG_UID16=y
CONFIG_M686FXSR=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_FXSR=y
CONFIG_X86_XMM=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_NOHIGHMEM=y
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_NET=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
CONFIG_BLK_DEV_FD=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK=y
CONFIG_FILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_SYN_COOKIES=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDETAPE=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_PIIX=y
CONFIG_PIIX_TUNING=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y