Re: Linux RAS

2000-09-15 Thread Matt D. Robinson

I'd also want the default kernel build to create a symbol table namelist
object that gets installed into $(INSTALL_PATH) that correlates to the
kernel build.  That way you build a symbol table mechanism for user-space
applications that want more complete kernel debug information, but do it
without bloating the kernel with all that gstabs data (which is
duplicated many times over if you turn on -gstabs for an entire build).

CONFIG_??* options are good to know about, but what really matters to
me during debugging is how those CONFIG_??* settings actually change
structure definitions.  And the only way to really know that is to
have the gstabs data outlining what is in the kernel.

How does that sound?

--Matt

Daniel Phillips wrote:
> 
> Keith Owens wrote:
> > * Standardize on tracking the System.map and .config with the kernel.
> 
> There was a suggestion from Alan Cox that .config.gz be appended to
> bzImage, after the part that gets loaded into memory, to which I added
> the suggestion that System.map.gz also be appended.  That about takes
> care of all the descriptive kernel information that normally gets out
> of sync.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Linux RAS

2000-09-15 Thread richardj_moore



The comparison I was making was with OS/2, not MVS, because:

1) All too often MVS is cited as the paradigm for RAS when in fact
there are special architectural features, as you've pointed out, that might
detract from a generalised comparison.
2) OS/2 is an x86-based OS, so, like Linux, it has the problem of not being
locked into a tightly controlled H/W architecture.
3) We completely reversed the serviceability situation for OS/2 and even
exceeded the RAS capabilities of MVS in the area of dynamic tracing.


Richard Moore -  RAS Project Lead - Linux Technology Centre.

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
PISC, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK


Keith Owens <[EMAIL PROTECTED]> on 15-09-2000 01:07:40 PM

Please respond to Keith Owens <[EMAIL PROTECTED]>

To:   Richard J Moore/UK/IBM@IBMGB
cc:   [EMAIL PROTECTED]
Subject:  Linux RAS





Re: Linux RAS

2000-09-15 Thread Daniel Phillips

Keith Owens wrote:
> * Standardize on tracking the System.map and .config with the kernel.

There was a suggestion from Alan Cox that .config.gz be appended to
bzImage, after the part that gets loaded into memory, to which I added
the suggestion that System.map.gz also be appended.  That about takes
care of all the descriptive kernel information that normally gets out
of sync.

--
Daniel



Linux RAS

2000-09-15 Thread Keith Owens

On Fri, 15 Sep 2000 10:42:54 +0100, 
[EMAIL PROTECTED] wrote:
>I think  the case for the kernel debugger is better stated as the case for
>RAS (Reliability, Serviceability and Availability) in the kernel [snip]
>Customers knew we never sent
>developers on site to debug OS/390 (or MVS as it was called in those days).
>They also knew that we never rejected a problem because it was not
>re-creatable and we never even asked for a re-creation scenario. The reason
>for this was that we had appropriate RAS capability in MVS which allowed
>data to be captured automatically at fault time combined with a certain
>amount of self-healing capability and automated recovery.

It would be nice if Linux had the same level of RAS as MVS.  But
consider the different environments that MVS and Linux run under.

MVS.    Standard and external I/O controllers.  Fix the data pages,
build the CCW, issue SIO, forget about it until it interrupts.
I/O requires little or no OS support, it can even be done from
bare hardware.

Linux.  50+ different I/O controllers.  Most require continual hand
holding by the operating system.  Doing I/O from bare hardware
is difficult if not impossible.

MVS.    Multiple system keys (0-7) so different system components can
be segregated.  If VTAM (key 7) dies the rest of the OS (keys
0-6) can run.  Multiple system keys stop most runaway
components from stamping on other component's data.  Key
switching is relatively cheap.

Linux.  On ix86, all of the kernel runs in key 0.  No protection of one
part of the kernel from another.  Key switching is relatively
expensive.

MVS.    High quality hardware.  Processors that can detect their own
problems, all memory is ECC or better, disks that report
failures before they occur.

Linux.  Most users are on the cheapest possible hardware.  No parity on
memory, processors and disks that just die without warning.
Lots of random problems, "one bit different, probably bad RAM".

MVS.    Trace of interesting events in the master trace table.

Linux.  No kernel trace table (oops backtrace does *not* count).

MVS.    Only one hardware model - S/390 with minimum hardware
requirements for each version of OS/390.  IBM can get away with
telling customers "to run OS/390 2.9 you need a G5 processor or
better".

Linux.  12+ different hardware models, with no restriction on how old
the hardware is.  We cannot tell customers "to run Linux 2.4
you must have a Pentium III or better".

MVS.    If all else fails, MVS can do a stand alone IPL to capture the
crash data.  It requires that a site plan ahead by building
stand alone IPL text and dedicating an area of disk to receive
the crash dump but apart from that it always works.

Linux.  Assuming that your I/O subsystem will run without a working OS
(and lkcd shows how difficult that is), you still need a
mechanism to get the stand alone dump going.  On our most common
platform (ix86) we have to pray that the BIOS will not wipe
memory before saving the crash dump.  If the BIOS is not
reliable (and most are not) then the stand alone code must be
built into the kernel and protected somehow.  Then you have the
problem of activating the stand alone dump.

MVS.    Decent hardware support for OS tracing, with Program Event
Registers (PER) that are standardized and easy to use.

Linux.  Each platform has different hardware mechanisms for hardware
debugging.  Some have no hardware debugging at all.

MVS.    The OS developers know everything about the hardware.  They
even know the undocumented processor and I/O subsystem
commands.

Linux.  Nobody knows everything about every bit of hardware that is
supported.

MVS.    Decent hardware support for cheap cross memory services.  OS
data can be separated into multiple data spaces, separating
unrelated data.

Linux.  Where is data for subsystem XYZ?  Somewhere in the kernel,
along with every other subsystem's data.

MVS.    Much of the I/O is controlled by user space applications.  MVS
is just a low level provider of I/O services and relies on
complicated code in user applications to do data sharing.  This
is changing with Unix System Services and HFS data spaces but I
suspect that the bulk of data movement is still being done by
batch, IMS, CICS and VSAM data spaces, all controlled from user
land.

Linux.  All I/O is done by the kernel so the kernel contains all the
complicated code to maintain data integrity.

MVS.    IBM controls the levels of software in a system.  SMP (System
Management/Mangling Product) may have its faults but as a tool
for building and tracking complicated binaries it makes RPM,
DEB etc. look pitiful.
