Re: I/O tracing
You can use dprobes to set up a tracepoint dynamically anywhere you please; see:
http://oss.software.ibm.com/developerworks/opensource/linux/projects/dprobes

Or you can use gkhi to define a hook anywhere in the kernel you please. You can write a hook exit as a kmod to do whatever you fancy and have it activate at a time of your choosing. See:
http://oss.software.ibm.com/developerworks/opensource/linux/projects/gkhi

Or you can investigate some of the standard tracepoints that Linux Trace Toolkit creates; see:
http://www.opersys.com/

And that's only three of many.

Richard Moore - RAS Project Lead - Linux Technology Centre (ATS-PIC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

"YU,SAMMY (HP-Roseville,ex1)" <[EMAIL PROTECTED]> on 04/06/2001 19:37:23
Please respond to "YU,SAMMY (HP-Roseville,ex1)" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc:
Subject: I/O tracing

Hi,

Please CC me as I'm not subscribed to the list, thanks. I'm not sure if this is the appropriate forum. Is there an existing tool/module for capturing all the I/O requests, with fields such as:

Unique Identifier
Start Time
End Time
Device Identifier
Operation Type (Read Or Write)
Offset
Length (Number Of Bytes)

I am aware of the existing /proc/disks and partitions, but these aren't real time. If not, are there any facilities in the kernel I can put a hook in to keep track of the I/O?

Thanks in advance,
Sammy Yu
Hewlett-Packard

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
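For illustration only, the seven fields listed in the question could be carried in a record like the one sketched below. This is not the output format of dprobes, gkhi, or Linux Trace Toolkit; the names `IoTraceRecord` and `Op`, the "major:minor" device string, and the use of seconds for timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    READ = "R"
    WRITE = "W"


@dataclass(frozen=True)
class IoTraceRecord:
    """One completed I/O request (hypothetical layout, fields from the question)."""
    request_id: int    # unique identifier
    start_time: float  # seconds, when the request was issued
    end_time: float    # seconds, when the request completed
    device: str        # device identifier, here assumed to be "major:minor"
    op: Op             # read or write
    offset: int        # byte offset on the device
    length: int        # number of bytes transferred

    @property
    def latency(self) -> float:
        """Elapsed time between issue and completion, in seconds."""
        return self.end_time - self.start_time


rec = IoTraceRecord(1, 10.000, 10.004, "8:0", Op.WRITE, 4096, 8192)
print(f"{rec.device} {rec.op.value} off={rec.offset} len={rec.length} "
      f"latency={rec.latency * 1000:.1f} ms")
```

Whatever hook mechanism is chosen, the hook code would need to fill in the start fields when a request is issued and the end time when it completes.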
Re: Linux kernel programming for beginners
Yes, try the O'Reilly books, especially Linux Device Drivers by Rubini, ISBN 1-56592-292-1.

Richard

Richard Moore - RAS Project Lead - Linux Technology Centre (ATS-PIC).
http://oss.software.ibm.com/developerworks/opensource/linux
Re: Dynamically altering code segments
Dprobes is one mechanism for doing what you want. It works the same way OS/2 dynamic trace did.

Another mechanism, also available from the dprobes web page, is the GKHI (generalised kernel hooks interface). If you know you want tracepoints in permanently assigned locations then you could code a gkhi hook in the kernel, which is essentially two jmps. When the hook is inactive the first jmp bypasses the second, which jumps to the hook exit dispatcher routine. When active, the first jmp uses a zero offset. If you use the gkhi you'll need to write your own hook exits, which presumably will trace data and drop it into a trace buffer of your own making. Again, if you do decide to use gkhi, please wait for 1.0 to be dropped next week sometime.

If you go down the dprobes route you'll see that it inter-operates with Linux Trace Toolkit to give you a dynamic tracing capability for Linux (user and kernel space). We're currently working on custom formatting for raw trace data events created by dprobes. If you're familiar with OS/2 then TRCUST might mean something to you in connection with custom formatting.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux

Andreas Dilger <[EMAIL PROTECTED]> on 27/02/2001 17:05:37
Please respond to Andreas Dilger <[EMAIL PROTECTED]>
To: "Collins, Tom" <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Subject: Re: Dynamically altering code segments

Tom Collins writes:
> I am wanting to dynamically modify the kernel in specific places to
> implement a custom kernel trace mechanism. The general idea is that,
> when the "trace" is off, there are NOP instruction sequences at various
> places in the kernel. When the "trace" is turned on, those same NOPs
> are replaced by JMPs to code that implements the trace (such as logging
> events, using the MSR and PMC's etc..).
>
> This was a trick that was done in my old days of OS/2 performance tools
> development to get trace information from the running kernel.
>
> Is it possible to do the same thing in Linux?

See IBM "dprobes" project. It is basically what you are describing (AFAIK). It makes sense, because a lot of the OS/2 folks are now working on Linux.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert
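The two-jmp hook described in the reply above can be modelled in a few lines. The sketch below is a user-space simulation, not gkhi code. It assumes the first jmp is an x86 short jump (`EB rel8`) and the second a near jump (`E9 rel32`); the message does not say which encodings gkhi actually uses, and `make_hook`, `set_active`, and `trace_path` are hypothetical names.

```python
import struct

JMP_SHORT = 0xEB  # x86 "jmp rel8", 2 bytes
JMP_NEAR = 0xE9   # x86 "jmp rel32", 5 bytes
NEAR_LEN = 5


def make_hook(dispatcher_rel32: int) -> bytearray:
    """Build an inactive hook: the short jmp skips over the near jmp."""
    return bytearray([JMP_SHORT, NEAR_LEN, JMP_NEAR]) + bytearray(
        struct.pack("<i", dispatcher_rel32))


def set_active(hook: bytearray, active: bool) -> None:
    """Arm or disarm the hook by rewriting only the rel8 offset byte."""
    hook[1] = 0 if active else NEAR_LEN


def trace_path(hook: bytearray) -> str:
    """Follow the first jmp and report where control ends up."""
    assert hook[0] == JMP_SHORT
    ip = 2 + hook[1]  # rel8 is relative to the end of the short jmp
    if ip < len(hook) and hook[ip] == JMP_NEAR:
        return "dispatcher"   # zero offset: fall into the second jmp
    return "fallthrough"      # second jmp bypassed, normal code continues


hook = make_hook(0x100)       # 0x100 is an arbitrary displacement
print(trace_path(hook))       # hook inactive
set_active(hook, True)
print(trace_path(hook))       # hook active
```

In this model only one byte differs between the active and inactive states, which is the property the "zero offset" remark points at.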
Re: monitoring I/O
I can offer the GKHI we put together to make kernel hooks easy to add and manage. If you know which code paths you need to peek at then you can write your monitor as a kernel mod - user mod pair. The kernel mod will accumulate the stats, the user mod will extract and report the stats. See the web page below if you're interested - but note we're very shortly to release a new version of the GKHI.

Another option is to use dynamic probes - this will require no kernel modifications, but again you have to know exactly where you want to place the probes. Again see the web page below for details.

Richard

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux

Daniel Kobras <[EMAIL PROTECTED]> on 24/01/2001 11:57:45
Please respond to Daniel Kobras <[EMAIL PROTECTED]>
To: Michael McLeod <[EMAIL PROTECTED]>
cc: Nicholas Dronen <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: monitoring I/O

On Tue, 23 Jan 2001, Nicholas Dronen wrote:
> Check out the disk_io field in /proc/stat.

Which unfortunately provides only some pieces of the information Michael wants to gather. SCT's sard patches give you much improved statistics that should basically do what you want. I'm not sure of the current location of the sard patches, but as RedHat puts sard in its kernel, it should be available somewhere on redhat.com, I suppose. Check out the sysstat package for userlevel tools. Earlier versions of sard can be found at ftp.uk.linux.org/pub/linux/sct/fs/profiling/

> On Wed, Jan 24, 2001 at 11:52:36AM +1100, Michael McLeod wrote:
> > I am hoping someone can give me a little information or point me in the
> > right direction. I would like to write an application that monitors I/O
> > on a linux machine, but I need some help in determining where to get the
> > information I'm looking for. What I would like to do is 'hook' into the
> > kernel and record information such as volume name, type of request (read
> > or write), the amount of data being read or written, how long each
> > transaction takes

Regards,
Daniel.
--
GNU/Linux Audio Mechanics - http://www.glame.de
Cutting Edge Office - http://www.c10a02.de
GPG Key ID 89BF7E2B - http://www.keyserver.net
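The "accumulate in one place, extract and report in another" split suggested in the reply can be sketched in user space as follows. This is only a model of the bookkeeping, assuming something else (a hook exit, for instance) calls `record` once per completed request; the class `IoStats` and its methods are hypothetical and are not part of GKHI, dprobes, or sard.

```python
from collections import defaultdict


class IoStats:
    """Accumulates per-volume, per-operation totals (hypothetical sketch)."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"requests": 0, "bytes": 0, "seconds": 0.0})

    def record(self, volume: str, op: str, nbytes: int, seconds: float) -> None:
        """Accumulation side: called once per completed request."""
        t = self._totals[(volume, op)]
        t["requests"] += 1
        t["bytes"] += nbytes
        t["seconds"] += seconds

    def report(self) -> list:
        """Reporting side: one summary line per (volume, operation) pair."""
        lines = []
        for (volume, op), t in sorted(self._totals.items()):
            avg_ms = 1000.0 * t["seconds"] / t["requests"]
            lines.append(f"{volume} {op}: {t['requests']} requests, "
                         f"{t['bytes']} bytes, avg {avg_ms:.2f} ms")
        return lines


stats = IoStats()
stats.record("hda1", "read", 4096, 0.002)
stats.record("hda1", "read", 8192, 0.004)
stats.record("hda1", "write", 1024, 0.010)
print("\n".join(stats.report()))
```

Keeping only running totals on the accumulation side keeps the per-request cost small; anything more detailed, such as per-request timing, would need a trace buffer instead.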
Re: Anybody got 2.4.0 running on a 386 ?
Does Linux cater for all the old 386 chip bugs - especially the memory management oddities?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Re: Question about RTC interrupts on i386
You can get some interesting side effects if you increase the clock speed. I'm not saying that Linux will suffer, but I have seen problems on other Intel-based systems - it all depends on what you do with the clock interrupt. Increasing the speed will give a finer grained pre-emption capability.

I assume you're talking about the free-running timer on IRQ0 and not the TOD clock on IRQ8 - both of these are driven from the same chip. If this is the case then the problems I referred to arise when the PIC is programmed in strict priority order. IRQ0 will be the highest priority interrupt, which means lower priority devices that are running asynchronously may overrun inbound because they can't get their interrupts serviced quickly enough. For server or desktop use you want your high priority interrupts to be infrequently occurring. Real-time systems may legitimately have a different requirement.

I'm not sure there's any particular advantage to the TOD clock on IRQ 8.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux

Lee Reynolds <[EMAIL PROTECTED]> on 15/12/2000 04:04:04
Please respond to Lee Reynolds <[EMAIL PROTECTED]>
To: Linux Kernel Maillist <[EMAIL PROTECTED]>
cc:
Subject: Question about RTC interrupts on i386

I'm reading the book Linux Internals by Moshe Bar. Early on he describes the use of the real time clock to generate an interrupt 100 times a second. He explains that this value was chosen early in the development cycle of the Linux kernel and is therefore relatively low compared to what current hardware can make good use of. He mentions that the alpha port of Linux uses a 1024Hz interrupt rate and that patches have been made for the Intel kernel to give it the same rate while maintaining the interrupt rate that appears to userland programs such as top at 100Hz.

I'm just wondering what the benefits of increasing this value are and whether these patches are going to be included in 2.4?

Thanks,
Lee Reynolds
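To put numbers on the two rates mentioned in the question, the sketch below works out the tick period at 100 Hz and 1024 Hz, and shows one way a kernel tick count could be scaled so that userland still sees 100 ticks per second. The scaling function is an assumption for illustration; it is not taken from the patches the question refers to, and the names are hypothetical.

```python
def tick_period_ms(hz: int) -> float:
    """Time between timer interrupts, in milliseconds."""
    return 1000.0 / hz


def to_user_ticks(kernel_ticks: int, kernel_hz: int, user_hz: int = 100) -> int:
    """Scale a kernel tick count to the rate reported to userland."""
    return kernel_ticks * user_hz // kernel_hz


for hz in (100, 1024):
    print(f"HZ={hz}: one tick every {tick_period_ms(hz):.3f} ms")

# One second of 1024 Hz kernel ticks reported at 100 Hz:
print(to_user_ticks(1024, 1024))
```

The finer pre-emption granularity at 1024 Hz is bought with roughly ten times as many IRQ0 interrupts per second, which is where the priority concern raised in the reply comes in.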
Re: cpu stepping
Stepping is Intel terminology for chip revisions - possibly other manufacturers have used the same terminology. Intel documents the fixes, or rather the bugs and work-arounds, for each stepping level in the addendum to the particular processor's reference manual. Probably (I haven't checked) this info is available in PDF format from the Intel website (http://www.intel.com). IBM terminology for the equivalent of stepping level is EC level (engineering change level).

To understand the fine detail of the Intel stepping levels, in particular the work-arounds, you'll need to be familiar with the processor architecture to the extent an assembler programmer would be.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux

"Jon Hulatt" <[EMAIL PROTECTED]> on 12/12/2000 10:45:50
Please respond to "Jon Hulatt" <[EMAIL PROTECTED]>
To: "Linux Kernel" <[EMAIL PROTECTED]>
cc:
Subject: cpu stepping

hi,

sorry to ask this here but i'm finding difficulty getting this info elsewhere...

I'm not an assembly programmer and i know little about cpu's. it's a hole in my knowledge i guess. i'm looking for some technical introduction doc to explain what diff. aspects of cpu do, what is stepping and all that. especially for intel but also for other architectures.

Thanks
Jon
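As a concrete illustration of where the stepping level shows up, on x86 processors that support the CPUID instruction the signature returned in EAX by CPUID function 1 carries the stepping in its low four bits, with model and family above it. The sketch below decodes only those three basic fields and ignores the extended family and model bits; the function name and the sample value are chosen for the example and are not from the message above.

```python
def decode_signature(eax: int) -> dict:
    """Split a CPUID function 1 signature into its basic fields."""
    return {
        "family": (eax >> 8) & 0xF,    # bits 11..8
        "model": (eax >> 4) & 0xF,     # bits 7..4
        "stepping": eax & 0xF,         # bits 3..0
    }


sig = decode_signature(0x683)  # sample value, picked for illustration
print(f"family {sig['family']}, model {sig['model']}, stepping {sig['stepping']}")
```

Two chips with the same family and model but different stepping are the same design at different revision levels, which is why the errata documents are organised by stepping.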
Re: Why is double_fault serviced by a trap gate?
I agree. I've changed my mind about the use of a task gate for NMI. Intel recommend an interrupt gate for a very good reason: NMIs are queued until the IRET, so using an interrupt gate for NMI (and keeping interrupts disabled) will guarantee that NMIs are handled serially.

I think our use of a trap gate for NMI in OS/2 was probably not the best idea.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux

Keith Owens <[EMAIL PROTECTED]> on 08/12/2000 22:34:49
Please respond to Keith Owens <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc: Richard J Moore/UK/IBM@IBMGB, Brian Gerst <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>, "Maciej W. Rozycki" <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: Why is double_fault serviced by a trap gate?

On Fri, 8 Dec 2000 07:58:06 -0500 (EST), "Richard B. Johnson" <[EMAIL PROTECTED]> wrote:
>Too many people just want to argue without even reading what they
>are arguing against. Again, I implied nothing. I said;
>
> (1) User traps, CPL3, stack for trap is in CPL0.
> (2) CPL0 has stack-fault (bad ring zero code, bad memory).
> (3) CPL0 traps, using faulted stack, double fault.
> (4) There is no stack-trick, including a call-gate to another
>     "environment" (complete with its previously-reserved stack),
>     that will ever get you back to (2), much less to (1).

Nobody thinks that a stack overflow is recoverable - for that process. By the time you overflow, the struct task at the bottom of the kernel stack has been overwritten so the process is dead, gone to meet its maker, it is pushing up daisies. The rest of the system may or may not recover, depending on the resources that the dead process is still holding and the links between processes.

Changing the stack overflow to a trap gate will give us diagnostics on the failing task instead of an immediate triple fault and reboot. Diagnostics are useful. If the system can recover afterwards then that is a bonus but it is not guaranteed. The process is always unrecoverable.

I am not convinced that using a trap gate for NMI is a good idea, the NMI watchdog kicks in too often for my liking. Using a trap gate for a debugger would be worthwhile, I have always been worried about the amount of stack that kdb uses.
Re: Why is double_fault serviced by a trap gate?
Exactly, and you wouldn't set DPL=3 for interrupt 8 since a double-fault can only occur from ring 0.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux

Mikulas Patocka <[EMAIL PROTECTED]> on 08/12/2000 20:31:59
Please respond to Mikulas Patocka <[EMAIL PROTECTED]>
To: Richard J Moore/UK/IBM@IBMGB
cc: [EMAIL PROTECTED], Brian Gerst <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>, "Maciej W. Rozycki" <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: Why is double_fault serviced by a trap gate?

> No no. That's the whole point of a gate. You make a controlled
> transition to ring 0 including stack switching. There are complex
> protection checking rules, however as long as the DPL of the gate
> descriptor is 3 then ring 3 is allowed to make the transition to ring 0. A
> stack fault in user mode cannot kill the system. If it ever did it would be
> a blatant bug of the most crass kind.

Setting DPL == 3 of any interrupt/trap/fault gate is a bad idea because it allows the user to kill the machine with INT 8 or something like that. DPL is checked only if the interrupt is generated with INT, INT3 or INTO (IA manual, vol 3, section 5.10.1.1).

Mikulas
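The DPL rule quoted from the IA manual above can be written down as a small model. The sketch decodes the access byte of a 32-bit IDT gate descriptor (present bit, DPL, gate type) and applies the rule that the gate DPL is compared with the CPL only for software-generated interrupts (INT n, INT3, INTO). It is a simplified model, not kernel code, and the function names are hypothetical.

```python
GATE_TYPES = {0x5: "task", 0xE: "interrupt", 0xF: "trap"}  # 32-bit gate types


def decode_access(access: int) -> dict:
    """Decode the access byte of an IDT gate descriptor."""
    return {
        "present": bool(access & 0x80),               # bit 7
        "dpl": (access >> 5) & 0x3,                   # bits 6..5
        "type": GATE_TYPES.get(access & 0xF, "other"),  # bits 3..0
    }


def delivery_allowed(cpl: int, gate_dpl: int, software_int: bool) -> bool:
    """Gate DPL is checked only for INT n / INT3 / INTO; faults and
    hardware interrupts are delivered regardless of the gate's DPL."""
    if not software_int:
        return True
    return cpl <= gate_dpl


gate = decode_access(0x8F)  # present, DPL 0, trap gate
print(gate)
print(delivery_allowed(cpl=3, gate_dpl=gate["dpl"], software_int=True))   # user "INT 8"
print(delivery_allowed(cpl=3, gate_dpl=gate["dpl"], software_int=False))  # real fault
```

With DPL 0 on vector 8, a user-mode `INT 8` is refused while a genuine double fault is still delivered, which matches the point made in both messages.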
Re: Why is double_fault serviced by a trap gate?
Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK -- Forwarded by Richard J Moore/UK/IBM on 08/12/2000 13:17 --- To: [EMAIL PROTECTED] cc: From: Richard J Moore/UK/IBM@IBMGB Subject: Re: Why is double_fault serviced by a trap gate? Importance: Normal I'm sorry, I still don't see your point. You have a double-fault in R0 running on the normal R0 stack, I presume. If you don't handle exception 8 with a task gate then this automatically becomes a triple-fault, the processor resets and we get no information about what's happened. My point is that the double-fault code is a waste of time unless you use a task gate. If you're not going to do that then just leave IDT 8 as an invalid descriptor. As far as arguing without reading what you've written, that's not the case. You're using very abbreviated language, it's not obvious to me what you're driving at - I have to fill in the gaps and guess. What do you mean by "stack-trick"? Why can't recovery be sufficient at least to give meaningful diagnostic information? Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK "Richard B. Johnson" <[EMAIL PROTECTED]> on 08/12/2000 12:58:06 Please respond to [EMAIL PROTECTED] To: Richard J Moore/UK/IBM@IBMGB cc: Brian Gerst <[EMAIL PROTECTED]>, Andi Kleen <[EMAIL PROTECTED]>, "Maciej W. Rozycki" <[EMAIL PROTECTED]>, [EMAIL PROTECTED] Subject: Re: Why is double_fault serviced by a trap gate? On Fri, 8 Dec 2000 [EMAIL PROTECTED] wrote: > > > No no. That's the whole point of a gate. You make a controlled > transition to ring 0 including stack switching. 
There are complex > protection checking rules, however as long as the DPL of the gate > descriptor is 3 then ring 3 is allowed to make the transition to ring 0. A > stack fault in user mode cannot kill the system. If it ever did it would be > a blatant bug of the most crass kind. > > You seem to be implying that a stack fault in R3 will or could cause a > stack fault in R0 - why? Each thread has its own R0 stack. The values for > R0 SS:ESP are taken from the current (H/W) TSS and get initial values at > the top of the stack. > Read my lips. I implied no such thing. The user trap to kernel was just a way to get to the kernel, i.e., "system call". Otherwise you don't have anything to "get back to". Too many people just want to argue without even reading what they are arguing against. Again, I implied nothing. I said: (1) User traps, CPL3, stack for trap is in CPL0. (2) CPL0 has stack-fault (bad ring zero code, bad memory). (3) CPL0 traps, using faulted stack, double fault. (4) There is no stack-trick, including a call-gate to another "environment" (complete with its previously-reserved stack), that will ever get you back to (2), much less to (1). Now, if you can't read this, don't argue. > > "Richard B. Johnson" <[EMAIL PROTECTED]> on 08/12/2000 01:36:58 > > Please respond to [EMAIL PROTECTED] > > To: Brian Gerst <[EMAIL PROTECTED]> > cc: Richard J Moore/UK/IBM@IBMGB, Andi Kleen <[EMAIL PROTECTED]>, "Maciej W. > Rozycki" <[EMAIL PROTECTED]>, [EMAIL PROTECTED] > Subject: Re: Why is double_fault serviced by a trap gate? > > > > > On Thu, 7 Dec 2000, Brian Gerst wrote: > > > "Richard B. Johnson" wrote: > > > > > > On Thu, 7 Dec 2000 [EMAIL PROTECTED] wrote: > > > > > > > > > > > > > > > Which surely we can on today's x86 systems. Even back in the days of > OS/2 > > > > 2.0 running on a 386 with 4MB RAM we used a taskgate for both NMI and > > > > Double Fault. 
You need only a minimal stack - 1K, sufficient to save > state > > > > and restore ESP to a known point before switching back to the main > TSS to > > > > allow normal exception handling to occur. > > > > > > > > There's no architectural restriction that some folks have hinted at - > as long > > > > as the DPL for the task gates is 3. > > > > > > > [SNIPPED...] > > > > > > Please refer to page 6-16, Intel486 Microprocessor Family Programmer's > > > Reference Manual. > > > > > > The specific text is: "The TSS does not have a stack pointer for a > > > privilege level 3 stack, because the procedure cannot be called by a > less > > > privileged procedure. The stack for privilege level 3 is preserved by > the > > > contents of SS and EIP registers which have been saved on the stack > > > of the privilege level called from level 3". > > > > > > What this means is that a stack-fault in level 3 will kill you no > > > matter how cute you try to be. And, putting a task gate as call > > > procedure entry from a trap or
Re: Why is double_fault serviced by a trap gate?
No no. That's the whole point of a gate. You make a controlled transition to ring 0 including stack switching. There are complex protection checking rules, however as long as the DPL of the gate descriptor is 3 then ring 3 is allowed to make the transition to ring 0. A stack fault in user mode cannot kill the system. If it ever did it would be a blatant bug of the most crass kind. You seem to be implying that a stack fault in R3 will or could cause a stack fault in R0 - why? Each thread has its own R0 stack. The values for R0 SS:ESP are taken from the current (H/W) TSS and get initial values at the top of the stack. Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK "Richard B. Johnson" <[EMAIL PROTECTED]> on 08/12/2000 01:36:58 Please respond to [EMAIL PROTECTED] To: Brian Gerst <[EMAIL PROTECTED]> cc: Richard J Moore/UK/IBM@IBMGB, Andi Kleen <[EMAIL PROTECTED]>, "Maciej W. Rozycki" <[EMAIL PROTECTED]>, [EMAIL PROTECTED] Subject: Re: Why is double_fault serviced by a trap gate? On Thu, 7 Dec 2000, Brian Gerst wrote: > "Richard B. Johnson" wrote: > > > > On Thu, 7 Dec 2000 [EMAIL PROTECTED] wrote: > > > > > > > > > > > Which surely we can on today's x86 systems. Even back in the days of OS/2 > > > 2.0 running on a 386 with 4MB RAM we used a taskgate for both NMI and > > > Double Fault. You need only a minimal stack - 1K, sufficient to save state > > > and restore ESP to a known point before switching back to the main TSS to > > > allow normal exception handling to occur. > > > > > > There's no architectural restriction that some folks have hinted at - as long > > > as the DPL for the task gates is 3. > > > > > [SNIPPED...] > > > > Please refer to page 6-16, Intel486 Microprocessor Family Programmer's > > Reference Manual. 
> > > > The specific text is: "The TSS does not have a stack pointer for a > > privilege level 3 stack, because the procedure cannot be called by a less > > privileged procedure. The stack for privilege level 3 is preserved by the > > contents of SS and EIP registers which have been saved on the stack > > of the privilege level called from level 3". > > > > What this means is that a stack-fault in level 3 will kill you no > > matter how cute you try to be. And, putting a task gate as call > > procedure entry from a trap or fault is just trying to be cute. > > It's extra code that will result in the same processor reset. > > No, because the CPL of the task gate would be 0, which means the stack > will be set to tss->esp0. The DPL of 3 means that the descriptor can be > accessed from CPL3. The text you mention generally means that the only > way to get back to CPL3 is with iret (via the saved %cs:%eip and > %ss:%esp pushed on the CPL0/1/2 stack). > > -- > It is yes, not no. (1) User traps, CPL3, stack for trap is in CPL0. (2) CPL0 has stack-fault (bad ring zero code, bad memory). (3) CPL0 traps, using faulted stack, double fault. (4) There is no stack-trick, including a call-gate to another "environment" (complete with its previously-reserved stack), that will ever get you back to (2), much less to (1). I am not denying the possibility of "warm-booting", i.e., relocate some code to where there is a 1:1 physical to virtual translation, jump to the relocated code, disable paging, restart kernel code, and possibly examine what happened. You just have to get back to "flat-mode" with no paging to handle anything beyond a double fault. You are just not going to be able to restart from the stack-faulted code. Cheers, Dick Johnson Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk.
Re: who is writing to disk
One option, if there's no bespoke mechanism, is to use DProbes and/or Linux Trace Toolkit to set up a trace of file system APIs. You could also start with strace. Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK Zhiruo Cao <[EMAIL PROTECTED]> on 08/12/2000 02:25:03 Please respond to Zhiruo Cao <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] cc: [EMAIL PROTECTED] Subject: who is writing to disk Hello, I found a process constantly writing to disk when I run gnome as desktop and while the whole system is idle. I don't find anything in the log file, and I don't see anything updated in my home dir or in /tmp. Does it sound like bdflush is writing? But I don't hear the disk access when I am not running gnome. My question then is, is there a (monitoring) tool that can tell me who is writing to disk? Or how do I configure the kernel to know that? Thanks! Joe
Re: Why is double_fault serviced by a trap gate?
Yes, indeed this is the point - we should at least be able to report the problem even if we can't recover - and we should do that in the standard kernel. It doesn't seem right to convert a bad problem into an unfathomable disaster, which is what a trap gate for double-fault does. If you're going to do that then why bother to set up a trap gate, just leave IDT vector 8 as an invalid descriptor. As it stands, the do_double_fault routine is otiose. Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK Keith Owens <[EMAIL PROTECTED]> on 07/12/2000 22:47:42 Please respond to Keith Owens <[EMAIL PROTECTED]> To: Richard J Moore/UK/IBM@IBMGB cc: Andi Kleen <[EMAIL PROTECTED]>, [EMAIL PROTECTED], "Maciej W. Rozycki" <[EMAIL PROTECTED]>, [EMAIL PROTECTED] Subject: Re: Why is double_fault serviced by a trap gate? On Thu, 7 Dec 2000 21:09:47 +, [EMAIL PROTECTED] wrote: >In summary I'd say the lack of a task gate is at the very least an >oversight, if not a bug. > >If no one else wants to do it I'll see if I can code up the task gates for >the double-fault and NMI. If you overflow the kernel stack then you have already scribbled on the process state at the low end of the kernel stack pages. The process is definitely not recoverable but you might not even be able to recover the machine. Corrupt p_opptr and friends, thread_group or pidhash and other processes can be affected when they follow the chains. However being able to report the error is a good start, even if you cannot recover. If you add task gates, assign enough stack space for debuggers. kdb does a lot of work when NMI detects a hung cpu and needs stack space to do that work. A good option is to dedicate a set of process entries for per cpu task gates, say processes 2-NR_CPUS+1 are dedicated to task gates.
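Keith's point about the process state sitting at the low end of the kernel stack pages can be illustrated with a little arithmetic. The sketch below assumes an 8 KB, 8 KB-aligned stack area per task with the process state stored at its base; `THREAD_SIZE`, the function names and the 1024-byte state size used in the usage note are illustrative assumptions, not values taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of the per-task stack area: two 4 KB pages, aligned to 8 KB. */
#define THREAD_SIZE 8192u

/* Base of the stack area that contains the given stack pointer. */
uint32_t stack_base_of(uint32_t esp)
{
    return esp & ~(THREAD_SIZE - 1u);
}

/*
 * Bytes left between the stack pointer and the end of the process state
 * stored at the base of the area. A negative result means the stack has
 * already grown down into (and overwritten part of) the process state.
 */
int32_t stack_headroom(uint32_t esp, uint32_t task_state_size)
{
    assert(task_state_size < THREAD_SIZE);
    return (int32_t)(esp - stack_base_of(esp)) - (int32_t)task_state_size;
}

/* True once the stack pointer has dipped below the end of the process state. */
bool stack_has_overflowed(uint32_t esp, uint32_t task_state_size)
{
    return stack_headroom(esp, task_state_size) < 0;
}
```

For example, with a hypothetical 1024-byte process state, `stack_headroom(0xC1235F00, 1024)` leaves 6912 bytes, while a stack pointer of `0xC1234200` is already 512 bytes into the process state. A handler running on its own task-gate stack could make this kind of check on the faulting ESP before deciding what to report.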
Re: Why is double_fault serviced by a trap gate?
You seem to be misunderstanding the point of the argument: R3 stack fault - no problem - handled by trap gate for IDT vector 12 - recovery is possible if one wants to handle it. R0 stack fault - big problem, exception 12 is converted to a double-fault, which is converted to a triple-fault because vector 8 is a trap gate and not a task gate. Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK "Richard B. Johnson" <[EMAIL PROTECTED]> on 07/12/2000 21:44:23 Please respond to [EMAIL PROTECTED] To: Richard J Moore/UK/IBM@IBMGB cc: Andi Kleen <[EMAIL PROTECTED]>, "Maciej W. Rozycki" <[EMAIL PROTECTED]>, [EMAIL PROTECTED] Subject: Re: Why is double_fault serviced by a trap gate? On Thu, 7 Dec 2000 [EMAIL PROTECTED] wrote: > > > Which surely we can on today's x86 systems. Even back in the days of OS/2 > 2.0 running on a 386 with 4MB RAM we used a taskgate for both NMI and > Double Fault. You need only a minimal stack - 1K, sufficient to save state > and restore ESP to a known point before switching back to the main TSS to > allow normal exception handling to occur. > > There's no architectural restriction that some folks have hinted at - as long > as the DPL for the task gates is 3. > [SNIPPED...] Please refer to page 6-16, Intel486 Microprocessor Family Programmer's Reference Manual. The specific text is: "The TSS does not have a stack pointer for a privilege level 3 stack, because the procedure cannot be called by a less privileged procedure. The stack for privilege level 3 is preserved by the contents of SS and EIP registers which have been saved on the stack of the privilege level called from level 3". What this means is that a stack-fault in level 3 will kill you no matter how cute you try to be. 
And, putting a task gate as call procedure entry from a trap or fault is just trying to be cute. It's extra code that will result in the same processor reset. Cheers, Dick Johnson Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips). "Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk.
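The escalation argued over in this message (ring 0 stack fault, then double fault, then triple fault) can be written down as a small decision model. It is deliberately simplified: it assumes the only thing that can go wrong during delivery is that the current ring 0 stack is unusable, and that a task gate always has a valid TSS with its own good stack. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* What is installed in an IDT slot. */
enum gate_kind { GATE_INVALID, GATE_TRAP, GATE_TASK };

/* Where the processor ends up after a ring 0 stack fault. */
enum outcome {
    RUNS_STACK_FAULT_HANDLER,   /* vector 12 handler entered */
    RUNS_DOUBLE_FAULT_HANDLER,  /* vector 8 handler entered */
    TRIPLE_FAULT_SHUTDOWN       /* nothing could be delivered: CPU shuts down */
};

/*
 * A trap gate pushes the exception frame on the stack that is already in
 * use, so it only works while that stack is usable. A task gate switches
 * to the stack recorded in its own TSS, so it does not depend on it.
 */
bool delivery_succeeds(enum gate_kind gate, bool current_stack_usable)
{
    switch (gate) {
    case GATE_TASK:
        return true;
    case GATE_TRAP:
        return current_stack_usable;
    case GATE_INVALID:
    default:
        return false;
    }
}

/* Try vector 12 first, fall back to vector 8, otherwise shut down. */
enum outcome ring0_stack_fault(enum gate_kind vector12_gate,
                               enum gate_kind vector8_gate,
                               bool current_stack_usable)
{
    assert(vector12_gate <= GATE_TASK && vector8_gate <= GATE_TASK);
    if (delivery_succeeds(vector12_gate, current_stack_usable))
        return RUNS_STACK_FAULT_HANDLER;
    if (delivery_succeeds(vector8_gate, current_stack_usable))
        return RUNS_DOUBLE_FAULT_HANDLER;
    return TRIPLE_FAULT_SHUTDOWN;
}
```

In this model a trap gate on vector 8 gives exactly the same result as an invalid descriptor once the ring 0 stack is gone, whereas a task gate on vector 8 still reaches a handler that can report the problem.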
Re: Why is double_fault serviced by a trap gate?
Which surely we can on today's x86 systems. Even back in the days of OS/2 2.0 running on a 386 with 4MB RAM we used a taskgate for both NMI and Double Fault. You need only a minimal stack - 1K, sufficient to save state and restore ESP to a known point before switching back to the main TSS to allow normal exception handling to occur. There's no architectural restriction that some folks have hinted at - as long as the DPL for the task gates is 3. There's no problem under MP since the double fault exception will be only presented on the processor that instigated the problem. As for NMIs I didn't think they were presented to all processors simultaneously. If they are then the way to handle that is to map a page of the GDT to a unique physical address per-processor - i.e. processor local storage. The virtual address will be the same on each. This is what we did under OS/2 SMP. We also aliased these pages to unique virtual addresses so that they could be seen by the kernel from any processor context. The only time you want the NMI handler to be fast is when it's being used for hand-shaking, which some disk devices do. And perhaps for APIC NMI class interprocessor interrupts. But I honestly don't think that's really a good enough reason not to have a task gate for NMI. The unpredictability of the abort (NMI or Double-fault) refers to the fact that in general it is indeterminate as to whether it is a fault or trap. And that's a matter of whether the EIP points at or after the instruction related to the exception. The abort nature of these exceptions is not really a problem for the exception handler. In summary I'd say the lack of a task gate is at the very least an oversight, if not a bug. If no one else wants to do it I'll see if I can code up the task gates for the double-fault and NMI. Richard Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). 
http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
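The "minimal stack - 1K" task described in this message might be set up roughly as follows. This is a sketch under stated assumptions, not the OS/2 or Linux implementation: the struct follows the 104-byte 32-bit TSS layout, while the function name, its parameters and every address or selector passed to it are hypothetical placeholders. Each 16-bit selector is stored in a 32-bit slot whose upper half is reserved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS image: 26 dwords, 104 bytes. */
struct tss32 {
    uint32_t link;
    uint32_t esp0, ss0, esp1, ss1, esp2, ss2;   /* stacks used on ring changes */
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt;
    uint32_t trap_iomap;
};

_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS must be 104 bytes");

/*
 * Fill in a TSS for a dedicated fault-handling task. When a task gate
 * switches to this TSS the handler starts at handler_eip with ESP at the
 * top of its private stack, whatever state the faulting stack is in.
 */
void fault_tss_init(struct tss32 *tss, uint32_t handler_eip,
                    uint32_t stack_base, uint32_t stack_size,
                    uint32_t cr3, uint16_t code_sel, uint16_t data_sel)
{
    assert(stack_size >= 16 && stack_size % 4 == 0);

    memset(tss, 0, sizeof *tss);
    tss->eip = handler_eip;
    tss->esp = stack_base + stack_size;   /* stacks grow down from the top */
    tss->esp0 = tss->esp;
    tss->ss0 = data_sel;
    tss->cs = code_sel;
    tss->ss = tss->ds = tss->es = data_sel;
    tss->cr3 = cr3;                       /* page tables the handler runs under */
    tss->eflags = 0x2;                    /* reserved bit 1 set, interrupts off */
}
```

A call such as `fault_tss_init(&tss, handler, stack, 1024, cr3, kernel_cs, kernel_ds)` would give the handler the 1K stack mentioned above. Keith Owens's reply argues for more than that if a debugger such as kdb is to run from the task, so the stack size is left as a parameter.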
Why is double_fault serviced by a trap gate?
Why is double_fault serviced by a trap gate? The problem with this is that any double-fault caused by a stack-fault, which is the usual reason, becomes a triple-fault. And a triple-fault results in a processor reset or shutdown making the fault damn near impossible to get any information on. Oughtn't the double-fault exception handler be serviced by a task gate? And similarly the NMI handler in case the NMI is on the current stack page frame? Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
Re: Why is double_fault serviced by a trap gate?
Which surely we can on today's x86 systems. Even back in the days of OS/2 2.0 running on a 386 with 4MB of RAM we used a task gate for both NMI and double fault. You need only a minimal stack - 1K is sufficient to save state and restore ESP to a known point before switching back to the main TSS to allow normal exception handling to occur. There is no architectural restriction of the kind some folks have hinted at - as long as the DPL for the task gates is 3.

There's no problem under MP, since the double-fault exception will be presented only on the processor that instigated the problem. As for NMIs, I didn't think they were presented to all processors simultaneously. If they are, then the way to handle that is to map a page of the GDT to a unique physical address per processor - i.e. processor-local storage. The virtual address will be the same on each. This is what we did under OS/2 SMP. We also aliased these pages to unique virtual addresses so that they could be seen by the kernel from any processor context.

The only time you want the NMI handler to be fast is when it's being used for hand-shaking, which some disk devices do, and perhaps for APIC NMI-class interprocessor interrupts. But I honestly don't think that's a good enough reason not to have a task gate for NMI.

The unpredictability of the abort (NMI or double fault) refers to the fact that in general it is indeterminate whether it behaves as a fault or a trap, and that's a matter of whether the EIP points at or after the instruction related to the exception. The abort nature of these exceptions is not really a problem for the exception handler.

In summary I'd say the lack of a task gate is at the very least an oversight, if not a bug. If no one else wants to do it I'll see if I can code up the task gates for the double fault and NMI.

Richard

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
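For readers following the thread, here is a minimal sketch of how a trap gate and a task gate differ as 8-byte IDT entries on 32-bit x86. It is not taken from the kernel source: make_gate() is a hypothetical helper, and the selector and offset values in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Gate types for 32-bit protected mode (low 4 bits of the access byte). */
enum gate_type {
    GATE_TASK      = 0x5, /* task gate: selector names a TSS, offset unused */
    GATE_INTERRUPT = 0xE, /* 32-bit interrupt gate: clears IF on entry */
    GATE_TRAP      = 0xF  /* 32-bit trap gate: leaves IF unchanged */
};

/*
 * Pack an 8-byte IDT entry into a uint64_t (hypothetical helper):
 *   bits  0..15  handler offset 15..0
 *   bits 16..31  code segment selector (or TSS selector for a task gate)
 *   bits 32..39  reserved, zero
 *   bits 40..43  gate type
 *   bit  44      zero for system descriptors
 *   bits 45..46  DPL
 *   bit  47      present
 *   bits 48..63  handler offset 31..16
 */
static uint64_t make_gate(enum gate_type type, unsigned int dpl,
                          uint16_t selector, uint32_t offset)
{
    uint64_t access = 0x80u | ((dpl & 3u) << 5) | (unsigned int)type;

    if (type == GATE_TASK)
        offset = 0; /* the entry point comes from the TSS, not the gate */

    return (uint64_t)(offset & 0xFFFFu)
         | ((uint64_t)selector << 16)
         | (access << 40)
         | ((uint64_t)(offset >> 16) << 48);
}
```

With an assumed code selector of 0x10 and handler address 0xC0109ABC, a DPL 0 trap gate packs to 0xC0108F0000109ABC. A DPL 0 task gate naming an assumed TSS selector of 0x80 packs to 0x0000850000800000: the only things the gate carries are the TSS selector and the access byte, which is why the handler can run on a stack of its own.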
Re: Why is double_fault serviced by a trap gate?
You seem to be misunderstanding the point of the argument:

R3 stack fault - no problem. It is handled by the trap gate for IDT vector 12, and recovery is possible if one wants to handle it.

R0 stack fault - big problem. Exception 12 is converted to a double fault, which is converted to a triple fault because vector 8 is a trap gate and not a task gate.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

"Richard B. Johnson" [EMAIL PROTECTED] on 07/12/2000 21:44:23

Please respond to [EMAIL PROTECTED]
To: Richard J Moore/UK/IBM@IBMGB
cc: Andi Kleen [EMAIL PROTECTED], "Maciej W. Rozycki" [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000 [EMAIL PROTECTED] wrote:

> Which surely we can on today's x86 systems. Even back in the days of OS/2
> 2.0 running on a 386 with 4Mb RAM we used a taskgate for both NMI and Double
> Fault. You need only a minimal stack - 1K, sufficient to save state and
> restore ESP to a known point before switching back to the main TSS to allow
> normal exception handling to occur. There is no architectural restriction
> that some folks have hinted at - as long as the DPL for the task gates is 3.

[SNIPPED...]

Please refer to page 6-16, Intel486 Microprocessor Family Programmer's Reference Manual. The specific text is:

"The TSS does not have a stack pointer for a privilege level 3 stack, because the procedure cannot be called by a less privileged procedure. The stack for privilege level 3 is preserved by the contents of SS and EIP registers which have been saved on the stack of the privilege level called from level 3".

What this means is that a stack-fault in level 3 will kill you no matter how cute you try to be. And, putting a task gate as call procedure entry from a trap or fault is just trying to be cute. It's extra code that will result in the same processor reset.

Cheers, Dick Johnson

Penguin : Linux version 2.4.0 on an i686 machine (799.54 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk.
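The R0 case argued above can be written down as a toy model. This is only a sketch of the reasoning in the thread, not processor or kernel code: the names are invented, and the single rule it assumes is that a trap gate entered from ring 0 pushes its frame on the current stack, while a task gate loads a fresh SS:ESP from its own TSS.

```c
#include <stdbool.h>

enum gate_kind { TRAP_GATE, TASK_GATE };

enum outcome { HANDLED, DOUBLE_FAULT_HANDLED, TRIPLE_FAULT };

/*
 * Can the CPU enter a handler through this gate?
 * A trap gate reached from ring 0 pushes EFLAGS/CS/EIP on the current
 * stack, so it needs that stack to be usable. A task gate switches to
 * the stack named in its own TSS and never touches the faulting one.
 */
static bool can_enter(enum gate_kind gate, bool current_stack_ok)
{
    return gate == TASK_GATE || current_stack_ok;
}

/* Model a ring-0 stack fault (vector 12) and what follows from it. */
static enum outcome ring0_stack_fault(enum gate_kind vector8_gate,
                                      bool kernel_stack_ok)
{
    /* Vector 12 is serviced by a trap gate in the kernel under discussion. */
    if (can_enter(TRAP_GATE, kernel_stack_ok))
        return HANDLED;

    /* Delivering the stack fault faulted again: double fault (vector 8). */
    if (can_enter(vector8_gate, kernel_stack_ok))
        return DOUBLE_FAULT_HANDLED;

    /* Delivering the double fault failed too: shutdown or reset. */
    return TRIPLE_FAULT;
}
```

With a broken kernel stack, ring0_stack_fault(TRAP_GATE, false) ends in TRIPLE_FAULT, whereas ring0_stack_fault(TASK_GATE, false) reaches the double-fault handler, which is the whole of the argument for the task gate.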
Re: Why is double_fault serviced by a trap gate?
Yes, indeed this is the point - we should at least be able to report the problem even if we can't recover - and we should do that in the standard kernel. It doesn't seem right to convert a bad problem into an unfathomable disaster, which is what a trap gate for double fault does. If you're going to do that then why bother to set up a trap gate? Just leave IDT vector 8 as an invalid descriptor. As it stands, the do_double_fault routine is otiose.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

Keith Owens [EMAIL PROTECTED] on 07/12/2000 22:47:42

Please respond to Keith Owens [EMAIL PROTECTED]
To: Richard J Moore/UK/IBM@IBMGB
cc: Andi Kleen [EMAIL PROTECTED], [EMAIL PROTECTED], "Maciej W. Rozycki" [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: Why is double_fault serviced by a trap gate?

On Thu, 7 Dec 2000 21:09:47 +, [EMAIL PROTECTED] wrote:

> In summary I'd say the lack of a task gate is at the very least an
> oversight, if not a bug. If no one else wants to do it I'll see if I can
> code up the task gates for the double-fault and NMI.

If you overflow the kernel stack then you have already scribbled on the process state at the low end of the kernel stack pages. The process is definitely not recoverable but you might not even be able to recover the machine. Corrupt p_opptr and friends, thread_group or pidhash and other processes can be affected when they follow the chains. However being able to report the error is a good start, even if you cannot recover.

If you add task gates, assign enough stack space for debuggers. kdb does a lot of work when NMI detects a hung cpu and needs stack space to do that work. A good option is to dedicate a set of process entries for per cpu task gates, say processes 2-NR_CPUS+1 are dedicated to task gates.
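As a rough illustration of the per-CPU stack suggestion, the sketch below reserves one fault stack per processor and returns the address a task gate's TSS would load into ESP. Everything here is an assumption for illustration: NR_CPUS is fixed at 4, the stack size is a guess sized with a debugger in mind, and fault_stack_top() is not a kernel function.

```c
#include <stddef.h>

#define NR_CPUS          4     /* assumed value for the sketch */
#define FAULT_STACK_SIZE 8192  /* assumed; larger than 1K so a debugger can run */

/* One dedicated stack per processor, never shared with ordinary tasks. */
static unsigned char fault_stacks[NR_CPUS][FAULT_STACK_SIZE];

/*
 * x86 stacks grow downwards, so the initial ESP for a CPU's task gate is
 * the address one past the end of that CPU's array. Returns NULL for an
 * out-of-range CPU number.
 */
static void *fault_stack_top(unsigned int cpu)
{
    if (cpu >= NR_CPUS)
        return NULL;
    return fault_stacks[cpu] + FAULT_STACK_SIZE;
}
```

Keeping the stacks in a static array means they exist before any fault can occur and cannot be scribbled on by an overflow of an ordinary kernel stack page, at the cost of NR_CPUS * FAULT_STACK_SIZE bytes of memory that is rarely used.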
announce: Universal dynamic trace for Linux
You can now use IBM's DProbes with Opersys' Linux Trace Toolkit to provide a universal (dynamic) tracing capability for Linux. It is universal because it provides a common tracing mechanism for all executables, whether in user or kernel space. It is dynamic because tracepoints are defined and applied dynamically to object modules as probepoints using DProbes - no source code modification is required.

To use dynamic trace you will require version 1.2 of DProbes or later from http://oss.software.ibm.com/ and LTT version 0.9.4pre4 or later from http://www.opersys.com/

The DProbes kernel patch will need to be compiled with the correct configuration options to enable it to work with LTT. See the respective installation instructions in each package for more details.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: Newbie
Not even Intel can spell kernal [sic] - see the 486 Programmer's Reference, in the description of the protection mechanism. BTW one of the enhancements to the Pentium was an improvement in the spelling of kernel. :-)

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
Please respond to Andrea Arcangeli <[EMAIL PROTECTED]>
To: Richard J Moore/UK/IBM@IBMGB
cc:
Subject: Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)

On Wed, Nov 15, 2000 at 05:14:57AM +, [EMAIL PROTECTED] wrote:

> Andrea,
>
> I am very grateful for your detailed analysis. I have yet to digest
> everything you commented but will get back to you on all points you raise
> very soon. Here are my thoughts so far:

I'm glad you appreciated my comments. I think dprobes gives a higher level of flexibility for debugging purposes and I'd really like to include it in the aa kernels until it is included in the mainstream.

> When I announced GKHI I did state that SMP support was to follow. The

I probably overlooked that part of your announcement, sorry.

> updates are trivial but I didn't want to release the code until I had had
> a chance to test it.

Very promising. Note that SMP could introduce non-trivial issues: the self-modifying changes should be atomic with respect to the other CPUs executing the self-modifying code, and specs are often not very explicit about side effects of self-modifying code in SMP (it's not only a matter of implementing the GKHI locks with SMP locks).

> Are you claiming that flush_icache_range has an error and should implement
> the IA32 instruction flush as I did using CPUID? If this is the case has

Exactly.

> this error been officially reported?

I hope I did that in my email :). Actually when I fixed the alpha port some months ago (alpha needs an explicit imb() to flush the speculative icache and it was really crashing in modules because of the missing smp_imb) I also noticed IA32 was buggy, since IA32 also executes out of order and the specs say we must do a cpuid as the only way to serialize the istream. I didn't fix IA32 because nobody ever got bitten by that race, for timing/implementation reasons, but to be correct we should do cpuid in flush_icache_range as well. Once flush_icache_range is fixed in IA32 you can use it inside GKHI too (and then you'll get it right on all architectures).

> Thanks for this information. Reserving a syscall will become irrelevant when
> we release Dprobes as a module using gkhi since we will use ioctl() as the
> application interface.

Ok (you still need to reserve a blockdevice major minor number with Linus though).

> Well, not necessarily so while lkcd is not get accepted into the standard
> kernel source. [..]

It won't until it uses a separate driver that doesn't depend on the scsi or ide layer. Even ignoring the safety problem of the scsi layer potentially being corrupted by memory corruption at crash time, the scsi layer doesn't work without being interrupt driven. It will recurse on the stack badly if somebody ever tries to use it polled. Probably a similar thing happens with IDE (but no polled IDE hardware exists so we don't know). I documented all this in the `Linux Kernel Debugging' document on my ftp area in ftp.suse.com.

> [.] But also, even when lkcd becomes accepted, using gkhi with
> lkcd will allow a crash dump capability to be activated dynamically. [..]

We can control everything dynamically without self-modifying code. The _only_ point of self-modifying code is performance. There is no other reason to use it. lkcd is definitely called in an extremely slow path (in fact if all goes right it should never be called), so it doesn't give any advantage to use self-modifying code there.

> [..] That
> gives the user more flexibility. Even enterprise customers can sometimes
> hedge their bets when it comes to RAS-like features.

I agree that being able to enable/disable lkcd dynamically is a fine feature.

Andrea
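The cpuid point can be sketched as follows. This is an illustration of the idea discussed in the thread, not the kernel's actual flush_icache_range(): both function names are hypothetical, the inline assembly assumes GCC or Clang, and the address range is deliberately ignored because the sketch only shows the serializing step.

```c
/* Execute CPUID leaf 0 purely for its serializing side effect. */
static inline void serialize_istream(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx, ecx = 0, edx;

    /* CPUID overwrites eax/ebx/ecx/edx, hence "regs trashing". */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
#else
    /* Other architectures need their own primitive; this is only a
       compiler barrier so that the sketch still builds elsewhere. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical stand-in for an IA32 flush_icache_range(). */
static inline void flush_icache_range_sketch(unsigned long start,
                                             unsigned long end)
{
    (void)start; /* the range is unused in this sketch */
    (void)end;
    serialize_istream();
}
```

A caller that has just patched an instruction would invoke flush_icache_range_sketch() on the patched range before executing it. Note that this only serializes the CPU that runs it; the SMP atomicity concern raised above is a separate problem that this sketch does not address.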
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
Andrea,

I am very grateful for your detailed analysis. I have yet to digest everything you commented on but will get back to you on all the points you raise very soon. Here are my thoughts so far:

> I think gkhi should be renamed to something like "Fast Unregistered Kernel
> Hook Interface" to avoid confusion and wrong usage of it that would otherwise
> lead to lower performance.

A fair point.

On point 3):

> 3)
>
> gkhi apparently doesn't yet know the word "SMP" 8).

When I announced GKHI I did state that SMP support was to follow. The updates are trivial but I didn't want to release the code until I had had a chance to test it.

On point 4):

> 4)
>
> gkh_iflush should be done with flush_icache_range that is in fact implemented
> wrong for IA32 and it should be implemented as regs trashing cpuid (the fact
> it's wrongly implemented means that in theory modules can break in IA32
> on 2.4.x 2.2.x even on UP).

Are you claiming that flush_icache_range has an error and should implement the IA32 instruction flush as I did using CPUID? If this is the case, has this error been officially reported?

On point 5):

> 5)
>
> Current dprobes v1.1.1 against 2.2.16 comes with a syscall collision:
> sys_dprobes collides with ugetrlimit (not implemented in 2.2.x). That's fine
> for internal use and to show the code, but make _sure_ not to ship any binary
> to anybody implementing ugetrlimit as sys_dprobes 8).
>
> Richard please ask Linus to reserve a syscall for dprobes. I recommend to
> allocate the syscall out of the way (I mean using syscall 255 or even better
> enlarging the syscall table from 256 to 512 and using number 511) so we make
> sure not to waste precious dcachelines for debugging stuff.

Thanks for this information. Reserving a syscall will become irrelevant when we release DProbes as a module using gkhi, since we will use ioctl() as the application interface.

> BTW, for things like lkcd there's no reason to use gkhi to make it completely
> modular since lkcd gets recalled in a completely slow path.

Well, not necessarily so while lkcd has not been accepted into the standard kernel source. But also, even when lkcd becomes accepted, using gkhi with lkcd will allow a crash dump capability to be activated dynamically. That gives the user more flexibility. Even enterprise customers can sometimes hedge their bets when it comes to RAS-like features.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
Andi Kleen wrote:

> I think using dprobes for collecting information is ok, but when you want
> to do actual actions with it (not only using it as a debugger) IMHO it
> is better to patch and recompile the kernel.

I absolutely agree. The only time I ever used this capability was to modify a proprietary binary, for which I did not have the source, so that I could prove to the owner what needed fixing.

> As far as I can see GKHI is overkill for dprobes alone, the existing
> notifier lists would be sufficient because dprobes does not hook into any
> performance critical paths.

Again, I agree. My intent is that the RAS guys might club together - then GKHI makes much more sense.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
Alexander Viro wrote:

> It's not a good idea, it's an obvious fact. Oh, you mean forking the tree?

Again I find your terminology at odds with mine; what do you mean by forking the tree? I get the impression that it's a very restrictive notion where any functional enhancement applied as a patch on top of a standard distribution kernel is considered by you as forking. Is that so? (And BTW by patch I mean input to the patch command.)

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
Andi Kleen wrote:

> It will just help some people who have an irrational aversion against kernel
> recompiles and believe in vendor blessed binaries.

An interesting remark Andi, especially in the light of your note to me regarding your use of DProbes - i.e. you'd rather use DProbes to dump out some info from the kernel than recompile it with printks.

I don't have an aversion to recompiling the kernel - it's great fun - I love watching all the messages go by, waiting with bated breath for a compile error, which never seems to happen. Just like watching the National Lottery, waiting for your own numbers to come up.

To be a little more serious, it's not recompilation that's a problem, it's re-working a set of (non-standard) patches together. I'm not that excited by that - I'd rather develop new code than rework old.

Anyway, for a couple of example scenarios see the response I made to Michael Rothwell.

And by the way, I absolutely agree with your approach to kernel problem solving - but wouldn't it be a help if you didn't have to put a large or even moderate effort into working the DProbes patch into some hot-off-the-press version of the kernel?

Richard

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
No, misunderstood. GKHI is not implemented using dynamic probes. GKHI places in the kernel calls to APIs in the DProbes code. Since we'd rather have DProbes out of the kernel, GKHI essentially acts as a loader after the fact, i.e. it fixes up the DProbes API calls when the DProbes module loads. Compare this with the usual loading process, where the fix-ups are done in the module being loaded.

Now, you might want to ask me why I want DProbes as a module? Then again you might not. Either way is fine by me ;-)

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

Andi Kleen <[EMAIL PROTECTED]> on 10/11/2000 16:54:05

Please respond to Andi Kleen <[EMAIL PROTECTED]>
To: "Theodore Y. Ts'o" <[EMAIL PROTECTED]>
cc: Richard J Moore/UK/IBM@IBMGB, Paul Jakma <[EMAIL PROTECTED]>, Michael Rothwell <[EMAIL PROTECTED]>, Christoph Rohland <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)

On Fri, Nov 10, 2000 at 11:24:28AM -0500, Theodore Y. Ts'o wrote:

> Right. So what you're saying is that GKHI is adding complexity to the
> kernel to make it easier for people to put in non-standard patches which
> exposes non-standard interfaces which will lead to kernels not supported
> by the Linux Kernel Development Community. Right?

My understanding is that GKHI does not change the kernel at all, except for the three hooks needed for dprobes. All GKHI hooks are implemented as dynamic probes, which are just like debugger breakpoints. A dynamic probes breakpoint does not require any source changes, but you have to check the assembly to find the right point for them (at least in the current version, I don't know if IBM is planning to support source level dprobes using the debugging information).

IMHO GKHI does not make maintenance of additional modules any easier, because you always have to recheck the assembly to see whether the dynamic probe still fits (which may in some cases even be more work than reporting source patches; it is also harder when you want to cover multiple architectures).

It will just help some people who have an irrational aversion against kernel recompiles and believe in vendor blessed binaries.

-Andi
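To make the "loader after the fact" description concrete, here is a small user-space analogue. It is not GKHI's interface: every name in it is invented, and it swaps function pointers where GKHI is described as patching nop'd call instructions in place. The three hook identifiers follow the three DProbes entry points named elsewhere in this thread (trap 1, trap 3, page fault).

```c
#include <stddef.h>

typedef int (*hook_fn)(void *ctx);

enum hook_id { HOOK_TRAP1, HOOK_TRAP3, HOOK_PAGEFAULT, HOOK_MAX };

/* Default target: the hook is compiled in but does nothing. */
static int hook_nop(void *ctx)
{
    (void)ctx;
    return 0;
}

static hook_fn hook_table[HOOK_MAX] = { hook_nop, hook_nop, hook_nop };

/* "Module load": fix up the call site to point at the module's entry.
   Fails if the hook is unknown, already armed, or fn is NULL. */
static int hook_arm(enum hook_id id, hook_fn fn)
{
    if ((unsigned int)id >= HOOK_MAX || fn == NULL ||
        hook_table[id] != hook_nop)
        return -1;
    hook_table[id] = fn;
    return 0;
}

/* "Module unload": put the no-op back. */
static void hook_disarm(enum hook_id id)
{
    if ((unsigned int)id < HOOK_MAX)
        hook_table[id] = hook_nop;
}

/* What the call site placed in the kernel path would do. */
static int hook_call(enum hook_id id, void *ctx)
{
    return hook_table[id](ctx);
}

/* Example hook exit: counts how often it is entered. */
static int count_hits(void *ctx)
{
    ++*(int *)ctx;
    return 1;
}
```

Calling hook_call(HOOK_TRAP3, &hits) before hook_arm(HOOK_TRAP3, count_hits) leaves hits untouched; afterwards each call increments it, and hook_disarm() restores the original behaviour without rebuilding the caller. The pointer indirection costs a load on every call, which is the overhead that patching the instruction stream directly is meant to avoid.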
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
Matti,

>Please educate me, what does "our RAS offerings" mean here ?
>(I didn't find "RAS" at your signature-URL site, but I didn't
> poke around very much..)

RAS = Reliability, Availability & Serviceability = those things that are not mainline to an OS but add the qualities named in the acronym. That includes self-healing, recoverability, diagnosis etc. My specialism is in problem diagnosis. When I say RAS I generally mean debugging/diagnosis aids, but I also mean it not from a development standpoint but from a support standpoint. Depending on how a system is deployed, those can be very different things.

>I do know that when IBM suits speak with phrases like that,
>they are selling me something which costs $$$.

Not always. All this stuff (Linux RAS) is free and given away under GPL. And I'm not wearing a suit either ;-)

RAS in general is not sexy; it's difficult to sell. So it's paid for from after-sales service and other indirect means.

>Which definitely gives proprietary, binary only, hook image...
>But GKHI, and DProbes are neither. Thus I am confused, but can
>understand the furor...

Well I sort of can and can't as well. Here are a couple of circumstances where I'd find GKHI useful:

First scenario: I'm developing DProbes and I need the SGI KDB, a complex patch, as a debugging aid. I also want to keep up with various kernel versions. I've got limited time, so I would like to spend it on DProbes and not on re-working patches. Now, I know that DProbes only needs to get control in three places in kernel processing:

1) Trap 1 handler
2) Trap 3 handler
3) Pagefault handler.

I reason that if I had a call inserted into the kernel source at these points to the respective entry points in my DProbes code, I'd not have to spend much time integrating SGI's KDB with DProbes. Then I realise that if I leave the calls nop'd and dynamically patch them in later, I can build the kernel once and re-build DProbes (now a module) many times over - and sometimes without re-booting. So I create GKHI to mess around with NOPs, converting them into calls.

Second scenario: I have a customer running Linux for a business purpose. They are not developers and have no programming skill. Every now and again their system crashes and they have to reboot. And down time costs them in real terms. So they say to IBM, you supplied this ... system, you fix it. OK we say, we'll need a dump. We'll send you a kernel with SGI's crash dump built in. Oh no you won't, they say, you're not sending us any more dodgy code until you've fixed this problem. Anyway the server is in a secure remote branch office. There's no technical support on site and we cannot possibly have developers messing with that system.

Now suppose SGI or IBM have converted SGI Kernel Crash Dump to a module and we supply the system customised with a few GKHI hooks in place. Then we say issue the following command:

    insmod lkcd.o

We get a dump and discover that some cheesehead had overlaid a spinlock, causing re-entrancy and a crash. OK we say, we know what happened but not who did it. So we need to trace all storage alterations to the spin-lock. There are only a few valid users of that spin-lock, which, if we had a trace, we could eliminate.

By now DProbes and Linux Trace Toolkit are working well together, and providing a dynamic trace capability. Also DProbes is now offering the capability of probes on storage modifications. So we say to the customer please issue three commands:

    insmod lkcd; insmod dprobes; insmod ltt;

The system crashes, we get the trace. We duly find the address from which the spin-lock was overwritten. We look in the dump and find it's a routine in a device driver that's been passed invalid data; actually not passed, but placed on a work queue. And furthermore the invalid data has a particular look to it. We explain to the customer that we now need just one more piece of information.

So finally we place a dprobe on the enqueuing routine, looking at data enqueued until the invalid pattern occurs, and make the probe trigger another dump. And finally we have it. The enqueuing routine was another driver.

This scenario is not untypical of the sort of problem I earnt my living solving for the past n years.

Now we could have supplied a system with Crash Dump, DProbes, LTT, KDB and a dozen other specialised RAS tools. The kernel would have been 50% bigger and would have cost us considerable time and effort whenever kernel maintenance was applied - with obvious consequences. And in the end 99.9% of the time we don't need these facilities, coz after all Linux is a pretty stable platform. So why not allow them to be brought in dynamically?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
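The first scenario above (call sites left dormant in the kernel and switched on when the module loads) can be illustrated with a small user-space C sketch. It is only an analogy under stated assumptions: GKHI is described as converting NOPs into calls, whereas this sketch uses a table of function pointers in which an empty slot plays the part of the NOP. Every name in it (`site_arm`, `site_disarm`, `site_fire`, `demo_trap3_exit`) is hypothetical and not taken from the GKHI or DProbes sources.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not GKHI or DProbes code. */

/* The three places where DProbes needs control, per the message above. */
enum hook_site { SITE_TRAP1, SITE_TRAP3, SITE_PAGEFAULT, SITE_COUNT };

/* A hook exit returns nonzero if it consumed the event. */
typedef int (*site_exit_fn)(unsigned long addr);

/* Stand-in for a NOP'd call site: an empty slot means "do nothing". */
static site_exit_fn site_slot[SITE_COUNT];

/* "Loader after the fact": fill in a call site when the module loads. */
int site_arm(enum hook_site site, site_exit_fn fn)
{
    assert(fn != NULL);
    if ((unsigned)site >= (unsigned)SITE_COUNT || site_slot[site] != NULL)
        return -1;              /* bad site, or site already in use */
    site_slot[site] = fn;
    return 0;
}

/* Turn the call site back into a no-op when the module unloads. */
void site_disarm(enum hook_site site)
{
    if ((unsigned)site < (unsigned)SITE_COUNT)
        site_slot[site] = NULL;
}

/* What the kernel path would run at the hook location. */
int site_fire(enum hook_site site, unsigned long addr)
{
    assert((unsigned)site < (unsigned)SITE_COUNT);
    site_exit_fn fn = site_slot[site];
    return fn ? fn(addr) : 0;   /* dormant site: fall straight through */
}

/* Example exit standing in for a breakpoint (trap 3) handler. */
int demo_trap3_exit(unsigned long addr)
{
    return addr == 0x1000UL;    /* claim only the address we probed */
}
```

Before `site_arm(SITE_TRAP3, demo_trap3_exit)` is called, `site_fire(SITE_TRAP3, ...)` falls through and returns 0, which corresponds to the kernel built once with the hooks in place but no module loaded. After arming, the same call reaches the exit, and `site_disarm` restores the dormant state without touching the code that contains the call site.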
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
> Right. So what you're saying is that GKHI is adding complexity to the
> kernel to make it easier for people to put in non-standard patches which
> exposes non-standard interfaces which will lead to kernels not supported
> by the Linux Kernel Development Community. Right?

I don't think I mentioned complexity - in what way do you mean adding complexity? If you mean additional function then yes. If you mean kernel source modification then no. If you mean the mechanism then it's nothing special - just what a loader does when it fixes up an external reference.

> But if there are no standard hooks in the mainline kernel, it's going to
> be hard to persuade people that adding the GKHI would be a good thing.
> So for the purposes of getting GKHI into the kernel, arguing for GKHI
> in the abstract is putting the cart before the horse. What I would
> recommend is showing how certain hooks are good things to have in the
> mainline kernel, and to try to persuade people of that question first.
> Only then try to argue for GKHI.

Quite agree, there are no standard hooks, no hooks at all in fact. I'm neither seeking to get this accepted nor not accepted into the kernel. What it does do is give me an easier time, both as a developer and as an installer, when I want to include some rarefied code additions. GKHI was developed primarily for DProbes.

> P.S. There are some such RAS features which I wouldn't be surprised
> there being interest in having integrated into the kernel directly
> post-2.4,

Great!

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
> The problem with the hooks et.al. is very simple - they promote every
> bloody implementation detail to exposed API.

Surely not; having the kernel source does that. The alternative to the hook is to embed a patch in the kernel source. What provides greater exposure to internals: hooks or source?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
> That being said, the real problem with the GKHI is that as Al said, it
> does expose internal kernel interfaces --- and the Linux kernel
> development community as a whole refuses to be bound by such interfaces,
> sometimes even during a stable kernel series.

I'm not sure that GKHI exposes any more interfaces than embedding a patch directly into the kernel would. It has the potential to make patches easier to re-work for different kernel versions, and to enable development, maintenance and fixing of the patch to be done independently of a kernel build. And it also has the potential of helping with co-existence. If for example the RAS community could agree on a number of hooks (I'm thinking here of crash dump, trace, dprobes and maybe KDB as well) then you'd probably find a good many of them using the same hooks. The modifications to the kernel would be minimal and the user would be left with an easy means of installing a co-existing subset of the offerings supported by hooks.

An example: DProbes is down to three hooks - that's three lines of code in the kernel + three lines in ksyms.c. Patching DProbes onto any custom kernel is a doddle.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
>> Why? I think the IBM GKHI code would be of tremendous value. It would
>
> And we already refuse to support those kernels - your point being?
>
> Making this "commonplace" is a nightmare. Go away with that.

How so?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
>> extensions using the GKHI would not be breaking the license agreement, I
>> don't think. There's lots of binary modules right now -- VMWare, Aureal
>> sound card drivers, etc.
>
>All of which just cause large numbers of bugs to go in the bitbucket because
>nobody can tell whose the problem is.

I don't understand your point - are you saying that the existence of kernel modules makes problems more difficult to solve? Why would that be?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
> Yes, and that's why I am opposing here: Technically you are right, but
> proposing that enterprise Linux should go this way is inviting binary
> only modules due to the lax handling of modules.

Not so sure it does. If a kernel module wants to make use of GKHI then it will have to

1) include a GKHI header file or copy some of the code in it,
2) update kernel source in a minimal way to add the callbacks.

Wouldn't 1) under GPL terms force the kernel module to be GPL?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
Let's be clear about one thing: the GKHI makes no statement about enabling proprietary extensions, and that's a common misconception. GKHI is intended to make optional facilities easier to co-install and change. We designed it for DProbes, which, when modularised, will remain a GPL opensource offering.

The only motivation for providing GKHI is to make the kernel more acceptable to the enterprise customer, by allowing, for example, RAS capabilities to be brought in easily and dynamically. This type of customer will not readily succumb to on-the-fly kernel rebuilds to diagnose problems that occur only in complex production environments.

If anything opens the door to proprietary extensions it's the loadable kernel modules capability, or perhaps the loose wording of the GPL which doesn't catch loadable kernel modules, or whatever... Bottom line: GKHI really has no bearing on this.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).

Christoph Rohland <[EMAIL PROTECTED]> on 09/11/2000 07:44:11
Please respond to Christoph Rohland <[EMAIL PROTECTED]>
To: Michael Rothwell <[EMAIL PROTECTED]>
cc: Richard J Moore/UK/IBM@IBMGB, [EMAIL PROTECTED]
Subject: Re: [ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)

Hi Michael,

On Wed, 08 Nov 2000, Michael Rothwell wrote:
> Sounds great; unfortunately, the core group has spoken out against a
> modular kernel.
>
> Perhaps IBM should get together with SGI, HP and other interested
> parties and start an Advanced Linux Kernel Project. Then they can
> run off and make their scalable, modular, enterprise kernel and the
> Linus Version can always merge back in features from it.

*Are you crazy?* =:-0

Proposing proprietary kernel extensions to establish an enterprise kernel? No thanks!

Greetings
Christoph
[ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
We've just released version 0.6 of Generalised Kernel Hooks Interface (GKHI). See the IBM Linux Technology Centre's web page, DProbes link: http://oss.software.ibm.com/developerworks/opensource/linux

Some folks expressed an interest in this type of facility recently in discussions concerning making call-backs from the kernel to kernel modules. Here's the abstract for this facility. With this we intend to modularise our RAS offerings, in particular DProbes, so that they can be applied dynamically without having to be carried as excess baggage.

Abstract:

Generalised Kernel Hooks Interface (GKHI) is a generalised facility for placing hooks or exits in arbitrary kernel locations. It enables many kernel enhancements, which are otherwise self-contained, to become loadable kernel modules and retain a substantial degree of independence from the kernel source. This affords advantages for maintenance and co-existence with other kernel enhancements.

The hook interface allows multiple kernel modules to register their exits for a given hook, in order to receive control at that hook location. Multiple hooks may be defined within the kernel and a single kernel module may register exits to use multiple hooks. When hook exits register they may specify co-existence criteria. Hooks may be placed in kernel modules as well as the kernel itself, with the proviso that the modules with hooks are loaded before the gkhi hook interfacing module. A hook exit receives control as if called from the code in which the hook is located. Parameters may be passed to a hook exit and may be modified by an exit.

For more information download the tarball.

Note: GKHI is in late beta test - we currently do not support SMP; that will occur soon. We also plan to support dynamic hook definition a little later on, so that kernel modules may dynamically register hooks for other kernel modules to use.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
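To make the abstract's model concrete (several exits registered against one hook, each receiving a parameter it may modify), here is a minimal user-space C sketch. It is an illustration only, not the GKHI interface: the names `hook_register`, `hook_unregister` and `hook_dispatch`, the fixed-size table, and dispatch in registration order are all assumptions, and the sketch leaves out co-existence criteria and any locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the GKHI API. */
#define MAX_EXITS 4

/* A hook exit receives a parameter it is allowed to modify. */
typedef int (*hook_exit_fn)(int *arg);

struct hook {
    hook_exit_fn exits[MAX_EXITS];  /* exits in registration order */
    size_t       count;
};

/* Register an exit; returns 0 on success, -1 if the hook is full. */
int hook_register(struct hook *h, hook_exit_fn fn)
{
    assert(h != NULL && fn != NULL);
    if (h->count == MAX_EXITS)
        return -1;
    h->exits[h->count++] = fn;
    return 0;
}

/* Remove an exit; returns 0 if it was registered, -1 otherwise. */
int hook_unregister(struct hook *h, hook_exit_fn fn)
{
    assert(h != NULL);
    for (size_t i = 0; i < h->count; i++) {
        if (h->exits[i] == fn) {
            /* close the gap so the remaining exits keep their order */
            for (size_t j = i + 1; j < h->count; j++)
                h->exits[j - 1] = h->exits[j];
            h->count--;
            return 0;
        }
    }
    return -1;
}

/* The hook location: give control to every registered exit in turn. */
void hook_dispatch(struct hook *h, int *arg)
{
    assert(h != NULL);
    for (size_t i = 0; i < h->count; i++)
        h->exits[i](arg);
}

/* Two example exits, as if supplied by two different modules. */
int exit_add_one(int *arg) { *arg += 1; return 0; }
int exit_double(int *arg)  { *arg *= 2; return 0; }
```

With both example exits registered on the same hook, dispatching with a value of 5 leaves 12 in the parameter (5 + 1, then doubled), which shows one exit seeing the modification made by the exit before it. A hook with no exits registered leaves the parameter untouched.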
[ANNOUNCE] Generalised Kernel Hooks Interface (GKHI)
We've just released version 0.6 of the Generalised Kernel Hooks Interface (GKHI). See the DProbes link on the IBM Linux Technology Centre's web page: http://oss.software.ibm.com/developerworks/opensource/linux

Some folks expressed an interest in this type of facility recently, in discussions about making call-backs from the kernel to kernel modules. With this we intend to modularise our RAS offerings, in particular DProbes, so that they can be applied dynamically without having to be carried as excess baggage. Here's the abstract for this facility.

Abstract:

Generalised Kernel Hooks Interface (GKHI) is a generalised facility for placing hooks or exits in arbitrary kernel locations. It enables many kernel enhancements, which are otherwise self-contained, to become loadable kernel modules and retain a substantial degree of independence from the kernel source. This affords advantages for maintenance and co-existence with other kernel enhancements.

The hook interface allows multiple kernel modules to register their exits for a given hook, in order to receive control at that hook location. Multiple hooks may be defined within the kernel, and a single kernel module may register exits to use multiple hooks. When hook exits register they may specify co-existence criteria. Hooks may be placed in kernel modules as well as in the kernel itself, with the proviso that the modules with hooks are loaded before the gkhi hook interfacing module.

A hook exit receives control as if called from the code in which the hook is located. Parameters may be passed to a hook exit and may be modified by an exit.

For more information, download the tarball.

Note: GKHI is in late beta test. We currently do not support SMP; that will follow soon. We also plan to support dynamic hook definition a little later on, so that kernel modules may dynamically register hooks for other kernel modules to use.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
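As a rough illustration of the registration model in the abstract above, the user-space sketch below lets several exits register for one hook, dispatches them in priority order, and lets each exit modify the parameter it receives. Every name in it (`struct hook`, `hook_register`, `hook_dispatch`, the `priority` field and its ordering) is hypothetical and is not the GKHI API; the real facility works by patching kernel code and is documented in the tarball.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EXITS 8   /* arbitrary limit for this sketch */

/* A hook exit receives a pointer to the hook's parameter and may modify it. */
typedef void (*hook_exit_fn)(int *param);

struct hook_exit {
    hook_exit_fn fn;
    int priority;     /* assumption: lower value is dispatched first */
};

struct hook {
    struct hook_exit exits[MAX_EXITS];
    size_t count;
};

/* Register an exit, keeping the table sorted by priority.
 * Returns 0 on success, -1 if the table is full. */
int hook_register(struct hook *h, hook_exit_fn fn, int priority)
{
    size_t i;

    assert(h != NULL && fn != NULL);
    if (h->count == MAX_EXITS)
        return -1;

    /* Shift entries with a larger priority value up by one slot. */
    i = h->count;
    while (i > 0 && h->exits[i - 1].priority > priority) {
        h->exits[i] = h->exits[i - 1];
        i--;
    }
    h->exits[i].fn = fn;
    h->exits[i].priority = priority;
    h->count++;
    return 0;
}

/* Called at the hook location: each registered exit gets control in turn. */
void hook_dispatch(const struct hook *h, int *param)
{
    size_t i;

    for (i = 0; i < h->count; i++)
        h->exits[i].fn(param);
}

/* Two example exits that modify the parameter. */
static void exit_add_one(int *param) { *param += 1; }
static void exit_double(int *param)  { *param *= 2; }

/* Registers the exits out of order and shows that priority decides dispatch. */
int hook_demo(void)
{
    struct hook h = { .count = 0 };
    int value = 5;

    hook_register(&h, exit_double, 20);
    hook_register(&h, exit_add_one, 10);
    hook_dispatch(&h, &value);   /* add_one runs first, then double */
    return value;
}
```

In `hook_demo` the exit with priority 10 runs before the one with priority 20, so the value goes from 5 to 6 to 12. A fixed-size sorted table keeps dispatch to a plain loop, which is the cheap path; a real implementation would also need locking around registration.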
Re: Calling module symbols from inside the kernel !
We have a generic way of doing this which we are about to release, called GKHI (Generalised Kernel Hooks Interface). Would you like a copy to test?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
Re: [ANNOUNCE] DProbes 1.1
Andi,

Thanks for your feedback. We are looking at this now. Hopefully we will be able to give you a response on Monday. If we don't, it's because most of us are on holiday next week.

I'm interested in getting information on who is using DProbes and how it's being used.

Yes, and also note that we haven't yet done the SMP port of DProbes - that's next.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

Andi Kleen <[EMAIL PROTECTED]> on 18/10/2000 18:38:13
Please respond to Andi Kleen <[EMAIL PROTECTED]>
To: Richard J Moore/UK/IBM@IBMGB
cc: [EMAIL PROTECTED]
Subject: Re: [ANNOUNCE] DProbes 1.1

Hallo Richard,

On Wed, Oct 18, 2000 at 10:44:11AM +0100, [EMAIL PROTECTED] wrote:
> We've released v1.1 of DProbes - details and code are on the DProbes web
> page.
>
> The enhancements include:
>
> - DProbes for kernel version 2.4.0-test7 is now available.

First, thanks for this nice work. I ported the older 1.0 dprobes to 2.4 a few weeks ago for my own use. It is very useful for kernel work.

Unfortunately the user space support still had one ugly race, which I didn't fix because it required too extensive changes for my simple port (and it didn't concern me because I only use kernel level breakpoints). I see the problems are still in 1.1.

The problem is the vma loop in process_recs_in_cow_pages over the vmas of an address_space. In 2.4 the only way to do that safely is to hold the address_space spinlock. Unfortunately you cannot take the semaphore or execute handle_mm_fault while holding the spinlock, because they could sleep. The only way I can think of to do it relatively race free without adding locks to the core VM is to do it in two passes: first collect all the mms with mmget() and their addresses in a separate list while holding the spinlock, then process that list with the spinlock released.

Then dp_vaddr_to_page has another race. It cannot hold the mm semaphore because that would deadlock with handle_mm_struct. Not holding it means, though, that the page could be swapped out again after you faulted it in and before you have a chance to access it. It can probably be done with a loop that checks and locks the page atomically (e.g. using cmpxchg) and retries the handle_mm_fault as needed.

There may be more races I missed. The 2.4 SMP MM locking hierarchy is unfortunately not very flexible and makes things like what dprobes wants to do relatively hard.

Another change I added, and which I found useful, is a printk to show the opcode of mismatched probes (this way wrong offsets in the probe definitions are easier to fix).

-Andi
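The two-pass idea Andi describes can be sketched in user space as follows. This is only an illustration of the pattern, not DProbes or kernel code: a C11 `atomic_flag` stands in for the address_space spinlock, a plain `refcount` field stands in for pinning an mm, and `slow_process` stands in for work such as a fault handler that may sleep. All of these names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_ITEMS 16

/* Stand-in for the spinlock that protects the shared list. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

static void list_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&list_lock, memory_order_acquire))
        ;   /* spin */
}

static void list_lock_release(void)
{
    atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Shared list protected by list_lock (stand-in for the vma list). */
struct shared_item {
    unsigned long addr;
    int refcount;          /* raised while the item sits on a private list */
};

static struct shared_item shared[MAX_ITEMS];
static size_t shared_count;

/* Stand-in for work that may sleep and so must not run under the lock. */
static unsigned long slow_process(unsigned long addr)
{
    return addr + 1;
}

unsigned long two_pass_sum(void)
{
    struct shared_item *picked[MAX_ITEMS];
    size_t n = 0, i;
    unsigned long sum = 0;

    /* Pass 1: under the lock, only pin items and remember them. */
    list_lock_acquire();
    for (i = 0; i < shared_count; i++) {
        shared[i].refcount++;
        picked[n++] = &shared[i];
    }
    list_lock_release();

    /* Pass 2: lock released, so potentially sleeping work is allowed. */
    for (i = 0; i < n; i++)
        sum += slow_process(picked[i]->addr);

    /* Drop the pins taken in pass 1. */
    list_lock_acquire();
    for (i = 0; i < n; i++)
        picked[i]->refcount--;
    list_lock_release();

    return sum;
}

/* Fills the shared list with two entries and runs both passes. */
unsigned long two_pass_demo(void)
{
    shared[0].addr = 0x1000;
    shared[1].addr = 0x2000;
    shared_count = 2;
    return two_pass_sum();
}
```

The point of the split is that the lock is held only for pointer copying and reference counting, both of which are short and cannot block. The cost is that the private list may be stale by the time pass 2 runs, which is why the items are pinned first.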
Re: Clear interrupts on a SMP machine?
I think you need to use the smp_call_function service and define the function to be a spin_until_notified function. Each of the other processors will call spin_until_notified when it receives the IPI for smp_call_function. You can then do what you need, and afterwards change the global that is keeping all the other processors held in spin_until_notified.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
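The shape of that suggestion can be tried out in user space. The sketch below is an analogy only: POSIX threads stand in for the other processors, starting a thread stands in for the cross-call IPI, and it does not call smp_call_function or touch interrupts. The names `parked`, `release_flag`, `shared_state` and `quiesce_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NWORKERS 3                /* stand-ins for the other processors */

static atomic_int parked;         /* how many workers reached the spin loop */
static atomic_int release_flag;   /* the global that keeps them spinning */
static int shared_state;          /* only written while the others are held */

/* What each "other processor" runs when it receives the cross-call. */
static void *spin_until_notified(void *arg)
{
    (void)arg;
    atomic_fetch_add(&parked, 1);
    while (!atomic_load(&release_flag))
        ;   /* busy-wait until the initiator lets go */
    return NULL;
}

int quiesce_demo(void)
{
    pthread_t workers[NWORKERS];
    int started = 0, i;

    atomic_store(&parked, 0);
    atomic_store(&release_flag, 0);

    for (i = 0; i < NWORKERS; i++) {
        if (pthread_create(&workers[i], NULL, spin_until_notified, NULL) != 0)
            break;
        started++;
    }

    /* Wait until every started worker is held in its spin loop. */
    while (atomic_load(&parked) != started)
        ;
    assert(atomic_load(&parked) == started);

    /* The critical work: nobody else is running useful code now. */
    if (started == NWORKERS)
        shared_state = 42;

    /* Change the global so the others fall out of spin_until_notified. */
    atomic_store(&release_flag, 1);
    for (i = 0; i < started; i++)
        pthread_join(workers[i], NULL);

    return started == NWORKERS ? shared_state : -1;
}
```

Waiting on the `parked` counter before doing the critical work matters: without it the initiator could start while some of the others have not yet entered the spin loop.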
[ANNOUNCE] DProbes 1.1
We've released v1.1 of DProbes - details and code are on the DProbes web page.

The enhancements include:

- DProbes for kernel version 2.4.0-test7 is now available.
- Provision to invoke other debug facilities (SGI KDB, Crash Dump and coredumps) from a probe program.
- Probe points can now be applied on modules with existing probes, using the command line options --merge and --replace.
- Support for generation and application of pre-built probe definition files.
- Facility to select a subset of probes, in a probe definition file, to be applied by a probe's group and type.
- Option to redirect logs to serial ports.
- Support for global variables.
- Access to floating point registers (read-only).
- Reorganized dprobes interpreter code for clarity.
- CVS restructured to cater for multiple kernel versions.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
Re: DProbes with LTT
Karim,

I've been back through an initial evaluation we did for LTT, back in May. One of the features we highlighted that we'd like to see was an ability to specify custom formatting templates.

Our original OS/2 trace facility allowed the user to generate formatting templates which would specify printf-like controls. The templates were defined per major-minor code specification, which was used to uniquely identify a formatting type and was recorded with the trace record in the header.

We'd like to see that functionality in LTT. We would port the code from OS/2 if LTT had a suitable formatting exit for custom events. Any thoughts on this?

Richard

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
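To make the template idea concrete, here is a minimal sketch of a formatter that looks up a printf-like template by the major-minor code carried in a trace record. It is neither the OS/2 trace formatter nor an LTT interface: the record layout, the template table and its contents, and the function names are all assumptions made for illustration, and each template is limited to a single unsigned long field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One formatting template, selected by a major-minor code pair. */
struct trace_template {
    unsigned major, minor;
    const char *fmt;       /* printf-like; takes exactly one unsigned long */
};

/* Hypothetical template table; a real one would be loaded from a file. */
static const struct trace_template templates[] = {
    { 1, 0, "open: fd=%lu" },
    { 1, 1, "close: fd=%lu" },
    { 2, 0, "irq: vector=%lu" },
};

/* Simplified trace record: the header codes plus a single data word. */
struct trace_record {
    unsigned major, minor;
    unsigned long data;
};

/* Format a record into buf using the template matching its codes.
 * Returns the snprintf result. */
int format_record(const struct trace_record *rec, char *buf, size_t len)
{
    size_t i;

    assert(rec != NULL && buf != NULL && len > 0);
    for (i = 0; i < sizeof templates / sizeof templates[0]; i++) {
        if (templates[i].major == rec->major &&
            templates[i].minor == rec->minor)
            return snprintf(buf, len, templates[i].fmt, rec->data);
    }
    /* No template registered for this code: fall back to a raw dump. */
    return snprintf(buf, len, "%u.%u: 0x%lx",
                    rec->major, rec->minor, rec->data);
}

/* Formats one record whose codes match the "close" template. */
const char *format_demo(void)
{
    static char line[64];
    struct trace_record rec = { 1, 1, 7 };

    format_record(&rec, line, sizeof line);
    return line;
}
```

Because the codes travel in the record header, the formatter needs no knowledge of who logged the event. Adding a new event type means adding a table row, and records with no matching row still print as a raw dump.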
Re: The case for a standard kernel debugger
>Hence, yes I can provide an interface from the kernel to log a trace event
>with a variable length buffer, but I don't think that taking away the statically
>defined trace points is the right thing to do. (I might have gotten this
>completely wrong, though ... My presumption about your suggestion of using
>Dprobes to "drive" LTT, is that you mean that all events should come from
>Dprobes and Dprobes alone. I could be wrong).
>
>So here's what I suggest:
>There's already two event types within the events recognized by LTT which
>had been planned for this type of usage. They are: "New event" and "Custom
>event". The first is used to declare a new event type and the second is used
>to log all such events. To declare a new event, the caller would call upon
>an event ID creation function providing it with an event size. The function
>would use the "New event" type to declare a new event in the log and would
>return a unique event ID. Thereafter, the normal tracing function, already
>available through the LTT kernel patch, could be used to log the new events.
>This could be used by Dprobes to enable dynamically inserted probe points to
>be logged within a normal trace and, thereafter, be part of trace analysis.
>Does this fit your needs?

1) No, I'm not suggesting replacing static trace with dynamic. One needs standard instrumentation built in. It's a matter of choice how that's implemented.

2) DProbes is aimed at the "when all else fails" scenario, where you need to develop additional tracepoints very quickly and possibly modify them as debugging proceeds. Of course you could open up the original code and put additional static tracepoints in, but that's not always desirable.

3) Yes, I think what you are suggesting is what we want. I'll pass this round the team and get back to you.

Thanks for your interest.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
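The scheme Karim proposes in the quoted text (declare a new event with its size, get back a unique ID, then log against that ID) might look roughly like the sketch below. It is not the LTT interface: the function names, the numeric values of the two record types, the one-byte fields and the flat in-memory log are all assumptions made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codes for the two record types named in the message. */
enum { EV_NEW = 1, EV_CUSTOM = 2 };

#define LOG_BYTES  256
#define MAX_CUSTOM 8

static unsigned char log_buf[LOG_BYTES];   /* the trace, as a flat buffer */
static size_t log_used;
static size_t custom_size[MAX_CUSTOM];     /* declared size per event ID */
static int custom_count;

/* Append one record: type byte, ID byte, then the payload. */
static int log_append(int type, int id, const void *data, size_t len)
{
    if (log_used + 2 + len > LOG_BYTES)
        return -1;
    log_buf[log_used++] = (unsigned char)type;
    log_buf[log_used++] = (unsigned char)id;
    if (len > 0)
        memcpy(log_buf + log_used, data, len);
    log_used += len;
    return 0;
}

/* Declare a new event type of the given size. Writes a "New event" record
 * into the log and returns the new unique ID, or -1 on failure. */
int trace_create_event(size_t event_size)
{
    unsigned char size_byte;
    int id;

    if (custom_count == MAX_CUSTOM || event_size > 255)
        return -1;
    id = custom_count;
    size_byte = (unsigned char)event_size;
    if (log_append(EV_NEW, id, &size_byte, 1) != 0)
        return -1;
    custom_size[custom_count++] = event_size;
    return id;
}

/* Log one occurrence of a previously declared event as a "Custom event". */
int trace_custom_event(int id, const void *data)
{
    if (id < 0 || id >= custom_count)
        return -1;
    assert(data != NULL);
    return log_append(EV_CUSTOM, id, data, custom_size[id]);
}

/* Declares a 4-byte event, logs it once, and returns the bytes used. */
size_t trace_demo(void)
{
    unsigned char payload[4] = { 0xde, 0xad, 0xbe, 0xef };
    int id = trace_create_event(sizeof payload);

    trace_custom_event(id, payload);
    return log_used;
}
```

Writing the declaration into the log itself means an analysis tool reading the trace learns each custom event's size before it meets the first record of that type, so it can skip or decode records it has no template for.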
Re: The case for a standard kernel debugger
Completely agree - co-operation+integration is the order of the day.

The other thing I didn't mention was that the GKHI was substantially coded before we discovered your hook capability. Part of the GKHI is also to allow hooks to be dynamically defined, i.e. to allow kernel modules to declare hooks themselves, though I haven't yet implemented that - the design is done.

The MP compliant remark refers to the fact that the hook mechanism must work under MP. Our hooks are implemented by modifying code dynamically, and that has certain serialisation requirements: you need to ensure that other processors see a consistent view of memory, which means that you need to stop them while you're changing code dynamically and also flush their I-fetch caches. There are some very odd (H/W) behaviours that can occur with self-modifying code if the appropriate measures are not taken.

Our uniprocessor version of GKHI is being tested as I write. We hope to release it in the next couple of weeks. As soon as we are happy with it I will send you a copy so you can evaluate it against your hook methodology and we can see what economies can be established.

And talking further of co-operation, I'd like to make DProbes drive your trace facility. Did you see the announcement post I sent to LTT? The idea is that DProbes is an enabler for other RAS facilities. We can dynamically insert a probe anywhere into memory (user and kernel) without the need for re-compilation of the source. From the RPN program that's driven by the probe event handler we can initiate other facilities such as entering SGI's kernel debugger, invoking Crash Dump or forcing a core dump.

Now, DProbes came from OS/2, where it was called dynamic trace. Its original purpose was to implement tracepoints on the fly. We can still do that with DProbes, provided we have a tracing mechanism we can feed into. That's where you come in. Can you provide an interface we can call from kernel space to log a trace event with a variable length buffer?

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

Karim Yaghmour <[EMAIL PROTECTED]> on 06/10/2000 09:16:12
Please respond to Karim Yaghmour <[EMAIL PROTECTED]>
To: Richard J Moore/UK/IBM@IBMGB
cc: [EMAIL PROTECTED]
Subject: Re: The case for a standard kernel debugger

Hello Richard,

Part of your analysis is correct. The hooks were designed to take care of static tracepoints only. That said, dynamic allocation of event IDs was next on my list and the hooking mechanism would have been modified consequently. As for "multiple exits registered per hook", if you mean that you can have more than one function called back for each event, then this is already possible. The other items you mention, such as atomicity and prioritization, seem interesting indeed, although I am not sure what you mean by MP compliant, as the only thing that stops the current generalized hooking mechanism from being MP compliant is the insertion of correct locks during callback registration.

Please understand that the purpose wasn't to discredit your work, but rather to stop duplication of work, as efforts could be deployed elsewhere. I think that your work and the work already done on LTT can be brought together in a way that would profit all. This is what I was hinting at towards the end of the posting. It was an invitation more than anything else.

Apart from the hooking mechanism, there were other items which I mentioned that merit discussion, such as the ability to enable dynamic probes to log events in normal LTT traces or the event-driven state machine engine.

Hence, if you are interested in joining forces to further enhance probing and tracing capabilities in Linux, I think this would be a good opportunity.

Best regards

Karim

[EMAIL PROTECTED] wrote:
>
> Yes, we looked at that and it didn't seem to provide the generality we
> needed - multiple exits registered per hook, ability to arm a set of hooks
> atomically, ability to prioritise dispatching order of a hook exit, MP
> compliant. I may be wrong but the Linux Trace Toolkit hooks look like they
> were specifically designed to cater for inserting static tracepoints into
> the kernel.
>
> Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
>
> http://oss.software.ibm.com/developerworks/opensource/linux
> Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
> IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: The case for a standard kernel debugger
Yes, we looked at that and it didn't seem to provide the generality we needed - multiple exits registered per hook, ability to arm a set of hooks atomically, ability to prioritise dispatching order of a hook exit, MP compliant. I may be wrong but the Linux Trace Toolkit hooks look like they were specifically designed to cater for inserting static tracepoints into the kernel.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
Re: Phase tree algorithm defined
Daniel,

This is very valuable - thanks for making the effort. You could enhance your document greatly if you provided a few diagrams to illustrate the structure, especially the example file system. I'd suggest converting the document to HTML or XML.

Also, I'd like to understand how the Phase Tree differs from other tree schemes used by file systems, for example the Modified Patricia Tree used by HPFS and NTFS. It wasn't quite clear to me how the advantages of consistency are obtained, but diagrams might help.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

Daniel Phillips <[EMAIL PROTECTED]> on 05/10/2000 05:53:30
Please respond to Daniel Phillips <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc: (bcc: Richard J Moore/UK/IBM)
Subject: Phase tree algorithm defined

I have finally produced something resembling a formal definition of the phase tree algorithm. As you will see, this algorithm is somewhat subtle, and not easy to express in clear simple terms. But I think that I have in fact expressed it clearly and simply. If I have not, I wish very much to be told so, and why.

You can get a copy here: http://innominate.org/~phillips/tux2/phase.tree.algorithm.txt

Please, if you are especially anal and nasty and have little regard for anyone's feelings, read this and complain about every little thing that is wrong with it, and I will greatly appreciate that. I will also appreciate comments of the form 'you left out this or that', or 'this part sounds like so much bafflegab' and so on.

Enjoy.

-- Daniel
Re: 2.4 kernel problems on 386
What you're seeing is a triple fault. I don't know why this is happening, but usually it has something to do with the double-fault mechanism being damaged (or not set up) when a double fault occurs.

You mention 386 - I remember in OS/2 we had many work-arounds for Intel processor errata - it might be possible that Linux hasn't catered for all the IA32 386 errata. Some of these errata apply to memory management, and that's significant since a triple fault is often caused by either stack faults or descriptor faults - both memory problems. (I am assuming, without checking, that Linux uses a task switch for a double fault; if it doesn't, then a simple double fault caused by a stack exception in ring 0 will result in a triple fault.)

Again, if this is a 386 issue then you should be aware of the increased instruction set for 486, 586 etc.

Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

[EMAIL PROTECTED] on 04/10/2000 11:22:19
Please respond to [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
cc: (bcc: Richard J Moore/UK/IBM)
Subject: 2.4 kernel problems on 386

I'm trying to get a linux-2.4.0-test9-pre7 kernel with riel's recent memory swap patch going on a 386 - swapping problems with the 2.2.15 kernel I was using prompted this. I'm trying to make it run on a bull netstation, which means using bootp and etherboot tagged kernels.

When I compile using gcc version 2.95.2, the resultant kernel reboots the machine each time it starts up - it does not get started. When I compile using gcc2.7.2.3, the machine just halts, with "Ok, booting the kernel" staying on the screen.

I am not on the kernel list, so please send me any replies.
I have not at this stage tried to see if any non patched versions of the kernel will work, nor have I tried to see if the kernel boots on my Pentium Pro (but could do so with some prompting).

Regards,
-- John August
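The escalation described in the reply above (fault, then double fault, then triple fault) can be sketched as a small model. This is only an illustration based on the general IA-32 exception classes (benign, contributory, page fault); the names `combine` and `ring0_stack_fault` are hypothetical, and nothing here reflects how the Linux kernel actually sets up its double-fault handling.

```c
#include <assert.h>

/* Exception classes used by the IA-32 double-fault rules. */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* What the processor does when a second exception arrives. */
enum outcome { HANDLE_SERIALLY, DOUBLE_FAULT, TRIPLE_FAULT };

/*
 * A second exception is raised while the first is still being delivered.
 * Benign exceptions never escalate; a page fault after a contributory
 * exception is also handled serially; the remaining pairs double-fault.
 */
static enum outcome combine(enum exc_class first, enum exc_class second)
{
    if (first == EXC_BENIGN || second == EXC_BENIGN)
        return HANDLE_SERIALLY;
    if (first == EXC_CONTRIBUTORY && second == EXC_PAGE_FAULT)
        return HANDLE_SERIALLY;
    return DOUBLE_FAULT;
}

/*
 * Hypothetical scenario from the message: a stack fault in ring 0.
 * Pushing the exception frame onto the broken stack faults again, which
 * gives a double fault. If the double-fault handler is entered through a
 * task gate it gets a fresh stack and can run; otherwise its frame goes
 * onto the same broken stack and the processor shuts down (triple fault).
 */
static enum outcome ring0_stack_fault(int df_uses_task_gate)
{
    /* A stack-segment fault is a contributory exception. */
    enum outcome o = combine(EXC_CONTRIBUTORY, EXC_CONTRIBUTORY);

    assert(o == DOUBLE_FAULT);
    return df_uses_task_gate ? DOUBLE_FAULT : TRIPLE_FAULT;
}
```

Calling `ring0_stack_fault(0)` models the case where no task switch is used for the double fault, which is the situation the reply suspects could lead to the spontaneous reboot.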
Re: Anyone working on multi-threaded core files for 2.4 ?
Yes, we (IBM Linux Technology Center RAS Team) are.

If you have ideas/concerns/requirements please make them known. We are at the point of deciding what to attack. We have other dumping technologies on other OSs we could model a Linux enhancement on. There are many things we'd like to see incorporated; the question is how not to boil the ocean. Here are some of the ideas we are thinking about:

Multi-process/multi-thread
Customisable memory ranges/object types
    Code/Stack/Dynamic allocations
    System Objects:
        File-system
        Memory Management
        Device Management
        Process/Task Management
    (Physical memory ranges)
Multiple (non-fatal) Triggers:
    Trap
    Command
    API
    Automated (via DProbes)

Richard

Richard Moore - RAS Project Lead - Linux Technology Centre.
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
PISC, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

James Cownie <[EMAIL PROTECTED]> on 29/09/2000 12:22:28
Please respond to James Cownie <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc: (bcc: Richard J Moore/UK/IBM)
Subject: Anyone working on multi-threaded core files for 2.4 ?

Please let me know (by mail) otherwise I may take a look, since it doesn't appear to be a _huge_ problem any longer, and it's one of the things users keep bitching at us about when using our debugger :-(

Thanks
-- Jim

James Cownie <[EMAIL PROTECTED]>
Etnus, LLC. +44 117 9071438
http://www.etnus.com
Re: cpu reset on laptops and microcode update.
I wouldn't have expected a reset to be done on shutdown -r since that doesn't force POST to run. My guess is that we go directly to the BIOS to read the bootstrap (INT 19 is it??)

Richard Moore - RAS Project Lead - Linux Technology Centre.
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
PISC, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

Tigran Aivazian <[EMAIL PROTECTED]> on 21/09/2000 09:50:53
Please respond to Tigran Aivazian <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc: (bcc: Richard J Moore/UK/IBM)
Subject: cpu reset on laptops and microcode update.

Hi guys,

A long time ago I noticed a curious feature on my Dell Latitude CPx H-450GT laptop - rebooting it via "shutdown -r now" (and therefore going through BIOS) does not discard the microcode applied to the CPU. But I would expect it to be discarded as prescribed by Intel manuals, on #RESET.

Does it mean that rebooting a laptop does not actually ever reset the CPU? (This would imply that the BIOS is also protected mode software?)

Of course, doing "shutdown -h now" and switching off the power does discard the microcode as expected.

Any thoughts? I post on linux-kernel because, potentially, your thoughts may become relevant to the content of arch/i386/kernel/microcode.c which is part of Linux kernel :)

Regards,
Tigran
Re: The case for a standard kernel debugger
I think the case for the kernel debugger is better stated as the case for RAS (Reliability, Serviceability and Availability) in the kernel. In other words, there is a case for having the right diagnostic, reporting and recovery tools in the right place at the right time.

A kdb does not fulfil all diagnostic RAS needs. IMHO it's an extremely powerful development tool, but hang on, so is a logic analyser and a source-level debugger. It can also be a real pain if trying to debug HLL source using an assembler-based debugger. The point is, one generally needs a debugger that matches the semantics that the programmer is dealing with. If it's assembler code then so be it, use a kdb. If you're poking around with H/W specific interfaces and system busses then you may need a lower level tool.

But should a kdb be a standard part of the kernel for use in production/commercial/enterprise environments? I don't believe so.

Looking back at the techniques we've deployed over the years to debug system problems in commercial environments, we only ever had the luxury of using a kdb with OS/2. Just about every other OS we supported did not have a kdb. The OS/2 case is interesting because initially we had only the kdb for debugging, and it was the worst platform for serviceability that we ever supported. We couldn't debug those typically obscure problems that occurred only in production environments and which could never be readily re-created in the lab. We took an enormous amount of pain from our customers over poor serviceability. They hated every minute of production time we took from them when a developer took control of their systems in order to debug, or in many cases not debug, the problem.

Of course we had created a rod for our own backs. Customers knew we never sent developers on site to debug OS/390 (or MVS as it was called in those days). They also knew that we never rejected a problem because it was not re-creatable and we never even asked for a re-creation scenario. The reason for this was that we had appropriate RAS capability in MVS which allowed data to be captured automatically at fault time, combined with a certain amount of self-healing capability and automated recovery.

What we did to OS/2 to make it approach this level of RAS capability was to implement:

a system dump capability - similar to SGI's kernel crash dump,

+ a comprehensive system tracing facility that could be dynamically customised to trace events in any code path without any code recompilation - IBM's Dynamic Probes for Linux is an initial port of that capability

+ a comprehensive and customisable virtual storage based dump, a bit like core dump, except that it could dump process trees if required and memory not only from user space, but from system space based upon kernel sub-component, for example file-system structures etc.

That capability completely transformed our ability to debug serious and obscure problems, with minimal disruption.

It's true that we weren't immediately successful when we implemented this stuff. There's a major learning curve and mind-set change required to work with captured data as opposed to interactive debugging.

We didn't throw away the kdb, it's still very useful:

1) as a didactic tool.

2) for the final stages of problem determination - every problem is re-creatable once you know the triggers. And when you do, which you can get from dumps and traces, then you can set up a lab-based experiment where you use a debugger to solve the final mystery.

3) in production for those exceedingly rare cases where we needed to know what the underlying hardware was up to - it's a cheaper option than using a logic analyser.

One big argument against RAS of any sort is that it bloats the kernel and not everyone wants it (until they have a problem). A further argument with Linux is that you may have to do quite a bit of hard work to get the subset of RAS you need to co-exist, if it exists at all.

Something we're working on which may help resolve this, and which will be made available with the next drop of Dynamic Probes, is the Generalised Kernel Hooks Interface (GKHI). The idea here is to give all our RAS functions the option of being dynamically loadable kernel modules. In most cases we don't need to modify kernel function, just get control at the right time. So we place hooks in kernel source, which remain dormant until activated by the GKHI when a RAS module asks it to. Maybe this will provide a way out of the difficulty.

Richard Moore - RAS Project Lead - Linux Technology Centre.
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
PISC, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

Keith Owens <[EMAIL PROTECTED]> on 13-09-2000 10:49:50 PM
Please respond to Keith Owens <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED] (bcc: Richard J Moore/UK/IBM)
Subject: The case for a standard kernel debugger
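The dormant-hook idea in the message above (a hook compiled into a code path that does nothing until a module asks for it to be activated) can be illustrated with a minimal user-space sketch. Everything here is an assumption made for illustration: `struct hook`, `hook_arm`, `hook_disarm`, `HOOK_CALL`, `submit_io` and `trace_exit` are hypothetical names, not the GKHI interface, and a real kernel implementation would also have to deal with locking and module unload.

```c
#include <assert.h>
#include <stddef.h>

/* A hook exit: the routine a RAS module wants run at the hook point. */
typedef void (*hook_exit_fn)(void *data);

/* One hook location; exit stays NULL while the hook is dormant. */
struct hook {
    hook_exit_fn exit;
};

/* Activate a dormant hook; returns -1 if it is already in use. */
static int hook_arm(struct hook *h, hook_exit_fn fn)
{
    assert(h != NULL && fn != NULL);
    if (h->exit != NULL)
        return -1;
    h->exit = fn;
    return 0;
}

/* Return the hook to its dormant state. */
static void hook_disarm(struct hook *h)
{
    h->exit = NULL;
}

/* Placed in the instrumented code path: one pointer test when dormant. */
#define HOOK_CALL(h, data)                  \
    do {                                    \
        hook_exit_fn fn_ = (h)->exit;       \
        if (fn_ != NULL)                    \
            fn_(data);                      \
    } while (0)

/* Stand-in for a kernel code path that carries a hook. */
static struct hook io_submit_hook;

static int submit_io(int nbytes)
{
    HOOK_CALL(&io_submit_hook, &nbytes);
    return nbytes;
}

/* Stand-in for a RAS module's hook exit: totals the bytes it sees. */
static int traced_bytes;

static void trace_exit(void *data)
{
    traced_bytes += *(int *)data;
}
```

In this sketch, calls to `submit_io` made before `hook_arm(&io_submit_hook, trace_exit)` leave `traced_bytes` untouched, and calls made after `hook_disarm` do the same, which is the "dormant until asked" behaviour the message describes.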
Re: Linux RAS
The comparison I was making was with OS/2, not MVS, because:

1) all too often MVS is cited as being the paradigm for RAS when in fact there are special architectural features, as you've pointed out, that might detract from a generalised comparison.

2) OS/2 is an x86 based OS so has the problems, like Linux, of not being locked into a tightly controlled H/W architecture.

3) we completely reversed the serviceability situation for OS/2 and even exceeded the RAS capabilities of MVS in the area of dynamic tracing.

Richard Moore - RAS Project Lead - Linux Technology Centre.
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
PISC, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

Keith Owens <[EMAIL PROTECTED]> on 15-09-2000 01:07:40 PM
Please respond to Keith Owens <[EMAIL PROTECTED]>
To: Richard J Moore/UK/IBM@IBMGB
cc: [EMAIL PROTECTED]
Subject: Linux RAS

On Fri, 15 Sep 2000 10:42:54 +0100, [EMAIL PROTECTED] wrote:
>I think the case for the kernel debugger is better stated as the case for
>RAS (Reliability, Serviceability and Availability) in the kernel
[snip]
>Customers knew we never sent
>developers on site to debug OS/390 (or MVS as it was called in those days).
>They also knew that we never rejected a problem because it was not
>re-creatable and we never even asked for a re-creation scenario. The reason
>for this was that we had appropriate RAS capability in MVS which allowed
>data to be captured automatically at fault time combined with a certain
>amount of self-healing capability and automated recovery.

It would be nice if Linux had the same level of RAS as MVS. But consider the different environments that MVS and Linux run under.

MVS. Standard and external I/O controllers. Fix the data pages, build the CCW, issue SIO, forget about it until it interrupts. I/O requires little or no OS support, it can even be done from bare hardware.
Linux. 50+ different I/O controllers. Most require continual hand holding by the operating system. Doing I/O from bare hardware is difficult if not impossible.

MVS. Multiple system keys (0-7) so different system components can be segregated. If VTAM (key 7) dies the rest of the OS (keys 0-6) can run. Multiple system keys stop most runaway components from stamping on other components' data. Key switching is relatively cheap.
Linux. On ix86, all of the kernel runs in key 0. No protection of one part of the kernel from another. Key switching is relatively expensive.

MVS. High quality hardware. Processors that can detect their own problems, all memory is ECC or better, disks that report failures before they occur.
Linux. Most users are on the cheapest possible hardware. No parity on memory, processors and disks that just die without warning. Lots of random problems, "one bit different, probably bad RAM".

MVS. Trace of interesting events in the master trace table.
Linux. No kernel trace table (oops backtrace does *not* count).

MVS. Only one hardware model - S/390 with minimum hardware requirements for each version of OS/390. IBM can get away with telling customers "to run OS/390 2.9 you need a G5 processor or better".
Linux. 12+ different hardware models, with no restriction on how old the hardware is. We cannot tell customers "to run Linux 2.4 you must have a Pentium III or better".

MVS. If all else fails, MVS can do a stand alone IPL to capture the crash data. It requires that a site plan ahead by building stand alone IPL text and dedicating an area of disk to receive the crash dump but apart from that it always works.
Linux. Assuming that your I/O subsystem will run without a working OS (and lkcd shows how difficult that is), you still need a mechanism to get the stand alone dump going. On our most common platform (ix86) we have to pray that the BIOS will not wipe memory before saving the crash dump. If the BIOS is not reliable (and most are not) then the stand alone code must be built into the kernel and protected somehow. Then you have the problem of activating the stand alone dump.

MVS. Decent hardware support for OS tracing, with Program Event Registers (PER) that are standardized and easy to use.
Linux. Each platform has different hardware mechanisms for hardware debugging. Some have no hardware debugging at all.

MVS. The OS developers know everything about the hardware. They even know the undocumented processor and I/O subsystem commands.
Linux. Nobody knows everything about every bit of hardware that is supported.

MVS. Decent hardware support for cheap cross memory services. OS data can be separated into multiple data spaces,