Matthias Fouquet-Lapar wrote: > Keith Owens wrote: > > Russ Anderson <[EMAIL PROTECTED]> wrote: > > >The MCA recovery driver saves the addresses of memory errors > > >in an array. The array has 32 entries. The effect is > > >that after 32 recoveries, the driver stops recovering. > > > > > >This patch removes the page_isolate array. Since the array > > >was only used to see if the page is already marked reserved, > > >check the reserved bit instead of the array. > > > > lkcd dumps kernel pages marked reserved, so lkcd will try to process > > isolated pages. We will eventually need to add a new page flag to mark > > faulty pages. > > Probably any other dump mechanism should be aware of bad HW pages as well, > so we might be better off to add a flag right away. While we are at it I > would propose to have actually two flags : > > - hard error (which will cause a MCA and should be skipped when taking > a system dump) > - soft error (page has encountered SBE, so we might want to avoid future > allocation, but it can be dumped without causing an MCA)
Yes, there should be page->flags to indicate hard and soft memory errors, such as PG_hard_error and PG_soft_error. The dump code could look at those flags. include/linux/page-flags.h has PG_error for indicating I/O errors, which is close but not quite what we need, given the way it is used. CCing linux-kernel since the flags are not ia64 specific. -- Russ Anderson, OS RAS/Partitioning Project Lead SGI - Silicon Graphics Inc [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/