Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))
On Fri, 13 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 11:04 -0700, Christoph Lameter wrote:
> > On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> ...
> > > *** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab 0x81017f9f8b80
> > > offset=672 flags=0x2c7 inuse=42 freelist=0x810173f172a0
> > > Bytes b4 0x810173f17290: a0 72 f1 73 00 00 00 00 00 00 00 00 00 00 00 00 .r\us
> > > Object 0x810173f172a0: 00 00 00 00 01 81 ff ff 00 00 00 00 00 00 00 00 ..\u\u
> > > FreePointer 0x810173f172a0 -> 0x8101
>
> Found it !! After a painful capture of all the kmalloc-16 slab
> allocations (400+) so far and auditing some of them, found the
> culprit - who writes beyond its allocation, causing the slab
> corruption.

Thanks. I am sorry that this was not easier for you. But as a result I
thoroughly tested the slab corruption detection in SLUB yesterday found
various issues and submitted patches to Andrew that will make this really
work well. Too late for you though.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))
On Fri, 13 Apr 2007 17:45:37 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> > cache_k8_northbridges() is storing config values to incorrect locations
> > (in flush_words) and also it's overflowing beyond the allocation, causing
> > slab verification failures.
>
> Oops. Thanks for tracking that down, Badari.
>
> Andrew, clear .21 candidate.

OK. And for 2.6.20.x, methinks.
Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))
On Friday 13 April 2007 18:42:43 Chuck Ebbert wrote:

> Andi Kleen wrote:
> >> cache_k8_northbridges() is storing config values to incorrect locations
> >> (in flush_words) and also it's overflowing beyond the allocation, causing
> >> slab verification failures.
> >
> > Oops. Thanks for tracking that down, Badari.
> >
> > Andrew, clear .21 candidate.
>
> 2.6.20 as well. Do you want me to submit it?

After it is in .21

-Andi
Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))
Andi Kleen wrote:
>> cache_k8_northbridges() is storing config values to incorrect locations
>> (in flush_words) and also it's overflowing beyond the allocation, causing
>> slab verification failures.
>
> Oops. Thanks for tracking that down, Badari.
>
> Andrew, clear .21 candidate.

2.6.20 as well. Do you want me to submit it?
Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))
> cache_k8_northbridges() is storing config values to incorrect locations
> (in flush_words) and also it's overflowing beyond the allocation, causing
> slab verification failures.

Oops. Thanks for tracking that down, Badari.

Andrew, clear .21 candidate.

-Andi

> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> ---
>  arch/x86_64/kernel/k8.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.21-rc6/arch/x86_64/kernel/k8.c
> ===
> --- linux-2.6.21-rc6.orig/arch/x86_64/kernel/k8.c	2007-04-05 19:36:56.0 -0700
> +++ linux-2.6.21-rc6/arch/x86_64/kernel/k8.c	2007-04-13 07:51:57.0 -0700
> @@ -61,8 +61,8 @@ int cache_k8_northbridges(void)
>  	dev = NULL;
>  	i = 0;
>  	while ((dev = next_k8_northbridge(dev)) != NULL) {
> -		k8_northbridges[i++] = dev;
> -		pci_read_config_dword(dev, 0x9c, &flush_words[i]);
> +		k8_northbridges[i] = dev;
> +		pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
>  	}
>  	k8_northbridges[i] = NULL;
>  	return 0;
[PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))
On Wed, 2007-04-04 at 11:04 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
...
> > *** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab 0x81017f9f8b80
> > offset=672 flags=0x2c7 inuse=42 freelist=0x810173f172a0
> > Bytes b4 0x810173f17290: a0 72 f1 73 00 00 00 00 00 00 00 00 00 00 00 00 .r\us
> > Object 0x810173f172a0: 00 00 00 00 01 81 ff ff 00 00 00 00 00 00 00 00 ..\u\u
> > FreePointer 0x810173f172a0 -> 0x8101

Found it !! After a painful capture of all the kmalloc-16 slab
allocations (400+) so far and auditing some of them, found the
culprit - who writes beyond its allocation, causing the slab
corruption.

Thanks,
Badari

cache_k8_northbridges() is storing config values to incorrect locations
(in flush_words) and also it's overflowing beyond the allocation, causing
slab verification failures.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/k8.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc6/arch/x86_64/kernel/k8.c
===
--- linux-2.6.21-rc6.orig/arch/x86_64/kernel/k8.c	2007-04-05 19:36:56.0 -0700
+++ linux-2.6.21-rc6/arch/x86_64/kernel/k8.c	2007-04-13 07:51:57.0 -0700
@@ -61,8 +61,8 @@ int cache_k8_northbridges(void)
 	dev = NULL;
 	i = 0;
 	while ((dev = next_k8_northbridge(dev)) != NULL) {
-		k8_northbridges[i++] = dev;
-		pci_read_config_dword(dev, 0x9c, &flush_words[i]);
+		k8_northbridges[i] = dev;
+		pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
 	}
 	k8_northbridges[i] = NULL;
 	return 0;
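[Editor's sketch, not part of the original mail.] The effect of moving the
post-increment can be reproduced in user space. The code below is only an
illustration under assumptions: read_config(), fill_buggy() and fill_fixed()
are hypothetical stand-ins, devices are plain ints, and the buggy variant is
given one spare slot so that the sketch itself stays in bounds. In the kernel,
if flush_words has exactly one slot per northbridge, the pre-patch ordering
would leave slot 0 unwritten and store the last value one element past the end.

```c
#include <stddef.h>

/* Hypothetical stand-in for pci_read_config_dword(dev, 0x9c, &val). */
static void read_config(int dev, unsigned int *val)
{
	*val = 0x1000u + (unsigned int)dev;
}

/*
 * Pre-patch ordering: the first store already advances i, so the value
 * for device k lands in words[k + 1]. The caller must pass n + 1 slots
 * here, purely so this sketch does not overflow.
 */
static void fill_buggy(const int *devs, size_t n, int *nb, unsigned int *words)
{
	size_t i = 0;

	for (size_t k = 0; k < n; k++) {
		nb[i++] = devs[k];
		read_config(devs[k], &words[i]);
	}
}

/* Patched ordering: both stores use the same index, then i advances. */
static void fill_fixed(const int *devs, size_t n, int *nb, unsigned int *words)
{
	size_t i = 0;

	for (size_t k = 0; k < n; k++) {
		nb[i] = devs[k];
		read_config(devs[k], &words[i++]);
	}
}
```

With two devices, fill_buggy() leaves words[0] untouched and writes words[1]
and words[2]; fill_fixed() writes words[0] and words[1], matching the device
array index for index.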
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
On Wed, Apr 11, 2007 at 10:37:11AM -0400, Dmitry Torokhov wrote:
> On 4/11/07, Helge Hafting <[EMAIL PROTECTED]> wrote:
> >Dmitry Torokhov wrote:
> >>
> >> *sigh* When will I learn to spell names of kernel parameters
> >> correctly? It is initcall_debug, not debug_initcall :( Could you try
> >> again, please?
> >Here is the dmesg for rc5mm4 with initcall_debug, showing how
> >no usbtouch function is called at all.
> >
>
> Helge,
>
> I don't have any explanation why we don't see usbtouch_init called at
> all in -rc5-mm4. Could it be toolchain misbehaving? Do you see
> references to usbtouch_init in the kernel image itself?
>
I unpacked it, ran "strings" on it, and found no usbtouch in there.
There were plenty of other usb names, such as usbfs, usbserial, usbcore
and tons of messages that usb mass storage and usb serial might
need to produce.

Versions of some tools, I don't know if there are any known issues:

$ gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
$ ld --version
GNU ld (GNU Binutils for Debian) 2.17.50.20070406
$ dpkg -l binutils
ii  binutils  2.17.20070406c  The GNU assembler, linker and binary

Helge Hafting
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
On 4/11/07, Helge Hafting <[EMAIL PROTECTED]> wrote:
>Dmitry Torokhov wrote:
>>
>> *sigh* When will I learn to spell names of kernel parameters
>> correctly? It is initcall_debug, not debug_initcall :( Could you try
>> again, please?
>Here is the dmesg for rc5mm4 with initcall_debug, showing how
>no usbtouch function is called at all.

Helge,

I don't have any explanation why we don't see usbtouch_init called at
all in -rc5-mm4. Could it be toolchain misbehaving? Do you see
references to usbtouch_init in the kernel image itself?

--
Dmitry
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
Dmitry Torokhov wrote:
>
> *sigh* When will I learn to spell names of kernel parameters
> correctly? It is initcall_debug, not debug_initcall :( Could you try
> again, please?

Here is the dmesg for rc5mm4 with initcall_debug, showing how
no usbtouch function is called at all.
I also attached a similar dmesg for 2.6.21-rc6, where things
work normally.

I also decompressed the rc5mm4 image, to check that USB touchscreen
really is compiled into this image. These USB options are on:

CONFIG_USB_HID=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
CONFIG_USB_DEBUG=y
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_SPLIT_ISO=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_UHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_USB_STORAGE_DEBUG=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
CONFIG_USB_LIBUSUAL=y
CONFIG_USB_TOUCHSCREEN=y
CONFIG_USB_TOUCHSCREEN_EGALAX=y
CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_PL2303=y

Helge Hafting

Attachments:
initcall_debugrc5mm4.gz (application/gzip)
initcall_debugrc6.gz (application/gzip)
Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On Friday April 6, [EMAIL PROTECTED] wrote:
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

Difference is that kzalloc(0, ) now returns NULL. Maybe it is a
SLUB/SLAB difference? (So maybe it did use memory it shouldn't have
before, but now it fails, which is the better behaviour).

This patch fixes the maths and should probably go in various 'stable'
kernels. Bug is in 2.6.18, but not 2.6.16. Patch won't work for
2.6.18 as DIV_ROUND_UP is missing, but 2.6.19 and later have it.

Thanks for the bug report.

NeilBrown

-
Fix calculation for size of filemap_attr array in md/bitmap.

If 'num_pages' were ever 1 more than a multiple of 8 (32bit platforms)
or of 16 (64 bit platforms), filemap_attr would be allocated one
'unsigned long' shorter than required. We need a round-up in there.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c	2007-04-11 13:24:50.0 +1000
+++ ./drivers/md/bitmap.c	2007-04-11 13:24:59.0 +1000
@@ -863,9 +863,7 @@ static int bitmap_init_from_disk(struct
 	/* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */
 	bitmap->filemap_attr = kzalloc(
-		(((num_pages*4/8)+sizeof(unsigned long)-1)
-		 /sizeof(unsigned long))
-		*sizeof(unsigned long),
+		roundup( DIV_ROUND_UP(num_pages*4, 8), sizeof(unsigned long)),
 		GFP_KERNEL);
 	if (!bitmap->filemap_attr)
 		goto out;
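[Editor's sketch, not part of the original mail.] The two size expressions
from the diff can be compared in user space. This is a minimal sketch under
assumptions: DIV_ROUND_UP and roundup are redefined locally in the usual
kernel style, old_size() and new_size() are hypothetical helper names, and
the word size is passed as a parameter (standing in for
sizeof(unsigned long)) so both the 32-bit and 64-bit cases can be checked on
any host.

```c
#include <stddef.h>

/* Local copies in the usual kernel style, so the sketch builds standalone. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/*
 * Pre-patch expression: num_pages*4/8 truncates before the round-up,
 * so half a byte is lost whenever num_pages is odd.
 */
static size_t old_size(size_t num_pages, size_t word)
{
	return (((num_pages * 4 / 8) + word - 1) / word) * word;
}

/*
 * Patched expression: round the bit count up to whole bytes first,
 * then round the byte count up to a multiple of the word size.
 */
static size_t new_size(size_t num_pages, size_t word)
{
	return roundup(DIV_ROUND_UP(num_pages * 4, 8), word);
}
```

For num_pages = 1 and an 8-byte word, old_size() yields 0 (hence the
kzalloc(0, ) mentioned above) while new_size() yields 8. For num_pages = 17
on the same word size, the old expression gives 8 bytes although 68 bits need
9 bytes, and the new one gives 16; with a 4-byte word the same shortfall
appears at num_pages = 9.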
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
On 4/10/07, Helge Hafting <[EMAIL PROTECTED]> wrote:
>Dmitry Torokhov wrote:
>> Hmm, I am concerned because not only you don't have an input device created,
>> you don't even see the driver being registered with usbcore. Could you please
>> try booting with debug_initcall to see with what error code usbtouchscreen
>> initialization fails?
>>
>Here is the dmesg from a boot with debug_initcall.
>I can't see any messages from usbtouchscreen. For me, it looks like
>the touchscreen is discovered and then nothing happens to it.

*sigh* When will I learn to spell names of kernel parameters
correctly? It is initcall_debug, not debug_initcall :( Could you try
again, please?

--
Dmitry
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
Andrew Morton wrote:
> Is 2.6.21-rc6 OK?
>
> If so, please keep a close eye on 2.6.22-rcX, let us know if/when we've
> moved this breakage into mainline :(

2.6.21-rc6 is ok. Here, I get messages from usbtouchscreen,
something rc5-mm4 failed to produce.
The egalax driver gets /class/input/input3,
usbcore registers usbtouchscreen, and the touchscreen works.

Well, it became /dev/input/event3 while 2.6.18 placed it at
/dev/input/event1, but I think that is more of a udev problem...

Helge Hafting
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
Dmitry Torokhov wrote:
> Hmm, I am concerned because not only you don't have an input device created,
> you don't even see the driver being registered with usbcore. Could you please
> try booting with debug_initcall to see with what error code usbtouchscreen
> initialization fails?

Here is the dmesg from a boot with debug_initcall.
I can't see any messages from usbtouchscreen. For me, it looks like
the touchscreen is discovered and then nothing happens to it.

Helge Hafting

Attachment:
debug_initcall.gz (application/gzip)
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
On Monday 09 April 2007 18:36, Helge Hafting wrote:
> On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> > On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > > I have an usb touchscreen (egalax variety) that works with
> > > the 2.6.18 kernel supplied by debian.
> > >
> > > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > > in question. Unlike the debian kernel, this kernel don't use
> > > modules in order to save boot time.
> > >
> > > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > > dmesg says things like
> > > usb 3-2: Manufacturer: eGalac Inc.
> > > usb 3-2: Product: USB TouchController
> > >
> > > and a lot more. Unlike 2.6.18, it never gets around to say
> > > "usbcore: registered new driver usbtouchscreen"
> > > which seems to indicate a problem.
> > > usbcore registers several other drivers, such as usbserial and pl2303
> > > that makes the gps work. It also registers other drivers like
> > > usb-storage,usbfs,hub,libusual,hiddev,usbhid. But not usbtouchscreen.
> > > I believe I have turned on every config option for usb touchscreen,
> > > this should not be missing.
> > >
> > > Is there something wrong, or could there be a seemingly unrelated option
> > > that I need to turn on?
> >
> > Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> >
> Unfortunately, I have:
> CONFIG_USB_TOUCHSCREEN=y
> CONFIG_USB_TOUCHSCREEN_EGALAX=y
>
> Anything else I may have missed?
>

Hmm, I am concerned because not only you don't have an input device created,
you don't even see the driver being registered with usbcore. Could you please
try booting with debug_initcall to see with what error code usbtouchscreen
initialization fails?

--
Dmitry
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
On Tue, 10 Apr 2007 00:36:43 +0200 Helge Hafting <[EMAIL PROTECTED]> wrote:

> On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> > On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > > I have an usb touchscreen (egalax variety) that works with
> > > the 2.6.18 kernel supplied by debian.
> > >
> > > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > > in question. Unlike the debian kernel, this kernel don't use
> > > modules in order to save boot time.
> > >
> > > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > > dmesg says things like
> > > usb 3-2: Manufacturer: eGalac Inc.
> > > usb 3-2: Product: USB TouchController
> > >
> > > and a lot more. Unlike 2.6.18, it never gets around to say
> > > "usbcore: registered new driver usbtouchscreen"
> > > which seems to indicate a problem.
> > > usbcore registers several other drivers, such as usbserial and pl2303
> > > that makes the gps work. It also registers other drivers like
> > > usb-storage,usbfs,hub,libusual,hiddev,usbhid. But not usbtouchscreen.
> > > I believe I have turned on every config option for usb touchscreen,
> > > this should not be missing.
> > >
> > > Is there something wrong, or could there be a seemingly unrelated option
> > > that I need to turn on?
> >
> > Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> >
> Unfortunately, I have:
> CONFIG_USB_TOUCHSCREEN=y
> CONFIG_USB_TOUCHSCREEN_EGALAX=y
>
> Anything else I may have missed?
>

Is 2.6.21-rc6 OK?

If so, please keep a close eye on 2.6.22-rcX, let us know if/when we've
moved this breakage into mainline :(
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > I have an usb touchscreen (egalax variety) that works with
> > the 2.6.18 kernel supplied by debian.
> >
> > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > in question. Unlike the debian kernel, this kernel don't use
> > modules in order to save boot time.
> >
> > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > dmesg says things like
> > usb 3-2: Manufacturer: eGalac Inc.
> > usb 3-2: Product: USB TouchController
> >
> > and a lot more. Unlike 2.6.18, it never gets around to say
> > "usbcore: registered new driver usbtouchscreen"
> > which seems to indicate a problem.
> > usbcore registers several other drivers, such as usbserial and pl2303
> > that makes the gps work. It also registers other drivers like
> > usb-storage,usbfs,hub,libusual,hiddev,usbhid. But not usbtouchscreen.
> > I believe I have turned on every config option for usb touchscreen,
> > this should not be missing.
> >
> > Is there something wrong, or could there be a seemingly unrelated option
> > that I need to turn on?
>
> Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
>
Unfortunately, I have:
CONFIG_USB_TOUCHSCREEN=y
CONFIG_USB_TOUCHSCREEN_EGALAX=y

Anything else I may have missed?

Helge Hafting
Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?
On Friday 06 April 2007 20:54, Helge Hafting wrote:
> I have an usb touchscreen (egalax variety) that works with
> the 2.6.18 kernel supplied by debian.
>
> It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> in question. Unlike the debian kernel, this kernel don't use
> modules in order to save boot time.
>
> The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> dmesg says things like
> usb 3-2: Manufacturer: eGalac Inc.
> usb 3-2: Product: USB TouchController
>
> and a lot more. Unlike 2.6.18, it never gets around to say
> "usbcore: registered new driver usbtouchscreen"
> which seems to indicate a problem.
> usbcore registers several other drivers, such as usbserial and pl2303
> that makes the gps work. It also registers other drivers like
> usb-storage,usbfs,hub,libusual,hiddev,usbhid. But not usbtouchscreen.
> I believe I have turned on every config option for usb touchscreen,
> this should not be missing.
>
> Is there something wrong, or could there be a seemingly unrelated option
> that I need to turn on?

Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.

--
Dmitry
usb touchscreen breakage in 2.6.21-rc5-mm4 ?
I have an usb touchscreen (egalax variety) that works with
the 2.6.18 kernel supplied by debian.

It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
in question. Unlike the debian kernel, this kernel don't use
modules in order to save boot time.

The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
dmesg says things like
usb 3-2: Manufacturer: eGalac Inc.
usb 3-2: Product: USB TouchController

and a lot more. Unlike 2.6.18, it never gets around to say
"usbcore: registered new driver usbtouchscreen"
which seems to indicate a problem.
usbcore registers several other drivers, such as usbserial and pl2303
that makes the gps work. It also registers other drivers like
usb-storage,usbfs,hub,libusual,hiddev,usbhid. But not usbtouchscreen.
I believe I have turned on every config option for usb touchscreen,
this should not be missing.

Is there something wrong, or could there be a seemingly unrelated option
that I need to turn on?

Helge Hafting
Re: 2.6.21-rc5-mm4
On Fri, 06 Apr 2007 11:26:24 -0400 [EMAIL PROTECTED] wrote:

> On Thu, 05 Apr 2007 13:31:09 PDT, Andrew Morton said:
> > On Thu, 05 Apr 2007 13:02:59 -0400, [EMAIL PROTECTED] wrote:
>
> > > Am seeing an Oops 'cannot handle kernel paging request' during late
> > > system startup, hand-copied traceback follows:
> > >
> > > avc_has_perm_noaudit+0x2bf/0x506
> > > avc_has_perm+0x2b/0x5b
> > > selinux_socket_stream_connect+0x7e/0xc3
> > > unix_stream_connect+0x202/0x3f3
> > > sys_connect+0x7e/0xa4
> > > tracesys+0xde/0xe1
>
> > Thanks.
> >
> > I'd have thought that the full trace could be captured with netconsole.
>
> I didn't have a second box available at first. Then I blew close to 45
> minutes trying to figure out why netconsole was totally failing to work,
> before I found this in .config:
>
> # CONFIG_NETCONSOLE is not set
>
> "Do'h!" -- H. Simpson
>
> Unfortunately, defining netconsole caused NETPOLL to be defined, which caused
> a recompile of half the known world, and the symptoms of the crash moved.
>
> Film at 11, once I figure out what's going on, and fix the testbed in my
> office so I can actually catch this sucker - I may have to string a serial
> cable. One solid good data point:
>
> 21-rc5 with only the -mm4 'origin.patch' applied is OK, so whatever the
> issue is, it's not in Linus's tree.
>

Oh well. If it's all too much fuss, feel free to send the .config. If it happens on my machine(s) I can bisect it real quick.
Re: 2.6.21-rc5-mm4
On Thu, 05 Apr 2007 13:31:09 PDT, Andrew Morton said:
> On Thu, 05 Apr 2007 13:02:59 -0400, [EMAIL PROTECTED] wrote:

> > Am seeing an Oops 'cannot handle kernel paging request' during late
> > system startup, hand-copied traceback follows:
> >
> > avc_has_perm_noaudit+0x2bf/0x506
> > avc_has_perm+0x2b/0x5b
> > selinux_socket_stream_connect+0x7e/0xc3
> > unix_stream_connect+0x202/0x3f3
> > sys_connect+0x7e/0xa4
> > tracesys+0xde/0xe1

> Thanks.
>
> I'd have thought that the full trace could be captured with netconsole.

I didn't have a second box available at first. Then I blew close to 45
minutes trying to figure out why netconsole was totally failing to work,
before I found this in .config:

# CONFIG_NETCONSOLE is not set

"Do'h!" -- H. Simpson

Unfortunately, defining netconsole caused NETPOLL to be defined, which caused
a recompile of half the known world, and the symptoms of the crash moved.

Film at 11, once I figure out what's going on, and fix the testbed in my
office so I can actually catch this sucker - I may have to string a serial
cable. One solid good data point:

21-rc5 with only the -mm4 'origin.patch' applied is OK, so whatever the
issue is, it's not in Linus's tree.
Re: 2.6.21-rc5-mm4
Jiri Kosina <[EMAIL PROTECTED]> writes:

> Hi Eric,
>
> after struggling with this issue for some time, I think that it's just
> some inconsistent usage of NR_IRQS throughout the source probably due to
> some include hell. I really don't understand how the mach-*/ includes
> are supposed to work.
>
> I found out (by disassembling resulting vmlinux binaries) that in
> arch/i386/kernel/entry.S, the loop in irq_entries_start does too few
> iterations compared to NR_IRQS value as seen in for example io_apic.c
>
> The super-stupid proof-patch below fixes the panic on my system. It's just
> to demonstrate that the i386 includes really need fixing to be consistent
> somehow.

Thanks, and that would do it, it makes sense why it was the irq patch that caused problems.

I had forgotten about the number of stubs issue. I had to clean that up on x86_64 as well and it probably makes most sense to put that cleanup as well, so we have a small fixed number of stubs which would make the includes not matter.

Bleh. Hopefully soon.

Eric

> diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
> index 976438c..b20dc07 100644
> --- a/arch/i386/kernel/entry.S
> +++ b/arch/i386/kernel/entry.S
> @@ -53,6 +53,8 @@
>  #include <asm/dwarf2.h>
>  #include "irq_vectors.h"
>
> +#define NR_IRQS 4096
> +
>  /*
>   * We use macros for low-level operations which need to be overridden
>   * for paravirtualization. The following will never clobber any registers:
>
> --
> Jiri Kosina
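One way to read the "small fixed number of stubs" remark, sketched here as an assumption rather than as the actual x86_64 code: if entry stubs are generated per interrupt vector (x86 has 256 of them) and the irq number is looked up at run time, the stub count no longer depends on whatever value of NR_IRQS a given file happens to see. The names `vector_to_irq`, `bind_irq` and `irq_for_vector` are hypothetical, and the NR_IRQS value is simply the one from the proof patch.

```c
#include <assert.h>

#define NR_VECTORS 256   /* number of x86 interrupt vectors; fixed by the architecture */
#define NR_IRQS    4096  /* illustrative value, taken from the proof patch above */

/* vector_to_irq[v] is the irq currently bound to vector v, or -1 if unused. */
static int vector_to_irq[NR_VECTORS];

static void init_vector_table(void)
{
    for (int v = 0; v < NR_VECTORS; v++)
        vector_to_irq[v] = -1;
}

/* Bind an irq to a vector; returns 0 on success, -1 if either is out of range. */
static int bind_irq(int vector, int irq)
{
    if (vector < 0 || vector >= NR_VECTORS || irq < 0 || irq >= NR_IRQS)
        return -1;
    vector_to_irq[vector] = irq;
    return 0;
}

/*
 * What a per-vector stub would do on entry: translate its own vector
 * number into an irq. Only NR_VECTORS stubs are ever needed, however
 * large NR_IRQS grows.
 */
static int irq_for_vector(int vector)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    return vector_to_irq[vector];
}
```

With this layout, raising NR_IRQS changes only the range check in `bind_irq`; the table and the number of stubs stay at 256.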
Re: 2.6.21-rc5-mm4
On Wed, 4 Apr 2007, Eric W. Biederman wrote:

> > And the bisection winner is
> >
> > i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch
> >
> > I don't immediately see how it could be causing it, so adding CCs which
> > are listed in the patch.
> Weird. I will have to look at that in a little more detail.
> Do you know if this problem happens on x86_64? What does your .config
> look like? What does /proc/interrupts look like? What kind of hardware
> you running this kernel on? Can anyone else reproduce this?
> The oops clearly shows something using -1 and calling that as an
> address I don't know why, but I'm guessing I have triggered a memory
> stomp somewhere. I think this is the first time I have seen a small
> negative number causing a NULL pointer dereference.
> That patch looks innocuous enough that either:
> - I just missed changing something I should have.
> - Your configuration has an increase in NR_IRQS and that triggered
> something.
> - The patch simply permuted things so a memory stomp now happens
> on the e1000 data structures instead of somewhere else.
> - Something doesn't like large irq numbers.
> This work is essentially a backport from x86_64 so if your hardware
> is 64bit capable testing that should be a fairly easy test, and be
> able to rule out large irq numbers as the culprit.
> Until I get a good look at -mm I'm going to have a hard time guessing.
> But a roving memory stomp is my best guess.

Hi Eric,

after struggling with this issue for some time, I think that it's just some inconsistent usage of NR_IRQS throughout the source, probably due to some include hell. I really don't understand how the mach-*/ includes are supposed to work.

I found out (by disassembling resulting vmlinux binaries) that in arch/i386/kernel/entry.S, the loop in irq_entries_start does too few iterations compared to the NR_IRQS value as seen in, for example, io_apic.c.

The super-stupid proof-patch below fixes the panic on my system. It's just to demonstrate that the i386 includes really need fixing to be consistent somehow.

diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index 976438c..b20dc07 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -53,6 +53,8 @@
 #include <asm/dwarf2.h>
 #include "irq_vectors.h"

+#define NR_IRQS 4096
+
 /*
  * We use macros for low-level operations which need to be overridden
  * for paravirtualization. The following will never clobber any registers:

--
Jiri Kosina
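The failure mode described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the two constants stand for the differing NR_IRQS values that two source files ended up compiling against, and the number 224 as well as the names `STUBS_BUILT`, `NR_IRQS_SEEN`, `lookup_stub` and `count_unbacked_irqs` are hypothetical, not taken from the kernel tree.

```c
#include <stddef.h>

#define STUBS_BUILT   224   /* hypothetical: iterations the stub-generating loop performed */
#define NR_IRQS_SEEN  4096  /* hypothetical: value another file (e.g. io_apic.c) compiled against */

typedef void (*stub_t)(void);

static void irq_stub(void) { /* placeholder for a generated entry stub */ }

/* Table of entry stubs, sized by the smaller count. */
static stub_t interrupt_table[STUBS_BUILT];

static void build_stubs(void)
{
    for (int i = 0; i < STUBS_BUILT; i++)
        interrupt_table[i] = irq_stub;
}

/* Return the stub for an irq, or NULL when no stub was ever generated for it. */
static stub_t lookup_stub(int irq)
{
    if (irq < 0 || irq >= STUBS_BUILT)
        return NULL;
    return interrupt_table[irq];
}

/* Count the irq numbers the consumer believes are valid but that have no stub behind them. */
static int count_unbacked_irqs(void)
{
    int missing = 0;

    for (int irq = 0; irq < NR_IRQS_SEEN; irq++)
        if (lookup_stub(irq) == NULL)
            missing++;
    return missing;
}
```

In the real kernel there is no bounds check like the one in `lookup_stub`, so code that trusts the larger value reads past the end of the table and may jump to whatever it finds there. The sketch only counts how many irq numbers fall into that gap.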
Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On 4/5/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Fri, 06 Apr 2007 02:33:03 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > >
> > > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > > It is huge.
> > >
> > > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > > Kergon). It is a quilt tree, living at
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > >
> > > - Added davidel's signalfd stuff.
> >
> > Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> >
> > md1 is the first array on the disk, and it refuses to start up on boot, or after
> > boot.
> >
> > ...
> >
> > tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> > mdadm: device /dev/md1 already active - cannot assemble it
> > tornado ~ # mdadm --run /dev/md1
> > mdadm: failed to run array /dev/md1: Cannot allocate memory
> > tornado ~ #
> >
> > and looking at a dmesg, this is logged:
> >
> > md: bind
> > md: bind
> > raid1: raid set md1 active with 2 out of 2 mirrors
> > md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> > md1: failed to create bitmap (-12)
> > md: pers->run() failed ...

Is this the dmesg from boot or the dmesg after running the mdadm --run command?

> >
> > tornado ~ # uname -a
> > Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> > Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> > tornado ~ #
> >
> > The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> > out the -mm releases so much lately.
>
> OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some
> unexpectedly large value.
>
> I don't _think_ there's anything in -mm which would have triggered this.
> Does mainline do the same thing?
>
> I guess it's possible that the code in git-md-accel.patch accidentally
> broke things. Perhaps try disabling CONFIG_DMA_ENGINE?

git-md-accel.patch does not touch anything in the raid1 path, but I guess stranger things have happened.

--
Dan
2.6.21-rc5-mm4 initramfs Make Error
I built a version of 2.6.21-rc5-mm4 with an initramfs and it built OK the first time. Then I made changes (applied a Reiser4 patch) and rebuilt, and got the following error:

zephyr linux # make
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CALL scripts/checksyscalls.sh
:1356:2: warning: #warning syscall getcpu not implemented
:1360:2: warning: #warning syscall epoll_pwait not implemented
:1364:2: warning: #warning syscall lutimesat not implemented
:1380:2: warning: #warning syscall revokeat not implemented
:1384:2: warning: #warning syscall frevoke not implemented
CHK include/linux/compile.h
/usr/src/linux-2.6.21-rc5-mm4/usr/Makefile:41: *** target pattern contains no `%'. Stop.
make: *** [usr] Error 2

I have this in the config:

CONFIG_INITRAMFS_SOURCE="/initramfs"

/initramfs is the directory where I build my initramfs, which is just a busybox setup, very simple.

# rm usr/.initramfs_data.*

seems to make it go again.

--
Zan Lynx <[EMAIL PROTECTED]>
Re: 2.6.21-rc5-mm4
On Thu, 05 Apr 2007 13:02:59 -0400 [EMAIL PROTECTED] wrote:

> On Mon, 02 Apr 2007 22:47:45 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> Am seeing an Oops 'cannot handle kernel paging request' during late
> system startup, hand-copied traceback follows:
>
> avc_has_perm_noaudit+0x2bf/0x506
> avc_has_perm+0x2b/0x5b
> selinux_socket_stream_connect+0x7e/0xc3
> unix_stream_connect+0x202/0x3f3
> sys_connect+0x7e/0xa4
> tracesys+0xde/0xe1
>
> I've not identified exactly when it happens, but it's towards the very end of
> handling /etc/rc5.d, it's already up to the S98's. Odd thing is it only happens
> when I start with RedHat's 'graphical boot', and may be related to the shutdown
> of the X server that's displaying the boot progress preparing to launch the
> X server for gdm logins (as I'm also seeing a hang sometimes when shutting
> down - so it is possibly a "shutting down X server nukes system" bug).

Thanks.

I'd have thought that the full trace could be captured with netconsole.

> Figured I'd toss this heads-up in case it rings any bells, while I go do
> the bisection dance on -rc5-mm4 (-mm2 is OK, and -mm3 doesn't boot for me
> for other reasons I didn't chase down before -mm4 came out and fixed it, so
> I have a ways to bisect)
>

No, I'm not aware of anyone else hitting anything like that.

Bisection would be good, and probably pretty quick - I'd pick git-net.patch as the first pivot point. But we'd still be wanting the full trace if poss please.
Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On Fri, 06 Apr 2007 02:33:03 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> Hi,
>
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> >
> > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > It is huge.
> >
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > Kergon). It is a quilt tree, living at
> > ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> >
> > - Added davidel's signalfd stuff.
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
>
> md1 is the first array on the disk, and it refuses to start up on boot, or after
> boot.
>
> ...
>
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
>
> and looking at a dmesg, this is logged:
>
> md: bind
> md: bind
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...
>
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
>
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this. Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally broke things. Perhaps try disabling CONFIG_DMA_ENGINE?

> Also, Andrew, can you please restart posting/cc'ing your -mm announcements to
> the [EMAIL PROTECTED] list? Seems this stopped around about
> 2.6.20, it was handy.

hm. I always Bcc [EMAIL PROTECTED] I assume that its filters didn't get updated after the s/osdl/linux-foundation/ thing. I'll talk to people, thanks.

> .config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4
>
Re: 2.6.21-rc5-mm4: ia64: scheduling while atomic - utrace?
Thanks for the report. I introduced this bug recently when I changed around
some of the locking but forgot about the writeback issue. I don't think
this is directly related to any other crash you might have seen. I've moved
the call out of the lock-holding region, where it didn't need to be. I'm
updating my patch series now; I've appended the incremental patch.

Thanks,
Roland

---
 kernel/ptrace.c | 20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index fb6c3fb..c31d744 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1473,16 +1473,6 @@ ptrace_report(struct utrace_attached_eng
 	 */
 	utrace_set_flags(tsk, engine, engine->flags | UTRACE_ACTION_QUIESCE);
 
-	/*
-	 * If regset 0 has a writeback call, do it now. On register window
-	 * machines, this makes sure the user memory backing the register
-	 * data is up to date by the time wait_task_inactive returns to
-	 * ptrace_start in our tracer doing a PTRACE_PEEKDATA or the like.
-	 */
-	regset = utrace_regset(tsk, engine, utrace_native_view(tsk), 0);
-	if (regset->writeback)
-		(*regset->writeback)(tsk, regset, 0);
-
 	BUG_ON(code == 0);
 	tsk->exit_code = code;
 	do_notify(tsk, state->parent, CLD_TRAPPED);
@@ -1494,6 +1484,16 @@ ptrace_report(struct utrace_attached_eng
 
 	NO_LOCKS;
 
+	/*
+	 * If regset 0 has a writeback call, do it now. On register window
+	 * machines, this makes sure the user memory backing the register
+	 * data is up to date by the time wait_task_inactive returns to
+	 * ptrace_start in our tracer doing a PTRACE_PEEKDATA or the like.
+	 */
+	regset = utrace_regset(tsk, engine, utrace_native_view(tsk), 0);
+	if (regset->writeback)
+		(*regset->writeback)(tsk, regset, 0);
+
 	return UTRACE_ACTION_RESUME;
 }
Re: 2.6.21-rc5-mm4 (SLUB)
On Thu, 5 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote:
> > Here is a patch that adds validation (only for cpuslabs and partial
> > slabs but thats where the action is). Apply this patch
> > and then do
> >
> > echo 1 >/sys/slab/<cache-to-check>/validate
> >
> > I suggest to boot with full debugging and then run this on the ACPI slabs.
>
> Did this and didn't trigger any problems.

Duh. Must have been in the full slabs. Maybe I should add a tracking of
full slabs for the debug case. Would also enable leak detection.
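The validation trigger quoted in this message is just a write of "1" to a per-cache sysfs file. As a rough sketch only, the same write could be issued from a small C helper. The names slab_validate and slab_root are hypothetical and introduced here for illustration; the <slab_root>/<cache>/validate layout is assumed from the echo command in this thread and would exist only on a kernel carrying the validation patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "1" to <slab_root>/<cache>/validate.
 * slab_root would be "/sys/slab" on a kernel with the validation patch
 * discussed in this thread (an assumption, not a documented interface).
 * Returns 0 on success, -1 if the path is too long or the write fails.
 */
int slab_validate(const char *slab_root, const char *cache)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s/validate", slab_root, cache);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path did not fit in the buffer */

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* no such cache, or not permitted */

    int ok = fputs("1\n", f) >= 0;
    if (fclose(f) != 0)         /* the write may only fail at close */
        ok = 0;

    return ok ? 0 : -1;
}
```

A call such as slab_validate("/sys/slab", "Acpi-Operand") would need root; the results still go to the kernel log, as with the echo command.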
2.6.21-rc5-mm4: ia64: scheduling while atomic - utrace?
Running a 'usex -e' load [http://people.redhat.com/~anderson/usex/] on
2.6.21-rc5-mm4 on ia64, I see the following:

BUG: scheduling while atomic: strace/0x4001/20162

Call Trace:
 [] show_stack+0x80/0xa0 sp=e76042dc7610 bsp=e76042dc1260
 [] dump_stack+0x30/0x60 sp=e76042dc77e0 bsp=e76042dc1248
 [] schedule+0x1d00/0x22a0 sp=e76042dc77e0 bsp=e76042dc1108
 [] __cond_resched+0x50/0xa0 sp=e76042dc7800 bsp=e76042dc10e8
 [] cond_resched+0xb0/0xe0 sp=e76042dc7800 bsp=e76042dc10d0
 [] get_user_pages+0x1b0/0x7c0 sp=e76042dc7800 bsp=e76042dc1028
 [] access_process_vm+0xc0/0x440 sp=e76042dc7820 bsp=e76042dc0f78
 [] ia64_sync_user_rbs+0x80/0x100 sp=e76042dc7830 bsp=e76042dc0f38
 [] do_gpregs_writeback+0xb0/0xe0 sp=e76042dc7840 bsp=e76042dc0f10
 [] unw_init_running+0x70/0xa0 sp=e76042dc7850 bsp=e76042dc0ee8
 [] do_regset_call+0x110/0x140 sp=e76042dc7c30 bsp=e76042dc0e88
 [] gpregs_writeback+0x40/0x60 sp=e76042dc7e30 bsp=e76042dc0e60
 [] ptrace_report+0xe0/0x1e0 sp=e76042dc7e30 bsp=e76042dc0e28
 [] ptrace_report_syscall+0xa0/0xe0 sp=e76042dc7e30 bsp=e76042dc0e00
 [] ptrace_report_syscall_exit+0x30/0x60 sp=e76042dc7e30 bsp=e76042dc0dc8
 [] utrace_report_syscall+0xf0/0x540 sp=e76042dc7e30 bsp=e76042dc0d48
 [] syscall_trace_leave+0x60/0xc0 sp=e76042dc7e30 bsp=e76042dc0cf0
 [] ia64_trace_syscall+0x100/0x110 sp=e76042dc7e30 bsp=e76042dc0cf0

Looks like get_ptrace_state(), called from ptrace_report_syscall(), calls
rcu_read_lock(), which disables preemption. The corresponding
rcu_read_unlock() will come from put_ptrace_state(), called from
ptrace_report() at the end of the report.

However, ia64 needs to sync the register backing store, and this requires
access to process vm. get_user_pages' use of cond_resched() is tripping
the "scheduling while atomic" bug.

May be related to: http://marc.info/?a=10288337963=1=4

Lee
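The failure mode described in this report can be modelled in user space. The sketch below is a toy, not kernel code: every toy_* name is hypothetical, and a plain counter stands in for the kernel's preempt count. It only illustrates why a call that may sleep trips the check while the read-side section is held, and why doing the same call after the section ends (the reordering Roland's reply in this thread describes) does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the per-task preempt count (hypothetical). */
static int toy_preempt_count;

/* Models rcu_read_lock() on a kernel where it disables preemption. */
void toy_rcu_read_lock(void)
{
    toy_preempt_count++;
}

/* Models rcu_read_unlock(); an unbalanced unlock is a bug. */
void toy_rcu_read_unlock(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
}

/*
 * Models a call that may sleep, such as a path reaching cond_resched().
 * Returns false and logs when invoked in atomic context.
 */
bool toy_may_sleep(const char *who)
{
    if (toy_preempt_count > 0) {
        fprintf(stderr, "BUG: scheduling while atomic: %s\n", who);
        return false;
    }
    return true;
}

/* Ordering in the report: writeback runs inside the read-side section. */
bool toy_report_writeback_inside(void)
{
    toy_rcu_read_lock();
    bool ok = toy_may_sleep("writeback");
    toy_rcu_read_unlock();
    return ok;
}

/* Reordered: the section is closed before the writeback runs. */
bool toy_report_writeback_outside(void)
{
    toy_rcu_read_lock();
    /* ... work that really needs the read-side section ... */
    toy_rcu_read_unlock();
    return toy_may_sleep("writeback");
}
```

In this model toy_report_writeback_inside() reports the bug and returns false, while toy_report_writeback_outside() returns true.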
Re: 2.6.21-rc5-mm4
On Mon, 02 Apr 2007 22:47:45 PDT, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/

Am seeing an Oops 'cannot handle kernel paging request' during late system
startup, hand-copied traceback follows:

avc_has_perm_noaudit+0x2bf/0x506
avc_has_perm+0x2b/0x5b
selinux_socket_stream_connect+0x7e/0xc3
unix_stream_connect+0x202/0x3f3
sys_connect+0x7e/0xa4
tracesys+0xde/0xe1

I've not identified exactly when it happens, but it's towards the very end
of handling /etc/rc5.d, it's already up to the S98's. Odd thing is it only
happens when I start with RedHat's 'graphical boot', and may be related to the
shutdown of the X server that's displaying the boot progress preparing to
launch the X server for gdm logins (as I'm also seeing a hang sometimes when
shutting down - so it is possibly a "shutting down X server nukes system" bug).

Figured I'd toss this heads-up in case it rings any bells, while I go do
the bisection dance on -rc5-mm4 (-mm2 is OK, and -mm3 doesn't boot for me
for other reasons I didn't chase down before -mm4 came out and fixed it, so
I have a ways to bisect)
RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored.
>   It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon). It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.

Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or
after boot.

tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
      208640 blocks

md3 : active raid1 sdc3[1] sda3[0]
      20008832 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 64KB chunk

md5 : active raid1 sdc5[1] sda5[0]
      10008384 blocks [2/2] [UU]
      bitmap: 4/153 pages [16KB], 32KB chunk

md6 : active raid1 sdc6[1] sda6[0]
      10008384 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 32KB chunk

md8 : active raid1 sdc8[1] sda8[0]
      1003904 blocks [2/2] [UU]
      bitmap: 0/123 pages [0KB], 4KB chunk

md10 : active raid1 sdc10[1] sda10[0]
      119933120 blocks [2/2] [UU]
      bitmap: 1/229 pages [4KB], 256KB chunk

md2 : active raid1 sdc2[1] sda2[0]
      14544 blocks [2/2] [UU]
      bitmap: 10/191 pages [40KB], 256KB chunk

unused devices: <none>
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668aaa - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1

tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668acc - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ #

tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind<sdc1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
out the -mm releases so much lately.

Also, Andrew, can you please restart posting/cc'ing your -mm announcements to
the [EMAIL PROTECTED] list? Seems this stopped around about 2.6.20, it was
handy.

.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben
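The "status: -12" in the dmesg lines and mdadm's "Cannot allocate memory" are the same error seen from two sides: kernel code returns negative errno values, and on Linux errno 12 is ENOMEM. A minimal sketch of that mapping follows; the helper names are hypothetical, and the exact message text returned by strerror() depends on the C library.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: turn a kernel-style status (0 or a negative
 * errno, e.g. -12) into the positive errno value user space sees.
 */
int status_to_errno(int status)
{
    assert(status <= 0);        /* kernel convention: errors are negative */
    return -status;
}

/* Human-readable text for a kernel-style status, via the C library. */
const char *status_to_text(int status)
{
    return strerror(status_to_errno(status));
}
```

On a Linux system status_to_errno(-12) equals ENOMEM, which fits the bitmap allocation failing rather than the on-disk superblocks being reported as bad.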
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote:
> Here is a patch that adds validation (only for cpuslabs and partial
> slabs but thats where the action is). Apply this patch
> and then do
>
> echo 1 >/sys/slab/<cache-to-check>/validate
>
> I suggest to boot with full debugging and then run this on the ACPI slabs.

Did this and didn't trigger any problems. (Just to be clear, booted
with "slub_debug" with all the patches applied).

--- Validating slabcache 'Acpi-Namespace' --- Checked 0 slabs in 'Acpi-Namespace'
--- Validating slabcache 'Acpi-Operand' --- Checked 5 slabs in 'Acpi-Operand'
--- Validating slabcache 'Acpi-Parse' --- Checked 0 slabs in 'Acpi-Parse'
--- Validating slabcache 'Acpi-ParseExt' --- Checked 0 slabs in 'Acpi-ParseExt'
--- Validating slabcache 'Acpi-State' --- Checked 0 slabs in 'Acpi-State'
--- Validating slabcache 'Acpi-Namespace' --- Checked 0 slabs in 'Acpi-Namespace'
--- Validating slabcache 'Acpi-Operand' --- Checked 5 slabs in 'Acpi-Operand'
--- Validating slabcache 'Acpi-Parse' --- Checked 0 slabs in 'Acpi-Parse'
--- Validating slabcache 'Acpi-ParseExt' --- Checked 0 slabs in 'Acpi-ParseExt'
--- Validating slabcache 'Acpi-State' --- Checked 0 slabs in 'Acpi-State'
--- Validating slabcache 'RAW' --- Checked 1 slabs in 'RAW'
--- Validating slabcache 'RAWv6' --- Checked 1 slabs in 'RAWv6'
--- Validating slabcache 'TCP' --- Checked 3 slabs in 'TCP'
--- Validating slabcache 'TCPv6' --- Checked 4 slabs in 'TCPv6'
--- Validating slabcache 'UDP-Lite' --- Checked 0 slabs in 'UDP-Lite'
--- Validating slabcache 'UDP' --- Checked 2 slabs in 'UDP'
--- Validating slabcache 'UDPLITEv6' --- Checked 0 slabs in 'UDPLITEv6'
--- Validating slabcache 'UDPv6' --- Checked 0 slabs in 'UDPv6'
--- Validating slabcache 'UNIX' --- Checked 4 slabs in 'UNIX'
--- Validating slabcache 'anon_vma' --- Checked 12 slabs in 'anon_vma'
--- Validating slabcache 'arp_cache' --- Checked 2 slabs in 'arp_cache'
--- Validating slabcache 'bdev_cache' --- Checked 3 slabs in 'bdev_cache'
--- Validating slabcache 'bio' --- Checked 0 slabs in 'bio'
--- Validating slabcache 'biovec-1' --- Checked 1 slabs in 'biovec-1'
--- Validating slabcache 'biovec-128' --- Checked 1 slabs in 'biovec-128'
--- Validating slabcache 'biovec-16' --- Checked 1 slabs in 'biovec-16'
--- Validating slabcache 'biovec-256' --- Checked 1 slabs in 'biovec-256'
--- Validating slabcache 'biovec-4' --- Checked 1 slabs in 'biovec-4'
--- Validating slabcache 'biovec-64' --- Checked 1 slabs in 'biovec-64'
--- Validating slabcache 'blkdev_ioc' --- Checked 4 slabs in 'blkdev_ioc'
--- Validating slabcache 'blkdev_queue' --- Checked 1 slabs in 'blkdev_queue'
--- Validating slabcache 'blkdev_requests' --- Checked 2 slabs in 'blkdev_requests'
--- Validating slabcache 'buffer_head' --- Checked 4 slabs in 'buffer_head'
--- Validating slabcache 'cfq_ioc_pool' --- Checked 4 slabs in 'cfq_ioc_pool'
--- Validating slabcache 'cfq_pool' --- Checked 4 slabs in 'cfq_pool'
--- Validating slabcache 'configfs_dir_cache' --- Checked 0 slabs in 'configfs_dir_cache'
--- Validating slabcache 'dentry_cache' --- Checked 5 slabs in 'dentry_cache'
--- Validating slabcache 'dm_io' --- Checked 0 slabs in 'dm_io'
--- Validating slabcache 'dm_tio' --- Checked 0 slabs in 'dm_tio'
--- Validating slabcache 'dnotify_cache' --- Checked 1 slabs in 'dnotify_cache'
--- Validating slabcache 'dquot' --- Checked 0 slabs in 'dquot'
--- Validating slabcache 'eventpoll_epi' --- Checked 1 slabs in 'eventpoll_epi'
--- Validating slabcache 'eventpoll_pwq' --- Checked 1 slabs in 'eventpoll_pwq'
--- Validating slabcache 'ext2_inode_cache' --- Checked 0 slabs in 'ext2_inode_cache'
--- Validating slabcache 'ext2_xattr' --- Checked 0 slabs in 'ext2_xattr'
--- Validating slabcache 'ext3_inode_cache' --- Checked 0 slabs in 'ext3_inode_cache'
--- Validating slabcache 'ext3_xattr' --- Checked 0 slabs in 'ext3_xattr'
--- Validating slabcache 'fasync_cache' --- Checked 0 slabs in 'fasync_cache'
--- Validating slabcache 'fib6_nodes' --- Checked 1 slabs in 'fib6_nodes'
--- Validating slabcache 'file_lock_cache' --- Checked 2 slabs in 'file_lock_cache'
--- Validating slabcache 'files_cache' --- Checked 10 slabs in 'files_cache'
--- Validating slabcache 'filp' --- Checked 35 slabs in 'filp'
--- Validating slabcache 'flow_cache' --- Checked 0 slabs in 'flow_cache'
--- Validating slabcache 'fs_cache' --- Checked 5 slabs in 'fs_cache'
--- Validating slabcache 'hugetlbfs_inode_cache' --- Checked 1 slabs in 'hugetlbfs_inode_cache'
--- Validating slabcache 'idr_layer_cache' --- Checked 2 slabs in 'idr_layer_cache'
--- Validating slabcache 'inet_peer_cache' --- Checked 0 slabs in 'inet_peer_cache'
--- Validating slabcache 'inode_cache' --- Checked 8 slabs in 'inode_cache'
--- Validating slabcache 'inotify_event_cache' --- Checked 0 slabs in 'inotify_event_cache'
--- Validating slabcache 'inotify_watch_cache' --- Checked 1 slabs in 'inotify_watch_cache'
--- Validating slabcache 'ip6_dst_cache' --- Checked 1 slabs in
Re: 2.6.21-rc5-mm4
On Wed, Apr 04, 2007 at 01:55:08PM -0400, [EMAIL PROTECTED] wrote:
> On Tue, 03 Apr 2007 20:37:42 PDT, Randy Dunlap said:
> >
> > Good luck. But the symbols are there. Just use left/right arrow keys
> > to scroll the display left/right and you can see them. Now if you just
> > had that indicator to tell you that you Need to scroll to see more text...
>
> Exactly. :) I had the incredible bad luck that the line got cut off at the
> end of a CONFIG_ symbol that made sense - if it had showed up *half* a symbol,
> I'd have gone investigating. ;) (Even a '>' or '<' saying data offscreen to
> right or left would be sufficient, if somebody wants a small but productive
> kernel (config system actually) task to hack on.)
>
> I'd code it myself, but I have an SL8500 to install, and need to figure out
> how my laptop made it into the bag this morning still up and running (I hit
> the power button, it seemed to power down - blank screen, power light off,
> but syslog msgs prove it was up and running for another 4 hours before it
> shut down on a thermal check...)

If you do not find time to do it try to ping me in a week or so.
Should be trivial to do but away from my dev box atm.

Sam
Re: 2.6.21-rc5-mm4
On Thu, 05 Apr 2007 13:02:59 -0400 [EMAIL PROTECTED] wrote:

> On Mon, 02 Apr 2007 22:47:45 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> Am seeing an Oops 'cannot handle kernel paging request' during late system
> startup, hand-copied traceback follows:
>
> avc_has_perm_noaudit+0x2bf/0x506
> avc_has_perm+0x2b/0x5b
> selinux_socket_stream_connect+0x7e/0xc3
> unix_stream_connect+0x202/0x3f3
> sys_connect+0x7e/0xa4
> tracesys+0xde/0xe1
>
> I've not identified exactly when it happens, but it's towards the very end
> of handling /etc/rc5.d, it's already up to the S98's. Odd thing is it only
> happens when I start with RedHat's 'graphical boot', and may be related to the
> shutdown of the X server that's displaying the boot progress preparing to
> launch the X server for gdm logins (as I'm also seeing a hang sometimes when
> shutting down - so it is possibly a "shutting down X server nukes system" bug).

Thanks. I'd have thought that the full trace could be captured with netconsole.

> Figured I'd toss this heads-up in case it rings any bells, while I go do
> the bisection dance on -rc5-mm4 (-mm2 is OK, and -mm3 doesn't boot for me
> for other reasons I didn't chase down before -mm4 came out and fixed it, so
> I have a ways to bisect)

No, I'm not aware of anyone else hitting anything like that.

Bisection would be good, and probably pretty quick - I'd pick git-net.patch as
the first pivot point. But we'd still be wanting the full trace if poss please.
2.6.21-rc5-mm4 initramfs Make Error
I built a version of 2.6.21-rc5-mm4 with an initramfs and it built OK the
first time. Then I made changes (applied a Reiser4 patch) and rebuilt, and
got the following error:

zephyr linux # make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
<stdin>:1356:2: warning: #warning syscall getcpu not implemented
<stdin>:1360:2: warning: #warning syscall epoll_pwait not implemented
<stdin>:1364:2: warning: #warning syscall lutimesat not implemented
<stdin>:1380:2: warning: #warning syscall revokeat not implemented
<stdin>:1384:2: warning: #warning syscall frevoke not implemented
  CHK     include/linux/compile.h
/usr/src/linux-2.6.21-rc5-mm4/usr/Makefile:41: *** target pattern contains no `%'. Stop.
make: *** [usr] Error 2

I have this in the config:
CONFIG_INITRAMFS_SOURCE="/initramfs"

/initramfs is the directory where I build my initramfs, which is just a
busybox setup, very simple.

# rm usr/.initramfs_data.*
seems to make it go again.
--
Zan Lynx [EMAIL PROTECTED]
Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4
On 4/5/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Fri, 06 Apr 2007 02:33:03 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > >
> > > - The oops in git-net.patch has been fixed, so that tree has been restored.
> > >   It is huge.
> > >
> > > - Added the device-mapper development tree to the -mm lineup (Alasdair
> > >   Kergon). It is a quilt tree, living at
> > >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > >
> > > - Added davidel's signalfd stuff.
> >
> > Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> >
> > md1 is the first array on the disk, and it refuses to start up on boot, or
> > after boot.
> >
> > ...
> >
> > tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> > mdadm: device /dev/md1 already active - cannot assemble it
> > tornado ~ # mdadm --run /dev/md1
> > mdadm: failed to run array /dev/md1: Cannot allocate memory
> > tornado ~ #
> >
> > and looking at a dmesg, this is logged:
> >
> > md: bind<sdc1>
> > md: bind<sda1>
> > raid1: raid set md1 active with 2 out of 2 mirrors
> > md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> > md1: failed to create bitmap (-12)
> > md: pers->run() failed ...

Is this the dmesg from boot or the dmesg after running the "mdadm --run" command?

> > tornado ~ # uname -a
> > Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R)
> > Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> > tornado ~ #
> >
> > The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> > out the -mm releases so much lately.
>
> OK. I assume that bitmap->chunks in bitmap_init_from_disk() has some
> unexpectedly large value.
>
> I don't _think_ there's anything in -mm which would have triggered this. Does
> mainline do the same thing?
>
> I guess it's possible that the code in git-md-accel.patch accidentally broke
> things. Perhaps try disabling CONFIG_DMA_ENGINE?

git-md-accel.patch does not touch anything in the raid1 path, but I guess
stranger things have happened.

--
Dan
Re: 2.6.21-rc5-mm4
On Wed, Apr 04, 2007 at 01:55:08PM -0400, [EMAIL PROTECTED] wrote: On Tue, 03 Apr 2007 20:37:42 PDT, Randy Dunlap said: Good luck. But the symbols are there. Just use left/right arrow keys to scroll the display left/right and you can see them. Now if you just had that indicator to tell you that you Need to scroll to see more text... Exactly. :) I had the incredible bad luck that the line got cut off at the end of a CONFIG_ symbol that made sense - if it had showed up *half* a symbol, I'd have gone investigating. ;) (Even a '' or '' saying data offscreen to right or left would be sufficient, if somebody wants a small but productive kernel (config system actually) task to hack on.) I'd code it myself, but I have an SL8500 to install, and need to figure out how my laptop made it into the bag this morning still up and running (I hit the power button, it seemed to power down - blank screen, power light off, but syslog msgs prove it was up and running for another 4 hours before it shut down on a thermal check...) If you do not find time to do it try to ping me in a week or so. Should be trivial to do but away from my dev box atm. Sam - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote: Here is a patch that adds validation (only for cpuslabs and partial slabs but thats where the action is). Apply this patch and then do echo 1 /sys/slab/cache-to-check/validate I suggest to boot with full debugging and then run this on the ACPI slabs. Did this and didn't trigger any problems. (Just to be clear, booted with slub_debug with all the patches applied). --- Validating slabcache 'Acpi-Namespace' --- Checked 0 slabs in 'Acpi-Namespace' --- Validating slabcache 'Acpi-Operand' --- Checked 5 slabs in 'Acpi-Operand' --- Validating slabcache 'Acpi-Parse' --- Checked 0 slabs in 'Acpi-Parse' --- Validating slabcache 'Acpi-ParseExt' --- Checked 0 slabs in 'Acpi-ParseExt' --- Validating slabcache 'Acpi-State' --- Checked 0 slabs in 'Acpi-State' --- Validating slabcache 'Acpi-Namespace' --- Checked 0 slabs in 'Acpi-Namespace' --- Validating slabcache 'Acpi-Operand' --- Checked 5 slabs in 'Acpi-Operand' --- Validating slabcache 'Acpi-Parse' --- Checked 0 slabs in 'Acpi-Parse' --- Validating slabcache 'Acpi-ParseExt' --- Checked 0 slabs in 'Acpi-ParseExt' --- Validating slabcache 'Acpi-State' --- Checked 0 slabs in 'Acpi-State' --- Validating slabcache 'RAW' --- Checked 1 slabs in 'RAW' --- Validating slabcache 'RAWv6' --- Checked 1 slabs in 'RAWv6' --- Validating slabcache 'TCP' --- Checked 3 slabs in 'TCP' --- Validating slabcache 'TCPv6' --- Checked 4 slabs in 'TCPv6' --- Validating slabcache 'UDP-Lite' --- Checked 0 slabs in 'UDP-Lite' --- Validating slabcache 'UDP' --- Checked 2 slabs in 'UDP' --- Validating slabcache 'UDPLITEv6' --- Checked 0 slabs in 'UDPLITEv6' --- Validating slabcache 'UDPv6' --- Checked 0 slabs in 'UDPv6' --- Validating slabcache 'UNIX' --- Checked 4 slabs in 'UNIX' --- Validating slabcache 'anon_vma' --- Checked 12 slabs in 'anon_vma' --- Validating slabcache 'arp_cache' --- Checked 2 slabs in 'arp_cache' --- Validating slabcache 'bdev_cache' --- Checked 3 slabs in 'bdev_cache' --- 
Validating slabcache 'bio'
--- Checked 0 slabs in 'bio'
--- Validating slabcache 'biovec-1'
--- Checked 1 slabs in 'biovec-1'
--- Validating slabcache 'biovec-128'
--- Checked 1 slabs in 'biovec-128'
--- Validating slabcache 'biovec-16'
--- Checked 1 slabs in 'biovec-16'
--- Validating slabcache 'biovec-256'
--- Checked 1 slabs in 'biovec-256'
--- Validating slabcache 'biovec-4'
--- Checked 1 slabs in 'biovec-4'
--- Validating slabcache 'biovec-64'
--- Checked 1 slabs in 'biovec-64'
--- Validating slabcache 'blkdev_ioc'
--- Checked 4 slabs in 'blkdev_ioc'
--- Validating slabcache 'blkdev_queue'
--- Checked 1 slabs in 'blkdev_queue'
--- Validating slabcache 'blkdev_requests'
--- Checked 2 slabs in 'blkdev_requests'
--- Validating slabcache 'buffer_head'
--- Checked 4 slabs in 'buffer_head'
--- Validating slabcache 'cfq_ioc_pool'
--- Checked 4 slabs in 'cfq_ioc_pool'
--- Validating slabcache 'cfq_pool'
--- Checked 4 slabs in 'cfq_pool'
--- Validating slabcache 'configfs_dir_cache'
--- Checked 0 slabs in 'configfs_dir_cache'
--- Validating slabcache 'dentry_cache'
--- Checked 5 slabs in 'dentry_cache'
--- Validating slabcache 'dm_io'
--- Checked 0 slabs in 'dm_io'
--- Validating slabcache 'dm_tio'
--- Checked 0 slabs in 'dm_tio'
--- Validating slabcache 'dnotify_cache'
--- Checked 1 slabs in 'dnotify_cache'
--- Validating slabcache 'dquot'
--- Checked 0 slabs in 'dquot'
--- Validating slabcache 'eventpoll_epi'
--- Checked 1 slabs in 'eventpoll_epi'
--- Validating slabcache 'eventpoll_pwq'
--- Checked 1 slabs in 'eventpoll_pwq'
--- Validating slabcache 'ext2_inode_cache'
--- Checked 0 slabs in 'ext2_inode_cache'
--- Validating slabcache 'ext2_xattr'
--- Checked 0 slabs in 'ext2_xattr'
--- Validating slabcache 'ext3_inode_cache'
--- Checked 0 slabs in 'ext3_inode_cache'
--- Validating slabcache 'ext3_xattr'
--- Checked 0 slabs in 'ext3_xattr'
--- Validating slabcache 'fasync_cache'
--- Checked 0 slabs in 'fasync_cache'
--- Validating slabcache 'fib6_nodes'
--- Checked 1 slabs in 'fib6_nodes'
--- Validating slabcache 'file_lock_cache'
--- Checked 2 slabs in 'file_lock_cache'
--- Validating slabcache 'files_cache'
--- Checked 10 slabs in 'files_cache'
--- Validating slabcache 'filp'
--- Checked 35 slabs in 'filp'
--- Validating slabcache 'flow_cache'
--- Checked 0 slabs in 'flow_cache'
--- Validating slabcache 'fs_cache'
--- Checked 5 slabs in 'fs_cache'
--- Validating slabcache 'hugetlbfs_inode_cache'
--- Checked 1 slabs in 'hugetlbfs_inode_cache'
--- Validating slabcache 'idr_layer_cache'
--- Checked 2 slabs in 'idr_layer_cache'
--- Validating slabcache 'inet_peer_cache'
--- Checked 0 slabs in 'inet_peer_cache'
--- Validating slabcache 'inode_cache'
--- Checked 8 slabs in 'inode_cache'
--- Validating slabcache 'inotify_event_cache'
--- Checked 0 slabs in 'inotify_event_cache'
--- Validating slabcache 'inotify_watch_cache'
--- Checked 1 slabs in 'inotify_watch_cache'
--- Validating slabcache 'ip6_dst_cache'
--- Checked 1 slabs in
RAID1 out of memory error, was Re: 2.6.21-rc5-mm4
Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored.
>   It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon). It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.

Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or after boot.

tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
      208640 blocks
md3 : active raid1 sdc3[1] sda3[0]
      20008832 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 64KB chunk
md5 : active raid1 sdc5[1] sda5[0]
      10008384 blocks [2/2] [UU]
      bitmap: 4/153 pages [16KB], 32KB chunk
md6 : active raid1 sdc6[1] sda6[0]
      10008384 blocks [2/2] [UU]
      bitmap: 0/153 pages [0KB], 32KB chunk
md8 : active raid1 sdc8[1] sda8[0]
      1003904 blocks [2/2] [UU]
      bitmap: 0/123 pages [0KB], 4KB chunk
md10 : active raid1 sdc10[1] sda10[0]
      119933120 blocks [2/2] [UU]
      bitmap: 1/229 pages [4KB], 256KB chunk
md2 : active raid1 sdc2[1] sda2[0]
      14544 blocks [2/2] [UU]
      bitmap: 10/191 pages [40KB], 256KB chunk
unused devices: <none>
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668aaa - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice   State
this     0       8       1        0         active sync   /dev/sda1
   0     0       8       1        0         active sync   /dev/sda1
   1     1       8      33        1         active sync   /dev/sdc1

tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb 2 10:16:29 2007
     Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
     Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Update Time : Fri Apr 6 02:06:17 2007
          State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : d3668acc - correct
         Events : 0.368

      Number   Major   Minor   RaidDevice   State
this     1       8      33        1         active sync   /dev/sdc1
   0     0       8       1        0         active sync   /dev/sda1
   1     1       8      33        1         active sync   /dev/sdc1
tornado ~ #

tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind<sdc1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing out the -mm releases so much lately.

Also, Andrew, can you please restart posting/cc'ing your -mm announcements to the [EMAIL PROTECTED] list? Seems this stopped around about 2.6.20, it was handy.

.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben
Re: 2.6.21-rc5-mm4
On Mon, 02 Apr 2007 22:47:45 PDT, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/

Am seeing an Oops 'cannot handle kernel paging request' during late system startup, hand-copied traceback follows:

avc_has_perm_noaudit+0x2bf/0x506
avc_has_perm+0x2b/0x5b
selinux_socket_stream_connect+0x7e/0xc3
unix_stream_connect+0x202/0x3f3
sys_connect+0x7e/0xa4
tracesys+0xde/0xe1

I've not identified exactly when it happens, but it's towards the very end of handling /etc/rc5.d, it's already up to the S98's. Odd thing is it only happens when I start with RedHat's 'graphical boot', and may be related to the shutdown of the X server that's displaying the boot progress preparing to launch the X server for gdm logins (as I'm also seeing a hang sometimes when shutting down - so it is possibly a shutting down X server nukes system bug).

Figured I'd toss this heads-up in case it rings any bells, while I go do the bisection dance on -rc5-mm4 (-mm2 is OK, and -mm3 doesn't boot for me for other reasons I didn't chase down before -mm4 came out and fixed it, so I have a ways to bisect)
Re: 2.6.21-rc5-mm4 (SLUB)
On Thu, 5 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote:
> > Here is a patch that adds validation (only for cpuslabs and partial
> > slabs but thats where the action is).
> >
> > Apply this patch and then do
> >
> > echo 1 >/sys/slab/<cache-to-check>/validate
> >
> > I suggest to boot with full debugging and then run this on the ACPI slabs.
>
> Did this and didn't trigger any problems.

Duh. Must have been in the full slabs. Maybe I should add a tracking of full slabs for the debug case. Would also enable leak detection.
2.6.21-rc5-mm4: ia64: scheduling while atomic - utrace?
Running a 'usex -e' load [http://people.redhat.com/~anderson/usex/] on 2.6.21-rc5-mm4 on ia64, I see the following:

BUG: scheduling while atomic: strace/0x4001/20162

Call Trace:
 [a00100014ec0] show_stack+0x80/0xa0 sp=e76042dc7610 bsp=e76042dc1260
 [a00100014f10] dump_stack+0x30/0x60 sp=e76042dc77e0 bsp=e76042dc1248
 [a001006f76e0] schedule+0x1d00/0x22a0 sp=e76042dc77e0 bsp=e76042dc1108
 [a00100099750] __cond_resched+0x50/0xa0 sp=e76042dc7800 bsp=e76042dc10e8
 [a001006f8e30] cond_resched+0xb0/0xe0 sp=e76042dc7800 bsp=e76042dc10d0
 [a001001561d0] get_user_pages+0x1b0/0x7c0 sp=e76042dc7800 bsp=e76042dc1028
 [a001001568a0] access_process_vm+0xc0/0x440 sp=e76042dc7820 bsp=e76042dc0f78
 [a0010002fcc0] ia64_sync_user_rbs+0x80/0x100 sp=e76042dc7830 bsp=e76042dc0f38
 [a0010002fdf0] do_gpregs_writeback+0xb0/0xe0 sp=e76042dc7840 bsp=e76042dc0f10
 [a001cad0] unw_init_running+0x70/0xa0 sp=e76042dc7850 bsp=e76042dc0ee8
 [a0010002ed70] do_regset_call+0x110/0x140 sp=e76042dc7c30 bsp=e76042dc0e88
 [a0010002eea0] gpregs_writeback+0x40/0x60 sp=e76042dc7e30 bsp=e76042dc0e60
 [a00100123900] ptrace_report+0xe0/0x1e0 sp=e76042dc7e30 bsp=e76042dc0e28
 [a00100123aa0] ptrace_report_syscall+0xa0/0xe0 sp=e76042dc7e30 bsp=e76042dc0e00
 [a00100123b10] ptrace_report_syscall_exit+0x30/0x60 sp=e76042dc7e30 bsp=e76042dc0dc8
 [a00100122cb0] utrace_report_syscall+0xf0/0x540 sp=e76042dc7e30 bsp=e76042dc0d48
 [a00100031800] syscall_trace_leave+0x60/0xc0 sp=e76042dc7e30 bsp=e76042dc0cf0
 [a001c1c0] ia64_trace_syscall+0x100/0x110 sp=e76042dc7e30 bsp=e76042dc0cf0

It looks like get_ptrace_state(), called from ptrace_report_syscall(), calls rcu_read_lock(), which disables preemption. The corresponding rcu_read_unlock() comes from put_ptrace_state(), called from ptrace_report() at the end of the report. However, ia64 needs to sync the register backing store, and this requires access to the process vm. The cond_resched() call in get_user_pages() is what trips the scheduling-while-atomic check.

May be related to: http://marc.info/?a=10288337963r=1w=4

Lee
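The failure mode described in this report can be modelled in user space. The following is only a toy sketch under stated assumptions: every `toy_` name is hypothetical, the counter merely stands in for the kernel's preemption count, and the reordered variant is an illustration of why the check fires, not a proposed fix for utrace.

```c
#include <assert.h>

/* Toy stand-ins; none of these are the kernel's real implementations. */
static int toy_preempt_count;   /* > 0 means "atomic": sleeping is not allowed */
static int toy_bug_reports;     /* how many times the toy check fired */

static void toy_rcu_read_lock(void)
{
	toy_preempt_count++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_preempt_count > 0);
	toy_preempt_count--;
}

/* A point where the code may voluntarily give up the CPU. */
static void toy_cond_resched(void)
{
	if (toy_preempt_count > 0)
		toy_bug_reports++;      /* models "scheduling while atomic" */
}

/* Models work that may sleep, such as touching another process's memory. */
static void toy_access_process_vm(void)
{
	toy_cond_resched();
}

/* The reported pattern: possibly-sleeping work inside the read-side section. */
static int toy_report_inside_section(void)
{
	toy_bug_reports = 0;
	toy_rcu_read_lock();
	toy_access_process_vm();
	toy_rcu_read_unlock();
	return toy_bug_reports;
}

/* For contrast: the same work done before entering the section. */
static int toy_report_outside_section(void)
{
	toy_bug_reports = 0;
	toy_access_process_vm();
	toy_rcu_read_lock();
	toy_rcu_read_unlock();
	return toy_bug_reports;
}
```

In this model the first variant records one report and the second records none, which mirrors the ordering problem in the trace: the backing-store sync runs between the lock and unlock calls.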
Re: 2.6.21-rc5-mm4 (SLUB)
Here is a patch that adds validation (only for cpuslabs and partial
slabs but thats where the action is).

Apply this patch and then do

echo 1 >/sys/slab/<cache-to-check>/validate

I suggest to boot with full debugging and then run this on the ACPI slabs.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 20:26:03.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 21:26:15.0 -0700
@@ -2280,6 +2280,67 @@ void *__kmalloc_node_track_caller(size_t
 
 #ifdef CONFIG_SYSFS
 
+static int validate_slab(struct kmem_cache *s, struct page *page)
+{
+	void *p;
+	void *addr = page_address(page);
+	unsigned long map[BITS_TO_LONGS(s->objects)];
+
+	if (!check_slab(s, page) ||
+			!on_freelist(s, page, NULL))
+		return 0;
+
+	/* Now we know that a valid freelist exists */
+	bitmap_zero(map, s->objects);
+
+	for(p = page->freelist; p; p = get_freepointer(s, p)) {
+		set_bit((p - addr) / s->size, map);
+		if (!check_object(s, page, p, 0))
+			return 0;
+	}
+
+	for(p = addr; p < addr + s->objects * s->size; p += s->size)
+		if (!test_bit((p - addr) / s->size, map))
+			if (!check_object(s, page, p, 1))
+				return 0;
+	return 1;
+}
+
+static int validate_slab_node(struct kmem_cache *s, struct kmem_cache_node *n)
+{
+	int count = 0;
+	struct page *page;
+	unsigned long flags;
+
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry(page, &n->partial, lru) {
+		if (slab_trylock(page)) {
+			validate_slab(s, page);
+			slab_unlock(page);
+		} else
+			printk(KERN_INFO "Skipped busy slab %p\n", page);
+		count++;
+	}
+	spin_unlock_irqrestore(&n->list_lock, flags);
+	return count;
+}
+
+static void validate_slab_cache(struct kmem_cache *s)
+{
+	int node;
+	int count = 0;
+
+	printk(KERN_INFO "--- Validating slabcache '%s'\n", s->name);
+	flush_all(s);
+	for_each_online_node(node) {
+		struct kmem_cache_node *n = get_node(s, node);
+
+		count += validate_slab_node(s, n);
+	}
+	printk(KERN_INFO "--- Checked %d slabs in '%s'\n",
+			count, s->name);
+}
+
 static unsigned long count_partial(struct kmem_cache_node *n)
 {
 	unsigned long flags;
@@ -2402,7 +2463,6 @@ struct slab_attribute {
 	static struct slab_attribute _name##_attr = \
 	__ATTR(_name, 0644, _name##_show, _name##_store)
 
-
 static ssize_t slab_size_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", s->size);
@@ -2609,6 +2669,22 @@ static ssize_t store_user_store(struct k
 }
 SLAB_ATTR(store_user);
 
+static ssize_t validate_show(struct kmem_cache *s, char *buf)
+{
+	return 0;
+}
+
+static ssize_t validate_store(struct kmem_cache *s,
+			const char *buf, size_t length)
+{
+	if (buf[0] == '1')
+		validate_slab_cache(s);
+	else
+		return -EINVAL;
+	return length;
+}
+SLAB_ATTR(validate);
+
 #ifdef CONFIG_NUMA
 static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf)
 {
@@ -2648,6 +2724,7 @@ static struct attribute * slab_attrs[] =
 	&red_zone_attr.attr,
 	&poison_attr.attr,
 	&store_user_attr.attr,
+	&validate_attr.attr,
 #ifdef CONFIG_ZONE_DMA
 	&cache_dma_attr.attr,
 #endif
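The two-pass idea in validate_slab() can be tried outside the kernel. What follows is a minimal user-space sketch, not the SLUB code: the `toy_` types, the index-based freelist, the single guard byte and the constants NOBJ, OBJ_SIZE, RED_FREE and RED_INUSE are all assumptions made for illustration. Pass one walks the freelist and marks free objects; pass two treats everything left unmarked as allocated and checks it in that state.

```c
#include <assert.h>
#include <stddef.h>

#define NOBJ      8      /* objects per toy slab (hypothetical) */
#define OBJ_SIZE  16     /* payload bytes per object (hypothetical) */
#define RED_FREE  0xbb   /* guard value while an object is free (toy value) */
#define RED_INUSE 0xcc   /* guard value while an object is allocated (toy value) */

struct toy_obj {
	int next_free;                    /* next free index, -1 ends the list */
	unsigned char payload[OBJ_SIZE];
	unsigned char redzone;            /* guard byte right after the payload */
};

struct toy_slab {
	int freelist;                     /* first free index, -1 if none */
	struct toy_obj obj[NOBJ];
};

/* Start with every object free and chained in index order. */
static void toy_slab_init(struct toy_slab *s)
{
	int i;

	for (i = 0; i < NOBJ; i++) {
		s->obj[i].next_free = (i + 1 < NOBJ) ? i + 1 : -1;
		s->obj[i].redzone = RED_FREE;
	}
	s->freelist = 0;
}

/* Pop the first free object; returns its index, or -1 if none is left. */
static int toy_alloc(struct toy_slab *s)
{
	int i = s->freelist;

	if (i < 0)
		return -1;
	s->freelist = s->obj[i].next_free;
	s->obj[i].redzone = RED_INUSE;
	return i;
}

/* An object is sane if its guard byte matches the state we expect. */
static int toy_check_object(const struct toy_obj *o, int in_use)
{
	return o->redzone == (in_use ? RED_INUSE : RED_FREE);
}

/* Returns 1 if the slab looks intact, 0 on the first inconsistency. */
static int toy_validate_slab(const struct toy_slab *s)
{
	unsigned char is_free[NOBJ] = { 0 };
	int i, steps = 0;

	/* Pass 1: walk the freelist, marking and checking free objects. */
	for (i = s->freelist; i != -1; i = s->obj[i].next_free) {
		if (i < 0 || i >= NOBJ || ++steps > NOBJ)
			return 0;         /* wild link or a cycle */
		is_free[i] = 1;
		if (!toy_check_object(&s->obj[i], 0))
			return 0;
	}

	/* Pass 2: everything not on the freelist must look allocated. */
	for (i = 0; i < NOBJ; i++)
		if (!is_free[i] && !toy_check_object(&s->obj[i], 1))
			return 0;
	return 1;
}
```

A one-byte overrun past the payload of an allocated object clobbers its guard byte, so the second pass rejects the slab. As in the patch, a check like this only finds damage in slabs it actually visits.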
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Badari Pulavarty wrote: > > Were the slabs merged? Look at /sys/slab and see if there are any symlinks > > there. > > Ok. symlinks there. Its a sporadic thing. I think I am going to add a slab validator to SLUB that goes through all slabs and checks all objects for validity. Then we can trigger a scan through the acpi caches which should locate the problem.
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 2007-04-04 at 17:31 -0700, Christoph Lameter wrote: > On Wed, 4 Apr 2007, Badari Pulavarty wrote: > > > On Wed, 2007-04-04 at 15:59 -0700, Christoph Lameter wrote: > > > On Wed, 4 Apr 2007, Badari Pulavarty wrote: > > > > > > > Here is the slub_debug=FU output with the above patch. > > > > > > Hmmm... Looks like the object is actually free. Someone writes beyond the > > > end of the earlier object. Setting Z should check overwrites but it > > > switched off merging. So set > > > > > > slub_debug = FZ > > > > > > Analoguos to the last patch you would need to take out redzoning from > > > the flags that stop merging. Then rerun. Maybe we can track it down this > > > way. > > > > Hmm.. I did that and machine boots fine, with absolutely no > > debug messages :( > > Were the slabs merged? Look at /sys/slab and see if there are any symlinks > there. > elm3b29:/sys/slab # ls -ltr total 0 drwxr-xr-x 2 root root 0 Apr 4 17:40 sock_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 skbuff_fclone_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 sigqueue drwxr-xr-x 2 root root 0 Apr 4 17:40 shmem_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 radix_tree_node drwxr-xr-x 2 root root 0 Apr 4 17:40 proc_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 ip_dst_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 file_lock_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 blkdev_requests drwxr-xr-x 2 root root 0 Apr 4 17:40 blkdev_queue drwxr-xr-x 2 root root 0 Apr 4 17:40 blkdev_ioc drwxr-xr-x 2 root root 0 Apr 4 17:40 biovec-64 drwxr-xr-x 2 root root 0 Apr 4 17:40 biovec-256 drwxr-xr-x 2 root root 0 Apr 4 17:40 biovec-128 drwxr-xr-x 2 root root 0 Apr 4 17:40 bdev_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 TCP drwxr-xr-x 2 root root 0 Apr 4 17:40 Acpi-State drwxr-xr-x 2 root root 0 Apr 4 17:40 Acpi-ParseExt drwxr-xr-x 2 root root 0 Apr 4 17:40 Acpi-Operand drwxr-xr-x 2 root root 0 Apr 4 17:40 Acpi-Namespace drwxr-xr-x 2 root root 0 Apr 4 17:40 vm_area_struct drwxr-xr-x 2 root root 0 Apr 4 17:40 
task_struct drwxr-xr-x 2 root root 0 Apr 4 17:40 sysfs_dir_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 signal_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 sighand_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 pid drwxr-xr-x 2 root root 0 Apr 4 17:40 names_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 mm_struct drwxr-xr-x 2 root root 0 Apr 4 17:40 kmem_cache_node drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-96 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-8192 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-8 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-65536 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-64 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-512 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-4096 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-32768 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-32 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-262144 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-256 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-2048 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-192 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-16384 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-16 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-131072 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-128 drwxr-xr-x 2 root root 0 Apr 4 17:40 kmalloc-1024 drwxr-xr-x 2 root root 0 Apr 4 17:40 inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 idr_layer_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 fs_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 filp drwxr-xr-x 2 root root 0 Apr 4 17:40 dentry_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 buffer_head drwxr-xr-x 2 root root 0 Apr 4 17:40 anon_vma drwxr-xr-x 2 root root 0 Apr 4 17:40 dquot drwxr-xr-x 2 root root 0 Apr 4 17:40 reiser_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 nfs_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 nfs_direct_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 mqueue_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 minix_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 journal_head drwxr-xr-x 2 root root 0 Apr 4 17:40 
isofs_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 hugetlbfs_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 ext3_xattr drwxr-xr-x 2 root root 0 Apr 4 17:40 ext3_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 ext2_xattr drwxr-xr-x 2 root root 0 Apr 4 17:40 ext2_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 cfq_pool drwxr-xr-x 2 root root 0 Apr 4 17:40 cfq_ioc_pool drwxr-xr-x 2 root root 0 Apr 4 17:40 UNIX drwxr-xr-x 2 root root 0 Apr 4 17:40 rpc_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 rpc_buffers drwxr-xr-x 2 root root 0 Apr 4 17:40 revokefs_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:40 TCPv6 drwxr-xr-x 2 root root 0 Apr 4 17:54 fat_inode_cache drwxr-xr-x 2 root root 0 Apr 4 17:54 fat_cache drwxr-xr-x 2 root root 0 Apr 4 17:54 sgpool-64 drwxr-xr-x 2 root root 0 Apr 4 17:54 sgpool-32 drwxr-xr-x 2 root root 0 Apr 4 17:54 sgpool-128 drwxr-xr-x 2 root root 0 Apr 4 17:54 scsi_io_context
Re: 2.6.21-rc5-mm4
On Mon, 2007-04-02 at 22:47 -0700, Andrew Morton wrote: > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/ > > - The oops in git-net.patch has been fixed, so that tree has been restored. > It is huge. > > - Added the device-mapper development tree to the -mm lineup (Alasdair > Kergon). It is a quilt tree, living at > ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/. > > - Added davidel's signalfd stuff. > > > I see this tracing (from the lock-dependency validator?) for several -mm versions. This is from a Silan ethernet card (CONFIG_SC92031). 00:0b.0 Ethernet controller: Hangzhou Silan Microelectronics Co., Ltd. Unknown device 2031 (rev 01) Other than the tracing, I'm not having any problems. Tony == [ INFO: soft-safe -> soft-unsafe lock order detected ] 2.6.21-rc5-mm4-default #44 -- ip/3036 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: (>lock){--..}, at: [] sc92031_set_multicast_list +0x14/0x2d [sc92031] and this task is already holding: (>_xmit_lock){-...}, at: [] dev_mc_upload+0x14/0x3a which would create a new lock dependency: (>_xmit_lock){-...} -> (>lock){--..} but this new dependency connects a soft-irq-safe lock: (>mca_lock){-+..} ... which became soft-irq-safe at: [] __lock_acquire+0x3d7/0xb93 [] lock_acquire+0x68/0x82 [] _spin_lock_bh+0x30/0x3d [] mld_ifc_timer_expire+0x15b/0x21d [ipv6] [] run_timer_softirq+0xf1/0x14e [] __do_softirq+0x46/0x9c [] do_softirq+0x2d/0x46 [] irq_exit+0x3b/0x6b [] do_IRQ+0x5e/0x76 [] common_interrupt+0x2e/0x34 [] error_code+0x71/0x78 [] 0x to a soft-irq-unsafe lock: (>lock){--..} ... which became soft-irq-unsafe at: ... 
[] __lock_acquire+0x46b/0xb93 [] lock_acquire+0x68/0x82 [] _spin_lock+0x2b/0x38 [] sc92031_open+0xcc/0x16f [sc92031] [] dev_open+0x33/0x6e [] dev_change_flags+0x57/0x10b [] devinet_ioctl+0x235/0x546 [] inet_ioctl+0x89/0xaa [] sock_ioctl+0x1ac/0x1ca [] do_ioctl+0x1c/0x53 [] vfs_ioctl+0x1ec/0x203 [] sys_ioctl+0x49/0x62 [] sysenter_past_esp+0x5d/0x99 [] 0x other info that might help us debug this: 2 locks held by ip/3036: #0: (rtnl_mutex){--..}, at: [] mutex_lock+0x24/0x28 #1: (>_xmit_lock){-...}, at: [] dev_mc_upload+0x14/0x3a the soft-irq-safe lock's dependencies: -> (>mca_lock){-+..} ops: 9 { initial-use at: [] __lock_acquire+0x486/0xb93 [] lock_acquire+0x68/0x82 [] _spin_lock_bh+0x30/0x3d [] igmp6_group_added+0x1b/0x120 [ipv6] [] ipv6_dev_mc_inc+0x2f9/0x346 [ipv6] [] ipv6_add_dev+0x232/0x240 [ipv6] [] versions+0x1e8b/0xf9c8 [x_tables] [] versions+0x1d54/0xf9c8 [x_tables] [] sys_init_module+0x1252/0x138f [] sysenter_past_esp+0x5d/0x99 [] 0x in-softirq-W at: [] __lock_acquire+0x3d7/0xb93 [] lock_acquire+0x68/0x82 [] _spin_lock_bh+0x30/0x3d [] mld_ifc_timer_expire+0x15b/0x21d [ipv6] [] run_timer_softirq+0xf1/0x14e [] __do_softirq+0x46/0x9c [] do_softirq+0x2d/0x46 [] irq_exit+0x3b/0x6b [] do_IRQ+0x5e/0x76 [] common_interrupt+0x2e/0x34 [] error_code+0x71/0x78 [] 0x hardirq-on-W at: [] __lock_acquire+0x441/0xb93 [] lock_acquire+0x68/0x82 [] _spin_lock_bh+0x30/0x3d [] igmp6_group_added+0x1b/0x120 [ipv6] [] ipv6_dev_mc_inc+0x2f9/0x346 [ipv6] [] ipv6_add_dev+0x232/0x240 [ipv6] [] versions+0x1e8b/0xf9c8 [x_tables] [] versions+0x1d54/0xf9c8 [x_tables] [] sys_init_module+0x1252/0x138f [] sysenter_past_esp+0x5d/0x99 [] 0x } ... key at: [] __key.29988+0x0/0xfffe9535 [ipv6] -> (>_xmit_lock){-...} ops: 18 { initial-use at: [] __lock_acquire+0x486/0xb93 [] lock_acquire+0x68/0x82 [] _spin_lock_bh+0x30/0x3d [] dev_mc_upload+0x14/0x3a [] dev_change_flags+0x31/0x10b [] devinet_ioctl+0x235/0x546 [] inet_ioctl+0x89/0xaa [] sock_ioctl+0x1ac/0x1ca
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 15:59 -0700, Christoph Lameter wrote:
> > On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> >
> > > Here is the slub_debug=FU output with the above patch.
> >
> > Hmmm... Looks like the object is actually free. Someone writes beyond the
> > end of the earlier object. Setting Z should check overwrites but it
> > switched off merging. So set
> >
> > slub_debug = FZ
> >
> > Analoguos to the last patch you would need to take out redzoning from
> > the flags that stop merging. Then rerun. Maybe we can track it down this
> > way.
>
> Hmm.. I did that and machine boots fine, with absolutely no
> debug messages :(

Were the slabs merged? Look at /sys/slab and see if there are any symlinks there.
Re: 2.6.21-rc5-mm4
On Thu, 2007-04-05 at 08:38 +1000, Con Kolivas wrote:
> On Thursday 05 April 2007 08:10, Andrew Morton wrote:
> > Thanks - that'll be the CPU scheduler changes.
> >
> > Con has produced a patch or two which might address this but afaik we don't
> > yet have a definitive fix?
> >
> > I believe that reverting
> > sched-implement-staircase-deadline-cpu-scheduler-staircase-improvements.patch
> > will prevent it.
>
> I posted a definitive fix which Michal tested for me offlist. Subject was:
> [PATCH] sched: implement staircase deadline cpu scheduler improvements fix
>
> Sorry about relative noise prior to that. Akpm please pick it up.
>
> Here again just in case.

Rebooted a few times, I can confirm that this patch fixes this.

Thanks

Tony
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 2007-04-04 at 15:59 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
>
> > Here is the slub_debug=FU output with the above patch.
>
> Hmmm... Looks like the object is actually free. Someone writes beyond the
> end of the earlier object. Setting Z should check overwrites but it
> switched off merging. So set
>
> slub_debug = FZ
>
> Analoguos to the last patch you would need to take out redzoning from
> the flags that stop merging. Then rerun. Maybe we can track it down this
> way.

Hmm.. I did that and machine boots fine, with absolutely no debug messages :(

Thanks,
Badari
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> Here is the slub_debug=FU output with the above patch.

Hmmm... Looks like the object is actually free. Someone writes beyond the end of the earlier object. Setting Z should check for overwrites, but it switches off merging. So set

slub_debug = FZ

Analogous to the last patch, you would need to take redzoning out of the flags that stop merging. Then rerun. Maybe we can track it down this way.

Hmmm... Maybe remove all the debug flags from those that avoid merging and then run with full debug. That should theoretically flush it out.
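The interaction between debug flags and merging comes down to a bitmask test. Here is a hedged sketch of that logic in user-space C: the flag names follow the SLUB_NEVER_MERGE definition quoted elsewhere in this thread, but the bit values, the `toy_` helpers and the two mask names are invented for illustration, and the letter-to-flag mapping (F, Z, P, U, T) is assumed from the slub_debug option letters used here.

```c
#include <assert.h>

/* Flag names as in the quoted slub.c; the bit values are made up. */
#define SLAB_RED_ZONE        0x01u
#define SLAB_POISON          0x02u
#define SLAB_STORE_USER      0x04u
#define SLAB_TRACE           0x08u
#define SLAB_DESTROY_BY_RCU  0x10u
#define SLAB_DEBUG_FREE      0x20u

/* Hypothetical mask: every listed flag blocks merging. */
#define TOY_NEVER_MERGE_ALL \
	(SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
	 SLAB_TRACE | SLAB_DESTROY_BY_RCU)

/* Hypothetical mask with redzoning taken out, as suggested in the mail. */
#define TOY_NEVER_MERGE_NO_REDZONE \
	(TOY_NEVER_MERGE_ALL & ~SLAB_RED_ZONE)

/* Translate option letters such as "FZ" into flag bits (assumed mapping). */
static unsigned int toy_parse_debug(const char *opts)
{
	unsigned int flags = 0;

	assert(opts != 0);
	for (; *opts; opts++) {
		switch (*opts) {
		case 'F': flags |= SLAB_DEBUG_FREE; break;
		case 'Z': flags |= SLAB_RED_ZONE;   break;
		case 'P': flags |= SLAB_POISON;     break;
		case 'U': flags |= SLAB_STORE_USER; break;
		case 'T': flags |= SLAB_TRACE;      break;
		default:  break;                    /* unknown letters are ignored here */
		}
	}
	return flags;
}

/* A cache may be merged only if none of its flags is in the mask. */
static int toy_can_merge(unsigned int cache_flags, unsigned int never_merge)
{
	return (cache_flags & never_merge) == 0;
}
```

With the full mask, "FZ" blocks merging because Z sets SLAB_RED_ZONE; with redzoning removed from the mask the same options leave the caches mergeable, which is the configuration asked for above.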
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 2007-04-04 at 11:22 -0700, Christoph Lameter wrote: > On Wed, 4 Apr 2007, Christoph Lameter wrote: > > > Yes. slub_debug=U. But user tracking may need to increase the slab > > size (depends on the padding available in the slab) to store the > > tracking information, so you may not get the same corruption. > > Hummm U is switching off merging and you may need merging to trigger the > discovery of the overwrite. > > Here is a patch to enable merging even while tracking slabs. This patch > should not be applied to mm. In general tracking requires knowing which > slab the objects come from and merging looses that information. > > Index: linux-2.6.21-rc5-mm4/mm/slub.c > =============== > --- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-04 11:19:29.0 > -0700 > +++ linux-2.6.21-rc5-mm4/mm/slub.c2007-04-04 11:19:35.0 -0700 > @@ -86,7 +86,7 @@ > /* > * Set of flags that will prevent slab merging > */ > -#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \ > +#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | \ > SLAB_TRACE | SLAB_DESTROY_BY_RCU) > > #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \ > Here is the slub_debug=FU output with the above patch. Thanks, Badari Linux version 2.6.21-rc5-mm4 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #6 SMP Wed Apr 4 16:52:03 PDT 2007 Command line: root=/dev/hda2 vga=0x314 slub_debug=FU selinux=0 console=tty0 console=ttyS0,38400 resume=/dev/hda1 resume=/dev/hda1 splash=silent showopts BIOS-provided physical RAM map: BIOS-e820: - 0009f000 (usable) BIOS-e820: 0009f000 - 000a (reserved) BIOS-e820: 000ca000 - 0010 (reserved) BIOS-e820: 0010 - dfef (usable) BIOS-e820: dfef - dfeff000 (ACPI data) BIOS-e820: dfeff000 - dff0 (ACPI NVS) BIOS-e820: dff0 - e000 (usable) BIOS-e820: fec0 - fec00400 (reserved) BIOS-e820: fee0 - fee01000 (reserved) BIOS-e820: fff8 - 0001 (reserved) BIOS-e820: 0001 - 0001e000 (usable) end_pfn_map = 1966080 DMI 2.3 present. 
ACPI: RSDP 000F6970, 0024 (r2 PTLTD ) ACPI: XSDT DFEFC625, 003C (r1 PTLTD XSDT604 LTP0) ACPI: FACP DFEFED02, 00F4 (r3 AMDHAMMER604 PTECF4240) ACPI: DSDT DFEFC661, 262D (r1 AMD-K8 AMDACPI 604 MSFT 10D) ACPI: FACS DFEFFFC0, 0040 ACPI: SRAT DFEFEDF6, 0160 (r1 AMDHAMMER604 AMD 1) ACPI: APIC DFEFEF56, 00AA (r1 PTLTD APIC604 LTP0) SRAT: PXM 0 -> APIC 0 -> Node 0 SRAT: PXM 1 -> APIC 1 -> Node 1 SRAT: PXM 2 -> APIC 2 -> Node 2 SRAT: PXM 3 -> APIC 3 -> Node 3 SRAT: Node 0 PXM 0 0-a SRAT: Node 0 PXM 0 0-e000 SRAT: Node 0 PXM 0 0-18000 SRAT: PXM 1 (1-1a000) overlaps with PXM 0 (0-18000) SRAT: SRAT not used. Scanning NUMA topology in Northbridge 24 Number of nodes 4 Node 0 MemBase Limit 00018000 Node 1 MemBase 00018000 Limit 0001a000 Node 2 MemBase 0001a000 Limit 0001c000 Node 3 MemBase 0001c000 Limit 0001e000 Using node hash shift of 29 Bootmem setup node 0 -00018000 Bootmem setup node 1 00018000-0001a000 Bootmem setup node 2 0001a000-0001c000 Bootmem setup node 3 0001c000-0001e000 Zone PFN ranges: DMA 0 -> 4096 DMA324096 -> 1048576 Normal1048576 -> 1966080 Movable zone start PFN for each node early_node_map[7] active PFN ranges 0:0 -> 159 0: 256 -> 917232 0: 917248 -> 917504 0: 1048576 -> 1572864 1: 1572864 -> 1703936 2: 1703936 -> 1835008 3: 1835008 -> 1966080 ACPI: PM-Timer IO Port: 0x8008 ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled) Processor #0 (Bootup-CPU) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled) Processor #1 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) Processor #2 ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled) Processor #3 ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1]) ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1]) ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0]) IOAPIC[0]: apic_id 4, address 0xfec0, GSI 0-23 ACPI: IOAPIC (id[0x05] address[0xfa3e] gsi_base[24]) IOAPIC[1]: apic_id 5, address 0xfa3e, GSI 24-27 ACPI: IOAPIC 
(id[0x06] address[0xfa3e1000] gsi_base[28]) IOAPIC[2]: apic_id 6, address 0xfa3e1000, GSI 28-31 ACPI: IOAPIC (id[0x07] address[0xfa3e2000] gsi_b
Re: 2.6.21-rc5-mm4
On Thursday 05 April 2007 08:10, Andrew Morton wrote:
> Thanks - that'll be the CPU scheduler changes.
>
> Con has produced a patch or two which might address this but afaik we don't
> yet have a definitive fix?
>
> I believe that reverting
> sched-implement-staircase-deadline-cpu-scheduler-staircase-improvements.patch
> will prevent it.

I posted a definitive fix which Michal tested for me offlist. Subject was:
[PATCH] sched: implement staircase deadline cpu scheduler improvements fix

Sorry about relative noise prior to that. Akpm please pick it up.

Here again just in case.

---
Use of memset was bogus. Fix it.

Fix exiting recalc_task_prio without p->array being updated.

Microoptimisation courtesy of Dmitry Adamushko <[EMAIL PROTECTED]>

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 kernel/sched.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: linux-2.6.21-rc5-mm4/kernel/sched.c
===============
--- linux-2.6.21-rc5-mm4.orig/kernel/sched.c	2007-04-04 12:14:29.00000 +1000
+++ linux-2.6.21-rc5-mm4/kernel/sched.c	2007-04-04 12:49:39.0 +1000
@@ -683,11 +683,13 @@ static void dequeue_task(struct task_str
  * The task is being queued on a fresh array so it has its entitlement
  * bitmap cleared.
  */
-static inline void task_new_array(struct task_struct *p, struct rq *rq)
+static void task_new_array(struct task_struct *p, struct rq *rq,
+			   struct prio_array *array)
 {
 	bitmap_zero(p->bitmap, PRIO_RANGE);
 	p->rotation = rq->prio_rotation;
 	p->time_slice = p->quota;
+	p->array = array;
 }
 
 /* Find the first slot from the relevant prio_matrix entry */
@@ -709,6 +711,8 @@ static inline int next_entitled_slot(str
 	DECLARE_BITMAP(tmp, PRIO_RANGE);
 	int search_prio, uprio = USER_PRIO(p->static_prio);
 
+	if (!rq->prio_level[uprio])
+		rq->prio_level[uprio] = MAX_RT_PRIO;
 	/*
 	 * Only priorities equal to the prio_level and above for their
 	 * static_prio are acceptable, and only if it's not better than
@@ -736,11 +740,8 @@ static inline int next_entitled_slot(str
 
 static void queue_expired(struct task_struct *p, struct rq *rq)
 {
-	p->array = rq->expired;
-	task_new_array(p, rq);
+	task_new_array(p, rq, rq->expired);
 	p->prio = p->normal_prio = first_prio_slot(p);
-	p->time_slice = p->quota;
-	p->rotation = rq->prio_rotation;
 }
 
 #ifdef CONFIG_SMP
@@ -800,9 +801,9 @@ static void recalc_task_prio(struct task
 			queue_expired(p, rq);
 			return;
 		} else
-			task_new_array(p, rq);
+			task_new_array(p, rq, array);
 	} else
-		task_new_array(p, rq);
+		task_new_array(p, rq, array);
 
 	queue_prio = next_entitled_slot(p, rq);
 	if (queue_prio >= MAX_PRIO) {
@@ -3445,7 +3446,7 @@ EXPORT_SYMBOL(sub_preempt_count);
 
 static inline void reset_prio_levels(struct rq *rq)
 {
-	memset(rq->prio_level, MAX_RT_PRIO, ARRAY_SIZE(rq->prio_level));
+	memset(rq->prio_level, 0, sizeof(int) * PRIO_RANGE);
 }
 
 /*

-- 
-ck
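Why the old memset() call was bogus can be seen in a small user-space sketch. memset() writes one byte value into each byte and takes its length in bytes, so filling an int array with a non-zero value and an element count sets neither the intended value nor the whole array. The constants below (MAX_RT_PRIO as 100, PRIO_RANGE as 40) and the helper names are assumptions for illustration, and the array is presumed to hold int, as the sizeof(int) in the fix suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RT_PRIO 100   /* assumed value, for illustration only */
#define PRIO_RANGE  40    /* assumed value, for illustration only */

/* Pre-fix pattern: a byte fill, with an element count used as the length. */
static void reset_levels_bogus(int *prio_level)
{
	memset(prio_level, MAX_RT_PRIO, PRIO_RANGE);
}

/* Post-fix pattern: zero the whole array; zero then means "not set yet". */
static void reset_levels_fixed(int *prio_level)
{
	memset(prio_level, 0, sizeof(int) * PRIO_RANGE);
}

/* Lazy default, modelled on the hunk added to next_entitled_slot(). */
static int level_or_default(int *prio_level, int uprio)
{
	assert(uprio >= 0 && uprio < PRIO_RANGE);
	if (!prio_level[uprio])
		prio_level[uprio] = MAX_RT_PRIO;
	return prio_level[uprio];
}
```

With a 4-byte int, the bogus variant turns the first ten entries into 0x64646464 and leaves the remaining thirty untouched. The fixed variant pairs a plain zero fill with the lazy default, so a reader of the array still sees MAX_RT_PRIO for any level that has not been set.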
Re: 2.6.21-rc5-mm4
On Thu, 05 Apr 2007 05:56:35 +0800 "Antonino A. Daplas" <[EMAIL PROTECTED]> wrote: > On Mon, 2007-04-02 at 22:47 -0700, Andrew Morton wrote: > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/ > > > > - The oops in git-net.patch has been fixed, so that tree has been restored. > > It is huge. > > > > - Added the device-mapper development tree to the -mm lineup (Alasdair > > Kergon). It is a quilt tree, living at > > ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/. > > > > - Added davidel's signalfd stuff. > > > > > > > > I'm getting a kernel panic intermittently, approximately 50% of boots. > The tracing is not always the same, but it always dies on an > atomic_bitop operation. Here are two hand-copied tracings (for the life > of me, I can't make netconsole work). > > > /---First Tracing--/ > Oops: [#1] > last sysfs file: class/firmware/microcode > Modules linked in: ... > > ... > CPU: 0 > EIP: ... > EFLAGS:... > EIP is at find_next_zero_bit > ... > ... > ... > Process set_disk_settin > Call Trace: > show_trace_log > show_stack_log > show_register > die > do_page_fault > error_code > recalc_task_prio > activate_task > try_to_wake_up > deault_wake_function > __wake_up_common > __wake_up > sock_def_readable > soc_queue_rev_skb > udp_queue_rcv_skb > __udp4_libr_rcv > udp_rcv > ip_local_delivery > ip_rcv > netif_receive_skb > rtl8139_poll > net_rx_action > __do_soft_irq > do_softirq > irq_exit > do_IRQ > common_interrupt Thanks - that'll be the CPU scheduler changes. Con has produced a patch or two which might address this but afaik we don't yet have a definitive fix? I believe that reverting sched-implement-staircase-deadline-cpu-scheduler-staircase-improvements.patch will prevent it. 
Re: 2.6.21-rc5-mm4
On Mon, 2007-04-02 at 22:47 -0700, Andrew Morton wrote: > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/ > > - The oops in git-net.patch has been fixed, so that tree has been restored. > It is huge. > > - Added the device-mapper development tree to the -mm lineup (Alasdair > Kergon). It is a quilt tree, living at > ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/. > > - Added davidel's signalfd stuff. > > > I'm getting a kernel panic intermittently, approximately 50% of boots. The tracing is not always the same, but it always dies on an atomic_bitop operation. Here are two hand-copied tracings (for the life of me, I can't make netconsole work). /---First Tracing--/ Oops: [#1] last sysfs file: class/firmware/microcode Modules linked in: ... ... CPU: 0 EIP: ... EFLAGS:... EIP is at find_next_zero_bit ... ... ... Process set_disk_settin Call Trace: show_trace_log show_stack_log show_register die do_page_fault error_code recalc_task_prio activate_task try_to_wake_up deault_wake_function __wake_up_common __wake_up sock_def_readable soc_queue_rev_skb udp_queue_rcv_skb __udp4_libr_rcv udp_rcv ip_local_delivery ip_rcv netif_receive_skb rtl8139_poll net_rx_action __do_soft_irq do_softirq irq_exit do_IRQ common_interrupt /-- Second Tracing --/ CPU: 0 EIP: ... EFLAGS:... EIP is at find_next_zero_bit ... ... ... Process sshd Call Trace: show_trace_log show_stack_log show_register die do_page_fault error_code recalc_task_prio enqueue_task activate_task try_to_wake_up wake_up_state signal_wake_up __group_complete_signal __group_send_signal group_send_sig_info send_group_sig_info it_real_fn run_hrtimer_softirq __do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt error_code EIP: [. find_next_zero_bit+... Tony PS: I might try use a serial console and bisection, but this might take me a few days. 
Re: 2.6.21-rc5-mm4 (SLUB powerpc)
On Wed, 2007-04-04 at 10:35 -0700, Christoph Lameter wrote: > On Wed, 4 Apr 2007, Badari Pulavarty wrote: > > > Next issue ? Sorry. > > No problem. Could have a look at the hvsi driver and figure out what is > failing there? What is the hvsi driver? > > > Console: switching to colour frame buffer device 80x30 > > fb0: MATROX frame buffer device > > matroxfb_crtc2: secondary head of fb0 was registered as fb1 > > Kernel panic - not syncing: Couldn't register hvsi console driver > > Framebuffer allocation failure > It looks like.. hvsi.c: if (tty_register_driver(hvsi_driver)) panic("Couldn't register hvsi console driver\n"); I added printk() in all failure cases in tty_register_driver() and I can't reproduce the problem. Machine tries to boot and goes further and hangs. I saw similar hang with RSDL earlier. Thanks, Badari Welcome to yaboot version 10.1.5-r625.SuSE booted from '/[EMAIL PROTECTED]/[EMAIL PROTECTED],2/pci1069,[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0' Enter "help" to get some basic usage information boot: 2621rc5mm4 Please wait, loading kernel... Allocated 0x0040 bytes for executable @ 0x0040 Elf32 kernel loaded... zImage starting: loaded at 0x0040 (sp: 0x01a3fe60) Allocating 0x806af0 bytes for kernel ... OF version = 'IBM,SF225_096' gunzipping (0x01c0 <- 0x00408000:0x006a4cd2)...done 0x741f90 bytes Finalizing device tree... using OF tree (promptr=00c39a50) OF stdout device is: /vdevice/[EMAIL PROTECTED] Hypertas detected, assuming LPAR ! command line: root=/dev/sda2 xmon=on slub_debug memory layout at init: alloc_bottom : 0240b000 alloc_top: 0800 alloc_top_hi : 0001e800 rmo_top : 0800 ram_top : 0001e800 Looking for displays found display : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done instantiating rtas at 0x077ca000 ... done : boot cpu 0002 : starting cpu hw idx 0002... done 0004 : starting cpu hw idx 0004... done 0006 : starting cpu hw idx 0006... done copying OF device tree ... 
Building dt strings... Building dt structure... Device tree strings 0x0240c000 -> 0x0240d2fe Device tree struct 0x0240e000 -> 0x02423000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. Starting Linux PPC64 #5 SMP Wed Apr 4 10:55:34 PDT 2007 - ppc64_pft_size= 0x1b physicalMemorySize= 0x1e800 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x htab_hash_mask= 0xf - Linux version 2.6.21-rc4-mm1-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #5 SMP Wed Apr 4 10:55:34 PDT 2007 [boot]0012 Setup Arch No ramdisk, default root is /dev/sda2 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 1998848 Normal1998848 -> 1998848 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0:0 -> 974848 1: 974848 -> 1998848 [boot]0015 Setup Done Built 2 zonelists. Total pages: 1971520 Kernel command line: root=/dev/sda2 xmon=on slub_debug [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) freeing bootmem node 0 freeing bootmem node 1 Memory: 7855772k/7995392k available (5992k kernel code, 139620k reserved, 1224k data, 814k bss, 272k init) Security Framework v1.0.0 initialized Mount-cache hash table entries: 256 Processor 1 found. Processor 2 found. Processor 3 found. Processor 4 found. Processor 5 found. Processor 6 found. Processor 7 found. 
Brought up 8 CPUs migration_cost=0,3,25 NET: Registered protocol family 16 IOMMU table initialized, virtual merging enabled SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb NET: Registered protocol family 2 IP route cache hash table entries: 262144 (order: 9, 2097152 bytes) TCP established hash table entries: 524288 (order: 11, 12582912 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered vio_bus_init: device_register returned -19 IBM eBus Device Driver audit: initializing netlink socket (disabled) audit(1175719341.610:1): initialized Total HugeTLB memory allocated, 0 VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512
Re: 2.6.21-rc5-mm4 -- laptop lid button only triggers suspend on HP dv1240us every other time.
On Tue, 2007-04-03 at 22:44 -0700, Andrew Morton wrote:
> On Wed, 4 Apr 2007 00:33:36 -0500 "Miles Lane" <[EMAIL PROTECTED]> wrote:
>
> > This is an old bug. It has been happening forever, but I'd love to
> > know how I can help get this tracked down and fixed.
>
> Yes, I've been hitting something like that in the past 3-4 weeks. We
> started to diagnose it but I got distracted.
>
> For a start, please review
> http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg05094.html, then
> see if you are able to take it further than I was.

I had a similar problem with my laptop losing ACPI events after suspend to disk. If I remember correctly, it worked on 2.6.15 or 2.6.16, stopped working on 2.6.17 and 2.6.18, and works again on 2.6.19 and 2.6.20.

--
Sérgio M. B.
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> free. Any ideas on how I can track down easily ? Is there
> a way to store last allocated (function, line#) and look
> around there ?

Also you may want to switch off slab merging. That will allow you to determine the cache involved if it's not a kmalloc allocation and the slab was merged.

Note that switching off merging may seem to cure the problem, because the object was corrupted after allocation and the slab was then never touched again. The corruption may surface only when the slab is merged, because merging creates more activity on the slabs, which exposes the problem.
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Christoph Lameter wrote:
> Yes. slub_debug=U. But user tracking may need to increase the slab
> size (depends on the padding available in the slab) to store the
> tracking information, so you may not get the same corruption.

Hmm, U switches off merging, and you may need merging to trigger the discovery of the overwrite. Here is a patch to enable merging even while tracking slabs. This patch should not be applied to mm. In general, tracking requires knowing which slab the objects come from, and merging loses that information.

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 11:19:29.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 11:19:35.0 -0700
@@ -86,7 +86,7 @@
 /*
  * Set of flags that will prevent slab merging
  */
-#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
+#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | \
 		SLAB_TRACE | SLAB_DESTROY_BY_RCU)
 
 #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \
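The effect of dropping SLAB_STORE_USER from the never-merge mask can be sketched as a plain bit test. This is an illustration only: the bit values below are made up, and may_merge() is a hypothetical helper rather than a function from mm/slub.c.

```c
#include <assert.h>

/* Hypothetical bit assignments; only the mask logic matters here. */
#define SLAB_RED_ZONE       0x01UL
#define SLAB_POISON         0x02UL
#define SLAB_STORE_USER     0x04UL
#define SLAB_TRACE          0x08UL
#define SLAB_DESTROY_BY_RCU 0x10UL

/* The mask as it stands before and after the debugging patch above. */
#define NEVER_MERGE_BEFORE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
			    SLAB_TRACE | SLAB_DESTROY_BY_RCU)
#define NEVER_MERGE_AFTER  (SLAB_RED_ZONE | SLAB_POISON | \
			    SLAB_TRACE | SLAB_DESTROY_BY_RCU)

/* A cache is a merge candidate only if none of the blocking flags is set. */
static int may_merge(unsigned long cache_flags, unsigned long never_merge)
{
	assert(never_merge != 0);	/* sketch-only sanity check */
	return (cache_flags & never_merge) == 0;
}
```

With the original mask a cache that has user tracking enabled is never merged; with the reduced mask it can be, as long as no other blocking debug flag is set.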
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Badari Pulavarty wrote: > Machine booted fine with slub_debug=F. Got following in the > log. I guess we need to track down who is touching after > free. Any ideas on how I can track down easily ? Is there > a way to store last allocated (function, line#) and look > around there ? Yes. slub_debug=U. But user tracking may need to increase the slab size (depends on the padding available in the slab) to store the tracking information, so you may not get the same corruption. > *** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab > 0x81017f9f8b80 > offset=672 flags=0x2c7 inuse=42 > freelist=0x810173f172a0 > Bytes b4 0x810173f17290: a0 72 f1 73 00 00 00 00 00 00 00 00 00 > 00 00 00 .r\us > Object 0x810173f172a0: 00 00 00 00 01 81 ff ff 00 00 00 00 00 > 00 00 00 ..\u\u > FreePointer 0x810173f172a0 -> 0x8101 Same as before. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 2007-04-04 at 10:03 -0700, Christoph Lameter wrote: > On Wed, 4 Apr 2007, Badari Pulavarty wrote: > > > On Tue, 2007-04-03 at 16:55 -0700, Christoph Lameter wrote: > > > On Tue, 3 Apr 2007, Badari Pulavarty wrote: > > > > > > > Hmm. booted fine with slub_debug :( > > > > > > Try to selectively disable debug options... if you got the > > > time... > > > > > > F.e. Try with sanity checks only > > > > > > slub_debug=F > > > > slub_debug=F got something. > > Ahh Seems that the first 4 bytes of the allocations is zapped after > the object has been freed. Can you trap writes to the first four bytes of > the object? This should give you the culprit. > > The other thing is that the system is performing DMA allocations > for the file cache Then its running out of memory. > > Argh We use GFP DMA bitmask to check SLAB flags field: > > Try this fix: > > > > SLUB: Use correct flags to check for DMA cache > > We use a GFP mask to check the SLAB flags if this is a DMA cache. > > Fix this by using the correct SLAB mask and then use the SLUB_DMA > for the ORing of flags. If the system does not support DMA then > we will OR zero which will hopefully get the compiler to drop the > useless if statement as well. > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > Index: linux-2.6.21-rc5-mm4/mm/slub.c > === > --- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-04 09:59:05.0 > -0700 > +++ linux-2.6.21-rc5-mm4/mm/slub.c2007-04-04 10:01:14.0 -0700 > @@ -678,8 +678,8 @@ static struct page *allocate_slab(struct > if (s->order) > flags |= __GFP_COMP; > > - if (s->flags & SLUB_DMA) > - flags |= GFP_DMA; > + if (s->flags & SLAB_CACHE_DMA) > + flags |= SLUB_DMA; > > if (node == -1) > page = alloc_pages(flags, s->order); > Machine booted fine with slub_debug=F. Got following in the log. I guess we need to track down who is touching after free. Any ideas on how I can track down easily ? Is there a way to store last allocated (function, line#) and look around there ? 
Thanks, Badari *** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab 0x81017f9f8b80 offset=672 flags=0x2c7 inuse=42 freelist=0x810173f172a0 Bytes b4 0x810173f17290: a0 72 f1 73 00 00 00 00 00 00 00 00 00 00 00 00 .r\us Object 0x810173f172a0: 00 00 00 00 01 81 ff ff 00 00 00 00 00 00 00 00 ..\u\u FreePointer 0x810173f172a0 -> 0x8101 Call Trace: [] object_err+0x105/0x1b0 [] check_object+0x1b5/0x1d0 [] alloc_object_checks+0x64/0x110 [] kmem_cache_alloc+0xfc/0x1a0 [] sysfs_create_link+0xb7/0x160 [] module_add_driver+0x41/0xd0 [] bus_add_driver+0xce/0x1d0 [] driver_register+0x5d/0x90 [] __pci_register_driver+0x68/0xb0 [] agp_amd64_init+0x36/0xe0 [] gart_iommu_init+0x4c6/0x560 [] __wake_up+0x4e/0x70 [] genl_rcv+0x0/0x70 [] netlink_kernel_create+0x14c/0x160 [] genl_unlock+0x10/0x40 [] pci_iommu_init+0xe/0x20 [] kernel_init+0x154/0x330 [] child_rip+0xa/0x12 [] kernel_init+0x0/0x330 [] child_rip+0x0/0x12 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
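For readers unfamiliar with the "Freepointer corrupt" report above: while an object is free, the allocator keeps the address of the next free object inside the object itself, so a stray write into freed memory damages the free list. The following user-space sketch shows that idea and a sanity check in the spirit of the one that fired. All names, the object size and the object count are assumptions made for illustration, and the stale value written is arbitrary rather than the exact byte pattern in the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBJ_SIZE 16	/* illustrative object size */
#define NR_OBJS  8	/* illustrative number of objects per slab */

struct slab_sketch {
	union {
		void *align;	/* forces pointer alignment of the storage */
		unsigned char bytes[OBJ_SIZE * NR_OBJS];
	} mem;
	void *freelist;		/* first free object, or NULL */
};

static void *object_at(struct slab_sketch *s, int i)
{
	assert(i >= 0 && i < NR_OBJS);
	return s->mem.bytes + (size_t)i * OBJ_SIZE;
}

/* While an object is free, its first word holds the next free object. */
static void *get_free_pointer(const void *obj)
{
	void *next;

	memcpy(&next, obj, sizeof(next));
	return next;
}

static void set_free_pointer(void *obj, void *next)
{
	memcpy(obj, &next, sizeof(next));
}

/* Chain every object onto the free list in address order. */
static void slab_init(struct slab_sketch *s)
{
	for (int i = 0; i < NR_OBJS; i++)
		set_free_pointer(object_at(s, i),
				 i + 1 < NR_OBJS ? object_at(s, i + 1) : NULL);
	s->freelist = object_at(s, 0);
}

/* Plausible means NULL, or an object boundary inside this slab. */
static int valid_pointer(const struct slab_sketch *s, const void *p)
{
	uintptr_t base = (uintptr_t)s->mem.bytes;
	uintptr_t addr = (uintptr_t)p;

	if (!p)
		return 1;
	return addr >= base && addr < base + sizeof(s->mem.bytes) &&
	       (addr - base) % OBJ_SIZE == 0;
}

/* Walk the free list and fail on the first implausible free pointer. */
static int freelist_ok(const struct slab_sketch *s)
{
	const void *obj = s->freelist;

	for (int n = 0; obj && n < NR_OBJS; n++) {
		if (!valid_pointer(s, obj))
			return 0;
		obj = get_free_pointer(obj);
	}
	return obj == NULL;
}

/* Stand-in for the culprit: a store into an object that is already free. */
static void write_after_free(void *obj, uintptr_t stale)
{
	memcpy(obj, &stale, sizeof(stale));
}
```

Calling write_after_free() on any free object and then freelist_ok() reports the damage, but only at the next check, which is why the report shows the allocation path that tripped over the corruption rather than the code that caused it.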
Re: 2.6.21-rc5-mm4
On Tue, 03 Apr 2007 20:37:42 PDT, Randy Dunlap said:
>
> Good luck. But the symbols are there. Just use left/right arrow keys
> to scroll the display left/right and you can see them. Now if you just
> had that indicator to tell you that you Need to scroll to see more text...

Exactly. :) I had the incredible bad luck that the line got cut off at the end of a CONFIG_ symbol that made sense - if it had showed up *half* a symbol, I'd have gone investigating. ;)

(Even a '>' or '<' saying data offscreen to right or left would be sufficient, if somebody wants a small but productive kernel (config system actually) task to hack on.)

I'd code it myself, but I have an SL8500 to install, and need to figure out how my laptop made it into the bag this morning still up and running (I hit the power button, it seemed to power down - blank screen, power light off, but syslog msgs prove it was up and running for another 4 hours before it shut down on a thermal check...)
Re: 2.6.21-rc5-mm4 (SLUB powerpc)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> Next issue ? Sorry.

No problem. Could you have a look at the hvsi driver and figure out what is failing there? What is the hvsi driver?

> Console: switching to colour frame buffer device 80x30
> fb0: MATROX frame buffer device
> matroxfb_crtc2: secondary head of fb0 was registered as fb1
> Kernel panic - not syncing: Couldn't register hvsi console driver

Framebuffer allocation failure
Re: 2.6.21-rc5-mm4 (SLUB powerpc)
On Wed, 2007-04-04 at 10:13 -0700, Christoph Lameter wrote: > On Wed, 4 Apr 2007, Badari Pulavarty wrote: > > > Well !! Helps a little, but not enough to boot (hangs little later) :( > > I will try to get stack trace for that. > > Great! Thanks for all the debugging help. > > > > Processor 6 found. > > Processor 7 found. > > Brought up 8 CPUs > > mm/memory.c:111: bad pud c000f20c0480. > > Hmmm... Checking for slabs used in powerpc arch code: > > The pgtable cache is configured as > > > pgtable_cache[i] = kmem_cache_create(name, > size, size, > SLAB_HWCACHE_ALIGN | > SLAB_MUST_HWCACHE_ALIGN, > zero_ctor, > NULL); > > Hmmm aligned slabs at size and then we MUST_HWCACHE_ALIGN?? Two > competing alignment requirements and a constructor. Constructor requires > the moving of the free pointer after the slab and thus increases the slab > size. > > Sigh. IF SLAB_HWCACHE_ALIGN is set then SLUB believes this to be the > ultimate demand that overrides all other alignments and only aligns to the > cacheline. Try the following fix: > > > > SLUB: Treat SLAB_HWCACHE_ALIGN as a mininum and not as *the* alignment > > If the specified alignment is higher than L1_CACHE_BYTES and > SLAB_HWCACHE_ALIGN is set then use the higher alignment. > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > Index: linux-2.6.21-rc5-mm4/mm/slub.c > =========== > --- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-04 10:09:20.0 > -0700 > +++ linux-2.6.21-rc5-mm4/mm/slub.c2007-04-04 10:09:42.0 -0700 > @@ -1373,10 +1373,7 @@ static int calculate_order(int size) > static unsigned long calculate_alignment(unsigned long flags, > unsigned long align) > { > - if (flags & SLAB_HWCACHE_ALIGN) > - return L1_CACHE_BYTES; > - > - if (flags & SLAB_MUST_HWCACHE_ALIGN) > + if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN)) > return max_t(unsigned long, align, L1_CACHE_BYTES); > > if (align < ARCH_SLAB_MINALIGN) Next issue ? Sorry. 
Thanks, Badari Allocated 0x0040 bytes for executable @ 0x0040 Elf32 kernel loaded... zImage starting: loaded at 0x0040 (sp: 0x01a3fb10) Allocating 0x822c40 bytes for kernel ... OF version = 'IBM,SF225_096' gunzipping (0x01c0 <- 0x00408000:0x006a8eac)...done 0x75cdf0 bytes Finalizing device tree... using OF tree (promptr=00c39a50) OF stdout device is: /vdevice/[EMAIL PROTECTED] Hypertas detected, assuming LPAR ! command line: root=/dev/sda2 xmon=on slub_debug memory layout at init: alloc_bottom : 02427000 alloc_top: 0800 alloc_top_hi : 0001e800 rmo_top : 0800 ram_top : 0001e800 Looking for displays found display : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done instantiating rtas at 0x077ca000 ... done : boot cpu 0002 : starting cpu hw idx 0002... done 0004 : starting cpu hw idx 0004... done 0006 : starting cpu hw idx 0006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x02428000 -> 0x024292fe Device tree struct 0x0242a000 -> 0x0243f000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. Starting Linux PPC64 #8 SMP Wed Apr 4 10:21:43 PDT 2007 - ppc64_pft_size= 0x1b physicalMemorySize= 0x1e800 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x htab_hash_mask= 0xf - Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #8 SMP Wed Apr 4 10:21:43 PDT 2007 [boot]0012 Setup Arch No ramdisk, default root is /dev/sda2 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 1998848 Normal1998848 -> 1998848 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0:0 -> 974848 1: 974848 -> 1998848 [boot]0015 Setup Done Built 2 zonelists. Total pages: 1971520 Kernel command line: root=/dev/sda2 xmon=on slub_debug [boot]0020
Re: 2.6.21-rc5-mm4
Jiri Kosina <[EMAIL PROTECTED]> writes:
> On Tue, 3 Apr 2007, Jiri Kosina wrote:
>
>> > we're also having problems reproducing it on that same combination
>> > (2.6.21-rc4 + my tree), so it points to something in -mm. Since your
>> > trace is completely different right now it looks like something else
>> > is fuzzing it up. Since the e1000 changes are in rc5-mm3 as well, that
>> > might help to narrow it down quickly.
>> I don't know (yet) whether rc5-mm3 was OK in this respect, I didn't boot
>> it on this machine. I only know that both rc5 and rc5 + e1000 tree are
>> OK, but rc5-mm4 panics on ifconfig/dhclient on e1000 card immediately on
>> my system.
>> I will start bisection when I get back to the respective machine
>> (tomorrow) and will let you know.
>
> And the bisection winner is
>
> i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch
>
> I don't immediately see how it could be causing it, so adding CCs which
> are listed in the patch.

Weird. I will have to look at that in a little more detail.

Do you know if this problem happens on x86_64?
What does your .config look like?
What does /proc/interrupts look like?
What kind of hardware are you running this kernel on?
Can anyone else reproduce this?

The oops clearly shows something using -1 and calling that as an address. I don't know why, but I'm guessing I have triggered a memory stomp somewhere. I think this is the first time I have seen a small negative number causing a NULL pointer dereference.

That patch looks innocuous enough that either:
- I just missed changing something I should have.
- Your configuration has an increase in NR_IRQS and that triggered something.
- The patch simply permuted things so a memory stomp now happens on the e1000 data structures instead of somewhere else.
- Something doesn't like large irq numbers.

This work is essentially a backport from x86_64, so if your hardware is 64-bit capable, testing that should be fairly easy and should rule large irq numbers in or out as the culprit.

Until I get a good look at -mm I'm going to have a hard time guessing. But a roving memory stomp is my best guess.

Eric
Re: 2.6.21-rc5-mm4 (SLUB powerpc)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> Well !! Helps a little, but not enough to boot (hangs little later) :(
> I will try to get stack trace for that.

Great! Thanks for all the debugging help.

> Processor 6 found.
> Processor 7 found.
> Brought up 8 CPUs
> mm/memory.c:111: bad pud c000f20c0480.

Hmmm... Checking for slabs used in powerpc arch code:

The pgtable cache is configured as

	pgtable_cache[i] = kmem_cache_create(name,
					     size, size,
					     SLAB_HWCACHE_ALIGN |
					     SLAB_MUST_HWCACHE_ALIGN,
					     zero_ctor,
					     NULL);

Hmmm, slabs aligned at size and then we MUST_HWCACHE_ALIGN?? Two competing alignment requirements and a constructor. A constructor requires moving the free pointer after the slab and thus increases the slab size.

Sigh. If SLAB_HWCACHE_ALIGN is set then SLUB believes this to be the ultimate demand that overrides all other alignments and only aligns to the cacheline. Try the following fix:

SLUB: Treat SLAB_HWCACHE_ALIGN as a minimum and not as *the* alignment

If the specified alignment is higher than L1_CACHE_BYTES and
SLAB_HWCACHE_ALIGN is set then use the higher alignment.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===========
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 10:09:20.00000 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 10:09:42.0 -0700
@@ -1373,10 +1373,7 @@ static int calculate_order(int size)
 static unsigned long calculate_alignment(unsigned long flags,
 		unsigned long align)
 {
-	if (flags & SLAB_HWCACHE_ALIGN)
-		return L1_CACHE_BYTES;
-
-	if (flags & SLAB_MUST_HWCACHE_ALIGN)
+	if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
 		return max_t(unsigned long, align, L1_CACHE_BYTES);
 
 	if (align < ARCH_SLAB_MINALIGN)
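The behavioural difference in calculate_alignment() can be checked outside the kernel. The sketch below mirrors the two versions of the flag test from the patch; the flag bit values, L1_CACHE_BYTES and ARCH_SLAB_MINALIGN are illustrative, max_ul() stands in for max_t(), and the tail of the function (which the patch does not show) is simplified to an assumed minimum-alignment clamp.

```c
#include <assert.h>

/* Illustrative values; the real ones are architecture and kernel specific. */
#define SLAB_HWCACHE_ALIGN      0x1UL
#define SLAB_MUST_HWCACHE_ALIGN 0x2UL
#define L1_CACHE_BYTES          128UL
#define ARCH_SLAB_MINALIGN      8UL

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Before the patch: SLAB_HWCACHE_ALIGN wins and the caller's align is dropped. */
static unsigned long calculate_alignment_old(unsigned long flags,
					     unsigned long align)
{
	/* Sketch-only sanity check: alignments are powers of two. */
	assert(align == 0 || (align & (align - 1)) == 0);

	if (flags & SLAB_HWCACHE_ALIGN)
		return L1_CACHE_BYTES;

	if (flags & SLAB_MUST_HWCACHE_ALIGN)
		return max_ul(align, L1_CACHE_BYTES);

	/* Assumed tail; the remainder of the real function is not in the patch. */
	return max_ul(align, ARCH_SLAB_MINALIGN);
}

/* After the patch: either flag makes the cache line a minimum, not a cap. */
static unsigned long calculate_alignment_new(unsigned long flags,
					     unsigned long align)
{
	assert(align == 0 || (align & (align - 1)) == 0);

	if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
		return max_ul(align, L1_CACHE_BYTES);

	/* Assumed tail, as above. */
	return max_ul(align, ARCH_SLAB_MINALIGN);
}
```

For a cache created with both flags and an alignment equal to a large object size, as in the pgtable_cache call quoted above, the old version returns the cache line size while the new one returns the requested alignment.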
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> On Tue, 2007-04-03 at 16:55 -0700, Christoph Lameter wrote:
> > On Tue, 3 Apr 2007, Badari Pulavarty wrote:
> >
> > > Hmm. booted fine with slub_debug :(
> >
> > Try to selectively disable debug options... if you got the
> > time...
> >
> > F.e. Try with sanity checks only
> >
> > slub_debug=F
>
> slub_debug=F got something.

Ahh. It seems that the first 4 bytes of the allocation are zapped after the object has been freed. Can you trap writes to the first four bytes of the object? This should give you the culprit.

The other thing is that the system is performing DMA allocations for the file cache. Then it's running out of memory.

Argh. We use the GFP DMA bitmask to check the SLAB flags field. Try this fix:

SLUB: Use correct flags to check for DMA cache

We use a GFP mask to check the SLAB flags if this is a DMA cache.

Fix this by using the correct SLAB mask and then use the SLUB_DMA
for the ORing of flags. If the system does not support DMA then
we will OR zero which will hopefully get the compiler to drop the
useless if statement as well.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===========
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 09:59:05.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 10:01:14.0 -0700
@@ -678,8 +678,8 @@ static struct page *allocate_slab(struct
 	if (s->order)
 		flags |= __GFP_COMP;
 
-	if (s->flags & SLUB_DMA)
-		flags |= GFP_DMA;
+	if (s->flags & SLAB_CACHE_DMA)
+		flags |= SLUB_DMA;
 
 	if (node == -1)
 		page = alloc_pages(flags, s->order);
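The DMA bug is a flag-namespace mix-up: a page-allocator (GFP) bit was used to test the slab cache's own flags word. A minimal sketch of why that misfires follows. The bit values are illustrative, SLAB_OTHER_FLAG is a hypothetical cache flag invented to share a bit with the GFP mask, and the function names are not from mm/slub.c; SLUB_DMA is modelled as a GFP-side value, which is how the patch description uses it.

```c
#include <assert.h>

/* Illustrative bit values: two unrelated flag namespaces that overlap numerically. */
#define GFP_DMA         0x01u	/* page allocator flag */
#define SLUB_DMA        GFP_DMA	/* GFP-side value; per the patch text, zero without DMA support */
#define SLAB_CACHE_DMA  0x4000u	/* slab cache flag */
#define SLAB_OTHER_FLAG 0x01u	/* hypothetical cache flag sharing GFP_DMA's bit */

/* Pre-patch logic: tests the cache flags with a mask from the GFP namespace. */
static unsigned int page_flags_buggy(unsigned int cache_flags, unsigned int gfp)
{
	if (cache_flags & SLUB_DMA)
		gfp |= GFP_DMA;
	return gfp;
}

/* Post-patch logic: tests the cache flag, then ORs in the GFP-side value. */
static unsigned int page_flags_fixed(unsigned int cache_flags, unsigned int gfp)
{
	if (cache_flags & SLAB_CACHE_DMA)
		gfp |= SLUB_DMA;
	return gfp;
}

/* Walks through the two failure modes of the pre-patch test. */
static void dma_flag_demo(void)
{
	/* A non-DMA cache whose flags happen to overlap gets DMA pages. */
	assert(page_flags_buggy(SLAB_OTHER_FLAG, 0) & GFP_DMA);
	/* A real DMA cache is not recognised at all. */
	assert(!(page_flags_buggy(SLAB_CACHE_DMA, 0) & GFP_DMA));
}
```

The first assertion in dma_flag_demo() corresponds to the symptom described in the mail, where ordinary caches end up drawing pages from the DMA zone until it is exhausted.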
Re: 2.6.21-rc5-mm4
On Tue, 3 Apr 2007, Jiri Kosina wrote:
> > we're also having problems reproducing it on that same combination
> > (2.6.21-rc4 + my tree), so it points to something in -mm. Since your
> > trace is completely different right now it looks like something else
> > is fuzzing it up. Since the e1000 changes are in rc5-mm3 as well, that
> > might help to narrow it down quickly.
> I don't know (yet) whether rc5-mm3 was OK in this respect, I didn't boot
> it on this machine. I only know that both rc5 and rc5 + e1000 tree are
> OK, but rc5-mm4 panics on ifconfig/dhclient on e1000 card immediately on
> my system.
> I will start bisection when I get back to the respective machine
> (tomorrow) and will let you know.

And the bisection winner is

i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch

I don't immediately see how it could be causing it, so adding CCs which are listed in the patch.

Original description of the symptoms at http://lkml.org/lkml/2007/4/3/90

--
Jiri Kosina
Re: 2.6.21-rc5-mm4
On Wed, 2007-04-04 at 08:12 -0700, Badari Pulavarty wrote: > On Tue, 2007-04-03 at 18:16 -0700, Christoph Lameter wrote: > > On Tue, 3 Apr 2007, Badari Pulavarty wrote: > > > > > Seems to be an issue with calibrate_delay() spinning in a tight > > > loop :( > > > > > > BTW, machine boots fine with SLAB code - not sure why ? > > > > Interrupt disabled sigh. > > > > Here is the fix: > > > > > > > > > > SLUB: Fix numa bootstrap > > > > NUMA bootstrap calls new_slab() if more than one node is found on bootup. > > new_slab() assumes a standard slab context where interrupts must be > > disabled. It enables interrupts for the call into the page allocator > > and then disables them again. Interrupts do not have to be disabled > > during on bootstrap because we still run single threaded there. > > > > I dropped the interrupt preservation code just before SLUB v6 because > > it looked useless there. SLUB worked on the following NUMA tests > > that just had a single node. Sigh. > > > > Enable interrupts after calling new_slab. > > > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > > > Index: linux-2.6.21-rc5-mm4/mm/slub.c > > === > > --- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-03 18:07:41.0 > > -0700 > > +++ linux-2.6.21-rc5-mm4/mm/slub.c 2007-04-03 18:08:17.0 -0700 > > @@ -1436,6 +1436,8 @@ static int init_kmem_cache_nodes(struct > > > > BUG_ON(s->size < sizeof(struct kmem_cache_node)); > > page = new_slab(kmalloc_caches, gfpflags, node); > > + /* new_slab() disables interupts */ > > + local_irq_enable(); > > > > BUG_ON(!page); > > n = page->freelist; > > Well !! Helps a little, but not enough to boot (hangs little later) :( > I will try to get stack trace for that. Better debug with slub_debug. Hope this helps. Thanks, Badari boot: 2621rc5mm4 xmon=on slub_debug Please wait, loading kernel... Allocated 0x0040 bytes for executable @ 0x0040 Elf32 kernel loaded... zImage starting: loaded at 0x0040 (sp: 0x01a3fb10) Allocating 0x826c40 bytes for kernel ... 
OF version = 'IBM,SF225_096' gunzipping (0x01c0 <- 0x00408000:0x006a8e52)...done 0x760df0 bytes Finalizing device tree... using OF tree (promptr=00c39a50) OF stdout device is: /vdevice/[EMAIL PROTECTED] Hypertas detected, assuming LPAR ! command line: root=/dev/sda2 xmon=on slub_debug memory layout at init: alloc_bottom : 0242b000 alloc_top: 0800 alloc_top_hi : 0001e800 rmo_top : 0800 ram_top : 0001e800 Looking for displays found display : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done instantiating rtas at 0x077ca000 ... done : boot cpu 0002 : starting cpu hw idx 0002... done 0004 : starting cpu hw idx 0004... done 0006 : starting cpu hw idx 0006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0242c000 -> 0x0242d2fe Device tree struct 0x0242e000 -> 0x02443000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. Starting Linux PPC64 #7 SMP Wed Apr 4 07:52:49 PDT 2007 - ppc64_pft_size = 0x1b physicalMemorySize= 0x1e800 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x htab_hash_mask= 0xf - Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #7 SMP Wed Apr 4 07:52:49 PDT 2007 [boot]0012 Setup Arch No ramdisk, default root is /dev/sda2 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 1998848 Normal1998848 -> 1998848 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0:0 -> 974848 1: 974848 -> 1998848 [boot]0015 Setup Done Built 2 zonelists. Total pages: 1971520 Kernel command line: root=/dev/sda2 xmon=on slub_debug [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache ha
Re: 2.6.21-rc5-mm4
On Tue, 2007-04-03 at 18:16 -0700, Christoph Lameter wrote: > On Tue, 3 Apr 2007, Badari Pulavarty wrote: > > > Seems to be an issue with calibrate_delay() spinning in a tight > > loop :( > > > > BTW, machine boots fine with SLAB code - not sure why ? > > Interrupt disabled sigh. > > Here is the fix: > > > > > SLUB: Fix numa bootstrap > > NUMA bootstrap calls new_slab() if more than one node is found on bootup. > new_slab() assumes a standard slab context where interrupts must be > disabled. It enables interrupts for the call into the page allocator > and then disables them again. Interrupts do not have to be disabled > during on bootstrap because we still run single threaded there. > > I dropped the interrupt preservation code just before SLUB v6 because > it looked useless there. SLUB worked on the following NUMA tests > that just had a single node. Sigh. > > Enable interrupts after calling new_slab. > > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> > > Index: linux-2.6.21-rc5-mm4/mm/slub.c > =========== > --- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-03 18:07:41.0 > -0700 > +++ linux-2.6.21-rc5-mm4/mm/slub.c2007-04-03 18:08:17.0 -0700 > @@ -1436,6 +1436,8 @@ static int init_kmem_cache_nodes(struct > > BUG_ON(s->size < sizeof(struct kmem_cache_node)); > page = new_slab(kmalloc_caches, gfpflags, node); > + /* new_slab() disables interupts */ > + local_irq_enable(); > > BUG_ON(!page); > n = page->freelist; Well !! Helps a little, but not enough to boot (hangs little later) :( I will try to get stack trace for that. Thanks, Badari boot: 2621rc5mm4 Please wait, loading kernel... Allocated 0x0040 bytes for executable @ 0x0040 Elf32 kernel loaded... zImage starting: loaded at 0x0040 (sp: 0x01a3fb10) Allocating 0x826c40 bytes for kernel ... OF version = 'IBM,SF225_096' gunzipping (0x01c0 <- 0x00408000:0x006a8e52)...done 0x760df0 bytes Finalizing device tree... 
using OF tree (promptr=00c39a50) OF stdout device is: /vdevice/[EMAIL PROTECTED] Hypertas detected, assuming LPAR ! command line: root=/dev/sda2 memory layout at init: alloc_bottom : 0242b000 alloc_top: 0800 alloc_top_hi : 0001e800 rmo_top : 0800 ram_top : 0001e800 Looking for displays found display : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done instantiating rtas at 0x077ca000 ... done : boot cpu 0002 : starting cpu hw idx 0002... done 0004 : starting cpu hw idx 0004... done 0006 : starting cpu hw idx 0006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x0242c000 -> 0x0242d2fe Device tree struct 0x0242e000 -> 0x02443000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. Starting Linux PPC64 #7 SMP Wed Apr 4 07:52:49 PDT 2007 - ppc64_pft_size= 0x1b physicalMemorySize= 0x1e800 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x htab_hash_mask= 0xf - Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #7 SMP Wed Apr 4 07:52:49 PDT 2007 [boot]0012 Setup Arch No ramdisk, default root is /dev/sda2 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 -> 1998848 Normal1998848 -> 1998848 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0:0 -> 974848 1: 974848 -> 1998848 [boot]0015 Setup Done Built 2 zonelists. 
Total pages: 1971520 Kernel command line: root=/dev/sda2 [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour dummy device 80x25 console handover: boot [udbg-1] -> real [hvc0] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) freeing bootmem node 0 freeing bootmem node 1 Memory: 7855384k/7995392k available (6064k kernel code, 140008k reserved, 1236k data, 819k bss, 272k init) SLUB V6: General Slabs=18, HW alignment=128, Processors=8, Nodes=16 Calibrating delay loop...475.13 BogoMIPS (lpj=2375680) Security Framework v1.0.0 initialized Mount-cache hash table entries: 256 Processor 1 found. Processor 2 found. Processor 3 found. Processor 4 found. Processor 5 found. Processor 6 found. Processor 7 found. Brought up 8 CPUs mm/memory.c:111: bad pud c000f20c0480. could not vmalloc 20971520 bytes
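The bootstrap fix quoted at the top of this message can be pictured with a small user-space model. This is only a sketch: `irqs_enabled` and the `demo_*` functions are hypothetical stand-ins, not kernel code. The model covers just the behaviour described in the changelog: the slab allocation path enables interrupts around the page allocator call and disables them again on the way out, which leaves a bootstrap caller that started with interrupts enabled running with them disabled. That would fit the calibrate_delay() spin reported earlier, presumably because the timer tick it waits for never arrives.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU interrupt state; not a kernel symbol. */
static bool irqs_enabled = true;

static void demo_irq_enable(void)  { irqs_enabled = true; }
static void demo_irq_disable(void) { irqs_enabled = false; }

/*
 * Models an allocation path written for callers that enter with
 * interrupts disabled: enable around the page allocator call,
 * then disable again before returning.
 */
static int demo_new_slab(void)
{
	demo_irq_enable();
	/* ... the call into the page allocator would happen here ... */
	demo_irq_disable();
	return 1;
}

/* Bootstrap caller without the fix: interrupts stay off afterwards. */
static bool bootstrap_without_fix(void)
{
	irqs_enabled = true;	/* bootstrap starts with interrupts on */
	demo_new_slab();
	return irqs_enabled;
}

/* Bootstrap caller with the fix: re-enable after the allocation. */
static bool bootstrap_with_fix(void)
{
	irqs_enabled = true;
	demo_new_slab();
	demo_irq_enable();	/* mirrors the added local_irq_enable() */
	return irqs_enabled;
}
```

In this model `bootstrap_without_fix()` returns false and `bootstrap_with_fix()` returns true, which is all the one-line patch changes.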
Re: 2.6.21-rc5-mm4
On Tue, 3 Apr 2007, Jiri Kosina wrote:

> > we're also having problems reproducing it on that same combination
> > (2.6.21-rc4 + my tree), so it points to something in -mm. Since your
> > trace is completely different right now it looks like something else is
> > fuzzing it up. Since the e1000 changes are in rc5-mm3 as well, that
> > might help to narrow it down quickly.
> I don't know (yet) whether rc5-mm3 was OK in this respect, I didn't boot
> it on this machine. I only know that both rc5 and rc5 + e1000 tree are
> OK, but rc5-mm4 panics on ifconfig/dhclient on e1000 card immediately on
> my system. I will start bisection when I get back to the respective
> machine (tomorrow) and will let you know.

And the bisection winner is

	i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch

I don't immediately see how it could be causing it, so adding CCs which are listed in the patch. Original description of the symptoms at http://lkml.org/lkml/2007/4/3/90

--
Jiri Kosina
Re: 2.6.21-rc5-mm4 (SLUB)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> On Tue, 2007-04-03 at 16:55 -0700, Christoph Lameter wrote:
> > On Tue, 3 Apr 2007, Badari Pulavarty wrote:
> >
> > > Hmm. booted fine with slub_debug :(
> >
> > Try to selectively disable debug options... if you got the time...
> >
> > F.e. Try with sanity checks only
> >
> > slub_debug=F
>
> slub_debug=F got something.

Ahh... It seems that the first 4 bytes of the allocation are zapped after the object has been freed. Can you trap writes to the first four bytes of the object? This should give you the culprit.

The other thing is that the system is performing DMA allocations for the file cache. Then it is running out of memory.

Argh. We use the GFP DMA bitmask to check the SLAB flags field. Try this fix:

SLUB: Use correct flags to check for DMA cache

We use a GFP mask to check the SLAB flags if this is a DMA cache. Fix this by using the correct SLAB mask and then use SLUB_DMA for the ORing of flags. If the system does not support DMA then we will OR zero, which will hopefully get the compiler to drop the useless if statement as well.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 09:59:05.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 10:01:14.0 -0700
@@ -678,8 +678,8 @@ static struct page *allocate_slab(struct
 	if (s->order)
 		flags |= __GFP_COMP;
 
-	if (s->flags & SLUB_DMA)
-		flags |= GFP_DMA;
+	if (s->flags & SLAB_CACHE_DMA)
+		flags |= SLUB_DMA;
 
 	if (node == -1)
 		page = alloc_pages(flags, s->order);
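To make the flag mix-up in the patch above concrete, here is a minimal user-space sketch. All `DEMO_*` bit values and function names are made up for illustration and are not the kernel's real definitions. They are chosen so that an unrelated cache flag happens to share a bit with the page allocator's DMA bit, which is one way a check against the wrong namespace could send ordinary caches to the DMA zone.

```c
#include <stdbool.h>

/* Hypothetical bit values, NOT the kernel's real definitions. */
#define DEMO_GFP_DMA        0x01u	/* page allocator flag namespace */
#define DEMO_SLAB_DEBUG     0x01u	/* cache flag that happens to share that bit */
#define DEMO_SLAB_CACHE_DMA 0x40u	/* cache flag meaning "this is a DMA cache" */

/* Buggy check: tests a page allocator bit against the cache flags. */
static bool wants_dma_buggy(unsigned int cache_flags)
{
	return (cache_flags & DEMO_GFP_DMA) != 0;
}

/* Fixed check: tests the cache flag that actually marks a DMA cache. */
static bool wants_dma_fixed(unsigned int cache_flags)
{
	return (cache_flags & DEMO_SLAB_CACHE_DMA) != 0;
}

/* Builds the flags passed to the page allocator for a given cache. */
static unsigned int page_flags_for(unsigned int cache_flags, bool use_fixed)
{
	unsigned int gfp = 0;
	bool dma = use_fixed ? wants_dma_fixed(cache_flags)
			     : wants_dma_buggy(cache_flags);

	if (dma)
		gfp |= DEMO_GFP_DMA;
	return gfp;
}
```

With these made-up values, `page_flags_for(DEMO_SLAB_DEBUG, false)` requests DMA memory for a cache that never asked for it, while `page_flags_for(DEMO_SLAB_CACHE_DMA, false)` does not request it for a cache that did. Passing `true` gives the expected result in both cases.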
Re: 2.6.21-rc5-mm4 (SLUB powerpc)
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> Well !! Helps a little, but not enough to boot (hangs little later) :(
> I will try to get stack trace for that.

Great! Thanks for all the debugging help.

> Processor 6 found.
> Processor 7 found.
> Brought up 8 CPUs
> mm/memory.c:111: bad pud c000f20c0480.

Hmmm... Checking for slabs used in powerpc arch code: The pgtable cache is configured as

	pgtable_cache[i] = kmem_cache_create(name, size, size,
			SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
			zero_ctor, NULL);

Hmmm... aligned slabs at size and then we MUST_HWCACHE_ALIGN?? Two competing alignment requirements and a constructor. Constructor requires the moving of the free pointer after the slab and thus increases the slab size. Sigh.

If SLAB_HWCACHE_ALIGN is set then SLUB believes this to be the ultimate demand that overrides all other alignments and only aligns to the cacheline. Try the following fix:

SLUB: Treat SLAB_HWCACHE_ALIGN as a minimum and not as *the* alignment

If the specified alignment is higher than L1_CACHE_BYTES and SLAB_HWCACHE_ALIGN is set then use the higher alignment.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 10:09:20.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 10:09:42.0 -0700
@@ -1373,10 +1373,7 @@ static int calculate_order(int size)
 static unsigned long calculate_alignment(unsigned long flags,
 		unsigned long align)
 {
-	if (flags & SLAB_HWCACHE_ALIGN)
-		return L1_CACHE_BYTES;
-
-	if (flags & SLAB_MUST_HWCACHE_ALIGN)
+	if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
 		return max_t(unsigned long, align, L1_CACHE_BYTES);
 
 	if (align < ARCH_SLAB_MINALIGN)
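The effect of the alignment patch above can be checked outside the kernel with a small sketch. The `DEMO_*` constants and the `align_old`/`align_new` names are assumptions for illustration. The cache line size of 128 is taken from the "HW alignment=128" line in the boot logs, and the tail of the real function (the ARCH_SLAB_MINALIGN handling) is left out.

```c
/* Hypothetical constants for illustration only. */
#define DEMO_L1_CACHE_BYTES      128UL	/* "HW alignment=128" in the boot log */
#define DEMO_HWCACHE_ALIGN       0x1UL
#define DEMO_MUST_HWCACHE_ALIGN  0x2UL

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Before the patch: HWCACHE_ALIGN wins outright and drops the caller's align. */
static unsigned long align_old(unsigned long flags, unsigned long align)
{
	if (flags & DEMO_HWCACHE_ALIGN)
		return DEMO_L1_CACHE_BYTES;

	if (flags & DEMO_MUST_HWCACHE_ALIGN)
		return max_ul(align, DEMO_L1_CACHE_BYTES);

	return align;
}

/* After the patch: either flag makes the cache line a minimum, not a cap. */
static unsigned long align_new(unsigned long flags, unsigned long align)
{
	if (flags & (DEMO_MUST_HWCACHE_ALIGN | DEMO_HWCACHE_ALIGN))
		return max_ul(align, DEMO_L1_CACHE_BYTES);

	return align;
}
```

For a cache created like the pgtable cache, with both flags set and the alignment equal to the object size (say 4096, a made-up size), `align_old` yields 128 while `align_new` yields 4096. A smaller request such as 64 is still rounded up to 128 by both versions.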
Re: 2.6.21-rc5-mm4
Jiri Kosina <[EMAIL PROTECTED]> writes:

> On Tue, 3 Apr 2007, Jiri Kosina wrote:
>
> > > we're also having problems reproducing it on that same combination
> > > (2.6.21-rc4 + my tree), so it points to something in -mm. Since your
> > > trace is completely different right now it looks like something else is
> > > fuzzing it up. Since the e1000 changes are in rc5-mm3 as well, that
> > > might help to narrow it down quickly.
> > I don't know (yet) whether rc5-mm3 was OK in this respect, I didn't boot
> > it on this machine. I only know that both rc5 and rc5 + e1000 tree are
> > OK, but rc5-mm4 panics on ifconfig/dhclient on e1000 card immediately on
> > my system. I will start bisection when I get back to the respective
> > machine (tomorrow) and will let you know.
>
> And the bisection winner is
>
> 	i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch
>
> I don't immediately see how it could be causing it, so adding CCs which
> are listed in the patch.

Weird. I will have to look at that in a little more detail.

Do you know if this problem happens on x86_64?
What does your .config look like?
What does /proc/interrupts look like?
What kind of hardware are you running this kernel on?
Can anyone else reproduce this?

The oops clearly shows something using -1 and calling that as an address. I don't know why, but I'm guessing I have triggered a memory stomp somewhere. I think this is the first time I have seen a small negative number causing a NULL pointer dereference.

That patch looks innocuous enough that either:
- I just missed changing something I should have.
- Your configuration has an increase in NR_IRQS and that triggered something.
- The patch simply permuted things so a memory stomp now happens on the e1000 data structures instead of somewhere else.
- Something doesn't like large irq numbers.

This work is essentially a backport from x86_64, so if your hardware is 64-bit capable, testing that should be a fairly easy test and should rule out large irq numbers as the culprit.

Until I get a good look at -mm I'm going to have a hard time guessing. But a roving memory stomp is my best guess.

Eric
Re: 2.6.21-rc5-mm4 (SLUB powerpc)
On Wed, 2007-04-04 at 10:13 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
>
> > Well !! Helps a little, but not enough to boot (hangs little later) :(
> > I will try to get stack trace for that.
>
> Great! Thanks for all the debugging help.
>
> > Processor 6 found.
> > Processor 7 found.
> > Brought up 8 CPUs
> > mm/memory.c:111: bad pud c000f20c0480.
>
> Hmmm... Checking for slabs used in powerpc arch code: The pgtable cache
> is configured as
>
> 	pgtable_cache[i] = kmem_cache_create(name, size, size,
> 			SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
> 			zero_ctor, NULL);
>
> Hmmm... aligned slabs at size and then we MUST_HWCACHE_ALIGN?? Two
> competing alignment requirements and a constructor. Constructor requires
> the moving of the free pointer after the slab and thus increases the slab
> size. Sigh.
>
> If SLAB_HWCACHE_ALIGN is set then SLUB believes this to be the ultimate
> demand that overrides all other alignments and only aligns to the
> cacheline. Try the following fix:
>
> SLUB: Treat SLAB_HWCACHE_ALIGN as a minimum and not as *the* alignment
>
> If the specified alignment is higher than L1_CACHE_BYTES and
> SLAB_HWCACHE_ALIGN is set then use the higher alignment.
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
>
> Index: linux-2.6.21-rc5-mm4/mm/slub.c
> ===
> --- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 10:09:20.0 -0700
> +++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 10:09:42.0 -0700
> @@ -1373,10 +1373,7 @@ static int calculate_order(int size)
>  static unsigned long calculate_alignment(unsigned long flags,
>  		unsigned long align)
>  {
> -	if (flags & SLAB_HWCACHE_ALIGN)
> -		return L1_CACHE_BYTES;
> -
> -	if (flags & SLAB_MUST_HWCACHE_ALIGN)
> +	if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
>  		return max_t(unsigned long, align, L1_CACHE_BYTES);
>
>  	if (align < ARCH_SLAB_MINALIGN)

Next issue ? Sorry.

Thanks,
Badari

Allocated 0x0040 bytes for executable @ 0x0040 Elf32 kernel loaded... zImage starting: loaded at 0x0040 (sp: 0x01a3fb10) Allocating 0x822c40 bytes for kernel ...
OF version = 'IBM,SF225_096' gunzipping (0x01c0 - 0x00408000:0x006a8eac)...done 0x75cdf0 bytes Finalizing device tree... using OF tree (promptr=00c39a50) OF stdout device is: /vdevice/[EMAIL PROTECTED] Hypertas detected, assuming LPAR ! command line: root=/dev/sda2 xmon=on slub_debug memory layout at init: alloc_bottom : 02427000 alloc_top: 0800 alloc_top_hi : 0001e800 rmo_top : 0800 ram_top : 0001e800 Looking for displays found display : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED], opening ... done instantiating rtas at 0x077ca000 ... done : boot cpu 0002 : starting cpu hw idx 0002... done 0004 : starting cpu hw idx 0004... done 0006 : starting cpu hw idx 0006... done copying OF device tree ... Building dt strings... Building dt structure... Device tree strings 0x02428000 - 0x024292fe Device tree struct 0x0242a000 - 0x0243f000 Calling quiesce ... returning from prom_init Partition configured for 8 cpus. Starting Linux PPC64 #8 SMP Wed Apr 4 10:21:43 PDT 2007 - ppc64_pft_size= 0x1b physicalMemorySize= 0x1e800 ppc64_caches.dcache_line_size = 0x80 ppc64_caches.icache_line_size = 0x80 htab_address = 0x htab_hash_mask= 0xf - Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #8 SMP Wed Apr 4 10:21:43 PDT 2007 [boot]0012 Setup Arch No ramdisk, default root is /dev/sda2 EEH: PCI Enhanced I/O Error Handling Enabled PPC64 nvram contains 8192 bytes Zone PFN ranges: DMA 0 - 1998848 Normal1998848 - 1998848 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0:0 - 974848 1: 974848 - 1998848 [boot]0015 Setup Done Built 2 zonelists. 
Total pages: 1971520 Kernel command line: root=/dev/sda2 xmon=on slub_debug [boot]0020 XICS Init [boot]0021 XICS Done PID hash table entries: 4096 (order: 12, 32768 bytes) Console: colour dummy device 80x25 console handover: boot [udbg-1] - real [hvc0] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) freeing bootmem node