Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))

2007-04-13 Thread Christoph Lameter
On Fri, 13 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 11:04 -0700, Christoph Lameter wrote:
> > On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> ...
> > 
> > > *** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab
> > > 0x81017f9f8b80
> > > offset=672 flags=0x2c7 inuse=42
> > > freelist=0x810173f172a0
> > >   Bytes b4 0x810173f17290:  a0 72 f1 73 00 00 00 00 00 00 00 00 00 00 00 00 .r\us
> > > Object 0x810173f172a0:  00 00 00 00 01 81 ff ff 00 00 00 00 00 00 00 00 ..\u\u
> > > FreePointer 0x810173f172a0 -> 0x8101
> > 
> 
> Found it !! After a painful capture of all the kmalloc-16 slab
> allocations (400+) so far and auditing some of them, found the
> culprit - who writes beyond its allocation, causing the slab
> corruption.

Thanks. I am sorry that this was not easier for you. But as a result, I
thoroughly tested the slab corruption detection in SLUB yesterday, found
various issues, and submitted patches to Andrew that will make this
really work well. Too late for you, though.
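
One way to read the freepointer dump quoted above, assuming little-endian x86-64: the FreePointer line shows the pointer stored at the object's own address, and the object's first eight bytes, `00 00 00 00 01 81 ff ff`, decode to 0xffff810100000000. The upper 32 bits still look like a kernel direct-mapping address, but the lower 32 bits are zero. That would fit a single 4-byte store of zero over the start of the object, for example a `u32` written just past the end of a neighbouring kmalloc-16 allocation. This is an interpretation, not something stated in the thread. The sketch below only checks the byte arithmetic; the helper names are hypothetical, and so is any "original" pointer value passed to `clobber_low_half()`.

```c
#include <assert.h>
#include <stdint.h>

/* Read 8 bytes (lowest address first) as a little-endian 64-bit value. */
static uint64_t load_le64(const uint8_t b[8])
{
	uint64_t v = 0;

	for (int k = 7; k >= 0; k--)
		v = (v << 8) | b[k];
	return v;
}

/* Write a 64-bit value in little-endian byte order. */
static void store_le64(uint8_t b[8], uint64_t v)
{
	for (int k = 0; k < 8; k++)
		b[k] = (uint8_t)(v >> (8 * k));
}

/* Write a 32-bit value in little-endian byte order, as a u32 store does on x86. */
static void store_le32(uint8_t b[4], uint32_t v)
{
	for (int k = 0; k < 4; k++)
		b[k] = (uint8_t)(v >> (8 * k));
}

/*
 * Simulate a 4-byte store of val at the address where the 8-byte pointer
 * ptr is kept: only bytes 0..3 (the low half) change.
 */
static uint64_t clobber_low_half(uint64_t ptr, uint32_t val)
{
	uint8_t mem[8];

	store_le64(mem, ptr);
	store_le32(mem, val);
	return load_le64(mem);
}
```

For instance, `clobber_low_half(0xffff810173f172b0ULL, 0)` returns 0xffff810100000000, the value seen in the dump, regardless of what the lost low half was.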
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))

2007-04-13 Thread Andrew Morton
On Fri, 13 Apr 2007 17:45:37 +0200 Andi Kleen <[EMAIL PROTECTED]> wrote:

> 
> > 
> > cache_k8_northbridges() is storing config values to incorrect locations
> > (in flush_words) and is also overflowing beyond the allocation, causing
> > slab verification failures.
> 
> Oops. Thanks for tracking that down, Badari.
> 
> Andrew, clear .21 candidate.

OK.  And for 2.6.20.x, methinks.


Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))

2007-04-13 Thread Andi Kleen
On Friday 13 April 2007 18:42:43 Chuck Ebbert wrote:
> Andi Kleen wrote:
> >> cache_k8_northbridges() is storing config values to incorrect locations
> >> (in flush_words) and is also overflowing beyond the allocation, causing
> >> slab verification failures.
> > 
> > Oops. Thanks for tracking that down, Badari.
> > 
> > Andrew, clear .21 candidate.
> > 
> 
> 2.6.20 as well. Do you want me to submit it?

After it is in .21

-Andi




Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))

2007-04-13 Thread Chuck Ebbert
Andi Kleen wrote:
>> cache_k8_northbridges() is storing config values to incorrect locations
>> (in flush_words) and is also overflowing beyond the allocation, causing
>> slab verification failures.
> 
> Oops. Thanks for tracking that down, Badari.
> 
> Andrew, clear .21 candidate.
> 

2.6.20 as well. Do you want me to submit it?



Re: [PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))

2007-04-13 Thread Andi Kleen

> 
> cache_k8_northbridges() is storing config values to incorrect locations
> (in flush_words) and is also overflowing beyond the allocation, causing
> slab verification failures.

Oops. Thanks for tracking that down, Badari.

Andrew, clear .21 candidate.

-Andi

> 
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> ---
>  arch/x86_64/kernel/k8.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.21-rc6/arch/x86_64/kernel/k8.c
> ===
> --- linux-2.6.21-rc6.orig/arch/x86_64/kernel/k8.c	2007-04-05 19:36:56.0 -0700
> +++ linux-2.6.21-rc6/arch/x86_64/kernel/k8.c	2007-04-13 07:51:57.0 -0700
> @@ -61,8 +61,8 @@ int cache_k8_northbridges(void)
>  	dev = NULL;
>  	i = 0;
>  	while ((dev = next_k8_northbridge(dev)) != NULL) {
> -		k8_northbridges[i++] = dev;
> -		pci_read_config_dword(dev, 0x9c, &flush_words[i]);
> +		k8_northbridges[i] = dev;
> +		pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
>  	}
>  	k8_northbridges[i] = NULL;
>   return 0;
> 
> 
> 




[PATCH] cache_k8_northbridges() overflows beyond allocation (Was: 2.6.21-rc5-mm4 (SLUB))

2007-04-13 Thread Badari Pulavarty
On Wed, 2007-04-04 at 11:04 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
...
> 
> > *** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab
> > 0x81017f9f8b80
> > offset=672 flags=0x2c7 inuse=42
> > freelist=0x810173f172a0
> >   Bytes b4 0x810173f17290:  a0 72 f1 73 00 00 00 00 00 00 00 00 00 00 00 00 .r\us
> > Object 0x810173f172a0:  00 00 00 00 01 81 ff ff 00 00 00 00 00 00 00 00 ..\u\u
> > FreePointer 0x810173f172a0 -> 0x8101
> 

Found it!! After painfully capturing all the kmalloc-16 slab
allocations (400+ so far) and auditing some of them, I found the
culprit that writes beyond its allocation and corrupts the slab.

Thanks,
Badari

cache_k8_northbridges() is storing config values to incorrect locations
(in flush_words) and is also overflowing beyond the allocation, causing
slab verification failures.

Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/k8.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.21-rc6/arch/x86_64/kernel/k8.c
===
--- linux-2.6.21-rc6.orig/arch/x86_64/kernel/k8.c	2007-04-05 19:36:56.0 -0700
+++ linux-2.6.21-rc6/arch/x86_64/kernel/k8.c	2007-04-13 07:51:57.0 -0700
@@ -61,8 +61,8 @@ int cache_k8_northbridges(void)
 	dev = NULL;
 	i = 0;
 	while ((dev = next_k8_northbridge(dev)) != NULL) {
-		k8_northbridges[i++] = dev;
-		pci_read_config_dword(dev, 0x9c, &flush_words[i]);
+		k8_northbridges[i] = dev;
+		pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
 	}
 	k8_northbridges[i] = NULL;
 	return 0;

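A user-space model of the loop may make the off-by-one easier to see. It assumes, as the changelog implies, that `flush_words` has room for exactly one `u32` per northbridge; the real allocation is not part of the patch. `read_config()` is a hypothetical stand-in for `pci_read_config_dword()`, and the caller gives the array one extra canary slot to stand for whatever object the allocator placed next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CANARY 0xdeadbeefu

/* Hypothetical stand-in for pci_read_config_dword(dev, 0x9c, ...). */
static uint32_t read_config(size_t dev)
{
	return 0x1000u + (uint32_t)dev;
}

/*
 * Pre-patch ordering: i is incremented in the k8_northbridges[] store,
 * so the flush_words[] store uses the next index (slots 1..n).
 */
static void fill_old(uint32_t *flush_words, size_t n)
{
	size_t i = 0;

	for (size_t dev = 0; dev < n; dev++) {
		i++;                               /* models k8_northbridges[i++] = dev; */
		flush_words[i] = read_config(dev); /* models &flush_words[i] */
	}
}

/* Patched ordering: both stores use i, incremented afterwards (slots 0..n-1). */
static void fill_new(uint32_t *flush_words, size_t n)
{
	size_t i = 0;

	for (size_t dev = 0; dev < n; dev++) {
		assert(i < n);                       /* never reaches the canary slot */
		flush_words[i++] = read_config(dev); /* models &flush_words[i++] */
	}
}
```

With the old ordering, slot 0 is never filled, every value lands one slot late, and the last one overwrites the canary; with the patched ordering both arrays share the same index.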

Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-12 Thread Helge Hafting
On Wed, Apr 11, 2007 at 10:37:11AM -0400, Dmitry Torokhov wrote:
> On 4/11/07, Helge Hafting <[EMAIL PROTECTED]> wrote:
> >Dmitry Torokhov wrote:
> >>
> >> *sigh* When will I learn to spell names of kernel parameters
> >> correctly? It is initcall_debug, not debug_initcall :( Could you try
> >> again, please?
> >Here is the dmesg for rc5mm4 with initcall_debug, showing how
> >no usbtouch function is called at all.
> >
> 
> Helge,
> 
> I don't have any explanation why we don't see usbtouch_init called at
> all in -rc5-mm4. Could it be toolchain misbehaving? Do you see
> references to usbtouch_init in the kernel image itself?
> 
I unpacked it, ran "strings" on it, and found no usbtouch in there.
There were plenty of other usb names, such as usbfs, usbserial, usbcore
and tons of messages that usb mass storage and usb serial might
need to produce.

Here are the versions of some of my tools; I don't know if there are
any known issues with them:

$ gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
$ ld --version
GNU ld (GNU Binutils for Debian) 2.17.50.20070406
$ dpkg -l binutils
ii  binutils   2.17.20070406c The GNU assembler, linker and binary

Helge Hafting


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-11 Thread Dmitry Torokhov

On 4/11/07, Helge Hafting <[EMAIL PROTECTED]> wrote:

Dmitry Torokhov wrote:
>
> *sigh* When will I learn to spell names of kernel parameters
> correctly? It is initcall_debug, not debug_initcall :( Could you try
> again, please?
Here is the dmesg for rc5mm4 with initcall_debug, showing how
no usbtouch function is called at all.



Helge,

I don't have any explanation why we don't see usbtouch_init called at
all in -rc5-mm4. Could it be toolchain misbehaving? Do you see
references to usbtouch_init in the kernel image itself?

--
Dmitry


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-11 Thread Helge Hafting

Dmitry Torokhov wrote:


*sigh* When will I learn to spell names of kernel parameters
correctly? It is initcall_debug, not debug_initcall :( Could you try
again, please?

Here is the dmesg for rc5mm4 with initcall_debug, showing how
no usbtouch function is called at all.

I also attached a similar dmesg for 2.6.21-rc6, where things work
normally.


I also decompressed the rc5mm4 image to check
that USB touchscreen support really is compiled into it. These
USB options are on:
CONFIG_USB_HID=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
CONFIG_USB_DEBUG=y
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_SPLIT_ISO=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
CONFIG_USB_UHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_USB_STORAGE_DEBUG=y
CONFIG_USB_STORAGE_DATAFAB=y
CONFIG_USB_STORAGE_ISD200=y
CONFIG_USB_STORAGE_DPCM=y
CONFIG_USB_STORAGE_USBAT=y
CONFIG_USB_STORAGE_SDDR09=y
CONFIG_USB_STORAGE_SDDR55=y
CONFIG_USB_STORAGE_JUMPSHOT=y
CONFIG_USB_STORAGE_ALAUDA=y
CONFIG_USB_LIBUSUAL=y
CONFIG_USB_TOUCHSCREEN=y
CONFIG_USB_TOUCHSCREEN_EGALAX=y
CONFIG_USB_SERIAL=y
CONFIG_USB_SERIAL_PL2303=y

Helge Hafting


initcall_debugrc5mm4.gz
Description: application/gzip


initcall_debugrc6.gz
Description: application/gzip



Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-10 Thread Neil Brown
On Friday April 6, [EMAIL PROTECTED] wrote:
> 
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

Difference is that kzalloc(0, ) now returns NULL.  Maybe it is a
SLUB/SLAB difference? (So maybe it did use memory it shouldn't have
before, but now it fails, which is the better behaviour).

This patch fixes the maths and should probably go in various 'stable'
kernels.  Bug is in 2.6.18, but not 2.6.16.

Patch won't work for 2.6.18 as DIV_ROUND_UP is missing, but 2.6.19 and
later have it.

Thanks for the bug report.

NeilBrown


-
Fix calculation for size of filemap_attr array in md/bitmap.

If 'num_pages' were ever 1 more than a multiple of 8 (32-bit platforms)
or of 16 (64-bit platforms), filemap_attr would be allocated one
'unsigned long' shorter than required.  We need a round-up in there.


Signed-off-by: Neil Brown <[EMAIL PROTECTED]>

### Diffstat output
 ./drivers/md/bitmap.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c   2007-04-11 13:24:50.0 +1000
+++ ./drivers/md/bitmap.c   2007-04-11 13:24:59.0 +1000
@@ -863,9 +863,7 @@ static int bitmap_init_from_disk(struct 
 
 	/* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */
 	bitmap->filemap_attr = kzalloc(
-		(((num_pages*4/8)+sizeof(unsigned long)-1)
-		 /sizeof(unsigned long))
-		*sizeof(unsigned long),
+		roundup( DIV_ROUND_UP(num_pages*4, 8), sizeof(unsigned long)),
 		GFP_KERNEL);
 	if (!bitmap->filemap_attr)
 		goto out;
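
The sizing arithmetic can be checked in user space. `DIV_ROUND_UP()` and `roundup()` below are local copies written to match the usual kernel definitions, and the three size functions are hypothetical names for the old expression, the patched expression, and the bytes actually needed (4 bits per page, rounded to whole `unsigned long`s). With L = sizeof(unsigned long), one long covers 2L pages (8 on 32-bit, 16 on 64-bit). For num_pages = 2L + 1 the old expression truncates (2L + 1) * 4 / 8 down to L bytes, one long short of the 2L needed, and for num_pages = 1 it yields 0, the kzalloc(0) case.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies, assumed to match the kernel's definitions. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))

/* Pre-patch: num_pages*4/8 truncates the half byte needed for an odd page. */
static size_t filemap_attr_size_old(size_t num_pages)
{
	return (((num_pages * 4 / 8) + sizeof(unsigned long) - 1)
		/ sizeof(unsigned long)) * sizeof(unsigned long);
}

/* Patched: round bits up to whole bytes first, then bytes up to whole longs. */
static size_t filemap_attr_size_new(size_t num_pages)
{
	return roundup(DIV_ROUND_UP(num_pages * 4, 8), sizeof(unsigned long));
}

/* Bytes needed for 4 bits per page, rounded up to whole unsigned longs. */
static size_t filemap_attr_size_needed(size_t num_pages)
{
	size_t bits_per_long = 8 * sizeof(unsigned long);

	return DIV_ROUND_UP(num_pages * 4, bits_per_long) * sizeof(unsigned long);
}
```

For every positive page count the patched expression equals the needed size, because rounding up to bytes and then to longs gives the same result as rounding up to longs directly.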


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-10 Thread Dmitry Torokhov

On 4/10/07, Helge Hafting <[EMAIL PROTECTED]> wrote:

Dmitry Torokhov wrote:
> Hmm, I am concerned because not only do you not have an input device created,
> you don't even see the driver being registered with usbcore. Could you please
> try booting with debug_initcall to see with what error code usbtouchscreen
> initialization fails?
>
Here is the dmesg from a boot with debug_initcall.
I can't see any messages from usbtouchscreen.
For me, it looks like the touchscreen is discovered and then
nothing happens to it.



*sigh* When will I learn to spell names of kernel parameters
correctly? It is initcall_debug, not debug_initcall :( Could you try
again, please?

--
Dmitry


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-10 Thread Helge Hafting

Andrew Morton wrote:

Is 2.6.21-rc6 OK?

If so, please keep a close eye on 2.6.22-rcX, let us know if/when we've
moved this breakage into mainline :(
  


2.6.21-rc6 is ok.
Here, I get messages from usbtouchscreen, something
rc5-mm4 failed to produce.
The egalax driver gets /class/input/input3,
usbcore registers usbtouchscreen, and the
touchscreen works. 


Well, it became /dev/input/event3 while
2.6.18 placed it at /dev/input/event1, but I think that
is more of a udev problem...

Helge Hafting



Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-10 Thread Helge Hafting

Dmitry Torokhov wrote:

Hmm, I am concerned because not only do you not have an input device created,
you don't even see the driver being registered with usbcore. Could you please
try booting with debug_initcall to see with what error code usbtouchscreen
initialization fails?
  

Here is the dmesg from a boot with debug_initcall.
I can't see any messages from usbtouchscreen.
For me, it looks like the touchscreen is discovered and then
nothing happens to it.

Helge Hafting


debug_initcall.gz
Description: application/gzip


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-09 Thread Dmitry Torokhov
On Monday 09 April 2007 18:36, Helge Hafting wrote:
> On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> > On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > > I have a USB touchscreen (egalax variety) that works with
> > > the 2.6.18 kernel supplied by debian.
> > > 
> > > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > > in question.  Unlike the debian kernel, this kernel doesn't use
> > > modules in order to save boot time.
> > > 
> > > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > > dmesg says things like 
> > > usb 3-2: Manufacturer: eGalac Inc.
> > > usb 3-2: Product: USB TouchController
> > > 
> > > and a lot more. Unlike 2.6.18, it never gets around to saying
> > > "usbcore: registered new driver usbtouchscreen"
> > > which seems to indicate a problem.
> > > usbcore registers several other drivers, such as usbserial and pl2303
> > > that makes the gps work. It also registers other drivers like
> > > usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
> > > I believe I have turned on every config option for USB touchscreen,
> > > so this should not be missing.
> > > 
> > > Is there something wrong, or could there be a seemingly unrelated option
> > > that I need to turn on?
> > 
> > Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> > 
> Unfortunately, I have:
> CONFIG_USB_TOUCHSCREEN=y
> CONFIG_USB_TOUCHSCREEN_EGALAX=y
> 
> Anything else I may have missed?
>

Hmm, I am concerned because not only do you not have an input device created,
you don't even see the driver being registered with usbcore. Could you please
try booting with debug_initcall to see with what error code usbtouchscreen
initialization fails?


-- 
Dmitry


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-09 Thread Andrew Morton
On Tue, 10 Apr 2007 00:36:43 +0200
Helge Hafting <[EMAIL PROTECTED]> wrote:

> On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> > On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > > I have a USB touchscreen (egalax variety) that works with
> > > the 2.6.18 kernel supplied by debian.
> > > 
> > > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > > in question.  Unlike the debian kernel, this kernel doesn't use
> > > modules in order to save boot time.
> > > 
> > > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > > dmesg says things like
> > > usb 3-2: Manufacturer: eGalac Inc.
> > > usb 3-2: Product: USB TouchController
> > > 
> > > and a lot more. Unlike 2.6.18, it never gets around to saying
> > > "usbcore: registered new driver usbtouchscreen"
> > > which seems to indicate a problem.
> > > usbcore registers several other drivers, such as usbserial and pl2303
> > > that make the GPS work. It also registers other drivers like
> > > usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
> > > I believe I have turned on every config option for usb touchscreen,
> > > this should not be missing.
> > > 
> > > Is there something wrong, or could there be a seemingly unrelated option
> > > that I need to turn on?
> > 
> > Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> > 
> Unfortunately, I have:
> CONFIG_USB_TOUCHSCREEN=y
> CONFIG_USB_TOUCHSCREEN_EGALAX=y
> 
> Anything else I may have missed?
> 

Is 2.6.21-rc6 OK?

If so, please keep a close eye on 2.6.22-rcX, let us know if/when we've
moved this breakage into mainline :(


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-09 Thread Helge Hafting
On Fri, Apr 06, 2007 at 10:37:12PM -0400, Dmitry Torokhov wrote:
> On Friday 06 April 2007 20:54, Helge Hafting wrote:
> > I have a USB touchscreen (egalax variety) that works with
> > the 2.6.18 kernel supplied by debian.
> > 
> > It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> > in question.  Unlike the debian kernel, this kernel doesn't use
> > modules in order to save boot time.
> > 
> > The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> > dmesg says things like
> > usb 3-2: Manufacturer: eGalac Inc.
> > usb 3-2: Product: USB TouchController
> > 
> > and a lot more. Unlike 2.6.18, it never gets around to saying
> > "usbcore: registered new driver usbtouchscreen"
> > which seems to indicate a problem.
> > usbcore registers several other drivers, such as usbserial and pl2303
> > that make the GPS work. It also registers other drivers like
> > usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
> > I believe I have turned on every config option for usb touchscreen,
> > this should not be missing.
> > 
> > Is there something wrong, or could there be a seemingly unrelated option
> > that I need to turn on?
> 
> Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.
> 
Unfortunately, I have:
CONFIG_USB_TOUCHSCREEN=y
CONFIG_USB_TOUCHSCREEN_EGALAX=y

Anything else I may have missed?

Helge Hafting


Re: usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-06 Thread Dmitry Torokhov
On Friday 06 April 2007 20:54, Helge Hafting wrote:
> I have a USB touchscreen (egalax variety) that works with
> the 2.6.18 kernel supplied by debian.
> 
> It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
> in question.  Unlike the debian kernel, this kernel doesn't use
> modules in order to save boot time.
> 
> The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
> dmesg says things like
> usb 3-2: Manufacturer: eGalac Inc.
> usb 3-2: Product: USB TouchController
> 
> and a lot more. Unlike 2.6.18, it never gets around to saying
> "usbcore: registered new driver usbtouchscreen"
> which seems to indicate a problem.
> usbcore registers several other drivers, such as usbserial and pl2303
> that make the GPS work. It also registers other drivers like
> usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
> I believe I have turned on every config option for usb touchscreen,
> this should not be missing.
> 
> Is there something wrong, or could there be a seemingly unrelated option
> that I need to turn on?

Please make sure that you have CONFIG_USB_TOUCHSCREEN turned on.

-- 
Dmitry


usb touchscreen breakage in 2.6.21-rc5-mm4 ?

2007-04-06 Thread Helge Hafting
I have a USB touchscreen (egalax variety) that works with
the 2.6.18 kernel supplied by debian.

It fails when I compile 2.6.21-rc5-mm4, tuned to the machine
in question.  Unlike the debian kernel, this kernel doesn't use
modules in order to save boot time.

The strange thing is, 2.6.21-rc5-mm4 recognizes the device.
dmesg says things like
usb 3-2: Manufacturer: eGalac Inc.
usb 3-2: Product: USB TouchController

and a lot more. Unlike 2.6.18, it never gets around to saying
"usbcore: registered new driver usbtouchscreen"
which seems to indicate a problem.
usbcore registers several other drivers, such as usbserial and pl2303
that make the GPS work. It also registers other drivers like
usb-storage,usbfs,hub,libusual,hiddev,usbhid.  But not usbtouchscreen.
I believe I have turned on every config option for usb touchscreen,
this should not be missing.

Is there something wrong, or could there be a seemingly unrelated option
that I need to turn on?

Helge Hafting



Re: 2.6.21-rc5-mm4

2007-04-06 Thread Andrew Morton
On Fri, 06 Apr 2007 11:26:24 -0400
[EMAIL PROTECTED] wrote:

> On Thu, 05 Apr 2007 13:31:09 PDT, Andrew Morton said:
> > On Thu, 05 Apr 2007 13:02:59 -0400, [EMAIL PROTECTED] wrote:
> 
> > > Am seeing an Oops 'cannot handle kernel paging request' during late
> > > system startup, hand-copied traceback follows:
> > > 
> > > avc_has_perm_noaudit+0x2bf/0x506
> > > avc_has_perm+0x2b/0x5b
> > > selinux_socket_stream_connect+0x7e/0xc3
> > > unix_stream_connect+0x202/0x3f3
> > > sys_connect+0x7e/0xa4
> > > tracesys+0xde/0xe1
> 
> > Thanks.
> > 
> > I'd have thought that the full trace could be captured with netconsole.
> 
> I didn't have a second box available at first.  Then I blew close to 45
> minutes trying to figure out why netconsole was totally failing to work,
> before I found this in .config:
> 
> # CONFIG_NETCONSOLE is not set
> 
> "D'oh!" -- H. Simpson
> 
> Unfortunately, defining netconsole caused NETPOLL to be defined, which caused
> a recompile of half the known world, and the symptoms of the crash moved.
> 
> Film at 11, once I figure out what's going on, and fix the testbed in my
> office so I can actually catch this sucker - I may have to string a serial
> cable.  One solid good data point:
> 
> 21-rc5 with only the -mm4 'origin.patch' applied is OK, so whatever the
> issue is, it's not in Linus's tree.
> 

Oh well.  If it's all too much fuss, feel free to send the .config.  If it
happens on my machine(s) I can bisect it real quick.
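The bisection Andrew offers is a binary search over the -mm patch series: with a known-good base (Valdis found 2.6.21-rc5 with only origin.patch applied to be fine) and a known-bad full series, each test build halves the range of suspect patches. A minimal sketch, assuming the bug appears at one patch and stays present in every later build; `is_bad` stands for a hypothetical build-and-boot test, and `demo_is_bad` is a made-up predicate for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test: nonzero if the kernel with the first n patches is broken. */
typedef int (*is_bad_fn)(size_t n, void *ctx);

/*
 * Find the smallest n in (good, bad] for which is_bad(n) is true, given
 * is_bad(good) == 0 and is_bad(bad) != 0, and assuming that once a patch
 * introduces the bug every later prefix stays bad. Needs about
 * log2(bad - good) test builds.
 */
static size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad; /* 1-based index of the patch that introduced the bug */
}

/* Demo predicate: pretend patch number *ctx (1-based) introduced the bug. */
static int demo_is_bad(size_t n, void *ctx)
{
    return n >= *(size_t *)ctx;
}
```

For a series of roughly 1000 patches this means about ten test builds rather than one per patch.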



Re: 2.6.21-rc5-mm4

2007-04-06 Thread Valdis . Kletnieks
On Thu, 05 Apr 2007 13:31:09 PDT, Andrew Morton said:
> On Thu, 05 Apr 2007 13:02:59 -0400, [EMAIL PROTECTED] wrote:

> > Am seeing an Oops 'cannot handle kernel paging request' during late
> > system startup, hand-copied traceback follows:
> > 
> > avc_has_perm_noaudit+0x2bf/0x506
> > avc_has_perm+0x2b/0x5b
> > selinux_socket_stream_connect+0x7e/0xc3
> > unix_stream_connect+0x202/0x3f3
> > sys_connect+0x7e/0xa4
> > tracesys+0xde/0xe1

> Thanks.
> 
> I'd have thought that the full trace could be captured with netconsole.

I didn't have a second box available at first.  Then I blew close to 45
minutes trying to figure out why netconsole was totally failing to work,
before I found this in .config:

# CONFIG_NETCONSOLE is not set

"D'oh!" -- H. Simpson

Unfortunately, defining netconsole caused NETPOLL to be defined, which caused
a recompile of half the known world, and the symptoms of the crash moved.

Film at 11, once I figure out what's going on, and fix the testbed in my
office so I can actually catch this sucker - I may have to string a serial
cable.  One solid good data point:

21-rc5 with only the -mm4 'origin.patch' applied is OK, so whatever the
issue is, it's not in Linus's tree.





Re: 2.6.21-rc5-mm4

2007-04-06 Thread Eric W. Biederman
Jiri Kosina <[EMAIL PROTECTED]> writes:

> Hi Eric,
>
> after struggling with this issue for some time, I think that it's just
> some inconsistent usage of NR_IRQS throughout the source, probably due to
> some include hell. I really don't understand how the mach-*/ includes
> are supposed to work.
>
> I found out (by disassembling resulting vmlinux binaries) that in
> arch/i386/kernel/entry.S, the loop in irq_entries_start does too few
> iterations compared to the NR_IRQS value as seen in, for example, io_apic.c.
>
> The super-stupid proof-patch below fixes the panic on my system. It's just 
> to demonstrate that the i386 includes really need fixing to be consistent 
> somehow.

Thanks, and that would do it. It makes sense why it was the irq patch
that caused problems.  I had forgotten about the number of stubs issue.

I had to clean that up on x86_64 as well and it probably makes most sense
to put that cleanup as well, so we have a small fixed number of stubs
which would make the includes not matter.
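A simplified userspace model of the fix Eric describes: if the entry stubs are generated once per interrupt vector (the x86 IDT has a fixed 256 entries) and a table maps each vector to an IRQ number, then NR_IRQS only bounds the IRQ numbers stored in that table and can no longer disagree with the number of stubs. The names and the 4096 value are assumptions for illustration; this is not the actual x86_64 code.

```c
#include <assert.h>

#define NR_VECTORS 256   /* fixed by the x86 IDT, independent of NR_IRQS */
#define NR_IRQS    4096  /* illustrative; may vary with configuration */

/* One entry per vector: which IRQ (if any) the vector is routed to. */
static int vector_irq[NR_VECTORS];

static void init_vector_table(void)
{
    for (int v = 0; v < NR_VECTORS; v++)
        vector_irq[v] = -1; /* -1 means unassigned */
}

/* Route an IRQ to a vector; IRQ numbers can be large, vectors cannot. */
static int assign_vector(int vector, int irq)
{
    if (vector < 0 || vector >= NR_VECTORS || irq < 0 || irq >= NR_IRQS)
        return -1;
    vector_irq[vector] = irq;
    return 0;
}

/* What a shared stub would do: translate the vector it was entered on. */
static int irq_for_vector(int vector)
{
    assert(vector >= 0 && vector < NR_VECTORS);
    return vector_irq[vector];
}
```

The number of stubs is tied to NR_VECTORS, so changing NR_IRQS in one header and not another could at worst affect the range check, not the layout of the stub table.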

Bleh.  Hopefully soon.

Eric

> diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
> index 976438c..b20dc07 100644
> --- a/arch/i386/kernel/entry.S
> +++ b/arch/i386/kernel/entry.S
> @@ -53,6 +53,8 @@
>  #include <asm/dwarf2.h>
>  #include "irq_vectors.h"
>  
> +#define NR_IRQS 4096
> +
>  /*
>   * We use macros for low-level operations which need to be overridden
>   * for paravirtualization.  The following will never clobber any registers:
>
> -- 
> Jiri Kosina


Re: 2.6.21-rc5-mm4

2007-04-06 Thread Jiri Kosina
On Wed, 4 Apr 2007, Eric W. Biederman wrote:

> > And the bisection winner is
> >
> > i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch
> >
> > I don't immediately see how it could be causing it, so adding CCs which 
> > are listed in the patch.
> Weird.  I will have to look at that in a little more detail.
> Do you know if this problem happens on x86_64? What does your .config 
> look like? What does /proc/interrupts look like? What kind of hardware are
> you running this kernel on? Can anyone else reproduce this?
> The oops clearly shows something using -1 and calling that as an
> address. I don't know why, but I'm guessing I have triggered a memory
> stomp somewhere.  I think this is the first time I have seen a small
> negative number causing a NULL pointer dereference.
> That patch looks innocuous enough that either:
> - I just missed changing something I should have.
> - Your configuration has an increase in NR_IRQS and that triggered
>   something.
> - The patch simply permuted things so a memory stomp now happens
>   on the e1000 data structures instead of somewhere else.
> - Something doesn't like large irq numbers.
> This work is essentially a backport from x86_64 so if your hardware
> is 64bit capable testing that should be a fairly easy test, and be
> able to rule out large irq numbers as the culprit.
> Until I get a good look at -mm I'm going to have a hard time guessing.
> But a roving memory stomp is my best guess.

Hi Eric,

after struggling with this issue for some time, I think that it's just
some inconsistent usage of NR_IRQS throughout the source, probably due to
some include hell. I really don't understand how the mach-*/ includes
are supposed to work.

I found out (by disassembling resulting vmlinux binaries) that in
arch/i386/kernel/entry.S, the loop in irq_entries_start does too few
iterations compared to the NR_IRQS value as seen in, for example, io_apic.c.
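A simplified model of the mismatch described above: the stub table is sized by the NR_IRQS value that entry.S happened to see, while io_apic.c validates IRQ numbers against a larger NR_IRQS, so some IRQs it accepts have no stub behind them. Both values are hypothetical (4096 is simply what the proof-patch forces), and the two differently named macros stand in for what, in the kernel, is the same name expanding differently in different files.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: NR_IRQS as expanded when assembling entry.S. */
#define NR_IRQS_IN_ENTRY_S 224
/* Hypothetical: NR_IRQS as expanded when compiling io_apic.c. */
#define NR_IRQS_IN_IO_APIC 4096

/* Stand-in for the stubs emitted by the irq_entries_start loop. */
static const void *irq_stubs[NR_IRQS_IN_ENTRY_S];

#define NR_STUBS (sizeof(irq_stubs) / sizeof(irq_stubs[0]))

/*
 * io_apic.c accepts any irq below its NR_IRQS, but only NR_STUBS
 * entries were generated. Returns 1 if looking up this irq's stub
 * would index past the end of the table.
 */
static int stub_lookup_overruns(int irq)
{
    assert(irq >= 0 && irq < NR_IRQS_IN_IO_APIC);
    return (size_t)irq >= NR_STUBS;
}
```

Any IRQ at or above the smaller bound would read whatever memory follows the table, which fits the "something using -1 as an address" symptom without proving it.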

The super-stupid proof-patch below fixes the panic on my system. It's just 
to demonstrate that the i386 includes really need fixing to be consistent 
somehow.

diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index 976438c..b20dc07 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -53,6 +53,8 @@
 #include <asm/dwarf2.h>
 #include "irq_vectors.h"
 
+#define NR_IRQS 4096
+
 /*
  * We use macros for low-level operations which need to be overridden
  * for paravirtualization.  The following will never clobber any registers:

-- 
Jiri Kosina


Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Dan Williams

On 4/5/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

On Fri, 06 Apr 2007 02:33:03 +1000
Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> Hi,
>
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> >
> > - The oops in git-net.patch has been fixed, so that tree has been restored.
> >   It is huge.
> >
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon).  It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> >
> > - Added davidel's signalfd stuff.
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
>
> md1 is the first array on the disk, and it refuses to start up on boot, or
> after boot.
>
> ...
>
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
>
> and looking at a dmesg, this is logged:
>
> md: bind
> md: bind
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...


Is this the dmesg from boot or the dmesg after running the mdadm --run command?


>
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
>
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK.  I assume that bitmap->chunks in bitmap_init_from_disk() has some
unexpectedly large value.
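The kernel's `status: -12` and mdadm's `Cannot allocate memory` look like one error seen from both sides: -12 is -ENOMEM on Linux, and mdadm appears to print the C library's text for that errno. A small model of the hypothesis above, assuming one bit of bookkeeping per chunk; `bitmap_alloc_status` and its `max_bytes` limit are hypothetical and do not reproduce md's actual bitmap code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: if the chunk count read from disk is unexpectedly
 * large, the buffer needed to track it cannot be allocated and the caller
 * sees -ENOMEM. max_bytes stands in for what the allocator can satisfy.
 */
static int bitmap_alloc_status(unsigned long chunks, size_t max_bytes)
{
    /* One bit per chunk, rounded up to whole bytes (overflow-safe). */
    size_t bytes = chunks / 8 + (chunks % 8 != 0);

    return bytes > max_bytes ? -ENOMEM : 0;
}

/* Text a userspace tool would print for a negative kernel status value. */
static const char *status_text(int status)
{
    assert(status < 0);
    return strerror(-status);
}
```

With glibc, `status_text(-ENOMEM)` yields the same "Cannot allocate memory" string seen in the mdadm output.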

I don't _think_ there's anything in -mm which would have triggered this.
Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally
broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?



git-md-accel.patch does not touch anything in the raid1 path, but I
guess stranger things have happened.

--
Dan


2.6.21-rc5-mm4 initramfs Make Error

2007-04-05 Thread Zan Lynx
I built a version of 2.6.21-rc5-mm4 with an initramfs and it built OK
the first time.

Then I made changes (applied a Reiser4 patch) and rebuilt, and got the
following error:

zephyr linux # make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
<stdin>:1356:2: warning: #warning syscall getcpu not implemented
<stdin>:1360:2: warning: #warning syscall epoll_pwait not implemented
<stdin>:1364:2: warning: #warning syscall lutimesat not implemented
<stdin>:1380:2: warning: #warning syscall revokeat not implemented
<stdin>:1384:2: warning: #warning syscall frevoke not implemented
  CHK     include/linux/compile.h
/usr/src/linux-2.6.21-rc5-mm4/usr/Makefile:41: *** target pattern contains no `%'.  Stop.
make: *** [usr] Error 2

I have this in the config:
CONFIG_INITRAMFS_SOURCE="/initramfs"

/initramfs is the directory where I build my initramfs, which is just a
busybox setup, very simple.

# rm usr/.initramfs_data.*
seems to make it go again.
-- 
Zan Lynx <[EMAIL PROTECTED]>




Re: 2.6.21-rc5-mm4

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 13:02:59 -0400
[EMAIL PROTECTED] wrote:

> On Mon, 02 Apr 2007 22:47:45 PDT, Andrew Morton said:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> 
> Am seeing an Oops 'cannot handle kernel paging request' during late
> system startup, hand-copied traceback follows:
> 
> avc_has_perm_noaudit+0x2bf/0x506
> avc_has_perm+0x2b/0x5b
> selinux_socket_stream_connect+0x7e/0xc3
> unix_stream_connect+0x202/0x3f3
> sys_connect+0x7e/0xa4
> tracesys+0xde/0xe1
> 
> I've not identified exactly when it happens, but it's towards the very end of
> handling /etc/rc5.d, it's already up to the S98's.  Odd thing is it only happens
> when I start with RedHat's 'graphical boot', and may be related to the shutdown
> of the X server that's displaying the boot progress preparing to launch the
> X server for gdm logins (as I'm also seeing a hang sometimes when shutting
> down - so it is possibly a "shutting down X server nukes system" bug).

Thanks.

I'd have thought that the full trace could be captured with netconsole.

> Figured I'd toss this heads-up in case it rings any bells, while I go do
> the bisection dance on -rc5-mm4 (-mm2 is OK, and -mm3 doesn't boot for me
> for other reasons I didn't chase down before -mm4 came out and fixed it, so
> I have a ways to bisect)
> 

No, I'm not aware of anyone else hitting anything like that.

Bisection would be good, and probably pretty quick - I'd pick git-net.patch
as the first pivot point.

But we'd still be wanting the full trace if poss please.
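Andrew's suggestion amounts to a binary search over the broken-out -mm patch series, with git-net.patch as a hand-picked first pivot instead of the midpoint. The sketch below is illustrative only: `first_bad_patch()`, the `series_ok_fn` predicate and the simulated `culprit_ok()` are hypothetical names, and it assumes the base tree with no patches applied is good while the full series is bad. Each predicate call stands for one build and boot, so a series of N patches needs roughly log2(N) boots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: does a kernel built from the base tree plus the
 * first n patches of the series boot cleanly?  Each call is one build+boot. */
typedef bool (*series_ok_fn)(size_t n, void *ctx);

/*
 * Return the 1-based index of the first bad patch.  Assumes n == 0 (base
 * tree) is good and n == total (whole series) is bad.  If first_pivot lies
 * strictly inside the search range it is tried first (e.g. the position of
 * a suspect patch); after that, plain midpoint bisection is used.
 */
static size_t first_bad_patch(size_t total, size_t first_pivot,
			      series_ok_fn ok, void *ctx, size_t *builds)
{
	size_t good = 0, bad = total;	/* invariant: good boots, bad fails */

	assert(total > 0);
	*builds = 0;
	while (bad - good > 1) {
		size_t mid = (first_pivot > good && first_pivot < bad) ?
			     first_pivot : good + (bad - good) / 2;

		first_pivot = 0;	/* the hint is only used once */
		++*builds;
		if (ok(mid, ctx))
			good = mid;
		else
			bad = mid;
	}
	return bad;
}

/* Simulated series for trying the search: patch number *ctx is the culprit,
 * so any prefix that stops before it boots fine. */
static bool culprit_ok(size_t n, void *ctx)
{
	return n < *(const size_t *)ctx;
}
```

For a 1000-patch series the plain search needs at most 10 boots. A good first pivot can save several boots when the suspicion is right, and costs at most one extra boot when it is wrong.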


Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Andrew Morton
On Fri, 06 Apr 2007 02:33:03 +1000
Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > 
> > - The oops in git-net.patch has been fixed, so that tree has been restored. 
> >   It is huge.
> > 
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon).  It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > 
> > - Added davidel's signalfd stuff.
> 
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
> 
> md1 is the first array on the disk, and it refuses to start up on boot, or after
> boot.
> 
> ...
> 
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
> 
> and looking at a dmesg, this is logged:
> 
> md: bind<sdc1>
> md: bind<sda1>
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...
> 
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
> 
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK.  I assume that bitmap->chunks in bitmap_init_from_disk() has some
unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this. 
Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally
broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?

> Also, Andrew, can you please restart posting/cc'ing your -mm announcements to 
> the [EMAIL PROTECTED] list?  Seems this stopped around about 
> 2.6.20, it was handy.

hm.  I always Bcc [EMAIL PROTECTED]  I assume that its
filters didn't get updated after the s/osdl/linux-foundation/ thing.  I'll
talk to people, thanks.

> .config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4
> 
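A back-of-the-envelope check of Andrew's guess that bitmap->chunks is unexpectedly large. Assuming md keeps one 16-bit in-memory counter per bitmap chunk, packed into 4 KiB pages (2048 counters per page), the "bitmap: N pages" figures in Reuben's /proc/mdstat output can be reproduced from array size and chunk size. The helper names are hypothetical, not md's own code. Note that the -12 in the dmesg is -ENOMEM, the error mdadm reports as "Cannot allocate memory".

```c
#include <assert.h>

/* Assumptions: one 16-bit in-memory counter per bitmap chunk and 4 KiB
 * pages, i.e. 2048 counters per page.  Helper names are hypothetical. */
#define TOY_PAGE_SIZE		4096UL
#define TOY_COUNTER_BYTES	2UL
#define TOY_COUNTERS_PER_PAGE	(TOY_PAGE_SIZE / TOY_COUNTER_BYTES)

/* Chunks needed to cover size_kb KiB of array with chunk_kb KiB chunks. */
static unsigned long bitmap_chunks(unsigned long size_kb, unsigned long chunk_kb)
{
	assert(chunk_kb > 0);
	return (size_kb + chunk_kb - 1) / chunk_kb;	/* round up */
}

/* Pages of in-memory counters needed for that many chunks. */
static unsigned long bitmap_counter_pages(unsigned long chunks)
{
	return (chunks + TOY_COUNTERS_PER_PAGE - 1) / TOY_COUNTERS_PER_PAGE;
}
```

Under these assumptions md3 (20008832 KiB, 64 KiB chunks) needs 312638 counters, i.e. 153 pages, matching its mdstat line, and md8 and md10 come out at 123 and 229 pages. md1 is only about 100 MiB, so any chunk size of 4 KiB or more needs no more than 13 pages. An -ENOMEM while creating its bitmap therefore fits the theory that the computed chunk count is far larger than it should be.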



Re: 2.6.21-rc5-mm4: ia64: scheduling while atomic - utrace?

2007-04-05 Thread Roland McGrath
Thanks for the report.  I introduced this bug recently when I changed
around some of the locking but forgot about the writeback issue.  I don't
think this is directly related to any other crash you might have seen.

I've moved the call out of the lock-holding region, where it didn't need to
be.  I'm updating my patch series now; I've appended the incremental patch.


Thanks,
Roland

---
 kernel/ptrace.c |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index fb6c3fb..c31d744 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1473,16 +1473,6 @@ ptrace_report(struct utrace_attached_eng
 	 */
 	utrace_set_flags(tsk, engine, engine->flags | UTRACE_ACTION_QUIESCE);
 
-	/*
-	 * If regset 0 has a writeback call, do it now.  On register window
-	 * machines, this makes sure the user memory backing the register
-	 * data is up to date by the time wait_task_inactive returns to
-	 * ptrace_start in our tracer doing a PTRACE_PEEKDATA or the like.
-	 */
-	regset = utrace_regset(tsk, engine, utrace_native_view(tsk), 0);
-	if (regset->writeback)
-		(*regset->writeback)(tsk, regset, 0);
-
 	BUG_ON(code == 0);
 	tsk->exit_code = code;
 	do_notify(tsk, state->parent, CLD_TRAPPED);
@@ -1494,6 +1484,16 @@ ptrace_report(struct utrace_attached_eng
 
 	NO_LOCKS;
 
+	/*
+	 * If regset 0 has a writeback call, do it now.  On register window
+	 * machines, this makes sure the user memory backing the register
+	 * data is up to date by the time wait_task_inactive returns to
+	 * ptrace_start in our tracer doing a PTRACE_PEEKDATA or the like.
+	 */
+	regset = utrace_regset(tsk, engine, utrace_native_view(tsk), 0);
+	if (regset->writeback)
+		(*regset->writeback)(tsk, regset, 0);
+
 	return UTRACE_ACTION_RESUME;
 }
 


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-05 Thread Christoph Lameter
On Thu, 5 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote:
> > Here is a patch that adds validation (only for cpuslabs and partial 
> > slabs but that's where the action is). Apply this patch
> > and then do
> > 
> > echo 1 >/sys/slab//validate
> > 
> > I suggest to boot with full debugging and then run this on the ACPI slabs.
> 
> Did this and didn't trigger any problems.

Duh. Must have been in the full slabs. Maybe I should add a tracking of 
full slabs for the debug case. Would also enable leak detection.
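Christoph's point is that the validate pass can only check slabs it can reach: cpu slabs and partial slabs are tracked, while (as his reply implies) a completely full slab is not linked on any list. The toy model below uses hypothetical names, not SLUB's real data structures. It shows why a corrupted object in a full slab goes unseen, and how a debug-only full list closes the gap. Walking that same list is also what would make leak detection possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_OBJS_PER_SLAB 4

/* Toy slab: objects [0, inuse) are allocated; obj_ok[i] == false marks an
 * object whose debug redzone/padding was found overwritten. */
struct toy_slab {
	int inuse;
	bool obj_ok[TOY_OBJS_PER_SLAB];
	struct toy_slab *next;		/* link on the partial or full list */
};

struct toy_cache {
	struct toy_slab *partial;	/* slabs with free objects */
	struct toy_slab *full;		/* used only when track_full is set */
	bool track_full;		/* debug option: also list full slabs */
};

static void toy_push(struct toy_slab **list, struct toy_slab *s)
{
	s->next = *list;
	*list = s;
}

/* File a slab by fill level.  Without track_full a full slab is simply
 * dropped from view, which is the validation blind spot. */
static void toy_add_slab(struct toy_cache *c, struct toy_slab *s)
{
	assert(s->inuse >= 0 && s->inuse <= TOY_OBJS_PER_SLAB);
	if (s->inuse < TOY_OBJS_PER_SLAB)
		toy_push(&c->partial, s);
	else if (c->track_full)
		toy_push(&c->full, s);
}

static int toy_count_bad(const struct toy_slab *s)
{
	int bad = 0;

	for (; s; s = s->next)
		for (int i = 0; i < s->inuse; i++)
			if (!s->obj_ok[i])
				bad++;
	return bad;
}

/* Validation only sees slabs reachable from the cache's lists. */
static int toy_validate(const struct toy_cache *c)
{
	return toy_count_bad(c->partial) + toy_count_bad(c->full);
}
```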


2.6.21-rc5-mm4: ia64: scheduling while atomic - utrace?

2007-04-05 Thread Lee Schermerhorn
Running a 'usex -e' load [http://people.redhat.com/~anderson/usex/] on
2.6.21-rc5-mm4 on ia64, I see the following:

BUG: scheduling while atomic: strace/0x4001/20162

Call Trace:
 [] show_stack+0x80/0xa0
sp=e76042dc7610 bsp=e76042dc1260
 [] dump_stack+0x30/0x60
sp=e76042dc77e0 bsp=e76042dc1248
 [] schedule+0x1d00/0x22a0
sp=e76042dc77e0 bsp=e76042dc1108
 [] __cond_resched+0x50/0xa0
sp=e76042dc7800 bsp=e76042dc10e8
 [] cond_resched+0xb0/0xe0
sp=e76042dc7800 bsp=e76042dc10d0
 [] get_user_pages+0x1b0/0x7c0
sp=e76042dc7800 bsp=e76042dc1028
 [] access_process_vm+0xc0/0x440
sp=e76042dc7820 bsp=e76042dc0f78
 [] ia64_sync_user_rbs+0x80/0x100
sp=e76042dc7830 bsp=e76042dc0f38
 [] do_gpregs_writeback+0xb0/0xe0
sp=e76042dc7840 bsp=e76042dc0f10
 [] unw_init_running+0x70/0xa0
sp=e76042dc7850 bsp=e76042dc0ee8
 [] do_regset_call+0x110/0x140
sp=e76042dc7c30 bsp=e76042dc0e88
 [] gpregs_writeback+0x40/0x60
sp=e76042dc7e30 bsp=e76042dc0e60
 [] ptrace_report+0xe0/0x1e0
sp=e76042dc7e30 bsp=e76042dc0e28
 [] ptrace_report_syscall+0xa0/0xe0
sp=e76042dc7e30 bsp=e76042dc0e00
 [] ptrace_report_syscall_exit+0x30/0x60
sp=e76042dc7e30 bsp=e76042dc0dc8
 [] utrace_report_syscall+0xf0/0x540
sp=e76042dc7e30 bsp=e76042dc0d48
 [] syscall_trace_leave+0x60/0xc0
sp=e76042dc7e30 bsp=e76042dc0cf0
 [] ia64_trace_syscall+0x100/0x110
sp=e76042dc7e30 bsp=e76042dc0cf0

Looks like get_ptrace_state(), called from ptrace_report_syscall(), calls
rcu_read_lock(), which disables preemption.  The corresponding
rcu_read_unlock() will come from put_ptrace_state() in ptrace_report()
at the end of the report.  However, ia64 needs to sync the register backing
store, and this requires access to the process VM.  get_user_pages()' use of
cond_resched() is tripping the "scheduling while atomic" bug.

May be related to:

http://marc.info/?a=10288337963=1=4


Lee
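Lee's diagnosis boils down to calling something that may sleep while preemption is disabled. The userspace model below uses hypothetical toy_* stand-ins (not kernel APIs) for rcu_read_lock()/rcu_read_unlock(), cond_resched() and the ia64 register writeback, and assumes, as Lee describes, that the RCU read side disables preemption. It counts "scheduling while atomic" events for the original ordering and for the ordering of Roland's fix, where the writeback is moved out of the lock-holding region.

```c
#include <assert.h>

/* Hypothetical stand-ins, not kernel APIs. */
static int toy_preempt_count;		/* > 0 means atomic: must not sleep */
static int toy_sleep_while_atomic;	/* "BUG: scheduling while atomic" hits */

/* Models rcu_read_lock()/rcu_read_unlock() on a kernel where the RCU
 * read side disables preemption, as described in the report. */
static void toy_rcu_read_lock(void)   { toy_preempt_count++; }
static void toy_rcu_read_unlock(void) { toy_preempt_count--; }

/* Models cond_resched() in get_user_pages(): a real kernel would print
 * the BUG and schedule; the model only records the violation. */
static void toy_cond_resched(void)
{
	if (toy_preempt_count > 0)
		toy_sleep_while_atomic++;
}

/* Models the ia64 regset writeback: ia64_sync_user_rbs() ->
 * access_process_vm() -> get_user_pages(), which may sleep. */
static void toy_regset_writeback(void)
{
	toy_cond_resched();
}

/* Original ordering: writeback inside the read-side critical section. */
static void toy_report_before_fix(void)
{
	toy_rcu_read_lock();
	toy_regset_writeback();
	toy_rcu_read_unlock();
	assert(toy_preempt_count == 0);	/* lock/unlock stay balanced */
}

/* Fixed ordering: leave the lock-holding region, then do the writeback. */
static void toy_report_after_fix(void)
{
	toy_rcu_read_lock();
	/* ... state inspection and tracer notification ... */
	toy_rcu_read_unlock();
	toy_regset_writeback();
	assert(toy_preempt_count == 0);
}
```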



Re: 2.6.21-rc5-mm4

2007-04-05 Thread Valdis . Kletnieks
On Mon, 02 Apr 2007 22:47:45 PDT, Andrew Morton said:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/

Am seeing an Oops 'cannot handle kernel paging request' during late
system startup, hand-copied traceback follows:

avc_has_perm_noaudit+0x2bf/0x506
avc_has_perm+0x2b/0x5b
selinux_socket_stream_connect+0x7e/0xc3
unix_stream_connect+0x202/0x3f3
sys_connect+0x7e/0xa4
tracesys+0xde/0xe1

I've not identified exactly when it happens, but it's towards the very end of
handling /etc/rc5.d, it's already up to the S98's.  Odd thing is it only happens
when I start with RedHat's 'graphical boot', and may be related to the shutdown
of the X server that's displaying the boot progress preparing to launch the
X server for gdm logins (as I'm also seeing a hang sometimes when shutting
down - so it is possibly a "shutting down X server nukes system" bug).

Figured I'd toss this heads-up in case it rings any bells, while I go do
the bisection dance on -rc5-mm4 (-mm2 is OK, and -mm3 doesn't boot for me
for other reasons I didn't chase down before -mm4 came out and fixed it, so
I have a ways to bisect)




RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Reuben Farrelly

Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
>
> - The oops in git-net.patch has been fixed, so that tree has been restored.
>   It is huge.
>
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon).  It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
>
> - Added davidel's signalfd stuff.


Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or after
boot.


tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
  208640 blocks

md3 : active raid1 sdc3[1] sda3[0]
  20008832 blocks [2/2] [UU]
  bitmap: 0/153 pages [0KB], 64KB chunk

md5 : active raid1 sdc5[1] sda5[0]
  10008384 blocks [2/2] [UU]
  bitmap: 4/153 pages [16KB], 32KB chunk

md6 : active raid1 sdc6[1] sda6[0]
  10008384 blocks [2/2] [UU]
  bitmap: 0/153 pages [0KB], 32KB chunk

md8 : active raid1 sdc8[1] sda8[0]
  1003904 blocks [2/2] [UU]
  bitmap: 0/123 pages [0KB], 4KB chunk

md10 : active raid1 sdc10[1] sda10[0]
  119933120 blocks [2/2] [UU]
  bitmap: 1/229 pages [4KB], 256KB chunk

md2 : active raid1 sdc2[1] sda2[0]
  14544 blocks [2/2] [UU]
  bitmap: 10/191 pages [40KB], 256KB chunk

unused devices: <none>
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
  Magic : a92b4efc
Version : 00.90.00
   UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb  2 10:16:29 2007
 Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
 Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

Update Time : Fri Apr  6 02:06:17 2007
  State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
   Checksum : d3668aaa - correct
 Events : 0.368


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
  Magic : a92b4efc
Version : 00.90.00
   UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb  2 10:16:29 2007
 Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
 Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

Update Time : Fri Apr  6 02:06:17 2007
  State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
   Checksum : d3668acc - correct
 Events : 0.368


      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ #


tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind<sdc1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
out the -mm releases so much lately.


Also, Andrew, can you please restart posting/cc'ing your -mm announcements to 
the [EMAIL PROTECTED] list?  Seems this stopped around about 
2.6.20, it was handy.


.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-05 Thread Badari Pulavarty
On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote:
> Here is a patch that adds validation (only for cpuslabs and partial 
> slabs but that's where the action is). Apply this patch
> and then do
> 
> echo 1 >/sys/slab//validate
> 
> I suggest to boot with full debugging and then run this on the ACPI slabs.

Did this and didn't trigger any problems.

(Just to be clear, booted with "slub_debug" with all the patches
applied).

--- Validating slabcache 'Acpi-Namespace'
--- Checked 0 slabs in 'Acpi-Namespace'
--- Validating slabcache 'Acpi-Operand'
--- Checked 5 slabs in 'Acpi-Operand'
--- Validating slabcache 'Acpi-Parse'
--- Checked 0 slabs in 'Acpi-Parse'
--- Validating slabcache 'Acpi-ParseExt'
--- Checked 0 slabs in 'Acpi-ParseExt'
--- Validating slabcache 'Acpi-State'
--- Checked 0 slabs in 'Acpi-State'
--- Validating slabcache 'Acpi-Namespace'
--- Checked 0 slabs in 'Acpi-Namespace'
--- Validating slabcache 'Acpi-Operand'
--- Checked 5 slabs in 'Acpi-Operand'
--- Validating slabcache 'Acpi-Parse'
--- Checked 0 slabs in 'Acpi-Parse'
--- Validating slabcache 'Acpi-ParseExt'
--- Checked 0 slabs in 'Acpi-ParseExt'
--- Validating slabcache 'Acpi-State'
--- Checked 0 slabs in 'Acpi-State'
--- Validating slabcache 'RAW'
--- Checked 1 slabs in 'RAW'
--- Validating slabcache 'RAWv6'
--- Checked 1 slabs in 'RAWv6'
--- Validating slabcache 'TCP'
--- Checked 3 slabs in 'TCP'
--- Validating slabcache 'TCPv6'
--- Checked 4 slabs in 'TCPv6'
--- Validating slabcache 'UDP-Lite'
--- Checked 0 slabs in 'UDP-Lite'
--- Validating slabcache 'UDP'
--- Checked 2 slabs in 'UDP'
--- Validating slabcache 'UDPLITEv6'
--- Checked 0 slabs in 'UDPLITEv6'
--- Validating slabcache 'UDPv6'
--- Checked 0 slabs in 'UDPv6'
--- Validating slabcache 'UNIX'
--- Checked 4 slabs in 'UNIX'
--- Validating slabcache 'anon_vma'
--- Checked 12 slabs in 'anon_vma'
--- Validating slabcache 'arp_cache'
--- Checked 2 slabs in 'arp_cache'
--- Validating slabcache 'bdev_cache'
--- Checked 3 slabs in 'bdev_cache'
--- Validating slabcache 'bio'
--- Checked 0 slabs in 'bio'
--- Validating slabcache 'biovec-1'
--- Checked 1 slabs in 'biovec-1'
--- Validating slabcache 'biovec-128'
--- Checked 1 slabs in 'biovec-128'
--- Validating slabcache 'biovec-16'
--- Checked 1 slabs in 'biovec-16'
--- Validating slabcache 'biovec-256'
--- Checked 1 slabs in 'biovec-256'
--- Validating slabcache 'biovec-4'
--- Checked 1 slabs in 'biovec-4'
--- Validating slabcache 'biovec-64'
--- Checked 1 slabs in 'biovec-64'
--- Validating slabcache 'blkdev_ioc'
--- Checked 4 slabs in 'blkdev_ioc'
--- Validating slabcache 'blkdev_queue'
--- Checked 1 slabs in 'blkdev_queue'
--- Validating slabcache 'blkdev_requests'
--- Checked 2 slabs in 'blkdev_requests'
--- Validating slabcache 'buffer_head'
--- Checked 4 slabs in 'buffer_head'
--- Validating slabcache 'cfq_ioc_pool'
--- Checked 4 slabs in 'cfq_ioc_pool'
--- Validating slabcache 'cfq_pool'
--- Checked 4 slabs in 'cfq_pool'
--- Validating slabcache 'configfs_dir_cache'
--- Checked 0 slabs in 'configfs_dir_cache'
--- Validating slabcache 'dentry_cache'
--- Checked 5 slabs in 'dentry_cache'
--- Validating slabcache 'dm_io'
--- Checked 0 slabs in 'dm_io'
--- Validating slabcache 'dm_tio'
--- Checked 0 slabs in 'dm_tio'
--- Validating slabcache 'dnotify_cache'
--- Checked 1 slabs in 'dnotify_cache'
--- Validating slabcache 'dquot'
--- Checked 0 slabs in 'dquot'
--- Validating slabcache 'eventpoll_epi'
--- Checked 1 slabs in 'eventpoll_epi'
--- Validating slabcache 'eventpoll_pwq'
--- Checked 1 slabs in 'eventpoll_pwq'
--- Validating slabcache 'ext2_inode_cache'
--- Checked 0 slabs in 'ext2_inode_cache'
--- Validating slabcache 'ext2_xattr'
--- Checked 0 slabs in 'ext2_xattr'
--- Validating slabcache 'ext3_inode_cache'
--- Checked 0 slabs in 'ext3_inode_cache'
--- Validating slabcache 'ext3_xattr'
--- Checked 0 slabs in 'ext3_xattr'
--- Validating slabcache 'fasync_cache'
--- Checked 0 slabs in 'fasync_cache'
--- Validating slabcache 'fib6_nodes'
--- Checked 1 slabs in 'fib6_nodes'
--- Validating slabcache 'file_lock_cache'
--- Checked 2 slabs in 'file_lock_cache'
--- Validating slabcache 'files_cache'
--- Checked 10 slabs in 'files_cache'
--- Validating slabcache 'filp'
--- Checked 35 slabs in 'filp'
--- Validating slabcache 'flow_cache'
--- Checked 0 slabs in 'flow_cache'
--- Validating slabcache 'fs_cache'
--- Checked 5 slabs in 'fs_cache'
--- Validating slabcache 'hugetlbfs_inode_cache'
--- Checked 1 slabs in 'hugetlbfs_inode_cache'
--- Validating slabcache 'idr_layer_cache'
--- Checked 2 slabs in 'idr_layer_cache'
--- Validating slabcache 'inet_peer_cache'
--- Checked 0 slabs in 'inet_peer_cache'
--- Validating slabcache 'inode_cache'
--- Checked 8 slabs in 'inode_cache'
--- Validating slabcache 'inotify_event_cache'
--- Checked 0 slabs in 'inotify_event_cache'
--- Validating slabcache 'inotify_watch_cache'
--- Checked 1 slabs in 'inotify_watch_cache'
--- Validating slabcache 'ip6_dst_cache'
--- Checked 1 slabs in 

Re: 2.6.21-rc5-mm4

2007-04-05 Thread Sam Ravnborg
On Wed, Apr 04, 2007 at 01:55:08PM -0400, [EMAIL PROTECTED] wrote:
> On Tue, 03 Apr 2007 20:37:42 PDT, Randy Dunlap said:
> >
> > Good luck.  But the symbols are there.  Just use left/right arrow keys
> > to scroll the display left/right and you can see them.  Now if you just
> > had that indicator to tell you that you Need to scroll to see more text...
> 
> Exactly. :)  I had the incredible bad luck that the line got cut off at the
> end of a CONFIG_ symbol that made sense - if it had showed up *half* a symbol,
> I'd have gone investigating. ;) (Even a '>' or '<' saying data offscreen to
> right or left would be sufficient, if somebody wants a small but productive
> kernel (config system actually) task to hack on.)
> 
> I'd code it myself, but I have an SL8500 to install, and need to figure out
> how my laptop made it into the bag this morning still up and running (I hit
> the power button, it seemed to power down - blank screen, power light off,
> but syslog msgs prove it was up and running for another 4 hours before it
> shut down on a thermal check...)

If you do not find time to do it try to ping me in a week or so.
Should be trivial to do but away from my dev box atm.

   Sam
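The off-screen indicator Valdis asks for is easy to picture. The helper below is a hypothetical sketch, not code from scripts/kconfig/lxdialog: it renders a window of `width` columns over a longer string at horizontal offset `off`, replacing the first column with '<' when text is hidden to the left and the last column with '>' when text is hidden to the right.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: fill out[0..width-1] (plus a NUL, so out must hold
 * width + 1 bytes) with the part of text visible from column off.  A '<'
 * in the first column flags text hidden to the left, a '>' in the last
 * column flags text hidden to the right.
 */
static void render_scrolled(const char *text, size_t off, size_t width,
			    char *out)
{
	size_t len = strlen(text);

	assert(width >= 2);
	for (size_t i = 0; i < width; i++)
		out[i] = (off + i < len) ? text[off + i] : ' ';
	out[width] = '\0';

	if (off > 0)
		out[0] = '<';
	if (off + width < len)
		out[width - 1] = '>';
}
```

Overwriting an edge column keeps the line width fixed, at the cost of covering one visible character at that edge.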


Re: RAID1 out of memory error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Dan Williams

On 4/5/07, Andrew Morton [EMAIL PROTECTED] wrote:

On Fri, 06 Apr 2007 02:33:03 +1000
Reuben Farrelly [EMAIL PROTECTED] wrote:

 Hi,

 On 3/04/2007 3:47 PM, Andrew Morton wrote:
  
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
 
  - The oops in git-net.patch has been fixed, so that tree has been restored.
It is huge.
 
  - Added the device-mapper development tree to the -mm lineup (Alasdair
Kergon).  It is a quilt tree, living at
ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
 
  - Added davidel's signalfd stuff.

 Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

 md1 is the first array on the disk, and it refuses to start up on boot, or 
after
 boot.

 ...

 tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
 mdadm: device /dev/md1 already active - cannot assemble it
 tornado ~ # mdadm --run /dev/md1
 mdadm: failed to run array /dev/md1: Cannot allocate memory
 tornado ~ #

 and looking at a dmesg, this is logged:

 md: bindsdc1
 md: bindsda1
 raid1: raid set md1 active with 2 out of 2 mirrors
 md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
 md1: failed to create bitmap (-12)
 md: pers-run() failed ...


Is this the dmesg from boot or the dmesg after running the mdadm --run command?



 tornado ~ # uname -a
 Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 
Intel(R)
 Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
 tornado ~ #

 The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
 out the -mm releases so much lately.

OK.  I assume that bitmap-chunks in bitmap_init_from_disk() has some
unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this.
Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally
broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?



git-md-accel.patch does not touch anything in the raid1 path, but I
guess stranger things have happened.

--
Dan


Re: 2.6.21-rc5-mm4

2007-04-05 Thread Sam Ravnborg
On Wed, Apr 04, 2007 at 01:55:08PM -0400, [EMAIL PROTECTED] wrote:
> On Tue, 03 Apr 2007 20:37:42 PDT, Randy Dunlap said:
> 
> > Good luck.  But the symbols are there.  Just use left/right arrow keys
> > to scroll the display left/right and you can see them.  Now if you just
> > had that indicator to tell you that you Need to scroll to see more text...
> 
> Exactly. :)  I had the incredible bad luck that the line got cut off at the
> end of a CONFIG_ symbol that made sense - if it had showed up *half* a symbol,
> I'd have gone investigating. ;) (Even a '>' or '<' saying data offscreen to
> right or left would be sufficient, if somebody wants a small but productive
> kernel (config system actually) task to hack on.)
> 
> I'd code it myself, but I have an SL8500 to install, and need to figure out
> how my laptop made it into the bag this morning still up and running (I hit
> the power button, it seemed to power down - blank screen, power light off,
> but syslog msgs prove it was up and running for another 4 hours before it
> shut down on a thermal check...)

If you do not find time to do it try to ping me in a week or so.
Should be trivial to do but away from my dev box atm.

   Sam
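A minimal sketch of the off-screen indicator Valdis describes, for illustration only: render_visible() is a hypothetical helper, not code from scripts/kconfig, and it assumes a plain fixed-width line with one byte per column. It copies the visible slice of a line and overwrites the first or last column with '<' or '>' when text is hidden on that side.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy the part of `line` that fits in a window
 * `width` columns wide, starting at horizontal scroll offset `hscroll`,
 * and mark hidden text with '<' (left) or '>' (right).
 * `out` must have room for width + 1 bytes.
 */
static void render_visible(const char *line, size_t hscroll, size_t width,
                           char *out)
{
    size_t len, i;

    assert(line != NULL && out != NULL);
    len = strlen(line);

    /* Copy the visible slice, padding with spaces past the end of the line. */
    for (i = 0; i < width; i++) {
        size_t pos = hscroll + i;

        out[i] = (pos < len) ? line[pos] : ' ';
    }
    out[width] = '\0';

    if (width == 0)
        return;
    if (hscroll > 0)            /* text scrolled off to the left */
        out[0] = '<';
    if (hscroll + width < len)  /* more text beyond the right edge */
        out[width - 1] = '>';
}
```

For example, rendering "CONFIG_DMA_ENGINE=y" ten columns wide gives "CONFIG_DM>" at offset 0 and "<_DMA_ENG>" at offset 5, so a truncated symbol can no longer pass for a complete one.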


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-05 Thread Badari Pulavarty
On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote:
> Here is a patch that adds validation (only for cpuslabs and partial 
> slabs but that's where the action is). Apply this patch
> and then do
> 
> echo 1 >/sys/slab/<cache-to-check>/validate
> 
> I suggest to boot with full debugging and then run this on the ACPI slabs.

Did this and didn't trigger any problems.

(Just to be clear, booted with slub_debug with all the patches
applied).

--- Validating slabcache 'Acpi-Namespace'
--- Checked 0 slabs in 'Acpi-Namespace'
--- Validating slabcache 'Acpi-Operand'
--- Checked 5 slabs in 'Acpi-Operand'
--- Validating slabcache 'Acpi-Parse'
--- Checked 0 slabs in 'Acpi-Parse'
--- Validating slabcache 'Acpi-ParseExt'
--- Checked 0 slabs in 'Acpi-ParseExt'
--- Validating slabcache 'Acpi-State'
--- Checked 0 slabs in 'Acpi-State'
--- Validating slabcache 'Acpi-Namespace'
--- Checked 0 slabs in 'Acpi-Namespace'
--- Validating slabcache 'Acpi-Operand'
--- Checked 5 slabs in 'Acpi-Operand'
--- Validating slabcache 'Acpi-Parse'
--- Checked 0 slabs in 'Acpi-Parse'
--- Validating slabcache 'Acpi-ParseExt'
--- Checked 0 slabs in 'Acpi-ParseExt'
--- Validating slabcache 'Acpi-State'
--- Checked 0 slabs in 'Acpi-State'
--- Validating slabcache 'RAW'
--- Checked 1 slabs in 'RAW'
--- Validating slabcache 'RAWv6'
--- Checked 1 slabs in 'RAWv6'
--- Validating slabcache 'TCP'
--- Checked 3 slabs in 'TCP'
--- Validating slabcache 'TCPv6'
--- Checked 4 slabs in 'TCPv6'
--- Validating slabcache 'UDP-Lite'
--- Checked 0 slabs in 'UDP-Lite'
--- Validating slabcache 'UDP'
--- Checked 2 slabs in 'UDP'
--- Validating slabcache 'UDPLITEv6'
--- Checked 0 slabs in 'UDPLITEv6'
--- Validating slabcache 'UDPv6'
--- Checked 0 slabs in 'UDPv6'
--- Validating slabcache 'UNIX'
--- Checked 4 slabs in 'UNIX'
--- Validating slabcache 'anon_vma'
--- Checked 12 slabs in 'anon_vma'
--- Validating slabcache 'arp_cache'
--- Checked 2 slabs in 'arp_cache'
--- Validating slabcache 'bdev_cache'
--- Checked 3 slabs in 'bdev_cache'
--- Validating slabcache 'bio'
--- Checked 0 slabs in 'bio'
--- Validating slabcache 'biovec-1'
--- Checked 1 slabs in 'biovec-1'
--- Validating slabcache 'biovec-128'
--- Checked 1 slabs in 'biovec-128'
--- Validating slabcache 'biovec-16'
--- Checked 1 slabs in 'biovec-16'
--- Validating slabcache 'biovec-256'
--- Checked 1 slabs in 'biovec-256'
--- Validating slabcache 'biovec-4'
--- Checked 1 slabs in 'biovec-4'
--- Validating slabcache 'biovec-64'
--- Checked 1 slabs in 'biovec-64'
--- Validating slabcache 'blkdev_ioc'
--- Checked 4 slabs in 'blkdev_ioc'
--- Validating slabcache 'blkdev_queue'
--- Checked 1 slabs in 'blkdev_queue'
--- Validating slabcache 'blkdev_requests'
--- Checked 2 slabs in 'blkdev_requests'
--- Validating slabcache 'buffer_head'
--- Checked 4 slabs in 'buffer_head'
--- Validating slabcache 'cfq_ioc_pool'
--- Checked 4 slabs in 'cfq_ioc_pool'
--- Validating slabcache 'cfq_pool'
--- Checked 4 slabs in 'cfq_pool'
--- Validating slabcache 'configfs_dir_cache'
--- Checked 0 slabs in 'configfs_dir_cache'
--- Validating slabcache 'dentry_cache'
--- Checked 5 slabs in 'dentry_cache'
--- Validating slabcache 'dm_io'
--- Checked 0 slabs in 'dm_io'
--- Validating slabcache 'dm_tio'
--- Checked 0 slabs in 'dm_tio'
--- Validating slabcache 'dnotify_cache'
--- Checked 1 slabs in 'dnotify_cache'
--- Validating slabcache 'dquot'
--- Checked 0 slabs in 'dquot'
--- Validating slabcache 'eventpoll_epi'
--- Checked 1 slabs in 'eventpoll_epi'
--- Validating slabcache 'eventpoll_pwq'
--- Checked 1 slabs in 'eventpoll_pwq'
--- Validating slabcache 'ext2_inode_cache'
--- Checked 0 slabs in 'ext2_inode_cache'
--- Validating slabcache 'ext2_xattr'
--- Checked 0 slabs in 'ext2_xattr'
--- Validating slabcache 'ext3_inode_cache'
--- Checked 0 slabs in 'ext3_inode_cache'
--- Validating slabcache 'ext3_xattr'
--- Checked 0 slabs in 'ext3_xattr'
--- Validating slabcache 'fasync_cache'
--- Checked 0 slabs in 'fasync_cache'
--- Validating slabcache 'fib6_nodes'
--- Checked 1 slabs in 'fib6_nodes'
--- Validating slabcache 'file_lock_cache'
--- Checked 2 slabs in 'file_lock_cache'
--- Validating slabcache 'files_cache'
--- Checked 10 slabs in 'files_cache'
--- Validating slabcache 'filp'
--- Checked 35 slabs in 'filp'
--- Validating slabcache 'flow_cache'
--- Checked 0 slabs in 'flow_cache'
--- Validating slabcache 'fs_cache'
--- Checked 5 slabs in 'fs_cache'
--- Validating slabcache 'hugetlbfs_inode_cache'
--- Checked 1 slabs in 'hugetlbfs_inode_cache'
--- Validating slabcache 'idr_layer_cache'
--- Checked 2 slabs in 'idr_layer_cache'
--- Validating slabcache 'inet_peer_cache'
--- Checked 0 slabs in 'inet_peer_cache'
--- Validating slabcache 'inode_cache'
--- Checked 8 slabs in 'inode_cache'
--- Validating slabcache 'inotify_event_cache'
--- Checked 0 slabs in 'inotify_event_cache'
--- Validating slabcache 'inotify_watch_cache'
--- Checked 1 slabs in 'inotify_watch_cache'
--- Validating slabcache 'ip6_dst_cache'
--- Checked 1 slabs in 

RAID1 out of memory error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Reuben Farrelly

Hi,

On 3/04/2007 3:47 PM, Andrew Morton wrote:

> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> 
> - The oops in git-net.patch has been fixed, so that tree has been restored. 
>   It is huge.
> 
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon).  It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> 
> - Added davidel's signalfd stuff.


Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.

md1 is the first array on the disk, and it refuses to start up on boot, or after 
boot.


tornado ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sda1[0] sdc1[1]
  208640 blocks

md3 : active raid1 sdc3[1] sda3[0]
  20008832 blocks [2/2] [UU]
  bitmap: 0/153 pages [0KB], 64KB chunk

md5 : active raid1 sdc5[1] sda5[0]
  10008384 blocks [2/2] [UU]
  bitmap: 4/153 pages [16KB], 32KB chunk

md6 : active raid1 sdc6[1] sda6[0]
  10008384 blocks [2/2] [UU]
  bitmap: 0/153 pages [0KB], 32KB chunk

md8 : active raid1 sdc8[1] sda8[0]
  1003904 blocks [2/2] [UU]
  bitmap: 0/123 pages [0KB], 4KB chunk

md10 : active raid1 sdc10[1] sda10[0]
  119933120 blocks [2/2] [UU]
  bitmap: 1/229 pages [4KB], 256KB chunk

md2 : active raid1 sdc2[1] sda2[0]
  14544 blocks [2/2] [UU]
  bitmap: 10/191 pages [40KB], 256KB chunk

unused devices: <none>
tornado ~ #

tornado ~ # mdadm --examine /dev/sda1
/dev/sda1:
  Magic : a92b4efc
Version : 00.90.00
   UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb  2 10:16:29 2007
 Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
 Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

Update Time : Fri Apr  6 02:06:17 2007
  State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
   Checksum : d3668aaa - correct
 Events : 0.368


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ # mdadm --examine /dev/sdc1
/dev/sdc1:
  Magic : a92b4efc
Version : 00.90.00
   UUID : f5c2e565:5ed956c0:33b08c07:16154426
  Creation Time : Fri Feb  2 10:16:29 2007
 Raid Level : raid1
  Used Dev Size : 104320 (101.89 MiB 106.82 MB)
 Array Size : 104320 (101.89 MiB 106.82 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

Update Time : Fri Apr  6 02:06:17 2007
  State : clean
Internal Bitmap : present
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
   Checksum : d3668acc - correct
 Events : 0.368


      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       33        1      active sync   /dev/sdc1
tornado ~ #


tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
mdadm: device /dev/md1 already active - cannot assemble it
tornado ~ # mdadm --run /dev/md1
mdadm: failed to run array /dev/md1: Cannot allocate memory
tornado ~ #

and looking at a dmesg, this is logged:

md: bind<sdc1>
md: bind<sda1>
raid1: raid set md1 active with 2 out of 2 mirrors
md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
md1: failed to create bitmap (-12)
md: pers->run() failed ...
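Status -12 is -ENOMEM, so bitmap creation failed with an out-of-memory error. The sketch below, assuming 16-bit per-chunk counters and 4 KiB pages (assumptions for illustration, not taken from this report), reproduces the bitmap page counts that /proc/mdstat shows above for md3, md5, md6, md8 and md10 from each array's size and chunk size. It illustrates that the in-memory bitmap scales with the number of chunks, so an unexpectedly large chunk count turns directly into a large allocation.

```c
#include <assert.h>
#include <errno.h>      /* ENOMEM is 12 on Linux: hence "status: -12" */
#include <stdint.h>

/* Assumption: 16-bit counters per chunk, 4 KiB pages -> 2048 counters/page. */
#define COUNTERS_PER_PAGE ((4096ULL * 8) / 16)

/* Chunks needed to cover size_kb at chunk_kb granularity, rounded up. */
static uint64_t bitmap_chunks(uint64_t size_kb, uint64_t chunk_kb)
{
    assert(chunk_kb > 0);
    return (size_kb + chunk_kb - 1) / chunk_kb;
}

/* Pages of in-memory counters: the N in "bitmap: x/N pages". */
static uint64_t bitmap_pages(uint64_t size_kb, uint64_t chunk_kb)
{
    return (bitmap_chunks(size_kb, chunk_kb) + COUNTERS_PER_PAGE - 1) /
           COUNTERS_PER_PAGE;
}
```

For md3, 20008832 KB in 64 KB chunks is 312638 chunks, which needs 153 pages of counters, matching "bitmap: 0/153 pages".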

tornado ~ # uname -a
Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) 
Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

tornado ~ #

The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing 
out the -mm releases so much lately.


Also, Andrew, can you please restart posting/cc'ing your -mm announcements to 
the [EMAIL PROTECTED] list?  Seems this stopped around about 
2.6.20, it was handy.


.config is up at http://www.reub.net/files/kernel/configs/2.6.21-rc5-mm4

Thanks,
Reuben


Re: 2.6.21-rc5-mm4

2007-04-05 Thread Valdis . Kletnieks
On Mon, 02 Apr 2007 22:47:45 PDT, Andrew Morton said:
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/

Am seeing an Oops 'cannot handle kernel paging request' during late
system startup, hand-copied traceback follows:

avc_has_perm_noaudit+0x2bf/0x506
avc_has_perm+0x2b/0x5b
selinux_socket_stream_connect+0x7e/0xc3
unix_stream_connect+0x202/0x3f3
sys_connect+0x7e/0xa4
tracesys+0xde/0xe1

I've not identified exactly when it happens, but it's towards the very end of
handling /etc/rc5.d, it's already up to the S98's.  Odd thing is it only happens
when I start with RedHat's 'graphical boot', and may be related to the shutdown
of the X server that's displaying the boot progress preparing to launch the
X server for gdm logins (as I'm also seeing a hang sometimes when shutting
down - so it is possibly a 'shutting down the X server nukes the system' bug).

Figured I'd toss this heads-up in case it rings any bells, while I go do
the bisection dance on -rc5-mm4 (-mm2 is OK, and -mm3 doesn't boot for me
for other reasons I didn't chase down before -mm4 came out and fixed it, so
I have a ways to bisect)




Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-05 Thread Christoph Lameter
On Thu, 5 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 21:29 -0700, Christoph Lameter wrote:
> > Here is a patch that adds validation (only for cpuslabs and partial 
> > slabs but that's where the action is). Apply this patch
> > and then do
> > 
> > echo 1 >/sys/slab/<cache-to-check>/validate
> > 
> > I suggest to boot with full debugging and then run this on the ACPI slabs.
> 
> Did this and didn't trigger any problems.

Duh. Must have been in the full slabs. Maybe I should add tracking of 
full slabs for the debug case. Would also enable leak detection.


2.6.21-rc5-mm4: ia64: scheduling while atomic - utrace?

2007-04-05 Thread Lee Schermerhorn
Running a 'usex -e' load [http://people.redhat.com/~anderson/usex/] on
2.6.21-rc5-mm4 on ia64, I see the following:

BUG: scheduling while atomic: strace/0x4001/20162

Call Trace:
 [a00100014ec0] show_stack+0x80/0xa0
sp=e76042dc7610 bsp=e76042dc1260
 [a00100014f10] dump_stack+0x30/0x60
sp=e76042dc77e0 bsp=e76042dc1248
 [a001006f76e0] schedule+0x1d00/0x22a0
sp=e76042dc77e0 bsp=e76042dc1108
 [a00100099750] __cond_resched+0x50/0xa0
sp=e76042dc7800 bsp=e76042dc10e8
 [a001006f8e30] cond_resched+0xb0/0xe0
sp=e76042dc7800 bsp=e76042dc10d0
 [a001001561d0] get_user_pages+0x1b0/0x7c0
sp=e76042dc7800 bsp=e76042dc1028
 [a001001568a0] access_process_vm+0xc0/0x440
sp=e76042dc7820 bsp=e76042dc0f78
 [a0010002fcc0] ia64_sync_user_rbs+0x80/0x100
sp=e76042dc7830 bsp=e76042dc0f38
 [a0010002fdf0] do_gpregs_writeback+0xb0/0xe0
sp=e76042dc7840 bsp=e76042dc0f10
 [a001cad0] unw_init_running+0x70/0xa0
sp=e76042dc7850 bsp=e76042dc0ee8
 [a0010002ed70] do_regset_call+0x110/0x140
sp=e76042dc7c30 bsp=e76042dc0e88
 [a0010002eea0] gpregs_writeback+0x40/0x60
sp=e76042dc7e30 bsp=e76042dc0e60
 [a00100123900] ptrace_report+0xe0/0x1e0
sp=e76042dc7e30 bsp=e76042dc0e28
 [a00100123aa0] ptrace_report_syscall+0xa0/0xe0
sp=e76042dc7e30 bsp=e76042dc0e00
 [a00100123b10] ptrace_report_syscall_exit+0x30/0x60
sp=e76042dc7e30 bsp=e76042dc0dc8
 [a00100122cb0] utrace_report_syscall+0xf0/0x540
sp=e76042dc7e30 bsp=e76042dc0d48
 [a00100031800] syscall_trace_leave+0x60/0xc0
sp=e76042dc7e30 bsp=e76042dc0cf0
 [a001c1c0] ia64_trace_syscall+0x100/0x110
sp=e76042dc7e30 bsp=e76042dc0cf0

Looks like get_ptrace_state(), called from ptrace_report_syscall(), calls
rcu_read_lock(), which disables preemption.  The corresponding
rcu_read_unlock() comes from put_ptrace_state() in ptrace_report()
at the end of the report.  However, ia64 needs to sync the register backing
store, and this requires access to the process VM.  get_user_pages()'s use
of cond_resched() is tripping the scheduling-while-atomic bug.
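The failure mode described above can be modelled in a few lines. This is a toy userspace model, not kernel code: every model_* name is hypothetical, it assumes (per the analysis above) that rcu_read_lock() raises the preemption count, and it ignores the need_resched() condition the real cond_resched() also checks. It shows that the same get_user_pages()-style loop is harmless on its own but reports a bug once it runs inside the RCU read-side section.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model. Assumption: rcu_read_lock() raises the preemption count. */
static int preempt_count;
static int atomic_sleep_bugs;      /* "scheduling while atomic" reports */

static void model_rcu_read_lock(void)
{
    preempt_count++;
}

static void model_rcu_read_unlock(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

/* Like cond_resched(), but always tries to reschedule (need_resched()
 * is ignored); sleeping is only legal outside atomic sections. */
static void model_cond_resched(void)
{
    if (preempt_count != 0) {
        fprintf(stderr, "BUG: scheduling while atomic (count %d)\n",
                preempt_count);
        atomic_sleep_bugs++;
    }
}

/* Stand-in for get_user_pages(), which may reschedule once per page. */
static void model_get_user_pages(int npages)
{
    for (int i = 0; i < npages; i++)
        model_cond_resched();
}

/* The reported path: the ptrace report runs inside the RCU read-side
 * section, and the ia64 backing-store sync reaches get_user_pages(). */
static void model_ptrace_report(int npages)
{
    model_rcu_read_lock();             /* get_ptrace_state() */
    model_get_user_pages(npages);      /* ia64_sync_user_rbs() path */
    model_rcu_read_unlock();           /* put_ptrace_state() */
}
```

Calling model_get_user_pages() directly reports nothing; calling it through model_ptrace_report() reports once per page, which mirrors the trace above.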

May be related to:

http://marc.info/?a=10288337963&r=1&w=4


Lee



Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter

Here is a patch that adds validation (only for cpuslabs and partial 
slabs but that's where the action is). Apply this patch
and then do

echo 1 >/sys/slab/<cache-to-check>/validate

I suggest to boot with full debugging and then run this on the ACPI slabs.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-04 20:26:03.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c  2007-04-04 21:26:15.0 -0700
@@ -2280,6 +2280,67 @@ void *__kmalloc_node_track_caller(size_t
 
 #ifdef CONFIG_SYSFS
 
+static int validate_slab(struct kmem_cache *s, struct page *page)
+{
+   void *p;
+   void *addr = page_address(page);
+   unsigned long map[BITS_TO_LONGS(s->objects)];
+
+   if (!check_slab(s, page) ||
+   !on_freelist(s, page, NULL))
+   return 0;
+
+   /* Now we know that a valid freelist exists */
+   bitmap_zero(map, s->objects);
+
+   for(p = page->freelist; p; p = get_freepointer(s, p)) {
+   set_bit((p - addr) / s->size, map);
+   if (!check_object(s, page, p, 0))
+   return 0;
+   }
+
+   for(p = addr; p < addr + s->objects * s->size; p += s->size)
+   if (!test_bit((p - addr) / s->size, map))
+   if (!check_object(s, page, p, 1))
+   return 0;
+   return 1;
+}
+
+static int validate_slab_node(struct kmem_cache *s, struct kmem_cache_node *n)
+{
+   int count = 0;
+   struct page *page;
+   unsigned long flags;
+
+   spin_lock_irqsave(&n->list_lock, flags);
+   list_for_each_entry(page, &n->partial, lru) {
+   if (slab_trylock(page)) {
+   validate_slab(s, page);
+   slab_unlock(page);
+   } else
+   printk(KERN_INFO "Skipped busy slab %p\n", page);
+   count++;
+   }
+   spin_unlock_irqrestore(&n->list_lock, flags);
+   return count;
+}
+
+static void validate_slab_cache(struct kmem_cache *s)
+{
+   int node;
+   int count = 0;
+
+   printk(KERN_INFO "--- Validating slabcache '%s'\n", s->name);
+   flush_all(s);
+   for_each_online_node(node) {
+   struct kmem_cache_node *n = get_node(s, node);
+
+   count += validate_slab_node(s, n);
+   }
+   printk(KERN_INFO "--- Checked %d slabs in '%s'\n",
+   count, s->name);
+}
+
 static unsigned long count_partial(struct kmem_cache_node *n)
 {
unsigned long flags;
@@ -2402,7 +2463,6 @@ struct slab_attribute {
static struct slab_attribute _name##_attr =  \
__ATTR(_name, 0644, _name##_show, _name##_store)
 
-
 static ssize_t slab_size_show(struct kmem_cache *s, char *buf)
 {
return sprintf(buf, "%d\n", s->size);
@@ -2609,6 +2669,22 @@ static ssize_t store_user_store(struct k
 }
 SLAB_ATTR(store_user);
 
+static ssize_t validate_show(struct kmem_cache *s, char *buf)
+{
+   return 0;
+}
+
+static ssize_t validate_store(struct kmem_cache *s,
+   const char *buf, size_t length)
+{
+   if (buf[0] == '1')
+   validate_slab_cache(s);
+   else
+   return -EINVAL;
+   return length;
+}
+SLAB_ATTR(validate);
+
 #ifdef CONFIG_NUMA
 static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf)
 {
@@ -2648,6 +2724,7 @@ static struct attribute * slab_attrs[] =
&red_zone_attr.attr,
&poison_attr.attr,
&store_user_attr.attr,
+   &validate_attr.attr,
 #ifdef CONFIG_ZONE_DMA
&cache_dma_attr.attr,
 #endif
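The core of validate_slab() in the patch above can be tried outside the kernel on a toy slab. This sketch is a simplification under stated assumptions: the layout, OBJ_SIZE, NOBJS and every toy_* helper are hypothetical, not SLUB code. It follows the freelist, marks each free object in a bitmap as the patch does with set_bit(), and rejects a free pointer that leaves the slab, misses an object boundary or loops back on itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy slab, not SLUB's real layout: NOBJS objects of OBJ_SIZE bytes. A free
 * object stores the address of the next free object in its first bytes. */
#define OBJ_SIZE 32
#define NOBJS    8

struct toy_slab {
    unsigned char mem[NOBJS * OBJ_SIZE];
    void *freelist;                      /* first free object, or NULL */
};

static void *get_free_ptr(const void *obj)
{
    void *next;

    memcpy(&next, obj, sizeof(next));
    return next;
}

static void set_free_ptr(void *obj, void *next)
{
    memcpy(obj, &next, sizeof(next));
}

/* Build a slab whose objects free_idx[0..n-1] form the freelist, in order. */
static void toy_slab_init(struct toy_slab *s, const int *free_idx, int n)
{
    assert(n >= 0 && n <= NOBJS);
    memset(s->mem, 0, sizeof(s->mem));
    s->freelist = NULL;
    for (int i = n - 1; i >= 0; i--) {
        void *obj = s->mem + (size_t)free_idx[i] * OBJ_SIZE;

        set_free_ptr(obj, s->freelist);
        s->freelist = obj;
    }
}

/* A free pointer is sane if it is NULL or points at the start of an
 * object inside this slab. */
static int valid_free_ptr(const struct toy_slab *s, const void *p)
{
    uintptr_t base = (uintptr_t)s->mem, addr = (uintptr_t)p;

    if (p == NULL)
        return 1;
    return addr >= base && addr < base + sizeof(s->mem) &&
           (addr - base) % OBJ_SIZE == 0;
}

/* Walk the freelist, marking free objects in a bitmap. Returns the number
 * of free objects, or -1 if the freelist is corrupt. The real
 * validate_slab() also runs check_object() on every object. */
static int toy_validate_slab(const struct toy_slab *s,
                             unsigned char free_map[NOBJS])
{
    int nfree = 0;

    memset(free_map, 0, NOBJS);
    if (!valid_free_ptr(s, s->freelist))
        return -1;
    for (void *p = s->freelist; p != NULL; p = get_free_ptr(p)) {
        size_t idx = (size_t)((const unsigned char *)p - s->mem) / OBJ_SIZE;

        if (free_map[idx])
            return -1;                   /* seen before: freelist loop */
        free_map[idx] = 1;
        nfree++;
        if (!valid_free_ptr(s, get_free_ptr(p)))
            return -1;                   /* corrupt free pointer */
    }
    return nfree;
}
```

Overwriting the first bytes of a free object, as an overrun from its neighbour would, makes the walk fail at that object instead of following a wild pointer.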


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> > Were the slabs merged? Look at /sys/slab and see if there are any symlinks 
> > there.
> > 

Ok. symlinks there. It's a sporadic thing. I think I am going to add a slab
validator to SLUB that goes through all slabs and checks all objects for 
validity. Then we can trigger a scan through the acpi caches which should 
locate the problem.



Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 17:31 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> 
> > On Wed, 2007-04-04 at 15:59 -0700, Christoph Lameter wrote:
> > > On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> > > 
> > > > Here is the slub_debug=FU output with the above patch.
> > > 
> > > Hmmm... Looks like the object is actually free. Someone writes beyond the 
> > > end of the earlier object. Setting Z should check overwrites but it 
> > > switched off merging. So set
> > > 
> > > slub_debug = FZ
> > > 
> > > Analogous to the last patch you would need to take out redzoning from 
> > > the flags that stop merging. Then rerun. Maybe we can track it down this 
> > > way.
> > 
> > Hmm.. I did that and machine boots fine, with absolutely no
> > debug messages :(
> 
> Were the slabs merged? Look at /sys/slab and see if there are any symlinks 
> there.
> 

elm3b29:/sys/slab # ls -ltr
total 0
drwxr-xr-x 2 root root 0 Apr  4 17:40 sock_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 skbuff_fclone_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 sigqueue
drwxr-xr-x 2 root root 0 Apr  4 17:40 shmem_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 radix_tree_node
drwxr-xr-x 2 root root 0 Apr  4 17:40 proc_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 ip_dst_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 file_lock_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 blkdev_requests
drwxr-xr-x 2 root root 0 Apr  4 17:40 blkdev_queue
drwxr-xr-x 2 root root 0 Apr  4 17:40 blkdev_ioc
drwxr-xr-x 2 root root 0 Apr  4 17:40 biovec-64
drwxr-xr-x 2 root root 0 Apr  4 17:40 biovec-256
drwxr-xr-x 2 root root 0 Apr  4 17:40 biovec-128
drwxr-xr-x 2 root root 0 Apr  4 17:40 bdev_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 TCP
drwxr-xr-x 2 root root 0 Apr  4 17:40 Acpi-State
drwxr-xr-x 2 root root 0 Apr  4 17:40 Acpi-ParseExt
drwxr-xr-x 2 root root 0 Apr  4 17:40 Acpi-Operand
drwxr-xr-x 2 root root 0 Apr  4 17:40 Acpi-Namespace
drwxr-xr-x 2 root root 0 Apr  4 17:40 vm_area_struct
drwxr-xr-x 2 root root 0 Apr  4 17:40 task_struct
drwxr-xr-x 2 root root 0 Apr  4 17:40 sysfs_dir_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 signal_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 sighand_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 pid
drwxr-xr-x 2 root root 0 Apr  4 17:40 names_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 mm_struct
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmem_cache_node
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-96
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-8192
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-8
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-65536
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-64
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-512
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-4096
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-32768
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-32
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-262144
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-256
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-2048
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-192
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-16384
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-16
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-131072
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-128
drwxr-xr-x 2 root root 0 Apr  4 17:40 kmalloc-1024
drwxr-xr-x 2 root root 0 Apr  4 17:40 inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 idr_layer_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 fs_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 filp
drwxr-xr-x 2 root root 0 Apr  4 17:40 dentry_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 buffer_head
drwxr-xr-x 2 root root 0 Apr  4 17:40 anon_vma
drwxr-xr-x 2 root root 0 Apr  4 17:40 dquot
drwxr-xr-x 2 root root 0 Apr  4 17:40 reiser_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 nfs_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 nfs_direct_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 mqueue_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 minix_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 journal_head
drwxr-xr-x 2 root root 0 Apr  4 17:40 isofs_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 hugetlbfs_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 ext3_xattr
drwxr-xr-x 2 root root 0 Apr  4 17:40 ext3_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 ext2_xattr
drwxr-xr-x 2 root root 0 Apr  4 17:40 ext2_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 cfq_pool
drwxr-xr-x 2 root root 0 Apr  4 17:40 cfq_ioc_pool
drwxr-xr-x 2 root root 0 Apr  4 17:40 UNIX
drwxr-xr-x 2 root root 0 Apr  4 17:40 rpc_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 rpc_buffers
drwxr-xr-x 2 root root 0 Apr  4 17:40 revokefs_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:40 TCPv6
drwxr-xr-x 2 root root 0 Apr  4 17:54 fat_inode_cache
drwxr-xr-x 2 root root 0 Apr  4 17:54 fat_cache
drwxr-xr-x 2 root root 0 Apr  4 17:54 sgpool-64
drwxr-xr-x 2 root root 0 Apr  4 17:54 sgpool-32
drwxr-xr-x 2 root root 0 Apr  4 17:54 sgpool-128
drwxr-xr-x 2 root root 0 Apr  4 17:54 scsi_io_context

Re: 2.6.21-rc5-mm4

2007-04-04 Thread Antonino A. Daplas
On Mon, 2007-04-02 at 22:47 -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> 
> - The oops in git-net.patch has been fixed, so that tree has been restored. 
>   It is huge.
> 
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon).  It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> 
> - Added davidel's signalfd stuff.
> 
> 
> 

I see this tracing (from the lock-dependency validator?) for several -mm
versions.  This is from a Silan ethernet card (CONFIG_SC92031).

00:0b.0 Ethernet controller: Hangzhou Silan Microelectronics Co., Ltd.
Unknown device 2031 (rev 01)

Other than the tracing, I'm not having any problems.

Tony

==
[ INFO: soft-safe -> soft-unsafe lock order detected ]
2.6.21-rc5-mm4-default #44
--
ip/3036 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
 (>lock){--..}, at: [] sc92031_set_multicast_list+0x14/0x2d [sc92031]

and this task is already holding:
 (>_xmit_lock){-...}, at: [] dev_mc_upload+0x14/0x3a
which would create a new lock dependency:
 (>_xmit_lock){-...} -> (>lock){--..}

but this new dependency connects a soft-irq-safe lock:
 (>mca_lock){-+..}
... which became soft-irq-safe at:
  [] __lock_acquire+0x3d7/0xb93
  [] lock_acquire+0x68/0x82
  [] _spin_lock_bh+0x30/0x3d
  [] mld_ifc_timer_expire+0x15b/0x21d [ipv6]
  [] run_timer_softirq+0xf1/0x14e
  [] __do_softirq+0x46/0x9c
  [] do_softirq+0x2d/0x46
  [] irq_exit+0x3b/0x6b
  [] do_IRQ+0x5e/0x76
  [] common_interrupt+0x2e/0x34
  [] error_code+0x71/0x78
  [] 0x

to a soft-irq-unsafe lock:
 (>lock){--..}
... which became soft-irq-unsafe at:
...  [] __lock_acquire+0x46b/0xb93
  [] lock_acquire+0x68/0x82
  [] _spin_lock+0x2b/0x38
  [] sc92031_open+0xcc/0x16f [sc92031]
  [] dev_open+0x33/0x6e
  [] dev_change_flags+0x57/0x10b
  [] devinet_ioctl+0x235/0x546
  [] inet_ioctl+0x89/0xaa
  [] sock_ioctl+0x1ac/0x1ca
  [] do_ioctl+0x1c/0x53
  [] vfs_ioctl+0x1ec/0x203
  [] sys_ioctl+0x49/0x62
  [] sysenter_past_esp+0x5d/0x99
  [] 0x

other info that might help us debug this:

2 locks held by ip/3036:
 #0:  (rtnl_mutex){--..}, at: [] mutex_lock+0x24/0x28
 #1:  (>_xmit_lock){-...}, at: [] dev_mc_upload+0x14/0x3a

the soft-irq-safe lock's dependencies:
-> (>mca_lock){-+..} ops: 9 {
   initial-use  at:
[] __lock_acquire+0x486/0xb93
[] lock_acquire+0x68/0x82
[] _spin_lock_bh+0x30/0x3d
[] igmp6_group_added+0x1b/0x120 [ipv6]
[] ipv6_dev_mc_inc+0x2f9/0x346 [ipv6]
[] ipv6_add_dev+0x232/0x240 [ipv6]
[] versions+0x1e8b/0xf9c8
[x_tables]
[] versions+0x1d54/0xf9c8
[x_tables]
[] sys_init_module+0x1252/0x138f
[] sysenter_past_esp+0x5d/0x99
[] 0x
   in-softirq-W at:
[] __lock_acquire+0x3d7/0xb93
[] lock_acquire+0x68/0x82
[] _spin_lock_bh+0x30/0x3d
[] mld_ifc_timer_expire+0x15b/0x21d
[ipv6]
[] run_timer_softirq+0xf1/0x14e
[] __do_softirq+0x46/0x9c
[] do_softirq+0x2d/0x46
[] irq_exit+0x3b/0x6b
[] do_IRQ+0x5e/0x76
[] common_interrupt+0x2e/0x34
[] error_code+0x71/0x78
[] 0x
   hardirq-on-W at:
[] __lock_acquire+0x441/0xb93
[] lock_acquire+0x68/0x82
[] _spin_lock_bh+0x30/0x3d
[] igmp6_group_added+0x1b/0x120 [ipv6]
[] ipv6_dev_mc_inc+0x2f9/0x346 [ipv6]
[] ipv6_add_dev+0x232/0x240 [ipv6]
[] versions+0x1e8b/0xf9c8
[x_tables]
[] versions+0x1d54/0xf9c8
[x_tables]
[] sys_init_module+0x1252/0x138f
[] sysenter_past_esp+0x5d/0x99
[] 0x
 }
 ... key  at: [] __key.29988+0x0/0xfffe9535 [ipv6]
 -> (>_xmit_lock){-...} ops: 18 {
initial-use  at:
  [] __lock_acquire+0x486/0xb93
  [] lock_acquire+0x68/0x82
  [] _spin_lock_bh+0x30/0x3d
  [] dev_mc_upload+0x14/0x3a
  [] dev_change_flags+0x31/0x10b
  [] devinet_ioctl+0x235/0x546
  [] inet_ioctl+0x89/0xaa
  [] sock_ioctl+0x1ac/0x1ca

Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> On Wed, 2007-04-04 at 15:59 -0700, Christoph Lameter wrote:
> > On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> > 
> > > Here is the slub_debug=FU output with the above patch.
> > 
> > Hmmm... Looks like the object is actually free. Someone writes beyond the 
> > end of the earlier object. Setting Z should check overwrites but it 
> > switched off merging. So set
> > 
> > slub_debug = FZ
> > 
> > Analogous to the last patch you would need to take out redzoning from 
> > the flags that stop merging. Then rerun. Maybe we can track it down this 
> > way.
> 
> Hmm.. I did that and machine boots fine, with absolutely no
> debug messages :(

Were the slabs merged? Look at /sys/slab and see if there are any symlinks 
there.
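A small userspace sketch for the check suggested above, assuming (as the question implies) that merged caches appear in /sys/slab as symlinks; count_symlinks() is a hypothetical helper, not an existing tool. It counts the symlinks directly inside a directory and prints where each one points.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count the symbolic links directly inside `dir` (e.g. "/sys/slab") and
 * print where each one points. Returns -1 if the directory cannot be opened.
 */
static int count_symlinks(const char *dir)
{
    char path[4096], target[4096];
    struct dirent *de;
    struct stat st;
    int links = 0;
    DIR *d;

    assert(dir != NULL);
    d = opendir(dir);
    if (!d)
        return -1;
    while ((de = readdir(d)) != NULL) {
        snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
        /* lstat() does not follow the link, so S_ISLNK() can see it. */
        if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
            continue;
        links++;
        ssize_t n = readlink(path, target, sizeof(target) - 1);
        if (n >= 0) {
            target[n] = '\0';   /* readlink() does not NUL-terminate */
            printf("%s -> %s\n", de->d_name, target);
        }
    }
    closedir(d);
    return links;
}
```

Calling count_symlinks("/sys/slab") would list any aliases; a return value of 0 would suggest that no cache was merged.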



Re: 2.6.21-rc5-mm4

2007-04-04 Thread Antonino A. Daplas
On Thu, 2007-04-05 at 08:38 +1000, Con Kolivas wrote:
> On Thursday 05 April 2007 08:10, Andrew Morton wrote:
> > Thanks - that'll be the CPU scheduler changes.
> >
> > Con has produced a patch or two which might address this but afaik we don't
> > yet have a definitive fix?
> >
> > I believe that reverting
> > sched-implement-staircase-deadline-cpu-scheduler-staircase-improvements.patch
> > will prevent it.
> 
> I posted a definitive fix which Michal tested for me offlist. Subject was:
>  [PATCH] sched: implement staircase deadline cpu scheduler improvements fix
> 
> Sorry about relative noise prior to that. Akpm please pick it up.
> 
> Here again just in case.
> 

Rebooted a few times, I can confirm that this patch fixes this.

Thanks

Tony




Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 15:59 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> 
> > Here is the slub_debug=FU output with the above patch.
> 
> Hmmm... Looks like the object is actually free. Someone writes beyond the 
> end of the earlier object. Setting Z should check overwrites but it 
> switched off merging. So set
> 
> slub_debug = FZ
> 
> Analogous to the last patch you would need to take out redzoning from 
> the flags that stop merging. Then rerun. Maybe we can track it down this 
> way.

Hmm.. I did that and machine boots fine, with absolutely no
debug messages :(

Thanks,
Badari





Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> Here is the slub_debug=FU output with the above patch.

Hmmm... Looks like the object is actually free. Someone writes beyond the 
end of the earlier object. Setting Z should check overwrites but it 
switched off merging. So set

slub_debug = FZ

Analogous to the last patch, you would need to take out redzoning from 
the flags that stop merging. Then rerun. Maybe we can track it down this 
way.

Hmmm... Maybe remove all the debug flags from those that avoid merging and 
then run with full debug. That should theoretically flush it out.
 


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 11:22 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Christoph Lameter wrote:
> 
> > Yes. slub_debug=U. But user tracking may need to increase the slab 
> > size (depends on the padding available in the slab) to store the 
> > tracking information, so you may not get the same corruption.
> 
> Hmmm. U switches off merging, and you may need merging to trigger the 
> discovery of the overwrite.
> 
> Here is a patch to enable merging even while tracking slabs. This patch 
> should not be applied to mm. In general tracking requires knowing which
> slab the objects come from, and merging loses that information.
> 
> Index: linux-2.6.21-rc5-mm4/mm/slub.c
> ===============
> --- linux-2.6.21-rc5-mm4.orig/mm/slub.c   2007-04-04 11:19:29.0 -0700
> +++ linux-2.6.21-rc5-mm4/mm/slub.c        2007-04-04 11:19:35.0 -0700
> @@ -86,7 +86,7 @@
>  /*
>   * Set of flags that will prevent slab merging
>   */
> -#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> +#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | \
>   SLAB_TRACE | SLAB_DESTROY_BY_RCU)
>  
>  #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \
> 

Here is the slub_debug=FU output with the above patch.

Thanks,
Badari


Linux version 2.6.21-rc5-mm4 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE 
Linux)) #6 SMP Wed Apr 4 16:52:03 PDT 2007
Command line: root=/dev/hda2 vga=0x314  slub_debug=FU selinux=0   console=tty0 
console=ttyS0,38400 resume=/dev/hda1 resume=/dev/hda1  splash=silent showopts
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f000 (usable)
 BIOS-e820: 0009f000 - 000a (reserved)
 BIOS-e820: 000ca000 - 0010 (reserved)
 BIOS-e820: 0010 - dfef (usable)
 BIOS-e820: dfef - dfeff000 (ACPI data)
 BIOS-e820: dfeff000 - dff0 (ACPI NVS)
 BIOS-e820: dff0 - e000 (usable)
 BIOS-e820: fec0 - fec00400 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: fff8 - 0001 (reserved)
 BIOS-e820: 0001 - 0001e000 (usable)
end_pfn_map = 1966080
DMI 2.3 present.
ACPI: RSDP 000F6970, 0024 (r2 PTLTD )
ACPI: XSDT DFEFC625, 003C (r1 PTLTD  XSDT604  LTP0)
ACPI: FACP DFEFED02, 00F4 (r3 AMDHAMMER604 PTECF4240)
ACPI: DSDT DFEFC661, 262D (r1 AMD-K8  AMDACPI  604 MSFT  10D)
ACPI: FACS DFEFFFC0, 0040
ACPI: SRAT DFEFEDF6, 0160 (r1 AMDHAMMER604 AMD 1)
ACPI: APIC DFEFEF56, 00AA (r1 PTLTD  APIC604  LTP0)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 1 -> APIC 1 -> Node 1
SRAT: PXM 2 -> APIC 2 -> Node 2
SRAT: PXM 3 -> APIC 3 -> Node 3
SRAT: Node 0 PXM 0 0-a
SRAT: Node 0 PXM 0 0-e000
SRAT: Node 0 PXM 0 0-18000
SRAT: PXM 1 (1-1a000) overlaps with PXM 0 (0-18000)
SRAT: SRAT not used.
Scanning NUMA topology in Northbridge 24
Number of nodes 4
Node 0 MemBase  Limit 00018000
Node 1 MemBase 00018000 Limit 0001a000
Node 2 MemBase 0001a000 Limit 0001c000
Node 3 MemBase 0001c000 Limit 0001e000
Using node hash shift of 29
Bootmem setup node 0 -00018000
Bootmem setup node 1 00018000-0001a000
Bootmem setup node 2 0001a000-0001c000
Bootmem setup node 3 0001c000-0001e000
Zone PFN ranges:
  DMA 0 -> 4096
  DMA324096 ->  1048576
  Normal1048576 ->  1966080
Movable zone start PFN for each node
early_node_map[7] active PFN ranges
0:0 ->  159
0:  256 ->   917232
0:   917248 ->   917504
0:  1048576 ->  1572864
1:  1572864 ->  1703936
2:  1703936 ->  1835008
3:  1835008 ->  1966080
ACPI: PM-Timer IO Port: 0x8008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
Processor #2
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 4, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x05] address[0xfa3e] gsi_base[24])
IOAPIC[1]: apic_id 5, address 0xfa3e, GSI 24-27
ACPI: IOAPIC (id[0x06] address[0xfa3e1000] gsi_base[28])
IOAPIC[2]: apic_id 6, address 0xfa3e1000, GSI 28-31
ACPI: IOAPIC (id[0x07] address[0xfa3e2000] gsi_b

Re: 2.6.21-rc5-mm4

2007-04-04 Thread Con Kolivas
On Thursday 05 April 2007 08:10, Andrew Morton wrote:
> Thanks - that'll be the CPU scheduler changes.
>
> Con has produced a patch or two which might address this but afaik we don't
> yet have a definitive fix?
>
> I believe that reverting
> sched-implement-staircase-deadline-cpu-scheduler-staircase-improvements.patch
> will prevent it.

I posted a definitive fix which Michal tested for me offlist. Subject was:
 [PATCH] sched: implement staircase deadline cpu scheduler improvements fix

Sorry about relative noise prior to that. Akpm please pick it up.

Here again just in case.

---
Use of memset was bogus. Fix it.

Fix exiting recalc_task_prio without p->array being updated.

Microoptimisation courtesy of Dmitry Adamushko <[EMAIL PROTECTED]>

Signed-off-by: Con Kolivas <[EMAIL PROTECTED]>

---
 kernel/sched.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: linux-2.6.21-rc5-mm4/kernel/sched.c
===============
--- linux-2.6.21-rc5-mm4.orig/kernel/sched.c   2007-04-04 12:14:29.00000 +1000
+++ linux-2.6.21-rc5-mm4/kernel/sched.c        2007-04-04 12:49:39.0 +1000
@@ -683,11 +683,13 @@ static void dequeue_task(struct task_str
  * The task is being queued on a fresh array so it has its entitlement
  * bitmap cleared.
  */
-static inline void task_new_array(struct task_struct *p, struct rq *rq)
+static void task_new_array(struct task_struct *p, struct rq *rq,
+  struct prio_array *array)
 {
bitmap_zero(p->bitmap, PRIO_RANGE);
p->rotation = rq->prio_rotation;
p->time_slice = p->quota;
+   p->array = array;
 }
 
 /* Find the first slot from the relevant prio_matrix entry */
@@ -709,6 +711,8 @@ static inline int next_entitled_slot(str
DECLARE_BITMAP(tmp, PRIO_RANGE);
int search_prio, uprio = USER_PRIO(p->static_prio);
 
+   if (!rq->prio_level[uprio])
+   rq->prio_level[uprio] = MAX_RT_PRIO;
/*
 * Only priorities equal to the prio_level and above for their
 * static_prio are acceptable, and only if it's not better than
@@ -736,11 +740,8 @@ static inline int next_entitled_slot(str
 
 static void queue_expired(struct task_struct *p, struct rq *rq)
 {
-   p->array = rq->expired;
-   task_new_array(p, rq);
+   task_new_array(p, rq, rq->expired);
p->prio = p->normal_prio = first_prio_slot(p);
-   p->time_slice = p->quota;
-   p->rotation = rq->prio_rotation;
 }
 
 #ifdef CONFIG_SMP
@@ -800,9 +801,9 @@ static void recalc_task_prio(struct task
queue_expired(p, rq);
return;
} else
-   task_new_array(p, rq);
+   task_new_array(p, rq, array);
} else
-   task_new_array(p, rq);
+   task_new_array(p, rq, array);
 
queue_prio = next_entitled_slot(p, rq);
if (queue_prio >= MAX_PRIO) {
@@ -3445,7 +3446,7 @@ EXPORT_SYMBOL(sub_preempt_count);
 
 static inline void reset_prio_levels(struct rq *rq)
 {
-   memset(rq->prio_level, MAX_RT_PRIO, ARRAY_SIZE(rq->prio_level));
+   memset(rq->prio_level, 0, sizeof(int) * PRIO_RANGE);
 }
 
 /*

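A standalone userspace sketch of why the old reset_prio_levels() was bogus: memset() stores its value argument as a single byte in every byte it touches, and ARRAY_SIZE() counts elements rather than bytes. It assumes prio_level is an int array of PRIO_RANGE entries (as the sizeof(int) * PRIO_RANGE in the fix suggests) and uses assumed values MAX_RT_PRIO = 100 and PRIO_RANGE = 40; struct fake_rq and the helper names are hypothetical stand-ins, not scheduler code.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values, for illustration only. */
#define MAX_RT_PRIO 100
#define PRIO_RANGE  40
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the runqueue field touched by the patch. */
struct fake_rq {
	int prio_level[PRIO_RANGE];
};

/*
 * Old version: writes the byte 0x64 (100) into only ARRAY_SIZE() == PRIO_RANGE
 * bytes, so with 4-byte ints just the first PRIO_RANGE / 4 entries change,
 * and each of them becomes 0x64646464 rather than 100.
 */
static void reset_prio_levels_buggy(struct fake_rq *rq)
{
	memset(rq->prio_level, MAX_RT_PRIO, ARRAY_SIZE(rq->prio_level));
}

/* Fixed version: an int made of all-zero bytes is 0, and the length is in bytes. */
static void reset_prio_levels_fixed(struct fake_rq *rq)
{
	memset(rq->prio_level, 0, sizeof(int) * PRIO_RANGE);
}

/* Lazy initialisation in the style of the patched next_entitled_slot():
 * a zero entry means "not set yet" and is replaced by MAX_RT_PRIO. */
static int prio_level_of(struct fake_rq *rq, int uprio)
{
	if (!rq->prio_level[uprio])
		rq->prio_level[uprio] = MAX_RT_PRIO;
	return rq->prio_level[uprio];
}
```

Zero is the one fill value that behaves for ints under memset(), which is presumably why the fix clears the array and lets next_entitled_slot() substitute MAX_RT_PRIO for an unset entry.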
-- 
-ck


Re: 2.6.21-rc5-mm4

2007-04-04 Thread Andrew Morton
On Thu, 05 Apr 2007 05:56:35 +0800
"Antonino A. Daplas" <[EMAIL PROTECTED]> wrote:

> On Mon, 2007-04-02 at 22:47 -0700, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> > 
> > - The oops in git-net.patch has been fixed, so that tree has been restored. 
> >   It is huge.
> > 
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon).  It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> > 
> > - Added davidel's signalfd stuff.
> > 
> > 
> > 
> 
> I'm getting a kernel panic intermittently, approximately 50% of boots.
> The tracing is not always the same, but it always dies on an
> atomic_bitop operation.  Here are two hand-copied tracings (for the life
> of me, I can't make netconsole work).
> 
> 
> /---First Tracing--/
> Oops:  [#1]
> last sysfs file: class/firmware/microcode
> Modules linked in: ...
> 
> ...
> CPU: 0
> EIP: ...
> EFLAGS:...
> EIP is at find_next_zero_bit
> ...
> ...
> ...
> Process set_disk_settin
> Call Trace:
> show_trace_log
> show_stack_log
> show_register
> die
> do_page_fault
> error_code
> recalc_task_prio
> activate_task
> try_to_wake_up
> default_wake_function
> __wake_up_common
> __wake_up
> sock_def_readable
> sock_queue_rcv_skb
> udp_queue_rcv_skb
> __udp4_lib_rcv
> udp_rcv
> ip_local_deliver
> ip_rcv
> netif_receive_skb
> rtl8139_poll
> net_rx_action
> __do_softirq
> do_softirq
> irq_exit
> do_IRQ
> common_interrupt

Thanks - that'll be the CPU scheduler changes.

Con has produced a patch or two which might address this but afaik we don't
yet have a definitive fix?

I believe that reverting
sched-implement-staircase-deadline-cpu-scheduler-staircase-improvements.patch
will prevent it.



Re: 2.6.21-rc5-mm4

2007-04-04 Thread Antonino A. Daplas
On Mon, 2007-04-02 at 22:47 -0700, Andrew Morton wrote:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> 
> - The oops in git-net.patch has been fixed, so that tree has been restored. 
>   It is huge.
> 
> - Added the device-mapper development tree to the -mm lineup (Alasdair
>   Kergon).  It is a quilt tree, living at
>   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> 
> - Added davidel's signalfd stuff.
> 
> 
> 

I'm getting a kernel panic intermittently, approximately 50% of boots.
The tracing is not always the same, but it always dies on an
atomic_bitop operation.  Here are two hand-copied tracings (for the life
of me, I can't make netconsole work).


/---First Tracing--/
Oops:  [#1]
last sysfs file: class/firmware/microcode
Modules linked in: ...

...
CPU: 0
EIP: ...
EFLAGS:...
EIP is at find_next_zero_bit
...
...
...
Process set_disk_settin
Call Trace:
show_trace_log
show_stack_log
show_register
die
do_page_fault
error_code
recalc_task_prio
activate_task
try_to_wake_up
default_wake_function
__wake_up_common
__wake_up
sock_def_readable
sock_queue_rcv_skb
udp_queue_rcv_skb
__udp4_lib_rcv
udp_rcv
ip_local_deliver
ip_rcv
netif_receive_skb
rtl8139_poll
net_rx_action
__do_softirq
do_softirq
irq_exit
do_IRQ
common_interrupt

/-- Second Tracing --/
CPU: 0
EIP: ...
EFLAGS:...
EIP is at find_next_zero_bit
...
...
...
Process sshd
Call Trace:
show_trace_log
show_stack_log
show_register
die
do_page_fault
error_code
recalc_task_prio
enqueue_task
activate_task
try_to_wake_up
wake_up_state
signal_wake_up
__group_complete_signal
__group_send_signal
group_send_sig_info
send_group_sig_info
it_real_fn
run_hrtimer_softirq
__do_softirq
irq_exit
smp_apic_timer_interrupt
apic_timer_interrupt
error_code

EIP: [. find_next_zero_bit+...

Tony

PS: I might try using a serial console and bisection, but this might take
me a few days.



Re: 2.6.21-rc5-mm4 (SLUB powerpc)

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 10:35 -0700, Christoph Lameter wrote: 
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> 
> > Next issue ? Sorry.
> 
> No problem. Could you have a look at the hvsi driver and figure out what is 
> failing there? What is the hvsi driver?
> 
> > Console: switching to colour frame buffer device 80x30
> > fb0: MATROX frame buffer device
> > matroxfb_crtc2: secondary head of fb0 was registered as fb1
> > Kernel panic - not syncing: Couldn't register hvsi console driver
> 
> Framebuffer allocation failure
> 

It looks like it comes from this code in hvsi.c:

	if (tty_register_driver(hvsi_driver))
		panic("Couldn't register hvsi console driver\n");

I added a printk() to every failure case in tty_register_driver(),
and now I can't reproduce the problem.

The machine now gets further in the boot and then hangs. I saw a similar
hang with RSDL earlier.

Thanks,
Badari

Welcome to yaboot version 10.1.5-r625.SuSE
booted from '/[EMAIL PROTECTED]/[EMAIL PROTECTED],2/pci1069,[EMAIL 
PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0'
Enter "help" to get some basic usage information
boot: 2621rc5mm4
Please wait, loading kernel...
Allocated 0x0040 bytes for executable @ 0x0040
   Elf32 kernel loaded...

zImage starting: loaded at 0x0040 (sp: 0x01a3fe60)
Allocating 0x806af0 bytes for kernel ...
OF version = 'IBM,SF225_096'
gunzipping (0x01c0 <- 0x00408000:0x006a4cd2)...done 0x741f90 bytes
Finalizing device tree... using OF tree (promptr=00c39a50)
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: root=/dev/sda2 xmon=on slub_debug
memory layout at init:
  alloc_bottom : 0240b000
  alloc_top: 0800
  alloc_top_hi : 0001e800
  rmo_top  : 0800
  ram_top  : 0001e800
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL 
PROTECTED]/[EMAIL PROTECTED],
opening ... done
instantiating rtas at 0x077ca000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0240c000 -> 0x0240d2fe
Device tree struct  0x0240e000 -> 0x02423000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #5 SMP Wed Apr 4 10:55:34 PDT 2007
-
ppc64_pft_size= 0x1b
physicalMemorySize= 0x1e800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0xf
-
Linux version 2.6.21-rc4-mm1-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0
(SUSE Linux)) #5 SMP Wed Apr 4 10:55:34 PDT 2007
[boot]0012 Setup Arch
No ramdisk, default root is /dev/sda2
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->  1998848
  Normal1998848 ->  1998848
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0:0 ->   974848
1:   974848 ->  1998848
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 1971520
Kernel command line: root=/dev/sda2 xmon=on slub_debug
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 7855772k/7995392k available (5992k kernel code, 139620k
reserved, 1224k data, 814k bss, 272k init)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
Processor 1 found.
Processor 2 found.
Processor 3 found.
Processor 4 found.
Processor 5 found.
Processor 6 found.
Processor 7 found.
Brought up 8 CPUs
migration_cost=0,3,25
NET: Registered protocol family 16
IOMMU table initialized, virtual merging enabled
SCSI subsystem initialized
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
NET: Registered protocol family 2
IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
TCP established hash table entries: 524288 (order: 11, 12582912 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
vio_bus_init: device_register returned -19
IBM eBus Device Driver
audit: initializing netlink socket (disabled)
audit(1175719341.610:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 

Re: 2.6.21-rc5-mm4 -- laptop lid button only triggers suspend on HP dv1240us every other time.

2007-04-04 Thread Sergio Monteiro Basto
On Tue, 2007-04-03 at 22:44 -0700, Andrew Morton wrote:
> On Wed, 4 Apr 2007 00:33:36 -0500 "Miles Lane" <[EMAIL PROTECTED]> wrote:
> 
> > This is an old bug.  It has been happening forever, but I'd love to
> > know how I can help get this tracked down and fixed.
> 
> Yes, I've been hitting something like that in the past 3-4 weeks.  We
> started to diagnose it but I got distracted.
> 
> For a start, please review
> http://www.mail-archive.com/linux-acpi@vger.kernel.org/msg05094.html, then
> see if you are able to take it further than I was.

I had a similar problem with my laptop losing ACPI events after
suspend to disk. If I remember correctly, it worked on 2.6.15 or 2.6.16,
stopped working on 2.6.17 and 2.6.18, and works again on 2.6.19 and 2.6.20.


-- 
Sérgio M. B.




Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> free. Any ideas on how I can track down easily ? Is there
> a way to store last allocated (function, line#) and look
> around there ?

Also, you may want to switch off slab merging. That will allow you to 
determine the cache involved if it's not a kmalloc allocation and the slab
was merged.

Note that switching off merging may seem to cure the problem: the object
may have been corrupted after allocation, and then the slab was never
touched again. The corruption may surface only when the slab is merged,
because merging creates more activity on the slabs, which exposes the problem.


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Christoph Lameter wrote:

> Yes. slub_debug=U. But user tracking may need to increase the slab 
> size (depends on the padding available in the slab) to store the 
> tracking information, so you may not get the same corruption.

Hmmm. U switches off merging, and you may need merging to trigger the 
discovery of the overwrite.

Here is a patch to enable merging even while tracking slabs. This patch 
should not be applied to mm. In general tracking requires knowing which
slab the objects come from, and merging loses that information.

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-04 11:19:29.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c  2007-04-04 11:19:35.0 -0700
@@ -86,7 +86,7 @@
 /*
  * Set of flags that will prevent slab merging
  */
-#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
+#define SLUB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | \
SLAB_TRACE | SLAB_DESTROY_BY_RCU)
 
 #define SLUB_MERGE_SAME (SLAB_DEBUG_FREE | SLAB_RECLAIM_ACCOUNT | \

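A userspace sketch of how the patch changes the merge decision. The letter meanings follow SLUB's slub_debug options (F sanity checks, Z red zoning, P poisoning, U user tracking, T tracing); the bit values are hypothetical, and the real code also compares SLUB_MERGE_SAME flags between caches, which this sketch leaves out.

```c
#include <stdbool.h>

/* Hypothetical bit values; only the names mirror mm/slub.c. */
#define SLAB_DEBUG_FREE      0x01u
#define SLAB_RED_ZONE        0x02u
#define SLAB_POISON          0x04u
#define SLAB_STORE_USER      0x08u
#define SLAB_TRACE           0x10u
#define SLAB_DESTROY_BY_RCU  0x20u

/* Mask before and after the patch: SLAB_STORE_USER (U) is dropped. */
#define NEVER_MERGE_OLD (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
			 SLAB_TRACE | SLAB_DESTROY_BY_RCU)
#define NEVER_MERGE_NEW (SLAB_RED_ZONE | SLAB_POISON | \
			 SLAB_TRACE | SLAB_DESTROY_BY_RCU)

/* Translate slub_debug option letters into debug flags. */
static unsigned int parse_slub_debug(const char *opts)
{
	unsigned int flags = 0;

	for (; *opts; opts++) {
		switch (*opts) {
		case 'F': flags |= SLAB_DEBUG_FREE; break; /* sanity checks */
		case 'Z': flags |= SLAB_RED_ZONE;   break; /* red zoning */
		case 'P': flags |= SLAB_POISON;     break; /* poisoning */
		case 'U': flags |= SLAB_STORE_USER; break; /* user tracking */
		case 'T': flags |= SLAB_TRACE;      break; /* tracing */
		default:  break;                           /* ignore other letters */
		}
	}
	return flags;
}

/* A cache may be merged only if none of its flags is in the never-merge mask. */
static bool may_merge(unsigned int flags, unsigned int never_merge_mask)
{
	return (flags & never_merge_mask) == 0;
}
```

With the patch, slub_debug=FU no longer disables merging, while slub_debug=FZ still does unless SLAB_RED_ZONE is also dropped from the mask, as suggested for the FZ run.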


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> Machine booted fine with slub_debug=F. Got the following in the
> log. I guess we need to track down who is touching the object after it
> is freed. Any ideas on how I can track it down easily? Is there
> a way to store the last allocation site (function, line#) and look
> around there?

Yes. slub_debug=U. But user tracking may need to increase the slab 
size (depends on the padding available in the slab) to store the 
tracking information, so you may not get the same corruption.

> *** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab
> 0x81017f9f8b80
> offset=672 flags=0x2c7 inuse=42
> freelist=0x810173f172a0
>   Bytes b4 0x810173f17290:  a0 72 f1 73 00 00 00 00 00 00 00 00 00
> 00 00 00 .r\us
> Object 0x810173f172a0:  00 00 00 00 01 81 ff ff 00 00 00 00 00
> 00 00 00 ..\u\u
> FreePointer 0x810173f172a0 -> 0x8101

Same as before.


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 10:03 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> 
> > On Tue, 2007-04-03 at 16:55 -0700, Christoph Lameter wrote:
> > > On Tue, 3 Apr 2007, Badari Pulavarty wrote:
> > > 
> > > > Hmm. booted fine with slub_debug :(
> > > 
> > > Try to selectively disable debug options... if you got the 
> > > time...
> > > 
> > > F.e. Try with sanity checks only
> > > 
> > > slub_debug=F
> > 
> > slub_debug=F got something. 
> 
> Ahh. Seems that the first 4 bytes of the allocation are zapped after 
> the object has been freed. Can you trap writes to the first four bytes of 
> the object? This should give you the culprit.
> 
> The other thing is that the system is performing DMA allocations
> for the file cache. Then it's running out of memory.
> 
> Argh. We use the GFP DMA bitmask to check the SLAB flags field:
> 
> Try this fix:
> 
> 
> 
> SLUB: Use correct flags to check for DMA cache
> 
> We use a GFP mask to check the SLAB flags if this is a DMA cache.
> 
> Fix this by using the correct SLAB mask and then use the SLUB_DMA
> for the ORing of flags. If the system does not support DMA then
> we will OR zero which will hopefully get the compiler to drop the
> useless if statement as well.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.21-rc5-mm4/mm/slub.c
> ===
> --- linux-2.6.21-rc5-mm4.orig/mm/slub.c   2007-04-04 09:59:05.0 -0700
> +++ linux-2.6.21-rc5-mm4/mm/slub.c        2007-04-04 10:01:14.0 -0700
> @@ -678,8 +678,8 @@ static struct page *allocate_slab(struct
>   if (s->order)
>   flags |= __GFP_COMP;
>  
> - if (s->flags & SLUB_DMA)
> - flags |= GFP_DMA;
> + if (s->flags & SLAB_CACHE_DMA)
> + flags |= SLUB_DMA;
>  
>   if (node == -1)
>   page = alloc_pages(flags, s->order);
> 

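A userspace sketch of the flag mix-up fixed above: cache-creation flags (s->flags) and page-allocator GFP flags are separate bit namespaces, so testing one with a bit from the other hits an unrelated flag. All names and bit values here are hypothetical; the collision between CACHE_FLAG_RECLAIM and ALLOC_FLAG_DMA is contrived to mirror the reported symptom of DMA allocations for caches that never asked for them.

```c
/* Hypothetical bit values: the point is only that the two namespaces differ. */
#define CACHE_FLAG_DMA      0x4000u  /* cache-creation flag, lives in s->flags */
#define CACHE_FLAG_RECLAIM  0x0001u  /* unrelated cache flag */
#define ALLOC_FLAG_DMA      0x0001u  /* page-allocator flag, lives in the gfp mask */

struct fake_cache {
	unsigned int flags;
};

/* Buggy: tests a cache-flags word with an allocator bit. */
static unsigned int gfp_for_slab_buggy(const struct fake_cache *s, unsigned int gfp)
{
	if (s->flags & ALLOC_FLAG_DMA)
		gfp |= ALLOC_FLAG_DMA;
	return gfp;
}

/* Fixed: test the cache flag, then OR in the allocator flag. */
static unsigned int gfp_for_slab_fixed(const struct fake_cache *s, unsigned int gfp)
{
	if (s->flags & CACHE_FLAG_DMA)
		gfp |= ALLOC_FLAG_DMA;
	return gfp;
}
```

In the buggy version a reclaimable, non-DMA cache gets DMA pages, while a real DMA cache does not.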

Machine booted fine with slub_debug=F. Got the following in the
log. I guess we need to track down who is touching the object after it
is freed. Any ideas on how I can track it down easily? Is there
a way to store the last allocation site (function, line#) and look
around there?

Thanks,
Badari

*** SLUB: Freepointer corrupt in [EMAIL PROTECTED] Slab
0x81017f9f8b80
offset=672 flags=0x2c7 inuse=42
freelist=0x810173f172a0
  Bytes b4 0x810173f17290:  a0 72 f1 73 00 00 00 00 00 00 00 00 00
00 00 00 .r\us
Object 0x810173f172a0:  00 00 00 00 01 81 ff ff 00 00 00 00 00
00 00 00 ..\u\u
FreePointer 0x810173f172a0 -> 0x8101

Call Trace:
 [] object_err+0x105/0x1b0
 [] check_object+0x1b5/0x1d0
 [] alloc_object_checks+0x64/0x110
 [] kmem_cache_alloc+0xfc/0x1a0
 [] sysfs_create_link+0xb7/0x160
 [] module_add_driver+0x41/0xd0
 [] bus_add_driver+0xce/0x1d0
 [] driver_register+0x5d/0x90
 [] __pci_register_driver+0x68/0xb0
 [] agp_amd64_init+0x36/0xe0
 [] gart_iommu_init+0x4c6/0x560
 [] __wake_up+0x4e/0x70
 [] genl_rcv+0x0/0x70
 [] netlink_kernel_create+0x14c/0x160
 [] genl_unlock+0x10/0x40
 [] pci_iommu_init+0xe/0x20
 [] kernel_init+0x154/0x330
 [] child_rip+0xa/0x12
 [] kernel_init+0x0/0x330
 [] child_rip+0x0/0x12

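A simplified userspace model of the check that fired here, assuming (as the dump suggests) that SLUB keeps the next-free pointer in the first word of a free object. Addresses are treated as plain 64-bit integers and never dereferenced; freepointer_valid() and zap_low_4_bytes() are hypothetical helpers, not the mm/slub.c code, and the 16-byte object size and 256-object slab are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A free pointer is acceptable if it is NULL (end of the freelist) or
 * points at an object boundary inside the same slab.
 */
static bool freepointer_valid(uint64_t slab_base, size_t nr_objects,
			      size_t obj_size, uint64_t fp)
{
	if (fp == 0)
		return true;                              /* end of freelist */
	if (fp < slab_base || fp >= slab_base + nr_objects * obj_size)
		return false;                             /* outside the slab */
	return (fp - slab_base) % obj_size == 0;          /* must be an object start */
}

/* Model a stray 4-byte zero store over the low half of a 64-bit
 * little-endian pointer, like the 00 00 00 00 01 81 ff ff bytes above. */
static uint64_t zap_low_4_bytes(uint64_t value)
{
	return value & ~(uint64_t)0xffffffffu;
}
```

For a pointer at slab offset 0x2a0 (672, matching the offset in the report), the intact value passes, while the zapped value points below the slab and fails, which is the kind of "Freepointer corrupt" report shown above.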



Re: 2.6.21-rc5-mm4

2007-04-04 Thread Valdis . Kletnieks
On Tue, 03 Apr 2007 20:37:42 PDT, Randy Dunlap said:
>
> Good luck.  But the symbols are there.  Just use left/right arrow keys
> to scroll the display left/right and you can see them.  Now if you just
> had that indicator to tell you that you Need to scroll to see more text...

Exactly. :)  I had the incredible bad luck that the line got cut off at the
end of a CONFIG_ symbol that made sense - if it had showed up *half* a symbol,
I'd have gone investigating. ;) (Even a '>' or '<' saying data offscreen to
right or left would be sufficient, if somebody wants a small but productive
kernel (config system actually) task to hack on.)

I'd code it myself, but I have an SL8500 to install, and need to figure out
how my laptop made it into the bag this morning still up and running (I hit
the power button, it seemed to power down - blank screen, power light off,
but syslog msgs prove it was up and running for another 4 hours before it
shut down on a thermal check...)

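A hypothetical sketch of the offscreen indicator suggested above, for a single line scrolled horizontally. render_scrolled() is an illustrative helper, not part of the kconfig/lxdialog code, and it only marks hidden text when at least one column is visible.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the part of `line` visible in a window `width` columns wide, starting
 * at column `offset`, into `out` (which must hold width + 1 bytes). If text is
 * hidden to the left or right, the first or last visible column becomes
 * '<' or '>'.
 */
static void render_scrolled(const char *line, size_t offset, size_t width, char *out)
{
	size_t len = strlen(line);
	size_t n;

	if (offset > len)
		offset = len;
	n = len - offset;
	if (n > width)
		n = width;
	memcpy(out, line + offset, n);
	out[n] = '\0';

	if (n > 0 && offset > 0)
		out[0] = '<';              /* more text to the left */
	if (n > 0 && offset + width < len)
		out[n - 1] = '>';          /* more text to the right */
}
```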




Re: 2.6.21-rc5-mm4 (SLUB powerpc)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> Next issue ? Sorry.

No problem. Could you have a look at the hvsi driver and figure out what is 
failing there? What is the hvsi driver?

> Console: switching to colour frame buffer device 80x30
> fb0: MATROX frame buffer device
> matroxfb_crtc2: secondary head of fb0 was registered as fb1
> Kernel panic - not syncing: Couldn't register hvsi console driver

Framebuffer allocation failure



Re: 2.6.21-rc5-mm4 (SLUB powerpc)

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 10:13 -0700, Christoph Lameter wrote:
> On Wed, 4 Apr 2007, Badari Pulavarty wrote:
> 
> > Well !! Helps a little, but not enough to boot (it hangs a little later) :(
> > I will try to get stack trace for that.
> 
> Great! Thanks for all the debugging help.
> 
>  
> > Processor 6 found.
> > Processor 7 found.
> > Brought up 8 CPUs
> > mm/memory.c:111: bad pud c000f20c0480.
> 
> Hmmm... Checking for slabs used in powerpc arch code:
> 
> The pgtable cache is configured as
> 
> 
>   pgtable_cache[i] = kmem_cache_create(name,
>  size, size,
>  SLAB_HWCACHE_ALIGN |
>  SLAB_MUST_HWCACHE_ALIGN,
>  zero_ctor,
>  NULL);
> 
> Hmmm, slabs aligned at size, and then we also MUST_HWCACHE_ALIGN?? Two 
> competing alignment requirements and a constructor. The constructor requires
> the free pointer to be moved after the object, which increases the object 
> size.
> 
> Sigh. If SLAB_HWCACHE_ALIGN is set then SLUB believes this to be the 
> ultimate demand that overrides all other alignments and only aligns to the 
> cacheline. Try the following fix:
> 
> 
> 
> SLUB: Treat SLAB_HWCACHE_ALIGN as a minimum and not as *the* alignment
> 
> If the specified alignment is higher than L1_CACHE_BYTES and
> SLAB_HWCACHE_ALIGN is set then use the higher alignment.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.21-rc5-mm4/mm/slub.c
> ===========
> --- linux-2.6.21-rc5-mm4.orig/mm/slub.c   2007-04-04 10:09:20.0 -0700
> +++ linux-2.6.21-rc5-mm4/mm/slub.c        2007-04-04 10:09:42.0 -0700
> @@ -1373,10 +1373,7 @@ static int calculate_order(int size)
>  static unsigned long calculate_alignment(unsigned long flags,
>   unsigned long align)
>  {
> - if (flags & SLAB_HWCACHE_ALIGN)
> - return L1_CACHE_BYTES;
> -
> - if (flags & SLAB_MUST_HWCACHE_ALIGN)
> + if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
>   return max_t(unsigned long, align, L1_CACHE_BYTES);
>  
>   if (align < ARCH_SLAB_MINALIGN)

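A userspace sketch of the behaviour change in calculate_alignment(). L1_CACHE_BYTES is set to 128 to match the dcache_line_size of 0x80 in the ppc64 boot log; the flag bit values and ARCH_SLAB_MINALIGN = 8 are assumptions, and because the tail of the real function is not shown in the patch, the sketch simply clamps small alignments to ARCH_SLAB_MINALIGN.

```c
#define L1_CACHE_BYTES          128u   /* dcache_line_size 0x80 on this box */
#define ARCH_SLAB_MINALIGN      8u     /* assumed */
#define SLAB_HWCACHE_ALIGN      0x1u   /* hypothetical bit values */
#define SLAB_MUST_HWCACHE_ALIGN 0x2u

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Old behaviour: HWCACHE_ALIGN wins outright and discards the requested align. */
static unsigned long calculate_alignment_old(unsigned long flags, unsigned long align)
{
	if (flags & SLAB_HWCACHE_ALIGN)
		return L1_CACHE_BYTES;
	if (flags & SLAB_MUST_HWCACHE_ALIGN)
		return max_ul(align, L1_CACHE_BYTES);
	return max_ul(align, ARCH_SLAB_MINALIGN);
}

/* Patched behaviour: either flag makes the cache line a minimum, not a cap. */
static unsigned long calculate_alignment_new(unsigned long flags, unsigned long align)
{
	if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
		return max_ul(align, L1_CACHE_BYTES);
	return max_ul(align, ARCH_SLAB_MINALIGN);
}
```

For a cache created with align equal to an assumed 4096-byte object size and both flags set, the old code quietly dropped the alignment to 128 bytes; the patched code keeps 4096.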
Next issue ? Sorry.

Thanks,
Badari

Allocated 0x0040 bytes for executable @ 0x0040
   Elf32 kernel loaded...

zImage starting: loaded at 0x0040 (sp: 0x01a3fb10)
Allocating 0x822c40 bytes for kernel ...
OF version = 'IBM,SF225_096'
gunzipping (0x01c0 <- 0x00408000:0x006a8eac)...done 0x75cdf0 bytes
Finalizing device tree... using OF tree (promptr=00c39a50)
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: root=/dev/sda2 xmon=on slub_debug
memory layout at init:
  alloc_bottom : 02427000
  alloc_top: 0800
  alloc_top_hi : 0001e800
  rmo_top  : 0800
  ram_top  : 0001e800
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL 
PROTECTED]/[EMAIL PROTECTED],
opening ... done
instantiating rtas at 0x077ca000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x02428000 -> 0x024292fe
Device tree struct  0x0242a000 -> 0x0243f000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #8 SMP Wed Apr 4 10:21:43 PDT 2007
-
ppc64_pft_size= 0x1b
physicalMemorySize= 0x1e800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0xf
-
Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0
(SUSE Linux)) #8 SMP Wed Apr 4 10:21:43 PDT 2007
[boot]0012 Setup Arch
No ramdisk, default root is /dev/sda2
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 ->  1998848
  Normal1998848 ->  1998848
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0:0 ->   974848
1:   974848 ->  1998848
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 1971520
Kernel command line: root=/dev/sda2 xmon=on slub_debug
[boot]0020 

Re: 2.6.21-rc5-mm4

2007-04-04 Thread Eric W. Biederman
Jiri Kosina <[EMAIL PROTECTED]> writes:

> On Tue, 3 Apr 2007, Jiri Kosina wrote:
>
>> > we're also having problems reproducing it on that same combination 
>> > (2.6.21-rc4 + my tree), so it points to something in -mm. Since your 
>> > trace is completely different right now it looks like something else 
>> > is fuzzing it up. Since the e1000 changes are in rc5-mm3 as well, that 
>> > might help to narrow it down quickly.
>> I don't know (yet) whether rc5-mm3 was OK in this respect, I didn't boot 
>> it on this machine. I only know that both rc5 and rc5 + e1000 tree are 
>> OK, but rc5-mm4 panics on ifconfig/dhclient on e1000 card immediately on 
>> my system.
>> I will start bisection when I get back to the respective machine 
>> (tomorrow) and will let you know.
>
> And the bisection winner is
>
>   i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch
>
> I don't immediately see how it could be causing it, so adding CCs which 
> are listed in the patch.

Weird.  I will have to look at that in a little more detail.

Do you know if this problem happens on x86_64?
What does your .config look like?
What does /proc/interrupts look like?
What kind of hardware you running this kernel on?
Can anyone else reproduce this?

The oops clearly shows something using -1 and calling that as an
address. I don't know why, but I'm guessing I have triggered a memory
stomp somewhere.  I think this is the first time I have seen a small
negative number causing a NULL pointer dereference.

That patch looks innocuous enough that either:
- I just missed changing something I should have.
- Your configuration has an increase in NR_IRQS and that triggered
  something.
- The patch simply permuted things so a memory stomp now happens
  on the e1000 data structures instead of somewhere else.
- Something doesn't like large irq numbers.

This work is essentially a backport from x86_64, so if your hardware
is 64-bit capable, testing an x86_64 kernel should be fairly easy and
would rule out large irq numbers as the culprit.

Until I get a good look at -mm I'm going to have a hard time guessing.
But a roving memory stomp is my best guess.


Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc5-mm4 (SLUB powerpc)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> Well !! Helps a little, but not enough to boot (hangs little later) :(
> I will try to get stack trace for that.

Great! Thanks for all the debugging help.

 
> Processor 6 found.
> Processor 7 found.
> Brought up 8 CPUs
> mm/memory.c:111: bad pud c000f20c0480.

Hmmm... Checking for slabs used in powerpc arch code:

The pgtable cache is configured as


  pgtable_cache[i] = kmem_cache_create(name,
 size, size,
 SLAB_HWCACHE_ALIGN |
 SLAB_MUST_HWCACHE_ALIGN,
 zero_ctor,
 NULL);

Hmmm, objects aligned at their size, and then we also MUST_HWCACHE_ALIGN?? Two
competing alignment requirements and a constructor. The constructor requires
moving the free pointer after the object, which increases the object
size.

Sigh. If SLAB_HWCACHE_ALIGN is set, SLUB treats it as the ultimate
demand that overrides all other alignments and aligns only to the
cacheline. Try the following fix:



SLUB: Treat SLAB_HWCACHE_ALIGN as a minimum and not as *the* alignment

If the specified alignment is higher than L1_CACHE_BYTES and
SLAB_HWCACHE_ALIGN is set then use the higher alignment.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===========
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-04 10:09:20.00000 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c  2007-04-04 10:09:42.0 -0700
@@ -1373,10 +1373,7 @@ static int calculate_order(int size)
 static unsigned long calculate_alignment(unsigned long flags,
unsigned long align)
 {
-   if (flags & SLAB_HWCACHE_ALIGN)
-   return L1_CACHE_BYTES;
-
-   if (flags & SLAB_MUST_HWCACHE_ALIGN)
+   if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
return max_t(unsigned long, align, L1_CACHE_BYTES);
 
if (align < ARCH_SLAB_MINALIGN)


Re: 2.6.21-rc5-mm4 (SLUB)

2007-04-04 Thread Christoph Lameter
On Wed, 4 Apr 2007, Badari Pulavarty wrote:

> On Tue, 2007-04-03 at 16:55 -0700, Christoph Lameter wrote:
> > On Tue, 3 Apr 2007, Badari Pulavarty wrote:
> > 
> > > Hmm. booted fine with slub_debug :(
> > 
> > Try to selectively disable debug options... if you got the 
> > time...
> > 
> > F.e. Try with sanity checks only
> > 
> > slub_debug=F
> 
> slub_debug=F got something. 

Ahh. It seems that the first 4 bytes of the allocation are zapped after
the object has been freed. Can you trap writes to the first four bytes of
the object? That should reveal the culprit.

The other thing is that the system is performing DMA allocations
for the file cache. Then it's running out of memory.

Argh. We use a GFP DMA bitmask to check the SLAB flags field:

Try this fix:



SLUB: Use correct flags to check for DMA cache

We use a GFP mask to check the SLAB flags to see whether this is a DMA cache.

Fix this by using the correct SLAB mask and then using SLUB_DMA
when ORing the flags. If the system does not support DMA then
we will OR in zero, which will hopefully get the compiler to drop the
useless if statement as well.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/mm/slub.c
===========
--- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-04 09:59:05.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/slub.c  2007-04-04 10:01:14.0 -0700
@@ -678,8 +678,8 @@ static struct page *allocate_slab(struct
if (s->order)
flags |= __GFP_COMP;
 
-   if (s->flags & SLUB_DMA)
-   flags |= GFP_DMA;
+   if (s->flags & SLAB_CACHE_DMA)
+   flags |= SLUB_DMA;
 
if (node == -1)
page = alloc_pages(flags, s->order);




Re: 2.6.21-rc5-mm4

2007-04-04 Thread Jiri Kosina
On Tue, 3 Apr 2007, Jiri Kosina wrote:

> > we're also having problems reproducing it on that same combination 
> > (2.6.21-rc4 + my tree), so it points to something in -mm. Since your 
> > trace is completely different right now it looks like something else 
> > is fuzzing it up. Since the e1000 changes are in rc5-mm3 as well, that 
> > might help to narrow it down quickly.
> I don't know (yet) whether rc5-mm3 was OK in this respect, I didn't boot 
> it on this machine. I only know that both rc5 and rc5 + e1000 tree are 
> OK, but rc5-mm4 panics on ifconfig/dhclient on e1000 card immediately on 
> my system.
> I will start bisection when I get back to the respective machine 
> (tomorrow) and will let you know.

And the bisection winner is

i386-irq-kill-nr_irq_vectors-and-increase-nr_irqs.patch

I don't immediately see how it could be causing it, so adding CCs which 
are listed in the patch.

Original description of the symptoms at http://lkml.org/lkml/2007/4/3/90

-- 
Jiri Kosina


Re: 2.6.21-rc5-mm4

2007-04-04 Thread Badari Pulavarty
On Tue, 2007-04-03 at 18:16 -0700, Christoph Lameter wrote:
> On Tue, 3 Apr 2007, Badari Pulavarty wrote:
> 
> > Seems to be an issue with calibrate_delay() spinning in a tight
> > loop :(
> > 
> > BTW, machine boots fine with SLAB code - not sure why ?
> 
> Interrupt disabled sigh.
> 
> Here is the fix:
> 
> 
> 
> 
> SLUB: Fix numa bootstrap
> 
> NUMA bootstrap calls new_slab() if more than one node is found on bootup.
> new_slab() assumes a standard slab context where interrupts must be
> disabled. It enables interrupts for the call into the page allocator
> and then disables them again. Interrupts do not have to be disabled
> during bootstrap because we still run single threaded there.
> 
> I dropped the interrupt preservation code just before SLUB v6 because
> it looked useless there. SLUB worked on the following NUMA tests
> that just had a single node. Sigh.
> 
> Enable interrupts after calling new_slab.
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> Index: linux-2.6.21-rc5-mm4/mm/slub.c
> ===
> --- linux-2.6.21-rc5-mm4.orig/mm/slub.c   2007-04-03 18:07:41.0 -0700
> +++ linux-2.6.21-rc5-mm4/mm/slub.c        2007-04-03 18:08:17.0 -0700
> @@ -1436,6 +1436,8 @@ static int init_kmem_cache_nodes(struct 
>  
>   BUG_ON(s->size < sizeof(struct kmem_cache_node));
>   page = new_slab(kmalloc_caches, gfpflags, node);
> + /* new_slab() disables interupts */
> + local_irq_enable();
>  
>   BUG_ON(!page);
>   n = page->freelist;

Well !! Helps a little, but not enough to boot (hangs little later) :(
I will try to get stack trace for that.

Thanks,
Badari

boot: 2621rc5mm4
Please wait, loading kernel...
Allocated 0x0040 bytes for executable @ 0x0040
   Elf32 kernel loaded...

zImage starting: loaded at 0x0040 (sp: 0x01a3fb10)
Allocating 0x826c40 bytes for kernel ...
OF version = 'IBM,SF225_096'
gunzipping (0x01c0 <- 0x00408000:0x006a8e52)...done 0x760df0 bytes
Finalizing device tree... using OF tree (promptr=00c39a50)
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: root=/dev/sda2
memory layout at init:
  alloc_bottom : 0242b000
  alloc_top: 0800
  alloc_top_hi : 0001e800
  rmo_top  : 0800
  ram_top  : 0001e800
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL 
PROTECTED]/[EMAIL PROTECTED],
opening ... done
instantiating rtas at 0x077ca000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0242c000 -> 0x0242d2fe
Device tree struct  0x0242e000 -> 0x02443000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #7 SMP Wed Apr 4 07:52:49 PDT 2007
-
ppc64_pft_size= 0x1b
physicalMemorySize= 0x1e800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0xf
-
Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0
(SUSE Linux)) #7 SMP Wed Apr 4 07:52:49 PDT 2007
[boot]0012 Setup Arch
No ramdisk, default root is /dev/sda2
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->  1998848
  Normal    1998848 ->  1998848
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->   974848
    1:   974848 ->  1998848
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 1971520
Kernel command line: root=/dev/sda2
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 7855384k/7995392k available (6064k kernel code, 140008k
reserved, 1236k data, 819k bss, 272k init)
SLUB V6: General Slabs=18, HW alignment=128, Processors=8, Nodes=16
Calibrating delay loop...475.13 BogoMIPS (lpj=2375680)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
Processor 1 found.
Processor 2 found.
Processor 3 found.
Processor 4 found.
Processor 5 found.
Processor 6 found.
Processor 7 found.
Brought up 8 CPUs
mm/memory.c:111: bad pud c000f20c0480.
could not vmalloc 20971520 bytes

Re: 2.6.21-rc5-mm4

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 08:12 -0700, Badari Pulavarty wrote:
> On Tue, 2007-04-03 at 18:16 -0700, Christoph Lameter wrote:
> > On Tue, 3 Apr 2007, Badari Pulavarty wrote:
> > 
> > > Seems to be an issue with calibrate_delay() spinning in a tight
> > > loop :(
> > > 
> > > BTW, machine boots fine with SLAB code - not sure why ?
> > 
> > Interrupt disabled sigh.
> > 
> > Here is the fix:
> > 
> > 
> > 
> > 
> > SLUB: Fix numa bootstrap
> > 
> > NUMA bootstrap calls new_slab() if more than one node is found on bootup.
> > new_slab() assumes a standard slab context where interrupts must be
> > disabled. It enables interrupts for the call into the page allocator
> > and then disables them again. Interrupts do not have to be disabled
> > during bootstrap because we still run single threaded there.
> > 
> > I dropped the interrupt preservation code just before SLUB v6 because
> > it looked useless there. SLUB worked on the following NUMA tests
> > that just had a single node. Sigh.
> > 
> > Enable interrupts after calling new_slab.
> > 
> > Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> > 
> > Index: linux-2.6.21-rc5-mm4/mm/slub.c
> > ===
> > --- linux-2.6.21-rc5-mm4.orig/mm/slub.c 2007-04-03 18:07:41.0 -0700
> > +++ linux-2.6.21-rc5-mm4/mm/slub.c  2007-04-03 18:08:17.0 -0700
> > @@ -1436,6 +1436,8 @@ static int init_kmem_cache_nodes(struct 
> >  
> >   BUG_ON(s->size < sizeof(struct kmem_cache_node));
> >   page = new_slab(kmalloc_caches, gfpflags, node);
> > +   /* new_slab() disables interupts */
> > +   local_irq_enable();
> >  
> >   BUG_ON(!page);
> >   n = page->freelist;
> 
> Well !! Helps a little, but not enough to boot (hangs little later) :(
> I will try to get stack trace for that.

Here is better debug output, booted with slub_debug.
Hope this helps.

Thanks,
Badari

boot: 2621rc5mm4 xmon=on slub_debug
Please wait, loading kernel...
Allocated 0x0040 bytes for executable @ 0x0040
   Elf32 kernel loaded...

zImage starting: loaded at 0x0040 (sp: 0x01a3fb10)
Allocating 0x826c40 bytes for kernel ...
OF version = 'IBM,SF225_096'
gunzipping (0x01c0 <- 0x00408000:0x006a8e52)...done 0x760df0 bytes
Finalizing device tree... using OF tree (promptr=00c39a50)
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: root=/dev/sda2 xmon=on slub_debug
memory layout at init:
  alloc_bottom : 0242b000
  alloc_top: 0800
  alloc_top_hi : 0001e800
  rmo_top  : 0800
  ram_top  : 0001e800
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL 
PROTECTED]/[EMAIL PROTECTED],
opening ... done
instantiating rtas at 0x077ca000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0242c000 -> 0x0242d2fe
Device tree struct  0x0242e000 -> 0x02443000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #7 SMP Wed Apr 4 07:52:49 PDT 2007
-
ppc64_pft_size= 0x1b
physicalMemorySize= 0x1e800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0xf
-
Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0
(SUSE Linux)) #7 SMP Wed Apr 4 07:52:49 PDT 2007
[boot]0012 Setup Arch
No ramdisk, default root is /dev/sda2
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA             0 ->  1998848
  Normal    1998848 ->  1998848
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0:        0 ->   974848
    1:   974848 ->  1998848
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 1971520
Kernel command line: root=/dev/sda2 xmon=on slub_debug
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
freeing bootmem node 0
freeing bootmem node 1
Memory: 7855384k/7995392k available (6064k kernel code, 140008k
reserved, 1236k data, 819k bss, 272k init)
SLUB V6: General Slabs=18, HW alignment=128, Processors=8, Nodes=16
Calibrating delay loop... 475.13 BogoMIPS (lpj=2375680)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
Processor 1 found.
Processor 2

Re: 2.6.21-rc5-mm4 (SLUB powerpc)

2007-04-04 Thread Badari Pulavarty
On Wed, 2007-04-04 at 10:13 -0700, Christoph Lameter wrote:
 On Wed, 4 Apr 2007, Badari Pulavarty wrote:
 
  Well !! Helps a little, but not enough to boot (hangs little later) :(
  I will try to get stack trace for that.
 
 Great! Thanks for all the debugging help.
 
  
  Processor 6 found.
  Processor 7 found.
  Brought up 8 CPUs
  mm/memory.c:111: bad pud c000f20c0480.
 
 Hmmm... Checking for slabs used in powerpc arch code:
 
 The pgtable cache is configured as
 
 
   pgtable_cache[i] = kmem_cache_create(name,
  size, size,
  SLAB_HWCACHE_ALIGN |
  SLAB_MUST_HWCACHE_ALIGN,
  zero_ctor,
  NULL);
 
 Hmmm aligned slabs at size and then we MUST_HWCACHE_ALIGN?? Two 
 competing alignment requirements and a constructor. Constructor requires
 the moving of the free pointer after the slab and thus increases the slab 
 size.
 
 Sigh. IF SLAB_HWCACHE_ALIGN is set then SLUB believes this to be the 
 ultimate demand that overrides all other alignments and only aligns to the 
 cacheline. Try the following fix:
 
 
 
 SLUB: Treat SLAB_HWCACHE_ALIGN as a mininum and not as *the* alignment
 
 If the specified alignment is higher than L1_CACHE_BYTES and
 SLAB_HWCACHE_ALIGN is set then use the higher alignment.
 
 Signed-off-by: Christoph Lameter [EMAIL PROTECTED]
 
 Index: linux-2.6.21-rc5-mm4/mm/slub.c
 ===
 --- linux-2.6.21-rc5-mm4.orig/mm/slub.c	2007-04-04 10:09:20.0 -0700
 +++ linux-2.6.21-rc5-mm4/mm/slub.c	2007-04-04 10:09:42.0 -0700
 @@ -1373,10 +1373,7 @@ static int calculate_order(int size)
  static unsigned long calculate_alignment(unsigned long flags,
  		unsigned long align)
  {
 -	if (flags & SLAB_HWCACHE_ALIGN)
 -		return L1_CACHE_BYTES;
 -
 -	if (flags & SLAB_MUST_HWCACHE_ALIGN)
 +	if (flags & (SLAB_MUST_HWCACHE_ALIGN | SLAB_HWCACHE_ALIGN))
  		return max_t(unsigned long, align, L1_CACHE_BYTES);
  
  	if (align < ARCH_SLAB_MINALIGN)

Next issue? Sorry.

Thanks,
Badari

Allocated 0x0040 bytes for executable @ 0x0040
   Elf32 kernel loaded...

zImage starting: loaded at 0x0040 (sp: 0x01a3fb10)
Allocating 0x822c40 bytes for kernel ...
OF version = 'IBM,SF225_096'
gunzipping (0x01c0 - 0x00408000:0x006a8eac)...done 0x75cdf0 bytes
Finalizing device tree... using OF tree (promptr=00c39a50)
OF stdout device is: /vdevice/[EMAIL PROTECTED]
Hypertas detected, assuming LPAR !
command line: root=/dev/sda2 xmon=on slub_debug
memory layout at init:
  alloc_bottom : 02427000
  alloc_top: 0800
  alloc_top_hi : 0001e800
  rmo_top  : 0800
  ram_top  : 0001e800
Looking for displays
found display   : /[EMAIL PROTECTED]/[EMAIL PROTECTED],2/[EMAIL PROTECTED]/[EMAIL PROTECTED],
opening ... done
instantiating rtas at 0x077ca000 ... done
 : boot cpu 
0002 : starting cpu hw idx 0002... done
0004 : starting cpu hw idx 0004... done
0006 : starting cpu hw idx 0006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x02428000 - 0x024292fe
Device tree struct  0x0242a000 - 0x0243f000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #8 SMP Wed Apr 4 10:21:43 PDT 2007
-
ppc64_pft_size= 0x1b
physicalMemorySize= 0x1e800
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address  = 0x
htab_hash_mask= 0xf
-
Linux version 2.6.21-rc5-mm4-ppc64 ([EMAIL PROTECTED]) (gcc version 4.1.0 (SUSE Linux)) #8 SMP Wed Apr 4 10:21:43 PDT 2007
[boot]0012 Setup Arch
No ramdisk, default root is /dev/sda2
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
  DMA 0 -  1998848
  Normal1998848 -  1998848
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
0:0 -   974848
1:   974848 -  1998848
[boot]0015 Setup Done
Built 2 zonelists.  Total pages: 1971520
Kernel command line: root=/dev/sda2 xmon=on slub_debug
[boot]0020 XICS Init
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
Console: colour dummy device 80x25
console handover: boot [udbg-1] - real [hvc0]
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
freeing bootmem node
