Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-20 Thread Oleg Nesterov
On 05/20, [EMAIL PROTECTED] wrote:
> 
> I've done some more tests and, quite frankly, I think this is really related
> to the dreaded ''fglrx.ko'' module. It seems to be much easier to reproduce
> the problem when that damn module is loaded. It does use a workqueue. There
> is also another driver loaded, ipw3945, which requires running the binary-only
> ''ipw3945d'' daemon just to start using the wireless driver
> ...
> 
> Either way, both of these kernel modules are workqueue users.
> 
> Btw, I also tested a kernel (compiled from the same source) on a
> different laptop (EVO N800v), single core, Pentium M 2GHz. That kernel does
> not freeze on shutdown, and even looping nfs kernel stop/start does not cause
> any kernel panic as it does on the nx9420 (Dual Core) laptop. And that is with
> or without any patch from Oleg applied. :((

Great. Even if not a bugfix, this patch is a reasonable cleanup anyway.

Thank you very much for additional testing and report!

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-20 Thread J. Bruce Fields
On Sun, May 20, 2007 at 01:37:13PM +0300, [EMAIL PROTECTED] wrote:
> Hello Oleg,
> 
> I've done some more tests and, quite frankly, I think this is really related
> to the dreaded ''fglrx.ko'' module. It seems to be much easier to reproduce
> the problem when that damn module is loaded. It does use a workqueue. There
> is also another driver loaded, ipw3945, which requires running the binary-only
> ''ipw3945d'' daemon just to start using the wireless driver
> ...
> 
> Either way, both of these kernel modules are workqueue users.

Have you ever been able to reproduce the problem on a kernel that never
had those modules loaded?

--b.


Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-20 Thread zilvinas

Hello Oleg,

I've done some more tests and, quite frankly, I think this is really related
to the dreaded ''fglrx.ko'' module. It seems to be much easier to reproduce
the problem when that damn module is loaded. It does use a workqueue. There
is also another driver loaded, ipw3945, which requires running the binary-only
''ipw3945d'' daemon just to start using the wireless driver
...

Either way, both of these kernel modules are workqueue users.

Btw, I also tested a kernel (compiled from the same source) on a
different laptop (EVO N800v), single core, Pentium M 2GHz. That kernel does
not freeze on shutdown, and even looping nfs kernel stop/start does not cause
any kernel panic as it does on the nx9420 (Dual Core) laptop. And that is with
or without any patch from Oleg applied. :((

I think this time I really need to stop here; the kernel was tainted for
a reason. :(((


Thank you both, Oleg and Andrew.

Zilvinas "Lucky ATI fglrx owner" Valinskas

On Sat, 19 May 2007, Oleg Nesterov wrote:

> On 05/18, Zilvinas Valinskas wrote:
> >
> > On Thu, 2007-05-17 at 22:45 +0400, Oleg Nesterov wrote:
> > >
> > > However, I can't understand why cleanup_workqueue_thread() hangs anyway.
> > > It shouldn't. Looks like rpciod/1 was preempted, and can't get CPU. According
> > > to kernel-nfs-freeze.log it is TASK_RUNNING. Strange.
> > >
> > > It is very sad, because this code was supposed to be cleaned up anyway,
> > > but if it is really buggy, it would be great to know why.
> >
> > Can this be related to:
> >
> > CONFIG_PREEMPT=y
>
> Yes, but this preemption should be very unlikely, yet it happens every time
> for you, which is strange. lockd in turn spins with preemption enabled, but
> somehow rpciod/1 can't make progress. system_state == SYSTEM_HALT, but this
> shouldn't affect preempt_schedule_irq(). So I think there is something else.
>
> > workqueue.objdump - without any patch.
>
> So it hangs waiting for cwq->thread == NULL, as expected.
>
> OK. I still can't see how this code could be wrong, but it is bad anyway and
> should be changed. The 2nd patch was done more than a month ago, but was
> delayed for some stupid reasons. I'll send it today.
>
> Still, it is not clear to me what happens, and you have other crashes with
> nfs stop/start
>
> http://marc.info/?l=linux-kernel&m=117939027602591
> http://marc.info/?l=linux-kernel&m=117939257630947
>
> which probably need some attention.
>
> Thanks!
>
> Oleg.





Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-18 Thread Oleg Nesterov
On 05/18, Zilvinas Valinskas wrote:
>
> On Thu, 2007-05-17 at 22:45 +0400, Oleg Nesterov wrote:
> >
> > However, I can't understand why cleanup_workqueue_thread() hangs anyway.
> > It shouldn't. Looks like rpciod/1 was preempted, and can't get CPU. According
> > to kernel-nfs-freeze.log it is TASK_RUNNING. Strange.
> >
> > It is very sad, because this code was supposed to be cleaned up anyway,
> > but if it is really buggy, it would be great to know why.
>
> Can this be related to:
>
> CONFIG_PREEMPT=y

Yes, but this preemption should be very unlikely, yet it happens every time
for you, which is strange. lockd in turn spins with preemption enabled, but
somehow rpciod/1 can't make progress. system_state == SYSTEM_HALT, but this
shouldn't affect preempt_schedule_irq(). So I think there is something else.

> workqueue.objdump - without any patch.

So it hangs waiting for cwq->thread == NULL, as expected.
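
For readers following along, the wait described above can be pictured with a small userspace analogy. This is only a sketch under stated assumptions, not the kernel's workqueue code: `fake_cwq`, `fake_worker`, `fake_cleanup` and `fake_demo` are hypothetical names, pthreads stand in for kernel threads, and `sched_yield()` stands in for the kernel's busy-wait. The point it illustrates is that the side doing the cleanup can only finish once the worker has actually run far enough to clear its own pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU workqueue; not the kernel's struct. */
struct fake_cwq {
    _Atomic(pthread_t *) thread;  /* non-NULL while the worker is alive */
    atomic_int should_stop;       /* set by the cleanup side to request exit */
};

/* Worker: runs until asked to stop, then clears cwq->thread as its last act. */
static void *fake_worker(void *arg)
{
    struct fake_cwq *cwq = arg;

    while (!atomic_load(&cwq->should_stop))
        sched_yield();            /* pretend to process work items */

    atomic_store(&cwq->thread, NULL);
    return NULL;
}

/*
 * Cleanup side: requests the stop, then polls until the worker has cleared
 * cwq->thread. Returns the number of polling iterations it needed. If the
 * worker never gets CPU time, this loop never terminates.
 */
static unsigned long fake_cleanup(struct fake_cwq *cwq)
{
    unsigned long spins = 0;

    atomic_store(&cwq->should_stop, 1);
    while (atomic_load(&cwq->thread) != NULL) {
        sched_yield();            /* unlike a pure spin, this yields the CPU */
        spins++;
    }
    return spins;
}

/* Start one worker, tear it down, and return 0 if the pointer was cleared. */
static int fake_demo(void)
{
    static pthread_t tid;
    struct fake_cwq cwq;

    atomic_init(&cwq.thread, &tid);
    atomic_init(&cwq.should_stop, 0);
    if (pthread_create(&tid, NULL, fake_worker, &cwq) != 0)
        return -1;

    fake_cleanup(&cwq);
    pthread_join(tid, NULL);
    return atomic_load(&cwq.thread) == NULL ? 0 : 1;
}
```

Calling `fake_demo()` from a `main()` (built with `-pthread`) should return 0 on a healthy system, because the worker is always scheduled eventually. The hang reported in this thread corresponds to the case where the worker stays runnable but never makes progress, so the polling loop never sees NULL.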

OK. I still can't see how this code could be wrong, but it is bad anyway and
should be changed. The 2nd patch was done more than a month ago, but was
delayed for some stupid reasons. I'll send it today.

Still, it is not clear to me what happens, and you have other crashes with
nfs stop/start

http://marc.info/?l=linux-kernel&m=117939027602591
http://marc.info/?l=linux-kernel&m=117939257630947

which probably need some attention.

Thanks!

Oleg.



Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-18 Thread Andrew Morton
On Fri, 18 May 2007 15:17:36 +0300 Zilvinas Valinskas <[EMAIL PROTECTED]> wrote:

> Have found this in dmesg (well earlier because of initcall_debug) I've
> never noticed that during boot (scrolls away too fast). Anyway -
> 
> [7.841871] NetLabel: Initializing
> [7.841983] NetLabel:  domain hash size = 128
> [7.842095] NetLabel:  protocols = UNLABELED CIPSOv4
> [7.842219] NetLabel:  unlabeled traffic allowed by default
> [7.842338] BUG: at include/linux/slub_def.h:77 kmalloc_index()
> [7.842451] 
> [7.842452] Call Trace:
> [7.842677]  [8029215c] get_slab+0x1cc/0x260
> [7.842791]  [8029229d] __kmalloc+0xd/0x80
> [7.842907]  [802219ee] cache_k8_northbridges+0x7e/0x100
> [7.843024]  [8062bd13] gart_iommu_init+0x33/0x5b0
> [7.843140]  [8049f836] netlbl_unlabel_acceptflg_set+0x86/0xf0
> [7.843255]  [80626f49] pci_iommu_init+0x9/0x20
> [7.843370]  [806216d7] kernel_init+0x157/0x330
> [7.843485]  [8020b0f8] child_rip+0xa/0x12
> [7.843601]  [80373fd8] acpi_ds_init_one_object+0x0/0x7c
> [7.843715]  [80621580] kernel_init+0x0/0x330
> [7.843829]  [8020b0ee] child_rip+0x0/0x12
> [7.843941] 
> [7.844056] PCI-GART: No AMD northbridge found.


yup, thanks - the below patch will be in this evening's batch -> Linus.



From: Ben Collins <[EMAIL PROTECTED]>

kmalloc for flush_words resulted in zero size allocation when no
k8_northbridges existed.  Short circuit the code path for this case.

Also remove unneeded zeroing of num_k8_northbridges just after checking if
it is zero.

Signed-off-by: Ben Collins <[EMAIL PROTECTED]>
Cc: Andi Kleen <[EMAIL PROTECTED]>
Cc: Dave Jones <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 arch/x86_64/kernel/k8.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/k8.c~avoid-zero-size-allocation-in-cache_k8_northbridges arch/x86_64/kernel/k8.c
--- a/arch/x86_64/kernel/k8.c~avoid-zero-size-allocation-in-cache_k8_northbridges
+++ a/arch/x86_64/kernel/k8.c
@@ -39,10 +39,10 @@ int cache_k8_northbridges(void)
 {
 	int i;
 	struct pci_dev *dev;
+
 	if (num_k8_northbridges)
 		return 0;
 
-	num_k8_northbridges = 0;
 	dev = NULL;
 	while ((dev = next_k8_northbridge(dev)) != NULL)
 		num_k8_northbridges++;
@@ -52,6 +52,11 @@ int cache_k8_northbridges(void)
 	if (!k8_northbridges)
 		return -ENOMEM;
 
+	if (!num_k8_northbridges) {
+		k8_northbridges[0] = NULL;
+		return 0;
+	}
+
 	flush_words = kmalloc(num_k8_northbridges * sizeof(u32), GFP_KERNEL);
 	if (!flush_words) {
 		kfree(k8_northbridges);
_

> Does this backtrace look sane? Hmm, netlabel code mixes with
> acpi_ds_init_one_object() ... Strange.

Backtraces can be pretty messy nowadays.  CONFIG_FRAME_POINTER helps
improve them.
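
The short-circuit in the patch quoted in this message can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming a NULL-terminated device array with `count + 1` slots plus one per-device word buffer. `bridge_cache`, `bridge_cache_init` and `bridge_cache_free` are hypothetical names, and `malloc` stands in for `kmalloc`. It shows the idea of returning early when nothing was found, so that no zero-size request ever reaches the allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container; the names are illustrative, not the kernel's. */
struct bridge_cache {
    int count;              /* number of devices found */
    void **devices;         /* NULL-terminated array, count + 1 slots */
    uint32_t *flush_words;  /* one word per device, NULL when count == 0 */
};

/*
 * Fill the cache for `count` devices. Returns 0 on success and -1 on a
 * negative count or allocation failure. When count is zero the per-device
 * buffer is never requested, so no zero-size allocation is attempted.
 */
static int bridge_cache_init(struct bridge_cache *c, int count)
{
    c->count = 0;
    c->devices = NULL;
    c->flush_words = NULL;
    if (count < 0)
        return -1;

    c->devices = malloc(((size_t)count + 1) * sizeof(*c->devices));
    if (!c->devices)
        return -1;
    c->count = count;

    if (count == 0) {
        c->devices[0] = NULL;   /* only the terminator is stored */
        return 0;
    }

    c->flush_words = malloc((size_t)count * sizeof(*c->flush_words));
    if (!c->flush_words) {
        free(c->devices);
        c->devices = NULL;
        c->count = 0;
        return -1;
    }

    for (int i = 0; i < count; i++) {
        c->devices[i] = NULL;   /* a real caller would store device handles */
        c->flush_words[i] = 0;
    }
    c->devices[count] = NULL;
    return 0;
}

/* Release both buffers; safe to call after a failed or empty init. */
static void bridge_cache_free(struct bridge_cache *c)
{
    free(c->flush_words);
    free(c->devices);
    c->flush_words = NULL;
    c->devices = NULL;
    c->count = 0;
}
```

With `count == 0` the function leaves `flush_words` as NULL and stores only the terminator, which mirrors the "no northbridge found" case in the dmesg output above.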



Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-18 Thread Zilvinas Valinskas
Hello, 

Have found this in dmesg (well earlier because of initcall_debug) I've
never noticed that during boot (scrolls away too fast). Anyway -

[7.841871] NetLabel: Initializing
[7.841983] NetLabel:  domain hash size = 128
[7.842095] NetLabel:  protocols = UNLABELED CIPSOv4
[7.842219] NetLabel:  unlabeled traffic allowed by default
[7.842338] BUG: at include/linux/slub_def.h:77 kmalloc_index()
[7.842451] 
[7.842452] Call Trace:
[7.842677]  [8029215c] get_slab+0x1cc/0x260
[7.842791]  [8029229d] __kmalloc+0xd/0x80
[7.842907]  [802219ee] cache_k8_northbridges+0x7e/0x100
[7.843024]  [8062bd13] gart_iommu_init+0x33/0x5b0
[7.843140]  [8049f836] netlbl_unlabel_acceptflg_set+0x86/0xf0
[7.843255]  [80626f49] pci_iommu_init+0x9/0x20
[7.843370]  [806216d7] kernel_init+0x157/0x330
[7.843485]  [8020b0f8] child_rip+0xa/0x12
[7.843601]  [80373fd8] acpi_ds_init_one_object+0x0/0x7c
[7.843715]  [80621580] kernel_init+0x0/0x330
[7.843829]  [8020b0ee] child_rip+0x0/0x12
[7.843941] 
[7.844056] PCI-GART: No AMD northbridge found.

Does this backtrace look sane? Hmm, netlabel code mixes with
acpi_ds_init_one_object() ... Strange.

On Wed, 2007-05-16 at 12:15 -0700, Andrew Morton wrote:
> On Wed, 16 May 2007 21:00:41 +0300
> Zilvinas Valinskas <[EMAIL PROTECTED]> wrote:
> 
> > Hello, 
> > 
> > In short, on shutdown my laptop is always freezing now. I was able to
> > capture the 'sysrq-P' (hit that several times), sysrq-T outputs. Please
> > see .config and log messages at http://barclay.balt.net/~zilvinas/oops/ 
> > 
> > The kernel version I built, according to git, is:
> > 
> > [EMAIL PROTECTED]:/projects/linux-amd64.git$ git describe HEAD
> > v2.6.22-rc1-29-gfaa8b6c
> > 
> > On top of that I have CFS v12 applied (no other changes otherwise).
> > Please note that there is ''fglrx.ko'' loaded and kernel is tainted
> > because of that (feel free to ignore the report ...).
> > 
> > Anyway, 'sysrq-P' always shows that the PC is stuck in (NFS lockd?), and
> > the same backtrace is shown every time. The 'sysrq-t' output is in the
> > 'kernel-nfs-freeze.log' file (did not want to post it here).
> > 
> >  Pid: 3652, comm: lockd Tainted: P   2.6.22-rc1-cfs-v12 #1
> > 
> > [8024a5a0] wq_barrier_func+0x0/0x10
> > [8024a7e5] destroy_workqueue+0x75/0xa0
> > [8833cd34] :sunrpc:rpciod_down+0xf4/0x170
> > [8836dd74] :lockd:lockd+0x244/0x300
> > [80233e1f] schedule_tail+0x3f/0xb0
> > [8020b0f8] child_rip+0xa/0x12
> > [8836db30] :lockd:lockd+0x0/0x300
> > [8836db30] :lockd:lockd+0x0/0x300
> > [8020b0ee] child_rip+0x0/0x12
> > 
> > Hope this helps. Thanks in advance for any advice on how to solve the
> > problem! For now I am back to '2.6.21.1-cfs-v10'.
> > 
> 
> Thanks for the report.   I'm thinking "Oleg".



Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-18 Thread Zilvinas Valinskas
Hello Oleg,

On Thu, 2007-05-17 at 22:45 +0400, Oleg Nesterov wrote:
> Hello Zilvinas,
> 
> On 05/17, Zilvinas Valinskas wrote:
> > 
> > Patch seems to help and it seems the kernel doesn't freeze anymore. I've
> > booted new kernel and did :
> 
> OK, thank you very much. So, we have some other problems, and I _think_
> that workqueue.c is not the source of them.

You are welcome. I wish I could determine and fix the problem myself. I
will try to help debug the problem as long as there is any progress or
there are ideas to try out.

> However, I can't understand why cleanup_workqueue_thread() hangs anyway.
> It shouldn't. Looks like rpciod/1 was preempted, and can't get CPU. According
> to kernel-nfs-freeze.log it is TASK_RUNNING. Strange.
> 
> It is very sad, because this code was supposed to be cleaned up anyway,
> but if it is really buggy, it would be great to know why.

Can this be related to:

CONFIG_PREEMPT=y
# CONFIG_PREEMPT_BKL is not set


> Perhaps, we can understand the problem with your help. Could you please
> revert the patch I sent, and send me (privately) the output of
> 
>   objdump -d kernel/workqueue.o


I have uploaded files at http://barclay.balt.net/~zilvinas/oops/ 

workqueue.objdump - without any patch.
workqueue+oleg-old.objdump - with older patch Oleg sent on Thu, 17 May.
workqueue+oleg-new.objdump - with the newest patch from Oleg applied.

For what it's worth, I am using Debian/Unstable 
$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.1.3 --program-suffix=-4.1
--enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug
--enable-mpfr --enable-checking=release x86_64-linux-gnu
Thread model: posix
gcc version 4.1.3 20070514 (prerelease) (Debian 4.1.2-7)

$ ld -V
GNU ld (GNU Binutils for Debian) 2.17.50.20070426
  Supported emulations:
   elf_x86_64
   elf_i386
   i386linux

> ? I doubt very much I'll see something interesting, but who knows...
> 
> Thanks!
> 
> Oleg.
> 



Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-17 Thread Oleg Nesterov
Hello Zilvinas,

On 05/17, Zilvinas Valinskas wrote:
> 
> Patch seems to help and it seems the kernel doesn't freeze anymore. I've
> booted new kernel and did :

OK, thank you very much. So, we have some other problems, and I _think_
that workqueue.c is not the source of them.

However, I can't understand why cleanup_workqueue_thread() hangs anyway.
It shouldn't. Looks like rpciod/1 was preempted, and can't get CPU. According
to kernel-nfs-freeze.log it is TASK_RUNNING. Strange.

It is very sad, because this code was supposed to be cleaned up anyway,
but if it is really buggy, it would be great to know why.

Perhaps, we can understand the problem with your help. Could you please
revert the patch I sent, and send me (privately) the output of

objdump -d kernel/workqueue.o

? I doubt very much I'll see something interesting, but who knows...

Thanks!

Oleg.



Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-17 Thread Zilvinas Valinskas
And one more crash, triggered by running the following in the shell.
I ran it several times, as can be seen from dmesg:

$ op=stop; sudo /etc/init.d/nfs-common $op; \
  sudo /etc/init.d/nfs-kernel-server $op; \
  op=start; sudo /etc/init.d/nfs-common $op; \
  sudo /etc/init.d/nfs-kernel-server $op;

Repeat several times ;)

The dmesg output:

May 17 11:36:23 zv kernel: [  613.071050] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 17 11:36:23 zv kernel: [  613.071082] NFSD: starting 90-second grace period
May 17 11:36:25 zv kernel: [  615.639312] nfsd: last server has exited
May 17 11:36:25 zv kernel: [  615.639322] nfsd: unexporting all filesystems
May 17 11:36:25 zv kernel: [  615.838746] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 17 11:36:25 zv kernel: [  615.838782] NFSD: starting 90-second grace period
May 17 11:36:26 zv kernel: [  616.464554] nfsd: last server has exited
May 17 11:36:26 zv kernel: [  616.464563] nfsd: unexporting all filesystems
May 17 11:36:26 zv kernel: [  616.468219] RPC: failed to contact local rpcbind server (errno 5).
May 17 11:36:26 zv kernel: [  616.669736] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 17 11:36:26 zv kernel: [  616.669771] NFSD: starting 90-second grace period
May 17 11:36:27 zv kernel: [  617.200592] nfsd: last server has exited
May 17 11:36:27 zv kernel: [  617.200601] nfsd: unexporting all filesystems
May 17 11:36:27 zv kernel: [  617.202565] RPC: failed to contact local rpcbind server (errno 5).
May 17 11:36:27 zv kernel: [  617.409917] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 17 11:36:27 zv kernel: [  617.409948] NFSD: starting 90-second grace period
May 17 11:36:27 zv kernel: [  617.872937] nfsd: last server has exited
May 17 11:36:27 zv kernel: [  617.872945] nfsd: unexporting all filesystems
May 17 11:36:27 zv kernel: [  617.877526] RPC: failed to contact local rpcbind server (errno 5).
May 17 11:36:28 zv kernel: [  618.084212] PGD 21f9e067 PUD 3b8bf067 PMD 0 
May 17 11:36:28 zv kernel: [  618.084224] CPU 0 
May 17 11:36:28 zv kernel: [  618.084227] Modules linked in: fglrx(P) nfs ipv6 
nfsd exportfs lockd nfs_acl sunrpc pp
dev lp autofs4 deflate zlib_deflate twofish twofish_common camellia serpent 
blowfish des cbc ecb blkcipher aes xcbc 
sha256 sha1 crypto_null af_key piix ide_core dm_crypt dm_snapshot dm_mirror 
dm_mod sbp2 loop coretemp cpufreq_conser
vative cpufreq_stats acpi_cpufreq freq_table pcmcia snd_hda_intel usbhid 
snd_pcm_oss snd_mixer_oss pl2303 ipw3945 ye
nta_socket snd_pcm ohci1394 ieee1394 tifm_7xx1 joydev snd_timer usbserial tsdev 
tpm_infineon sdhci rsrc_nonstatic ie
ee80211 ieee80211_crypt parport_pc snd tpm fw_ohci fw_core parport 
firmware_class iTCO_wdt iTCO_vendor_support sg ps
mouse pcmcia_core tg3 mmc_core crc_itu_t tifm_core pcspkr tpm_bios soundcore 
snd_page_alloc intel_agp sr_mod serio_r
aw ehci_hcd uhci_hcd cdrom evdev
May 17 11:36:28 zv kernel: [  618.084327] Pid: 5560, comm: rpc.nfsd Tainted: P  
 2.6.22-rc1-cfs-v12 #2
May 17 11:36:28 zv kernel: [  618.084332] RIP: 0010:[]  
[] kobject_cleanup+0x24/
0xa0
May 17 11:36:28 zv kernel: [  618.084342] RSP: 0018:8100210bdd08  EFLAGS: 
00010202
May 17 11:36:28 zv kernel: [  618.084347] RAX: 0001 RBX: 
810021c7d688 RCX: 804c4be0
May 17 11:36:28 zv kernel: [  618.084353] RDX:  RSI: 
80341f40 RDI: 810021c7d688
May 17 11:36:28 zv kernel: [  618.084358] RBP: 80341f40 R08: 
 R09: 
May 17 11:36:28 zv kernel: [  618.084362] R10: 0001 R11: 
 R12: 0010
May 17 11:36:28 zv kernel: [  618.084367] R13: 810001fe6270 R14: 
88382941 R15: 
May 17 11:36:28 zv kernel: [  618.084374] FS:  2ab11a0db6f0() 
GS:80603000() knlGS:00
00
May 17 11:36:28 zv kernel: [  618.084379] CS:  0010 DS:  ES:  CR0: 
8005003b
May 17 11:36:28 zv kernel: [  618.084384] CR2: 0010 CR3: 
38d4f000 CR4: 06e0
May 17 11:36:28 zv kernel: [  618.084390] Process rpc.nfsd (pid: 5560, 
threadinfo 8100210bc000, task 8100266
a6000)
May 17 11:36:28 zv kernel: [  618.084394] Stack:  0287 
810021c7d6a4 80341f40 81003bf9837
8
May 17 11:36:28 zv kernel: [  618.084405]  810001fe6270 80342fff 
81003bf98378 810038bf0f50
May 17 11:36:28 zv kernel: [  618.084414]  81003bf98370 802e59ec 
88382941 810021250100
May 17 11:36:28 zv kernel: [  618.084422] Call Trace:
May 17 11:36:28 zv kernel: [  618.084432]  [] 
kobject_release+0x0/0x10
May 17 11:36:28 zv kernel: [  618.084440]  [] 
kref_put+0x3f/0x80
May 17 11:36:28 zv kernel: [  618.084449]  [] 
sysfs_hash_and_remove+0x14c/0x160
May 17 11:36:28 zv kernel: [  618.084460]  [] 
sysfs_slab_alias+0x71/0xa0
May 17 11:36:28 zv 

Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-17 Thread Zilvinas Valinskas
Hello Oleg, 

The patch seems to help, and the kernel doesn't freeze anymore. I've
booted the new kernel and did:

#1 $ sudo /etc/init.d/nfs-kernel-server stop
#2 $ sudo /etc/init.d/nfs-common stop

Previously it was enough to run '#1' to freeze the kernel. This time,
with your patch applied, #1 and #2 worked fine. So far so good. I don't
know why, but I then tried running #1 & #2 several times, and the result
was an OOPS (the kernel is tainted). Oops from dmesg:

[  429.103734] usb 1-5.4: link qh8-0601/81003ebac320 start 7 [1/2 us]
[  436.009276] nfsd: last server has exited
[  436.009410] nfsd: unexporting all filesystems
[  436.011395] RPC: failed to contact local rpcbind server (errno 5).
[  460.950495] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[  460.950659] NFSD: starting 90-second grace period
[  615.796112] nfsd: last server has exited
[  615.796121] nfsd: unexporting all filesystems
[  615.800976] RPC: failed to contact local rpcbind server (errno 5).
[  619.444368] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[  619.03] NFSD: starting 90-second grace period
[  620.576730] nfsd: last server has exited
[  620.576739] nfsd: unexporting all filesystems
[  620.581036] RPC: failed to contact local rpcbind server (errno 5).
[  621.606324] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[  621.606359] NFSD: starting 90-second grace period
[  622.561989] nfsd: last server has exited
[  622.561999] nfsd: unexporting all filesystems
[  623.639396] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[  623.639430] NFSD: starting 90-second grace period
[  623.639487] Unable to handle kernel paging request at  RIP: 
[  623.639492]  [<8041c47f>] __kfree_skb+0x9f/0x150
[  623.639504] PGD 203067 PUD 0 
[  623.639510] Oops: 0002 [1] PREEMPT SMP 
[  623.639515] CPU 0 
[  623.639519] Modules linked in: fglrx(P) nfs nfsd exportfs lockd nfs_acl sunrpc ppdev lp autofs4 ipw3945 ieee80211 ieee80211_crypt ipv6 deflate zlib_deflate twofish twofish_common camellia serpent blowfish des cbc ecb blkcipher aes xcbc sha256 sha1 crypto_null af_key piix ide_core dm_crypt dm_snapshot dm_mirror dm_mod sbp2 loop coretemp cpufreq_conservative cpufreq_stats acpi_cpufreq freq_table usbhid pl2303 ohci1394 ieee1394 usbserial pcmcia firmware_class snd_hda_intel snd_pcm_oss snd_mixer_oss sdhci snd_pcm joydev iTCO_wdt fw_ohci fw_core mmc_core snd_timer tg3 sg snd yenta_socket rsrc_nonstatic pcmcia_core crc_itu_t iTCO_vendor_support tifm_7xx1 tsdev parport_pc parport intel_agp tpm_infineon tpm tpm_bios uhci_hcd sr_mod tifm_core ehci_hcd psmouse soundcore snd_page_alloc pcspkr serio_raw evdev cdrom
[  623.639616] Pid: 616, comm: udevd Tainted: P   2.6.22-rc1-cfs-v12 #2
[  623.639622] RIP: 0010:[<8041c47f>]  [<8041c47f>] __kfree_skb+0x9f/0x150
[  623.639631] RSP: 0018:81003ed87be8  EFLAGS: 00010286
[  623.639635] RAX: 81003f2144a0 RBX:  RCX: 
[  623.639641] RDX: 0130 RSI: 8100285eb400 RDI: 
[  623.639646] RBP: 8100285eb400 R08: 0050eaf0 R09: 
[  623.639651] R10:  R11: 0246 R12: 81003f214400
[  623.639656] R13: 81003ed87ee8 R14: 8100285eb440 R15: 
[  623.639662] FS:  2b0370c18e00() GS:80603000() knlGS:
[  623.639667] CS:  0010 DS:  ES:  CR0: 8005003b
[  623.639672] CR2:  CR3: 3ed6c000 CR4: 06e0
[  623.639678] Process udevd (pid: 616, threadinfo 81003ed86000, task 81003ecd)
[  623.639682] Stack:  81003ed87ee8 81003ed87e68 8100285eb400 8043b6a6
[  623.639694]  0001 810001ff7b80 0050 81003ed87db8
[  623.639702]     
[  623.639709] Call Trace:
[  623.639719]  [<8043b6a6>] netlink_recvmsg+0x176/0x3a0
[  623.639739]  [<80415b80>] sock_recvmsg+0x150/0x170
[  623.639754]  [<8024e760>] autoremove_wake_function+0x0/0x30
[  623.639768]  [<802a531e>] core_sys_select+0x26e/0x350
[  623.639785]  [<802a9f05>] __d_lookup+0x165/0x180
[  623.639797]  [<80416f8e>] sys_recvfrom+0xfe/0x190
[  623.639807]  [<8024e969>] remove_wait_queue+0x19/0x60
[  623.639823]  [<802a5874>] sys_select+0x44/0x1c0
[  623.639836]  [<8020a2ae>] system_call+0x7e/0x83
[  623.639849] 
[  623.639851] 
[  623.639852] Code: f0 ff 0f 0f 94 c0 84 c0 75 27 66 c7 85 a8 00 00 00 00 00 66 
[  623.639871] RIP  [<8041c47f>] __kfree_skb+0x9f/0x150
[  623.639878]  RSP <81003ed87be8>
[  623.639881] CR2: 

Hmm, I've got something different now :( - 
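
For reference, the "Oops: 0002" value in the dmesg output above is the x86 page-fault error code. The following is a minimal sketch of decoding its low three bits, assuming the standard x86 bit layout (bit 0 = protection violation, bit 1 = write, bit 2 = user mode). The struct and function names are made up for illustration and do not come from the kernel:

```c
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code printed after "Oops:". */
struct oops_fault {
    bool protection; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;      /* bit 1: 1 = write access, 0 = read access */
    bool user;       /* bit 2: 1 = fault taken in user mode, 0 = kernel mode */
};

/* Hypothetical helper: split the error code into its three low flags. */
static struct oops_fault decode_oops_code(unsigned int code)
{
    struct oops_fault f;

    f.protection = (code & 0x1) != 0;
    f.write = (code & 0x2) != 0;
    f.user = (code & 0x4) != 0;
    return f;
}

/* Hypothetical helper: render the decoded flags as a short description. */
static void describe_oops_code(unsigned int code, char *buf, size_t len)
{
    struct oops_fault f = decode_oops_code(code);

    snprintf(buf, len, "%s-mode %s, %s",
             f.user ? "user" : "kernel",
             f.write ? "write" : "read",
             f.protection ? "protection violation" : "page not present");
}
```

Under that layout, 0002 reads as a kernel-mode write to a page that was not present.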


On Thu, 2007-05-17 at 02:55 +0400, Oleg Nesterov wrote:
> On Wed, 16 May 2007 21:00:41 +0300
> Zilvinas Valinskas <[EMAIL PROTECTED]> wrote:
> > 
> > In short, on shutdown my laptop is always freezing now. I was able to
> > capture the 'sysrq-P' (hit that several times), sysrq-T outputs. Please
> > see .config and log messages at http://barclay.balt.net/~zilvinas/oops/ 
> > 
> > 

Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-17 Thread Zilvinas Valinskas
Hello Oleg, Andrew,

Sure, no problem, Oleg. Compiling now; a reboot will follow, with results.
Thank you both!

On Thu, 2007-05-17 at 02:55 +0400, Oleg Nesterov wrote:

> Zilvinas, could you try the patch below?
> 
> It is a shot in the dark. I hope I'll suggest something better tomorrow.
> 
> Oleg.
> 




Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-16 Thread Oleg Nesterov
On Wed, 16 May 2007 21:00:41 +0300
Zilvinas Valinskas <[EMAIL PROTECTED]> wrote:
> 
> In short, on shutdown my laptop is always freezing now. I was able to
> capture the 'sysrq-P' (hit that several times) and 'sysrq-T' outputs. Please
> see .config and log messages at http://barclay.balt.net/~zilvinas/oops/ 
> 
> The kernel version I built, according to git, is:
> 
> [EMAIL PROTECTED]:/projects/linux-amd64.git$ git describe HEAD
> v2.6.22-rc1-29-gfaa8b6c
> 
> On top of that I have CFS v12 applied (no other changes otherwise).
> Please note that ''fglrx.ko'' is loaded and the kernel is tainted
> because of that (feel free to ignore the report ...).
> 
> Anyway, 'sysrq-P' always shows that the PC is stuck in (NFS lockd?), and
> the same backtrace is shown every time. The 'sysrq-t' output is in the
> 'kernel-nfs-freeze.log' file (did not want to post it here).
> 
>  Pid: 3652, comm: lockd Tainted: P   2.6.22-rc1-cfs-v12 #1
> 
> [<8024a5a0>] wq_barrier_func+0x0/0x10
> [<8024a7e5>] destroy_workqueue+0x75/0xa0
> [<8833cd34>] :sunrpc:rpciod_down+0xf4/0x170
> [<8836dd74>] :lockd:lockd+0x244/0x300
> [<80233e1f>] schedule_tail+0x3f/0xb0
> [<8020b0f8>] child_rip+0xa/0x12
> [<8836db30>] :lockd:lockd+0x0/0x300
> [<8836db30>] :lockd:lockd+0x0/0x300
> [<8020b0ee>] child_rip+0x0/0x12
> 
> Hope this helps. Thanks in advance for any advice on how to solve the problem!
> For now I am back to '2.6.21.1-cfs-v10'.
> 

Nice, thanks.

Zilvinas, could you try the patch below?

It is a shot in the dark. I hope I'll suggest something better tomorrow.

Oleg.

--- OLD/kernel/workqueue.c~ 2007-05-17 00:15:37.0 +0400
+++ OLD/kernel/workqueue.c  2007-05-17 02:51:15.0 +0400
@@ -752,16 +752,25 @@ static void cleanup_workqueue_thread(str
 	spin_unlock_irq(&cwq->lock);
 
 	if (alive) {
+		int n;
+
 		wait_for_completion(&barr.done);
 
-		while (unlikely(cwq->thread != NULL))
-			cpu_relax();
-		/*
-		 * Wait until cwq->thread unlocks cwq->lock,
-		 * it won't touch *cwq after that.
-		 */
-		smp_rmb();
-		spin_unlock_wait(&cwq->lock);
+		for (n = 0;; ++n) {
+			spin_lock_irq(&cwq->lock);
+			alive = (cwq->thread != NULL);
+			spin_unlock_irq(&cwq->lock);
+
+			if (!alive)
+				break;
+
+			if (n > 1000) {
+				printk(KERN_CRIT "ERR!! wq: %s\n", cwq->wq->name);
+				break;
+			}
+
+			yield();
+		}
 	}
 }
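
For readers who want to experiment outside the kernel, the idea of the patch (check the pointer under the lock, give up the CPU between checks, and bail out after a bounded number of tries instead of spinning forever) can be sketched in user space roughly as follows. This is only an analog under stated assumptions: `struct fake_cwq` and `wait_for_worker_exit()` are hypothetical names, a pthread mutex stands in for `cwq->lock`, and `sched_yield()` stands in for the kernel's `yield()`.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU workqueue state (cwq) in the patch. */
struct fake_cwq {
    pthread_mutex_t lock;
    void *thread; /* non-NULL while the worker is still alive */
};

/*
 * Poll cwq->thread under the lock, yielding between attempts.
 * Returns 0 once the pointer has been cleared, or -1 once the
 * retry counter exceeds max_tries (the patch prints an error there).
 */
static int wait_for_worker_exit(struct fake_cwq *cwq, int max_tries)
{
    for (int n = 0;; ++n) {
        int alive;

        pthread_mutex_lock(&cwq->lock);
        alive = (cwq->thread != NULL);
        pthread_mutex_unlock(&cwq->lock);

        if (!alive)
            return 0;

        if (n > max_tries)
            return -1;

        sched_yield(); /* let the worker run instead of busy-waiting */
    }
}
```

Calling `wait_for_worker_exit(&cwq, 1000)` mirrors the limit used in the patch; a return value of -1 corresponds to the "ERR!! wq" message.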
 



Re: Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-16 Thread Andrew Morton
On Wed, 16 May 2007 21:00:41 +0300
Zilvinas Valinskas <[EMAIL PROTECTED]> wrote:

> Hello, 
> 
> In short, on shutdown my laptop is always freezing now. I was able to
> capture the 'sysrq-P' (hit that several times) and 'sysrq-T' outputs. Please
> see .config and log messages at http://barclay.balt.net/~zilvinas/oops/ 
> 
> The kernel version I built, according to git, is:
> 
> [EMAIL PROTECTED]:/projects/linux-amd64.git$ git describe HEAD
> v2.6.22-rc1-29-gfaa8b6c
> 
> On top of that I have CFS v12 applied (no other changes otherwise).
> Please note that ''fglrx.ko'' is loaded and the kernel is tainted
> because of that (feel free to ignore the report ...).
> 
> Anyway, 'sysrq-P' always shows that the PC is stuck in (NFS lockd?), and
> the same backtrace is shown every time. The 'sysrq-t' output is in the
> 'kernel-nfs-freeze.log' file (did not want to post it here).
> 
>  Pid: 3652, comm: lockd Tainted: P   2.6.22-rc1-cfs-v12 #1
> 
> [<8024a5a0>] wq_barrier_func+0x0/0x10
> [<8024a7e5>] destroy_workqueue+0x75/0xa0
> [<8833cd34>] :sunrpc:rpciod_down+0xf4/0x170
> [<8836dd74>] :lockd:lockd+0x244/0x300
> [<80233e1f>] schedule_tail+0x3f/0xb0
> [<8020b0f8>] child_rip+0xa/0x12
> [<8836db30>] :lockd:lockd+0x0/0x300
> [<8836db30>] :lockd:lockd+0x0/0x300
> [<8020b0ee>] child_rip+0x0/0x12
> 
> Hope this helps. Thanks in advance for any advice on how to solve the problem!
> For now I am back to '2.6.21.1-cfs-v10'.
> 

Thanks for the report.   I'm thinking "Oleg".


Kernel NFS lockd freezes notebook on shutdown (Linux 2.6.22-rc1 + CFS v12)

2007-05-16 Thread Zilvinas Valinskas
Hello, 

In short, on shutdown my laptop is always freezing now. I was able to
capture the 'sysrq-P' (hit that several times) and 'sysrq-T' outputs. Please
see .config and log messages at http://barclay.balt.net/~zilvinas/oops/ 

The kernel version I built, according to git, is:

[EMAIL PROTECTED]:/projects/linux-amd64.git$ git describe HEAD
v2.6.22-rc1-29-gfaa8b6c

On top of that I have CFS v12 applied (no other changes otherwise).
Please note that ''fglrx.ko'' is loaded and the kernel is tainted
because of that (feel free to ignore the report ...).

Anyway, 'sysrq-P' always shows that the PC is stuck in (NFS lockd?), and
the same backtrace is shown every time. The 'sysrq-t' output is in the
'kernel-nfs-freeze.log' file (did not want to post it here).

 Pid: 3652, comm: lockd Tainted: P   2.6.22-rc1-cfs-v12 #1

[<8024a5a0>] wq_barrier_func+0x0/0x10
[<8024a7e5>] destroy_workqueue+0x75/0xa0
[<8833cd34>] :sunrpc:rpciod_down+0xf4/0x170
[<8836dd74>] :lockd:lockd+0x244/0x300
[<80233e1f>] schedule_tail+0x3f/0xb0
[<8020b0f8>] child_rip+0xa/0x12
[<8836db30>] :lockd:lockd+0x0/0x300
[<8836db30>] :lockd:lockd+0x0/0x300
[<8020b0ee>] child_rip+0x0/0x12

Hope this helps. Thanks in advance for any advice on how to solve the problem!
For now I am back to '2.6.21.1-cfs-v10'.
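
A note on reading the backtrace above: each entry has the form symbol+offset/size, where the offset is the position of the address inside the function and the size is the function's total length in bytes (so `destroy_workqueue+0x75/0xa0` is 0x75 bytes into a 0xa0-byte function). As a rough sketch, such an entry could be split with `sscanf`; the `trace_entry` struct and `parse_trace_entry()` are hypothetical names used only for this illustration:

```c
#include <stdio.h>

/* One "symbol+0xOFF/0xSIZE" entry from a kernel backtrace. */
struct trace_entry {
    char symbol[64];     /* function name, possibly with a ":module:" prefix */
    unsigned int offset; /* byte offset of the address within the function */
    unsigned int size;   /* total size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE"; returns 1 on success, 0 on malformed input. */
static int parse_trace_entry(const char *text, struct trace_entry *out)
{
    /* %x accepts an optional "0x" prefix, so the hex fields parse directly. */
    return sscanf(text, "%63[^+]+%x/%x",
                  out->symbol, &out->offset, &out->size) == 3;
}
```

The bracketed address in front of each entry would need to be stripped before calling the helper; the sketch only handles the symbolic part.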



