Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2008-01-02 Thread Torsten Kaiser
On Jan 2, 2008 9:51 PM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Wed, 2 Jan 2008, Torsten Kaiser wrote:
>
> > I just tested something with vanilla 2.6.24-rc6 and had the same problem.
> > Should this patch, or something similar be included for 2.6.24?
>
> Such a patch is in Andrew's tree.
>
> 2.6.24-rc6-mm1:
>
> static struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags,
> 							   int node)
> {
> 	struct page *page;
> 	struct kmem_cache_node *n;
> 	unsigned long flags;
> 	...
>
> 	/*
> 	 * lockdep requires consistent irq usage for each lock
> 	 * so even though there cannot be a race this early in
> 	 * the boot sequence, we still disable irqs.
> 	 */
> 	local_irq_save(flags);
> 	add_partial(kmalloc_caches, page, 0);
> 	local_irq_restore(flags);
> 	return n;
> }
>

from 2.6.24-rc6-mm1 patch-series file:
slub-noinline-some-functions-to-avoid-them-being-folded-into-alloc-free.patch
slub-move-kmem_cache_node-determination-into-add_full-and-add_partial.patch
slub-move-kmem_cache_node-determination-into-add_full-and-add_partial-slub-workaround-for-lockdep-confusion.patch
slub-avoid-checking-for-a-valid-object-before-zeroing-on-the-fast-path.patch

It seems to have been lumped in with some other SLUB patches, but the bug
does not seem to have been introduced by them, as I can see it in mainline
2.6.24-rc6.

Should this patch be made a candidate for the merge-before-2.6.24-final queue?

Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2008-01-02 Thread Christoph Lameter
On Wed, 2 Jan 2008, Torsten Kaiser wrote:

> I just tested something with vanilla 2.6.24-rc6 and had the same problem.
> Should this patch, or something similar be included for 2.6.24?

Such a patch is in Andrew's tree.

2.6.24-rc6-mm1:

static struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags,
							   int node)
{
	struct page *page;
	struct kmem_cache_node *n;
	unsigned long flags;
	...

	/*
	 * lockdep requires consistent irq usage for each lock
	 * so even though there cannot be a race this early in
	 * the boot sequence, we still disable irqs.
	 */
	local_irq_save(flags);
	add_partial(kmalloc_caches, page, 0);
	local_irq_restore(flags);
	return n;
}


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2008-01-02 Thread Torsten Kaiser
CC's somewhat trimmed...
On Nov 18, 2007 12:00 AM, root <[EMAIL PROTECTED]> wrote:
> On Sat, Nov 17, 2007 at 07:09:46PM +0100, Ingo Molnar wrote:
> > * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> >
> > > Sadly lockdep does not work for me, as it gets turned off early:
> > > [   39.851594] -
> > > [   39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > > [   39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > > [   39.866963]  (&n->list_lock){-+..}, at: []
> >
> > hey, that means it found a bug - which is not sad at all :-)
>
> ---
> Subject: lockdep: slub: annotate boot time node->list_lock usage
>
> inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
>  (&n->list_lock){-+..}, at: [] add_partial+0x31/0xa0
> {softirq-on-W} state was registered at:
>   [] __lock_acquire+0x3e8/0x1140
>   [] debug_check_no_locks_freed+0x188/0x1a0
>   [] lock_acquire+0x55/0x70
>   [] add_partial+0x31/0xa0
>   [] _spin_lock+0x1e/0x30
>   [] add_partial+0x31/0xa0
>   [] kmem_cache_open+0x1cc/0x330
>   [] _spin_unlock_irq+0x24/0x30
>   [] create_kmalloc_cache+0x64/0xf0
>   [] init_alloc_cpu_cpu+0x70/0x90
>   [] kmem_cache_init+0x65/0x1d0
>   [] start_kernel+0x23e/0x350
>   [] _sinittext+0x12d/0x140
>   [] 0x
>
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> CC: Christoph Lameter <[EMAIL PROTECTED]>
> CC: Kamalesh Babulal <[EMAIL PROTECTED]>
> ---
>  mm/slub.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> Index: linux-2.6/mm/slub.c
> ===
> --- linux-2.6.orig/mm/slub.c
> +++ linux-2.6/mm/slub.c
> @@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
>  {
>  	struct page *page;
>  	struct kmem_cache_node *n;
> +	unsigned long flags;
>
>  	BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
>
> @@ -2179,7 +2180,14 @@ static struct kmem_cache_node *early_kme
>  #endif
>  	init_kmem_cache_node(n);
>  	atomic_long_inc(&n->nr_slabs);
> +	/*
> +	 * lockdep requires consistent irq usage for each lock
> +	 * so even though there cannot be a race this early in
> +	 * the boot sequence, we still disable irqs.
> +	 */
> +	local_irq_save(flags);
>  	add_partial(kmalloc_caches, page, 0);
> +	local_irq_restore(flags);
>  	return n;
>  }

I just tested something with vanilla 2.6.24-rc6 and had the same problem.
Should this patch, or something similar be included for 2.6.24?

The lockdep report:
[   40.057281] PCI: BIOS Bug: MCFG area at f000 is not E820-reserved
[   40.063736] PCI: Not using MMCONFIG.
[   40.067329] PCI: Using configuration type 1
[   40.063153]
[   40.063154] =
[   40.063156] [ INFO: inconsistent lock state ]
[   40.063157] 2.6.24-rc6 #1
[   40.063158] -
[   40.063160] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[   40.063162] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[   40.063163]  (&n->list_lock){-+..}, at: [] add_partial+0x1c/0x50
[   40.063172] {softirq-on-W} state was registered at:
[   40.063173]   [] __lock_acquire+0x3c7/0x1140
[   40.063179]   [] trace_hardirqs_on+0xbf/0x160
[   40.063182]   [] lock_acquire+0x5b/0x80
[   40.063185]   [] add_partial+0x1c/0x50
[   40.063187]   [] _spin_lock+0x25/0x40
[   40.063192]   [] add_partial+0x1c/0x50
[   40.063195]   [] kmem_cache_open+0x1c7/0x330
[   40.063198]   [] create_kmalloc_cache+0x63/0xc0
[   40.063200]   [] kmem_cache_init+0x65/0x1d0
[   40.063204]   [] start_kernel+0x245/0x360
[   40.063208]   [] _sinittext+0x131/0x140
[   40.063211]   [] 0x
[   40.063214] irq event stamp: 569
[   40.063215] hardirqs last  enabled at (568): [] kmem_cache_free+0xcd/0x100
[   40.063219] hardirqs last disabled at (569): [] kmem_cache_free+0x68/0x100
[   40.063222] softirqs last  enabled at (550): [] __do_softirq+0xef/0x110
[   40.063226] softirqs last disabled at (557): [] call_softirq+0x1c/0x30
[   40.063230]
[   40.063230] other info that might help us debug this:
[   40.063231] no locks held by swapper/0.
[   40.063232]
[   40.063233] stack backtrace:
[   40.063235] Pid: 0, comm: swapper Not tainted 2.6.24-rc6 #1
[   40.063236]
[   40.063236] Call Trace:
[   40.063237]  <IRQ>  [] print_usage_bug+0x189/0x190
[   40.063243]  [] mark_lock+0x63d/0x650
[   40.063246]  [] __lock_acquire+0x37e/0x1140
[   40.063248]  [] dump_trace+0xd7/0x2d0
[   40.063250]  [] save_stack_trace+0x28/0x50
[   40.063253]  [] free_fdtable_rcu+0x94/0xa0
[   40.063255]  [] lock_acquire+0x5b/0x80
[   40.063257]  [] add_partial+0x1c/0x50
[   40.063259]  [] _spin_lock+0x25/0x40
[   40.063261]  [] add_partial+0x1c/0x50
[   40.063264]  [] __slab_free+0xaf/0x2f0
[   40.063265]  [] free_fdtable_rcu+0x94/0xa0
[   40.063267]  [] free_fdtable_rcu+0x94/0xa0
[   40.063269]  [] kmem_cache_free+0xa1/0x100
[   40.063271]  [] free_fdtable_rcu+0x94/0xa0
[   

Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-26 Thread Jiri Slaby
On 11/21/2007 01:00 AM, Rafael J. Wysocki wrote:
> On Tuesday, 20 of November 2007, Mark M. Hoffman wrote:
>> commit ce9c7b78c839a6304696d90083eac08baad524ce
>> Author: Mark M. Hoffman <[EMAIL PROTECTED]>
>> Date:   Tue Nov 20 07:51:50 2007 -0500
>>
>> hwmon: (coretemp) fix suspend/resume hang
>> 
>> Signed-off-by: Mark M. Hoffman <[EMAIL PROTECTED]>
> 
> I'd do it like this:
> 
>> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
>> index 5c82ec7..afe2d31 100644
>> --- a/drivers/hwmon/coretemp.c
>> +++ b/drivers/hwmon/coretemp.c
>> @@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
>>  switch (action) {
>>  case CPU_ONLINE:
>>  case CPU_ONLINE_FROZEN:
>> +case CPU_DOWN_FAILED:
>>  coretemp_device_add(cpu);
> + case CPU_DOWN_FAILED_FROZEN: 
>>  break;
>> -case CPU_DEAD:
>> -case CPU_DEAD_FROZEN:
>> +case CPU_DOWN_PREPARE:
>>  coretemp_device_remove(cpu);
> + case CPU_DOWN_PREPARE_FROZEN:
>>  break;
>>  }

Sorry for the delay, this (trimmed version) solves the problem!

thanks,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Herbert Xu
Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> Also io->pending may need better protection - atomic, but missing memory
> barriers?  (May be getting away without sometimes due to side-effects of
> other function calls, but needs doing properly.)

If it's using atomic_dec_and_test then that comes with an implicit
memory barrier.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 24, 2007 4:49 AM, Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> > ... or I just don't see the bug.
>
> See my earlier post in this thread: there's a race in the write loop
> where a work struct could be used twice on the same queue.
> (Needs data structure change to fix that, which nobody has attempted
> to do yet.)

As I wrote in an earlier post:
I did see this lockdep message even with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted, so the
work struct is not used in the write loop.

> BTW To eliminate any internal lockdep concerns (and people say there
> should be no problem) temporarily add a second struct instead of reusing
> one on two queues.

I think this might really be a lockdep bug, but as I'm not fluent
enough in C, please check whether my logic is correct:

The freed-locked-lock test is the only code in lockdep.c that uses this:
static inline int in_range(const void *start, const void *addr, const void *end)
{
	return addr >= start && addr <= end;
}
This will return true if addr lies in the range from start (inclusive)
to end (inclusive).

But debug_check_no_locks_freed() seems to do:
const void *mem_to = mem_from + mem_len;
-> mem_to is the last byte of the freed range, that fits in_range
lock_from = (void *)hlock->instance;
-> first byte of the lock
lock_to = (void *)(hlock->instance + 1);
-> first byte of the next lock, not the last byte of the lock that is being checked!
(Or am I reading this wrong?)

The test is:
if (!in_range(mem_from, lock_from, mem_to) &&
    !in_range(mem_from, lock_to, mem_to))
	continue;
So it tests whether the first byte of the lock is in the range being freed -> OK,
and whether the first byte of the *next* lock is in the range being freed
-> Not OK.

That would also explain the rather strange output:
=
[ BUG: held lock freed! ]
-
kcryptd/1022 is freeing memory
81011EBEFB00-81011EBEFB3F, with a lock still held there!
  (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
2 locks held by kcryptd/1022:
 #0:  (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
 #1:  (>work#2){--..}, at: [] run_workqueue+0x129/0x210

That claims that the lock of the *workqueue* struct, not the work
struct, is getting freed!
But I'm still happily using the dm-crypt device, even 19 hours after
that message.

So my current best guess at the source of this message is that, with
the change in the ref counting, it is now possible for the work struct
to really be freed before the workqueue function returns. But as
the comment in run_workqueue() says, that is still legal.
But now the first byte of the next lock is part of the freed memory,
and so the spurious "held lock freed" warning is triggered.

Torsten


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio().
> Now there is an additional call to crypt_dec_pending() to balance the
> additional ref placed into crypt_write_io_process(). And that one is
> not called from whatever context/thread cleans up after
> generic_make_request, but directly in the context/thread of the caller
> of crypt_write_io_process(), and that is kcryptd.
 
Please do look at the latest patches (always at
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/series.html)
where you'll see I've already disentangled the mess of functions
and given them more understandable names, so at least following the program
flow is easier.

Read and write do the ref counting differently (but correctly AFAICT) - I want
that changing, but held back from doing it without first checking whether the
later patches (not yet reviewed) provide a reason to prefer one method
over the other.

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
Also io->pending may need better protection - atomic, but missing memory
barriers?  (May be getting away without sometimes due to side-effects of
other function calls, but needs doing properly.)

[BTW Other device-mapper atomic_t usage also needs reviewing.]

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> ... or I just don't see the bug.
 
See my earlier post in this thread: there's a race in the write loop
where a work struct could be used twice on the same queue.
(Needs data structure change to fix that, which nobody has attempted
to do yet.)

BTW To eliminate any internal lockdep concerns (and people say there
should be no problem) temporarily add a second struct instead of reusing
one on two queues.

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> ...
> > Above this acquire/release sequence is the following comment:
> > #ifdef CONFIG_LOCKDEP
> > /*
> >  * It is permissible to free the struct work_struct
> >  * from inside the function that is called from it,
> >  * this we need to take into account for lockdep too.
> >  * To avoid bogus "held lock freed" warnings as well
> >  * as problems when looking into work->lockdep_map,
> >  * make a copy and use that here.
> >  */
> > struct lockdep_map lockdep_map = work->lockdep_map;
> > #endif
> >
> > Did something trigger this anyway?
> >
> > Anything I could try, apart from more boots with slub_debug=F?
>
> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)
>
> (btw these patches prepare dm-crypt for next patchset introducing
> async cryptoapi, so there should be no functional changes yet.)

I looked at all of these agk-* patches, as the error is not
bisectable because it triggers unreliably.
The one that looks suspicious is agk-dm-dm-crypt-tidy-io-ref-counting.patch

This one makes a functional change, as there is now an additional ref
on io->pending. Instead of only increasing io->pending if there really
is more than one clone bio, it now takes an additional ref in
crypt_write_io_process().

I certainly agree with the cleanup, but this introduces the following change:

Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio().
Now there is an additional call to crypt_dec_pending() to balance the
additional ref placed into crypt_write_io_process(). And that one is
not called from whatever context/thread cleans up after
generic_make_request, but directly in the context/thread of the caller
of crypt_write_io_process(), and that is kcryptd.

So now it is possible (if all requests finish before
crypt_write_io_process() returns) that kcryptd itself will release the
bio, but the workqueue infrastructure still seems to have a lock on
that.

But as the comment in run_workqueue says, this should be legal, and I
can't figure out what would make the lockdep copy mechanism fail,
especially if the trigger was really a WRITE request: with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted this
should never use the kcrypt_io-workqueue, so there should not even
be the problem of using INIT_WORK twice on the same work_struct.

... or I just don't see the bug.

Torsten


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 20, 2007 7:55 AM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> > Please could you try which patch from the dm-crypt series cause this ?
> > (agk-dm-dm-crypt* names.)
> >
> > I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> > there is one work struct used subsequently in two threads...
> > (io thread already started while crypt thread is processing lockdep_map
> > after calling f(work)...)
>
> After reverting only
> agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
> seen the 'held lock freed' message again.
>
> If it happens again with this revert, I will post that output.

It happened again; here is the output:
Nov 23 10:56:17 treogen [   58.364441] XFS mounting filesystem dm-0
Nov 23 10:56:17 treogen [   58.519648] Ending clean XFS mount for filesystem: dm-0
Nov 23 10:56:17 treogen [   58.858098]
Nov 23 10:56:17 treogen [   58.858104] =
Nov 23 10:56:17 treogen [   58.863316] [ BUG: held lock freed! ]
Nov 23 10:56:17 treogen [   58.866998] -
Nov 23 10:56:17 treogen [   58.870685] kcryptd/1022 is freeing memory 81011EAD4B00-81011EAD4B3F, with a lock still held there!
Nov 23 10:56:17 treogen [   58.880430]  (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [   58.888014] 2 locks held by kcryptd/1022:
Nov 23 10:56:17 treogen [   58.892045]  #0:  (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [   58.900095]  #1:  (>work#2){--..}, at: [] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [   58.908535]
Nov 23 10:56:17 treogen [   58.908535] stack backtrace:
Nov 23 10:56:17 treogen [   58.912954]
Nov 23 10:56:17 treogen [   58.912955] Call Trace:
Nov 23 10:56:17 treogen [   58.916944]  [] debug_check_no_locks_freed+0x190/0x1b0
Nov 23 10:56:17 treogen [   58.924313]  [] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [   58.930073]  [] kmem_cache_free+0x79/0xe0
Nov 23 10:56:17 treogen [   58.935665]  [] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [   58.941424]  [] mempool_free+0x8a/0xa0
Nov 23 10:56:17 treogen [   58.946755]  [] bio_free+0x2f/0x50
Nov 23 10:56:17 treogen [   58.951736]  [] dm_bio_destructor+0xd/0x10
Nov 23 10:56:17 treogen [   58.957414]  [] bio_put+0x26/0x30
Nov 23 10:56:17 treogen [   58.962311]  [] clone_endio+0x83/0xb0
Nov 23 10:56:17 treogen [   58.967553]  [] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [   58.973224]  [] bio_endio+0x19/0x40
Nov 23 10:56:17 treogen [   58.978291]  [] crypt_dec_pending+0x32/0x50
Nov 23 10:56:17 treogen [   58.984050]  [] kcryptd_do_crypt+0x64/0x290
Nov 23 10:56:17 treogen [   58.989810]  [] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [   58.995483]  [] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [   59.001158]  [] run_workqueue+0x175/0x210
Nov 23 10:56:17 treogen [   59.006746]  [] worker_thread+0x71/0xb0
Nov 23 10:56:17 treogen [   59.012158]  [] autoremove_wake_function+0x0/0x40
Nov 23 10:56:17 treogen [   59.018435]  [] worker_thread+0x0/0xb0
Nov 23 10:56:17 treogen [   59.023764]  [] kthread+0x4d/0x80
Nov 23 10:56:17 treogen [   59.028660]  [] child_rip+0xa/0x12
Nov 23 10:56:17 treogen [   59.033640]  [] restore_args+0x0/0x30
Nov 23 10:56:17 treogen [   59.038880]  [] kthread+0x0/0x80
Nov 23 10:56:17 treogen [   59.043689]  [] child_rip+0x0/0x12
Nov 23 10:56:17 treogen [   59.048670]
Nov 23 10:56:17 treogen [   59.050190] INFO: lockdep is turned off.
Nov 23 10:56:17 treogen [   59.919020] pata_amd :00:04.0: version 0.3.10

From what I see, the only difference between the other stack traces and
this one is the following part:

old traces with agk-dm-dm-crypt-move-bio-submission-to-thread.patch applied:
[   64.584415]  [] bio_free+0x2f/0x50
[   64.586337]  [] bio_fs_destructor+0x10/0x20
[   64.588558]  [] bio_put+0x26/0x30
[   64.590446]  [] xfs_buf_bio_end_io+0x99/0x120
[   64.592734]  [] bio_endio+0x19/0x40
[   64.594687]  [] dec_pending+0x107/0x210
[   64.596775]  [] clone_endio+0x70/0xb0

new trace with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted:
[   58.946755]  [] bio_free+0x2f/0x50
[   58.951736]  [] dm_bio_destructor+0xd/0x10
[   58.957414]  [] bio_put+0x26/0x30
[   58.962311]  [] clone_endio+0x83/0xb0

(gdb) list *0x804e3ae0
0x804e3ae0 is in clone_endio (drivers/md/dm.c:539).
534 dec_pending(tio->io, error);
535
536 /*
537  * Store md for cleanup instead of tio which is about to get freed.
538  */
539 bio->bi_private = md->bs;
540
541 bio_put(bio);
542 free_tio(md, tio);
543 }

(gdb) list *0x804e3af3
0x804e3af3 is in clone_endio (drivers/md/dm.c:542).
537  * Store md for cleanup instead of tio which is about to get freed.
538  */
539 bio->bi_private = md->bs;
540
541 bio_put(bio);
542 

Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
Also io->pending may need better protection - atomic, but missing memory
barriers?  (May be getting away without sometimes due to side-effects of
other function calls, but needs doing properly.)

[BTW Other device-mapper atomic_t usage also needs reviewing.]

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> ... or I just don't see the bug.
>
See my earlier post in this thread: there's a race in the write loop
where a work struct could be used twice on the same queue.
(Needs data structure change to fix that, which nobody has attempted
to do yet.)

BTW To eliminate any internal lockdep concerns (and people say there
should be no problem) temporarily add a second struct instead of reusing
one on two queues.

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Alasdair G Kergon
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio().
> Now there is an additional call to crypt_dec_pending() to balance the
> additional ref placed into crypt_write_io_process(). And that one is
> not called from whatever context/thread cleans up after
> generic_make_request, but directly in the context/thread of the caller
> of crypt_write_io_process(), and that is kcryptd.
>
Please do look at the latest patches (always at
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/series.html)
where you'll see I've already disentangled the mess of functions
and given them more understandable names, so at least following the program
flow is easier.

Read and write do the ref counting differently (but correctly AFAICT) - I want
that changing, but held back from doing it without first checking whether the
later patches (not yet reviewed) provide a reason to prefer one method
over the other.

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 24, 2007 4:49 AM, Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> > ... or I just don't see the bug.
>
> See my earlier post in this thread: there's a race in the write loop
> where a work struct could be used twice on the same queue.
> (Needs data structure change to fix that, which nobody has attempted
> to do yet.)

As I wrote in an earlier post:
I did see this lockdep message even with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted, so the
work struct is not used in the write loop.

> BTW To eliminate any internal lockdep concerns (and people say there
> should be no problem) temporarily add a second struct instead of reusing
> one on two queues.

I think this might really be a lockdep bug, but as I'm not fluent
enough in C, please check whether my logic is correct:

The freed-locked-lock test is the only user of this helper in lockdep.c:
static inline int in_range(const void *start, const void *addr, const void *end)
{
	return addr >= start && addr <= end;
}
This will return true if addr is in the range from start (inclusive)
to end (inclusive).

But debug_check_no_locks_freed() seems to do:
const void *mem_to = mem_from + mem_len
- mem_to is the last byte of the freed range, that fits in_range
lock_from = (void *)hlock->instance;
- first byte of the lock
lock_to = (void *)(hlock->instance + 1);
- first byte of the next lock, not last byte of the lock that is being checked!
(Or am I reading this wrong?)

The test is:
if (!in_range(mem_from, lock_from, mem_to) &&
    !in_range(mem_from, lock_to, mem_to))
	continue;
So it tests whether the first byte of the lock is in the range that is freed - OK.
And whether the first byte of the *next* lock is in the range that is freed
- Not OK.
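The suspected off-by-one can be reproduced outside the kernel. The following is a minimal userspace sketch, not lockdep code, and the helper names in_range_closed(), overlaps_inclusive() and overlaps_half_open() are hypothetical. Note that base + size points one past the last byte. That means mem_to and lock_to are both exclusive ends. Feeding them into a closed [start, end] test therefore reports a lock that only touches the freed block. A half-open comparison does not report it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Closed-interval test in the style of the quoted lockdep.c helper:
 * true when start <= addr <= end. */
static bool in_range_closed(const char *start, const char *addr, const char *end)
{
	return addr >= start && addr <= end;
}

/* Endpoint check as described in the mail: both "to" pointers are
 * base + size, i.e. one past the last byte, yet they are passed to the
 * closed-interval test above. */
static bool overlaps_inclusive(const char *mem, size_t mem_len,
			       const char *lock, size_t lock_len)
{
	const char *mem_to = mem + mem_len;
	const char *lock_to = lock + lock_len;

	return in_range_closed(mem, lock, mem_to) ||
	       in_range_closed(mem, lock_to, mem_to);
}

/* Half-open check: [mem, mem + mem_len) and [lock, lock + lock_len) overlap
 * unless one ends at or before the start of the other. */
static bool overlaps_half_open(const char *mem, size_t mem_len,
			       const char *lock, size_t lock_len)
{
	return !(lock + lock_len <= mem || mem + mem_len <= lock);
}
```

Consider a 16-byte lock at offset 0 and a 64-byte block freed at offset 16. The closed-interval version reports an overlap there, while the half-open version correctly does not.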

That would also explain the rather strange output:
=
[ BUG: held lock freed! ]
-
kcryptd/1022 is freeing memory
81011EBEFB00-81011EBEFB3F, with a lock still held there!
  (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
2 locks held by kcryptd/1022:
 #0:  (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
 #1:  (io-work#2){--..}, at: [80247dd9] run_workqueue+0x129/0x210

That claims that the lock of the *workqueue* struct, not the work
struct is getting freed!
But I'm still happily using the dm-crypt device, even 19 hours after
that message.

So my current best guess about the source of this message is that, with
the change in the ref counting, it is now possible that the work struct
really gets freed before the workqueue function returns. But as
the comment in run_workqueue() says, that is still legal.
But now the first byte of the next lock is part of the freed memory,
and so the wrong "held lock freed" warning is triggered.

Torsten


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> ...
> > Above this acquire/release sequence is the following comment:
> > #ifdef CONFIG_LOCKDEP
> > /*
> >  * It is permissible to free the struct work_struct
> >  * from inside the function that is called from it,
> >  * this we need to take into account for lockdep too.
> >  * To avoid bogus "held lock freed" warnings as well
> >  * as problems when looking into work->lockdep_map,
> >  * make a copy and use that here.
> >  */
> > struct lockdep_map lockdep_map = work->lockdep_map;
> > #endif
> >
> > Did something trigger this anyway?
> >
> > Anything I could try, apart from more boots with slub_debug=F?
>
> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)
>
> (btw these patches prepare dm-crypt for next patchset introducing
> async cryptoapi, so there should be no functional changes yet.)

I looked at all of these agk-*-patches, as the error is not
bisectable because it triggers unreliably.
The one that looks suspicious is agk-dm-dm-crypt-tidy-io-ref-counting.patch

This one makes a functional change, as there now is an additional ref
on io->pending. Instead of only increasing io->pending if there really
is more than one clone bio, it will now take an additional ref in
crypt_write_io_process().

I certainly agree with the cleanup, but this introduces the following change:

Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio().
Now there is an additional call to crypt_dec_pending() to balance the
additional ref placed into crypt_write_io_process(). And that one is
not called from whatever context/thread cleans up after
generic_make_request, but directly in the context/thread of the caller
of crypt_write_io_process(), and that is kcryptd.

So now it is possible (if all requests finish before
crypt_write_io_process() returns) that kcryptd itself will release the
bio, but the workqueue infrastructure still seems to have a lock on
that.
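The reference-counting change described above can be sketched in userspace with C11 atomics. This is a hedged sketch, not dm-crypt code. struct io_sketch, io_put() and submit_write() are hypothetical stand-ins for the roles of crypt_write_io_process() and crypt_dec_pending(). The submitter holds one extra reference while it issues clones. If every clone completes before submission returns, the final put, and therefore the free, happens in the submitter's own context (kcryptd in the report).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Who dropped the final reference (hypothetical labels). */
enum ctx { CTX_NONE, CTX_ENDIO, CTX_SUBMITTER };

/* Hypothetical stand-in for a dm-crypt io: a pending count plus a note of
 * which context would have freed it. */
struct io_sketch {
	atomic_int pending;
	enum ctx freed_by;
};

/* Drop one reference; whoever moves pending from 1 to 0 "frees" the io. */
static void io_put(struct io_sketch *io, enum ctx who)
{
	if (atomic_fetch_sub(&io->pending, 1) == 1)
		io->freed_by = who;
}

/* The submitter holds its own reference while it issues nr_clones clones,
 * then drops it. complete_inline models every clone finishing before the
 * submit loop returns. */
static void submit_write(struct io_sketch *io, int nr_clones, bool complete_inline)
{
	atomic_store(&io->pending, 1);          /* submitter's extra reference */
	io->freed_by = CTX_NONE;

	for (int i = 0; i < nr_clones; i++) {
		atomic_fetch_add(&io->pending, 1); /* reference owned by this clone */
		if (complete_inline)
			io_put(io, CTX_ENDIO);     /* clone completes immediately */
	}
	io_put(io, CTX_SUBMITTER);              /* balance the extra reference */
}
```

With complete_inline set, the submitter drops the last reference. Otherwise the last clone completion does.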

But as the comment in run_workqueue says, this should be legal, and I
can't figure out what would make the lockdep copy mechanism fail.
Especially if the trigger was really a WRITE request, as with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted this
should never use the kcrypt_io-workqueue and so there should not
even be the problem with using INIT_WORK twice on the same work_struct.

... or I just don't see the bug.

Torsten


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Torsten Kaiser
On Nov 20, 2007 7:55 AM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> > Please could you try which patch from the dm-crypt series cause this ?
> > (agk-dm-dm-crypt* names.)
> >
> > I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> > there is one work struct used subsequently in two threads...
> > (io thread already started while crypt thread is processing lockdep_map
> > after calling f(work)...)
>
> After reverting only
> agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
> seen the 'held lock freed' message again.
>
> If it happens again with this revert, I will post that output.

It happened again, here I post the output:
Nov 23 10:56:17 treogen [   58.364441] XFS mounting filesystem dm-0
Nov 23 10:56:17 treogen [   58.519648] Ending clean XFS mount for filesystem: dm-0
Nov 23 10:56:17 treogen [   58.858098]
Nov 23 10:56:17 treogen [   58.858104] =
Nov 23 10:56:17 treogen [   58.863316] [ BUG: held lock freed! ]
Nov 23 10:56:17 treogen [   58.866998] -
Nov 23 10:56:17 treogen [   58.870685] kcryptd/1022 is freeing memory 81011EAD4B00-81011EAD4B3F, with a lock still held there!
Nov 23 10:56:17 treogen [   58.880430]  (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [   58.888014] 2 locks held by kcryptd/1022:
Nov 23 10:56:17 treogen [   58.892045]  #0:  (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [   58.900095]  #1:  (io-work#2){--..}, at: [80247dd9] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [   58.908535]
Nov 23 10:56:17 treogen [   58.908535] stack backtrace:
Nov 23 10:56:17 treogen [   58.912954]
Nov 23 10:56:17 treogen [   58.912955] Call Trace:
Nov 23 10:56:17 treogen [   58.916944]  [8025a5f0] debug_check_no_locks_freed+0x190/0x1b0
Nov 23 10:56:17 treogen [   58.924313]  [8026f192] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [   58.930073]  [80296bb9] kmem_cache_free+0x79/0xe0
Nov 23 10:56:17 treogen [   58.935665]  [8026f192] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [   58.941424]  [8026f22a] mempool_free+0x8a/0xa0
Nov 23 10:56:17 treogen [   58.946755]  [802c76af] bio_free+0x2f/0x50
Nov 23 10:56:17 treogen [   58.951736]  [804e36fd] dm_bio_destructor+0xd/0x10
Nov 23 10:56:17 treogen [   58.957414]  [802c7436] bio_put+0x26/0x30
Nov 23 10:56:17 treogen [   58.962311]  [804e3af3] clone_endio+0x83/0xb0
Nov 23 10:56:17 treogen [   58.967553]  [804eb860] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [   58.973224]  [802c72a9] bio_endio+0x19/0x40
Nov 23 10:56:17 treogen [   58.978291]  [804eb372] crypt_dec_pending+0x32/0x50
Nov 23 10:56:17 treogen [   58.984050]  [804eb8c4] kcryptd_do_crypt+0x64/0x290
Nov 23 10:56:17 treogen [   58.989810]  [804eb860] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [   58.995483]  [804eb860] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [   59.001158]  [80247e25] run_workqueue+0x175/0x210
Nov 23 10:56:17 treogen [   59.006746]  [80248af1] worker_thread+0x71/0xb0
Nov 23 10:56:17 treogen [   59.012158]  [8024c830] autoremove_wake_function+0x0/0x40
Nov 23 10:56:17 treogen [   59.018435]  [80248a80] worker_thread+0x0/0xb0
Nov 23 10:56:17 treogen [   59.023764]  [8024c43d] kthread+0x4d/0x80
Nov 23 10:56:17 treogen [   59.028660]  [8020cbc8] child_rip+0xa/0x12
Nov 23 10:56:17 treogen [   59.033640]  [8020c2df] restore_args+0x0/0x30
Nov 23 10:56:17 treogen [   59.038880]  [8024c3f0] kthread+0x0/0x80
Nov 23 10:56:17 treogen [   59.043689]  [8020cbbe] child_rip+0x0/0x12
Nov 23 10:56:17 treogen [   59.048670]
Nov 23 10:56:17 treogen [   59.050190] INFO: lockdep is turned off.
Nov 23 10:56:17 treogen [   59.919020] pata_amd :00:04.0: version 0.3.10

From what I see the only difference between the other stack traces and
this one is the following part:

old traces with agk-dm-dm-crypt-move-bio-submission-to-thread.patch applied:
[   64.584415]  [802c76af] bio_free+0x2f/0x50
[   64.586337]  [802c76e0] bio_fs_destructor+0x10/0x20
[   64.588558]  [802c7436] bio_put+0x26/0x30
[   64.590446]  [803834d9] xfs_buf_bio_end_io+0x99/0x120
[   64.592734]  [802c72a9] bio_endio+0x19/0x40
[   64.594687]  [804e3827] dec_pending+0x107/0x210
[   64.596775]  [804e3ae0] clone_endio+0x70/0xb0

new trace with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted:
[   58.946755]  [802c76af] bio_free+0x2f/0x50
[   58.951736]  [804e36fd] dm_bio_destructor+0xd/0x10
[   58.957414]  [802c7436] bio_put+0x26/0x30
[   58.962311]  [804e3af3] clone_endio+0x83/0xb0

(gdb) list *0x804e3ae0
0x804e3ae0 is in clone_endio (drivers/md/dm.c:539).
534   

Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-23 Thread Herbert Xu
Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> Also io->pending may need better protection - atomic, but missing memory
> barriers?  (May be getting away without sometimes due to side-effects of
> other function calls, but needs doing properly.)

If it's using atomic_dec_and_test then that comes with an implicit
memory barrier.
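This point can be related to the C11 memory model. In the kernel, value-returning atomics such as atomic_dec_and_test() are documented as implying a full memory barrier. A C11 read-modify-write with the default seq_cst ordering is the closest userspace analogue, though not an exact equivalent.

In this hypothetical sketch (struct io_result, complete_clone() and first_error() are illustrative names), each completion writes only its own status slot and then decrements. The decrements are release/acquire operations on a single counter. As a result, the caller that sees the count reach zero also sees every earlier slot write.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CLONES 8

/* Hypothetical io with one status slot per clone; a slot is written only by
 * the completion that owns it, so the slots themselves never race. */
struct io_result {
	atomic_int pending;
	int status[MAX_CLONES];
	int nr;
};

/* Store this clone's status, then drop its reference. The seq_cst
 * fetch_sub releases the store above and, for the last caller, acquires
 * the stores made before all earlier decrements. */
static bool complete_clone(struct io_result *io, int idx, int status)
{
	io->status[idx] = status;
	return atomic_fetch_sub(&io->pending, 1) == 1;
}

/* Only meaningful for the caller that saw complete_clone() return true. */
static int first_error(const struct io_result *io)
{
	for (int i = 0; i < io->nr; i++)
		if (io->status[i] != 0)
			return io->status[i];
	return 0;
}
```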

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-21 Thread Rafael J. Wysocki
On Wednesday, 21 of November 2007, Alan Stern wrote:
> On Wed, 21 Nov 2007, Rafael J. Wysocki wrote:
> 
> > > Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?
> > 
> > No.  In that case the suspend core is holding the device's mutex and your
> > attempt to unregister it will deadlock with it.
> > 
> > Do you _have_ _to_ unregister the device at all?  Why don't you just leave
> > it registered on CPU_DOWN_PREPARE_FROZEN?  The CPU is not going away
> > physically in this case and it's _guaranteed_ that _cpu_up() will be called on
> > it as soon as the hibernation image is ready or we are back from suspend.
> 
> This leaves the device registered if for some reason the number of CPUs 
> after resuming from hibernation is smaller than the number of CPUs 
> before hibernation.  Of course, in theory that's never supposed to 
> happen...

Yes, that clearly would be a bug.

Rafael


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-21 Thread Johannes Berg
Hi,

> > Ok, then I have question: Is the following pseudocode correct
> > (and problem is in lock validation which checks something
> > already initialized for another queue) or reusing work_struct
> > is not permitted from inside called work function ?
> >
> > (Note comment in code "It is permissible to free the struct
> > work_struct from inside the function that is called from it".)
> >
> > struct work_struct work;
> > struct workqueue_struct *a, *b;
> >
> > do_b(*work)
> > {
> > /* do something else */
> > }
> >
> > do_a(*work)
> > {
> > /* do something */
> > INIT_WORK(&work, do_b);
> > queue_work(b, &work);
> > }
> >
> >
> > INIT_WORK(&work, do_a);
> > queue_work(a, &work);
> 
> (just in case, in that particular case PREPARE_WORK() should be used)
> 
> INIT_WORK(w) can be used if we know that "w" is not pending, and nobody
> else can write to this work (say, queue_work(w) or cancel_work_sync(w)).
> So currently the code above should work correctly.
> 
> However, I'd say it is not correct, INIT_WORK() can throw out some debug
> info for example, or the implementation could be changed.
> 
> I'm not sure about CONFIG_LOCKDEP (Johannes cc'ed). INIT_WORK() does
> lockdep_init_map(&work->lockdep_map) but run_workqueue() has a local copy,
> looks ok.

We explicitly need to use a copy of the lockdep_map for "locking" the
work struct as per the quoted comment. So as far as I can tell, what
INIT_WORK() is doing here is changing an at that point unused copy of
the lockdep map so I think it should be fine. Not sure about the other
fine points nor why you'd want this though :)
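The copy-before-call rule from the run_workqueue() comment can be illustrated with a small userspace runner. The same example shows the point that INIT_WORK() only touches a copy that is unused at that moment. It is a sketch under stated assumptions. struct dbg_map stands in for struct lockdep_map. run_one(), self_freeing() and reinit_in_place() are hypothetical. Acquire and release are left as comments rather than calls into any real lock validator.

```c
#include <stdlib.h>

/* Hypothetical debug metadata standing in for struct lockdep_map. */
struct dbg_map {
	const char *name;
	int class_id;
};

/* Hypothetical work item that embeds its own debug metadata. */
struct work_sketch {
	struct dbg_map map;
	void (*func)(struct work_sketch *w);
};

/* The map the runner "released" after the last callback returned. */
static struct dbg_map last_released;

/* Runner following the quoted rule: copy the metadata before calling func(),
 * because func() may free or re-initialise *w. */
static void run_one(struct work_sketch *w)
{
	struct dbg_map map = w->map;   /* local copy owned by the runner */

	/* acquire(&map) would go here */
	w->func(w);
	/* release(&map): only the copy is used, *w is never touched again */
	last_released = map;
}

/* A work function that frees its own item, which the comment permits. */
static void self_freeing(struct work_sketch *w)
{
	free(w);
}

/* A work function that re-initialises its item in place, as INIT_WORK() would. */
static void reinit_in_place(struct work_sketch *w)
{
	w->map.name = "reinit";
	w->map.class_id = 7;
}
```

In both cases the runner releases the metadata it copied on entry. That holds whether the callback freed the item or overwrote its map.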

johannes




Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-21 Thread Oleg Nesterov
Alasdair G Kergon wrote:
>
>   - But what happens if kcryptd_crypt_write_convert_loop() calls
> INIT_WORK/queue_work twice?

Can't find this function. But "INIT_WORK + queue_work" twice is very
wrong of course.
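A toy queue shows why calling INIT_WORK plus queue_work twice goes wrong. In this sketch all names are hypothetical. It is loosely modelled on the pending bit that queue_work() tests and sets before linking a work item. The pending flag is what stops a second insertion. Re-initialising the item clears that flag while the item may still be linked. The next queue call then links it again, and the list ends up pointing at itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item with a pending flag, loosely modelled on the bit
 * that queue_work() tests and sets before linking a work_struct. */
struct qwork {
	atomic_flag pending;
	struct qwork *next;
};

/* Hypothetical queue: a singly linked list plus an insertion counter. */
struct queue_sketch {
	struct qwork *head;
	int nr_inserted;
};

/* Like INIT_WORK(): clears the pending flag unconditionally, even if the
 * item is still linked on some queue. */
static void qwork_init(struct qwork *w)
{
	atomic_flag_clear(&w->pending);
	w->next = NULL;
}

/* Like queue_work(): refuses (returns false) if the item is already pending. */
static bool qwork_queue(struct queue_sketch *q, struct qwork *w)
{
	if (atomic_flag_test_and_set(&w->pending))
		return false;
	w->next = q->head;
	q->head = w;
	q->nr_inserted++;
	return true;
}
```

Queueing the same item twice is refused. After re-initialisation, the same item is linked a second time and w.next points back at w.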

Milan Broz wrote:
>
> Ok, then I have question: Is the following pseudocode correct
> (and problem is in lock validation which checks something
> already initialized for another queue) or reusing work_struct
> is not permitted from inside called work function ?
>
> (Note comment in code "It is permissible to free the struct
> work_struct from inside the function that is called from it".)
>
> struct work_struct work;
> struct workqueue_struct *a, *b;
>
> do_b(*work)
> {
> /* do something else */
> }
>
> do_a(*work)
> {
> /* do something */
> INIT_WORK(&work, do_b);
> queue_work(b, &work);
> }
>
>
> INIT_WORK(&work, do_a);
> queue_work(a, &work);

(just in case, in that particular case PREPARE_WORK() should be used)

INIT_WORK(w) can be used if we know that "w" is not pending, and nobody
else can write to this work (say, queue_work(w) or cancel_work_sync(w)).
So currently the code above should work correctly.

However, I'd say it is not correct, INIT_WORK() can throw out some debug
info for example, or the implementation could be changed.

I'm not sure about CONFIG_LOCKDEP (Johannes cc'ed). INIT_WORK() does
lockdep_init_map(&work->lockdep_map) but run_workqueue() has a local copy,
looks ok.

Oleg.



Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-21 Thread Alan Stern
On Wed, 21 Nov 2007, Rafael J. Wysocki wrote:

> > Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?
> 
> No.  In that case the suspend core is holding the device's mutex and your
> attempt to unregister it will deadlock with it.
> 
> Do you _have_ _to_ unregister the device at all?  Why don't you just leave
> it registered on CPU_DOWN_PREPARE_FROZEN?  The CPU is not going away
> physically in this case and it's _guaranteed_ that _cpu_up() will be called on
> it as soon as the hibernation image is ready or we are back from suspend.

This leaves the device registered if for some reason the number of CPUs 
after resuming from hibernation is smaller than the number of CPUs 
before hibernation.  Of course, in theory that's never supposed to 
happen...

Alan Stern



Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Dave Young
On Nov 20, 2007 5:59 PM, Dave Young <[EMAIL PROTECTED]> wrote:
>
> On Nov 20, 2007 5:56 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > I encountered kernel warnings. I just executed xawtv without a video device
> > > being found.
> > >
> > > like this:
> > >
> > > WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
> > >  [] native_smp_call_function_mask+0x149/0x150
> > >  [] alloc_debug_processing+0xa9/0x130
> > >  [] smp_callback+0x0/0x10
> > >  [] smp_call_function+0x1c/0x20
> > >  [] cpuidle_latency_notify+0x18/0x20
> > >  [] notifier_call_chain+0x3e/0x70
> > >  [] __blocking_notifier_call_chain+0x44/0x70
> > >  [] blocking_notifier_call_chain+0x17/0x20
> > >  [] pm_qos_add_requirement+0x8d/0xd0
> > >  [] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
> > >  [] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
> > >  [] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
> > >  [] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
> > >  [] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
> > >  [] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
> > >  [] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
> > >  [] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
> > >  [] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
> > >  [] __fput+0x16e/0x200
> > >  [] filp_close+0x3c/0x80
> > >  [] sys_close+0x69/0xd0
> > >  [] syscall_call+0x7/0xb
> > >  [] xfrm_notify_sa+0x110/0x290
> > >  ===
> > >
> >
> > That was hopefully fixed.  You might care to test
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
> > to confirm that, if feeling sufficiently brave..
> >
>

Hi,
I can confirm that I can't reproduce this after applying the
broken-out-2007-11-20-01-45 patch set.

Regards
dave


Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-20 Thread Rafael J. Wysocki
On Tuesday, 20 of November 2007, Mark M. Hoffman wrote:
> Hi all:
> 
> * Alan Stern <[EMAIL PROTECTED]> [2007-11-19 15:27:14 -0500]:
> > On Mon, 19 Nov 2007, Rudolf Marek wrote:
> > 
> > > Hello all,
> > > >>> gives coretemp_cpu_callback -> coretemp_device_remove ->
> > > >>> platform_device_unregister, so coretemp seems to be what I have and 
> > > >>> you don't.
> > > > 
> > > > Yes.
> > > > 
> > > > For the coretemp developers: coretemp_cpu_callback() needs to be more 
> > > > careful about what it does.  During a system sleep transition (suspend, 
> > > > hibernate, resume) it isn't possible to register or unregister a 
> > > > device.  Attempts to register will fail and attempts to unregister will 
> > > > block until the system sleep is over -- and for this callback that 
> > > > means hanging.
> > > 
> > > Well I wrote the driver. Thanks for the clarification. If I recall
> > > correctly I looked at how this part is done in other drivers. Now
> > > while checking what happened to the file, it seems Rafael added
> > > something related.
> > > 
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
> > 
> > That does look like it was meant for exactly this sort of situation.
> > 
> > > > It's not clear what the best way is to fix this.  Perhaps the CPU 
> > > > notification should be sent along with a special flag indicating that 
> > > > the CPU transition is part of a system sleep (although this seems 
> > > > racy).  Perhaps the driver should notice when a system sleep begins, 
> > > > and defer all CPU-change handling until after the sleep is over.
> > > 
> > > maybe it does exist?  CPU_DOWN_PREPARE ?
> > > 
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD
> > > 
> > > Unfortunately I'm not very familiar with this; would calling
> > > coretemp_device_remove from CPU_DOWN_PREPARE help? Looking at the
> > > microcode driver, it seems it just hides the sysfs interface from the user.
> 
> AFAICT from that documentation, it would have been better to unregister
> the device on CPU_DOWN_PREPARE anyway.  CPU_DEAD seems to be too late -
> it's already gone by then.
> 
> > I'm not sure exactly what you want to do here.  But it seems like a 
> > waste to unregister the coretemp devices at the start of a system sleep 
> > and then register them back at the end.
> > 
> > Could you simply leave the devices registered throughout the entire
> > sleep?  Of course, at the end you would have to check that all the CPUs
> > really did come back up, and unregister the devices for the CPUs that
> > are still offline.
> 
> Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?

No.  In that case the suspend core is holding the device's mutex and your
attempt to unregister it will deadlock with it.

Do you _have_ _to_ unregister the device at all?  Why don't you just leave
it registered on CPU_DOWN_PREPARE_FROZEN?  The CPU is not going away
physically in this case and it's _guaranteed_ that _cpu_up() will be called on
it as soon as the hibernation image is ready or we are back from suspend.

> If so, then the simplest fix would be the patch below (Jiri: feel free to
> try it). Otherwise it would take a bit of refactoring to bring the sysfs
> interface down/up for suspend/resume.
> 
> commit ce9c7b78c839a6304696d90083eac08baad524ce
> Author: Mark M. Hoffman <[EMAIL PROTECTED]>
> Date:   Tue Nov 20 07:51:50 2007 -0500
> 
> hwmon: (coretemp) fix suspend/resume hang
> 
> Signed-off-by: Mark M. Hoffman <[EMAIL PROTECTED]>

I'd do it like this:

> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 5c82ec7..afe2d31 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
>   switch (action) {
>   case CPU_ONLINE:
>   case CPU_ONLINE_FROZEN:
> + case CPU_DOWN_FAILED:
>   coretemp_device_add(cpu);
+   case CPU_DOWN_FAILED_FROZEN: 
>   break;
> - case CPU_DEAD:
> - case CPU_DEAD_FROZEN:
> + case CPU_DOWN_PREPARE:
>   coretemp_device_remove(cpu);
+   case CPU_DOWN_PREPARE_FROZEN:
>   break;
>   }
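
For readers comparing the two proposals: in Rafael's variant the _FROZEN actions for DOWN_PREPARE and DOWN_FAILED fall through to `break`, so suspend and hibernation no longer unregister the device, while CPU_ONLINE_FROZEN still reaches coretemp_device_add(). The sketch below is only a userspace model of that dispatch, assuming hypothetical enum values and a hypothetical coretemp_model() that reports the action instead of performing it; it is not driver code.

```c
/* Hypothetical stand-ins for the CPU hotplug notifier actions. The values
 * are arbitrary and are NOT the kernel's definitions. */
enum cpu_action {
	CPU_ONLINE,
	CPU_ONLINE_FROZEN,
	CPU_DOWN_PREPARE,
	CPU_DOWN_PREPARE_FROZEN,
	CPU_DOWN_FAILED,
	CPU_DOWN_FAILED_FROZEN,
	CPU_DEAD,
	CPU_DEAD_FROZEN
};

/* What the callback would do for a given action in this model. */
enum coretemp_op { OP_NONE, OP_ADD, OP_REMOVE };

/* Models the switch in Rafael's variant of coretemp_cpu_callback(). */
static enum coretemp_op coretemp_model(enum cpu_action action)
{
	enum coretemp_op op = OP_NONE;

	switch (action) {
	case CPU_ONLINE:
	case CPU_ONLINE_FROZEN:
	case CPU_DOWN_FAILED:
		op = OP_ADD;		/* coretemp_device_add(cpu) */
		/* fall through */
	case CPU_DOWN_FAILED_FROZEN:
		break;
	case CPU_DOWN_PREPARE:
		op = OP_REMOVE;		/* coretemp_device_remove(cpu) */
		/* fall through */
	case CPU_DOWN_PREPARE_FROZEN:
		break;
	default:			/* CPU_DEAD, CPU_DEAD_FROZEN: nothing */
		break;
	}
	return op;
}
```

With CPU_DEAD no longer handled, a CPU that is hot-unplugged outside a system sleep loses its device at CPU_DOWN_PREPARE instead.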

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-20 Thread Alasdair G Kergon
On Tue, Nov 20, 2007 at 03:40:30PM +0100, Milan Broz wrote:
> (Note comment in code "It is permissible to free the struct
> work_struct from inside the function that is called from it".)
 
I don't understand yet how lockdep behaves if the work struct gets
reused and the reused one finishes first.

I renamed the kcryptd functions today in an attempt to disentangle this
code a bit more.

  - io->pending reference counting looks correct (though used
inconsistently when comparing READ with WRITE)

  - But what happens if kcryptd_crypt_write_convert_loop() calls
INIT_WORK/queue_work twice?

Alasdair
-- 
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-20 Thread Milan Broz
Torsten Kaiser wrote:
> On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
>> Torsten Kaiser wrote:
>>> Anything I could try, apart from more boots with slub_debug=F?
> 
> One time it triggered with slub_debug=F, but no additional output.
> With slub_debug=FP I have not seen it again, so I can't say if that
> would yield more info.
> 
>> Could you please find which patch from the dm-crypt series causes this?
>> (agk-dm-dm-crypt* names.)
>>
>> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
>> there is one work struct used subsequently in two threads...
>> (io thread already started while crypt thread is processing lockdep_map
>> after calling f(work)...)
> 
> After reverting only
> agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
> seen the 'held lock freed' message again.

OK, then I have a question: is the following pseudocode correct (and the
problem is in lock validation, which checks something already initialized
for another queue), or is reusing a work_struct not permitted from inside
the work function that is called for it?

(Note comment in code "It is permissible to free the struct
work_struct from inside the function that is called from it".)

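struct work_struct work;
struct workqueue_struct *a, *b;

static void do_b(struct work_struct *work)
{
	/* do something else */
}

static void do_a(struct work_struct *work)
{
	/* do something */
	INIT_WORK(work, do_b);	/* re-initialize the same work_struct... */
	queue_work(b, work);	/* ...and queue it on the second workqueue */
}

/* in the setup path: */
INIT_WORK(&work, do_a);
queue_work(a, &work);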
Milan
--
[EMAIL PROTECTED]


Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-20 Thread Mark M. Hoffman
Hi all:

* Alan Stern <[EMAIL PROTECTED]> [2007-11-19 15:27:14 -0500]:
> On Mon, 19 Nov 2007, Rudolf Marek wrote:
> 
> > Hello all,
> > >>> gives coretemp_cpu_callback -> coretemp_device_remove ->
> > >>> platform_device_unregister, so coretemp seems to be what I have and you 
> > >>> don't.
> > > 
> > > Yes.
> > > 
> > > For the coretemp developers: coretemp_cpu_callback() needs to be more 
> > > careful about what it does.  During a system sleep transition (suspend, 
> > > hibernate, resume) it isn't possible to register or unregister a 
> > > device.  Attempts to register will fail and attempts to unregister will 
> > > block until the system sleep is over -- and for this callback that 
> > > means hanging.
> > 
> > Well, I wrote the driver. Thanks for the clarification. If I recall
> > correctly, I looked at how other drivers do this part. Now, while checking
> > what happened to the file, it seems Rafael added something related.
> > 
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
> 
> That does look like it was meant for exactly this sort of situation.
> 
> > > It's not clear what the best way is to fix this.  Perhaps the CPU 
> > > notification should be sent along with a special flag indicating that 
> > > the CPU transition is part of a system sleep (although this seems 
> > > racy).  Perhaps the driver should notice when a system sleep begins, 
> > > and defer all CPU-change handling until after the sleep is over.
> > 
> > Maybe it does exist? CPU_DOWN_PREPARE?
> > 
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD
> > 
> > Unfortunately I'm not very familiar with this; would calling
> > coretemp_device_remove from CPU_DOWN_PREPARE help? Looking at the
> > microcode driver, it seems it just hides the sysfs interface from the user.

AFAICT from that documentation, it would have been better to unregister
the device on CPU_DOWN_PREPARE anyway.  CPU_DEAD seems to be too late -
it's already gone by then.

> I'm not sure exactly what you want to do here.  But it seems like a 
> waste to unregister the coretemp devices at the start of a system sleep 
> and then register them back at the end.
> 
> Could you simply leave the devices registered throughout the entire
> sleep?  Of course, at the end you would have to check that all the CPUs
> really did come back up, and unregister the devices for the CPUs that
> are still offline.
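
Alan's suggestion can be sketched as a small userspace model: keep every device registered while a system sleep is in progress, remember which CPUs changed state, and reconcile once the sleep is over, unregistering only the devices whose CPUs stayed offline. All names (sleep_begin(), sleep_end(), cpu_changed(), the per-CPU arrays, NR_CPUS_MODEL) are hypothetical, and how the driver would learn that a sleep has started or ended is deliberately left open.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4			/* hypothetical CPU count */

static bool sleeping;			/* system sleep in progress */
static bool registered[NR_CPUS_MODEL];	/* device registered per CPU */
static bool online[NR_CPUS_MODEL];	/* last known CPU state */
static bool dirty[NR_CPUS_MODEL];	/* change deferred during sleep */

/* Hypothetical stand-ins for device (un)registration. */
static void device_add(int cpu)    { registered[cpu] = true; }
static void device_remove(int cpu) { registered[cpu] = false; }

/* Hotplug callback: while asleep, only record that something changed,
 * because (un)registering devices is not possible at that point. */
static void cpu_changed(int cpu, bool now_online)
{
	online[cpu] = now_online;
	if (sleeping) {
		dirty[cpu] = true;
		return;
	}
	if (now_online)
		device_add(cpu);
	else
		device_remove(cpu);
}

static void sleep_begin(void)
{
	sleeping = true;
}

/* After resume, apply the final state of every CPU that changed. */
static void sleep_end(void)
{
	sleeping = false;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (!dirty[cpu])
			continue;
		dirty[cpu] = false;
		if (online[cpu] && !registered[cpu])
			device_add(cpu);
		else if (!online[cpu] && registered[cpu])
			device_remove(cpu);
	}
}
```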

Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?  If
so, then the simplest fix would be the patch below (Jiri: feel free to
try it). Otherwise it would take a bit of refactoring to bring the sysfs
interface down/up for suspend/resume.

commit ce9c7b78c839a6304696d90083eac08baad524ce
Author: Mark M. Hoffman <[EMAIL PROTECTED]>
Date:   Tue Nov 20 07:51:50 2007 -0500

hwmon: (coretemp) fix suspend/resume hang

Signed-off-by: Mark M. Hoffman <[EMAIL PROTECTED]>

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 5c82ec7..afe2d31 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
 		coretemp_device_add(cpu);
 		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
 		coretemp_device_remove(cpu);
 		break;
 	}
-- 
Mark M. Hoffman
[EMAIL PROTECTED]



Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Kamalesh Babulal
Andrew Morton wrote:
> On Tue, 20 Nov 2007 14:22:43 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
>> Andrew Morton wrote:
>>> On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> The following messages are seen on the server while running a filesystem
>>>> stress test on an SMB partition mounted on the client machine.
>>>>
>>>> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
>>>> Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
>>>> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
>>>> Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
>>>> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
>>>> Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
>>>> Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
>>>> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
>>>> Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
>>>> Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
>>>> Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer
>>>>
>>> So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
>>> with the above messages?
>>>
>> Hi Andrew,
>>
>> Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel.
> 
> Oh.  Well I don't know where to start looking for that one.
> 
> Maybe someone fixed it amongst all the things which have been happening
> recently.  I'll upload an mm snapshot as soon as I can get some of it to
> compile.  Can you please retest with that?
> 
> If it still fails and if this is reasonably reproducible I'm afraid I'd ask
> if you have time to run a bisection search on it.
> 
Sure, will retest it on the mm snapshot.
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Dave Young
On Nov 20, 2007 5:56 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I encountered kernel warnings. I just ran xawtv with no video device
> > present.
> >
> > like this:
> >
> > WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
> >  [] native_smp_call_function_mask+0x149/0x150
> >  [] alloc_debug_processing+0xa9/0x130
> >  [] smp_callback+0x0/0x10
> >  [] smp_call_function+0x1c/0x20
> >  [] cpuidle_latency_notify+0x18/0x20
> >  [] notifier_call_chain+0x3e/0x70
> >  [] __blocking_notifier_call_chain+0x44/0x70
> >  [] blocking_notifier_call_chain+0x17/0x20
> >  [] pm_qos_add_requirement+0x8d/0xd0
> >  [] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
> >  [] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
> >  [] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
> >  [] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
> >  [] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
> >  [] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
> >  [] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
> >  [] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
> >  [] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
> >  [] __fput+0x16e/0x200
> >  [] filp_close+0x3c/0x80
> >  [] sys_close+0x69/0xd0
> >  [] syscall_call+0x7/0xb
> >  [] xfrm_notify_sa+0x110/0x290
> >  ===
> >
>
> That was hopefully fixed.  You might care to test
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
> to confirm that, if feeling sufficiently brave..
>

Hi,
I would like to try this tomorrow if I have time, thanks.

Regards
dave


Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Andrew Morton
On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young <[EMAIL PROTECTED]> wrote:

> Hi,
> I encountered kernel warnings. I just ran xawtv with no video device
> present.
> 
> like this:
> 
> WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
>  [] native_smp_call_function_mask+0x149/0x150
>  [] alloc_debug_processing+0xa9/0x130
>  [] smp_callback+0x0/0x10
>  [] smp_call_function+0x1c/0x20
>  [] cpuidle_latency_notify+0x18/0x20
>  [] notifier_call_chain+0x3e/0x70
>  [] __blocking_notifier_call_chain+0x44/0x70
>  [] blocking_notifier_call_chain+0x17/0x20
>  [] pm_qos_add_requirement+0x8d/0xd0
>  [] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
>  [] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
>  [] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
>  [] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
>  [] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
>  [] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
>  [] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
>  [] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
>  [] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
>  [] __fput+0x16e/0x200
>  [] filp_close+0x3c/0x80
>  [] sys_close+0x69/0xd0
>  [] syscall_call+0x7/0xb
>  [] xfrm_notify_sa+0x110/0x290
>  ===
> 

That was hopefully fixed.  You might care to test
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
to confirm that, if feeling sufficiently brave..



[BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Dave Young
Hi,
I encountered kernel warnings. I just ran xawtv with no video device
present.

like this:

WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
 [] native_smp_call_function_mask+0x149/0x150
 [] alloc_debug_processing+0xa9/0x130
 [] smp_callback+0x0/0x10
 [] smp_call_function+0x1c/0x20
 [] cpuidle_latency_notify+0x18/0x20
 [] notifier_call_chain+0x3e/0x70
 [] __blocking_notifier_call_chain+0x44/0x70
 [] blocking_notifier_call_chain+0x17/0x20
 [] pm_qos_add_requirement+0x8d/0xd0
 [] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
 [] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
 [] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
 [] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
 [] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
 [] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
 [] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
 [] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
 [] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
 [] __fput+0x16e/0x200
 [] filp_close+0x3c/0x80
 [] sys_close+0x69/0xd0
 [] syscall_call+0x7/0xb
 [] xfrm_notify_sa+0x110/0x290
 ===

Config file:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc2-mm1
# Tue Nov 20 17:24:16 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_LSF=y
CONFIG_BLK_DEV_BSG=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"
CONFIG_PREEMPT_NOTIFIERS=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=

Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Andrew Morton
On Tue, 20 Nov 2007 14:22:43 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> > 
> >> Hi Andrew,
> >>
> >> The following messages are seen on the server while running a filesystem
> >> stress test on an SMB partition mounted on the client machine.
> >>
> >> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
> >> Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
> >> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
> >> Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
> >> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
> >> Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
> >> Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
> >> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
> >> Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
> >> Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
> >> Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer
> >>
> > 
> > So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
> > with the above messages?
> > 
> Hi Andrew,
> 
> Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel.

Oh.  Well I don't know where to start looking for that one.

Maybe someone fixed it amongst all the things which have been happening
recently.  I'll upload an mm snapshot as soon as I can get some of it to
compile.  Can you please retest with that?

If it still fails and if this is reasonably reproducible I'm afraid I'd ask
if you have time to run a bisection search on it.
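
A bisection search over a patch series is a binary search for the first patch that introduces the failure. A minimal sketch in C, assuming a hypothetical is_bad(n) test that builds and boots the kernel with the first n patches applied and reruns the SMB stress test; the search also assumes the base kernel (n = 0) is good, the full series is bad, and the failure stays once introduced. The culprit index in the demo predicate is made up.

```c
/* Hypothetical test: nonzero if the kernel built with the first n patches
 * of the series applied shows the smbd failure. In practice this step is
 * "build, boot, rerun the stress test". */
typedef int (*is_bad_fn)(int n);

/* Binary search for the first bad patch in [1, total].
 * Assumes n = 0 (no patches) is good, n = total is bad, and that the
 * failure persists once introduced. Needs about log2(total) test runs. */
static int bisect_first_bad(int total, is_bad_fn is_bad)
{
	int good = 0, bad = total;	/* invariant: good passes, bad fails */

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Example predicate for trying the search out; the culprit is made up. */
static int culprit = 137;

static int demo_is_bad(int n)
{
	return n >= culprit;
}
```

When the tree is in git, `git bisect` automates the same loop.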



Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Kamalesh Babulal
Andrew Morton wrote:
> On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> 
>> Hi Andrew,
>>
>> The following messages are seen on the server while running a filesystem
>> stress test on an SMB partition mounted on the client machine.
>>
>> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
>> Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
>> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
>> Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
>> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
>> Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
>> Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
>> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
>> Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
>> Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
>> Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer
>>
> 
> So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
> with the above messages?
> 
Hi Andrew,

Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Andrew Morton
On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:

> Hi Andrew,
> 
> The following messages are seen on the server while running a filesystem
> stress test on an SMB partition mounted on the client machine.
> 
> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
> Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
> Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
> Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
> Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
> Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
> Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
> Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer
> 

So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
with the above messages?



[BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Kamalesh Babulal
Hi Andrew,

The following messages are seen on the server while running a filesystem
stress test on an SMB partition mounted on the client machine.

Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Andrew Morton
On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote:

 Hi Andrew,
 
 Following calltrace is seen server, while running filesystem stress on smb 
 mounted partition on the client machine.
 
 Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] 
 lib/util_sock.c:write_data(562)
 Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to 
 client 9.124.111.212. Error Broken pipe
 Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] 
 lib/util_sock.c:send_smb(769)
 Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. 
 (Broken pipe)
 Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] 
 smbd/oplock.c:oplock_timeout_handler(351)
 Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file 
 p0/d3X
 XX/deX/d3cX/d6eXXX/f8d -- replying 
 anyway
 Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ 
 old libcap
 Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] 
 smbd/oplock_linux.c:linux_release_kernel_oplock(193)
 Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when 
 removing kernel oplock on file p0/d3XXX
 /deX/d3cX/d6eXXX/f8d,
  dev = 807, inode = 30983, file
 _id = 501. Error w
 Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] 
 lib/util_sock.c:write_data(562)
 Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to 
 client 9.124.111.212. Error Connection reset by peer
 

So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
with the above messages?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Kamalesh Babulal
Andrew Morton wrote:
 On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote:
 
 Hi Andrew,

 Following calltrace is seen server, while running filesystem stress on smb 
 mounted partition on the client machine.

 Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] 
 lib/util_sock.c:write_data(562)
 Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to 
 client 9.124.111.212. Error Broken pipe
 Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] 
 lib/util_sock.c:send_smb(769)
 Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. 
 (Broken pipe)
 Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] 
 smbd/oplock.c:oplock_timeout_handler(351)
 Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file 
 p0/d3X
 XX/deX/d3cX/d6eXXX/f8d -- replying 
 anyway
 Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets 
 w/ old libcap
 Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] 
 smbd/oplock_linux.c:linux_release_kernel_oplock(193)
 Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when 
 removing kernel oplock on file p0/d3XXX
 /deX/d3cX/d6eXXX/f8d,
  dev = 807, inode = 30983, file
 _id = 501. Error w
 Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] 
 lib/util_sock.c:write_data(562)
 Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to 
 client 9.124.111.212. Error Connection reset by peer

 
 So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
 with the above messages?
 
Hi Andrew,

Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel.

-- 
Thanks  Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Kamalesh Babulal
Andrew Morton wrote:
 On Tue, 20 Nov 2007 14:22:43 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote:
 
 Andrew Morton wrote:
 On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal [EMAIL PROTECTED] 
 wrote:

 Hi Andrew,

 Following calltrace is seen server, while running filesystem stress on smb 
 mounted partition on the client machine.

 Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
 Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
 Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
 Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
 Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
 Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
 Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
 Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
 Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
 Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
 Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer

 So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
 with the above messages?

 Hi Andrew,

 Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel.
 
 Oh.  Well I don't know where to start looking for that one.
 
 Maybe someone fixed it amongst all the things which have been happening
 recently.  I'll upload an mm snapshot as soon as I can get some of it to
 compile.  Can you please retest with that?
 
 If it still fails and if this is reasonably reproducible I'm afraid I'd ask
 if you have time to run a bisection search on it.
 
Sure, will retest it on the mm snapshot.
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Andrew Morton
On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young [EMAIL PROTECTED] wrote:

 Hi,
 I encountered kernel warnings. I just executed xawtv without any video
 device being found.
 
 like this:
 
 WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
  [c0118769] native_smp_call_function_mask+0x149/0x150
  [c0178dd9] alloc_debug_processing+0xa9/0x130
  [c0372da0] smp_callback+0x0/0x10
  [c0119b7c] smp_call_function+0x1c/0x20
  [c0372dc8] cpuidle_latency_notify+0x18/0x20
  [c0144eae] notifier_call_chain+0x3e/0x70
  [c01450d4] __blocking_notifier_call_chain+0x44/0x70
  [c0145117] blocking_notifier_call_chain+0x17/0x20
  [c01454fd] pm_qos_add_requirement+0x8d/0xd0
  [f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
  [f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
  [f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
  [f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
  [f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
  [f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
  [f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
  [f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
  [f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
  [c01801ae] __fput+0x16e/0x200
  [c017e35c] filp_close+0x3c/0x80
  [c017e409] sys_close+0x69/0xd0
  [c01042da] syscall_call+0x7/0xb
  [c040] xfrm_notify_sa+0x110/0x290
  ===
 

That was hopefully fixed.  You might care to test
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
to confirm that, if feeling sufficiently brave.



Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Andrew Morton
On Tue, 20 Nov 2007 14:22:43 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote:

 Andrew Morton wrote:
  On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote:
  
  Hi Andrew,
 
  The following call trace is seen on the server while running filesystem
  stress on an SMB-mounted partition on the client machine.
 
  Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
  Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
  Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
  Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
  Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
  Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
  Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
  Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
  Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
  Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
  Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer
 
  
  So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing
  with the above messages?
  
 Hi Andrew,
 
 Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel.

Oh.  Well I don't know where to start looking for that one.

Maybe someone fixed it amongst all the things which have been happening
recently.  I'll upload an mm snapshot as soon as I can get some of it to
compile.  Can you please retest with that?

If it still fails and if this is reasonably reproducible I'm afraid I'd ask
if you have time to run a bisection search on it.
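
For reference, bisecting an -mm release means finding the first patch in
the broken-out series after which the failure reproduces. A minimal
sketch of that search in C, assuming the series applies in order and
that once a patch introduces the bug every later prefix of the series
still shows it. reproduces() is a hypothetical callback standing in for
"apply patches 0..k, build, boot and rerun the stress test", and
bug_from() is a toy predicate for trying the search out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test: true if the bug reproduces with patches 0..k applied. */
typedef bool (*reproduces_fn)(size_t k, void *ctx);

/*
 * Return the index of the first patch that introduces the failure,
 * or n if it never reproduces. Relies on monotonicity: once the bug
 * appears at some prefix, every longer prefix also shows it.
 */
static size_t first_bad_patch(size_t n, reproduces_fn reproduces, void *ctx)
{
	size_t lo = 0, hi = n;	/* the answer always lies in [lo, hi] */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (reproduces(mid, ctx))
			hi = mid;	/* culprit is mid or earlier */
		else
			lo = mid + 1;	/* culprit is after mid */
	}
	assert(lo == hi);
	return lo;
}

/* Toy predicate: the bug appears from patch index *(size_t *)ctx on. */
static bool bug_from(size_t k, void *ctx)
{
	return k >= *(size_t *)ctx;
}
```

For a series of n patches this needs about log2(n) build-and-test
cycles rather than n.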



[BUG] 2.6.24-rc2-mm1 - smbd write fails

2007-11-20 Thread Kamalesh Babulal
Hi Andrew,

The following call trace is seen on the server while running filesystem
stress on an SMB-mounted partition on the client machine.

Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562)
Nov 19 18:45:52 p55lp6 smbd[3304]:   write_data: write failure in writing to client 9.124.111.212. Error Broken pipe
Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769)
Nov 19 18:45:52 p55lp6 smbd[3304]:   Error writing 39 bytes to client. -1. (Broken pipe)
Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351)
Nov 19 18:47:42 p55lp6 smbd[3650]:   Oplock break failed for file p0/d3XXX/deX/d3cX/d6eXXX/f8d -- replying anyway
Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap
Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193)
Nov 19 18:47:43 p55lp6 smbd[3650]:   linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX/deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file_id = 501. Error w
Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562)
Nov 19 18:48:04 p55lp6 smbd[3650]:   write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


[BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Dave Young
Hi,
I encountered kernel warnings. I just executed xawtv without any video
device being found.

like this:

WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
 [c0118769] native_smp_call_function_mask+0x149/0x150
 [c0178dd9] alloc_debug_processing+0xa9/0x130
 [c0372da0] smp_callback+0x0/0x10
 [c0119b7c] smp_call_function+0x1c/0x20
 [c0372dc8] cpuidle_latency_notify+0x18/0x20
 [c0144eae] notifier_call_chain+0x3e/0x70
 [c01450d4] __blocking_notifier_call_chain+0x44/0x70
 [c0145117] blocking_notifier_call_chain+0x17/0x20
 [c01454fd] pm_qos_add_requirement+0x8d/0xd0
 [f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
 [f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
 [f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
 [f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
 [f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
 [f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
 [f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
 [f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
 [f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
 [c01801ae] __fput+0x16e/0x200
 [c017e35c] filp_close+0x3c/0x80
 [c017e409] sys_close+0x69/0xd0
 [c01042da] syscall_call+0x7/0xb
 [c040] xfrm_notify_sa+0x110/0x290
 ===

config file:

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc2-mm1
# Tue Nov 20 17:24:16 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=16
# CONFIG_CGROUPS is not set
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
# CONFIG_FAIR_CGROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_LSF=y
CONFIG_BLK_DEV_BSG=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"
CONFIG_PREEMPT_NOTIFIERS=y

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set

Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Dave Young
On Nov 20, 2007 5:56 PM, Andrew Morton [EMAIL PROTECTED] wrote:

 On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young [EMAIL PROTECTED] wrote:

  Hi,
  I encountered kernel warnings. I just executed xawtv without any video
  device being found.
 
  like this:
 
  WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
   [c0118769] native_smp_call_function_mask+0x149/0x150
   [c0178dd9] alloc_debug_processing+0xa9/0x130
   [c0372da0] smp_callback+0x0/0x10
   [c0119b7c] smp_call_function+0x1c/0x20
   [c0372dc8] cpuidle_latency_notify+0x18/0x20
   [c0144eae] notifier_call_chain+0x3e/0x70
   [c01450d4] __blocking_notifier_call_chain+0x44/0x70
   [c0145117] blocking_notifier_call_chain+0x17/0x20
   [c01454fd] pm_qos_add_requirement+0x8d/0xd0
   [f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
   [f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
   [f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
   [f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
   [f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
   [f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
   [f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
   [f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
   [f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
   [c01801ae] __fput+0x16e/0x200
   [c017e35c] filp_close+0x3c/0x80
   [c017e409] sys_close+0x69/0xd0
   [c01042da] syscall_call+0x7/0xb
   [c040] xfrm_notify_sa+0x110/0x290
   ===
 

 That was hopefully fixed.  You might care to test
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
 to confirm that, if feeling sufficiently brave.


Hi,
I would like to try this tomorrow if I have time, thanks.

Regards
dave


Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-20 Thread Mark M. Hoffman
Hi all:

* Alan Stern [EMAIL PROTECTED] [2007-11-19 15:27:14 -0500]:
 On Mon, 19 Nov 2007, Rudolf Marek wrote:
 
  Hello all,
   gives coretemp_cpu_callback -> coretemp_device_remove ->
   platform_device_unregister, so coretemp seems to be what I have and you 
   don't.
   
   Yes.
   
   For the coretemp developers: coretemp_cpu_callback() needs to be more 
   careful about what it does.  During a system sleep transition (suspend, 
   hibernate, resume) it isn't possible to register or unregister a 
   device.  Attempts to register will fail and attempts to unregister will 
   block until the system sleep is over -- and for this callback that 
   means hanging.
  
  Well I wrote the driver. Thanks for the clarification. If I recall
  correctly, I looked at how this part is done in other drivers. Now while
  checking what happened to the file, it seems Rafael added something related.
  
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
 
 That does look like it was meant for exactly this sort of situation.
 
   It's not clear what the best way is to fix this.  Perhaps the CPU 
   notification should be sent along with a special flag indicating that 
   the CPU transition is part of a system sleep (although this seems 
   racy).  Perhaps the driver should notice when a system sleep begins, 
   and defer all CPU-change handling until after the sleep is over.
  
  Maybe it already exists? CPU_DOWN_PREPARE?
  
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD
  
  Unfortunately I'm not very familiar with this. Would calling
  coretemp_device_remove from CPU_DOWN_PREPARE help? Looking at the microcode
  driver, it seems it just hides the sysfs interface from the user.

AFAICT from that documentation, it would have been better to unregister
the device on CPU_DOWN_PREPARE anyway.  CPU_DEAD seems to be too late -
it's already gone by then.

 I'm not sure exactly what you want to do here.  But it seems like a 
 waste to unregister the coretemp devices at the start of a system sleep 
 and then register them back at the end.
 
 Could you simply leave the devices registered throughout the entire
 sleep?  Of course, at the end you would have to check that all the CPUs
 really did come back up, and unregister the devices for the CPUs that
 are still offline.

Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?  If
so, then the simplest fix would be the patch below (Jiri: feel free to
try it). Otherwise it would take a bit of refactoring to bring the sysfs
interface down/up for suspend/resume.

commit ce9c7b78c839a6304696d90083eac08baad524ce
Author: Mark M. Hoffman [EMAIL PROTECTED]
Date:   Tue Nov 20 07:51:50 2007 -0500

hwmon: (coretemp) fix suspend/resume hang

Signed-off-by: Mark M. Hoffman [EMAIL PROTECTED]

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 5c82ec7..afe2d31 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
 		coretemp_device_add(cpu);
 		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
 		coretemp_device_remove(cpu);
 		break;
 	}
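
Alan's alternative quoted above (leave the devices registered through
the whole sleep, then clean up once the CPUs are back) could look
roughly like the userspace model below. It is a sketch only:
NR_CPUS_MODEL, registered[], online[] and reconcile_after_resume() are
hypothetical names, and real driver code would walk the CPUs and call
coretemp_device_remove() instead of clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

/* Model state: is a coretemp device registered for this CPU, and did
 * the CPU actually come back online after resume? */
static bool registered[NR_CPUS_MODEL];
static bool online[NR_CPUS_MODEL];

/*
 * Run once after the whole system sleep is over. Devices were left
 * registered during suspend, so only CPUs that failed to come back
 * need their device dropped. Returns how many devices were removed.
 */
static int reconcile_after_resume(void)
{
	int cpu, removed = 0;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (registered[cpu] && !online[cpu]) {
			registered[cpu] = false;	/* stands in for coretemp_device_remove(cpu) */
			removed++;
		}
	}
	assert(removed <= NR_CPUS_MODEL);
	return removed;
}
```

Nothing is registered or unregistered while the sleep transition is in
progress, which is the constraint Alan described.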
-- 
Mark M. Hoffman
[EMAIL PROTECTED]



Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-20 Thread Milan Broz
Torsten Kaiser wrote:
 On Nov 19, 2007 10:00 PM, Milan Broz [EMAIL PROTECTED] wrote:
 Torsten Kaiser wrote:
 Anything I could try, apart from more boots with slub_debug=F?
 
 One time it triggered with slub_debug=F, but no additional output.
 With slub_debug=FP I have not seen it again, so I can't say if that
 would yield more info.
 
 Could you please try to find which patch from the dm-crypt series causes this?
 (agk-dm-dm-crypt* names.)

 I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
 there is one work struct used subsequently in two threads...
 (io thread already started while crypt thread is processing lockdep_map
 after calling f(work)...)
 
 After reverting only
 agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
 seen the 'held lock freed' message again.

OK, then I have a question: is the following (simplified) code correct
(and the problem is in lock validation, which checks something already
initialized for another queue), or is reusing a work_struct from inside
the work function it calls not permitted?

(Note the comment in the code: "It is permissible to free the struct
work_struct from inside the function that is called from it.")

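#include <linux/workqueue.h>

static struct work_struct work;
static struct workqueue_struct *a, *b;

static void do_b(struct work_struct *w)
{
	/* do something else */
}

static void do_a(struct work_struct *w)
{
	/* do something */
	INIT_WORK(w, do_b);	/* reuse the same struct for the second stage */
	queue_work(b, w);
}

/* somewhere in setup code (function name is illustrative): */
static void start_io(void)
{
	INIT_WORK(&work, do_a);
	queue_work(a, &work);
}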
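
To make the question concrete, here is a userspace model of the pattern
above. It does not use the kernel workqueue API and does not settle
whether the kernel permits this; struct work, the model_* helpers and
map_id (standing in for work->lockdep_map) are hypothetical. It shows
why a worker has to copy anything it needs from the work struct before
calling the handler: do_a() rewrites the struct and hands it to the
second queue while the first worker is still inside the call.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; not the kernel's workqueue implementation. */
struct work;
typedef void (*work_fn)(struct work *);

struct work {
	work_fn func;
	int map_id;		/* stands in for work->lockdep_map */
};

struct queue {
	struct work *items[8];
	size_t head, tail;
};

static struct queue qa, qb;	/* model "a" and "b" workqueues */
static int trace[8];		/* handlers run, in order: 1 = do_a, 2 = do_b */
static size_t ntrace;

static void model_init_work(struct work *w, work_fn fn, int map_id)
{
	w->func = fn;
	w->map_id = map_id;
}

static void model_queue_work(struct queue *q, struct work *w)
{
	assert(q->tail < 8);
	q->items[q->tail++] = w;
}

static void do_b(struct work *w)
{
	(void)w;
	trace[ntrace++] = 2;
}

static void do_a(struct work *w)
{
	trace[ntrace++] = 1;
	/* Reuse the same struct from inside its own handler. */
	model_init_work(w, do_b, 2);
	model_queue_work(&qb, w);
}

/*
 * Run the next item on q. The per-work data is copied *before* func is
 * called, because func may free or re-initialise the struct. Returns
 * the copy, i.e. the value the worker actually used.
 */
static int model_run_one(struct queue *q)
{
	struct work *w;
	int map_copy;

	assert(q->head < q->tail);
	w = q->items[q->head++];
	map_copy = w->map_id;	/* snapshot first */
	w->func(w);		/* may rewrite w->map_id */
	return map_copy;
}
```

Reading w->map_id after the call instead of before it would return the
second stage's value, which is the kind of confusion the lockdep_map
copy in the workqueue code is meant to avoid.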

Milan
--
[EMAIL PROTECTED]


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-20 Thread Alasdair G Kergon
On Tue, Nov 20, 2007 at 03:40:30PM +0100, Milan Broz wrote:
 (Note the comment in the code: "It is permissible to free the struct
 work_struct from inside the function that is called from it.")
 
I don't understand yet how lockdep behaves if the work struct gets
reused and the reused one finishes first.

I renamed the kcryptd functions today in an attempt to disentangle this
code a bit more.

  - io-pending reference counting looks correct (though used
inconsistently when comparing READ with WRITE)

  - But what happens if kcryptd_crypt_write_convert_loop() calls
INIT_WORK/queue_work twice?
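
One way to reason about that question is a simplified model of a work
item's pending bit. It assumes that queue_work() refuses an item whose
pending bit is already set (returning 0) and that INIT_WORK() resets
that bit; the model_* names and struct model_work are hypothetical.
Under those assumptions a plain second queue_work() is a harmless
no-op, but INIT_WORK() followed by queue_work() on an item that is
still queued puts it on a queue twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of one work item; not the kernel's struct work_struct. */
struct model_work {
	bool pending;		/* models the WORK_STRUCT_PENDING bit */
	int times_queued;	/* insertions since the item last ran */
};

/* Models INIT_WORK(): resets the item, including its pending bit,
 * without checking whether it is currently sitting on a queue. */
static void model_init_work(struct model_work *w)
{
	w->pending = false;
}

/* Models queue_work(): returns 0 if the item was already pending,
 * otherwise marks it pending, queues it and returns 1. */
static int model_queue_work(struct model_work *w)
{
	if (w->pending)
		return 0;
	w->pending = true;
	w->times_queued++;
	return 1;
}
```

In the model, calling model_init_work() and model_queue_work() twice
without the item running in between leaves times_queued at 2: the
double insertion that, in the kernel, would corrupt the workqueue list.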

Alasdair
-- 
[EMAIL PROTECTED]


Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-20 Thread Rafael J. Wysocki
On Tuesday, 20 of November 2007, Mark M. Hoffman wrote:
 Hi all:
 
 * Alan Stern [EMAIL PROTECTED] [2007-11-19 15:27:14 -0500]:
  On Mon, 19 Nov 2007, Rudolf Marek wrote:
  
   Hello all,
 gives coretemp_cpu_callback -> coretemp_device_remove ->
platform_device_unregister, so coretemp seems to be what I have and 
you don't.

Yes.

For the coretemp developers: coretemp_cpu_callback() needs to be more 
careful about what it does.  During a system sleep transition (suspend, 
hibernate, resume) it isn't possible to register or unregister a 
device.  Attempts to register will fail and attempts to unregister will 
block until the system sleep is over -- and for this callback that 
means hanging.
   
   Well I wrote the driver. Thanks for the clarification. If I recall
   correctly, I looked at how this part is done in other drivers. Now while
   checking what happened to the file, it seems Rafael added something related.
   
   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
  
  That does look like it was meant for exactly this sort of situation.
  
It's not clear what the best way is to fix this.  Perhaps the CPU 
notification should be sent along with a special flag indicating that 
the CPU transition is part of a system sleep (although this seems 
racy).  Perhaps the driver should notice when a system sleep begins, 
and defer all CPU-change handling until after the sleep is over.
   
   Maybe it already exists? CPU_DOWN_PREPARE?
   
   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD
   
   Unfortunately I'm not very familiar with this. Would calling
   coretemp_device_remove from CPU_DOWN_PREPARE help? Looking at the microcode
   driver, it seems it just hides the sysfs interface from the user.
 
 AFAICT from that documentation, it would have been better to unregister
 the device on CPU_DOWN_PREPARE anyway.  CPU_DEAD seems to be too late -
 it's already gone by then.
 
  I'm not sure exactly what you want to do here.  But it seems like a 
  waste to unregister the coretemp devices at the start of a system sleep 
  and then register them back at the end.
  
  Could you simply leave the devices registered throughout the entire
  sleep?  Of course, at the end you would have to check that all the CPUs
  really did come back up, and unregister the devices for the CPUs that
  are still offline.
 
 Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?

No.  In that case the suspend core is holding the device's mutex and your
attempt to unregister it will deadlock with it.

Do you _have_ _to_ unregister the device at all?  Why don't you just leave
it registered on CPU_DOWN_PREPARE_FROZEN?  The CPU is not going away
physically in this case and it's _guaranteed_ that _cpu_up() will be called on
it as soon as the hibernation image is ready or we are back from suspend.

 If so, then the simplest fix would be the patch below (Jiri: feel free to
 try it). Otherwise it would take a bit of refactoring to bring the sysfs
 interface down/up for suspend/resume.
 
 commit ce9c7b78c839a6304696d90083eac08baad524ce
 Author: Mark M. Hoffman [EMAIL PROTECTED]
 Date:   Tue Nov 20 07:51:50 2007 -0500
 
 hwmon: (coretemp) fix suspend/resume hang
 
 Signed-off-by: Mark M. Hoffman [EMAIL PROTECTED]

I'd do it like this:

 diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
 index 5c82ec7..afe2d31 100644
 --- a/drivers/hwmon/coretemp.c
 +++ b/drivers/hwmon/coretemp.c
 @@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
  	switch (action) {
  	case CPU_ONLINE:
  	case CPU_ONLINE_FROZEN:
 +	case CPU_DOWN_FAILED:
  		coretemp_device_add(cpu);
+	case CPU_DOWN_FAILED_FROZEN:
  		break;
 -	case CPU_DEAD:
 -	case CPU_DEAD_FROZEN:
 +	case CPU_DOWN_PREPARE:
  		coretemp_device_remove(cpu);
+	case CPU_DOWN_PREPARE_FROZEN:
  		break;
  	}

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561

2007-11-20 Thread Dave Young
On Nov 20, 2007 5:59 PM, Dave Young [EMAIL PROTECTED] wrote:

 On Nov 20, 2007 5:56 PM, Andrew Morton [EMAIL PROTECTED] wrote:
 
  On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young [EMAIL PROTECTED] wrote:
 
   Hi,
   I encountered kernel warnings. I just executed xawtv without any video
   device being found.
  
   like this:
  
   WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
[c0118769] native_smp_call_function_mask+0x149/0x150
[c0178dd9] alloc_debug_processing+0xa9/0x130
[c0372da0] smp_callback+0x0/0x10
[c0119b7c] smp_call_function+0x1c/0x20
[c0372dc8] cpuidle_latency_notify+0x18/0x20
[c0144eae] notifier_call_chain+0x3e/0x70
[c01450d4] __blocking_notifier_call_chain+0x44/0x70
[c0145117] blocking_notifier_call_chain+0x17/0x20
[c01454fd] pm_qos_add_requirement+0x8d/0xd0
[f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
[f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
[f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
[f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
[f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
[f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
[f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
[f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
[f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
[c01801ae] __fput+0x16e/0x200
[c017e35c] filp_close+0x3c/0x80
[c017e409] sys_close+0x69/0xd0
[c01042da] syscall_call+0x7/0xb
[c040] xfrm_notify_sa+0x110/0x290
===
  
 
  That was hopefully fixed.  You might care to test
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
  to confirm that, if feeling sufficiently brave.
 


Hi,
I can confirm that I can't reproduce this after applying the
broken-out-2007-11-20-01-45 patch set.

Regards
dave


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-19 Thread Torsten Kaiser
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > Anything I could try, apart from more boots with slub_debug=F?

One time it triggered with slub_debug=F, but no additional output.
With slub_debug=FP I have not seen it again, so I can't say if that
would yield more info.

> Could you please try to find which patch from the dm-crypt series causes this?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)

After reverting only
agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
seen the 'held lock freed' message again.

If it happens again with this revert, I will post that output.

Thanks for the hint.

Torsten


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-19 Thread Andrew Morton
On Sun, 18 Nov 2007 14:18:06 -0500 Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > 
> > Torsten
> 
> I had already fixed that one in my own stack. Attached are the 3 patches
> that I've got. 1 from SteveD, 2 fixes.
> 
> Andrew, could you please unapply the sillyrename patches you've got, and
> apply these 3 instead?

I'd expect to see things like this appear in git-nfs.patch.  Did something 
change?


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-19 Thread Christoph Lameter
On Sun, 18 Nov 2007, root wrote:

> @@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
>  {
>   struct page *page;
>   struct kmem_cache_node *n;
> + unsigned long flags;
>  
>   BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
>  

Well, local_irq_save is a bit of overkill. We know that interrupts are
enabled during this phase of the boot sequence.
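
The difference between the two primitive pairs can be shown with a
small userspace model; irqs_on and the model_irq_* functions are
hypothetical stand-ins for the CPU interrupt flag and for
local_irq_save()/local_irq_restore() and
local_irq_disable()/local_irq_enable(). When interrupts are known to be
on at entry, both pairs end in the same state, so saving the flags buys
nothing; the save/restore pair only matters when the caller might
already have interrupts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the CPU's interrupt-enable flag. */
static bool irqs_on = true;

/* Models local_irq_save(): remember the old state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_on;

	irqs_on = false;
	return flags;
}

/* Models local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(unsigned long flags)
{
	irqs_on = flags != 0;
}

/* Models local_irq_disable()/local_irq_enable(): no state is kept,
 * enable always turns interrupts back on. */
static void model_irq_disable(void) { irqs_on = false; }
static void model_irq_enable(void)  { irqs_on = true; }
```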



Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-19 Thread Rafael J. Wysocki
On Monday, 19 of November 2007, Rudolf Marek wrote:
> Hello all,
> >>> gives coretemp_cpu_callback -> coretemp_device_remove ->
> >>> platform_device_unregister, so coretemp seems to be what I have and you 
> >>> don't.
> > 
> > Yes.
> > 
> > For the coretemp developers: coretemp_cpu_callback() needs to be more 
> > careful about what it does.  During a system sleep transition (suspend, 
> > hibernate, resume) it isn't possible to register or unregister a 
> > device.  Attempts to register will fail and attempts to unregister will 
> > block until the system sleep is over -- and for this callback that 
> > means hanging.
> 
> Well I wrote the driver. Thanks for the clarification. If I recall correctly,
> I looked at how this part is done in other drivers. Now while checking
> what happened to the file, it seems Rafael added something related.
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d

Well, in principle you can use the observation that the _FROZEN versions
are used during suspend/hibernation.  Thus, if you unregister the device
only for CPU_DEAD and not for CPU_DEAD_FROZEN, it will work as long as
the freezer is there.
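
The check described above boils down to testing the CPU_TASKS_FROZEN
bit. A userspace sketch, with constants mirroring the convention in
<linux/notifier.h> that each _FROZEN action is the base action with
CPU_TASKS_FROZEN OR'd in (treat the numeric values as illustrative);
action_is_frozen() and should_unregister() are hypothetical helpers,
not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values following the kernel's _FROZEN convention. */
#define CPU_DEAD          0x0007UL
#define CPU_TASKS_FROZEN  0x0010UL
#define CPU_DEAD_FROZEN   (CPU_DEAD | CPU_TASKS_FROZEN)

/* True if the notification comes from suspend/hibernation (tasks frozen). */
static bool action_is_frozen(unsigned long action)
{
	return (action & CPU_TASKS_FROZEN) != 0;
}

/* Only a real hot-unplug unregisters the device; CPU_DEAD_FROZEN is
 * ignored, so the device stays registered across the sleep. */
static bool should_unregister(unsigned long action)
{
	unsigned long base = action & ~CPU_TASKS_FROZEN;	/* strip the frozen bit */

	return base == CPU_DEAD && !action_is_frozen(action);
}
```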

> > It's not clear what the best way is to fix this.  Perhaps the CPU 
> > notification should be sent along with a special flag indicating that 
> > the CPU transition is part of a system sleep (although this seems 
> > racy).

In fact, it's already done that way and I don't think it's racy (see above).

Greetings,
Rafael


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-19 Thread Milan Broz
Torsten Kaiser wrote:
> On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
...
> Above this acquire/release sequence is the following comment:
> #ifdef CONFIG_LOCKDEP
> /*
>  * It is permissible to free the struct work_struct
>  * from inside the function that is called from it,
>  * this we need to take into account for lockdep too.
>  * To avoid bogus "held lock freed" warnings as well
>  * as problems when looking into work->lockdep_map,
>  * make a copy and use that here.
>  */
> struct lockdep_map lockdep_map = work->lockdep_map;
> #endif
> 
> Did something trigger this anyway?
> 
> Anything I could try, apart from more boots with slub_debug=F?

Please could you try which patch from the dm-crypt series cause this ?
(agk-dm-dm-crypt* names.)

I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
there is one work struct used subsequently in two threads...
(io thread already started while crypt thread is processing lockdep_map
after calling f(work)...)

(btw these patches prepare dm-crypt for next patchset introducing 
async cryptoapi, so there should be no functional changes yet.)

Milan
--
[EMAIL PROTECTED]




Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-19 Thread Alan Stern
On Mon, 19 Nov 2007, Rudolf Marek wrote:

> Hello all,
> >>> gives coretemp_cpu_callback -> coretemp_device_remove ->
> >>> platform_device_unregister, so coretemp seems to be what I have and you 
> >>> don't.
> > 
> > Yes.
> > 
> > For the coretemp developers: coretemp_cpu_callback() needs to be more 
> > careful about what it does.  During a system sleep transition (suspend, 
> > hibernate, resume) it isn't possible to register or unregister a 
> > device.  Attempts to register will fail and attempts to unregister will 
> > block until the system sleep is over -- and for this callback that 
> > means hanging.
> 
> Well I wrote the driver. Thanks for the clarification. If I recall correctly I
> looked how this part should be done from others drivers. Now while checking
> what happened to the file, seems Rafael added something related.
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d

That does look like it was meant for exactly this sort of situation.

> > It's not clear what the best way is to fix this.  Perhaps the CPU 
> > notification should be sent along with a special flag indicating that 
> > the CPU transition is part of a system sleep (although this seems 
> > racy).  Perhaps the driver should notice when a system sleep begins, 
> > and defer all CPU-change handling until after the sleep is over.
> 
> maybe it does exist?  CPU_DOWN_PREPARE ?
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD
> 
> Unfortunately I'm not very familiar with this, calling the 
> coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at microcode 
> driver, seems it just hide sysfs interface from user.

I'm not sure exactly what you want to do here.  But it seems like a 
waste to unregister the coretemp devices at the start of a system sleep 
and then register them back at the end.

Could you simply leave the devices registered throughout the entire
sleep?  Of course, at the end you would have to check that all the CPUs
really did come back up, and unregister the devices for the CPUs that
are still offline.
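Here is a minimal user-space model of the post-resume check Alan proposes. It assumes the devices stayed registered throughout the sleep and that the check runs only after the system is fully resumed, when unregistering can no longer block. The arrays and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical state: which CPUs came back online after resume, and
 * which CPUs still have a registered coretemp-like device. */
static bool cpu_is_online[MODEL_NR_CPUS];
static bool dev_registered[MODEL_NR_CPUS];

/* Drop the devices of CPUs that did not come back up.
 * Returns how many devices were removed. */
static unsigned int reconcile_after_resume(void)
{
    unsigned int cpu, removed = 0;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (dev_registered[cpu] && !cpu_is_online[cpu]) {
            /* stands in for unregistering the platform device */
            dev_registered[cpu] = false;
            removed++;
        }
    }
    return removed;
}
```

Running the check a second time removes nothing, so it is harmless to call it once per resume.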

Alan Stern



Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-19 Thread Rudolf Marek

Hello all,

>>> gives coretemp_cpu_callback -> coretemp_device_remove ->
>>> platform_device_unregister, so coretemp seems to be what I have and you don't.
>
> Yes.
>
> For the coretemp developers: coretemp_cpu_callback() needs to be more 
> careful about what it does.  During a system sleep transition (suspend, 
> hibernate, resume) it isn't possible to register or unregister a 
> device.  Attempts to register will fail and attempts to unregister will 
> block until the system sleep is over -- and for this callback that 
> means hanging.

Well I wrote the driver. Thanks for the clarification. If I recall correctly I 
looked how this part should be done from others drivers. Now while checking
what happened to the file, seems Rafael added something related.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d

> It's not clear what the best way is to fix this.  Perhaps the CPU 
> notification should be sent along with a special flag indicating that 
> the CPU transition is part of a system sleep (although this seems 
> racy).  Perhaps the driver should notice when a system sleep begins, 
> and defer all CPU-change handling until after the sleep is over.

maybe it does exist?  CPU_DOWN_PREPARE ?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD

Unfortunately I'm not very familiar with this, calling the 
coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at microcode 
driver, seems it just hide sysfs interface from user.

Thanks,


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-19 Thread Torsten Kaiser
On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
>
> > Trying the last NFSv4 patch (but that patch is only the cause, why I
> > had lockdep enabled) I got this:
> > [   64.550203]
> > [   64.550205] =
> > [   64.552213] [ BUG: held lock freed! ]
> > [   64.553633] -
> > [   64.555055] kcryptd/1022 is freeing memory
> > 81011EBEFB00-81011EBEFB3F, with a lock still held there!
>
> so kcryptd frees a live, still in use bio? That could be a recipe for
> data corruption. Does SLUB_DEBUG (or SLAB_DEBUG) catch anything?

I have SLUB_DEBUG=y, but not SLUB_DEBUG_ON.
But apart from this message I did not see anything in the syslog.
It does not seem to be a one-time event: on the third boot it happened
again, with an identical stack trace.
Sadly, 3 boots with slub_debug=FZP and another one with only F
did not trigger it.

But I don't think kcryptd is freeing a bio at that point.
The message said about the freed lock: (kcryptd){--..}, at: [<80247dd9>]
(gdb) list *0x80247dd9
0x80247dd9 is in run_workqueue (include/asm/bitops_64.h:69).
64       * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
65       * in order to ensure changes are visible on other processors.
66       */
67      static inline void clear_bit(int nr, volatile void *addr)
68      {
69              __asm__ __volatile__( LOCK_PREFIX
70                      "btrl %1,%0"
71                      :ADDR
72                      :"dIr" (nr));
73      }
increasing the addr a little bit shows:
(gdb) list *0x80247ddf
0x80247ddf is in run_workqueue (kernel/workqueue.c:275).
270                     list_del_init(cwq->worklist.next);
271                     spin_unlock_irq(&cwq->lock);
272
273                     BUG_ON(get_wq_data(work) != cwq);
274                     work_clear_pending(work);
275                     lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
276                     lock_acquire(&lockdep_map, 0, 0, 0, 2, _THIS_IP_);
277                     f(work);
278                     lock_release(&lockdep_map, 1, _THIS_IP_);
279                     lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);


Above this acquire/release sequence is the following comment:
#ifdef CONFIG_LOCKDEP
/*
 * It is permissible to free the struct work_struct
 * from inside the function that is called from it,
 * this we need to take into account for lockdep too.
 * To avoid bogus "held lock freed" warnings as well
 * as problems when looking into work->lockdep_map,
 * make a copy and use that here.
 */
struct lockdep_map lockdep_map = work->lockdep_map;
#endif
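The copy-before-call pattern in this comment can be modelled in user space roughly as follows. All names are hypothetical. The real run_workqueue() passes the on-stack copy to lockdep's lock_acquire()/lock_release(), as the listing above shows. The point of the model is that once the handler has freed the work item, nothing touches that item again.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: a lock-class "map" and a work item that
 * embeds one, the way struct work_struct embeds its lockdep_map. */
struct model_lockdep_map {
    const char *name;
};

struct model_work {
    struct model_lockdep_map lockdep_map;
    void (*func)(struct model_work *work);
};

/* What the model "lockdep" currently sees as held / last released. */
static const struct model_lockdep_map *held;
static const char *last_released;

static void model_lock_acquire(const struct model_lockdep_map *map) { held = map; }

static void model_lock_release(const struct model_lockdep_map *map)
{
    assert(held == map);
    last_released = map->name;
    held = NULL;
}

/* Copy the map to the stack first, since func() may free 'work';
 * acquire/release only ever touch the copy. */
static void model_run_one(struct model_work *work)
{
    struct model_lockdep_map lockdep_map = work->lockdep_map;

    model_lock_acquire(&lockdep_map);
    work->func(work);                 /* may free 'work' */
    model_lock_release(&lockdep_map); /* does not dereference 'work' */
}

/* A handler that frees its own work item, which the comment permits. */
static void self_freeing_func(struct model_work *work)
{
    free(work);
}
```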

Did something trigger this anyway?

Anything I could try, apart from more boots with slub_debug=F?

Torsten


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-19 Thread Torsten Kaiser
On Nov 19, 2007 10:00 AM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Mon, 19 Nov 2007 08:15:48 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > On Nov 18, 2007 8:18 PM, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> > > I had already fixed that one in my own stack. Attached are the 3 patches
> > > that I've got. 1 from SteveD, 2 fixes.
> >
> > Moving the init_waitqueue_head() like patch
> > linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying
> > linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work.
> > Also lockdep no longer complains about the non-static key.
>
> Thanks.
>
> To avoid goofups, could you please send the full fix against 2.6.24-rc2-mm1?

Umm... As I applied these changes manually, there is a not insignificant
chance of goofups on my part...

For the hang problem I think Trond's suggestion of replacing the
patches in -mm with fresh versions would be best.

Anyway, currently I have the patch from
http://lkml.org/lkml/2007/11/16/74 to fix the can't-create-files bug.

To fix the hang bug I used Trond's
linux-2.6.24-007-fix_nfs_free_unlinkdata.dif and the first two hunks
from linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif.

Torsten

The needed 2 hunks for reference:

--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -594,9 +594,6 @@ static int nfs_init_server(struct nfs_server *server,
 	/* Create a client RPC handle for the NFSv3 ACL management interface */
 	nfs_init_server_aclclient(server);
 
-	init_waitqueue_head(&server->active_wq);
-	atomic_set(&server->active, 0);
-
 	dprintk("<-- nfs_init_server() = 0 [new %p]\n", clp);
 	return 0;
 
@@ -736,6 +733,9 @@ static struct nfs_server *nfs_alloc_server(void)
 	INIT_LIST_HEAD(&server->client_link);
 	INIT_LIST_HEAD(&server->master_link);
 
+	init_waitqueue_head(&server->active_wq);
+	atomic_set(&server->active, 0);
+
 	server->io_stats = nfs_alloc_iostats();
 	if (!server->io_stats) {
 		kfree(server);
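Why moving the initialisation fixes NFSv4 can be shown with a heavily simplified user-space model. The struct and function names are hypothetical, and a boolean stands in for the wait queue. Before the fix, only the NFSv2/v3 init path set up the fields. NFSv4 servers share the sillyrename code (compare nfs_sb_active()) but never went through that path. Initialising in the common allocator covers every version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfs_server; 'active_wq_ready'
 * models init_waitqueue_head(&server->active_wq) having run. */
struct model_server {
    int active;
    bool active_wq_ready;
};

/* Before the fix: the common allocator left these fields alone... */
static struct model_server *model_alloc_server_old(void)
{
    return calloc(1, sizeof(struct model_server));
}

/* ...and only the NFSv2/v3 init path set them up. */
static void model_init_server_v3_old(struct model_server *server)
{
    server->active_wq_ready = true;
    server->active = 0;
}

/* After the fix: initialise in the allocator every version uses. */
static struct model_server *model_alloc_server(void)
{
    struct model_server *server = calloc(1, sizeof(*server));

    if (server) {
        server->active_wq_ready = true;
        server->active = 0;
    }
    return server;
}

/* Shared unlink/sillyrename path: in the kernel it would use the wait
 * queue; the model simply refuses if it was never initialised. */
static bool model_sb_active(struct model_server *server)
{
    if (!server->active_wq_ready)
        return false;
    server->active++;
    return true;
}
```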


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-19 Thread Andrew Morton
On Mon, 19 Nov 2007 08:15:48 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:

> On Nov 18, 2007 8:18 PM, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> > On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> > > NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> > > unlink and sillyrename logic.
> > > But they do not share nfs_init_server()!
> > >
> > > I wonder why this doesn't blow up more violently, but only hangs...
> > >
> > > But as I don't know if it is correct to add the workqueue
> > > initialization to nfs4_init_server() or remove the nfs_sb_active /
> > > nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
> > >
> > > Torsten
> >
> > I had already fixed that one in my own stack. Attached are the 3 patches
> > that I've got. 1 from SteveD, 2 fixes.
> >
> 
> Moving the init_waitqueue_head() like patch
> linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying
> linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work.
> Also lockdep no longer complains about the non-static key.
> 

Thanks.

To avoid goofups, could you please send the full fix against 2.6.24-rc2-mm1?


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-19 Thread Andrew Morton
On Sun, 18 Nov 2007 14:18:06 -0500 Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > 
> > Torsten
> 
> I had already fixed that one in my own stack. Attached are the 3 patches
> that I've got. 1 from SteveD, 2 fixes.
> 
> Andrew, could you please unapply the sillyrename patches you've got, and
> apply these 3 instead?

I'd expect to see things like this appear in git-nfs.patch.  Did something 
change?


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-19 Thread Torsten Kaiser
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > Anything I could try, apart from more boots with slub_debug=F?

One time it triggered with slub_debug=F, but no additional output.
With slub_debug=FP I have not seen it again, so I can't say if that
would yield more info.

> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)

After reverting only
agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
seen the 'held lock freed' message again.

If it happens again with this revert, I will post that output.

Thanks for the hint.

Torsten


Re: 2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-18 Thread Ingo Molnar

* Torsten Kaiser <[EMAIL PROTECTED]> wrote:

> Trying the last NFSv4 patch (but that patch is only the cause, why I
> had lockdep enabled) I got this:
> [   64.550203]
> [   64.550205] =
> [   64.552213] [ BUG: held lock freed! ]
> [   64.553633] -
> [   64.555055] kcryptd/1022 is freeing memory
> 81011EBEFB00-81011EBEFB3F, with a lock still held there!

so kcryptd frees a live, still in use bio? That could be a recipe for 
data corruption. Does SLUB_DEBUG (or SLAB_DEBUG) catch anything?

Ingo


2.6.24-rc2-mm1: kcryptd vs lockdep

2007-11-18 Thread Torsten Kaiser
Trying the last NFSv4 patch (but that patch is only the cause, why I
had lockdep enabled) I got this:
[   64.550203]
[   64.550205] =
[   64.552213] [ BUG: held lock freed! ]
[   64.553633] -
[   64.555055] kcryptd/1022 is freeing memory
81011EBEFB00-81011EBEFB3F, with a lock still held there!
[   64.558809]  (kcryptd){--..}, at: []
run_workqueue+0x129/0x210
[   64.561743] 2 locks held by kcryptd/1022:
[   64.563296]  #0:  (kcryptd){--..}, at: []
run_workqueue+0x129/0x210
[   64.566409]  #1:  (>work#2){--..}, at: []
run_workqueue+0x129/0x210
[   64.569672]
[   64.569672] stack backtrace:
[   64.571375]
[   64.571375] Call Trace:
[   64.572913]  [] debug_check_no_locks_freed+0x190/0x1b0
[   64.575764]  [] mempool_free_slab+0x12/0x20
[   64.577986]  [] kmem_cache_free+0x79/0xe0
[   64.580140]  [] mempool_free_slab+0x12/0x20
[   64.582362]  [] mempool_free+0x8a/0xa0
[   64.584415]  [] bio_free+0x2f/0x50
[   64.586337]  [] bio_fs_destructor+0x10/0x20
[   64.588558]  [] bio_put+0x26/0x30
[   64.590446]  [] xfs_buf_bio_end_io+0x99/0x120
[   64.592734]  [] bio_endio+0x19/0x40
[   64.594687]  [] dec_pending+0x107/0x210
[   64.596775]  [] clone_endio+0x70/0xb0
[   64.598793]  [] kcryptd_do_crypt+0x0/0x290
[   64.600978]  [] bio_endio+0x19/0x40
[   64.602931]  [] crypt_dec_pending+0x32/0x50
[   64.605149]  [] kcryptd_do_crypt+0x64/0x290
[   64.607368]  [] kcryptd_do_crypt+0x0/0x290
[   64.609553]  [] kcryptd_do_crypt+0x0/0x290
[   64.611739]  [] run_workqueue+0x175/0x210
[   64.613892]  [] worker_thread+0x71/0xb0
[   64.615981]  [] autoremove_wake_function+0x0/0x40
[   64.618402]  [] worker_thread+0x0/0xb0
[   64.620454]  [] kthread+0x4d/0x80
[   64.622340]  [] child_rip+0xa/0x12
[   64.624262]  [] restore_args+0x0/0x30
[   64.626281]  [] kthread+0x0/0x80
[   64.628134]  [] child_rip+0x0/0x12
[   64.630052]
[   64.630637] INFO: lockdep is turned off.

I have only seen this once; booting the same kernel build a
second time, it did not happen again.
Also I got two other oopses when trying to shut the system down after
the above happened. So it is possible that kcryptd was only the
victim of another corruption, but then I don't know which subsystem
was to blame.

The other oopses:
[  108.613851] Unable to handle kernel paging request at 0a6425203a72 RIP:
[  108.618485]  [<0a6425203a72>]
[  108.624339] PGD 0
[  108.626416] Oops: 0010 [1] SMP
[  108.629657] last sysfs file: /sys/devices/pci:00/:00:0f.0/:01:00.1/resource
[  108.637675] CPU 3
[  108.639749] Modules linked in: radeon drm nfsd exportfs ipv6 tuner
tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761
tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat hid pata_amd sg
[  108.665913] Pid: 8715, comm: reboot Not tainted 2.6.24-rc2-mm1 #14
[  108.672103] RIP: 0010:[<0a6425203a72>]  [<0a6425203a72>]
[  108.678164] RSP: 0018:81011d4a1e10  EFLAGS: 00010206
[  108.683491] RAX: 0a6425203a72 RBX: 8077f9a0 RCX: 000a
[  108.690635] RDX: 8077fb40 RSI: 000a RDI: 81011ff0f870
[  108.697779] RBP: 81011d4a1e28 R08: 07d0 R09: 0001
[  108.704922] R10: 804bf40a R11:  R12: fee1dead
[  108.712066] R13: 01234567 R14:  R15: 0001
[  108.719210] FS:  7f217607f6f0() GS:81011ff11780() knlGS:
[  108.727312] CS:  0010 DS:  ES:  CR0: 8005003b
[  108.733071] CR2: 0a6425203a72 CR3: 00011e182000 CR4: 06e0
[  108.740215] DR0:  DR1:  DR2: 
[  108.747358] DR3:  DR6: 0ff0 DR7: 0400
[  108.754501] Process reboot (pid: 8715, threadinfo 81011D4A, task 81011EC20EC0)
[  108.762776] Stack:  8042c4ac 81011d4a1e28 81011d4a1e48
[  108.770993]  802451ff 28121969 28121969 81011d4a1f78
[  108.778561]  802453d5 81011f9ec078 81011f97e780 81011d4a1e88
[  108.785911] Call Trace:
[  108.788601]  [] device_shutdown+0x4c/0xa0
[  108.794192]  [] kernel_restart+0x2f/0x70
[  108.799696]  [] sys_reboot+0x185/0x1d0
[  108.805028]  [] d_free+0x49/0x50
[  108.809832]  [] d_kill+0x50/0x70
[  108.814641]  [] mntput_no_expire+0x20/0xe0
[  108.820314]  [] __fput+0x17d/0x230
[  108.825295]  [] fput+0x16/0x20
[  108.829933]  [] trace_hardirqs_on_thunk+0x35/0x3a
[  108.836219]  [] system_call+0x7e/0x83
[  108.841459]
[  108.842979] INFO: lockdep is turned off.
[  108.846922]
[  108.846922] Code:  Bad RIP value.
[  108.851852] RIP  [<0a6425203a72>]
[  108.855570]  RSP 
[  108.859081] CR2: 0a6425203a72
[  110.859331] md: stopping all md devices.
[  110.863297] md: md1 still in use.
[  111.865229] sd 8:0:1:0: [sd

Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-18 Thread Torsten Kaiser
On Nov 18, 2007 8:18 PM, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> > NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> > unlink and sillyrename logic.
> > But they do not share nfs_init_server()!
> >
> > I wonder why this doesn't blow up more violently, but only hangs...
> >
> > But as I don't know if it is correct to add the workqueue
> > initialization to nfs4_init_server() or remove the nfs_sb_active /
> > nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
> >
> > Torsten
>
> I had already fixed that one in my own stack. Attached are the 3 patches
> that I've got. 1 from SteveD, 2 fixes.
>

Moving the init_waitqueue_head() like patch
linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying
linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work.
Also lockdep no longer complains about the non-static key.

Torsten


[PATCH] 2.6.24-rc2-mm1 powerpc (iseries)- build failure - mm/stab.c

2007-11-18 Thread Stephen Rothwell
From: Kamalesh Babulal <[EMAIL PROTECTED]>

With randconfig, the kernel build fails with the following error:

  CC  arch/powerpc/mm/stab.o
arch/powerpc/mm/stab.c: In function ‘stab_initialize’:
arch/powerpc/mm/stab.c:282: error: implicit declaration of function ‘HvCall1’
arch/powerpc/mm/stab.c:282: error: ‘HvCallBaseSetASR’ undeclared (first use in this function)
arch/powerpc/mm/stab.c:282: error: (Each undeclared identifier is reported only once
arch/powerpc/mm/stab.c:282: error: for each function it appears in.)
make[1]: *** [arch/powerpc/mm/stab.o] Error 1
make: *** [arch/powerpc/mm] Error 2

Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
Acked-by: Stephen Rothwell <[EMAIL PROTECTED]>
---
 arch/powerpc/mm/stab.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

On Mon, 19 Nov 2007 11:56:11 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>
> Resubmitting the patch titled 
> powerpc-iseries-build-failure-mm-stabc.patch in the -mm tree.

Paulus, this should be fine for merging like above.

Cheers,
Stephen Rothwell[EMAIL PROTECTED]

diff --git a/arch/powerpc/mm/stab.c b/arch/powerpc/mm/stab.c
index 9e85bda..50448d5 100644
--- a/arch/powerpc/mm/stab.c
+++ b/arch/powerpc/mm/stab.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct stab_entry {
unsigned long esid_data;
-- 
1.5.3.5



[PATCH-RESEND] 2.6.24-rc2-mm1 powerpc (iseries)- build failure - mm/stab.c

2007-11-18 Thread Kamalesh Babulal
Hi Stephen,

Resubmitting the patch titled 
powerpc-iseries-build-failure-mm-stabc.patch in the -mm tree.

Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
--
--- linux-2.6.24-rc2/arch/powerpc/mm/stab.c 2007-11-07 03:27:46.0 +0530
+++ linux-2.6.24-rc2/arch/powerpc/mm/~stab.c2007-11-19 19:43:55.0 +0530
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct stab_entry {
unsigned long esid_data;

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Alan Stern
On Sun, 18 Nov 2007, Jiri Slaby wrote:

> > gives coretemp_cpu_callback -> coretemp_device_remove ->
> > platform_device_unregister, so coretemp seems to be what I have and you 
> > don't.

Yes.

For the coretemp developers: coretemp_cpu_callback() needs to be more 
careful about what it does.  During a system sleep transition (suspend, 
hibernate, resume) it isn't possible to register or unregister a 
device.  Attempts to register will fail and attempts to unregister will 
block until the system sleep is over -- and for this callback that 
means hanging.

It's not clear what the best way is to fix this.  Perhaps the CPU 
notification should be sent along with a special flag indicating that 
the CPU transition is part of a system sleep (although this seems 
racy).  Perhaps the driver should notice when a system sleep begins, 
and defer all CPU-change handling until after the sleep is over.
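The second idea (defer CPU-change handling while a system sleep is in
progress) can be illustrated with a small userspace model. This is a
sketch under stated assumptions, not coretemp code: sleep_in_progress,
cpu_callback(), sleep_begin()/sleep_end() and the device_add_for_cpu()/
device_remove_for_cpu() stand-ins for platform device (un)registration
are all hypothetical. A real driver would still need some way to learn
when suspend starts and ends, plus proper locking, which the model ignores.

```c
/*
 * Userspace model (not kernel code) of deferring CPU-change handling
 * while a system sleep transition is in progress. All names are
 * hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_PENDING 64

enum cpu_event { EV_CPU_ONLINE, EV_CPU_DEAD };

struct pending_event {
	enum cpu_event ev;
	unsigned int cpu;
};

static bool sleep_in_progress;          /* true between sleep_begin() and sleep_end() */
static struct pending_event pending[MAX_PENDING];
static unsigned int npending;
static unsigned int devices_registered; /* stands in for per-CPU platform devices */

/* Stand-ins for device (un)registration, which must not happen mid-sleep. */
static void device_add_for_cpu(unsigned int cpu)
{
	assert(!sleep_in_progress);
	devices_registered++;
	printf("registered device for CPU %u\n", cpu);
}

static void device_remove_for_cpu(unsigned int cpu)
{
	assert(!sleep_in_progress && devices_registered > 0);
	devices_registered--;
	printf("removed device for CPU %u\n", cpu);
}

static void handle_event(enum cpu_event ev, unsigned int cpu)
{
	if (ev == EV_CPU_ONLINE)
		device_add_for_cpu(cpu);
	else
		device_remove_for_cpu(cpu);
}

/* CPU hotplug callback: while a sleep is in progress, only queue the event. */
static void cpu_callback(enum cpu_event ev, unsigned int cpu)
{
	if (sleep_in_progress) {
		assert(npending < MAX_PENDING);
		pending[npending].ev = ev;
		pending[npending].cpu = cpu;
		npending++;
		return;
	}
	handle_event(ev, cpu);
}

static void sleep_begin(void)
{
	sleep_in_progress = true;
}

/* After resume, replay the deferred events in the order they arrived. */
static void sleep_end(void)
{
	sleep_in_progress = false;
	for (unsigned int i = 0; i < npending; i++)
		handle_event(pending[i].ev, pending[i].cpu);
	npending = 0;
}
```

A CPU that goes down during suspend and comes back on resume therefore
has its device removed and re-registered only after the sleep is over,
instead of the callback blocking inside the transition.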

Alan Stern



Re: [PATCH] 2.6.24-rc2-mm1 powerpc (iseries)- build failure - mm/stab.c

2007-11-18 Thread Stephen Rothwell
Hi Kamalesh,

On Wed, 14 Nov 2007 16:24:10 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>
> +#ifdef CONFIG_PPC_ISERIES
> +#include 
> +#endif /* CONFIG_PPC_ISERIES */

You should not need the #ifdef, and we prefer not to #ifdef includes unless
necessary.  Please resubmit.

-- 
Cheers,
Stephen Rothwell[EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/




Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
Aah, we probably should let the coretemp people know.

Whole thread:
http://marc.info/?t=11950720581=1=2

On 11/18/2007 08:09 PM, Jiri Slaby wrote:
> On 11/18/2007 06:07 PM, Alan Stern wrote:
>> You'll get more useful results if you redo your changes to 
>> notifier_call_chain().  Have it print out the address of the routine 
>> _before_ making the call, and don't limit it to 20.  That way you'll 
>> know exactly which notifier routine ends up hanging.
> 
> The problem is that notifier_call_chain is called again and again, a zillion times,
> by somebody else...
> 
> Anyway you led me to another idea:
> * _cpu_down
> printk("%s: u\n", __func__);
> BUBAK=1;
> /* CPU is completely dead: tell everyone.  Too late to complain. */
> if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | 0x88000 | mod,
> hcpu) == NOTIFY_BAD)
> BUG();
> BUBAK=0;
> -
> * notifier_call_chain
> unsigned int a = val & 0x88000;
> unsigned int yes = a == 0x88000;
> 
> nb = rcu_dereference(*nl);
> 
> if (a && a != 0x88000)
> printk("Somebody calls with val: %lx\n", val);
> else
> val &= ~0x88000;
> 
> while (nb && nr_to_call) {
> next_nb = rcu_dereference(nb->next);
> if (unlikely(BUBAK && yes))
> printk("%s: %p\n", __func__, nb->notifier_call);
> ret = nb->notifier_call(nb, val, v);
> -
> gives coretemp_cpu_callback -> coretemp_device_remove ->
> platform_device_unregister, so coretemp seems to be what I have and you don't.

Just in case you are curious:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang3.diff
produces:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang3.png




Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Rafael J. Wysocki
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 11:27 PM, Rafael J. Wysocki wrote:
> > You can use a global variable to switch the logging only before the CPU
> > hotunplug done by the suspend code.  You just need to hack
> > disable_nonboot_cpus() for that.
> 
> If I understand you correctly, that's what BUBAK variable is there for.

Ah, yes.

-ETOOTIRED

> But it is still called again and again while the suspend code runs...

You can count the number of calls and then make it print the information for
the last, say, 20 of them.
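This suggestion amounts to a small ring buffer: record every call, keep
only the most recent N entries, and print them once the hang point is
reached. A minimal userspace sketch, assuming a single caller (no
locking); trace_call(), trace_dump() and TRACE_SLOTS are hypothetical
names, and in the kernel the dump would go through printk() rather than
printf().

```c
#include <assert.h>
#include <stdio.h>

#define TRACE_SLOTS 20 /* "the last, say, 20" calls */

struct call_record {
	unsigned long seq; /* running call number */
	const void *fn;    /* address of the notifier routine */
	unsigned long val; /* event value passed to it */
};

static struct call_record trace_buf[TRACE_SLOTS];
static unsigned long trace_count; /* total number of calls recorded */

/* Record one call; once the buffer is full the oldest slot is overwritten. */
static void trace_call(const void *fn, unsigned long val)
{
	struct call_record *r = &trace_buf[trace_count % TRACE_SLOTS];

	r->seq = trace_count;
	r->fn = fn;
	r->val = val;
	trace_count++;
}

/* Print the retained records, oldest first; returns how many were printed. */
static unsigned long trace_dump(void)
{
	unsigned long first = trace_count > TRACE_SLOTS ? trace_count - TRACE_SLOTS : 0;

	for (unsigned long i = first; i < trace_count; i++) {
		const struct call_record *r = &trace_buf[i % TRACE_SLOTS];

		assert(r->seq == i); /* slot still holds call i, not an older one */
		printf("#%lu: %p val=%lx\n", r->seq, r->fn, r->val);
	}
	return trace_count - first;
}
```

Memory use stays fixed no matter how many times the chain is invoked,
which is the point when the calls number in the "zillions".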

Greetings,
Rafael


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
On 11/18/2007 11:27 PM, Rafael J. Wysocki wrote:
> You can use a global variable to switch the logging only before the CPU
> hotunplug done by the suspend code.  You just need to hack
> disable_nonboot_cpus() for that.

If I understand you correctly, that's what BUBAK variable is there for. But it
is still called again and again while the suspend code runs...

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Rafael J. Wysocki
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 06:07 PM, Alan Stern wrote:
> > You'll get more useful results if you redo your changes to 
> > notifier_call_chain().  Have it print out the address of the routine 
> > _before_ making the call, and don't limit it to 20.  That way you'll 
> > know exactly which notifier routine ends up hanging.
> 
> The problem is that notifier_call_chain is called again and again, a zillion times,
> by somebody else...

You can use a global variable to switch the logging only before the CPU
hotunplug done by the suspend code.  You just need to hack
disable_nonboot_cpus() for that.


Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-18 Thread Trond Myklebust

On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> On Nov 18, 2007 12:05 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > I've been staring at this NFS code for a while and can't make any sense
> > out of it. It seems to correctly initialize the waitqueue. So this would
> > indicate corruption of some sort.
> 
> No, it does not "correctly" initialize the waitqueue. It doesn't even
> try to initialize it.
> 
> I now found the guilty patch and what is wrong with it.
> 
> nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:
> 
> @@ -110,8 +112,22 @@ struct nfs_server {
>filesystem */
>  #endif
> void (*destroy)(struct nfs_server *);
> +
> +   atomic_t active; /* Keep trace of any activity to this server */
> +   wait_queue_head_t active_wq;  /* Wait for any activity to stop  */
> 
> and tries to initialize it:
> @@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
> server->namelen  = data->namlen;
> /* Create a client RPC handle for the NFSv3 ACL management interface */
> nfs_init_server_aclclient(server);
> +
> +   init_waitqueue_head(&server->active_wq);
> +   atomic_set(&server->active, 0);
> +
> 
> and then uses it via nfs_sb_active and nfs_sb_deactive:
> 
> @@ -29,6 +29,7 @@ struct nfs_unlinkdata {
>  static void
>  nfs_free_unlinkdata(struct nfs_unlinkdata *data)
>  {
> +   nfs_sb_deactive(NFS_SERVER(data->dir));
> iput(data->dir);
> put_rpccred(data->cred);
> kfree(data->args.name.name);
> @@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry *parent, struct inode *dir, struct n
> nfs_dec_sillycount(dir);
> return 0;
> }
> +   nfs_sb_active(NFS_SERVER(dir));
> data->args.fh = NFS_FH(dir);
> nfs_fattr_init(&data->res.dir_attr);
> 
> 
> But it does not notice this:
> struct dentry_operations nfs_dentry_operations = {
> .d_revalidate   = nfs_lookup_revalidate,
> .d_delete   = nfs_dentry_delete,
> .d_iput = nfs_dentry_iput,
> };
> struct dentry_operations nfs4_dentry_operations = {
> .d_revalidate   = nfs_open_revalidate,
> .d_delete   = nfs_dentry_delete,
> .d_iput = nfs_dentry_iput,
> };
> 
> NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> unlink and sillyrename logic.
> But they do not share nfs_init_server()!
> 
> I wonder why this doesn't blow up more violently, but only hangs...
> 
> But as I don't know if it is correct to add the workqueue
> initialization to nfs4_init_server() or remove the nfs_sb_active /
> nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
> 
> Torsten

I had already fixed that one in my own stack. Attached are the 3 patches
that I've got. 1 from SteveD, 2 fixes.

Andrew, could you please unapply the sillyrename patches you've got, and
apply these 3 instead?

Trond

--- Begin Message ---
Added an active/deactive mechanism to the nfs_server structure
allowing async operations to hold off umount until the
operations are done.
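The mechanism in this changelog is essentially a counter plus a wait:
each async operation takes a reference before it starts and drops it when
done, and umount waits for the count to reach zero. In the patch the
fields are an atomic_t active and a wait_queue_head_t active_wq; since
the body of nfs_put_super() is cut off in this archive, the following is
only a userspace model of the idea using a pthread mutex and condition
variable, with hypothetical names (struct server_model, sb_active(),
sb_deactive(), put_super_wait(), async_op()).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the new nfs_server fields. */
struct server_model {
	pthread_mutex_t lock;
	pthread_cond_t idle; /* plays the role of active_wq */
	int active;          /* number of outstanding async operations */
};

static void server_model_init(struct server_model *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->idle, NULL);
	s->active = 0;
}

/* Taken before an async operation (e.g. a sillyrename unlink) starts. */
static void sb_active(struct server_model *s)
{
	pthread_mutex_lock(&s->lock);
	s->active++;
	pthread_mutex_unlock(&s->lock);
}

/* Dropped when it finishes; wake the unmounter when the count hits zero. */
static void sb_deactive(struct server_model *s)
{
	pthread_mutex_lock(&s->lock);
	assert(s->active > 0);
	if (--s->active == 0)
		pthread_cond_broadcast(&s->idle);
	pthread_mutex_unlock(&s->lock);
}

/* Unmount path: block until no async operation is outstanding. */
static void put_super_wait(struct server_model *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->active != 0)
		pthread_cond_wait(&s->idle, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/* Example async operation: does its work, then drops its reference. */
static void *async_op(void *arg)
{
	struct server_model *s = arg;

	/* ... the RPC would run here ... */
	sb_deactive(s);
	return NULL;
}
```

The reference must be taken before the operation is started, not inside
it, otherwise umount could see a zero count and proceed while the
operation is still being launched. Skipping server_model_init() on some
path corresponds to using a wait queue that was never initialized.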

Signed-off-by: Steve Dickson <[EMAIL PROTECTED]>
Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/client.c   |4 
 fs/nfs/super.c|   13 +
 fs/nfs/unlink.c   |2 ++
 include/linux/nfs_fs_sb.h |   17 +
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 70587f3..2ecf726 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
server->namelen  = data->namlen;
/* Create a client RPC handle for the NFSv3 ACL management interface */
nfs_init_server_aclclient(server);
+
+   init_waitqueue_head(&server->active_wq);
+   atomic_set(&server->active, 0);
+
dprintk("<-- nfs_init_server() = 0 [new %p]\n", clp);
return 0;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 71067d1..833aed8 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -202,6 +202,7 @@ static int nfs_get_sb(struct file_system_type *, int, const char *, void *, stru
 static int nfs_xdev_get_sb(struct file_system_type *fs_type,
int flags, const char *dev_name, void *raw_data, struct vfsmount *mnt);
 static void nfs_kill_super(struct super_block *);
+static void nfs_put_super(struct super_block *);
 
 static struct file_system_type nfs_fs_type = {
.owner  = THIS_MODULE,
@@ -223,6 +224,7 @@ static const struct super_operations nfs_sops = {
.alloc_inode= nfs_alloc_inode,
.destroy_inode  = nfs_destroy_inode,
.write_inode= nfs_write_inode,
+   .put_super  = nfs_put_super,
.statfs = nfs_statfs,
.clear_inode= nfs_clear_inode,
.umount_begin   = nfs_umount_begin,
@@ -1772,6 +1774,17 @@ static void nfs4_kill_super(struct super_block *sb)
  

Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
On 11/18/2007 06:07 PM, Alan Stern wrote:
> You'll get more useful results if you redo your changes to 
> notifier_call_chain().  Have it print out the address of the routine 
> _before_ making the call, and don't limit it to 20.  That way you'll 
> know exactly which notifier routine ends up hanging.

The problem is that notifier_call_chain is called again and again, a zillion times,
by somebody else...

Anyway you led me to another idea:
* _cpu_down
printk("%s: u\n", __func__);
BUBAK=1;
/* CPU is completely dead: tell everyone.  Too late to complain. */
if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | 0x88000 | mod,
hcpu) == NOTIFY_BAD)
BUG();
BUBAK=0;
-
* notifier_call_chain
unsigned int a = val & 0x88000;
unsigned int yes = a == 0x88000;

nb = rcu_dereference(*nl);

if (a && a != 0x88000)
printk("Somebody calls with val: %lx\n", val);
else
val &= ~0x88000;

while (nb && nr_to_call) {
next_nb = rcu_dereference(nb->next);
if (unlikely(BUBAK && yes))
printk("%s: %p\n", __func__, nb->notifier_call);
ret = nb->notifier_call(nb, val, v);
-
gives coretemp_cpu_callback -> coretemp_device_remove ->
platform_device_unregister, so coretemp seems to be what I have and you don't.
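The trick in the debug hack above is to tag one particular call site by
OR-ing marker bits (0x88000) into the event value. The chain walker can
then tell that call apart from the many unrelated notifier_call_chain()
invocations, strip the marker before the callbacks see it, and print each
callback's address _before_ calling it, so that a hang points at the last
line printed. A self-contained userspace model of the same idea;
call_chain(), debug_enabled (standing in for BUBAK), record_cb() and the
array-based chain are hypothetical simplifications of the kernel's linked
notifier_block list.

```c
#include <assert.h>
#include <stdio.h>

#define DEBUG_TAG 0x88000UL /* marker bits, as in the debug hack */

typedef int (*notifier_fn)(unsigned long val);

static int debug_enabled;   /* plays the role of BUBAK */
static unsigned int logged; /* number of callbacks announced so far */

/* Walk a NULL-terminated array of callbacks (a simplified notifier chain). */
static int call_chain(notifier_fn *chain, unsigned long val)
{
	unsigned long bits = val & DEBUG_TAG;
	int tagged = (bits == DEBUG_TAG);
	int ret = 0;

	if (bits && !tagged)
		printf("Somebody calls with val: %lx\n", val); /* marker collides with real flags */
	else
		val &= ~DEBUG_TAG; /* callbacks must never see the marker */

	for (; *chain; chain++) {
		if (debug_enabled && tagged) {
			/* Print before calling: if the callback hangs, this is the last line. */
			printf("%s: %p\n", __func__, (void *)*chain);
			fflush(stdout);
			logged++;
		}
		ret = (*chain)(val);
	}
	return ret;
}

/* Example callback that just remembers the value it was given. */
static unsigned long last_seen;

static int record_cb(unsigned long val)
{
	last_seen = val;
	return 0;
}
```

Untagged calls go through silently, so the output is limited to the one
invocation of interest; the printed addresses can then be looked up in
System.map.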




Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4

2007-11-18 Thread Torsten Kaiser
On Nov 18, 2007 12:05 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> I've been staring at this NFS code for a while and can't make any sense
> out of it. It seems to correctly initialize the waitqueue. So this would
> indicate corruption of some sort.

No, it does not "correctly" initialize the waitqueue. It doesn't even
try to initialize it.

I now found the guilty patch and what is wrong with it.

nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:

@@ -110,8 +112,22 @@ struct nfs_server {
   filesystem */
 #endif
void (*destroy)(struct nfs_server *);
+
+   atomic_t active; /* Keep trace of any activity to this server */
+   wait_queue_head_t active_wq;  /* Wait for any activity to stop  */

and tries to initialize it:
@@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
server->namelen  = data->namlen;
/* Create a client RPC handle for the NFSv3 ACL management interface */
nfs_init_server_aclclient(server);
+
+   init_waitqueue_head(&server->active_wq);
+   atomic_set(&server->active, 0);
+

and then uses it via nfs_sb_active and nfs_sb_deactive:

@@ -29,6 +29,7 @@ struct nfs_unlinkdata {
 static void
 nfs_free_unlinkdata(struct nfs_unlinkdata *data)
 {
+   nfs_sb_deactive(NFS_SERVER(data->dir));
iput(data->dir);
put_rpccred(data->cred);
kfree(data->args.name.name);
@@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry *parent, struct inode *dir, struct n
nfs_dec_sillycount(dir);
return 0;
}
+   nfs_sb_active(NFS_SERVER(dir));
data->args.fh = NFS_FH(dir);
nfs_fattr_init(&data->res.dir_attr);


But it does not notice this:
struct dentry_operations nfs_dentry_operations = {
.d_revalidate   = nfs_lookup_revalidate,
.d_delete   = nfs_dentry_delete,
.d_iput = nfs_dentry_iput,
};
struct dentry_operations nfs4_dentry_operations = {
.d_revalidate   = nfs_open_revalidate,
.d_delete   = nfs_dentry_delete,
.d_iput = nfs_dentry_iput,
};

NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
unlink and sillyrename logic.
But they do not share nfs_init_server()!

I wonder why this doesn't blow up more violently, but only hangs...

But as I don't know if it is correct to add the workqueue
initialization to nfs4_init_server() or remove the nfs_sb_active /
nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.

Torsten


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Alan Stern
On Sun, 18 Nov 2007, Jiri Slaby wrote:

> On 11/18/2007 04:23 PM, Rafał J. Wysocki wrote:
> > On Sunday, 18 of November 2007, Jiri Slaby wrote:
> >> On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
> >>> Can you also make the new System-map available, please?
> >> Sure:
> >> http://www.fi.muni.cz/~xslaby/sklad/System.map1
> > 
> > The last notifier called in 
> > http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png
> 
> Last... Note that it's only the first 20 invocations of notifiers; there are
> a bazillion of them when I remove the condition '< 20'.
> 
> > is apparently cpu_swap_callback() which is not called in
> > http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png .
> > 
> > Can you verify that cpu_swap_callback() gets called if the patch is not
> > applied?
> 
> Does this still apply?

You'll get more useful results if you redo your changes to 
notifier_call_chain().  Have it print out the address of the routine 
_before_ making the call, and don't limit it to 20.  That way you'll 
know exactly which notifier routine ends up hanging.

Alan Stern



Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
On 11/18/2007 04:23 PM, Rafał J. Wysocki wrote:
> On Sunday, 18 of November 2007, Jiri Slaby wrote:
>> On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
>>> Can you also make the new System-map available, please?
>> Sure:
>> http://www.fi.muni.cz/~xslaby/sklad/System.map1
> 
> The last notifier called in http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png

Last... Note that it's only the first 20 invocations of notifiers; there are
a bazillion of them when I remove the condition '< 20'.

> is apparently cpu_swap_callback() which is not called in
> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png .
> 
> Can you verify that cpu_swap_callback() gets called if the patch is not
> applied?

Does this still apply?


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Rafał J. Wysocki
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
> > Can you also make the new System-map available, please?
> 
> Sure:
> http://www.fi.muni.cz/~xslaby/sklad/System.map1

The last notifier called in http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png
is apparently cpu_swap_callback() which is not called in
http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png .

Can you verify that cpu_swap_callback() gets called if the patch is not
applied?


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
> Can you also make the new System-map available, please?

Sure:
http://www.fi.muni.cz/~xslaby/sklad/System.map1


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Rafael J. Wysocki
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 02:42 PM, Rafael J. Wysocki wrote:
> > On Sunday, 18 of November 2007, Jiri Slaby wrote:
> >> On 11/18/2007 01:42 PM, Jiri Slaby wrote:
> >>> See shot of prints here:
> >>> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png
> >> BTW output from that tree minus the patch:
> > 
> > Hm, it looks like one of the CPU hotplug notifiers is doing something wrong.
> > 
> > Can you try to see what happens (with the patch applied) if
> > thermal_throttle_cpu_callback() is not registered?
> 
> After commenting out
> //device_initcall(thermal_throttle_init_device);
> it looks like this:
> http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png

Can you also make the new System-map available, please?


[BUG] Oops in 2.6.24-rc2-mm1

2007-11-18 Thread Magotari
First of all, a disclaimer.
I am new to the kernel and this is my first report. As such I can make mistakes.
In doubt feel free to assume that the fault lies with me or my system.

A computer I have crashes during the boot process. My .config is
attached, I have generated it with oldconfig from a 24-rc2 kernel plus
a few changes with menuconfig. Because I was not able to get a picture
or the text of the oops, I have written down the most important stuff.
A lot of it varies between reboots. I had to use reboot=1000 in my
kernel command line to catch it, because the computer would power off
at once without it.

My kernel is not tainted by blobs. The oops message has either [#1] or
[#2] in it.
The pid it reports varies.

The call trace looks totally uninformative. For each chunk only the
part in the [] is changing, the other bit being always <0>.

Finally I get this line: usb usb5: suspend_rh (auto-stop)
After this all output stops.

I'm not subscribed to the list, so please CC me.
If I have made any mistakes with this report, please tell me.
Thank you.

Karol Swietlicki


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc2-mm1
# Sun Nov 18 13:31:45 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CGROUPS is not set
# CONFIG_FAIR_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
# CONFIG_KALLSYMS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
# CONFIG_BUG is not set
# CONFIG_ELF_CORE is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROC_PAGE_MONITOR is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MCORE2=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X

Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
On 11/18/2007 02:42 PM, Rafael J. Wysocki wrote:
> On Sunday, 18 of November 2007, Jiri Slaby wrote:
>> On 11/18/2007 01:42 PM, Jiri Slaby wrote:
>>> See shot of prints here:
>>> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png
>> BTW output from that tree minus the patch:
> 
> Hm, it looks like one of the CPU hotplug notifiers is doing something wrong.
> 
> Can you try to see what happens (with the patch applied) if
> thermal_throttle_cpu_callback() is not registered?

After commenting out
//device_initcall(thermal_throttle_init_device);
it looks like this:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Rafael J. Wysocki
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 01:42 PM, Jiri Slaby wrote:
> > See shot of prints here:
> > http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png
> 
> BTW output from that tree minus the patch:

Hm, it looks like one of the CPU hotplug notifiers is doing something wrong.

Can you try to see what happens (with the patch applied) if
thermal_throttle_cpu_callback() is not registered?

> _cpu_down: s
> _cpu_down: t
> CPU 1 is now offline
> SMP alternatives: switching to UP code
> _cpu_down: u
> notifier_call_chain: c 80232370 1
> notifier_call_chain: c 8026EF10 1
> notifier_call_chain: c 8024B8F0 1
> notifier_call_chain: c 802419E0 1
> notifier_call_chain: c 80255B50 1
> notifier_call_chain: c 80250C40 1
> notifier_call_chain: c 8028E8F0 1
> notifier_call_chain: c 802B59C0 1
> notifier_call_chain: c 80323460 1
> notifier_call_chain: c 80270990 0
> notifier_call_chain: c 8023D5D0 1
> notifier_call_chain: c 80266090 1
> notifier_call_chain: c 802320A0 1
> notifier_call_chain: c 80249DA0 1
> notifier_call_chain: c 80318440 1
> notifier_call_chain: c 8047BE80 1
> notifier_call_chain: c 80212F40 0
> notifier_call_chain: c 80216350 1
> notifier_call_chain: c 80217220 1
> notifier_call_chain: c 80218120 1
> _cpu_down: v
> _cpu_down: w
> _cpu_down: x
> _cpu_down: y
> _cpu_down: z
> disable_nonboot_cpus: 3 0


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
On 11/18/2007 01:42 PM, Jiri Slaby wrote:
> See shot of prints here:
> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png

BTW output from that tree minus the patch:
_cpu_down: s
_cpu_down: t
CPU 1 is now offline
SMP alternatives: switching to UP code
_cpu_down: u
notifier_call_chain: c 80232370 1
notifier_call_chain: c 8026EF10 1
notifier_call_chain: c 8024B8F0 1
notifier_call_chain: c 802419E0 1
notifier_call_chain: c 80255B50 1
notifier_call_chain: c 80250C40 1
notifier_call_chain: c 8028E8F0 1
notifier_call_chain: c 802B59C0 1
notifier_call_chain: c 80323460 1
notifier_call_chain: c 80270990 0
notifier_call_chain: c 8023D5D0 1
notifier_call_chain: c 80266090 1
notifier_call_chain: c 802320A0 1
notifier_call_chain: c 80249DA0 1
notifier_call_chain: c 80318440 1
notifier_call_chain: c 8047BE80 1
notifier_call_chain: c 80212F40 0
notifier_call_chain: c 80216350 1
notifier_call_chain: c 80217220 1
notifier_call_chain: c 80218120 1
_cpu_down: v
_cpu_down: w
_cpu_down: x
_cpu_down: y
_cpu_down: z
disable_nonboot_cpus: 3 0
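The last number on each notifier_call_chain line above is the callback's return value. Below is a small decoder for reading it. It assumes the NOTIFY_* values from include/linux/notifier.h of this era. The helper names notify_ret_name() and notify_stops_chain() are made up for illustration.

```c
/* Return codes as defined in include/linux/notifier.h (2.6.24 era). */
#define NOTIFY_DONE      0x0000                        /* don't care */
#define NOTIFY_OK        0x0001                        /* suits me */
#define NOTIFY_STOP_MASK 0x8000                        /* don't call further */
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)   /* bad/veto action */
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical helper: map a notifier return value to its macro name. */
static const char *notify_ret_name(int ret)
{
	switch (ret) {
	case NOTIFY_DONE: return "NOTIFY_DONE";
	case NOTIFY_OK:   return "NOTIFY_OK";
	case NOTIFY_BAD:  return "NOTIFY_BAD";
	case NOTIFY_STOP: return "NOTIFY_STOP";
	default:          return "unknown";
	}
}

/* Hypothetical helper: the chain walk ends early only if the stop bit is set. */
static int notify_stops_chain(int ret)
{
	return (ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK;
}
```

Under this reading, every printed entry is either NOTIFY_OK (1) or NOTIFY_DONE (0). None of them has the 0x8000 stop bit set, so none of the printed callbacks vetoed the chain or cut it short. The debug filter also stops printing after 20 lines (cnt < 20), and exactly 20 lines appear above, so later callbacks in the chain may simply not be shown.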

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: broken suspend [Was: 2.6.24-rc2-mm1]

2007-11-18 Thread Jiri Slaby
Alan Stern wrote:
> On Sat, 17 Nov 2007, Rafael J. Wysocki wrote:
>> On Saturday, 17 of November 2007, Jiri Slaby wrote:
>>> On 11/16/2007 05:10 PM, Alan Stern wrote:
>>>> The thing to do is figure out which driver is causing the problem.
>>>> Jiri, try enabling CONFIG_DEBUG_DRIVER.
>>> Sadly no output.

A "nice" update script had wiped the kern.* entry out of the syslog config
file, hence no output before.

> Back to the main topic...  My system hibernates and resumes with no
> apparent problem.  Jiri, it looks like you'll have to do some debug
> tracing of the routines in drivers/base/power/main.c.

Besides these two, nothing strange:

dpm_suspend: b 00:06
WARNING: at /home/l/latest/bughunt/kernel/resource.c:185 __release_resource()

Call Trace:
 [] release_resource+0xb5/0xf0
 [] pnp_release_resources+0x70/0x130
 [] pnp_stop_dev+0x45/0x90
 [] pnp_bus_suspend+0x92/0xb0
 [] suspend_device+0x113/0x180
 [] device_suspend+0x200/0x320
 [] suspend_devices_and_enter+0xa5/0x170
 [] enter_state+0x209/0x270
 [] state_store+0xaf/0xf0
 [] kobj_attr_store+0x17/0x20
 [] sysfs_write_file+0xce/0x140
 [] vfs_write+0xc7/0x170
 [] sys_write+0x50/0x90
 [] system_call+0x7e/0x83

WARNING: at /home/l/latest/bughunt/kernel/resource.c:189 __release_resource()

Call Trace:
 [] release_resource+0xe0/0xf0
 [] pnp_release_resources+0x70/0x130
 [] pnp_stop_dev+0x45/0x90
 [] pnp_bus_suspend+0x92/0xb0
 [] suspend_device+0x113/0x180
 [] device_suspend+0x200/0x320
 [] suspend_devices_and_enter+0xa5/0x170
 [] enter_state+0x209/0x270
 [] state_store+0xaf/0xf0
 [] kobj_attr_store+0x17/0x20
 [] sysfs_write_file+0xce/0x140
 [] vfs_write+0xc7/0x170
 [] sys_write+0x50/0x90
 [] system_call+0x7e/0x83
...
dpm_suspend: b :00:1f.5
ACPI Error (psargs-0355): [FZHD] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed
[\_SB_.PCI0.SAT1.CHN0._GTM] (Node 81007D000220), AE_NOT_FOUND
ACPI Error (psargs-0355): [FZHD] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed
[\_SB_.PCI0.SAT1.CHN1._GTM] (Node 81007D000360), AE_NOT_FOUND



It gets stuck in _cpu_down (enter_state -> suspend_devices_and_enter ->
disable_nonboot_cpus -> _cpu_down) after it calls raw_notifier_call_chain:

	printk("%s: s\n", __func__);
	/* Wait for it to sleep (leaving idle task). */
	while (!idle_cpu(cpu))
		yield();

	printk("%s: t\n", __func__);
	/* This actually kills the CPU. */
	__cpu_die(cpu);

	printk("%s: u\n", __func__);
	BUBAK=1;
	/* CPU is completely dead: tell everyone.  Too late to complain. */
	if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | mod,
				    hcpu) == NOTIFY_BAD)
		BUG();
	BUBAK=0;

	printk("%s: v\n", __func__);


See shot of prints here:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png

notifier_call_chain looks like:
	while (nb && nr_to_call) {
		next_nb = rcu_dereference(nb->next);
		ret = nb->notifier_call(nb, val, v);
		if (unlikely(BUBAK && cnt < 20 && (ret != lastr ||
				lastp != nb->notifier_call))) {
			printk("%s: c %p %d\n", __func__, nb->notifier_call,
					ret);
			lastr = ret;
			lastp = nb->notifier_call;
			cnt++;
		}

		if (nr_calls)
			(*nr_calls)++;

		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
		nr_to_call--;
	}
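The instrumented loop above can also be exercised outside the kernel. The sketch below runs in user space and uses simplified stand-ins: there is no RCU and no locking, and struct notifier_block is cut down to two fields. The names call_chain(), cb_done(), cb_ok() and cb_stop() are hypothetical. It keeps the same debug filter: it prints a line only when the (callback, return value) pair changes, and stops after max_lines lines. For portable printing, it prints the position in the chain instead of the function pointer.

```c
#include <stdio.h>

/* Values as in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Simplified user-space stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long val,
			     void *v);
	struct notifier_block *next;
};

typedef int (*notifier_fn)(struct notifier_block *, unsigned long, void *);

/*
 * Walk the chain like notifier_call_chain(): call each block in order,
 * count calls in *nr_calls, and stop early if a callback sets the stop
 * bit. nr_to_call == -1 means "no limit". Returns the last return value.
 */
static int call_chain(struct notifier_block *nb, unsigned long val, void *v,
		      int nr_to_call, int *nr_calls, int max_lines)
{
	int ret = NOTIFY_DONE;
	int lastr = -1, cnt = 0, pos = 0;
	notifier_fn lastp = NULL;

	while (nb && nr_to_call) {
		struct notifier_block *next_nb = nb->next;

		ret = nb->notifier_call(nb, val, v);
		/* Log only changes of (callback, return value), capped. */
		if (cnt < max_lines &&
		    (ret != lastr || lastp != nb->notifier_call)) {
			printf("call_chain: #%d ret %d\n", pos, ret);
			lastr = ret;
			lastp = nb->notifier_call;
			cnt++;
		}

		if (nr_calls)
			(*nr_calls)++;

		/* A callback can end the walk, e.g. to veto with NOTIFY_BAD. */
		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
		nr_to_call--;
		pos++;
	}
	return ret;
}

/* Illustrative callbacks. */
static int cb_done(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_DONE;
}

static int cb_ok(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_OK;
}

static int cb_stop(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_OK | NOTIFY_STOP_MASK;
}
```

In a chain of cb_done -> cb_stop -> cb_ok, only the first two callbacks run, because cb_stop sets the stop bit. A hang that occurs after "_cpu_down: u" but before "_cpu_down: v" therefore points at a callback that never returns, not at a veto.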

System.map is here if you are curious what the pointers in the screenshot are:
http://www.fi.muni.cz/~xslaby/sklad/System.map
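Mapping a pointer to a function with System.map works like this: the enclosing function is the entry with the highest address that is not above the pointer, and the difference is the offset. The sketch below assumes the usual "<hex address> <type> <name>" line format. The log prints only eight hex digits, so the sketch compares just the low 32 bits, which is also an assumption. The name resolve() and the sample symbols in the test are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find the symbol enclosing addr in System.map-formatted text.
 * Writes "name+0xoff" into out and returns 0, or returns -1 if no
 * symbol lies at or below addr. Only the low 32 bits are compared
 * (assumption: the log shows truncated addresses).
 */
static int resolve(const char *map, unsigned long long addr,
		   char *out, size_t outlen)
{
	unsigned long long a, best = 0;
	char name[128], bestname[128] = "";
	char type;
	int found = 0, n;
	const char *p = map;

	addr &= 0xffffffffULL;
	/* %n records how many characters this line consumed. */
	while (sscanf(p, "%llx %c %127s%n", &a, &type, name, &n) == 3) {
		a &= 0xffffffffULL;
		if (a <= addr && (!found || a >= best)) {
			best = a;
			strcpy(bestname, name);
			found = 1;
		}
		p += n;
	}
	if (!found)
		return -1;
	snprintf(out, outlen, "%s+0x%llx", bestname, addr - best);
	return 0;
}
```

Because the whole text is scanned, the map does not need to be sorted. Each lookup costs one pass over the file, which is fine for a handful of pointers.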

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz

