Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Jan 2, 2008 9:51 PM, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Wed, 2 Jan 2008, Torsten Kaiser wrote:
>
> > I just tested something with vanilla 2.6.24-rc6 and had the same problem.
> > Should this patch, or something similar be included for 2.6.24?
>
> Such a patch is in Andrew's tree.
>
> 2.6.24-rc6-mm1:
>
> static struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags,
>                                                            int node)
> {
>         struct page *page;
>         struct kmem_cache_node *n;
>         unsigned long flags;
> ...
>
>         /*
>          * lockdep requires consistent irq usage for each lock
>          * so even though there cannot be a race this early in
>          * the boot sequence, we still disable irqs.
>          */
>         local_irq_save(flags);
>         add_partial(kmalloc_caches, page, 0);
>         local_irq_restore(flags);
>         return n;
> }
>

from 2.6.24-rc6-mm1 patch-series file:
slub-noinline-some-functions-to-avoid-them-being-folded-into-alloc-free.patch
slub-move-kmem_cache_node-determination-into-add_full-and-add_partial.patch
slub-move-kmem_cache_node-determination-into-add_full-and-add_partial-slub-workaround-for-lockdep-confusion.patch
slub-avoid-checking-for-a-valid-object-before-zeroing-on-the-fast-path.patch

It seems it got lumped into some other slub patches, but the bug does
not seem to be introduced by them, as I can see it in mainline 2.6.24-rc6.

Should this patch be made a candidate for the
merge-before-2.6.24-final-queue?

Torsten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Wed, 2 Jan 2008, Torsten Kaiser wrote:

> I just tested something with vanilla 2.6.24-rc6 and had the same problem.
> Should this patch, or something similar be included for 2.6.24?

Such a patch is in Andrew's tree.

2.6.24-rc6-mm1:

static struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags,
                                                           int node)
{
        struct page *page;
        struct kmem_cache_node *n;
        unsigned long flags;
...

        /*
         * lockdep requires consistent irq usage for each lock
         * so even though there cannot be a race this early in
         * the boot sequence, we still disable irqs.
         */
        local_irq_save(flags);
        add_partial(kmalloc_caches, page, 0);
        local_irq_restore(flags);
        return n;
}
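[Editor's sketch] The comment in the function above says lockdep wants "consistent irq usage for each lock". A rough user-space model of that rule may help. It is not lockdep's implementation; the names toy_lock_class, toy_mark_lock and the two usage bits are invented for illustration, and real lockdep tracks many more states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified usage bits; real lockdep tracks more states. */
enum toy_usage_bit {
    TOY_USED_IN_SOFTIRQ = 1u << 0, /* taken while running in softirq context */
    TOY_USED_IRQS_ON    = 1u << 1, /* taken outside softirq with irqs enabled */
};

struct toy_lock_class {
    const char *name;
    unsigned int usage; /* OR of toy_usage_bit values seen so far */
};

/*
 * Record one acquisition of the lock class.
 * Returns false once the class has been taken both inside softirq context
 * and, elsewhere, with irqs enabled: a softirq could then interrupt the
 * irqs-enabled holder and spin on the same lock.
 */
static bool toy_mark_lock(struct toy_lock_class *c, bool in_softirq,
                          bool irqs_enabled)
{
    if (in_softirq)
        c->usage |= TOY_USED_IN_SOFTIRQ;
    else if (irqs_enabled)
        c->usage |= TOY_USED_IRQS_ON;

    return !((c->usage & TOY_USED_IN_SOFTIRQ) &&
             (c->usage & TOY_USED_IRQS_ON));
}
```

In this model, an early-boot acquisition with irqs enabled followed by a later acquisition from softirq context is reported as inconsistent, while the same sequence with irqs disabled around the boot-time acquisition (what local_irq_save()/local_irq_restore() do in the patch) is not.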
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
CC's somewhat trimmed...

On Nov 18, 2007 12:00 AM, root <[EMAIL PROTECTED]> wrote:
> On Sat, Nov 17, 2007 at 07:09:46PM +0100, Ingo Molnar wrote:
> > * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> >
> > > Sadly lockdep does not work for me, as it gets turned off early:
> > > [ 39.851594] -
> > > [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > > [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > > [ 39.866963] (&n->list_lock){-+..}, at: []
> >
> > hey, that means it found a bug - which is not sad at all :-)
>
> ---
> Subject: lockdep: slub: annotate boot time node->list_lock usage
>
> inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> (&n->list_lock){-+..}, at: [] add_partial+0x31/0xa0
> {softirq-on-W} state was registered at:
> [] __lock_acquire+0x3e8/0x1140
> [] debug_check_no_locks_freed+0x188/0x1a0
> [] lock_acquire+0x55/0x70
> [] add_partial+0x31/0xa0
> [] _spin_lock+0x1e/0x30
> [] add_partial+0x31/0xa0
> [] kmem_cache_open+0x1cc/0x330
> [] _spin_unlock_irq+0x24/0x30
> [] create_kmalloc_cache+0x64/0xf0
> [] init_alloc_cpu_cpu+0x70/0x90
> [] kmem_cache_init+0x65/0x1d0
> [] start_kernel+0x23e/0x350
> [] _sinittext+0x12d/0x140
> [] 0x
>
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> CC: Christoph Lameter <[EMAIL PROTECTED]>
> CC: Kamalesh Babulal <[EMAIL PROTECTED]>
> ---
> mm/slub.c |    8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux-2.6/mm/slub.c
> ===
> --- linux-2.6.orig/mm/slub.c
> +++ linux-2.6/mm/slub.c
> @@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
>  {
>         struct page *page;
>         struct kmem_cache_node *n;
> +       unsigned long flags;
>
>         BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
>
> @@ -2179,7 +2180,14 @@ static struct kmem_cache_node *early_kme
>  #endif
>         init_kmem_cache_node(n);
>         atomic_long_inc(&n->nr_slabs);
> +       /*
> +        * lockdep requires consistent irq usage for each lock
> +        * so even though there cannot be a race this early in
> +        * the boot sequence, we still disable irqs.
> +        */
> +       local_irq_save(flags);
>         add_partial(kmalloc_caches, page, 0);
> +       local_irq_restore(flags);
>         return n;
>  }

I just tested something with vanilla 2.6.24-rc6 and had the same problem.
Should this patch, or something similar be included for 2.6.24?

The lockdep report:
[ 40.057281] PCI: BIOS Bug: MCFG area at f000 is not E820-reserved
[ 40.063736] PCI: Not using MMCONFIG.
[ 40.067329] PCI: Using configuration type 1
[ 40.063153]
[ 40.063154] =
[ 40.063156] [ INFO: inconsistent lock state ]
[ 40.063157] 2.6.24-rc6 #1
[ 40.063158] -
[ 40.063160] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 40.063162] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 40.063163] (&n->list_lock){-+..}, at: [] add_partial+0x1c/0x50
[ 40.063172] {softirq-on-W} state was registered at:
[ 40.063173] [] __lock_acquire+0x3c7/0x1140
[ 40.063179] [] trace_hardirqs_on+0xbf/0x160
[ 40.063182] [] lock_acquire+0x5b/0x80
[ 40.063185] [] add_partial+0x1c/0x50
[ 40.063187] [] _spin_lock+0x25/0x40
[ 40.063192] [] add_partial+0x1c/0x50
[ 40.063195] [] kmem_cache_open+0x1c7/0x330
[ 40.063198] [] create_kmalloc_cache+0x63/0xc0
[ 40.063200] [] kmem_cache_init+0x65/0x1d0
[ 40.063204] [] start_kernel+0x245/0x360
[ 40.063208] [] _sinittext+0x131/0x140
[ 40.063211] [] 0x
[ 40.063214] irq event stamp: 569
[ 40.063215] hardirqs last enabled at (568): [] kmem_cache_free+0xcd/0x100
[ 40.063219] hardirqs last disabled at (569): [] kmem_cache_free+0x68/0x100
[ 40.063222] softirqs last enabled at (550): [] __do_softirq+0xef/0x110
[ 40.063226] softirqs last disabled at (557): [] call_softirq+0x1c/0x30
[ 40.063230]
[ 40.063230] other info that might help us debug this:
[ 40.063231] no locks held by swapper/0.
[ 40.063232]
[ 40.063233] stack backtrace:
[ 40.063235] Pid: 0, comm: swapper Not tainted 2.6.24-rc6 #1
[ 40.063236]
[ 40.063236] Call Trace:
[ 40.063237]  <IRQ>  [] print_usage_bug+0x189/0x190
[ 40.063243] [] mark_lock+0x63d/0x650
[ 40.063246] [] __lock_acquire+0x37e/0x1140
[ 40.063248] [] dump_trace+0xd7/0x2d0
[ 40.063250] [] save_stack_trace+0x28/0x50
[ 40.063253] [] free_fdtable_rcu+0x94/0xa0
[ 40.063255] [] lock_acquire+0x5b/0x80
[ 40.063257] [] add_partial+0x1c/0x50
[ 40.063259] [] _spin_lock+0x25/0x40
[ 40.063261] [] add_partial+0x1c/0x50
[ 40.063264] [] __slab_free+0xaf/0x2f0
[ 40.063265] [] free_fdtable_rcu+0x94/0xa0
[ 40.063267] [] free_fdtable_rcu+0x94/0xa0
[ 40.063269] [] kmem_cache_free+0xa1/0x100
[ 40.063271] [] free_fdtable_rcu+0x94/0xa0
[
Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]
On 11/21/2007 01:00 AM, Rafael J. Wysocki wrote:
> On Tuesday, 20 of November 2007, Mark M. Hoffman wrote:
>> commit ce9c7b78c839a6304696d90083eac08baad524ce
>> Author: Mark M. Hoffman <[EMAIL PROTECTED]>
>> Date: Tue Nov 20 07:51:50 2007 -0500
>>
>>     hwmon: (coretemp) fix suspend/resume hang
>>
>>     Signed-off-by: Mark M. Hoffman <[EMAIL PROTECTED]>
>
> I'd do it like this:
>
>> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
>> index 5c82ec7..afe2d31 100644
>> --- a/drivers/hwmon/coretemp.c
>> +++ b/drivers/hwmon/coretemp.c
>> @@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
>>      switch (action) {
>>      case CPU_ONLINE:
>>      case CPU_ONLINE_FROZEN:
>> +    case CPU_DOWN_FAILED:
>>              coretemp_device_add(cpu);
> +    case CPU_DOWN_FAILED_FROZEN:
>>              break;
>> -    case CPU_DEAD:
>> -    case CPU_DEAD_FROZEN:
>> +    case CPU_DOWN_PREPARE:
>>              coretemp_device_remove(cpu);
> +    case CPU_DOWN_PREPARE_FROZEN:
>>              break;
>>      }

Sorry for the delay, this (trimmed version) solves the problem!

thanks,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
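[Editor's sketch] The control flow of the switch in the quoted diff is easy to misread because the two *_FROZEN labels sit after the add/remove calls. The user-space model below mirrors only that flow, as read from the quoted diff. The TOY_* action codes and result values are placeholders invented for illustration; they are not the kernel's constants and nothing here calls the real coretemp driver.

```c
#include <assert.h>

/* Placeholder action codes; the kernel's real CPU_* values differ. */
enum toy_action {
    TOY_CPU_ONLINE,
    TOY_CPU_ONLINE_FROZEN,
    TOY_CPU_DOWN_FAILED,
    TOY_CPU_DOWN_FAILED_FROZEN,
    TOY_CPU_DOWN_PREPARE,
    TOY_CPU_DOWN_PREPARE_FROZEN,
    TOY_CPU_DEAD,
};

enum toy_result { TOY_NOTHING, TOY_DEVICE_ADDED, TOY_DEVICE_REMOVED };

/* Same label order as the quoted diff; returns what the callback would do. */
static enum toy_result toy_cpu_callback(enum toy_action action)
{
    enum toy_result res = TOY_NOTHING;

    switch (action) {
    case TOY_CPU_ONLINE:
    case TOY_CPU_ONLINE_FROZEN:
    case TOY_CPU_DOWN_FAILED:
        res = TOY_DEVICE_ADDED;      /* stands in for coretemp_device_add() */
        /* fall through */
    case TOY_CPU_DOWN_FAILED_FROZEN: /* entering here skips the add */
        break;
    case TOY_CPU_DOWN_PREPARE:
        res = TOY_DEVICE_REMOVED;    /* stands in for coretemp_device_remove() */
        /* fall through */
    case TOY_CPU_DOWN_PREPARE_FROZEN: /* entering here skips the remove */
        break;
    default:                          /* e.g. the no longer handled CPU_DEAD */
        break;
    }
    return res;
}
```

Read this way, the two labels added in the singly quoted lines do nothing: those actions reach only the break.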
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> Also io->pending may need better protection - atomic, but missing memory
> barriers? (May be getting away without sometimes due to side-effects of
> other function calls, but needs doing properly.)

If it's using atomic_dec_and_test then that comes with an implicit
memory barrier.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
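[Editor's sketch] A user-space analogue of the decrement-and-test pattern discussed above, written with C11 atomics. This is only an illustration: toy_io and toy_dec_and_test are invented names, and acquire-release ordering in C11 is an approximation of, not a definition of, the ordering the kernel's atomic_dec_and_test provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an io structure with a pending counter. */
struct toy_io {
    atomic_int pending;
};

/*
 * Drop one reference. Returns true only for the caller whose decrement
 * takes the counter to zero, so exactly one caller sees "last one out".
 * atomic_fetch_sub_explicit returns the value held before the subtraction.
 * acq_rel ordering makes writes done before each decrement visible to the
 * caller that observes the counter reaching zero.
 */
static bool toy_dec_and_test(struct toy_io *io)
{
    return atomic_fetch_sub_explicit(&io->pending, 1,
                                     memory_order_acq_rel) == 1;
}
```

With a counter initialised to 2, the first call returns false and the second returns true; only the second caller may tear the object down.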
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 24, 2007 4:49 AM, Alasdair G Kergon <[EMAIL PROTECTED]> wrote:
> On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> > ... or I just don't see the bug.
>
> See my earlier post in this thread: there's a race in the write loop
> where a work struct could be used twice on the same queue.
> (Needs data structure change to fix that, which nobody has attempted
> to do yet.)

As I wrote in an earlier post: I did see this lockdep message even
with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted, so
the work struct is not used in the write loop.

> BTW To eliminate any internal lockdep concerns (and people say there
> should be no problem) temporarily add a second struct instead of reusing
> one on two queues.

I think this might really be a lockdep bug, but as I'm not fluent
enough with C, please check if my logic is correct:

The freed-locked-lock-test is the only function that uses this in lockdep.c:

static inline int in_range(const void *start, const void *addr, const void *end)
{
        return addr >= start && addr <= end;
}

This will return true if addr is in the range of start (including) to
end (including).

But debug_check_no_locks_freed() seems to do:

const void *mem_to = mem_from + mem_len
-> mem_to is the last byte of the freed range, that fits in_range
lock_from = (void *)hlock->instance;
-> first byte of the lock
lock_to = (void *)(hlock->instance + 1);
-> first byte of the next lock, not last byte of the lock that is being checked!

(Or am I reading this wrong?)

The test is:

if (!in_range(mem_from, lock_from, mem_to) &&
    !in_range(mem_from, lock_to, mem_to))
        continue;

So it tests if the first byte of the lock is in the range that is freed
-> OK
And if the first byte of the *next* lock is in the range that is freed
-> Not OK.

That would also explain the rather strange output:
=
[ BUG: held lock freed! ]
-
kcryptd/1022 is freeing memory 81011EBEFB00-81011EBEFB3F, with a lock still held there!
(kcryptd){--..}, at: [] run_workqueue+0x129/0x210
2 locks held by kcryptd/1022:
#0: (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
#1: (&io->work#2){--..}, at: [] run_workqueue+0x129/0x210

That claims that the lock of the *workqueue* struct, not the work
struct, is getting freed! But I'm still happily using the dm-crypt
device, even 19 hours after that message.

So my current best guess as to the source of this message is that, with
the change in the ref counting, it is now possible that the work struct
is really getting freed before the workqueue function returns. But as
the comment in run_workqueue() says, that is still legal. But now the
first byte of the next lock is part of the freed memory, and so the
wrong "held lock freed" is triggered.

Torsten
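[Editor's sketch] The boundary case Torsten describes can be tried out in user space. The model below takes his reading at face value: the freed range is inclusive on both ends, and lock_to is the address one past the end of the lock object. Whether the real kernel check misfires this way is his hypothesis, not something this sketch establishes. All function names are invented for illustration, and half_open_overlap is a hypothetical alternative test, not a proposed kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive on both ends, like the in_range() quoted in the message. */
static bool in_range_inclusive(uintptr_t start, uintptr_t addr, uintptr_t end)
{
    return addr >= start && addr <= end;
}

/* The test as quoted: flag the lock if either of its two bounds lies in
 * the freed range [mem_from, mem_to]. lock_to is one past the lock. */
static bool quoted_check_flags(uintptr_t mem_from, uintptr_t mem_to,
                               uintptr_t lock_from, uintptr_t lock_to)
{
    return in_range_inclusive(mem_from, lock_from, mem_to) ||
           in_range_inclusive(mem_from, lock_to, mem_to);
}

/* Hypothetical alternative: true only if the half-open intervals
 * [lock_from, lock_to) and [mem_from, mem_end) share at least one byte. */
static bool half_open_overlap(uintptr_t mem_from, uintptr_t mem_end,
                              uintptr_t lock_from, uintptr_t lock_to)
{
    return lock_from < mem_end && mem_from < lock_to;
}
```

Take a 0x40-byte lock at 0x1000 (so lock_to is 0x1040) and a freed block occupying 0x1040 through 0x107f, directly behind it. The quoted test flags the lock because lock_to equals the first freed byte, while the half-open test does not, since no byte is shared.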
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> Before the cleanup *all* calls to crypt_dec_pending() was via crypt_endio().
> Now there is an additional call to crypt_dec_pending() to balance the
> additional ref placed into crypt_write_io_process(). And that one is
> not called from whatever context/thread cleans up after
> make_generic_request, but directly in the context/thread of the caller
> of crypt_write_io_process(), and that is kcryptd.

Please do look at the latest patches (always at
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/series.html )
where you'll see I've already disentangled the mess of functions and
given them more understandable names, so at least following the program
flow is easier.

Read and write do the ref counting differently (but correctly AFAICT) -
I want that changing, but held back from doing it without first checking
whether the later patches (not yet reviewed) provide a reason to prefer
one method over the other.

Alasdair
--
[EMAIL PROTECTED]
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Also io->pending may need better protection - atomic, but missing memory
barriers? (May be getting away without sometimes due to side-effects of
other function calls, but needs doing properly.)

[BTW Other device-mapper atomic_t usage also needs reviewing.]

Alasdair
--
[EMAIL PROTECTED]
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> ... or I just don't see the bug.

See my earlier post in this thread: there's a race in the write loop
where a work struct could be used twice on the same queue.
(Needs data structure change to fix that, which nobody has attempted
to do yet.)

BTW To eliminate any internal lockdep concerns (and people say there
should be no problem) temporarily add a second struct instead of reusing
one on two queues.

Alasdair
--
[EMAIL PROTECTED]
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> ...
> > Above this acquire/release sequence is the following comment:
> > #ifdef CONFIG_LOCKDEP
> >         /*
> >          * It is permissible to free the struct work_struct
> >          * from inside the function that is called from it,
> >          * this we need to take into account for lockdep too.
> >          * To avoid bogus "held lock freed" warnings as well
> >          * as problems when looking into work->lockdep_map,
> >          * make a copy and use that here.
> >          */
> >         struct lockdep_map lockdep_map = work->lockdep_map;
> > #endif
> >
> > Did something trigger this anyway?
> >
> > Anything I could try, apart from more boots with slub_debug=F?
>
> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)
>
> (btw these patches prepare dm-crypt for next patchset introducing
> async cryptoapi, so there should be no functional changes yet.)

I looked at all of these agk-*-patches, as the error is not
bisectable, because it triggers unreliably.

The one that looks suspicious is agk-dm-dm-crypt-tidy-io-ref-counting.patch

This one does a functional change, as there now is an additional ref
on io->pending. Instead of only increasing io->pending if there really
is more than one clone-bio, it will now take an additional ref in
crypt_write_io_process().

I certainly agree with the cleanup, but this introduces the following change:
Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio().
Now there is an additional call to crypt_dec_pending() to balance the
additional ref placed into crypt_write_io_process(). And that one is
not called from whatever context/thread cleans up after
make_generic_request, but directly in the context/thread of the caller
of crypt_write_io_process(), and that is kcryptd.

So now it is possible (if all requests finish before
crypt_write_io_process() returns) that kcryptd itself will release the
bio, but the workqueue infrastructure still seems to have a lock on
that.

But as the comment in run_workqueue says, this should be legal, and I
can't figure out what would make the lockdep copy mechanism fail.
Especially if the trigger was really a WRITE request, as with
agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted this
should never use the kcrypt_io-workqueue and so there should not even
be the problem with using INIT_WORK twice on the same work_struct.

... or I just don't see the bug.

Torsten
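[Editor's sketch] The effect Torsten describes, where an extra reference held by the submitter changes which context ends up freeing the object, can be shown with a single-threaded user-space model. The names (toy_io, toy_get, toy_put, toy_write, the toy_ctx values) are invented for illustration and do not correspond to dm-crypt functions; the real code uses an atomic counter and runs in several threads.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_ctx { TOY_CTX_NONE, TOY_CTX_SUBMITTER, TOY_CTX_COMPLETION };

struct toy_io {
    int pending;           /* plain int: this model is single-threaded */
    enum toy_ctx freed_by; /* which context dropped the last reference */
};

static void toy_get(struct toy_io *io)
{
    io->pending++;
}

/* Drop one reference; whoever reaches zero is recorded as the freer. */
static bool toy_put(struct toy_io *io, enum toy_ctx who)
{
    if (--io->pending == 0) {
        io->freed_by = who;
        return true;
    }
    return false;
}

/*
 * Model one write: the submitter holds its own reference while issuing
 * nr_clones requests, each of which holds one reference until completion.
 * complete_early selects whether every completion runs before the
 * submitter drops its reference, or only afterwards.
 */
static enum toy_ctx toy_write(int nr_clones, bool complete_early)
{
    struct toy_io io = { .pending = 0, .freed_by = TOY_CTX_NONE };
    int i;

    toy_get(&io);                      /* submitter's extra reference */
    for (i = 0; i < nr_clones; i++)
        toy_get(&io);                  /* one reference per clone */

    if (complete_early) {
        for (i = 0; i < nr_clones; i++)
            toy_put(&io, TOY_CTX_COMPLETION);
        toy_put(&io, TOY_CTX_SUBMITTER);
    } else {
        toy_put(&io, TOY_CTX_SUBMITTER);
        for (i = 0; i < nr_clones; i++)
            toy_put(&io, TOY_CTX_COMPLETION);
    }
    return io.freed_by;
}
```

When all completions finish first, the last reference is dropped in the submitter's own context, which in the thread corresponds to kcryptd releasing the bio from inside the work function.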
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 20, 2007 7:55 AM, Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> > Please could you try which patch from the dm-crypt series cause this ?
> > (agk-dm-dm-crypt* names.)
> >
> > I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> > there is one work struct used subsequently in two threads...
> > (io thread already started while crypt thread is processing lockdep_map
> > after calling f(work)...)
>
> After reverting only
> agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
> seen the 'held lock freed' message again.
>
> If it happens again with this revert, I will post that output.

It happened again, here I post the output:

Nov 23 10:56:17 treogen [ 58.364441] XFS mounting filesystem dm-0
Nov 23 10:56:17 treogen [ 58.519648] Ending clean XFS mount for filesystem: dm-0
Nov 23 10:56:17 treogen [ 58.858098]
Nov 23 10:56:17 treogen [ 58.858104] =
Nov 23 10:56:17 treogen [ 58.863316] [ BUG: held lock freed! ]
Nov 23 10:56:17 treogen [ 58.866998] -
Nov 23 10:56:17 treogen [ 58.870685] kcryptd/1022 is freeing memory 81011EAD4B00-81011EAD4B3F, with a lock still held there!
Nov 23 10:56:17 treogen [ 58.880430] (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [ 58.888014] 2 locks held by kcryptd/1022:
Nov 23 10:56:17 treogen [ 58.892045] #0: (kcryptd){--..}, at: [] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [ 58.900095] #1: (&io->work#2){--..}, at: [] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [ 58.908535]
Nov 23 10:56:17 treogen [ 58.908535] stack backtrace:
Nov 23 10:56:17 treogen [ 58.912954]
Nov 23 10:56:17 treogen [ 58.912955] Call Trace:
Nov 23 10:56:17 treogen [ 58.916944] [] debug_check_no_locks_freed+0x190/0x1b0
Nov 23 10:56:17 treogen [ 58.924313] [] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [ 58.930073] [] kmem_cache_free+0x79/0xe0
Nov 23 10:56:17 treogen [ 58.935665] [] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [ 58.941424] [] mempool_free+0x8a/0xa0
Nov 23 10:56:17 treogen [ 58.946755] [] bio_free+0x2f/0x50
Nov 23 10:56:17 treogen [ 58.951736] [] dm_bio_destructor+0xd/0x10
Nov 23 10:56:17 treogen [ 58.957414] [] bio_put+0x26/0x30
Nov 23 10:56:17 treogen [ 58.962311] [] clone_endio+0x83/0xb0
Nov 23 10:56:17 treogen [ 58.967553] [] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [ 58.973224] [] bio_endio+0x19/0x40
Nov 23 10:56:17 treogen [ 58.978291] [] crypt_dec_pending+0x32/0x50
Nov 23 10:56:17 treogen [ 58.984050] [] kcryptd_do_crypt+0x64/0x290
Nov 23 10:56:17 treogen [ 58.989810] [] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [ 58.995483] [] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [ 59.001158] [] run_workqueue+0x175/0x210
Nov 23 10:56:17 treogen [ 59.006746] [] worker_thread+0x71/0xb0
Nov 23 10:56:17 treogen [ 59.012158] [] autoremove_wake_function+0x0/0x40
Nov 23 10:56:17 treogen [ 59.018435] [] worker_thread+0x0/0xb0
Nov 23 10:56:17 treogen [ 59.023764] [] kthread+0x4d/0x80
Nov 23 10:56:17 treogen [ 59.028660] [] child_rip+0xa/0x12
Nov 23 10:56:17 treogen [ 59.033640] [] restore_args+0x0/0x30
Nov 23 10:56:17 treogen [ 59.038880] [] kthread+0x0/0x80
Nov 23 10:56:17 treogen [ 59.043689] [] child_rip+0x0/0x12
Nov 23 10:56:17 treogen [ 59.048670]
Nov 23 10:56:17 treogen [ 59.050190] INFO: lockdep is turned off.
Nov 23 10:56:17 treogen [ 59.919020] pata_amd :00:04.0: version 0.3.10

From what I see the only difference between the other stack traces and
this one is the following part:

old traces with agk-dm-dm-crypt-move-bio-submission-to-thread.patch applied:
[ 64.584415] [] bio_free+0x2f/0x50
[ 64.586337] [] bio_fs_destructor+0x10/0x20
[ 64.588558] [] bio_put+0x26/0x30
[ 64.590446] [] xfs_buf_bio_end_io+0x99/0x120
[ 64.592734] [] bio_endio+0x19/0x40
[ 64.594687] [] dec_pending+0x107/0x210
[ 64.596775] [] clone_endio+0x70/0xb0

new trace with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted:
[ 58.946755] [] bio_free+0x2f/0x50
[ 58.951736] [] dm_bio_destructor+0xd/0x10
[ 58.957414] [] bio_put+0x26/0x30
[ 58.962311] [] clone_endio+0x83/0xb0

(gdb) list *0x804e3ae0
0x804e3ae0 is in clone_endio (drivers/md/dm.c:539).
534             dec_pending(tio->io, error);
535
536     /*
537      * Store md for cleanup instead of tio which is about to get freed.
538      */
539     bio->bi_private = md->bs;
540
541     bio_put(bio);
542     free_tio(md, tio);
543 }
(gdb) list *0x804e3af3
0x804e3af3 is in clone_endio (drivers/md/dm.c:542).
537      * Store md for cleanup instead of tio which is about to get freed.
538      */
539     bio->bi_private = md->bs;
540
541     bio_put(bio);
542
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Also io-pending may need better protection - atomic, but missing memory barriers? (May be getting away without sometimes due to side-effects of other function calls, but needs doing properly.) [BTW Other device-mapper atomic_t usage also needs reviewing.] Alasdair -- [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote: ... or I just don't see the bug. See my earlier post in this thread: there's a race in the write loop where a work struct could be used twice on the same queue. (Needs data structure change to fix that, which nobody has attempted to do yet.) BTW To eliminate any internal lockdep concerns (and people say there should be no problem) temporarily add a second struct instead of reusing one on two queues. Alasdair -- [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> Before the cleanup *all* calls to crypt_dec_pending() was via crypt_endio().
> Now there is an additional call to crypt_dec_pending() to balance the additional ref placed into crypt_write_io_process().
> And that one is not called from whatever context/thread cleans up after make_generic_request, but directly in the context/thread of the caller of crypt_write_io_process(), and that is kcryptd.

Please do look at the latest patches (always at http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/series.html ) where you'll see I've already disentangled the mess of functions and given them more understandable names, so at least following the program flow is easier.

Read and write do the ref counting differently (but correctly AFAICT) - I want that changing, but held back from doing it without first checking whether the later patches (not yet reviewed) provide a reason to prefer one method over the other.

Alasdair
--
[EMAIL PROTECTED]
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 24, 2007 4:49 AM, Alasdair G Kergon [EMAIL PROTECTED] wrote:
> On Fri, Nov 23, 2007 at 11:42:36PM +0100, Torsten Kaiser wrote:
> > ... or I just don't see the bug.
>
> See my earlier post in this thread: there's a race in the write loop where a work struct could be used twice on the same queue. (Needs data structure change to fix that, which nobody has attempted to do yet.)

As I wrote in an earlier post: I did see this lockdep message even with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted, so the work struct is not used in the write loop.

> BTW To eliminate any internal lockdep concerns (and people say there should be no problem) temporarily add a second struct instead of reusing one on two queues.

I think this might really be a lockdep bug, but as I'm not fluent enough in C, please check whether my logic is correct:

The freed-locked-lock test is the only function that uses this in lockdep.c:

    static inline int in_range(const void *start, const void *addr, const void *end)
    {
            return addr >= start && addr <= end;
    }

This will return true if addr is in the range from start (inclusive) to end (inclusive).
But debug_check_no_locks_freed() seems to do:

    const void *mem_to = mem_from + mem_len
-> mem_to is the last byte of the freed range, that fits in_range

    lock_from = (void *)hlock->instance;
-> first byte of the lock

    lock_to = (void *)(hlock->instance + 1);
-> first byte of the next lock, not last byte of the lock that is being checked!

(Or am I reading this wrong?)
The test is:

    if (!in_range(mem_from, lock_from, mem_to) && !in_range(mem_from, lock_to, mem_to))
            continue;

So it tests whether the first byte of the lock is in the range that is freed -> OK
And whether the first byte of the *next* lock is in the range that is freed -> Not OK.

That would also explain the rather strange output:

=========================
[ BUG: held lock freed! ]
-------------------------
kcryptd/1022 is freeing memory 81011EBEFB00-81011EBEFB3F, with a lock still held there!
 (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
2 locks held by kcryptd/1022:
 #0: (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
 #1: (io-work#2){--..}, at: [80247dd9] run_workqueue+0x129/0x210

That claims that the lock of the *workqueue* struct, not the work struct, is getting freed! But I'm still happily using the dm-crypt device, even 19 hours after that message.

So my current best guess as to the source of this message is that with the change in the ref counting it is now possible that the work struct is really getting freed before the workqueue function returns. But as the comment in run_workqueue() says, that is still legal. But now the first byte of the next lock is part of the freed memory, and so the wrong 'held lock freed' warning is triggered.

Torsten
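The range argument above can be tried out in user space. The following is only a sketch of the check as Torsten describes it, not the lockdep code itself: the 16-byte "lock", the 64-byte freed object, and the names overlaps_as_quoted(), overlaps_half_open() and adjacent_case() are made up for illustration, and the half-open comparison is just one possible alternative, not a proposed patch.

```c
#include <stddef.h>

/* Inclusive check as quoted in the message: true if start <= addr <= end. */
static int in_range_inclusive(const char *start, const char *addr, const char *end)
{
	return addr >= start && addr <= end;
}

/*
 * The test as described in the message: report an overlap if either the
 * first byte of the lock or the byte just past the lock lies inside
 * [mem_from, mem_from + mem_len], both ends included.
 */
static int overlaps_as_quoted(const char *mem_from, size_t mem_len,
			      const char *lock, size_t lock_size)
{
	const char *mem_to = mem_from + mem_len;
	const char *lock_from = lock;
	const char *lock_to = lock + lock_size;	/* one past the lock */

	return in_range_inclusive(mem_from, lock_from, mem_to) ||
	       in_range_inclusive(mem_from, lock_to, mem_to);
}

/* Hypothetical alternative: compare the half-open ranges [from, from + len). */
static int overlaps_half_open(const char *mem_from, size_t mem_len,
			      const char *lock, size_t lock_size)
{
	return lock < mem_from + mem_len && mem_from < lock + lock_size;
}

/* Lay out a 16-byte "lock" immediately followed by a 64-byte freed object. */
static int adjacent_case(int (*check)(const char *, size_t, const char *, size_t))
{
	static char arena[128];
	const char *lock = arena;		/* bytes 0..15  */
	const char *freed = arena + 16;		/* bytes 16..79 */

	return check(freed, 64, lock, 16);
}
```

With this layout the lock and the freed object share no byte, yet adjacent_case(overlaps_as_quoted) reports an overlap because lock_to equals mem_from, while adjacent_case(overlaps_half_open) does not.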
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 19, 2007 10:00 PM, Milan Broz [EMAIL PROTECTED] wrote:
> Torsten Kaiser wrote:
> > On Nov 19, 2007 8:56 AM, Ingo Molnar [EMAIL PROTECTED] wrote:
> > > * Torsten Kaiser [EMAIL PROTECTED] wrote:
> > ...
> > Above this acquire/release sequence is the following comment:
> >
> > #ifdef CONFIG_LOCKDEP
> > /*
> >  * It is permissible to free the struct work_struct
> >  * from inside the function that is called from it,
> >  * this we need to take into account for lockdep too.
> >  * To avoid bogus "held lock freed" warnings as well
> >  * as problems when looking into work->lockdep_map,
> >  * make a copy and use that here.
> >  */
> > struct lockdep_map lockdep_map = work->lockdep_map;
> > #endif
> >
> > Did something trigger this anyway?
> >
> > Anything I could try, apart from more boots with slub_debug=F?
>
> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map after calling f(work)...)
>
> (btw these patches prepare dm-crypt for next patchset introducing async cryptoapi, so there should be no functional changes yet.)

I looked at all of these agk-*-patches, as the error is not bisectable, because it triggers unreliably. The one that looks suspicious is agk-dm-dm-crypt-tidy-io-ref-counting.patch

This one does a functional change, as there now is an additional ref on io->pending. Instead of only increasing io->pending if there really is more than one clone-bio, it will now take an additional ref in crypt_write_io_process().

I certainly agree with the cleanup, but this introduces the following change:
Before the cleanup *all* calls to crypt_dec_pending() were via crypt_endio(). Now there is an additional call to crypt_dec_pending() to balance the additional ref placed into crypt_write_io_process(). And that one is not called from whatever context/thread cleans up after make_generic_request, but directly in the context/thread of the caller of crypt_write_io_process(), and that is kcryptd.

So now it is possible (if all requests finish before crypt_write_io_process() returns) that kcryptd itself will release the bio, but the workqueue infrastructure still seems to have a lock on that.

But as the comment in run_workqueue says, this should be legal, and I can't figure out what would make the lockdep copy mechanism fail. Especially if the trigger was really a WRITE request, as with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted this should never use the kcrypt_io-workqueue and so there should not even be the problem with using INIT_WORK twice on the same work_struct.

... or I just don't see the bug.

Torsten
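The effect of the extra reference described above can be modelled in a few lines of user-space C. This is a sketch under assumptions, not dm-crypt code: struct io, io_get(), io_put(), enum ctx and simulate_write() are hypothetical names, the "contexts" are plain labels rather than threads, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <stdatomic.h>

/* Which (simulated) context dropped the final reference. */
enum ctx { CTX_NONE, CTX_COMPLETION, CTX_SUBMITTER };

struct io {
	atomic_int pending;	/* outstanding references */
	enum ctx released_by;	/* set by whoever takes the count to zero */
};

static void io_get(struct io *io)
{
	atomic_fetch_add(&io->pending, 1);
}

/* Drop one reference; the caller that reaches zero "releases" the io. */
static void io_put(struct io *io, enum ctx who)
{
	if (atomic_fetch_sub(&io->pending, 1) == 1)
		io->released_by = who;
}

/*
 * The submitter holds one extra reference while it issues nclones requests.
 * complete_early != 0 models all clones finishing before the submitter
 * drops its own reference.
 */
static enum ctx simulate_write(int nclones, int complete_early)
{
	struct io io;
	int i;

	atomic_init(&io.pending, 0);
	io.released_by = CTX_NONE;

	io_get(&io);				/* submitter's extra reference */
	for (i = 0; i < nclones; i++)
		io_get(&io);			/* one reference per clone */

	if (complete_early)
		for (i = 0; i < nclones; i++)
			io_put(&io, CTX_COMPLETION);

	io_put(&io, CTX_SUBMITTER);		/* balances the extra reference */

	if (!complete_early)
		for (i = 0; i < nclones; i++)
			io_put(&io, CTX_COMPLETION);

	return io.released_by;
}
```

When the clones complete early, the last put happens in the submitter's context, which corresponds to the case where kcryptd itself ends up releasing the bio.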
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 20, 2007 7:55 AM, Torsten Kaiser [EMAIL PROTECTED] wrote:
> On Nov 19, 2007 10:00 PM, Milan Broz [EMAIL PROTECTED] wrote:
> > Please could you try which patch from the dm-crypt series cause this ?
> > (agk-dm-dm-crypt* names.)
> >
> > I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because there is one work struct used subsequently in two threads...
> > (io thread already started while crypt thread is processing lockdep_map after calling f(work)...)
>
> After reverting only agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not seen the 'held lock freed' message again.
>
> If it happens again with this revert, I will post that output.

It happened again, here I post the output:
Nov 23 10:56:17 treogen [ 58.364441] XFS mounting filesystem dm-0
Nov 23 10:56:17 treogen [ 58.519648] Ending clean XFS mount for filesystem: dm-0
Nov 23 10:56:17 treogen [ 58.858098]
Nov 23 10:56:17 treogen [ 58.858104] =========================
Nov 23 10:56:17 treogen [ 58.863316] [ BUG: held lock freed! ]
Nov 23 10:56:17 treogen [ 58.866998] -------------------------
Nov 23 10:56:17 treogen [ 58.870685] kcryptd/1022 is freeing memory 81011EAD4B00-81011EAD4B3F, with a lock still held there!
Nov 23 10:56:17 treogen [ 58.880430] (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [ 58.888014] 2 locks held by kcryptd/1022:
Nov 23 10:56:17 treogen [ 58.892045] #0: (kcryptd){--..}, at: [80247dd9] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [ 58.900095] #1: (io-work#2){--..}, at: [80247dd9] run_workqueue+0x129/0x210
Nov 23 10:56:17 treogen [ 58.908535]
Nov 23 10:56:17 treogen [ 58.908535] stack backtrace:
Nov 23 10:56:17 treogen [ 58.912954]
Nov 23 10:56:17 treogen [ 58.912955] Call Trace:
Nov 23 10:56:17 treogen [ 58.916944] [8025a5f0] debug_check_no_locks_freed+0x190/0x1b0
Nov 23 10:56:17 treogen [ 58.924313] [8026f192] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [ 58.930073] [80296bb9] kmem_cache_free+0x79/0xe0
Nov 23 10:56:17 treogen [ 58.935665] [8026f192] mempool_free_slab+0x12/0x20
Nov 23 10:56:17 treogen [ 58.941424] [8026f22a] mempool_free+0x8a/0xa0
Nov 23 10:56:17 treogen [ 58.946755] [802c76af] bio_free+0x2f/0x50
Nov 23 10:56:17 treogen [ 58.951736] [804e36fd] dm_bio_destructor+0xd/0x10
Nov 23 10:56:17 treogen [ 58.957414] [802c7436] bio_put+0x26/0x30
Nov 23 10:56:17 treogen [ 58.962311] [804e3af3] clone_endio+0x83/0xb0
Nov 23 10:56:17 treogen [ 58.967553] [804eb860] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [ 58.973224] [802c72a9] bio_endio+0x19/0x40
Nov 23 10:56:17 treogen [ 58.978291] [804eb372] crypt_dec_pending+0x32/0x50
Nov 23 10:56:17 treogen [ 58.984050] [804eb8c4] kcryptd_do_crypt+0x64/0x290
Nov 23 10:56:17 treogen [ 58.989810] [804eb860] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [ 58.995483] [804eb860] kcryptd_do_crypt+0x0/0x290
Nov 23 10:56:17 treogen [ 59.001158] [80247e25] run_workqueue+0x175/0x210
Nov 23 10:56:17 treogen [ 59.006746] [80248af1] worker_thread+0x71/0xb0
Nov 23 10:56:17 treogen [ 59.012158] [8024c830] autoremove_wake_function+0x0/0x40
Nov 23 10:56:17 treogen [ 59.018435] [80248a80] worker_thread+0x0/0xb0
Nov 23 10:56:17 treogen [ 59.023764] [8024c43d] kthread+0x4d/0x80
Nov 23 10:56:17 treogen [ 59.028660] [8020cbc8] child_rip+0xa/0x12
Nov 23 10:56:17 treogen [ 59.033640] [8020c2df] restore_args+0x0/0x30
Nov 23 10:56:17 treogen [ 59.038880] [8024c3f0] kthread+0x0/0x80
Nov 23 10:56:17 treogen [ 59.043689] [8020cbbe] child_rip+0x0/0x12
Nov 23 10:56:17 treogen [ 59.048670]
Nov 23 10:56:17 treogen [ 59.050190] INFO: lockdep is turned off.
Nov 23 10:56:17 treogen [ 59.919020] pata_amd :00:04.0: version 0.3.10

From what I see the only difference between the other stack traces and this one is the following part:

old traces with agk-dm-dm-crypt-move-bio-submission-to-thread.patch applied:
[ 64.584415] [802c76af] bio_free+0x2f/0x50
[ 64.586337] [802c76e0] bio_fs_destructor+0x10/0x20
[ 64.588558] [802c7436] bio_put+0x26/0x30
[ 64.590446] [803834d9] xfs_buf_bio_end_io+0x99/0x120
[ 64.592734] [802c72a9] bio_endio+0x19/0x40
[ 64.594687] [804e3827] dec_pending+0x107/0x210
[ 64.596775] [804e3ae0] clone_endio+0x70/0xb0

new trace with agk-dm-dm-crypt-move-bio-submission-to-thread.patch reverted:
[ 58.946755] [802c76af] bio_free+0x2f/0x50
[ 58.951736] [804e36fd] dm_bio_destructor+0xd/0x10
[ 58.957414] [802c7436] bio_put+0x26/0x30
[ 58.962311] [804e3af3] clone_endio+0x83/0xb0

(gdb) list *0x804e3ae0
0x804e3ae0 is in clone_endio (drivers/md/dm.c:539).
534
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Alasdair G Kergon [EMAIL PROTECTED] wrote:
> Also io->pending may need better protection - atomic, but missing memory barriers? (May be getting away without sometimes due to side-effects of other function calls, but needs doing properly.)

If it's using atomic_dec_and_test then that comes with an implicit memory barrier.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
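As a rough user-space analogue of that point: in C11 a decrement-and-test can carry its own ordering, so no separate barrier is needed around it. The sketch below is an assumption-laden illustration, not the kernel primitive; struct obj, obj_put() and put_demo() are hypothetical names, and memory_order_acq_rel is weaker than the full ordering the kernel documents for atomic_dec_and_test(), though it is the usual choice for a reference count.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int refs;	/* outstanding references */
	int result;		/* written by a holder before it drops its ref */
};

/*
 * Drop one reference and report whether it was the last one.
 * The release half publishes this caller's earlier writes to the object;
 * the acquire half lets the caller that sees the count hit zero observe
 * the writes made by every other holder before their puts.
 */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub_explicit(&o->refs, 1,
					 memory_order_acq_rel) == 1;
}

/* Single-threaded walk-through: two holders, the second one reads result. */
static int put_demo(void)
{
	struct obj o;

	atomic_init(&o.refs, 2);
	o.result = 42;			/* first holder's write */
	if (obj_put(&o))		/* not the last reference */
		return -1;
	if (!obj_put(&o))		/* last reference */
		return -1;
	return o.result;		/* safe to read, and to free, here */
}
```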
Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]
On Wednesday, 21 of November 2007, Alan Stern wrote:
> On Wed, 21 Nov 2007, Rafael J. Wysocki wrote:
>
> > > Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?
> >
> > No. In that case the suspend core is holding the device's mutex and your attempt to unregister it will deadlock with it.
> >
> > Do you _have_ _to_ unregister the device at all? Why don't you just leave it registered on CPU_DOWN_PREPARE_FROZEN? The CPU is not going away physically in this case and it's _guaranteed_ that _cpu_up() will be called on it as soon as the hibernation image is ready or we are back from suspend.
>
> This leaves the device registered if for some reason the number of CPUs after resuming from hibernation is smaller than the number of CPUs before hibernation. Of course, in theory that's never supposed to happen...

Yes, that clearly would be a bug.

Rafael
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Hi,

> > Ok, then I have question: Is the following pseudocode correct
> > (and problem is in lock validation which checks something
> > already initialized for another queue) or reusing work_struct
> > is not permitted from inside called work function ?
> >
> > (Note comment in code "It is permissible to free the struct
> > work_struct from inside the function that is called from it".)
> >
> > struct work_struct work;
> > struct workqueue_struct *a, *b;
> >
> > do_b(*work)
> > {
> > 	/* do something else */
> > }
> >
> > do_a(*work)
> > {
> > 	/* do something */
> > 	INIT_WORK(&work, do_b);
> > 	queue_work(b, &work);
> > }
> >
> >
> > INIT_WORK(&work, do_a);
> > queue_work(a, &work);
>
> (just in case, in that particular case PREPARE_WORK() should be used)
>
> INIT_WORK(w) can be used if we know that "w" is not pending, and nobody
> else can write to this work (say, queue_work(w) or cancel_work_sync(w)).
> So currently the code above should work correctly.
>
> However, I'd say it is not correct, INIT_WORK() can throw out some debug
> info for example, or the implementation could be changed.
>
> I'm not sure about CONFIG_LOCKDEP (Johannes cc'ed). INIT_WORK() does
> lockdep_init_map(->lockdep_map) but run_workqueue() has a local copy,
> looks ok.

We explicitly need to use a copy of the lockdep_map for "locking" the work struct as per the quoted comment. So as far as I can tell, what INIT_WORK() is doing here is changing an at that point unused copy of the lockdep map so I think it should be fine. Not sure about the other fine points nor why you'd want this though :)

johannes
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Alasdair G Kergon wrote:
>
> - But what happens if kcryptd_crypt_write_convert_loop() calls
> INIT_WORK/queue_work twice?

Can't find this function. But "INIT_WORK + queue_work" twice is very wrong of course.

Milan Broz wrote:
>
> Ok, then I have question: Is the following pseudocode correct
> (and problem is in lock validation which checks something
> already initialized for another queue) or reusing work_struct
> is not permitted from inside called work function ?
>
> (Note comment in code "It is permissible to free the struct
> work_struct from inside the function that is called from it".)
>
> struct work_struct work;
> struct workqueue_struct *a, *b;
>
> do_b(*work)
> {
> 	/* do something else */
> }
>
> do_a(*work)
> {
> 	/* do something */
> 	INIT_WORK(&work, do_b);
> 	queue_work(b, &work);
> }
>
>
> INIT_WORK(&work, do_a);
> queue_work(a, &work);

(just in case, in that particular case PREPARE_WORK() should be used)

INIT_WORK(w) can be used if we know that "w" is not pending, and nobody else can write to this work (say, queue_work(w) or cancel_work_sync(w)). So currently the code above should work correctly.

However, I'd say it is not correct, INIT_WORK() can throw out some debug info for example, or the implementation could be changed.

I'm not sure about CONFIG_LOCKDEP (Johannes cc'ed). INIT_WORK() does lockdep_init_map(->lockdep_map) but run_workqueue() has a local copy, looks ok.

Oleg.
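The "local copy" idea discussed here can be shown with a small user-space model. It is only a sketch under assumptions: struct work, run_one(), make_work() and free_self() are hypothetical names, and the name field merely stands in for the debugging data (lockdep_map) that run_workqueue() is said to copy before calling the work function.

```c
#include <stdlib.h>
#include <string.h>

struct work {
	void (*fn)(struct work *);
	char name[16];		/* stand-in for per-work debugging data */
};

/* A work function that frees its own work item, as the quoted comment permits. */
static void free_self(struct work *w)
{
	free(w);
}

/*
 * Run one item. The name is copied before fn is called, because w may have
 * been freed or re-initialised by the time fn returns. The copy, not w, is
 * what gets used for bookkeeping afterwards and is written to out.
 */
static void run_one(struct work *w, char *out, size_t outlen)
{
	char name_copy[sizeof(w->name)];

	memcpy(name_copy, w->name, sizeof(name_copy));
	w->fn(w);
	/* w must not be dereferenced from here on */
	strncpy(out, name_copy, outlen - 1);
	out[outlen - 1] = '\0';
}

/* Allocate a zeroed work item with a NUL-terminated name. */
static struct work *make_work(const char *name, void (*fn)(struct work *))
{
	struct work *w = calloc(1, sizeof(*w));

	if (!w)
		return NULL;
	w->fn = fn;
	strncpy(w->name, name, sizeof(w->name) - 1);
	return w;
}
```

Reading w->name after the call instead of name_copy would be a use-after-free in this model, which is the situation the copy is meant to avoid.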
Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]
On Wed, 21 Nov 2007, Rafael J. Wysocki wrote:

> > Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?
>
> No. In that case the suspend core is holding the device's mutex and your attempt to unregister it will deadlock with it.
>
> Do you _have_ _to_ unregister the device at all? Why don't you just leave it registered on CPU_DOWN_PREPARE_FROZEN? The CPU is not going away physically in this case and it's _guaranteed_ that _cpu_up() will be called on it as soon as the hibernation image is ready or we are back from suspend.

This leaves the device registered if for some reason the number of CPUs after resuming from hibernation is smaller than the number of CPUs before hibernation. Of course, in theory that's never supposed to happen...

Alan Stern
Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
On Nov 20, 2007 5:59 PM, Dave Young <[EMAIL PROTECTED]> wrote:
>
> On Nov 20, 2007 5:56 PM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young <[EMAIL PROTECTED]> wrote:
> >
> > > Hi,
> > > I encountered kernel warningsr. I just executed xawtv without video dev
> > > being found.
> > >
> > > like this:
> > >
> > > WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
> > > [] native_smp_call_function_mask+0x149/0x150
> > > [] alloc_debug_processing+0xa9/0x130
> > > [] smp_callback+0x0/0x10
> > > [] smp_call_function+0x1c/0x20
> > > [] cpuidle_latency_notify+0x18/0x20
> > > [] notifier_call_chain+0x3e/0x70
> > > [] __blocking_notifier_call_chain+0x44/0x70
> > > [] blocking_notifier_call_chain+0x17/0x20
> > > [] pm_qos_add_requirement+0x8d/0xd0
> > > [] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
> > > [] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
> > > [] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
> > > [] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
> > > [] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
> > > [] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
> > > [] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
> > > [] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
> > > [] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
> > > [] __fput+0x16e/0x200
> > > [] filp_close+0x3c/0x80
> > > [] sys_close+0x69/0xd0
> > > [] syscall_call+0x7/0xb
> > > [] xfrm_notify_sa+0x110/0x290
> > > ===
> > >
> >
> > That was hopefully fixed. You might care to test
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
> > to confirm that, if feeling sufficiently brave..
> >
>
Hi,
I can confirm that I can't reproduce this after applying the broken-out-2007-11-20-01-45 patch set.

Regards
dave
Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]
On Tuesday, 20 of November 2007, Mark M. Hoffman wrote:
> Hi all:
>
> * Alan Stern <[EMAIL PROTECTED]> [2007-11-19 15:27:14 -0500]:
> > On Mon, 19 Nov 2007, Rudolf Marek wrote:
> >
> > > Hello all,
> > > >>> gives coretemp_cpu_callback -> coretemp_device_remove ->
> > > >>> platform_device_unregister, so coretemp seems to be what I have and
> > > >>> you don't.
> > > >
> > > > Yes.
> > > >
> > > > For the coretemp developers: coretemp_cpu_callback() needs to be more careful about what it does. During a system sleep transition (suspend, hibernate, resume) it isn't possible to register or unregister a device. Attempts to register will fail and attempts to unregister will block until the system sleep is over -- and for this callback that means hanging.
> > >
> > > Well I wrote the driver. Thanks for the clarification. If I recall correctly I looked how this part should be done from others drivers. Now while checking what happened to the file, seems Rafael added something related.
> > >
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d
> >
> > That does look like it was meant for exactly this sort of situation.
> >
> > > > It's not clear what the best way is to fix this. Perhaps the CPU notification should be sent along with a special flag indicating that the CPU transition is part of a system sleep (although this seems racy). Perhaps the driver should notice when a system sleep begins, and defer all CPU-change handling until after the sleep is over.
> > >
> > > maybe it does exist? CPU_DOWN_PREPARE ?
> > >
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD
> > >
> > > Unfortunately I'm not very familiar with this, calling the coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at microcode driver, seems it just hide sysfs interface from user.
>
> AFAICT from that documentation, it would have been better to unregister the device on CPU_DOWN_PREPARE anyway. CPU_DEAD seems to be too late - it's already gone by then.
>
> > I'm not sure exactly what you want to do here. But it seems like a waste to unregister the coretemp devices at the start of a system sleep and then register them back at the end.
> >
> > Could you simply leave the devices registered throughout the entire sleep? Of course, at the end you would have to check that all the CPUs really did come back up, and unregister the devices for the CPUs that are still offline.
>
> Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?

No. In that case the suspend core is holding the device's mutex and your attempt to unregister it will deadlock with it.

Do you _have_ _to_ unregister the device at all? Why don't you just leave it registered on CPU_DOWN_PREPARE_FROZEN? The CPU is not going away physically in this case and it's _guaranteed_ that _cpu_up() will be called on it as soon as the hibernation image is ready or we are back from suspend.

> If so, then the simplest fix would be the patch below (Jiri: feel free to try it). Otherwise it would take a bit of refactoring to bring the sysfs interface down/up for suspend/resume.
>
> commit ce9c7b78c839a6304696d90083eac08baad524ce
> Author: Mark M. Hoffman <[EMAIL PROTECTED]>
> Date: Tue Nov 20 07:51:50 2007 -0500
>
>     hwmon: (coretemp) fix suspend/resume hang
>
>     Signed-off-by: Mark M. Hoffman <[EMAIL PROTECTED]>

I'd do it like this:

> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 5c82ec7..afe2d31 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
>  	switch (action) {
>  	case CPU_ONLINE:
>  	case CPU_ONLINE_FROZEN:
> +	case CPU_DOWN_FAILED:
>  		coretemp_device_add(cpu);
+	case CPU_DOWN_FAILED_FROZEN:
>  		break;
> -	case CPU_DEAD:
> -	case CPU_DEAD_FROZEN:
> +	case CPU_DOWN_PREPARE:
>  		coretemp_device_remove(cpu);
+	case CPU_DOWN_PREPARE_FROZEN:
>  		break;
>  	}

Greetings,
Rafael

--
"Premature optimization is the root of all evil." - Donald Knuth
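Written out as a plain switch, the variant suggested above might look roughly like the following user-space model. This is a sketch, not the coretemp driver: the enum values are local stand-ins for the kernel's CPU hotplug action codes, and struct sensor_state, sensor_add(), sensor_remove() and cpu_callback() are hypothetical names. The point it illustrates is that the _FROZEN variants of DOWN_PREPARE and DOWN_FAILED fall into a bare break, so nothing is registered or unregistered while a system sleep transition is in progress.

```c
/* Local stand-ins for the kernel's CPU hotplug action codes. */
enum cpu_action {
	CPU_ONLINE,
	CPU_ONLINE_FROZEN,
	CPU_DOWN_FAILED,
	CPU_DOWN_FAILED_FROZEN,
	CPU_DOWN_PREPARE,
	CPU_DOWN_PREPARE_FROZEN,
};

/* Hypothetical per-CPU sensor device state. */
struct sensor_state {
	int registered;
};

static void sensor_add(struct sensor_state *s)
{
	s->registered = 1;
}

static void sensor_remove(struct sensor_state *s)
{
	s->registered = 0;
}

static void cpu_callback(struct sensor_state *s, enum cpu_action action)
{
	switch (action) {
	case CPU_ONLINE:
	case CPU_ONLINE_FROZEN:
	case CPU_DOWN_FAILED:
		sensor_add(s);
		break;
	case CPU_DOWN_PREPARE:
		sensor_remove(s);
		break;
	case CPU_DOWN_FAILED_FROZEN:
	case CPU_DOWN_PREPARE_FROZEN:
		/* System sleep in progress: leave the device as it is. */
		break;
	}
}
```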
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Tue, Nov 20, 2007 at 03:40:30PM +0100, Milan Broz wrote:
> (Note comment in code "It is permissible to free the struct
> work_struct from inside the function that is called from it".)

I don't understand yet how lockdep behaves if the work struct gets reused and the reused one finishes first.

I renamed the kcryptd functions today in an attempt to disentangle this code a bit more.

- io->pending reference counting looks correct (though used inconsistently when comparing READ with WRITE)

- But what happens if kcryptd_crypt_write_convert_loop() calls INIT_WORK/queue_work twice?

Alasdair
--
[EMAIL PROTECTED]
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Torsten Kaiser wrote:
> On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
>> Torsten Kaiser wrote:
>>> Anything I could try, apart from more boots with slub_debug=F?
>
> One time it triggered with slub_debug=F, but no additional output.
> With slub_debug=FP I have not seen it again, so I can't say if that
> would yield more info.
>
>> Please could you try which patch from the dm-crypt series cause this ?
>> (agk-dm-dm-crypt* names.)
>>
>> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
>> there is one work struct used subsequently in two threads...
>> (io thread already started while crypt thread is processing lockdep_map
>> after calling f(work)...)
>
> After reverting only
> agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
> seen the 'held lock freed' message again.

Ok, then I have question: Is the following pseudocode correct (and problem is in lock validation which checks something already initialized for another queue) or reusing work_struct is not permitted from inside called work function ?

(Note comment in code "It is permissible to free the struct work_struct from inside the function that is called from it".)

struct work_struct work;
struct workqueue_struct *a, *b;

do_b(*work)
{
	/* do something else */
}

do_a(*work)
{
	/* do something */
	INIT_WORK(&work, do_b);
	queue_work(b, &work);
}


INIT_WORK(&work, do_a);
queue_work(a, &work);

Milan
--
[EMAIL PROTECTED]
Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]
Hi all: * Alan Stern <[EMAIL PROTECTED]> [2007-11-19 15:27:14 -0500]: > On Mon, 19 Nov 2007, Rudolf Marek wrote: > > > Hello all, > > >>> gives coretemp_cpu_callback -> coretemp_device_remove -> > > >>> platform_device_unregister, so coretemp seems to be what I have and you > > >>> don't. > > > > > > Yes. > > > > > > For the coretemp developers: coretemp_cpu_callback() needs to be more > > > careful about what it does. During a system sleep transition (suspend, > > > hibernate, resume) it isn't possible to register or unregister a > > > device. Attempts to register will fail and attempts to unregister will > > > block until the system sleep is over -- and for this callback that > > > means hanging. > > > > Well I wrote the driver. Thanks for the clarification. If I recall > > correctly I > > looked how this part should be done from others drivers. Now while checking > > what happened to the file, seems Rafael added something related. > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d > > That does look like it was meant for exactly this sort of situation. > > > > It's not clear what the best way is to fix this. Perhaps the CPU > > > notification should be sent along with a special flag indicating that > > > the CPU transition is part of a system sleep (although this seems > > > racy). Perhaps the driver should notice when a system sleep begins, > > > and defer all CPU-change handling until after the sleep is over. > > > > maybe it does exist? CPU_DOWN_PREPARE ? > > > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD > > > > Unfortunately I'm not very familiar with this, calling the > > coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at > > microcode > > driver, seems it just hide sysfs interface from user. 
AFAICT from that documentation, it would have been better to unregister the device on CPU_DOWN_PREPARE anyway. CPU_DEAD seems to be too late - it's already gone by then.

> I'm not sure exactly what you want to do here.  But it seems like a
> waste to unregister the coretemp devices at the start of a system sleep
> and then register them back at the end.
>
> Could you simply leave the devices registered throughout the entire
> sleep?  Of course, at the end you would have to check that all the CPUs
> really did come back up, and unregister the devices for the CPUs that
> are still offline.

Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN? If so, then the simplest fix would be the patch below (Jiri: feel free to try it). Otherwise it would take a bit of refactoring to bring the sysfs interface down/up for suspend/resume.

commit ce9c7b78c839a6304696d90083eac08baad524ce
Author: Mark M. Hoffman <[EMAIL PROTECTED]>
Date:   Tue Nov 20 07:51:50 2007 -0500

    hwmon: (coretemp) fix suspend/resume hang

    Signed-off-by: Mark M. Hoffman <[EMAIL PROTECTED]>

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 5c82ec7..afe2d31 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
 		coretemp_device_add(cpu);
 		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
 		coretemp_device_remove(cpu);
 		break;
 	}

--
Mark M. Hoffman
[EMAIL PROTECTED]
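The dispatch that the proposed patch ends up with can be modelled in userspace to check the intended behaviour. This is a sketch under stated assumptions, not the driver: the `SIM_CPU_*` enum values only mirror the names of the kernel notifier actions, and `sim_device_add`, `sim_device_remove`, `sim_is_registered` and `sim_cpu_callback` are hypothetical helpers that track registration in a bitmask instead of touching any platform device.

```c
#include <assert.h>

/* Stand-ins that mirror the notifier action names; values are arbitrary. */
enum sim_cpu_action {
    SIM_CPU_ONLINE,
    SIM_CPU_ONLINE_FROZEN,
    SIM_CPU_DOWN_PREPARE,
    SIM_CPU_DOWN_PREPARE_FROZEN,
    SIM_CPU_DOWN_FAILED,
    SIM_CPU_DOWN_FAILED_FROZEN,
    SIM_CPU_DEAD,
    SIM_CPU_DEAD_FROZEN
};

#define SIM_MAX_CPUS 32

/* One bit per CPU: set while a (simulated) device is registered for it. */
static unsigned int sim_registered;

static void sim_device_add(unsigned int cpu)
{
    assert(cpu < SIM_MAX_CPUS);
    sim_registered |= 1u << cpu;
}

static void sim_device_remove(unsigned int cpu)
{
    assert(cpu < SIM_MAX_CPUS);
    sim_registered &= ~(1u << cpu);
}

static int sim_is_registered(unsigned int cpu)
{
    assert(cpu < SIM_MAX_CPUS);
    return (sim_registered >> cpu) & 1u;
}

/* Same case grouping as the patched switch statement in the diff. */
static void sim_cpu_callback(enum sim_cpu_action action, unsigned int cpu)
{
    switch (action) {
    case SIM_CPU_ONLINE:
    case SIM_CPU_ONLINE_FROZEN:
    case SIM_CPU_DOWN_FAILED:        /* offline attempt aborted: restore */
    case SIM_CPU_DOWN_FAILED_FROZEN:
        sim_device_add(cpu);
        break;
    case SIM_CPU_DOWN_PREPARE:       /* remove before the CPU goes away */
    case SIM_CPU_DOWN_PREPARE_FROZEN:
        sim_device_remove(cpu);
        break;
    default:                         /* DEAD and DEAD_FROZEN are now ignored */
        break;
    }
}
```

The DOWN_FAILED cases matter because removal now happens before the CPU is actually taken down. If the offline attempt is aborted, the device has to be added back, otherwise a CPU that never went away would lose its sensor device. Whether unregistering is actually permitted at CPU_DOWN_PREPARE_FROZEN is the open question in the message above, and the sketch does not answer it.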
Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails
Andrew Morton wrote: > On Tue, 20 Nov 2007 14:22:43 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote: > >> Andrew Morton wrote: >>> On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> >>> wrote: >>> >>>> Hi Andrew, >>>> >>>> Following calltrace is seen server, while running filesystem stress on smb >>>> mounted partition on the client machine. >>>> >>>> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] >>>> lib/util_sock.c:write_data(562) >>>> Nov 19 18:45:52 p55lp6 smbd[3304]: write_data: write failure in writing >>>> to client 9.124.111.212. Error Broken pipe >>>> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] >>>> lib/util_sock.c:send_smb(769) >>>> Nov 19 18:45:52 p55lp6 smbd[3304]: Error writing 39 bytes to client. -1. >>>> (Broken pipe) >>>> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] >>>> smbd/oplock.c:oplock_timeout_handler(351) >>>> Nov 19 18:47:42 p55lp6 smbd[3650]: Oplock break failed for file >>>> p0/d3X >>>> XX/deX/d3cX/d6eXXX/f8d -- replying >>>> anyway >>>> Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets >>>> w/ old libcap >>>> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] >>>> smbd/oplock_linux.c:linux_release_kernel_oplock(193) >>>> Nov 19 18:47:43 p55lp6 smbd[3650]: linux_release_kernel_oplock: Error >>>> when removing kernel oplock on file p0/d3XXX >>>> /deX/d3cX/d6eXXX/f8d, >>>> dev = 807, inode = 30983, file >>>> _id = 501. Error w >>>> Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] >>>> lib/util_sock.c:write_data(562) >>>> Nov 19 18:48:04 p55lp6 smbd[3650]: write_data: write failure in writing >>>> to client 9.124.111.212. Error Connection reset by peer >>>> >>> So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing >>> with the above messages? >>> >> Hi Andrew, >> >> Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel. > > Oh. Well I don't know where to start looking for that one. 
> > Maybe someone fixed it amongst all the things which have been happening > recently. I'll upload an mm snapshot as soon as I can get some of it to > compile. Can you please retest with that? > > If it still fails and if this is reasonably reproducible I'm afraid I'd ask > if you have time to run a bisection search on it. > Sure, will retest it on the mm snapshot. -- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
On Nov 20, 2007 5:56 PM, Andrew Morton <[EMAIL PROTECTED]> wrote: > > On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young <[EMAIL PROTECTED]> wrote: > > > Hi, > > I encountered kernel warningsr. I just executed xawtv without video dev > > being found. > > > > like this: > > > > WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask() > > [] native_smp_call_function_mask+0x149/0x150 > > [] alloc_debug_processing+0xa9/0x130 > > [] smp_callback+0x0/0x10 > > [] smp_call_function+0x1c/0x20 > > [] cpuidle_latency_notify+0x18/0x20 > > [] notifier_call_chain+0x3e/0x70 > > [] __blocking_notifier_call_chain+0x44/0x70 > > [] blocking_notifier_call_chain+0x17/0x20 > > [] pm_qos_add_requirement+0x8d/0xd0 > > [] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm] > > [] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm] > > [] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm] > > [] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss] > > [] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm] > > [] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss] > > [] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss] > > [] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss] > > [] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss] > > [] __fput+0x16e/0x200 > > [] filp_close+0x3c/0x80 > > [] sys_close+0x69/0xd0 > > [] syscall_call+0x7/0xb > > [] xfrm_notify_sa+0x110/0x290 > > === > > > > That was hopefully fixed. You might care to test > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz > to confirm that, if feeling sufficiently brave.. > Hi, I would like to try this tomorrow if have time, thanks. Regards dave - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young <[EMAIL PROTECTED]> wrote:

> Hi,
> I encountered kernel warningsr. I just executed xawtv without video dev being
> found.
>
> like this:
>
> WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask()
>  [<c0118769>] native_smp_call_function_mask+0x149/0x150
>  [<c0178dd9>] alloc_debug_processing+0xa9/0x130
>  [<c0372da0>] smp_callback+0x0/0x10
>  [<c0119b7c>] smp_call_function+0x1c/0x20
>  [<c0372dc8>] cpuidle_latency_notify+0x18/0x20
>  [<c0144eae>] notifier_call_chain+0x3e/0x70
>  [<c01450d4>] __blocking_notifier_call_chain+0x44/0x70
>  [<c0145117>] blocking_notifier_call_chain+0x17/0x20
>  [<c01454fd>] pm_qos_add_requirement+0x8d/0xd0
>  [<f887030c>] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm]
>  [<f88703ee>] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm]
>  [<f88741ed>] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm]
>  [<f88c1b28>] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss]
>  [<f88744fe>] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm]
>  [<f88c28fc>] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss]
>  [<f88c2e47>] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss]
>  [<f88c3a4e>] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss]
>  [<f88c4daa>] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss]
>  [<c01801ae>] __fput+0x16e/0x200
>  [<c017e35c>] filp_close+0x3c/0x80
>  [<c017e409>] sys_close+0x69/0xd0
>  [<c01042da>] syscall_call+0x7/0xb
>  [] xfrm_notify_sa+0x110/0x290
>  ===
>

That was hopefully fixed. You might care to test
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz
to confirm that, if feeling sufficiently brave..
[BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
Hi, I encountered kernel warningsr. I just executed xawtv without video dev being found. like this: WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask() [] native_smp_call_function_mask+0x149/0x150 [] alloc_debug_processing+0xa9/0x130 [] smp_callback+0x0/0x10 [] smp_call_function+0x1c/0x20 [] cpuidle_latency_notify+0x18/0x20 [] notifier_call_chain+0x3e/0x70 [] __blocking_notifier_call_chain+0x44/0x70 [] blocking_notifier_call_chain+0x17/0x20 [] pm_qos_add_requirement+0x8d/0xd0 [] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm] [] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm] [] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm] [] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss] [] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm] [] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss] [] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss] [] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss] [] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss] [] __fput+0x16e/0x200 [] filp_close+0x3c/0x80 [] sys_close+0x69/0xd0 [] syscall_call+0x7/0xb [] xfrm_notify_sa+0x110/0x290 === config files : # # Automatically generated make config: don't edit # Linux kernel version: 2.6.24-rc2-mm1 # Tue Nov 20 17:24:16 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_FAST_CMPXCHG_LOCAL=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y # CONFIG_GENERIC_GPIO is not set CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" # # General setup # CONFIG_EXPERIMENTAL=y CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION="" # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y 
CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set CONFIG_PID_NS=y # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=16 # CONFIG_CGROUPS is not set CONFIG_FAIR_GROUP_SCHED=y CONFIG_FAIR_USER_SCHED=y # CONFIG_FAIR_CGROUP_SCHED is not set CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE="" # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CONFIG_PROC_PAGE_MONITOR=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y CONFIG_LBD=y # CONFIG_BLK_DEV_IO_TRACE is not set CONFIG_LSF=y CONFIG_BLK_DEV_BSG=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_AS=y # CONFIG_DEFAULT_DEADLINE is not set # CONFIG_DEFAULT_CFQ is not set # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="anticipatory" CONFIG_PREEMPT_NOTIFIERS=y # # Processor type and features # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_RDC321X is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # 
CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y # CONFIG_PARAVIRT_GUEST is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set # CONFIG_MCORE2 is not set CONFIG_MPENTIUM4=y # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_MVIAC7 is not set CONFIG_X86_GENERIC=y CONFIG_X86_CMPXCHG=y CONFIG_X86_L1_CACHE_SHIFT=7 CONFIG_X86_XADD=y CONFIG_RWSEM_XCHGADD_ALGORITHM=
Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails
On Tue, 20 Nov 2007 14:22:43 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> > > wrote: > > > >> Hi Andrew, > >> > >> Following calltrace is seen server, while running filesystem stress on smb > >> mounted partition on the client machine. > >> > >> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] > >> lib/util_sock.c:write_data(562) > >> Nov 19 18:45:52 p55lp6 smbd[3304]: write_data: write failure in writing > >> to client 9.124.111.212. Error Broken pipe > >> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] > >> lib/util_sock.c:send_smb(769) > >> Nov 19 18:45:52 p55lp6 smbd[3304]: Error writing 39 bytes to client. -1. > >> (Broken pipe) > >> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] > >> smbd/oplock.c:oplock_timeout_handler(351) > >> Nov 19 18:47:42 p55lp6 smbd[3650]: Oplock break failed for file > >> p0/d3X > >> XX/deX/d3cX/d6eXXX/f8d -- replying > >> anyway > >> Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets > >> w/ old libcap > >> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] > >> smbd/oplock_linux.c:linux_release_kernel_oplock(193) > >> Nov 19 18:47:43 p55lp6 smbd[3650]: linux_release_kernel_oplock: Error > >> when removing kernel oplock on file p0/d3XXX > >> /deX/d3cX/d6eXXX/f8d, > >> dev = 807, inode = 30983, file > >> _id = 501. Error w > >> Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] > >> lib/util_sock.c:write_data(562) > >> Nov 19 18:48:04 p55lp6 smbd[3650]: write_data: write failure in writing > >> to client 9.124.111.212. Error Connection reset by peer > >> > > > > So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing > > with the above messages? > > > Hi Andrew, > > Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel. Oh. Well I don't know where to start looking for that one. 
Maybe someone fixed it amongst all the things which have been happening recently. I'll upload an mm snapshot as soon as I can get some of it to compile. Can you please retest with that?

If it still fails and if this is reasonably reproducible I'm afraid I'd ask if you have time to run a bisection search on it.
Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails
Andrew Morton wrote: > On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote: > >> Hi Andrew, >> >> Following calltrace is seen server, while running filesystem stress on smb >> mounted partition on the client machine. >> >> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] >> lib/util_sock.c:write_data(562) >> Nov 19 18:45:52 p55lp6 smbd[3304]: write_data: write failure in writing to >> client 9.124.111.212. Error Broken pipe >> Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] >> lib/util_sock.c:send_smb(769) >> Nov 19 18:45:52 p55lp6 smbd[3304]: Error writing 39 bytes to client. -1. >> (Broken pipe) >> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] >> smbd/oplock.c:oplock_timeout_handler(351) >> Nov 19 18:47:42 p55lp6 smbd[3650]: Oplock break failed for file >> p0/d3X >> XX/deX/d3cX/d6eXXX/f8d -- replying >> anyway >> Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets >> w/ old libcap >> Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] >> smbd/oplock_linux.c:linux_release_kernel_oplock(193) >> Nov 19 18:47:43 p55lp6 smbd[3650]: linux_release_kernel_oplock: Error when >> removing kernel oplock on file p0/d3XXX >> /deX/d3cX/d6eXXX/f8d, >> dev = 807, inode = 30983, file >> _id = 501. Error w >> Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] >> lib/util_sock.c:write_data(562) >> Nov 19 18:48:04 p55lp6 smbd[3650]: write_data: write failure in writing to >> client 9.124.111.212. Error Connection reset by peer >> > > So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing > with the above messages? > Hi Andrew, Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel. -- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. 
Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails
On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote: > Hi Andrew, > > Following calltrace is seen server, while running filesystem stress on smb > mounted partition on the client machine. > > Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] > lib/util_sock.c:write_data(562) > Nov 19 18:45:52 p55lp6 smbd[3304]: write_data: write failure in writing to > client 9.124.111.212. Error Broken pipe > Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] > lib/util_sock.c:send_smb(769) > Nov 19 18:45:52 p55lp6 smbd[3304]: Error writing 39 bytes to client. -1. > (Broken pipe) > Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] > smbd/oplock.c:oplock_timeout_handler(351) > Nov 19 18:47:42 p55lp6 smbd[3650]: Oplock break failed for file > p0/d3X > XX/deX/d3cX/d6eXXX/f8d -- replying > anyway > Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ > old libcap > Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] > smbd/oplock_linux.c:linux_release_kernel_oplock(193) > Nov 19 18:47:43 p55lp6 smbd[3650]: linux_release_kernel_oplock: Error when > removing kernel oplock on file p0/d3XXX > /deX/d3cX/d6eXXX/f8d, > dev = 807, inode = 30983, file > _id = 501. Error w > Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] > lib/util_sock.c:write_data(562) > Nov 19 18:48:04 p55lp6 smbd[3650]: write_data: write failure in writing to > client 9.124.111.212. Error Connection reset by peer > So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing with the above messages? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[BUG] 2.6.24-rc2-mm1 - smbd write fails
Hi Andrew, Following calltrace is seen server, while running filesystem stress on smb mounted partition on the client machine. Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562) Nov 19 18:45:52 p55lp6 smbd[3304]: write_data: write failure in writing to client 9.124.111.212. Error Broken pipe Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769) Nov 19 18:45:52 p55lp6 smbd[3304]: Error writing 39 bytes to client. -1. (Broken pipe) Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351) Nov 19 18:47:42 p55lp6 smbd[3650]: Oplock break failed for file p0/d3X XX/deX/d3cX/d6eXXX/f8d -- replying anyway Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193) Nov 19 18:47:43 p55lp6 smbd[3650]: linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX /deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file _id = 501. Error w Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562) Nov 19 18:48:04 p55lp6 smbd[3650]: write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer -- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young [EMAIL PROTECTED] wrote: Hi, I encountered kernel warningsr. I just executed xawtv without video dev being found. like this: WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask() [c0118769] native_smp_call_function_mask+0x149/0x150 [c0178dd9] alloc_debug_processing+0xa9/0x130 [c0372da0] smp_callback+0x0/0x10 [c0119b7c] smp_call_function+0x1c/0x20 [c0372dc8] cpuidle_latency_notify+0x18/0x20 [c0144eae] notifier_call_chain+0x3e/0x70 [c01450d4] __blocking_notifier_call_chain+0x44/0x70 [c0145117] blocking_notifier_call_chain+0x17/0x20 [c01454fd] pm_qos_add_requirement+0x8d/0xd0 [f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm] [f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm] [f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm] [f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss] [f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm] [f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss] [f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss] [f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss] [f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss] [c01801ae] __fput+0x16e/0x200 [c017e35c] filp_close+0x3c/0x80 [c017e409] sys_close+0x69/0xd0 [c01042da] syscall_call+0x7/0xb [c040] xfrm_notify_sa+0x110/0x290 === That was hopefully fixed. You might care to test ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz to confirm that, if feeling sufficiently brave.. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUG] 2.6.24-rc2-mm1 - smbd write fails
On Tue, 20 Nov 2007 14:22:43 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Tue, 20 Nov 2007 13:57:39 +0530 Kamalesh Babulal [EMAIL PROTECTED] wrote: Hi Andrew, Following calltrace is seen server, while running filesystem stress on smb mounted partition on the client machine. Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562) Nov 19 18:45:52 p55lp6 smbd[3304]: write_data: write failure in writing to client 9.124.111.212. Error Broken pipe Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769) Nov 19 18:45:52 p55lp6 smbd[3304]: Error writing 39 bytes to client. -1. (Broken pipe) Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351) Nov 19 18:47:42 p55lp6 smbd[3650]: Oplock break failed for file p0/d3X XX/deX/d3cX/d6eXXX/f8d -- replying anyway Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193) Nov 19 18:47:43 p55lp6 smbd[3650]: linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX /deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file _id = 501. Error w Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562) Nov 19 18:48:04 p55lp6 smbd[3650]: write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer So you have samba running on a 2.6.24-rc2-mm1 machine and samba is failing with the above messages? Hi Andrew, Yes, the above messages are seen with the 2.6.24-rc2-mm1 kernel. Oh. Well I don't know where to start looking for that one. Maybe someone fixed it amongst all the things which have been happening recently. I'll upload an mm snapshot as soon as I can get some of it to compile. Can you please retest with that? 
If it still fails and if this is reasonably reproducible I'm afraid I'd ask if you have time to run a bisection search on it.
[BUG] 2.6.24-rc2-mm1 - smbd write fails
Hi Andrew, Following calltrace is seen server, while running filesystem stress on smb mounted partition on the client machine. Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:write_data(562) Nov 19 18:45:52 p55lp6 smbd[3304]: write_data: write failure in writing to client 9.124.111.212. Error Broken pipe Nov 19 18:45:52 p55lp6 smbd[3304]: [2007/11/19 18:45:52, 0] lib/util_sock.c:send_smb(769) Nov 19 18:45:52 p55lp6 smbd[3304]: Error writing 39 bytes to client. -1. (Broken pipe) Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock.c:oplock_timeout_handler(351) Nov 19 18:47:42 p55lp6 smbd[3650]: Oplock break failed for file p0/d3X XX/deX/d3cX/d6eXXX/f8d -- replying anyway Nov 19 18:47:42 p55lp6 kernel: [ 6960.261068] warning: process `smbd' gets w/ old libcap Nov 19 18:47:42 p55lp6 smbd[3650]: [2007/11/19 18:47:42, 0] smbd/oplock_linux.c:linux_release_kernel_oplock(193) Nov 19 18:47:43 p55lp6 smbd[3650]: linux_release_kernel_oplock: Error when removing kernel oplock on file p0/d3XXX /deX/d3cX/d6eXXX/f8d, dev = 807, inode = 30983, file _id = 501. Error w Nov 19 18:48:04 p55lp6 smbd[3650]: [2007/11/19 18:48:04, 0] lib/util_sock.c:write_data(562) Nov 19 18:48:04 p55lp6 smbd[3650]: write_data: write failure in writing to client 9.124.111.212. Error Connection reset by peer -- Thanks Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
Hi, I encountered kernel warningsr. I just executed xawtv without video dev being found. like this: WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask() [c0118769] native_smp_call_function_mask+0x149/0x150 [c0178dd9] alloc_debug_processing+0xa9/0x130 [c0372da0] smp_callback+0x0/0x10 [c0119b7c] smp_call_function+0x1c/0x20 [c0372dc8] cpuidle_latency_notify+0x18/0x20 [c0144eae] notifier_call_chain+0x3e/0x70 [c01450d4] __blocking_notifier_call_chain+0x44/0x70 [c0145117] blocking_notifier_call_chain+0x17/0x20 [c01454fd] pm_qos_add_requirement+0x8d/0xd0 [f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm] [f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm] [f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm] [f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss] [f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm] [f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss] [f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss] [f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss] [f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss] [c01801ae] __fput+0x16e/0x200 [c017e35c] filp_close+0x3c/0x80 [c017e409] sys_close+0x69/0xd0 [c01042da] syscall_call+0x7/0xb [c040] xfrm_notify_sa+0x110/0x290 === config files : # # Automatically generated make config: don't edit # Linux kernel version: 2.6.24-rc2-mm1 # Tue Nov 20 17:24:16 2007 # CONFIG_X86_32=y CONFIG_GENERIC_TIME=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_SEMAPHORE_SLEEPERS=y CONFIG_X86=y CONFIG_FAST_CMPXCHG_LOCAL=y CONFIG_MMU=y CONFIG_ZONE_DMA=y CONFIG_QUICKLIST=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_BUG=y # CONFIG_GENERIC_GPIO is not set CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_DMI=y CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config # # General setup # CONFIG_EXPERIMENTAL=y 
CONFIG_LOCK_KERNEL=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set # CONFIG_TASKSTATS is not set # CONFIG_USER_NS is not set CONFIG_PID_NS=y # CONFIG_AUDIT is not set CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=16 # CONFIG_CGROUPS is not set CONFIG_FAIR_GROUP_SCHED=y CONFIG_FAIR_USER_SCHED=y # CONFIG_FAIR_CGROUP_SCHED is not set CONFIG_SYSFS_DEPRECATED=y # CONFIG_RELAY is not set CONFIG_BLK_DEV_INITRD=y CONFIG_INITRAMFS_SOURCE= # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set CONFIG_SYSCTL=y # CONFIG_EMBEDDED is not set CONFIG_UID16=y CONFIG_SYSCTL_SYSCALL=y CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_HOTPLUG=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_ANON_INODES=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y # CONFIG_SLAB is not set CONFIG_SLUB=y # CONFIG_SLOB is not set CONFIG_PROC_PAGE_MONITOR=y CONFIG_RT_MUTEXES=y # CONFIG_TINY_SHMEM is not set CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y # CONFIG_MODVERSIONS is not set # CONFIG_MODULE_SRCVERSION_ALL is not set CONFIG_KMOD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y CONFIG_LBD=y # CONFIG_BLK_DEV_IO_TRACE is not set CONFIG_LSF=y CONFIG_BLK_DEV_BSG=y # # IO Schedulers # CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y CONFIG_DEFAULT_AS=y # CONFIG_DEFAULT_DEADLINE is not set # CONFIG_DEFAULT_CFQ is not set # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED=anticipatory CONFIG_PREEMPT_NOTIFIERS=y # # Processor type and features # CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_SMP=y CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not 
set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_RDC321X is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y # CONFIG_PARAVIRT_GUEST is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set # CONFIG_MCORE2 is not set CONFIG_MPENTIUM4=y # CONFIG_MK6 is not set # CONFIG_MK7 is not set # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MEFFICEON is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MGEODEGX1 is not set # CONFIG_MGEODE_LX is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_MVIAC7 is not set
Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
On Nov 20, 2007 5:56 PM, Andrew Morton [EMAIL PROTECTED] wrote: On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young [EMAIL PROTECTED] wrote: Hi, I encountered kernel warningsr. I just executed xawtv without video dev being found. like this: WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask() [c0118769] native_smp_call_function_mask+0x149/0x150 [c0178dd9] alloc_debug_processing+0xa9/0x130 [c0372da0] smp_callback+0x0/0x10 [c0119b7c] smp_call_function+0x1c/0x20 [c0372dc8] cpuidle_latency_notify+0x18/0x20 [c0144eae] notifier_call_chain+0x3e/0x70 [c01450d4] __blocking_notifier_call_chain+0x44/0x70 [c0145117] blocking_notifier_call_chain+0x17/0x20 [c01454fd] pm_qos_add_requirement+0x8d/0xd0 [f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm] [f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm] [f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm] [f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss] [f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm] [f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss] [f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss] [f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss] [f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss] [c01801ae] __fput+0x16e/0x200 [c017e35c] filp_close+0x3c/0x80 [c017e409] sys_close+0x69/0xd0 [c01042da] syscall_call+0x7/0xb [c040] xfrm_notify_sa+0x110/0x290 === That was hopefully fixed. You might care to test ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz to confirm that, if feeling sufficiently brave.. Hi, I would like to try this tomorrow if have time, thanks. Regards dave - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]
Hi all: * Alan Stern [EMAIL PROTECTED] [2007-11-19 15:27:14 -0500]: On Mon, 19 Nov 2007, Rudolf Marek wrote: Hello all, gives coretemp_cpu_callback - coretemp_device_remove - platform_device_unregister, so coretemp seems to be what I have and you don't. Yes. For the coretemp developers: coretemp_cpu_callback() needs to be more careful about what it does. During a system sleep transition (suspend, hibernate, resume) it isn't possible to register or unregister a device. Attempts to register will fail and attempts to unregister will block until the system sleep is over -- and for this callback that means hanging. Well I wrote the driver. Thanks for the clarification. If I recall correctly I looked how this part should be done from others drivers. Now while checking what happened to the file, seems Rafael added something related. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d That does look like it was meant for exactly this sort of situation. It's not clear what the best way is to fix this. Perhaps the CPU notification should be sent along with a special flag indicating that the CPU transition is part of a system sleep (although this seems racy). Perhaps the driver should notice when a system sleep begins, and defer all CPU-change handling until after the sleep is over. maybe it does exist? CPU_DOWN_PREPARE ? http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD Unfortunately I'm not very familiar with this, calling the coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at microcode driver, seems it just hide sysfs interface from user. AFAICT from that documentation, it would have been better to unregister the device on CPU_DOWN_PREPARE anyway. CPU_DEAD seems to be too late - it's already gone by then. I'm not sure exactly what you want to do here. 
> But it seems like a waste to unregister the coretemp devices at the
> start of a system sleep and then register them back at the end. Could
> you simply leave the devices registered throughout the entire sleep?
> Of course, at the end you would have to check that all the CPUs really
> did come back up, and unregister the devices for the CPUs that are
> still offline.

Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN? If so, then the simplest fix would be the patch below (Jiri: feel free to try it). Otherwise it would take a bit of refactoring to bring the sysfs interface down/up for suspend/resume.

commit ce9c7b78c839a6304696d90083eac08baad524ce
Author: Mark M. Hoffman [EMAIL PROTECTED]
Date:   Tue Nov 20 07:51:50 2007 -0500

    hwmon: (coretemp) fix suspend/resume hang

    Signed-off-by: Mark M. Hoffman [EMAIL PROTECTED]

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 5c82ec7..afe2d31 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+	case CPU_DOWN_FAILED:
+	case CPU_DOWN_FAILED_FROZEN:
 		coretemp_device_add(cpu);
 		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
+	case CPU_DOWN_PREPARE:
+	case CPU_DOWN_PREPARE_FROZEN:
 		coretemp_device_remove(cpu);
 		break;
 	}

--
Mark M. Hoffman
[EMAIL PROTECTED]
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Torsten Kaiser wrote:
> On Nov 19, 2007 10:00 PM, Milan Broz [EMAIL PROTECTED] wrote:
>> Torsten Kaiser wrote:
>>> Anything I could try, apart from more boots with slub_debug=F?
>
> One time it triggered with slub_debug=F, but no additional output.
> With slub_debug=FP I have not seen it again, so I can't say if that
> would yield more info.
>
>> Please could you try which patch from the dm-crypt series cause this ?
>> (agk-dm-dm-crypt* names.)
>>
>> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
>> there is one work struct used subsequently in two threads...
>> (io thread already started while crypt thread is processing lockdep_map
>> after calling f(work)...)
>
> After reverting only
> agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not
> seen the 'held lock freed' message again.

Ok, then I have a question:

Is the following pseudocode correct (and the problem is in the lock validation, which checks something already initialized for another queue), or is reusing a work_struct not permitted from inside the called work function?

(Note the comment in the code: "It is permissible to free the struct work_struct from inside the function that is called from it.")

	struct work_struct work;
	struct workqueue_struct *a, *b;

	static void do_b(struct work_struct *work)
	{
		/* do something else */
	}

	static void do_a(struct work_struct *work)
	{
		/* do something */

		/* reuse the same work_struct for the second queue */
		INIT_WORK(work, do_b);
		queue_work(b, work);
	}

	INIT_WORK(&work, do_a);
	queue_work(a, &work);

Milan
--
[EMAIL PROTECTED]
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Tue, Nov 20, 2007 at 03:40:30PM +0100, Milan Broz wrote:
> (Note comment in code It is permissible to free the struct work_struct
> from inside the function that is called from it.)

I don't understand yet how lockdep behaves if the work struct gets reused and the reused one finishes first.

I renamed the kcryptd functions today in an attempt to disentangle this code a bit more.

- io->pending reference counting looks correct (though used inconsistently when comparing READ with WRITE)
- But what happens if kcryptd_crypt_write_convert_loop() calls INIT_WORK/queue_work twice?

Alasdair
--
[EMAIL PROTECTED]
Re: [lm-sensors] broken suspend [Was: 2.6.24-rc2-mm1]
On Tuesday, 20 of November 2007, Mark M. Hoffman wrote: Hi all: * Alan Stern [EMAIL PROTECTED] [2007-11-19 15:27:14 -0500]: On Mon, 19 Nov 2007, Rudolf Marek wrote: Hello all, gives coretemp_cpu_callback - coretemp_device_remove - platform_device_unregister, so coretemp seems to be what I have and you don't. Yes. For the coretemp developers: coretemp_cpu_callback() needs to be more careful about what it does. During a system sleep transition (suspend, hibernate, resume) it isn't possible to register or unregister a device. Attempts to register will fail and attempts to unregister will block until the system sleep is over -- and for this callback that means hanging. Well I wrote the driver. Thanks for the clarification. If I recall correctly I looked how this part should be done from others drivers. Now while checking what happened to the file, seems Rafael added something related. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d That does look like it was meant for exactly this sort of situation. It's not clear what the best way is to fix this. Perhaps the CPU notification should be sent along with a special flag indicating that the CPU transition is part of a system sleep (although this seems racy). Perhaps the driver should notice when a system sleep begins, and defer all CPU-change handling until after the sleep is over. maybe it does exist? CPU_DOWN_PREPARE ? http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD Unfortunately I'm not very familiar with this, calling the coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at microcode driver, seems it just hide sysfs interface from user. AFAICT from that documentation, it would have been better to unregister the device on CPU_DOWN_PREPARE anyway. CPU_DEAD seems to be too late - it's already gone by then. I'm not sure exactly what you want to do here. 
> > But it seems like a waste to unregister the coretemp devices at the
> > start of a system sleep and then register them back at the end. Could
> > you simply leave the devices registered throughout the entire sleep?
> > Of course, at the end you would have to check that all the CPUs really
> > did come back up, and unregister the devices for the CPUs that are
> > still offline.
>
> Is it possible to unregister a driver on CPU_DOWN_PREPARE_FROZEN?

No. In that case the suspend core is holding the device's mutex and your attempt to unregister it will deadlock with it.

Do you _have_ _to_ unregister the device at all? Why don't you just leave it registered on CPU_DOWN_PREPARE_FROZEN? The CPU is not going away physically in this case and it's _guaranteed_ that _cpu_up() will be called on it as soon as the hibernation image is ready or we are back from suspend.

> If so, then the simplest fix would be the patch below (Jiri: feel free
> to try it). Otherwise it would take a bit of refactoring to bring the
> sysfs interface down/up for suspend/resume.
>
> commit ce9c7b78c839a6304696d90083eac08baad524ce
> Author: Mark M. Hoffman [EMAIL PROTECTED]
> Date:   Tue Nov 20 07:51:50 2007 -0500
>
>     hwmon: (coretemp) fix suspend/resume hang
>
>     Signed-off-by: Mark M. Hoffman [EMAIL PROTECTED]

I'd do it like this:

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 5c82ec7..afe2d31 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -338,10 +338,12 @@ static int coretemp_cpu_callback(struct notifier_block *nfb,
 	switch (action) {
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+	case CPU_DOWN_FAILED:
 		coretemp_device_add(cpu);
+	case CPU_DOWN_FAILED_FROZEN:
 		break;
-	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
+	case CPU_DOWN_PREPARE:
 		coretemp_device_remove(cpu);
+	case CPU_DOWN_PREPARE_FROZEN:
 		break;
 	}

Greetings,
Rafael

--
Premature optimization is the root of all evil.
- Donald Knuth
Re: [BUG] 2.6.24-rc2-mm1 Warning at arch/x86/kernel/smp_32.c:561
On Nov 20, 2007 5:59 PM, Dave Young [EMAIL PROTECTED] wrote: On Nov 20, 2007 5:56 PM, Andrew Morton [EMAIL PROTECTED] wrote: On Tue, 20 Nov 2007 17:47:28 +0800 Dave Young [EMAIL PROTECTED] wrote: Hi, I encountered kernel warningsr. I just executed xawtv without video dev being found. like this: WARNING: at arch/x86/kernel/smp_32.c:561 native_smp_call_function_mask() [c0118769] native_smp_call_function_mask+0x149/0x150 [c0178dd9] alloc_debug_processing+0xa9/0x130 [c0372da0] smp_callback+0x0/0x10 [c0119b7c] smp_call_function+0x1c/0x20 [c0372dc8] cpuidle_latency_notify+0x18/0x20 [c0144eae] notifier_call_chain+0x3e/0x70 [c01450d4] __blocking_notifier_call_chain+0x44/0x70 [c0145117] blocking_notifier_call_chain+0x17/0x20 [c01454fd] pm_qos_add_requirement+0x8d/0xd0 [f887030c] snd_pcm_hw_params+0x20c/0x2a0 [snd_pcm] [f88703ee] snd_pcm_hw_params_user+0x4e/0x90 [snd_pcm] [f88741ed] snd_pcm_capture_ioctl1+0x3d/0x230 [snd_pcm] [f88c1b28] snd_pcm_hw_param_near+0x198/0x230 [snd_pcm_oss] [f88744fe] snd_pcm_kernel_ioctl+0x7e/0x90 [snd_pcm] [f88c28fc] snd_pcm_oss_change_params+0x2fc/0x750 [snd_pcm_oss] [f88c2e47] snd_pcm_oss_make_ready+0x47/0x60 [snd_pcm_oss] [f88c3a4e] snd_pcm_oss_sync+0x10e/0x290 [snd_pcm_oss] [f88c4daa] snd_pcm_oss_release+0x9a/0xb0 [snd_pcm_oss] [c01801ae] __fput+0x16e/0x200 [c017e35c] filp_close+0x3c/0x80 [c017e409] sys_close+0x69/0xd0 [c01042da] syscall_call+0x7/0xb [c040] xfrm_notify_sa+0x110/0x290 === That was hopefully fixed. You might care to test ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-11-20-01-45.tar.gz to confirm that, if feeling sufficiently brave.. Hi, I just confirm that I can't reproduce this after apply broken-out-2007-11-20-01-45 patch set. Regards dave - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > Anything I could try, apart from more boots with slub_debug=F?

One time it triggered with slub_debug=F, but no additional output. With slub_debug=FP I have not seen it again, so I can't say if that would yield more info.

> Please could you try which patch from the dm-crypt series cause this ?
> (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because
> there is one work struct used subsequently in two threads...
> (io thread already started while crypt thread is processing lockdep_map
> after calling f(work)...)

After reverting only agk-dm-dm-crypt-move-bio-submission-to-thread.patch I also have not seen the 'held lock freed' message again.

If it happens again with this revert, I will post that output.

Thanks for the hint.

Torsten
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Sun, 18 Nov 2007 14:18:06 -0500 Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > Torsten
>
> I had already fixed that one in my own stack. Attached are the 3 patches
> that I've got. 1 from SteveD, 2 fixes.
>
> Andrew, could you please unapply the sillyrename patches you've got, and
> apply these 3 instead?

I'd expect to see things like this appear in git-nfs.patch. Did something change?
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Sun, 18 Nov 2007, root wrote:

> @@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
>  {
>  	struct page *page;
>  	struct kmem_cache_node *n;
> +	unsigned long flags;
>
>  	BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
>

Well local_irq_save is a bit of an overkill. We know that interrupts are enabled during this phase of the boot sequence.
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Monday, 19 of November 2007, Rudolf Marek wrote:
> Hello all,
> >>> gives coretemp_cpu_callback -> coretemp_device_remove ->
> >>> platform_device_unregister, so coretemp seems to be what I have and you
> >>> don't.
> >
> > Yes.
> >
> > For the coretemp developers: coretemp_cpu_callback() needs to be more
> > careful about what it does. During a system sleep transition (suspend,
> > hibernate, resume) it isn't possible to register or unregister a
> > device. Attempts to register will fail and attempts to unregister will
> > block until the system sleep is over -- and for this callback that
> > means hanging.
>
> Well I wrote the driver. Thanks for the clarification. If I recall correctly I
> looked how this part should be done from others drivers. Now while checking
> what happened to the file, seems Rafael added something related.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d

Well, in principle you can use the observation that the _FROZEN versions are used during suspend/hibernation. Thus, if you only unregister the device for CPU_DEAD, but you won't do that for CPU_DEAD_FROZEN, it will work as long as the freezer is there.

> > It's not clear what the best way is to fix this. Perhaps the CPU
> > notification should be sent along with a special flag indicating that
> > the CPU transition is part of a system sleep (although this seems
> > racy).

In fact, it's already done that way and I don't think it's racy (see above).

Greetings,
Rafael
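The observation about the _FROZEN notifications can be illustrated with a minimal userspace sketch. This is only a model under stated assumptions: the action codes and the MODEL_TASKS_FROZEN bit are local stand-ins, not the kernel's definitions, and device_registered[] and model_cpu_callback() are hypothetical names. It shows nothing more than the control flow: plain hotplug events add or remove the per-CPU device, while events flagged as part of a system sleep leave the registration untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in action codes for illustration; not the kernel's definitions. */
enum {
	MODEL_CPU_ONLINE   = 1,
	MODEL_CPU_DEAD     = 2,
	MODEL_TASKS_FROZEN = 0x10,	/* set on the *_FROZEN variants */
};

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU state: true while the sensor device is registered. */
static bool device_registered[MODEL_NR_CPUS];

/*
 * Model of a hotplug callback that ignores every event sent as part of a
 * system sleep transition, so it never registers or unregisters a device
 * while suspend/hibernation is in progress.
 */
static void model_cpu_callback(unsigned int action, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);

	if (action & MODEL_TASKS_FROZEN)
		return;		/* system sleep: leave the device alone */

	switch (action) {
	case MODEL_CPU_ONLINE:
		device_registered[cpu] = true;
		break;
	case MODEL_CPU_DEAD:
		device_registered[cpu] = false;
		break;
	}
}
```

In this model, calling model_cpu_callback(MODEL_CPU_DEAD | MODEL_TASKS_FROZEN, 1) after an online event leaves device_registered[1] set, whereas the plain MODEL_CPU_DEAD event clears it.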
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Torsten Kaiser wrote:
> On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
...
> Above this acquire/release sequence is the following comment:
> #ifdef CONFIG_LOCKDEP
>                 /*
>                  * It is permissible to free the struct work_struct
>                  * from inside the function that is called from it,
>                  * this we need to take into account for lockdep too.
>                  * To avoid bogus "held lock freed" warnings as well
>                  * as problems when looking into work->lockdep_map,
>                  * make a copy and use that here.
>                  */
>                 struct lockdep_map lockdep_map = work->lockdep_map;
> #endif
>
> Did something trigger this anyway?
>
> Anything I could try, apart from more boots with slub_debug=F?

Please could you try which patch from the dm-crypt series cause this ? (agk-dm-dm-crypt* names.)

I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch because there is one work struct used subsequently in two threads... (io thread already started while crypt thread is processing lockdep_map after calling f(work)...)

(btw these patches prepare dm-crypt for next patchset introducing async cryptoapi, so there should be no functional changes yet.)

Milan
--
[EMAIL PROTECTED]
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Mon, 19 Nov 2007, Rudolf Marek wrote: > Hello all, > >>> gives coretemp_cpu_callback -> coretemp_device_remove -> > >>> platform_device_unregister, so coretemp seems to be what I have and you > >>> don't. > > > > Yes. > > > > For the coretemp developers: coretemp_cpu_callback() needs to be more > > careful about what it does. During a system sleep transition (suspend, > > hibernate, resume) it isn't possible to register or unregister a > > device. Attempts to register will fail and attempts to unregister will > > block until the system sleep is over -- and for this callback that > > means hanging. > > Well I wrote the driver. Thanks for the clarification. If I recall correctly > I > looked how this part should be done from others drivers. Now while checking > what happened to the file, seems Rafael added something related. > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d That does look like it was meant for exactly this sort of situation. > > It's not clear what the best way is to fix this. Perhaps the CPU > > notification should be sent along with a special flag indicating that > > the CPU transition is part of a system sleep (although this seems > > racy). Perhaps the driver should notice when a system sleep begins, > > and defer all CPU-change handling until after the sleep is over. > > maybe it does exist? CPU_DOWN_PREPARE ? > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD > > Unfortunately I'm not very familiar with this, calling the > coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at microcode > driver, seems it just hide sysfs interface from user. I'm not sure exactly what you want to do here. But it seems like a waste to unregister the coretemp devices at the start of a system sleep and then register them back at the end. Could you simply leave the devices registered throughout the entire sleep? 
Of course, at the end you would have to check that all the CPUs really did come back up, and unregister the devices for the CPUs that are still offline.

Alan Stern
Re: broken suspend [Was: 2.6.24-rc2-mm1]
Hello all,

>>> gives coretemp_cpu_callback -> coretemp_device_remove ->
>>> platform_device_unregister, so coretemp seems to be what I have and you
>>> don't.
>
> Yes.
>
> For the coretemp developers: coretemp_cpu_callback() needs to be more
> careful about what it does. During a system sleep transition (suspend,
> hibernate, resume) it isn't possible to register or unregister a
> device. Attempts to register will fail and attempts to unregister will
> block until the system sleep is over -- and for this callback that
> means hanging.

Well I wrote the driver. Thanks for the clarification. If I recall correctly I looked how this part should be done from others drivers. Now while checking what happened to the file, seems Rafael added something related.

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d

> It's not clear what the best way is to fix this. Perhaps the CPU
> notification should be sent along with a special flag indicating that
> the CPU transition is part of a system sleep (although this seems
> racy). Perhaps the driver should notice when a system sleep begins,
> and defer all CPU-change handling until after the sleep is over.

maybe it does exist? CPU_DOWN_PREPARE ?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD

Unfortunately I'm not very familiar with this, calling the coretemp_device_remove from CPU_DOWN_PREPARE would help? Looking at microcode driver, seems it just hide sysfs interface from user.

Thanks,
Rudolf
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>
> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
>
> > Trying the last NFSv4 patch (but that patch is only the cause, why I
> > had lockdep enabled) I got this:
> > [ 64.550203]
> > [ 64.550205] =
> > [ 64.552213] [ BUG: held lock freed! ]
> > [ 64.553633] -
> > [ 64.555055] kcryptd/1022 is freeing memory
> > 81011EBEFB00-81011EBEFB3F, with a lock still held there!
>
> so kcryptd frees a live, still in use bio? That could be a recipe for
> data corruption. Does SLUB_DEBUG (or SLAB_DEBUG) catch anything?

I have SLUB_DEBUG=y, but not SLUB_DEBUG_ON.
But apart from this message I did not see anything in the syslog.
It seems not to be a one-time event, as on the third boot it happened
again. The stack trace was identical.
Sadly, trying 3 boots with slub_debug=FZP and another one with only F
did not trigger it.

But I don't think kcryptd is freeing a bio at that point.
The message said about the freed lock:
(kcryptd){--..}, at: [80247dd9]

(gdb) list *0x80247dd9
0x80247dd9 is in run_workqueue (include/asm/bitops_64.h:69).
64       * you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
65       * in order to ensure changes are visible on other processors.
66       */
67      static inline void clear_bit(int nr, volatile void *addr)
68      {
69              __asm__ __volatile__( LOCK_PREFIX
70                      "btrl %1,%0"
71                      :ADDR
72                      :"dIr" (nr));
73      }

Increasing the address a little bit shows:
(gdb) list *0x80247ddf
0x80247ddf is in run_workqueue (kernel/workqueue.c:275).
270                     list_del_init(cwq->worklist.next);
271                     spin_unlock_irq(&cwq->lock);
272
273                     BUG_ON(get_wq_data(work) != cwq);
274                     work_clear_pending(work);
275                     lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
276                     lock_acquire(&lockdep_map, 0, 0, 0, 2, _THIS_IP_);
277                     f(work);
278                     lock_release(&lockdep_map, 1, _THIS_IP_);
279                     lock_release(&cwq->wq->lockdep_map, 1, _THIS_IP_);

Above this acquire/release sequence is the following comment:
#ifdef CONFIG_LOCKDEP
                /*
                 * It is permissible to free the struct work_struct
                 * from inside the function that is called from it,
                 * this we need to take into account for lockdep too.
                 * To avoid bogus "held lock freed" warnings as well
                 * as problems when looking into work->lockdep_map,
                 * make a copy and use that here.
                 */
                struct lockdep_map lockdep_map = work->lockdep_map;
#endif

Did something trigger this anyway?

Anything I could try, apart from more boots with slub_debug=F?

Torsten
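The pattern described in the quoted workqueue comment (copy what you need out of the work item before calling its function, because the function may free the item) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct map`, `struct work`, `run_one()` and `demo()` are hypothetical stand-ins, not the kernel's `struct lockdep_map`, `struct work_struct` or `run_workqueue()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct lockdep_map. */
struct map {
    int key;
};

/* Hypothetical stand-in for struct work_struct. */
struct work {
    struct map lockdep_map;
    void (*func)(struct work *);
};

static int last_released_key;

/* A handler that frees its own work item, which the comment says is allowed. */
static void free_self(struct work *w)
{
    free(w);
}

/* Run one item; after func() returns, only the local copy is touched. */
static void run_one(struct work *w)
{
    struct map copy = w->lockdep_map;    /* copy while w is still valid */
    void (*f)(struct work *) = w->func;

    assert(f != NULL);
    f(w);                                /* w may be freed from here on */
    last_released_key = copy.key;        /* "release" uses the copy, not w */
}

/* Allocate an item with the given key, run it, report the released key. */
int demo(int key)
{
    struct work *w = malloc(sizeof(*w));

    assert(w != NULL);
    w->lockdep_map.key = key;
    w->func = free_self;
    run_one(w);
    return last_released_key;
}
```

The copy only protects the caller's own later accesses. It does not help if something else still uses the same item after the handler has freed or requeued it.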
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Nov 19, 2007 10:00 AM, Andrew Morton <[EMAIL PROTECTED]> wrote:
> On Mon, 19 Nov 2007 08:15:48 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> wrote:
> > On Nov 18, 2007 8:18 PM, Trond Myklebust <[EMAIL PROTECTED]> wrote:
> > > I had already fixed that one in my own stack. Attached are the 3 patches
> > > that I've got. 1 from SteveD, 2 fixes.
> >
> > Moving the init_waitqueue_head() like patch
> > linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying
> > linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work.
> > Also lockdep no longer complains about the non-static key.
>
> Thanks.
>
> To avoid goofups, could you please send the full fix against 2.6.24-rc2-mm1?

Umm... As I applied these changes manually, there is a not insignificant
chance of goofups on my part...
For the hang problem I think Trond's suggestion of replacing the patches
from -mm with fresh versions would be best.

Anyway, currently I have the patch from http://lkml.org/lkml/2007/11/16/74
to fix the can't-create-files bug.

To fix the hang bug I used Trond's
linux-2.6.24-007-fix_nfs_free_unlinkdata.dif and the first two hunks
from linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif.

Torsten

The needed 2 hunks for reference:
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -594,9 +594,6 @@ static int nfs_init_server(struct nfs_server *server,
 	/* Create a client RPC handle for the NFSv3 ACL management interface */
 	nfs_init_server_aclclient(server);
 
-	init_waitqueue_head(&server->active_wq);
-	atomic_set(&server->active, 0);
-
 	dprintk("<-- nfs_init_server() = 0 [new %p]\n", clp);
 	return 0;
 
@@ -736,6 +733,9 @@ static struct nfs_server *nfs_alloc_server(void)
 	INIT_LIST_HEAD(&server->client_link);
 	INIT_LIST_HEAD(&server->master_link);
 
+	init_waitqueue_head(&server->active_wq);
+	atomic_set(&server->active, 0);
+
 	server->io_stats = nfs_alloc_iostats();
 	if (!server->io_stats) {
 		kfree(server);
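The idea behind the two hunks is to move the initialization from a version-specific init routine into the allocator that every mount path goes through. A minimal userspace sketch follows; `struct server`, `server_alloc()`, `init_v3()`, `init_v4()` and `wq_ready_after()` are hypothetical names used only for illustration, and the integer flags merely stand in for the wait queue and atomic counter in the real `struct nfs_server`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for struct nfs_server. */
struct server {
    int active;      /* stands in for the atomic 'active' counter */
    int wq_ready;    /* 1 once the wait queue has been initialized */
    int acl_ready;   /* set only by the v2/v3 init path */
};

/* Shared allocator: state that every protocol version needs is set up here. */
struct server *server_alloc(void)
{
    struct server *s = calloc(1, sizeof(*s));

    if (s == NULL)
        return NULL;
    s->active = 0;
    s->wq_ready = 1;
    return s;
}

/* Version-specific init paths; neither has to remember the shared fields. */
void init_v3(struct server *s) { s->acl_ready = 1; }
void init_v4(struct server *s) { (void)s; }

/* Allocate, run the init path for 'version', report whether the queue is ready. */
int wq_ready_after(int version)
{
    struct server *s = server_alloc();
    int ready;

    assert(s != NULL);
    if (version == 4)
        init_v4(s);
    else
        init_v3(s);
    ready = s->wq_ready;
    free(s);
    return ready;
}
```

With the setup in the allocator, a path that skips the v2/v3 init routine still gets an initialized queue, which is the property the hang fix relies on.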
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Mon, 19 Nov 2007 08:15:48 +0100 "Torsten Kaiser" <[EMAIL PROTECTED]> wrote: > On Nov 18, 2007 8:18 PM, Trond Myklebust <[EMAIL PROTECTED]> wrote: > > On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote: > > > NFSv2/3 and NFSv4 share the same dentry_iput and so share the same > > > unlink and sillyrename logic. > > > But they do not share nfs_init_server()! > > > > > > I wonder why this doesn't blow up more violently, but only hangs... > > > > > > But as I don't know if it is correct to add the workqueue > > > initialization to nfs4_init_server() or remove the nfs_sb_active / > > > nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this. > > > > > > Torsten > > > > I had already fixed that one in my own stack. Attached are the 3 patches > > that I've got. 1 from SteveD, 2 fixes. > > > > Moving the init_waitqueue_head() like patch > linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying > linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work. > Also lockdep no longer complains about the non-static key. > Thanks. To avoid goofups, could you please send the full fix against 2.6.24-rc2-mm1? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Mon, 19 Nov 2007, Rudolf Marek wrote:

> Hello all,
>
> > > gives coretemp_cpu_callback -> coretemp_device_remove ->
> > > platform_device_unregister, so coretemp seems to be what I have and you
> > > don't.
> >
> > Yes. For the coretemp developers: coretemp_cpu_callback() needs to be
> > more careful about what it does. During a system sleep transition
> > (suspend, hibernate, resume) it isn't possible to register or
> > unregister a device. Attempts to register will fail and attempts to
> > unregister will block until the system sleep is over -- and for this
> > callback that means hanging.
>
> Well, I wrote the driver. Thanks for the clarification. If I recall
> correctly, I looked at how this part is done in other drivers. Now, while
> checking what happened to the file, it seems Rafael added something related.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d

That does look like it was meant for exactly this sort of situation.

> > It's not clear what the best way is to fix this. Perhaps the CPU
> > notification should be sent along with a special flag indicating that
> > the CPU transition is part of a system sleep (although this seems
> > racy). Perhaps the driver should notice when a system sleep begins,
> > and defer all CPU-change handling until after the sleep is over.
>
> Maybe it does exist? CPU_DOWN_PREPARE?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/cpu-hotplug.txt;hb=HEAD
>
> Unfortunately I'm not very familiar with this; would calling
> coretemp_device_remove from CPU_DOWN_PREPARE help? Looking at the
> microcode driver, it seems it just hides the sysfs interface from the user.

I'm not sure exactly what you want to do here. But it seems like a
waste to unregister the coretemp devices at the start of a system sleep
and then register them back at the end. Could you simply leave the
devices registered throughout the entire sleep?

Of course, at the end you would have to check that all the CPUs really
did come back up, and unregister the devices for the CPUs that are
still offline.

Alan Stern
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
Torsten Kaiser wrote:
> On Nov 19, 2007 8:56 AM, Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> * Torsten Kaiser <[EMAIL PROTECTED]> wrote:
> ...
> Above this acquire/release sequence is the following comment:
> #ifdef CONFIG_LOCKDEP
>                 /*
>                  * It is permissible to free the struct work_struct
>                  * from inside the function that is called from it,
>                  * this we need to take into account for lockdep too.
>                  * To avoid bogus "held lock freed" warnings as well
>                  * as problems when looking into work->lockdep_map,
>                  * make a copy and use that here.
>                  */
>                 struct lockdep_map lockdep_map = work->lockdep_map;
> #endif
>
> Did something trigger this anyway?
>
> Anything I could try, apart from more boots with slub_debug=F?

Could you please try to find which patch from the dm-crypt series causes
this? (agk-dm-dm-crypt* names.)

I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch, because
there is one work struct used subsequently in two threads...
(the io thread has already started while the crypt thread is still
processing lockdep_map after calling f(work)...)

(By the way, these patches prepare dm-crypt for the next patchset, which
introduces the async cryptoapi, so there should be no functional changes yet.)

Milan
--
[EMAIL PROTECTED]
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Monday, 19 of November 2007, Rudolf Marek wrote:
> Hello all,
>
> > > gives coretemp_cpu_callback -> coretemp_device_remove ->
> > > platform_device_unregister, so coretemp seems to be what I have and you
> > > don't.
> >
> > Yes. For the coretemp developers: coretemp_cpu_callback() needs to be
> > more careful about what it does. During a system sleep transition
> > (suspend, hibernate, resume) it isn't possible to register or
> > unregister a device. Attempts to register will fail and attempts to
> > unregister will block until the system sleep is over -- and for this
> > callback that means hanging.
>
> Well, I wrote the driver. Thanks for the clarification. If I recall
> correctly, I looked at how this part is done in other drivers. Now, while
> checking what happened to the file, it seems Rafael added something related.
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8bb7844286fb8c9fce6f65d8288aeb09d03a5e0d

Well, in principle you can use the observation that the _FROZEN versions are
used during suspend/hibernation. Thus, if you only unregister the device for
CPU_DEAD, but you won't do that for CPU_DEAD_FROZEN, it will work as long as
the freezer is there.

> > It's not clear what the best way is to fix this. Perhaps the CPU
> > notification should be sent along with a special flag indicating that
> > the CPU transition is part of a system sleep (although this seems
> > racy).

In fact, it's already done that way and I don't think it's racy (see above).

Greetings,
Rafael
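Rafael's suggestion (act on CPU_DEAD, ignore CPU_DEAD_FROZEN) can be sketched as a small decision function in userspace C. The numeric values below mirror include/linux/notifier.h of that era as far as I recall and should be treated as an assumption; `enum hotplug_action` and `decide()` are hypothetical names, and treating CPU_ONLINE_FROZEN as a no-op is my own assumption that follows from leaving the device registered across the sleep. This is not the actual coretemp fix.

```c
#include <assert.h>

/* Event values as recalled from include/linux/notifier.h (assumption). */
#define CPU_ONLINE         0x0002
#define CPU_DEAD           0x0007
#define CPU_TASKS_FROZEN   0x0010
#define CPU_ONLINE_FROZEN  (CPU_ONLINE | CPU_TASKS_FROZEN)
#define CPU_DEAD_FROZEN    (CPU_DEAD | CPU_TASKS_FROZEN)

/* Hypothetical result type: what the driver would do for an event. */
enum hotplug_action {
    DO_NOTHING,
    DO_REGISTER,
    DO_UNREGISTER
};

/*
 * Decide what to do for a CPU hotplug event. The _FROZEN variants are
 * sent during suspend/hibernation, when devices cannot be registered or
 * unregistered, so they are deliberately ignored here.
 */
enum hotplug_action decide(unsigned long event)
{
    assert((event & ~0xffffUL) == 0);    /* only the low event bits are expected */

    switch (event) {
    case CPU_ONLINE:
        return DO_REGISTER;
    case CPU_DEAD:
        return DO_UNREGISTER;
    case CPU_ONLINE_FROZEN:
    case CPU_DEAD_FROZEN:
    default:
        return DO_NOTHING;
    }
}
```

A real notifier would call the driver's add/remove helpers instead of returning an enum; the point is only that the frozen events take a different branch.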
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Sun, 18 Nov 2007, root wrote:

> @@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
>  {
>  	struct page *page;
>  	struct kmem_cache_node *n;
> +	unsigned long flags;
>
>  	BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));

Well local_irq_save is a bit of an overkill. We know that interrupts are
enabled during this phase of the boot sequence.
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Sun, 18 Nov 2007 14:18:06 -0500 Trond Myklebust <[EMAIL PROTECTED]> wrote:

> > Torsten
>
> I had already fixed that one in my own stack. Attached are the 3 patches
> that I've got. 1 from SteveD, 2 fixes.
>
> Andrew, could you please unapply the sillyrename patches you've got, and
> apply these 3 instead?

I'd expect to see things like this appear in git-nfs.patch. Did something
change?
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
On Nov 19, 2007 10:00 PM, Milan Broz <[EMAIL PROTECTED]> wrote:
> Torsten Kaiser wrote:
> > Anything I could try, apart from more boots with slub_debug=F?

One time it triggered with slub_debug=F, but there was no additional output.
With slub_debug=FP I have not seen it again, so I can't say whether that
would yield more info.

> Could you please try to find which patch from the dm-crypt series causes
> this? (agk-dm-dm-crypt* names.)
>
> I suspect agk-dm-dm-crypt-move-bio-submission-to-thread.patch, because
> there is one work struct used subsequently in two threads...
> (the io thread has already started while the crypt thread is still
> processing lockdep_map after calling f(work)...)

After reverting only agk-dm-dm-crypt-move-bio-submission-to-thread.patch
I also have not seen the 'held lock freed' message again.
If it happens again with this revert, I will post that output.

Thanks for the hint.

Torsten
Re: 2.6.24-rc2-mm1: kcryptd vs lockdep
* Torsten Kaiser <[EMAIL PROTECTED]> wrote:

> Trying the last NFSv4 patch (but that patch is only the cause, why I
> had lockdep enabled) I got this:
> [ 64.550203]
> [ 64.550205] =
> [ 64.552213] [ BUG: held lock freed! ]
> [ 64.553633] -
> [ 64.555055] kcryptd/1022 is freeing memory
> 81011EBEFB00-81011EBEFB3F, with a lock still held there!

so kcryptd frees a live, still in use bio? That could be a recipe for
data corruption. Does SLUB_DEBUG (or SLAB_DEBUG) catch anything?

	Ingo
2.6.24-rc2-mm1: kcryptd vs lockdep
Trying the last NFSv4 patch (but that patch is only the cause, why I had lockdep enabled) I got this: [ 64.550203] [ 64.550205] = [ 64.552213] [ BUG: held lock freed! ] [ 64.553633] - [ 64.555055] kcryptd/1022 is freeing memory 81011EBEFB00-81011EBEFB3F, with a lock still held there! [ 64.558809] (kcryptd){--..}, at: [] run_workqueue+0x129/0x210 [ 64.561743] 2 locks held by kcryptd/1022: [ 64.563296] #0: (kcryptd){--..}, at: [] run_workqueue+0x129/0x210 [ 64.566409] #1: (>work#2){--..}, at: [] run_workqueue+0x129/0x210 [ 64.569672] [ 64.569672] stack backtrace: [ 64.571375] [ 64.571375] Call Trace: [ 64.572913] [] debug_check_no_locks_freed+0x190/0x1b0 [ 64.575764] [] mempool_free_slab+0x12/0x20 [ 64.577986] [] kmem_cache_free+0x79/0xe0 [ 64.580140] [] mempool_free_slab+0x12/0x20 [ 64.582362] [] mempool_free+0x8a/0xa0 [ 64.584415] [] bio_free+0x2f/0x50 [ 64.586337] [] bio_fs_destructor+0x10/0x20 [ 64.588558] [] bio_put+0x26/0x30 [ 64.590446] [] xfs_buf_bio_end_io+0x99/0x120 [ 64.592734] [] bio_endio+0x19/0x40 [ 64.594687] [] dec_pending+0x107/0x210 [ 64.596775] [] clone_endio+0x70/0xb0 [ 64.598793] [] kcryptd_do_crypt+0x0/0x290 [ 64.600978] [] bio_endio+0x19/0x40 [ 64.602931] [] crypt_dec_pending+0x32/0x50 [ 64.605149] [] kcryptd_do_crypt+0x64/0x290 [ 64.607368] [] kcryptd_do_crypt+0x0/0x290 [ 64.609553] [] kcryptd_do_crypt+0x0/0x290 [ 64.611739] [] run_workqueue+0x175/0x210 [ 64.613892] [] worker_thread+0x71/0xb0 [ 64.615981] [] autoremove_wake_function+0x0/0x40 [ 64.618402] [] worker_thread+0x0/0xb0 [ 64.620454] [] kthread+0x4d/0x80 [ 64.622340] [] child_rip+0xa/0x12 [ 64.624262] [] restore_args+0x0/0x30 [ 64.626281] [] kthread+0x0/0x80 [ 64.628134] [] child_rip+0x0/0x12 [ 64.630052] [ 64.630637] INFO: lockdep is turned off. I only have only seen this once, booting the same kernel build a second time, it did not happen again. Also I got two other oopses when trying to shut the system down after the above happend. 
So it might be possible that kcryptd only was the victim of an other corruption, but then I don't know what subsystem was to blame. The other oopses: [ 108.613851] Unable to handle kernel paging request at 0a6425203a72 RIP: [ 108.618485] [<0a6425203a72>] [ 108.624339] PGD 0 [ 108.626416] Oops: 0010 [1] SMP [ 108.629657] last sysfs file: /sys/devices/pci:00/:00:0f.0/:01:00.1/resource [ 108.637675] CPU 3 [ 108.639749] Modules linked in: radeon drm nfsd exportfs ipv6 tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common v4l1_compat hid pata_amd sg [ 108.665913] Pid: 8715, comm: reboot Not tainted 2.6.24-rc2-mm1 #14 [ 108.672103] RIP: 0010:[<0a6425203a72>] [<0a6425203a72>] [ 108.678164] RSP: 0018:81011d4a1e10 EFLAGS: 00010206 [ 108.683491] RAX: 0a6425203a72 RBX: 8077f9a0 RCX: 000a [ 108.690635] RDX: 8077fb40 RSI: 000a RDI: 81011ff0f870 [ 108.697779] RBP: 81011d4a1e28 R08: 07d0 R09: 0001 [ 108.704922] R10: 804bf40a R11: R12: fee1dead [ 108.712066] R13: 01234567 R14: R15: 0001 [ 108.719210] FS: 7f217607f6f0() GS:81011ff11780() knlGS: [ 108.727312] CS: 0010 DS: ES: CR0: 8005003b [ 108.733071] CR2: 0a6425203a72 CR3: 00011e182000 CR4: 06e0 [ 108.740215] DR0: DR1: DR2: [ 108.747358] DR3: DR6: 0ff0 DR7: 0400 [ 108.754501] Process reboot (pid: 8715, threadinfo 81011D4A, task 81011EC20EC0) [ 108.762776] Stack: 8042c4ac 81011d4a1e28 81011d4a1e48 [ 108.770993] 802451ff 28121969 28121969 81011d4a1f78 [ 108.778561] 802453d5 81011f9ec078 81011f97e780 81011d4a1e88 [ 108.785911] Call Trace: [ 108.788601] [] device_shutdown+0x4c/0xa0 [ 108.794192] [] kernel_restart+0x2f/0x70 [ 108.799696] [] sys_reboot+0x185/0x1d0 [ 108.805028] [] d_free+0x49/0x50 [ 108.809832] [] d_kill+0x50/0x70 [ 108.814641] [] mntput_no_expire+0x20/0xe0 [ 108.820314] [] __fput+0x17d/0x230 [ 108.825295] [] fput+0x16/0x20 [ 108.829933] [] trace_hardirqs_on_thunk+0x35/0x3a [ 
108.836219] [] system_call+0x7e/0x83 [ 108.841459] [ 108.842979] INFO: lockdep is turned off. [ 108.846922] [ 108.846922] Code: Bad RIP value. [ 108.851852] RIP [<0a6425203a72>] [ 108.855570] RSP [ 108.859081] CR2: 0a6425203a72 [ 110.859331] md: stopping all md devices. [ 110.863297] md: md1 still in use. [ 111.865229] sd 8:0:1:0: [sd
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Nov 18, 2007 8:18 PM, Trond Myklebust <[EMAIL PROTECTED]> wrote: > On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote: > > NFSv2/3 and NFSv4 share the same dentry_iput and so share the same > > unlink and sillyrename logic. > > But they do not share nfs_init_server()! > > > > I wonder why this doesn't blow up more violently, but only hangs... > > > > But as I don't know if it is correct to add the workqueue > > initialization to nfs4_init_server() or remove the nfs_sb_active / > > nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this. > > > > Torsten > > I had already fixed that one in my own stack. Attached are the 3 patches > that I've got. 1 from SteveD, 2 fixes. > Moving the init_waitqueue_head() like patch linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work. Also lockdep no longer complains about the non-static key. Torsten - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] 2.6.24-rc2-mm1 powerpc (iseries)- build failure - mm/stab.c
From: Kamalesh Babulal <[EMAIL PROTECTED]>

The kernel build fails with the following error, with randconfig:

  CC      arch/powerpc/mm/stab.o
arch/powerpc/mm/stab.c: In function ‘stab_initialize’:
arch/powerpc/mm/stab.c:282: error: implicit declaration of function ‘HvCall1’
arch/powerpc/mm/stab.c:282: error: ‘HvCallBaseSetASR’ undeclared (first use in this function)
arch/powerpc/mm/stab.c:282: error: (Each undeclared identifier is reported only once
arch/powerpc/mm/stab.c:282: error: for each function it appears in.)
make[1]: *** [arch/powerpc/mm/stab.o] Error 1
make: *** [arch/powerpc/mm] Error 2

Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
Acked-by: Stephen Rothwell <[EMAIL PROTECTED]>
---
 arch/powerpc/mm/stab.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

On Mon, 19 Nov 2007 11:56:11 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>
> Resubmitting the patch titled
> powerpc-iseries-build-failure-mm-stabc.patch in the -mm tree.

Paulus, this should be fine for merging like above.

Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]

diff --git a/arch/powerpc/mm/stab.c b/arch/powerpc/mm/stab.c
index 9e85bda..50448d5 100644
--- a/arch/powerpc/mm/stab.c
+++ b/arch/powerpc/mm/stab.c
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 struct stab_entry {
 	unsigned long esid_data;
--
1.5.3.5
[PATCH-RESEND] 2.6.24-rc2-mm1 powerpc (iseries)- build failure - mm/stab.c
Hi Stephen, Resubmitting the patch titled powerpc-iseries-build-failure-mm-stabc.patch in the -mm tree. Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]> -- --- linux-2.6.24-rc2/arch/powerpc/mm/stab.c 2007-11-07 03:27:46.0 +0530 +++ linux-2.6.24-rc2/arch/powerpc/mm/~stab.c2007-11-19 19:43:55.0 +0530 @@ -20,6 +20,7 @@ #include #include #include +#include struct stab_entry { unsigned long esid_data; -- Thanks & Regards, Kamalesh Babulal, Linux Technology Center, IBM, ISTL. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Sun, 18 Nov 2007, Jiri Slaby wrote: > > gives coretemp_cpu_callback -> coretemp_device_remove -> > > platform_device_unregister, so coretemp seems to be what I have and you > > don't. Yes. For the coretemp developers: coretemp_cpu_callback() needs to be more careful about what it does. During a system sleep transition (suspend, hibernate, resume) it isn't possible to register or unregister a device. Attempts to register will fail and attempts to unregister will block until the system sleep is over -- and for this callback that means hanging. It's not clear what the best way is to fix this. Perhaps the CPU notification should be sent along with a special flag indicating that the CPU transition is part of a system sleep (although this seems racy). Perhaps the driver should notice when a system sleep begins, and defer all CPU-change handling until after the sleep is over. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] 2.6.24-rc2-mm1 powerpc (iseries)- build failure - mm/stab.c
Hi Kamalesh,

On Wed, 14 Nov 2007 16:24:10 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>
> +#ifdef CONFIG_PPC_ISERIES
> +#include
> +#endif /* CONFIG_PPC_ISERIES */

You should not need the #ifdef and we prefer not to ifdef include unless
necessary.

Please resubmit.
--
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
Re: broken suspend [Was: 2.6.24-rc2-mm1]
Aah, we probably should let the coretemp people know. Whole thread:
http://marc.info/?t=11950720581=1=2

On 11/18/2007 08:09 PM, Jiri Slaby wrote:
> On 11/18/2007 06:07 PM, Alan Stern wrote:
>> You'll get more useful results if you redo your changes to
>> notifier_call_chain(). Have it print out the address of the routine
>> _before_ making the call, and don't limit it to 20. That way you'll
>> know exactly which notifier routine ends up hanging.
>
> The problem is that notifier_call_chain is called again and again, a
> zillion times, by somebody else...
>
> Anyway you led me to another idea:
> * _cpu_down
>         printk("%s: u\n", __func__);
>         BUBAK=1;
>         /* CPU is completely dead: tell everyone. Too late to complain. */
>         if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | 0x88000 | mod,
>                                     hcpu) == NOTIFY_BAD)
>                 BUG();
>         BUBAK=0;
> -
> * notifier_call_chain
>         unsigned int a = val & 0x88000;
>         unsigned int yes = a == 0x88000;
>
>         nb = rcu_dereference(*nl);
>
>         if (a && a != 0x88000)
>                 printk("Somebody calls with val: %lx\n", val);
>         else
>                 val &= ~0x88000;
>
>         while (nb && nr_to_call) {
>                 next_nb = rcu_dereference(nb->next);
>                 if (unlikely(BUBAK && yes))
>                         printk("%s: %p\n", __func__, nb->notifier_call);
>                 ret = nb->notifier_call(nb, val, v);
> -
> gives coretemp_cpu_callback -> coretemp_device_remove ->
> platform_device_unregister, so coretemp seems to be what I have and you don't.

Just in case you are curious:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang3.diff
produces:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang3.png
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 11:27 PM, Rafael J. Wysocki wrote:
> > You can use a global variable to switch the logging only before the CPU
> > hotunplug done by the suspend code. You just need to hack
> > disable_nonboot_cpus() for that.
>
> If I understand you correctly, that's what BUBAK variable is there for.

Ah, yes. -ETOOTIRED

> But it is still called again and again while the suspend code runs...

You can count the number of calls and then make it print the information for
the last, say, 20 of them.

Greetings,
Rafael
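Rafael's suggestion (keep only the last 20 calls instead of the first 20) can be sketched as a small ring buffer. This is a user-space illustration under stated assumptions, not code from the thread: the names `call_rec`, `call_log`, `call_log_record`, `call_log_snapshot`, `call_log_dump` and `LOG_DEPTH` are hypothetical, and a kernel version would store the `notifier_call` pointers and print with printk() rather than printf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_DEPTH 20            /* keep the last 20 calls, as suggested */

/* Hypothetical record of one notifier invocation. */
struct call_rec {
    const void *fn;             /* address of the callback that was invoked */
    unsigned long val;          /* event value it was called with */
};

struct call_log {
    struct call_rec slot[LOG_DEPTH];
    unsigned long total;        /* number of calls recorded so far */
};

/* Record one call, overwriting the oldest entry once the buffer is full. */
void call_log_record(struct call_log *cl, const void *fn, unsigned long val)
{
    assert(cl != NULL);
    cl->slot[cl->total % LOG_DEPTH].fn = fn;
    cl->slot[cl->total % LOG_DEPTH].val = val;
    cl->total++;
}

/*
 * Copy the retained entries, oldest first, into out[] (which must have room
 * for LOG_DEPTH records) and return how many were copied.
 */
size_t call_log_snapshot(const struct call_log *cl, struct call_rec *out)
{
    size_t n = cl->total < LOG_DEPTH ? (size_t)cl->total : LOG_DEPTH;
    unsigned long first = cl->total - n;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = cl->slot[(first + i) % LOG_DEPTH];
    return n;
}

/* Print the retained entries, oldest first. */
void call_log_dump(const struct call_log *cl)
{
    struct call_rec recs[LOG_DEPTH];
    size_t n = call_log_snapshot(cl, recs);
    size_t i;

    for (i = 0; i < n; i++)
        printf("call %lu: fn=%p val=%lx\n",
               (unsigned long)(cl->total - n + i),
               (void *)recs[i].fn, recs[i].val);
}
```

A caller would zero-initialise one `struct call_log`, call `call_log_record()` on every notifier invocation, and call `call_log_dump()` once from a point that is known to run. Note that a dump issued after the chain walk cannot help if the machine hangs inside a callback, which is why the tracing was later moved in front of the call.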
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On 11/18/2007 11:27 PM, Rafael J. Wysocki wrote:
> You can use a global variable to switch the logging only before the CPU
> hotunplug done by the suspend code. You just need to hack
> disable_nonboot_cpus() for that.

If I understand you correctly, that's what the BUBAK variable is there for. But
notifier_call_chain is still called again and again while the suspend code
runs...

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 06:07 PM, Alan Stern wrote:
> > You'll get more useful results if you redo your changes to
> > notifier_call_chain(). Have it print out the address of the routine
> > _before_ making the call, and don't limit it to 20. That way you'll
> > know exactly which notifier routine ends up hanging.
>
> The problem is, that notifier_call_chain is called again and again zillion
> times
> by somebody else...

You can use a global variable to switch the logging on only right before the
CPU hotunplug done by the suspend code. You just need to hack
disable_nonboot_cpus() for that.
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> On Nov 18, 2007 12:05 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > I've been staring at this NFS code for a while and can't make any sense
> > out of it. It seems to correctly initialize the waitqueue. So this would
> > indicate corruption of some sort.
>
> No, it does not "correctly" initialize the waitqueue. It doesn't even
> try to initialize it.
>
> I now found the guilty patch and what is wrong with it.
>
> nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:
>
> @@ -110,8 +112,22 @@ struct nfs_server {
>                                            filesystem */
>  #endif
>         void (*destroy)(struct nfs_server *);
> +
> +       atomic_t active; /* Keep trace of any activity to this server */
> +       wait_queue_head_t active_wq; /* Wait for any activity to stop */
>
> and tries to initialize it:
> @@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
>         server->namelen = data->namlen;
>         /* Create a client RPC handle for the NFSv3 ACL management interface */
>         nfs_init_server_aclclient(server);
> +
> +       init_waitqueue_head(&server->active_wq);
> +       atomic_set(&server->active, 0);
> +
>
> and then uses it via nfs_sb_active and nfs_sb_deactive:
>
> @@ -29,6 +29,7 @@ struct nfs_unlinkdata {
>  static void
>  nfs_free_unlinkdata(struct nfs_unlinkdata *data)
>  {
> +       nfs_sb_deactive(NFS_SERVER(data->dir));
>         iput(data->dir);
>         put_rpccred(data->cred);
>         kfree(data->args.name.name);
> @@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry *parent, struct inode *dir, struct n
>                 nfs_dec_sillycount(dir);
>                 return 0;
>         }
> +       nfs_sb_active(NFS_SERVER(dir));
>         data->args.fh = NFS_FH(dir);
>         nfs_fattr_init(&data->res.dir_attr);
>
>
> But it does not notice this:
> struct dentry_operations nfs_dentry_operations = {
>         .d_revalidate = nfs_lookup_revalidate,
>         .d_delete = nfs_dentry_delete,
>         .d_iput = nfs_dentry_iput,
> };
> struct dentry_operations nfs4_dentry_operations = {
>         .d_revalidate = nfs_open_revalidate,
>         .d_delete = nfs_dentry_delete,
>         .d_iput = nfs_dentry_iput,
> };
>
> NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> unlink and sillyrename logic.
> But they do not share nfs_init_server()!
>
> I wonder why this doesn't blow up more violently, but only hangs...
>
> But as I don't know if it is correct to add the workqueue
> initialization to nfs4_init_server() or remove the nfs_sb_active /
> nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
>
> Torsten

I had already fixed that one in my own stack. Attached are the 3 patches
that I've got. 1 from SteveD, 2 fixes.

Andrew, could you please unapply the sillyrename patches you've got, and
apply these 3 instead?

Trond

--- Begin Message ---
Added an active/deactive mechanism to the nfs_server structure
allowing async operations to hold off umount until the
operations are done.

Signed-off-by: Steve Dickson <[EMAIL PROTECTED]>
Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/client.c           |    4 ++++
 fs/nfs/super.c            |   13 +++++++++++++
 fs/nfs/unlink.c           |    2 ++
 include/linux/nfs_fs_sb.h |   17 +++++++++++++++++
 4 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 70587f3..2ecf726 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
        server->namelen = data->namlen;
        /* Create a client RPC handle for the NFSv3 ACL management interface */
        nfs_init_server_aclclient(server);
+
+       init_waitqueue_head(&server->active_wq);
+       atomic_set(&server->active, 0);
+
        dprintk("<-- nfs_init_server() = 0 [new %p]\n", clp);
        return 0;

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 71067d1..833aed8 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -202,6 +202,7 @@ static int nfs_get_sb(struct file_system_type *, int, const char *, void *, stru
 static int nfs_xdev_get_sb(struct file_system_type *fs_type, int flags,
                const char *dev_name, void *raw_data, struct vfsmount *mnt);
 static void nfs_kill_super(struct super_block *);
+static void nfs_put_super(struct super_block *);

 static struct file_system_type nfs_fs_type = {
        .owner = THIS_MODULE,
@@ -223,6 +224,7 @@ static const struct super_operations nfs_sops = {
        .alloc_inode    = nfs_alloc_inode,
        .destroy_inode  = nfs_destroy_inode,
        .write_inode    = nfs_write_inode,
+       .put_super      = nfs_put_super,
        .statfs         = nfs_statfs,
        .clear_inode    = nfs_clear_inode,
        .umount_begin   = nfs_umount_begin,
@@ -1772,6 +1774,17 @@ static void nfs4_kill_super(struct super_block *sb)
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On 11/18/2007 06:07 PM, Alan Stern wrote:
> You'll get more useful results if you redo your changes to
> notifier_call_chain(). Have it print out the address of the routine
> _before_ making the call, and don't limit it to 20. That way you'll
> know exactly which notifier routine ends up hanging.

The problem is that notifier_call_chain is called again and again, a zillion
times, by somebody else...

Anyway you led me to another idea:
* _cpu_down
        printk("%s: u\n", __func__);
        BUBAK=1;
        /* CPU is completely dead: tell everyone. Too late to complain. */
        if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | 0x88000 | mod,
                                    hcpu) == NOTIFY_BAD)
                BUG();
        BUBAK=0;
-
* notifier_call_chain
        unsigned int a = val & 0x88000;
        unsigned int yes = a == 0x88000;

        nb = rcu_dereference(*nl);

        if (a && a != 0x88000)
                printk("Somebody calls with val: %lx\n", val);
        else
                val &= ~0x88000;

        while (nb && nr_to_call) {
                next_nb = rcu_dereference(nb->next);
                if (unlikely(BUBAK && yes))
                        printk("%s: %p\n", __func__, nb->notifier_call);
                ret = nb->notifier_call(nb, val, v);
-
gives coretemp_cpu_callback -> coretemp_device_remove ->
platform_device_unregister, so coretemp seems to be what I have and you don't.
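The tagging trick in this message (OR a marker into the event value at the one call site of interest, then trace only chain walks that carry the marker) can be sketched in user-space C. This is a simplified illustration, not the kernel's notifier implementation: `struct chain`, `chain_call`, `noop_notifier` and `DBG_MARK` are hypothetical names, the chain is a plain NULL-terminated array, and the extra BUBAK gate from the original hack is left out.

```c
#include <assert.h>
#include <stdio.h>

#define DBG_MARK 0x88000UL      /* spare bits borrowed as a "trace me" tag */

typedef int (*notifier_fn)(unsigned long val);

/* Hypothetical, simplified chain: a NULL-terminated array of callbacks. */
struct chain {
    notifier_fn *fns;
    unsigned long traced;       /* callbacks logged during tagged walks */
    unsigned long foreign;      /* walks whose val only partly overlaps the tag */
    unsigned long last_val;     /* value the callbacks actually received */
};

/* Trivial callback used only for demonstration. */
int noop_notifier(unsigned long val)
{
    (void)val;
    return 0;
}

/*
 * Walk the chain. A walk is traced only if the caller set every bit of
 * DBG_MARK in val. The tag is stripped before callbacks see the value,
 * unless val overlaps the tag only partly, in which case some other caller
 * is using those bits and val is passed through untouched.
 */
int chain_call(struct chain *c, unsigned long val)
{
    unsigned long a = val & DBG_MARK;
    int tagged = (a == DBG_MARK);
    int ret = 0;
    notifier_fn *p;

    assert(c != NULL && c->fns != NULL);

    if (a && !tagged)
        c->foreign++;
    else
        val &= ~DBG_MARK;
    c->last_val = val;

    for (p = c->fns; *p != NULL; p++) {
        if (tagged) {
            /* Log before the call so a hanging callback is the last line. */
            printf("chain_call: calling entry %ld\n", (long)(p - c->fns));
            c->traced++;
        }
        ret = (*p)(val);
    }
    return ret;
}
```

Only the caller that passes `event | DBG_MARK` gets its walk logged; every other caller of `chain_call()` stays silent, which is what removes the flood of unrelated output described above.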
Re: [BUG] 2.6.24-rc2-mm1 - kernel bug on nfs v4
On Nov 18, 2007 12:05 AM, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> I've been staring at this NFS code for a while and can't make any sense
> out of it. It seems to correctly initialize the waitqueue. So this would
> indicate corruption of some sort.

No, it does not "correctly" initialize the waitqueue. It doesn't even
try to initialize it.

I now found the guilty patch and what is wrong with it.

nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:

@@ -110,8 +112,22 @@ struct nfs_server {
                                           filesystem */
 #endif
        void (*destroy)(struct nfs_server *);
+
+       atomic_t active; /* Keep trace of any activity to this server */
+       wait_queue_head_t active_wq; /* Wait for any activity to stop */

and tries to initialize it:
@@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
        server->namelen = data->namlen;
        /* Create a client RPC handle for the NFSv3 ACL management interface */
        nfs_init_server_aclclient(server);
+
+       init_waitqueue_head(&server->active_wq);
+       atomic_set(&server->active, 0);
+

and then uses it via nfs_sb_active and nfs_sb_deactive:

@@ -29,6 +29,7 @@ struct nfs_unlinkdata {
 static void
 nfs_free_unlinkdata(struct nfs_unlinkdata *data)
 {
+       nfs_sb_deactive(NFS_SERVER(data->dir));
        iput(data->dir);
        put_rpccred(data->cred);
        kfree(data->args.name.name);
@@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry *parent, struct inode *dir, struct n
                nfs_dec_sillycount(dir);
                return 0;
        }
+       nfs_sb_active(NFS_SERVER(dir));
        data->args.fh = NFS_FH(dir);
        nfs_fattr_init(&data->res.dir_attr);


But it does not notice this:
struct dentry_operations nfs_dentry_operations = {
        .d_revalidate = nfs_lookup_revalidate,
        .d_delete = nfs_dentry_delete,
        .d_iput = nfs_dentry_iput,
};
struct dentry_operations nfs4_dentry_operations = {
        .d_revalidate = nfs_open_revalidate,
        .d_delete = nfs_dentry_delete,
        .d_iput = nfs_dentry_iput,
};

NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
unlink and sillyrename logic.
But they do not share nfs_init_server()!

I wonder why this doesn't blow up more violently, but only hangs...

But as I don't know whether it is correct to add the waitqueue
initialization to nfs4_init_server() or to remove the nfs_sb_active /
nfs_sb_deactive calls for the NFSv4 case, I can't offer a patch to fix this.

Torsten
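The first of the two options Torsten names (make sure every setup path initialises the new fields) can be illustrated with a user-space sketch. It is only an analogy under stated assumptions: `struct server`, `server_init_common`, `server_init_v3`, `server_init_v4`, `server_get_active` and `server_put_active` are hypothetical names, a boolean stands in for the wait queue head, and nothing here claims to be what the patches that were eventually applied do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the fields the patch adds. */
struct server {
    int  active;        /* in-flight async operations */
    bool wq_ready;      /* stands in for an initialised wait queue head */
    int  version;       /* protocol version this server was set up for */
};

/* State that every protocol version needs; both setup paths call this. */
void server_init_common(struct server *s)
{
    s->active = 0;
    s->wq_ready = true;
}

void server_init_v3(struct server *s)
{
    server_init_common(s);
    s->version = 3;
}

void server_init_v4(struct server *s)
{
    server_init_common(s);  /* the step missing from the v4 path, per the analysis above */
    s->version = 4;
}

/* Shared by all versions, like the common unlink path. */
void server_get_active(struct server *s)
{
    assert(s->wq_ready);    /* catches use of a server that skipped init */
    s->active++;
}

/* Returns true when the last user is gone and waiters should be woken. */
bool server_put_active(struct server *s)
{
    assert(s->wq_ready && s->active > 0);
    return --s->active == 0;
}
```

Because the get/put helpers are reached from code shared by all versions, putting the initialisation in one helper that every version-specific setup function calls removes the chance of one path forgetting it. The assertions make a skipped initialisation fail loudly in this sketch, which the real, uninitialised wait queue did not do.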
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Sun, 18 Nov 2007, Jiri Slaby wrote:

> On 11/18/2007 04:23 PM, Rafał J. Wysocki wrote:
> > On Sunday, 18 of November 2007, Jiri Slaby wrote:
> >> On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
> >>> Can you also make the new System-map available, please?
> >> Sure:
> >> http://www.fi.muni.cz/~xslaby/sklad/System.map1
> >
> > The last notifier called in
> > http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png
>
> Last... Note, that it's only first 20 invocations of notifiers, there are
> bazillion of them when I remove the condition '< 20'.
>
> > is apparently cpu_swap_callback() which is not called in
> > http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png .
> >
> > Can you verify that cpu_swap_callback() gets called if the patch is not
> > applied?
>
> Does this still apply?

You'll get more useful results if you redo your changes to
notifier_call_chain(). Have it print out the address of the routine
_before_ making the call, and don't limit it to 20. That way you'll
know exactly which notifier routine ends up hanging.

Alan Stern
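Why logging before the call matters can be shown with a small user-space sketch. The names `step_fn`, `struct trace`, `run_steps`, `step_ok` and `step_stuck` are hypothetical, and a non-zero return value merely stands in for a callback that never returns; a real hang cannot be simulated this way, so treat it as an illustration only.

```c
#include <assert.h>
#include <stdio.h>

typedef int (*step_fn)(void);

/* What the log would show at the moment the machine stops responding. */
struct trace {
    int last_started;   /* index logged before the call */
    int last_finished;  /* index logged after the call returned */
};

/* Demo callbacks: one that returns normally and one that "hangs". */
int step_ok(void)    { return 0; }
int step_stuck(void) { return 1; }

/*
 * Run each step, logging its index before calling it. Returns 0 if all
 * steps completed, or -1 as soon as one step "hangs".
 */
int run_steps(step_fn *steps, int n, struct trace *t)
{
    int i;

    assert(steps != NULL && t != NULL);
    for (i = 0; i < n; i++) {
        t->last_started = i;
        fprintf(stderr, "calling step %d\n", i);    /* emitted even if we hang */
        if (steps[i]() != 0)
            return -1;                              /* simulated hang */
        t->last_finished = i;
        fprintf(stderr, "step %d returned\n", i);
    }
    return 0;
}
```

With a log written only after each call, the last visible entry names the step before the culprit. With the log written first, `last_started` names the hanging step directly, which is the point of the advice above.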
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On 11/18/2007 04:23 PM, Rafał J. Wysocki wrote:
> On Sunday, 18 of November 2007, Jiri Slaby wrote:
>> On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
>>> Can you also make the new System-map available, please?
>> Sure:
>> http://www.fi.muni.cz/~xslaby/sklad/System.map1
>
> The last notifier called in http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png

Last... Note that these are only the first 20 invocations of notifiers; there
are a bazillion of them when I remove the condition '< 20'.

> is apparently cpu_swap_callback() which is not called in
> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png .
>
> Can you verify that cpu_swap_callback() gets called if the patch is not
> applied?

Does this still apply?
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
> > Can you also make the new System-map available, please?
>
> Sure:
> http://www.fi.muni.cz/~xslaby/sklad/System.map1

The last notifier called in http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png
is apparently cpu_swap_callback() which is not called in
http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png .

Can you verify that cpu_swap_callback() gets called if the patch is not
applied?
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On 11/18/2007 04:03 PM, Rafael J. Wysocki wrote:
> Can you also make the new System-map available, please?

Sure:
http://www.fi.muni.cz/~xslaby/sklad/System.map1
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 02:42 PM, Rafael J. Wysocki wrote:
> > On Sunday, 18 of November 2007, Jiri Slaby wrote:
> >> On 11/18/2007 01:42 PM, Jiri Slaby wrote:
> >>> See shot of prints here:
> >>> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png
> >> BTW output from that tree minus the patch:
> >
> > Hm, it looks like one of the CPU hotplug notifiers is doing something wrong.
> >
> > Can you try to see what happens (with the patch applied) if
> > thermal_throttle_cpu_callback() is not registered?
>
> After commenting out
> //device_initcall(thermal_throttle_init_device);
> it looks like this:
> http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png

Can you also make the new System-map available, please?
[BUG] Oops in 2.6.24-rc2-mm1
First of all, a disclaimer: I am new to the kernel and this is my first
report, so I may make mistakes. When in doubt, feel free to assume that the
fault lies with me or my system.

A computer I have crashes during the boot process. My .config is attached; I
generated it with oldconfig from a 24-rc2 kernel, plus a few changes made with
menuconfig.

Because I was not able to get a picture or the text of the oops, I have
written down the most important parts. A lot of it varies between reboots.
I had to use reboot=1000 in my kernel command line to catch it, because the
computer would power off at once without it.

My kernel is not tainted by blobs.
The oops message has either [#1] or [#2] in it.
The pid it reports varies.
The call trace looks totally uninformative. For each chunk only the part in
the [] changes; the other bit is always <0>.

Finally I get this line:
usb usb5: suspend_rh (auto-stop)
After this all output stops.

I'm not subscribed to the list, so please CC me.
If I have made any mistakes in this report, please tell me.
Thank you.
Karol Swietlicki

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc2-mm1
# Sun Nov 18 13:31:45 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
# CONFIG_GENERIC_GPIO is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CGROUPS is not set
# CONFIG_FAIR_GROUP_SCHED is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
# CONFIG_KALLSYMS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
# CONFIG_BUG is not set
# CONFIG_ELF_CORE is not set
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROC_PAGE_MONITOR is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_RDC321X is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MCORE2=y
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On 11/18/2007 02:42 PM, Rafael J. Wysocki wrote:
> On Sunday, 18 of November 2007, Jiri Slaby wrote:
>> On 11/18/2007 01:42 PM, Jiri Slaby wrote:
>>> See shot of prints here:
>>> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png
>> BTW output from that tree minus the patch:
>
> Hm, it looks like one of the CPU hotplug notifiers is doing something wrong.
>
> Can you try to see what happens (with the patch applied) if
> thermal_throttle_cpu_callback() is not registered?

After commenting out
//device_initcall(thermal_throttle_init_device);
it looks like this:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang2.png

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On Sunday, 18 of November 2007, Jiri Slaby wrote:
> On 11/18/2007 01:42 PM, Jiri Slaby wrote:
> > See shot of prints here:
> > http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png
>
> BTW output from that tree minus the patch:

Hm, it looks like one of the CPU hotplug notifiers is doing something wrong.

Can you try to see what happens (with the patch applied) if
thermal_throttle_cpu_callback() is not registered?

> _cpu_down: s
> _cpu_down: t
> CPU 1 is now offline
> SMP alternatives: switching to UP code
> _cpu_down: u
> notifier_call_chain: c 80232370 1
> notifier_call_chain: c 8026EF10 1
> notifier_call_chain: c 8024B8F0 1
> notifier_call_chain: c 802419E0 1
> notifier_call_chain: c 80255B50 1
> notifier_call_chain: c 80250C40 1
> notifier_call_chain: c 8028E8F0 1
> notifier_call_chain: c 802B59C0 1
> notifier_call_chain: c 80323460 1
> notifier_call_chain: c 80270990 0
> notifier_call_chain: c 8023D5D0 1
> notifier_call_chain: c 80266090 1
> notifier_call_chain: c 802320A0 1
> notifier_call_chain: c 80249DA0 1
> notifier_call_chain: c 80318440 1
> notifier_call_chain: c 8047BE80 1
> notifier_call_chain: c 80212F40 0
> notifier_call_chain: c 80216350 1
> notifier_call_chain: c 80217220 1
> notifier_call_chain: c 80218120 1
> _cpu_down: v
> _cpu_down: w
> _cpu_down: x
> _cpu_down: y
> _cpu_down: z
> disable_nonboot_cpus: 3 0
Re: broken suspend [Was: 2.6.24-rc2-mm1]
On 11/18/2007 01:42 PM, Jiri Slaby wrote:
> See shot of prints here:
> http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png

BTW output from that tree minus the patch:
_cpu_down: s
_cpu_down: t
CPU 1 is now offline
SMP alternatives: switching to UP code
_cpu_down: u
notifier_call_chain: c 80232370 1
notifier_call_chain: c 8026EF10 1
notifier_call_chain: c 8024B8F0 1
notifier_call_chain: c 802419E0 1
notifier_call_chain: c 80255B50 1
notifier_call_chain: c 80250C40 1
notifier_call_chain: c 8028E8F0 1
notifier_call_chain: c 802B59C0 1
notifier_call_chain: c 80323460 1
notifier_call_chain: c 80270990 0
notifier_call_chain: c 8023D5D0 1
notifier_call_chain: c 80266090 1
notifier_call_chain: c 802320A0 1
notifier_call_chain: c 80249DA0 1
notifier_call_chain: c 80318440 1
notifier_call_chain: c 8047BE80 1
notifier_call_chain: c 80212F40 0
notifier_call_chain: c 80216350 1
notifier_call_chain: c 80217220 1
notifier_call_chain: c 80218120 1
_cpu_down: v
_cpu_down: w
_cpu_down: x
_cpu_down: y
_cpu_down: z
disable_nonboot_cpus: 3 0

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: broken suspend [Was: 2.6.24-rc2-mm1]
Alan Stern wrote:
> On Sat, 17 Nov 2007, Rafael J. Wysocki wrote:
>> On Saturday, 17 of November 2007, Jiri Slaby wrote:
>>> On 11/16/2007 05:10 PM, Alan Stern wrote:
>>>> The thing to do is figure out which driver is causing the problem.
>>>> Jiri, try enabling CONFIG_DEBUG_DRIVER.
>>> Sadly no output.

Nice: update scripts wiped kern.* out of the syslog config file, hence no
output before.

> Back to the main topic... My system hibernates and resumes with no
> apparent problem. Jiri, it looks like you'll have to do some debug
> tracing of the routines in drivers/base/power/main.c.

Besides these two, nothing strange:
dpm_suspend: b 00:06
WARNING: at /home/l/latest/bughunt/kernel/resource.c:185 __release_resource()

Call Trace:
 [] release_resource+0xb5/0xf0
 [] pnp_release_resources+0x70/0x130
 [] pnp_stop_dev+0x45/0x90
 [] pnp_bus_suspend+0x92/0xb0
 [] suspend_device+0x113/0x180
 [] device_suspend+0x200/0x320
 [] suspend_devices_and_enter+0xa5/0x170
 [] enter_state+0x209/0x270
 [] state_store+0xaf/0xf0
 [] kobj_attr_store+0x17/0x20
 [] sysfs_write_file+0xce/0x140
 [] vfs_write+0xc7/0x170
 [] sys_write+0x50/0x90
 [] system_call+0x7e/0x83

WARNING: at /home/l/latest/bughunt/kernel/resource.c:189 __release_resource()

Call Trace:
 [] release_resource+0xe0/0xf0
 [] pnp_release_resources+0x70/0x130
 [] pnp_stop_dev+0x45/0x90
 [] pnp_bus_suspend+0x92/0xb0
 [] suspend_device+0x113/0x180
 [] device_suspend+0x200/0x320
 [] suspend_devices_and_enter+0xa5/0x170
 [] enter_state+0x209/0x270
 [] state_store+0xaf/0xf0
 [] kobj_attr_store+0x17/0x20
 [] sysfs_write_file+0xce/0x140
 [] vfs_write+0xc7/0x170
 [] sys_write+0x50/0x90
 [] system_call+0x7e/0x83

...
dpm_suspend: b :00:1f.5
ACPI Error (psargs-0355): [FZHD] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed
[\_SB_.PCI0.SAT1.CHN0._GTM] (Node 81007D000220), AE_NOT_FOUND
ACPI Error (psargs-0355): [FZHD] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0537): Method parse/execution failed
[\_SB_.PCI0.SAT1.CHN1._GTM] (Node 81007D000360), AE_NOT_FOUND

It's stuck in _cpu_down (enter_state -> suspend_devices_and_enter ->
disable_nonboot_cpus -> _cpu_down) after calling raw_notifier_call_chain:

        printk("%s: s\n", __func__);
        /* Wait for it to sleep (leaving idle task). */
        while (!idle_cpu(cpu))
                yield();
        printk("%s: t\n", __func__);

        /* This actually kills the CPU. */
        __cpu_die(cpu);
        printk("%s: u\n", __func__);

        BUBAK=1;
        /* CPU is completely dead: tell everyone. Too late to complain. */
        if (raw_notifier_call_chain(&cpu_chain, CPU_DEAD | mod,
                                    hcpu) == NOTIFY_BAD)
                BUG();
        BUBAK=0;
        printk("%s: v\n", __func__);

See shot of prints here:
http://www.fi.muni.cz/~xslaby/sklad/susp_hang1.png

notifier_call_chain looks like:
        while (nb && nr_to_call) {
                next_nb = rcu_dereference(nb->next);
                ret = nb->notifier_call(nb, val, v);
                if (unlikely(BUBAK && cnt < 20 && (ret != lastr ||
                                lastp != nb->notifier_call))) {
                        printk("%s: c %p %d\n", __func__, nb->notifier_call, ret);
                        lastr = ret;
                        lastp = nb->notifier_call;
                        cnt++;
                }
                if (nr_calls)
                        (*nr_calls)++;
                if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
                        break;
                nb = next_nb;
                nr_to_call--;
        }

System.map is here, if you are curious what the pointers from the snapshot are:
http://www.fi.muni.cz/~xslaby/sklad/System.map

regards,
--
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz