Re: lockdep problem conversion semaphore->mutex (dev->sem)
Peter Zijlstra wrote, On 12/08/2007 09:50 PM: > On Sat, 2007-12-08 at 21:33 +0100, Remy Bohmer wrote: > >> Which problems? I did not see any special things, it looked rather >> straight forward. What have I overlooked? > > On suspend it locks the whole device tree, this means it has 'unbounded' > nesting and holds an 'unbounded' number of locks. Neither things are > easy to annotate (remember that mutex_lock_nested can handle up to 8 > nestings and current->held_locks has a max of 30). > > In fact, converting this will be the hardest part, it would require > reworking the locking and introduction of a hard limit on the device > tree depth - this might upset some people, but I suspect that 16 or 24 > should be deep enough for pretty much anything. Of course, if people > prove me wrong, I'll have to reconsider. The up-side of the locking > scheme I'm thinking of will be that locking the whole tree will only > take 'depth' number of opterations vs the total number of tree elements. Of course, I don't know the problem enough, and would be glad if somebody give me a hint, but I wonder why so deep nesting with lockdep's modification is necessary here? Does buses have parent buses and so on x8? Why isn't here considered creating of different lockdep classes according to types of buses and devices "the usual way"? This way seems to be quite easy in later debugging. Regards, Jarek P. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
Peter, Thanks for this clear answer. Remy 2007/12/8, Peter Zijlstra <[EMAIL PROTECTED]>: > > On Sat, 2007-12-08 at 21:33 +0100, Remy Bohmer wrote: > > > Which problems? I did not see any special things, it looked rather > > straight forward. What have I overlooked? > > On suspend it locks the whole device tree, this means it has 'unbounded' > nesting and holds an 'unbounded' number of locks. Neither things are > easy to annotate (remember that mutex_lock_nested can handle up to 8 > nestings and current->held_locks has a max of 30). > > In fact, converting this will be the hardest part, it would require > reworking the locking and introduction of a hard limit on the device > tree depth - this might upset some people, but I suspect that 16 or 24 > should be deep enough for pretty much anything. Of course, if people > prove me wrong, I'll have to reconsider. The up-side of the locking > scheme I'm thinking of will be that locking the whole tree will only > take 'depth' number of opterations vs the total number of tree elements. > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
On Sat, 2007-12-08 at 21:33 +0100, Remy Bohmer wrote: > Which problems? I did not see any special things, it looked rather > straight forward. What have I overlooked? On suspend it locks the whole device tree, this means it has 'unbounded' nesting and holds an 'unbounded' number of locks. Neither things are easy to annotate (remember that mutex_lock_nested can handle up to 8 nestings and current->held_locks has a max of 30). In fact, converting this will be the hardest part, it would require reworking the locking and introduction of a hard limit on the device tree depth - this might upset some people, but I suspect that 16 or 24 should be deep enough for pretty much anything. Of course, if people prove me wrong, I'll have to reconsider. The up-side of the locking scheme I'm thinking of will be that locking the whole tree will only take 'depth' number of opterations vs the total number of tree elements. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
Hello Ingo, > no, you are wrong. If you want to do complex locking, you can still do > it: take a look at the dev->sem conversion patches from Peter which > correctly do this. Lockdep has all the facilities for that. > (you just dont know about them) Ok. > the general policy message here is: do not implement complex locking. It > hurts. It's hard to maintain. It's easy to mess up and leads to bugs. Agree > Lockdep just makes that plain obvious. > Your mail and your frustration shows this general concept in happy Please, do not see it as frustration, because it isn't... I just want to understand it better. > action: judging from your comments you have little clue about dev->sem > locking and its implications and you'd happily go along and pollute the > kernel with complex and hard to maintain nested locking constructs. Now, you assume that i would _like_ complex locking. This is not true. I just want to understand what was so wrong with dev->sem, I even read in the discussions before about dev->sem, that it still was on the old semaphore interface to get around lockdep issues, and _that_ is wrong. That is bug-hiding from either the code or the tool. I just wanted to understand if this was a lockdep bug, or a real code bug. > Lockdep prevents you from doing it mindlessly, it _forces_ you to first > understand the data structures, their locking and their relationship > with each other. Then you can implement complexity, if you still want > it. > > That, Sir, is a Good Thing (tm). Completely agree. Remy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
Hello Peter, > And while you might not see it in-tree anymore, lockdep does help out > tremendously while developing new code. I'm sure that without it the > locking would be in a much worse state than it is today. I am not arguing that, I am also convinced it has done a good job. > I have a good idea on how to annotate this, but not yet the time to do > so. Sounds good... > Aside from the nesting problems, the dev->sem has other problems as > well. Converting this is rather non-trivial. Which problems? I did not see any special things, it looked rather straight forward. What have I overlooked? > I'd not put it as harshly as you put it though, lockdep makes some > assumptions which can lead to false positives - By putting it this black and white, it usually helps to get all the opinions clear ;-) (By staying in the middle, everybody usually tend to agree ;-) > otoh these assumptions > often end up pointing out 'curious' locking coupled to 'curious' data > structures. And fixing up these things often leads to better and simpler > code. > The emphasis is on often, this is one of the cases where this is not so. > So while it does restain the creativity of locking it often ends up > being for the better. Ack. Kind Regards, Remy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
* Remy Bohmer <[EMAIL PROTECTED]> wrote: > But... now we do not transfer the dev->sem to a mutex, because lockdep > will start generating false positives, and thus we mask the lockdep > error, by not converting the dev->sem to what it really is... no, you are wrong. If you want to do complex locking, you can still do it: take a look at the dev->sem conversion patches from Peter which correctly do this. Lockdep has all the facilities for that. (you just dont know about them) Currently there are 4459 critical sections in the kernel that use mutexes and which work fine with lockdep. the general policy message here is: do not implement complex locking. It hurts. It's hard to maintain. It's easy to mess up and leads to bugs. Lockdep just makes that plain obvious. Your mail and your frustration shows this general concept in happy action: judging from your comments you have little clue about dev->sem locking and its implications and you'd happily go along and pollute the kernel with complex and hard to maintain nested locking constructs. Lockdep prevents you from doing it mindlessly, it _forces_ you to first understand the data structures, their locking and their relationship with each other. Then you can implement complexity, if you still want it. That, Sir, is a Good Thing (tm). Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
On Sat, 2007-12-08 at 20:52 +0100, Remy Bohmer wrote: > Hello Peter and Daniel, > > > Yeah, it are different lock instances, however by virtue of sharing the > > same lock init site, they belong to the same lock class. Lockdep works > > by tracking class dependancies, not instance dependancies. > > The device and its parent both indeed have different > pointers/instances. I saw that during debugging yesterday, so I > already expected this was not really a bug, but a false positive of > lockdep. > > > By generalizing to classes it can detect locking errors before they > > actually occur. > > Basically that is a good thing... > But... now we do not transfer the dev->sem to a mutex, because lockdep > will start generating false positives, and thus we mask the lockdep > error, by not converting the dev->sem to what it really is... > > This does give me a bad feeling about this... In short, we are > implementing workarounds in good code code to hide bugs in > debug-tooling... ?! Its not a bug, its a feature! :-) > So, any suggestions on how to fix lockdep? Anyone some good hints > where I can start? > Is it that fundamental to lockdep that it cannot be fixed without > breaking it, or do we have to instrument the code that tells lockdep > to ignore this particular mutex? I have a good idea on how to annotate this, but not yet the time to do so. Aside from the nesting problems, the dev->sem has other problems as well. Converting this is rather non-trivial. I'd not put it as harshly as you put it though, lockdep makes some assumptions which can lead to false positives - otoh these assumptions often end up pointing out 'curious' locking coupled to 'curious' data structures. And fixing up these things often leads to better and simpler code. The emphasis is on often, this is one of the cases where this is not so. So while it does restain the creativity of locking it often ends up being for the better. And while you might not see it in-tree anymore, lockdep does help out tremendously while developing new code. I'm sure that without it the locking would be in a much worse state than it is today. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
Hello Peter and Daniel, > Yeah, it are different lock instances, however by virtue of sharing the > same lock init site, they belong to the same lock class. Lockdep works > by tracking class dependancies, not instance dependancies. The device and its parent both indeed have different pointers/instances. I saw that during debugging yesterday, so I already expected this was not really a bug, but a false positive of lockdep. > By generalizing to classes it can detect locking errors before they > actually occur. Basically that is a good thing... But... now we do not transfer the dev->sem to a mutex, because lockdep will start generating false positives, and thus we mask the lockdep error, by not converting the dev->sem to what it really is... This does give me a bad feeling about this... In short, we are implementing workarounds in good code code to hide bugs in debug-tooling... ?! So, any suggestions on how to fix lockdep? Anyone some good hints where I can start? Is it that fundamental to lockdep that it cannot be fixed without breaking it, or do we have to instrument the code that tells lockdep to ignore this particular mutex? Kind Regards, Remy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
On Sat, 2007-12-08 at 08:53 -0800, Daniel Walker wrote: > On Sat, 2007-12-08 at 13:16 +0100, Peter Zijlstra wrote: > > On Sat, 2007-12-08 at 00:02 +0100, Remy Bohmer wrote: > > > Hello Peter, > > > > > > > > What specifically is wrong with dev->sem ? > > > > > > > > Nothing really, other than that they use semaphores to avoid lockdep :-/ > > > > > > > > I think I know how to annotate this, after Alan Stern explained all the > > > > use cases, but I haven't come around to implementing it. Hope to do that > > > > soonish. > > > > > > I was looking for an easy semaphore I could convert to a mutex, and I > > > ran into one that was widely spread and interesting, and which seemed > > > quite doable at first sight. > > > So, I started working on it, but was forgotten this discussion, (until > > > Daniel made me remember it this afternoon). So, I (stupid me ;-) ) > > > tried to convert dev->sem... > > > > > > After doing the monkey part of the conversion I can boot the kernel > > > completely on X86 and ARM, and everything works fine, except after > > > enabling lockdep, lockdep starts complaining... > > > > > > Is this the problem you were pointing at? > > > > Yeah, one of the interesting nestings :-) > > It must be the locking in __driver_attach(), taking dev->parent->sem > then taking dev->sem .. Assuming those are different structures, why > does lockdep trigger? They aren't different, parent is a struct device again. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
On Sat, 2007-12-08 at 09:06 -0800, Daniel Walker wrote: > On Sat, 2007-12-08 at 18:11 +0100, Peter Zijlstra wrote: > > > > It must be the locking in __driver_attach(), taking dev->parent->sem > > > then taking dev->sem .. Assuming those are different structures, why > > > does lockdep trigger? > > > > They aren't different, parent is a struct device again. > > It's different memory tho .. I wasn't sure how to term that .. The locks > are in two different memory location so it couldn't be recursive .. I'm > only asking for my own understanding .. I don't mind inspecting > potentially bad locking .. Yeah, it are different lock instances, however by virtue of sharing the same lock init site, they belong to the same lock class. Lockdep works by tracking class dependancies, not instance dependancies. By generalizing to classes it can detect locking errors before they actually occur. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
On Sat, 2007-12-08 at 18:11 +0100, Peter Zijlstra wrote: > > It must be the locking in __driver_attach(), taking dev->parent->sem > > then taking dev->sem .. Assuming those are different structures, why > > does lockdep trigger? > > They aren't different, parent is a struct device again. It's different memory tho .. I wasn't sure how to term that .. The locks are in two different memory location so it couldn't be recursive .. I'm only asking for my own understanding .. I don't mind inspecting potentially bad locking .. Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
On Sat, 2007-12-08 at 13:16 +0100, Peter Zijlstra wrote: > On Sat, 2007-12-08 at 00:02 +0100, Remy Bohmer wrote: > > Hello Peter, > > > > > > What specifically is wrong with dev->sem ? > > > > > > Nothing really, other than that they use semaphores to avoid lockdep :-/ > > > > > > I think I know how to annotate this, after Alan Stern explained all the > > > use cases, but I haven't come around to implementing it. Hope to do that > > > soonish. > > > > I was looking for an easy semaphore I could convert to a mutex, and I > > ran into one that was widely spread and interesting, and which seemed > > quite doable at first sight. > > So, I started working on it, but was forgotten this discussion, (until > > Daniel made me remember it this afternoon). So, I (stupid me ;-) ) > > tried to convert dev->sem... > > > > After doing the monkey part of the conversion I can boot the kernel > > completely on X86 and ARM, and everything works fine, except after > > enabling lockdep, lockdep starts complaining... > > > > Is this the problem you were pointing at? > > Yeah, one of the interesting nestings :-) It must be the locking in __driver_attach(), taking dev->parent->sem then taking dev->sem .. Assuming those are different structures, why does lockdep trigger? Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: lockdep problem conversion semaphore->mutex (dev->sem)
On Sat, 2007-12-08 at 00:02 +0100, Remy Bohmer wrote: > Hello Peter, > > > > What specifically is wrong with dev->sem ? > > > > Nothing really, other than that they use semaphores to avoid lockdep :-/ > > > > I think I know how to annotate this, after Alan Stern explained all the > > use cases, but I haven't come around to implementing it. Hope to do that > > soonish. > > I was looking for an easy semaphore I could convert to a mutex, and I > ran into one that was widely spread and interesting, and which seemed > quite doable at first sight. > So, I started working on it, but was forgotten this discussion, (until > Daniel made me remember it this afternoon). So, I (stupid me ;-) ) > tried to convert dev->sem... > > After doing the monkey part of the conversion I can boot the kernel > completely on X86 and ARM, and everything works fine, except after > enabling lockdep, lockdep starts complaining... > > Is this the problem you were pointing at? Yeah, one of the interesting nestings :-) > I tried debugging it, and I have not found a recursive mutex locking > so far, only locking of 2 different mutexes in a row prior to this > warning, which IMO should be valid. > > What is your opinion? Yeah, the locking is all valid afaics, its just that it needs some interesting annotations to make lockdep see it that way. > BTW: I attached my patch for dev->sem as I have it now, that generates > this lockdep warning ( for if you want to look at it yourself also, so > you do not have to do the monkey part yourself anymore ;-) I have a similar patch floating around, but thanks anyway :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
lockdep problem conversion semaphore->mutex (dev->sem)
Hello Peter, > > What specifically is wrong with dev->sem ? > > Nothing really, other than that they use semaphores to avoid lockdep :-/ > > I think I know how to annotate this, after Alan Stern explained all the > use cases, but I haven't come around to implementing it. Hope to do that > soonish. I was looking for an easy semaphore I could convert to a mutex, and I ran into one that was widely spread and interesting, and which seemed quite doable at first sight. So, I started working on it, but was forgotten this discussion, (until Daniel made me remember it this afternoon). So, I (stupid me ;-) ) tried to convert dev->sem... After doing the monkey part of the conversion I can boot the kernel completely on X86 and ARM, and everything works fine, except after enabling lockdep, lockdep starts complaining... Is this the problem you were pointing at? === Setting up standard PCI resources ACPI: EC: Look up EC in DSDT ACPI: Interpreter enabled ACPI: (supports S0 S1 S3 S4 S5) ACPI: Using IOAPIC for interrupt routing = [ INFO: possible recursive locking detected ] 2.6.24-rc4 #5 - swapper/1 is trying to acquire lock: (&dev->mut){--..}, at: [] __driver_attach+0x2c/0x5f but task is already holding lock: (&dev->mut){--..}, at: [] __driver_attach+0x1d/0x5f other info that might help us debug this: 1 lock held by swapper/1: #0: (&dev->mut){--..}, at: [] __driver_attach+0x1d/0x5f stack backtrace: Pid: 1, comm: swapper Not tainted 2.6.24-rc4 #5 [] __lock_acquire+0x17b/0xb83 [] trace_hardirqs_on+0x11a/0x13d [] lock_acquire+0x5f/0x77 [] __driver_attach+0x2c/0x5f [] mutex_lock_nested+0x105/0x26b [] __driver_attach+0x2c/0x5f [] __driver_attach+0x2c/0x5f [] __driver_attach+0x0/0x5f [] __driver_attach+0x2c/0x5f [] bus_for_each_dev+0x3a/0x5c [] driver_attach+0x16/0x18 [] __driver_attach+0x0/0x5f [] bus_add_driver+0x6d/0x19a [] acpi_ec_init+0x33/0x51 [] kernel_init+0x148/0x2af [] kernel_init+0x0/0x2af [] kernel_init+0x0/0x2af [] kernel_thread_helper+0x7/0x10 === ACPI: PCI Root Bridge [PCI0] (:00) Force enabled HPET at base address 0xfed0 === I tried debugging it, and I have not found a recursive mutex locking so far, only locking of 2 different mutexes in a row prior to this warning, which IMO should be valid. What is your opinion? BTW: I attached my patch for dev->sem as I have it now, that generates this lockdep warning ( for if you want to look at it yourself also, so you do not have to do the monkey part yourself anymore ;-) Kind Regards, Remy patch_replace_sem_by_mutex_in_device_h Description: Binary data