Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On 03/09/2015 05:36 PM, Steven Rostedt wrote:
> BTW, I'm going to start with 3.18-rt1 and see what's been added to the
> other -rt updates. If there's something I need that was added to
> 3.18-rt1 can you let me know. That is, if it wasn't marked with a
> stable-rt tag. My scripts will find those.

I tried to create new patch files for the new things and mark them stable where possible. I didn't mark the "simple-work" patch (a workqueue based on swait, to avoid a new kernel thread for each of those places where we need to schedule a workqueue or something like that from atomic context, like MCE) with Cc: stable. The patch where I reverted this timer thing is Revert-timers-do-not-raise-softirq-unconditionally.patch; it was needed because hrtimer was broken on 3.18, and it has no stable tag. I'm not sure about the multi-queue block stuff (*-mq-* in the queue); you might want to take a look. I updated a few patches because I managed to create deadlocks. I'm not sure whether the deadlock was already possible in v3.14 and I simply didn't trigger it, or whether the code changed and it became possible. I remember that the multi-queue block code entered in v3.12 but was unused, and starting with v3.14 it gained a user.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On Fri, 6 Mar 2015 13:19:21 +0100 Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2015-02-26 09:06:10 [-0500]:
> > If we can pull that off and remove all rtmutex trylocks from hardirq
> > context, I would much rather do that.
> >
> > This hocus pocus coding is just going to lead us down the path of the
> > black arts. I already have a black cat, so I'm good to go.
>
> Okay. So what is the fix for v3.14 and other trees where the rtmutex
> rework has been integrated? Revert the optimization like I did in
> v3.18?

Probably. I'm going to start looking at what's in 3.18 and start porting changes to 3.14 and earlier. I've only been updating the stable kernels to pull in the mainline stable updates. I haven't reverted anything (unless mainline stable caused things to become obsolete), nor have I added any updates.

BTW, I'm going to start with 3.18-rt1 and see what's been added to the other -rt updates. If there's something I need that was added to 3.18-rt1 can you let me know. That is, if it wasn't marked with a stable-rt tag. My scripts will find those.

-- Steve
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
* Steven Rostedt | 2015-02-26 09:06:10 [-0500]:
> If we can pull that off and remove all rtmutex trylocks from hardirq
> context, I would much rather do that.
>
> This hocus pocus coding is just going to lead us down the path of the
> black arts. I already have a black cat, so I'm good to go.

Okay. So what is the fix for v3.14 and other trees where the rtmutex rework has been integrated? Revert the optimization like I did in v3.18?

Sebastian
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On Thu, 26 Feb 2015 14:56:30 +0100 Sebastian Andrzej Siewior wrote:
> I am not sure if we want to keep doing that. The only reason why we grab
> the lock in the first place was to check if there is a timer pending
> while we run on an isolated CPU. It should not matter for the other CPUs,
> right?
> So instead of going further down that road, what about storing
> base->next_timer someplace so it can be obtained via atomic_read() for
> the isolated CPUs?

If we can pull that off and remove all rtmutex trylocks from hardirq context, I would much rather do that.

This hocus pocus coding is just going to lead us down the path of the black arts. I already have a black cat, so I'm good to go.

-- Steve
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
* Steven Rostedt | 2015-02-23 19:57:43 [-0500]:
> On Mon, 23 Feb 2015 17:16:27 -0700
> Thavatchai Makphaibulchoke wrote:
> > If I'm not mistaken, another reason could also be due to the rate of the
> > timer interrupt; in the case that the mutex is highly contended, the IH
> > could stall the non-real-time requester for a long time, even to the
> > point that the cpu is perceived as hung.
>
> Perhaps we should just have trylocks fail if there are other waiters.
> As it won't slow down the high priority task. And doing that would
> probably even help other rt tasks that are blocked on the lock waiting
> for a release. Why make those tasks wait more if even a higher priority
> task is simply doing a trylock and can safely fail it. At least we
> could do that if the task trying to get the lock is an interrupt.

What happened so far? The events I remember:
- we gained NO_HZ_FULL
- people started to isolate CPUs and push their work / RT tasks there
- it was noticed that the kernel keeps raising the timer softirq even when there is nothing going on, once the softirq had been started
- tglx came up with a patch which could go mainline if it solves the problem
- this patch has not made its way upstream (yet) and I pulled it into -RT. Isn't this a problem in mainline, too? Why isn't anyone screaming?
- we had deadlocks because the inner lock of the sleeping lock was not safe to be used from hardirq context (#1)
- we had boxes freezing on startup and not making progress due to missed irq_work wakeups (#2)
- we had a deadlock splat on UP because the trylock failed; this was fixed by only allowing the feature on SMP, since it only makes sense with isolated CPUs (#3)
- now, since the rtmutex rework, we have deadlock splats which BUG() the system (#4)

The four problems we had/have so far are -RT specific, but still painful when I think back. rtmutex wasn't made to be accessed from hardirq context; that is where we use the raw locks.

One problem that we still have, and which Peter pointed out around #1, is owner boosting when the lock is held in hardirq context and the wrong owner is recorded. This problem has been ignored so far. Using a fake task in irq context, as you suggest, and ignoring it for boosting would somehow fix the boosting problem and avoid the deadlock we see now.

I am not sure if we want to keep doing that. The only reason why we grab the lock in the first place was to check if there is a timer pending while we run on an isolated CPU. It should not matter for the other CPUs, right? So instead of going further down that road, what about storing base->next_timer someplace so it can be obtained via atomic_read() for the isolated CPUs?

> -- Steve

Sebastian
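The proposal above — mirror base->next_timer somewhere lock-free so an isolated CPU can decide whether to raise the timer softirq without taking base->lock at all — can be sketched as follows. This is a userspace model, not the kernel code: C11 atomics stand in for the kernel's atomic ops, and all names (timer_base, next_expiry_cache, publish_next_timer, ...) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct timer_base {
    /* base->lock elided; writers are assumed to hold it */
    unsigned long next_timer;                 /* protected by base->lock */
    _Atomic unsigned long next_expiry_cache;  /* lock-free mirror of it */
};

/* Called with base->lock held whenever the earliest expiry changes. */
static void publish_next_timer(struct timer_base *base, unsigned long expires)
{
    base->next_timer = expires;
    atomic_store_explicit(&base->next_expiry_cache, expires,
                          memory_order_release);
}

/*
 * Called from hardirq on an isolated CPU: no lock, no trylock.
 * A real version would also need a sentinel value meaning "no timer
 * armed at all"; that is glossed over here.
 */
static bool timer_pending_before(struct timer_base *base, unsigned long now)
{
    unsigned long next = atomic_load_explicit(&base->next_expiry_cache,
                                              memory_order_acquire);
    return next <= now;   /* raise the softirq only if a timer expired */
}
```

The point of the design is that the reader can be stale but never deadlock: at worst it raises the softirq one tick late or spuriously, which is exactly the trade Sebastian is suggesting instead of a trylock from hardirq context.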
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On Mon, 23 Feb 2015 17:16:27 -0700 Thavatchai Makphaibulchoke wrote:
> Thanks again for the comments and suggestion.
>
> Yes, creating a per cpu fake task was one of the alternatives considered.
> I believe one of the reasons I did not pursue it is the amount of extra
> storage it requires (sizeof(struct task_struct) * number of cpus).
> Though the changes may not be as intrusive as the ones I sent, some are
> still required, mainly around current (one in particular that came to
> mind is in wakeup_next_waiter()).

Yeah, that's a drawback, but still. Big CPU machines should have big memory. If we really are concerned, we could tidy up task_struct a bit, rearranging the fields to make sure like fields are together (helps with cache too) and move the RT stuff up further. Then we could allocate just enough to cover all the task_struct fields that are accessed in the rtmutex code.

> If I'm not mistaken, another reason could also be due to the rate of the
> timer interrupt; in the case that the mutex is highly contended, the IH
> could stall the non-real-time requester for a long time, even to the
> point that the cpu is perceived as hung.

Perhaps we should just have trylocks fail if there are other waiters, as it won't slow down the high priority task. And doing that would probably even help other rt tasks that are blocked on the lock waiting for a release. Why make those tasks wait more if even a higher priority task is simply doing a trylock and can safely fail it? At least we could do that if the task trying to get the lock is an interrupt.

-- Steve
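Steve's "fail the trylock if there are waiters" idea can be sketched in a few lines. This is a userspace model using C11 atomics — the struct layout and names (fake_rtmutex, nr_waiters, irq_trylock) are illustrative, not the real rtmutex internals.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_rtmutex {
    _Atomic(void *) owner;     /* NULL when unlocked */
    _Atomic int nr_waiters;    /* tasks blocked on this lock */
};

/*
 * Trylock variant for irq context: refuse outright when the lock
 * already has waiters, so the irq never inserts itself into a
 * priority-inheritance chain of blocked RT tasks.
 */
static bool irq_trylock(struct fake_rtmutex *lock, void *me)
{
    if (atomic_load(&lock->nr_waiters) > 0)
        return false;                   /* waiters present: bail out */
    void *expected = NULL;
    return atomic_compare_exchange_strong(&lock->owner, &expected, me);
}

static void unlock(struct fake_rtmutex *lock)
{
    atomic_store(&lock->owner, NULL);
}
```

The caller must already tolerate trylock failure (that is the contract of a trylock), which is why this variant costs nothing on the fast path and only changes behavior in exactly the contended case the thread is worried about.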
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On 02/23/2015 11:37 AM, Steven Rostedt wrote:
> OK, I believe I understand the issue. Perhaps it would be much better
> to create a fake task per CPU that we use when grabbing locks in
> interrupt mode. And make these have a priority of 0 (highest); since
> they can not be preempted, they do have such a priority.
>
> Then in the fast trylock and unlock code, we can add:
>
> struct task_struct *curr = current;
>
> if (unlikely(in_irq()))
> curr = this_cpu_read(irq_task);
>
> This way the priority inheritance will stop when it hits this task (no
> need to boost a task of highest priority), and we can leave that code
> alone.

Thanks again for the comments and suggestion.

Yes, creating a per cpu fake task was one of the alternatives considered. I believe one of the reasons I did not pursue it is the amount of extra storage it requires (sizeof(struct task_struct) * number of cpus). Though the changes may not be as intrusive as the ones I sent, some are still required, mainly around current (one in particular that came to mind is in wakeup_next_waiter()).

If I'm not mistaken, another reason could also be due to the rate of the timer interrupt; in the case that the mutex is highly contended, the IH could stall the non-real-time requester for a long time, even to the point that the cpu is perceived as hung.

Anyway, I'll give the fake task approach a try and report back if there is any issue.

Thanks, Mak.
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On Thu, 19 Feb 2015 18:31:05 -0700 Thavatchai Makphaibulchoke wrote:
> This patch fixes the problem that the ownership of a mutex acquired by an
> interrupt handler (IH) gets incorrectly attributed to the interrupted
> thread.
>
> This could result in an incorrect deadlock detection in function
> rt_mutex_adjust_prio_chain(), causing the thread to be killed and
> possibly leading up to a system hang.
>
> Here is the approach taken: when calling from an interrupt handler,
> instead of attributing ownership to the interrupted task, use a reserved
> task_struct value to indicate that the owner is an interrupt handler.
> This approach avoids the incorrect deadlock detection.
>
> This also includes changes in several functions in rtmutex.c, now that
> the lock's requester may be an interrupt handler, not a real task struct.
> This impacts the way the lock is acquired and prioritized, and the
> decision whether to do the housekeeping functions required for a real
> task struct.
>
> The reserved task_struct value for an interrupt handler is
>
> current | 0x2
>
> where current is the task_struct value of the interrupted task.
>
> Since an IH will both acquire and release the lock only during interrupt
> handling, during which current is not changed, the reserved task_struct
> value for an IH should be distinct from other instances of IH on a
> different cpu.
>
> Kernel version 3.14.25 + patch-3.14.25-rt22
>
> Signed-off-by: T. Makphaibulchoke

OK, I believe I understand the issue. Perhaps it would be much better to create a fake task per CPU that we use when grabbing locks in interrupt mode. And make these have a priority of 0 (highest); since they can not be preempted, they do have such a priority.

Then in the fast trylock and unlock code, we can add:

	struct task_struct *curr = current;

	if (unlikely(in_irq()))
		curr = this_cpu_read(irq_task);

This way the priority inheritance will stop when it hits this task (no need to boost a task of highest priority), and we can leave that code alone.

-- Steve
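Steve's fragment above can be modeled outside the kernel to make the substitution explicit. In this userspace sketch, in_irq(), this_cpu_read() and task_struct are all mocked with plain variables; only the owner-substitution logic mirrors the quoted code.

```c
#include <stdbool.h>

#define NR_CPUS 4

struct fake_task { int prio; };              /* stand-in for task_struct */

static struct fake_task irq_task[NR_CPUS];   /* one fake "irq task" per CPU */
static struct fake_task *current_task;       /* mock of current             */
static bool in_hardirq;                      /* mock of in_irq()            */
static int this_cpu;                         /* mock of smp_processor_id()  */

/*
 * Pick the task to record as lock owner: the per-CPU fake task when
 * locking from hard irq context, otherwise the interrupted task.
 * The PI chain walk then stops at the fake task, which is never
 * boosted or preempted.
 */
static struct fake_task *lock_owner_id(void)
{
    struct fake_task *curr = current_task;

    if (in_hardirq)                  /* if (unlikely(in_irq()))          */
        curr = &irq_task[this_cpu];  /*   curr = this_cpu_read(irq_task) */
    return curr;
}
```

Because the fake task is per-CPU and hard irq handlers on a given CPU do not migrate or nest with lock held across the pattern discussed here, the recorded owner is unambiguous per CPU, which is the same property the `current | 0x2` encoding in the patch tries to achieve.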
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On Fri, 20 Feb 2015 11:54:45 -0700 Thavatchai Makphaibulchoke wrote:
> Sorry for not explaining the problem in more detail.
>
> IH here means the bottom half of the interrupt handler, executing in
> interrupt context (IC), not the preemptible interrupt kernel thread.
>
> Here is the problem we encountered.
>
> An smp_apic_timer_interrupt comes in while task X is in the process of
> waiting for mutex A. The IH successfully locks mutex B (in this case
> run_local_timers() gets the timer base's lock, base->lock, via
> spin_trylock()).
>
> At the same time, task Y, holding mutex A, requests mutex B.
>
> With the current rtmutex code, mutex B ownership is incorrectly
> attributed to task X (using current, which is inaccurate in the IC). To
> task Y the situation effectively looks like it is holding mutex A and
> requesting B, which is held by task X, which holds mutex B and is now
> waiting for mutex A. The deadlock detection is correct: a classic
> potential circular mutex deadlock.
>
> In reality, it is not. The IH, the actual owner of mutex B, will
> eventually complete and release mutex B, and task Y will eventually get
> mutex B and proceed, and so will task X. Actually, with either deleting
> or changing BUG_ON(ret) to WARN_ON(ret) at line 997 in function
> rt_spin_lock_slowlock(), the test ran fine without any problem.

Ah, I see the problem you have. Let me explain it in my own words to make sure that you and I are on the same page.

Task X tries to grab rt_mutex A, but it's held by task Y. But as Y is still running, the adaptive mutex code is in play and task X is spinning (with its "blocked on A" set).

An interrupt comes in, preempting task X, and does a trylock on rt_mutex B, and succeeds. Now it looks like task X has mutex B and is blocked on mutex A.

Task Y tries to take mutex B and sees that it is held by task X, which is blocked on mutex A, which Y owns. Thus you get a false deadlock detect. Am I correct?

Now, the question is: can we safely change the ownership of mutex B in the interrupt context where it won't cause another side effect?

> A more detailed description of the problem could also be found at,
>
> http://markmail.org/message/np33it233hoot4b2#query:+page:1+mid:np33it233hoot4b2+state:results
>
> Please let me know what you think or need any additional info.

I haven't looked at the above link. I'll have to think about this some more, but as I'm currently traveling, it will have to be done sometime next week. Feel free to ping me on Monday or Tuesday. Tuesday would probably be better, as I'm sure I'm going to be overloaded with other work when I get back to my office on Monday.

-- Steve
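The false-positive chain described in the thread (Y -> B -> X -> A -> Y) can be reproduced with a toy model: locks record an owner task, tasks record the lock they block on, and the detector simply walks the chain. All structures are illustrative, not the rtmutex internals — the point is only to show why misattributing the irq's lock to the interrupted task makes a cycle appear.

```c
#include <stdbool.h>
#include <stddef.h>

struct task;
struct lock { struct task *owner; };           /* who holds this lock      */
struct task { struct lock *blocked_on; };      /* which lock we wait for   */

/* Walk the wait chain from "task": does it loop back to "task"? */
static bool detect_deadlock(struct task *task)
{
    struct task *t = task;
    for (int depth = 0; depth < 64; depth++) { /* bounded, like the kernel */
        struct lock *l = t->blocked_on;
        if (!l || !l->owner)
            return false;                      /* chain ends: no deadlock  */
        t = l->owner;
        if (t == task)
            return true;                       /* cycle found              */
    }
    return false;
}
```

Booking the irq's trylock of B to the interrupted task X wires up exactly the cycle above, even though the real owner (the irq handler) holds no other lock and will release B shortly; once B's owner is recorded as something that blocks on nothing, the walk terminates and no deadlock is reported.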
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On 02/19/2015 09:53 PM, Steven Rostedt wrote: > On Thu, 19 Feb 2015 18:31:05 -0700 > Thavatchai Makphaibulchoke wrote: > >> This patch fixes the problem that the ownership of a mutex acquired by an >> interrupt handler(IH) gets incorrectly attributed to the interrupted thread. > > *blink* > >> >> This could result in an incorrect deadlock detection in function >> rt_mutex_adjust_prio_chain(), causing thread to be killed and possibly >> leading >> up to a system hang. > > I highly doubt this is an incorrect deadlock that was detected. My > money is on a real deadlock that happened. > >> >> Here is the approach taken: when calling from an interrupt handler, instead >> of >> attributing ownership to the interrupted task, use a reserved task_struct >> value >> to indicate that the owner is a interrupt handler. This approach avoids the >> incorrect deadlock detection. > > How is this an incorrect deadlock? Please explain. > Thanks for the comments. Sorry for not explaining the problem in more details. IH here means the bottom half of interrupt handler, executing in the interrupt context (IC), not the preemptible interrupt kernel thread. interrupt. Here is the problem we encountered. An smp_apic_timer_interrupt comes in while task X is in the process of waiting for mutex A . The IH successfully locks mutex B (in this case run_local_timers() gets the timer base's lock, base->lock, via spin_trylock()). At the same time, task Y holding mutex A requests mutex B. With current rtmutex code, mutex B ownership is incorrectly attributed to task X (using current, which is inaccurate in the IC). To task Y the situation effectively looks like it is holding mutex A and reuqesting B, which is held by task X holding mutex B and is now waiting for mutex A. The deadlock detection is correct, a classic potential circular mutex deadlock. In reality, it is not. 
The IH, the actual owner of mutex B, will eventually complete and release mutex B; task Y will then get mutex B and proceed, and so will task X. In fact, after either deleting the BUG_ON(ret) at line 997 in function rt_spin_lock_slowlock() or changing it to WARN_ON(ret), the test ran fine without any problem. A more detailed description of the problem can also be found at, http://markmail.org/message/np33it233hoot4b2#query:+page:1+mid:np33it233hoot4b2+state:results Please let me know what you think or if you need any additional info. Thanks, Mak. >> >> This also includes changes in several functions in rtmutex.c now that the >> lock's >> requester may be an interrupt handler, not a real task struct. This impacts >> the way the lock is acquired and prioritized and the decision whether to do >> the housekeeping functions required for a real task struct. >> >> The reserved task_struct value for an interrupt handler is >> >> current | 0x2 >> >> where current is the task_struct value of the interrupted task. >> >> Since the IH will both acquire and release the lock only during interrupt >> handling, during which current is not changed, the reserved task_struct value >> for an IH should be distinct from other instances of IH on different cpus. >> > > The interrupt handler is a thread just like any other task. It's not > special. If there was a deadlock detected, it most likely means that a > deadlock exists. > > -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On Fri, 20 Feb 2015 11:54:45 -0700 Thavatchai Makphaibulchoke thavatchai.makpahibulch...@hp.com wrote: Sorry for not explaining the problem in more detail. IH here means the bottom half of an interrupt handler, executing in interrupt context (IC), not the preemptible interrupt kernel thread. Here is the problem we encountered. An smp_apic_timer_interrupt comes in while task X is in the process of waiting for mutex A. The IH successfully locks mutex B (in this case run_local_timers() takes the timer base's lock, base->lock, via spin_trylock()). At the same time, task Y, holding mutex A, requests mutex B. With the current rtmutex code, mutex B's ownership is incorrectly attributed to task X (using current, which is inaccurate in the IC). To task Y the situation effectively looks like it is holding mutex A and requesting B, which is held by task X, while task X is waiting for mutex A. The deadlock detection is correct, a classic potential circular mutex deadlock. In reality, it is not. The IH, the actual owner of mutex B, will eventually complete and release mutex B; task Y will then get mutex B and proceed, and so will task X. In fact, after either deleting the BUG_ON(ret) at line 997 in function rt_spin_lock_slowlock() or changing it to WARN_ON(ret), the test ran fine without any problem. Ah, I see the problem you have. Let me explain it in my own words to make sure that you and I are on the same page. Task X tries to grab rt_mutex A, but it's held by task Y. But as Y is still running, the adaptive mutex code is in play and task X is spinning (with its blocked-on A set). An interrupt comes in, preempting task X, and does a trylock on rt_mutex B, and succeeds. Now it looks like task X has mutex B and is blocked on mutex A. Task Y tries to take mutex B and sees that it is held by task X, which is blocked on mutex A, which Y owns. Thus you get a false deadlock detect. Am I correct? 
Now, the question is, can we safely change the ownership of mutex B in the interrupt context where it won't cause another side effect? A more detailed description of the problem can also be found at, http://markmail.org/message/np33it233hoot4b2#query:+page:1+mid:np33it233hoot4b2+state:results Please let me know what you think or if you need any additional info. I haven't looked at the above link. I'll have to think about this some more, but as I'm currently traveling, it will have to be done sometime next week. Feel free to ping me on Monday or Tuesday. Tuesday would probably be better, as I'm sure I'm going to be overloaded with other work when I get back to my office on Monday. -- Steve
Re: [PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
On Thu, 19 Feb 2015 18:31:05 -0700 Thavatchai Makphaibulchoke wrote: > This patch fixes the problem that the ownership of a mutex acquired by an > interrupt handler (IH) gets incorrectly attributed to the interrupted thread. *blink* > > This could result in an incorrect deadlock detection in function > rt_mutex_adjust_prio_chain(), causing the thread to be killed and possibly leading > up to a system hang. I highly doubt this is an incorrect deadlock that was detected. My money is on a real deadlock that happened. > > Here is the approach taken: when calling from an interrupt handler, instead of > attributing ownership to the interrupted task, use a reserved task_struct > value > to indicate that the owner is an interrupt handler. This approach avoids the > incorrect deadlock detection. How is this an incorrect deadlock? Please explain. > > This also includes changes in several functions in rtmutex.c now that the > lock's > requester may be an interrupt handler, not a real task struct. This impacts > the way the lock is acquired and prioritized and the decision whether to do > the housekeeping functions required for a real task struct. > > The reserved task_struct value for an interrupt handler is > > current | 0x2 > > where current is the task_struct value of the interrupted task. > > Since the IH will both acquire and release the lock only during interrupt > handling, during which current is not changed, the reserved task_struct value > for an IH should be distinct from other instances of IH on different cpus. > The interrupt handler is a thread just like any other task. It's not special. If there was a deadlock detected, it most likely means that a deadlock exists. -- Steve > Kernel version 3.14.25 + patch-3.14.25-rt22 > > Signed-off-by: T. 
Makphaibulchoke > --- > include/linux/spinlock_rt.h | 4 + > kernel/locking/rtmutex-debug.c | 15 ++- > kernel/locking/rtmutex.c | 212 > > kernel/locking/rtmutex_common.h | 21 > kernel/timer.c | 4 +- > 5 files changed, 188 insertions(+), 68 deletions(-)
[PATCH 3.14.25-rt22 1/2] rtmutex Real-Time Linux: Fixing kernel BUG at kernel/locking/rtmutex.c:997!
This patch fixes the problem that the ownership of a mutex acquired by an interrupt handler (IH) gets incorrectly attributed to the interrupted thread. This could result in an incorrect deadlock detection in function rt_mutex_adjust_prio_chain(), causing the thread to be killed and possibly leading up to a system hang. Here is the approach taken: when calling from an interrupt handler, instead of attributing ownership to the interrupted task, use a reserved task_struct value to indicate that the owner is an interrupt handler. This approach avoids the incorrect deadlock detection. This also includes changes in several functions in rtmutex.c now that the lock's requester may be an interrupt handler, not a real task struct. This impacts the way the lock is acquired and prioritized and the decision whether to do the housekeeping functions required for a real task struct. The reserved task_struct value for an interrupt handler is current | 0x2 where current is the task_struct value of the interrupted task. Since the IH will both acquire and release the lock only during interrupt handling, during which current is not changed, the reserved task_struct value for an IH should be distinct from other instances of IH on different cpus. Kernel version 3.14.25 + patch-3.14.25-rt22 Signed-off-by: T. 
Makphaibulchoke --- include/linux/spinlock_rt.h | 4 + kernel/locking/rtmutex-debug.c | 15 ++- kernel/locking/rtmutex.c| 212 kernel/locking/rtmutex_common.h | 21 kernel/timer.c | 4 +- 5 files changed, 188 insertions(+), 68 deletions(-) diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h index c0d1367..eeb4188 100644 --- a/include/linux/spinlock_rt.h +++ b/include/linux/spinlock_rt.h @@ -27,6 +27,7 @@ extern void __lockfunc rt_spin_unlock_wait(spinlock_t *lock); extern int __lockfunc rt_spin_trylock_irqsave(spinlock_t *lock, unsigned long *flags); extern int __lockfunc rt_spin_trylock_bh(spinlock_t *lock); extern int __lockfunc rt_spin_trylock(spinlock_t *lock); +extern int __lockfunc rt_spin_trylock_in_interrupt(spinlock_t *lock); extern int atomic_dec_and_spin_lock(atomic_t *atomic, spinlock_t *lock); /* @@ -52,6 +53,9 @@ extern int __lockfunc __rt_spin_trylock(struct rt_mutex *lock); #define spin_lock_irq(lock)spin_lock(lock) +#define spin_do_trylock_in_interrupt(lock) \ + __cond_lock(lock, rt_spin_trylock_in_interrupt(lock)) + #define spin_do_trylock(lock) __cond_lock(lock, rt_spin_trylock(lock)) #define spin_trylock(lock) \ diff --git a/kernel/locking/rtmutex-debug.c b/kernel/locking/rtmutex-debug.c index 49b2ed3..c36d629 100644 --- a/kernel/locking/rtmutex-debug.c +++ b/kernel/locking/rtmutex-debug.c @@ -40,6 +40,8 @@ static void printk_task(struct task_struct *p) static void printk_lock(struct rt_mutex *lock, int print_owner) { + struct task_struct *owner = rt_mutex_owner(lock); + if (lock->name) printk(" [%p] {%s}\n", lock, lock->name); @@ -47,10 +49,13 @@ static void printk_lock(struct rt_mutex *lock, int print_owner) printk(" [%p] {%s:%d}\n", lock, lock->file, lock->line); - if (print_owner && rt_mutex_owner(lock)) { + if (print_owner && owner) { printk(".. ->owner: %p\n", lock->owner); printk(".. 
held by: "); - printk_task(rt_mutex_owner(lock)); + if (rt_mutex_owner_is_task(owner)) + printk_task(owner); + else + printk(" an interrupt handler."); printk("\n"); } } @@ -76,6 +81,8 @@ void debug_rt_mutex_deadlock(int detect, struct rt_mutex_waiter *act_waiter, task = rt_mutex_owner(act_waiter->lock); if (task && task != current) { + /* Interrupt handler should not be deadlocking. */ + BUG_ON(!rt_mutex_owner_is_task(task)); act_waiter->deadlock_task_pid = get_pid(task_pid(task)); act_waiter->deadlock_lock = lock; } @@ -138,7 +145,9 @@ void debug_rt_mutex_lock(struct rt_mutex *lock) void debug_rt_mutex_unlock(struct rt_mutex *lock) { - DEBUG_LOCKS_WARN_ON(rt_mutex_owner(lock) != current); + DEBUG_LOCKS_WARN_ON(in_interrupt() ? + !rt_mutex_owner_is_task(rt_mutex_owner(lock)) : + rt_mutex_owner(lock) != current); } void diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 6c40660..8b66f81 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -51,6 +51,9 @@ * waiters. This can happen when grabbing the lock in the slow path. * To prevent a cmpxchg of the owner releasing the lock, we need to * set this bit before looking at the lock. + * + * owner can also be reserved value, INTERRUPT_HANDLER, in case the