Re: get_irq_regs() from soft IRQ
On Mon, Jun 29, 2009 at 04:31:18PM +0200, Jean Pihet wrote:
> I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> - get_irq_regs() returns NULL in some cases,

It will always return NULL outside of IRQ context - and only returns valid pointers when used inside IRQ context.

It's one of these things that nests itself - when you have several IRQs being processed on one CPU, there are several register contexts saved, and get_irq_regs() returns the most recent one.

> The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.

I don't follow. None of the PMNC support code in the mainline kernel uses get_irq_regs() outside of IRQ context.

> Some questions:
> - is there a way to get the last 'real' IRQ registers from a timer or work queue handler?

No. Outside of IRQ events, the saved IRQ context does not exist.

--
To unsubscribe from this list: send the line unsubscribe linux-omap in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
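The nesting behaviour described in this reply can be modelled in user space. The sketch below is only an illustration: it mimics the save-and-restore pattern of the kernel's get_irq_regs()/set_irq_regs() pair with a plain global instead of a per-CPU variable, and struct pt_regs is cut down to a single field. handle_irq() and sample_pc() are hypothetical helpers, not kernel functions.

```c
#include <stddef.h>

/* Cut-down stand-in for the kernel's struct pt_regs; only the PC matters here. */
struct pt_regs {
    unsigned long pc;
};

/* Models the per-CPU pointer. NULL means "not in IRQ context". */
static struct pt_regs *irq_regs_ptr = NULL;

struct pt_regs *get_irq_regs(void)
{
    return irq_regs_ptr;
}

/* Install new_regs and hand back the previous pointer so it can be restored. */
struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old = irq_regs_ptr;

    irq_regs_ptr = new_regs;
    return old;
}

/* Hypothetical handler: reads the interrupted PC, or 0 when no context exists. */
unsigned long sample_pc(void)
{
    struct pt_regs *regs = get_irq_regs();

    return regs ? regs->pc : 0;
}

/* Hypothetical IRQ entry path: the saved context is visible only while
 * the handler runs, and the outer context is restored on exit. */
unsigned long handle_irq(struct pt_regs *regs, unsigned long (*handler)(void))
{
    struct pt_regs *old = set_irq_regs(regs);
    unsigned long seen = handler();

    set_irq_regs(old);
    return seen;
}
```

In this model a timer or workqueue callback that runs after handle_irq() has returned sees NULL again, which matches the answer given above.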
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 04:31:18PM +0200, Jean Pihet wrote:
> > I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> > - get_irq_regs() returns NULL in some cases,
>
> It will always return NULL outside of IRQ context - and only returns valid pointers when used inside IRQ context.

Ok got it.

> It's one of these things that nests itself - when you have several IRQs being processed on one CPU, there are several register contexts saved, and get_irq_regs() returns the most recent one.
>
> > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
>
> I don't follow. None of the PMNC support code in the mainline kernel uses get_irq_regs() outside of IRQ context.

That is correct. The Cortex A8 needs some special treatment. The erratum says that if the counters overflow at the same time as a coprocessor access is performed, the perf unit gets reset and/or locks up. In short, counter overflow is to be avoided, and with it the PMNC IRQ.

> > Some questions:
> > - is there a way to get the last 'real' IRQ registers from a timer or work queue handler?
>
> No. Outside of IRQ events, the saved IRQ context does not exist.

Ok. I wonder how to implement it correctly from here. The ultimate goal is to feed the registers to oprofile for statistics gathering (mostly the PC). I do not see much benefit from oprofile without the PC statistics.

Thanks,
Jean
Re: get_irq_regs() from soft IRQ
On Mon, Jun 29, 2009 at 05:35:37PM +0200, Jean Pihet wrote:
> On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
> > It's one of these things that nests itself - when you have several IRQs being processed on one CPU, there are several register contexts saved, and get_irq_regs() returns the most recent one.
> >
> > > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> >
> > I don't follow. None of the PMNC support code in the mainline kernel uses get_irq_regs() outside of IRQ context.
>
> That is correct. The Cortex A8 needs some special treatment. The errata says that if the counters are overflowing at the same time as a coprocessor access is performed, the perf unit gets reset and/or locks up. In short the counters overflow is to be avoided and so the PMNC IRQ.

Are you talking about 628216?
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 18:07:44 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 05:35:37PM +0200, Jean Pihet wrote:
> > On Monday 29 June 2009 17:19:31 Russell King - ARM Linux wrote:
> > > It's one of these things that nests itself - when you have several IRQs being processed on one CPU, there are several register contexts saved, and get_irq_regs() returns the most recent one.
> > >
> > > > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> > >
> > > I don't follow. None of the PMNC support code in the mainline kernel uses get_irq_regs() outside of IRQ context.
> >
> > That is correct. The Cortex A8 needs some special treatment. The errata says that if the counters are overflowing at the same time as a coprocessor access is performed, the perf unit gets reset and/or locks up. In short the counters overflow is to be avoided and so the PMNC IRQ.
>
> Are you talking about 628216?

Yes that is the one. Sorry not to mention it sooner.

Jean
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> Hi,
>
> I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> - get_irq_regs() returns NULL in some cases, so it is unusable and even causes a crash when trying to get the register values from the returned ptr
> - I never get user space registers, only kernel
>
> The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> The solution I am implementing is to read and reset the counters from a work queue that is triggered by a timer.

Regarding this oprofile related part: I wonder how you can get oprofile working properly (providing non-bogus results) without performance counter overflow IRQ generation?

Are you trying to implement (in a clean way) something similar to http://marc.info/?l=oprofile-list&m=123688347009580&w=2
Or is it going to be a different workaround?

--
Best regards,
Siarhei Siamashka
Re: get_irq_regs() from soft IRQ
Hi Siarhei Siamashka,

On Monday 29 June 2009 18:36:57 Siarhei Siamashka wrote:
> On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > Hi,
> >
> > I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> > - get_irq_regs() returns NULL in some cases, so it is unusable and even causes a crash when trying to get the register values from the returned ptr
> > - I never get user space registers, only kernel
> >
> > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> > The solution I am implementing is to read and reset the counters from a work queue that is triggered by a timer.
>
> Regarding this oprofile related part. I wonder how you can get oprofile working properly (providing non-bogus results) without performance counters overflow IRQ generation?
>
> Are you trying to implement (in a clean way) something similar to http://marc.info/?l=oprofile-list&m=123688347009580&w=2
> Or is it going to be a different workaround?

I am trying a different approach, starting from the erratum description. The idea is to keep the counters from overflowing, since an overflow could cause a PMNC unit reset or lock-up (or both).

Here are the implementation details:
- use a timer to read and reset the counters, then fire a work queue
- in the work queue the counter values are converted to oprofile samples
- proper locking is used to avoid races between the various tasks

I am nearly done with it but I am now running into problems with PM (suspend/resume) and get_irq_regs().

What do you think?

How far are you on your side? Did you stress test the solution? Is the PMNC recovery always successful?

Regards,
Jean
Re: get_irq_regs() from soft IRQ
On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> > - get_irq_regs() returns NULL in some cases, so it is unusable and even causes crash when trying to get the registers values from the returned ptr
> > - I never get user space registers, only kernel
> >
> > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> > The solution I am implementing is to read and reset the counters from a work queue that is triggered by a timer.
>
> Regarding this oprofile related part. I wonder how you can get oprofile working properly (providing non-bogus results) without performance counters overflow IRQ generation?

I don't think you can - triggering capture on overflow is precisely how oprofile works.

The erratum talks about polling for overflow. By doing this, you are in a well defined part of the kernel, which is obviously going to be shown as a hot path for every counter, thus making oprofile useless for kernel work.

Deferring the interrupt to a workqueue doesn't resolve the problem either. The problem has nothing to do with what happens after the interrupt occurs - it's about interrupts themselves being lost.

I think just accepting that this erratum breaks oprofile is the only realistic solution. ;(
Re: get_irq_regs() from soft IRQ
On Mon, Jun 29, 2009 at 06:58:41PM +0200, Jean Pihet wrote:
> I am trying to get a different approach, starting from the errata description. The idea is to avoid the counters from overflowing, which could cause a PMNC unit reset or lock-up (or both).

But this can't work. Oprofile essentially works as follows:

You set the number (N) of events you wish to occur between each sample. When N events have occurred, you record the stacktrace and reset the counter so it fires after another N events.

Now, you could start the counters at zero every time, and then poll them via a timer. When the counter value is larger than N, you could log a stacktrace and zero the counter.

However, this suffers one very serious problem - if you're wanting to measure something at an interval which occurs faster than your timer, you're going to get misleading results.

You could set the timer to fire at a high rate, but then that's going to upset things like cache miss, cache hit, etc measurements.

> Here are the implementation details:
> - use a timer to read and reset the counters, then fire a work queue
> - in the work queue the counters values are converted to oprofile samples
> - the proper locking is used to avoid some races between the various tasks

This sounds overcomplicated. I see no reason for a workqueue to be involved anywhere near the oprofile sample code.

> I am nearly done with it but I am now running into problems with PM (suspend/resume) and get_irq_regs().

You really really really can't use get_irq_regs() outside of IRQ context. The stored registers just do not exist anymore - they've been overwritten by whatever exception or system call you're currently in.

You can't create a copy of them - copies will be overwritten on the very next (nested) interrupt. You don't know which interrupt is the first interrupt to occur.

I really think that the only option here is to just accept that oprofile is crucified by this erratum.
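The objection to timer polling can be put in numbers. The sketch below is a hypothetical variant of the polling scheme described above, in which each poll works out how many N-event samples are owed instead of simply zeroing the counter. samples_due() and events_carried_over() are made-up helper names, not part of oprofile.

```c
#include <stdint.h>

/* Number of samples owed after one poll: one per n events counted. */
uint32_t samples_due(uint32_t counter_value, uint32_t n)
{
    return n ? counter_value / n : 0;
}

/* Events left over once the owed samples have been taken out. */
uint32_t events_carried_over(uint32_t counter_value, uint32_t n)
{
    return n ? counter_value % n : counter_value;
}
```

With N = 100000 and 250000 events counted since the last poll, two samples are owed, yet both can only be attributed to whatever code the timer happened to interrupt. This is the "misleading results" case: the events occurred faster than the timer fired.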
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 19:37:57 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> > > - get_irq_regs() returns NULL in some cases, so it is unusable and even causes crash when trying to get the registers values from the returned ptr
> > > - I never get user space registers, only kernel
> > >
> > > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> > > The solution I am implementing is to read and reset the counters from a work queue that is triggered by a timer.
> >
> > Regarding this oprofile related part. I wonder how you can get oprofile working properly (providing non-bogus results) without performance counters overflow IRQ generation?
>
> I don't think you can - triggering capture on overflow is precisely how oprofile works.
>
> The erratum talks about polling for overflow. By doing this, you are in a well defined part of the kernel, which is obviously going to be shown as a hot path for every counter, thus making oprofile useless for kernel work.

I think it is possible, if you set aside the get_irq_regs() problem. The idea is to read and reset the counters before they overflow, instead of loading them with a small negative value and waiting for the overflow to happen.

> Deferring the interrupt to a workqueue doesn't resolve the problem either. The problem has nothing to do with what happens after the interrupt occurs - it's about interrupts themselves being lost.

The erratum is about a lost event and/or a lock-up of the PMNC unit at the time of overflow.

> I think just accepting that this erratum breaks oprofile is the only realistic solution. ;(

Completely agree. However it would be nice to have a workaround, as inelegant as it can be ;(
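For readers unfamiliar with the "small negative value" mentioned here, the following sketch shows the arithmetic on a 32-bit up-counter. It is illustrative only; reload_value() and overflows_after() are hypothetical names and do not come from the kernel's PMNC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value to preload so that a 32-bit up-counter overflows after n more events.
 * Unsigned wrap-around turns 0 - n into 2^32 - n. */
uint32_t reload_value(uint32_t n)
{
    return (uint32_t)(0u - n);
}

/* True if counting 'events' more events from 'start' wraps the counter. */
bool overflows_after(uint32_t start, uint32_t events)
{
    return (uint32_t)(start + events) < start;
}
```

The conventional scheme preloads reload_value(N) and takes a sample in the overflow IRQ. The alternative proposed in this message starts from zero and reads the count back before any wrap can occur.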
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 19:46:33 Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 06:58:41PM +0200, Jean Pihet wrote:
> > I am trying to get a different approach, starting from the errata description. The idea is to avoid the counters from overflowing, which could cause a PMNC unit reset or lock-up (or both).
>
> But this can't work. Oprofile essentially works as follows:
>
> You set the number (N) of events you wish to occur between each sample. When N events have occurred, you record the stacktrace and reset the counter so it fires after another N events.
>
> Now, you could start the counters at zero every time, and then poll them via a timer. When the counter value is larger than N, you could log a stacktrace and zero the counter.
>
> However, this suffers one very serious problem - if you're wanting to measure something at an interval which occurs faster than your timer, you're going to get misleading results.

The counters are 32 bits wide and the maximum counting frequency is 2 events per cycle (cf. the erratum). That means you get plenty of time before the counters overflow.

> You could set the timer to fire at a high rate, but then that's going to upset things like cache miss, cache hit, etc measurements.

Correct. The timer period has to be a tradeoff.

> > Here are the implementation details:
> > - use a timer to read and reset the counters, then fire a work queue
> > - in the work queue the counters values are converted to oprofile samples
> > - the proper locking is used to avoid some races between the various tasks
>
> This sounds over complicated.

It is ;p

> I see no reason for a workqueue to be involved anywhere near the oprofile sample code.

Got it.

> > I am nearly done with it but I am now running into problems with PM (suspend/resume) and get_irq_regs().
>
> You really really really can't use get_irq_regs() outside of IRQ context. The stored registers just do not exist anymore - they've been overwritten by whatever exception or system call you're currently in.
>
> You can't create a copy of them - copies will be overwritten on the very next (nested) interrupt. You don't know which interrupt is the first interrupt to occur.

Doh!

> I really think that the only option here is to just accept that oprofile is crucified by this errata.

Amen!

Thanks,
Jean
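The "plenty of time" claim follows from simple arithmetic: a 32-bit counter spans 2^32 events, and at most 2 events are counted per cycle. The helper below is a hypothetical illustration; the 600 MHz clock used in the note after it is an assumed figure, not one taken from the thread.

```c
/* Shortest time, in seconds, for a 32-bit counter starting at zero to wrap,
 * given the CPU clock and the maximum number of events counted per cycle. */
double seconds_to_overflow(double cpu_hz, double events_per_cycle)
{
    const double counter_span = 4294967296.0; /* 2^32 events */

    return counter_span / (cpu_hz * events_per_cycle);
}
```

For an assumed 600 MHz core at 2 events per cycle this gives roughly 3.6 seconds, so a polling timer with a period well below that would read the counter before it can wrap.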
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 19:54:23 Siarhei Siamashka wrote:
> On Monday 29 June 2009 19:58:41 ext Jean Pihet wrote:
> > Hi Siarhei Siamashka,
> >
> > On Monday 29 June 2009 18:36:57 Siarhei Siamashka wrote:
> > > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > > Hi,
> > > >
> > > > I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> > > > - get_irq_regs() returns NULL in some cases, so it is unusable and even causes crash when trying to get the registers values from the returned ptr
> > > > - I never get user space registers, only kernel
> > > >
> > > > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> > > > The solution I am implementing is to read and reset the counters from a work queue that is triggered by a timer.
> > >
> > > Regarding this oprofile related part. I wonder how you can get oprofile working properly (providing non-bogus results) without performance counters overflow IRQ generation?
> > >
> > > Are you trying to implement (in a clean way) something similar to http://marc.info/?l=oprofile-list&m=123688347009580&w=2
> > > Or is it going to be a different workaround?
> >
> > I am trying to get a different approach, starting from the errata description. The idea is to avoid the counters from overflowing, which could cause a PMNC unit reset or lock-up (or both).
> >
> > Here are the implementation details:
> > - use a timer to read and reset the counters, then fire a work queue
> > - in the work queue the counters values are converted to oprofile samples
> > - the proper locking is used to avoid some races between the various tasks
> >
> > I am nearly done with it but I am now running into problems with PM (suspend/resume) and get_irq_regs().
> >
> > What do you think?
>
> Russell was the first to reply :) But we also discussed this hybrid model some time ago, and there is a clear counterexample where it fails: http://www.nabble.com/Re%3A--PATCH-0-1--OMAP-gptimer-based-event-monitor-driver-for-oprofile-p21374285.html

All right, sorry I was not aware of that discussion. So the PMNC unit is broken beyond repair. BTW good description and test results!

> > How far are you on your side? Did you stress test the solution? Is the PMNC recovery always successful?
>
> I ended up just using a timer with a high frequency of sample generation. It works without hassle and is sufficient for the majority of cases.

Ok. It looks like it is the best we can do.

Thanks,
Jean
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 20:37:57 ext Russell King - ARM Linux wrote:
> On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> > > - get_irq_regs() returns NULL in some cases, so it is unusable and even causes crash when trying to get the registers values from the returned ptr
> > > - I never get user space registers, only kernel
> > >
> > > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> > > The solution I am implementing is to read and reset the counters from a work queue that is triggered by a timer.
> >
> > Regarding this oprofile related part. I wonder how you can get oprofile working properly (providing non-bogus results) without performance counters overflow IRQ generation?
>
> I don't think you can - triggering capture on overflow is precisely how oprofile works.
>
> The erratum talks about polling for overflow. By doing this, you are in a well defined part of the kernel, which is obviously going to be shown as a hot path for every counter, thus making oprofile useless for kernel work.
>
> Deferring the interrupt to a workqueue doesn't resolve the problem either. The problem has nothing to do with what happens after the interrupt occurs - it's about interrupts themselves being lost.
>
> I think just accepting that this erratum breaks oprofile is the only realistic solution. ;(

I also thought the same initially. But the problem still looks like it can be worked around, admittedly in quite a dirty way.

We just need to use not a periodic timer, but a kind of watchdog (this can be implemented with the OMAP GPTIMER). As long as PMU interrupts are coming fast, the watchdog is frequently reset and never shows up anywhere. Everything works nicely. Now if the PMU gets broken, the watchdog eventually triggers and recovers the PMU state. As the PMU could get broken something like 10 times per second in the worst case in my experiments, ~10 ms for the watchdog trigger period seemed to be a reasonable empirical value. So in these conditions, the PMU will be in a nonworking state less than roughly 10% of the time in the worst practical case. Not very nice, but not completely ugly either.

Another problematic condition is when the PMU is fine, but is not generating events naturally (for example we have configured it for cache misses, but are burning CPU in a loop which does not access memory at all). In this case the watchdog will be triggered periodically for no reason, generating noise in the profiling statistics. This noise needs to be filtered out, and it seems possible to do so. The trick is to have the watchdog handler reset the watchdog counter to a lower value than the one the PMU IRQ handler normally resets it to. This way, whenever a PMU interrupt is generated, we check if the watchdog counter is below the normal threshold. If it is lower, then we know that the watchdog interrupt was triggered recently and this sample can be ignored. The difference between the normal watchdog counter reset value and the value which gets set on watchdog interrupts should provide sufficient time to get out of the watchdog interrupt handler and its related code, so that it does not show up in the statistics that much.

A working proof of concept patch was submitted there: http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0

Sorry for not posting it to one of the kernel mailing lists, but I thought that the beagleboard mailing list was a good place to find users who may want to try it and evaluate if it has any practical value. Maybe it was not a very wise decision.

Unfortunately I'm not a kernel hacker, and cleaning up the patch may take too much time and effort given my current knowledge. I would be happy if somebody else with more hands-on kernel experience could make a clean and usable Cortex-A8 PMU workaround. I don't care about getting some part of the credit for it or not, the end result is more important :)

One of the obvious problems with the patch (other than race conditions) is that it uses the OMAP-specific GPTIMER. Is there something more portable in the kernel that provides similar functionality? Or are there any Cortex-A8 r1 cores other than OMAP3 in the wild?

--
Best regards,
Siarhei Siamashka
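The watchdog filtering trick described in this message can be sketched as a small user-space state machine. This is not the proof-of-concept patch linked above. It is a simplified model that assumes an up-counting timer which fires on overflow, and the reload constants and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reload values for an up-counting watchdog timer. */
#define WDT_RELOAD_NORMAL     1000u /* written by the PMU IRQ handler */
#define WDT_RELOAD_AFTER_FIRE  900u /* written by the watchdog handler; lower */

struct pmu_watchdog {
    uint32_t counter; /* current timer count */
    uint32_t fired;   /* watchdog expirations, kept for reporting */
    uint32_t dropped; /* PMU samples ignored as watchdog noise */
};

void wdt_init(struct pmu_watchdog *w)
{
    w->counter = WDT_RELOAD_NORMAL;
    w->fired = 0;
    w->dropped = 0;
}

/* Advance the modelled timer by a number of ticks. */
void wdt_tick(struct pmu_watchdog *w, uint32_t ticks)
{
    w->counter += ticks;
}

/* Watchdog expiry: real code would recover the PMU here. The counter is
 * reloaded with the lower value to mark that a watchdog event just happened. */
void wdt_on_expire(struct pmu_watchdog *w)
{
    w->fired++;
    w->counter = WDT_RELOAD_AFTER_FIRE;
}

/* PMU overflow IRQ: returns true if the sample should be recorded.
 * A count below the normal reload value can only mean the watchdog
 * fired recently, so the sample is treated as noise. */
bool pmu_on_overflow(struct pmu_watchdog *w)
{
    bool keep = w->counter >= WDT_RELOAD_NORMAL;

    if (!keep)
        w->dropped++;
    w->counter = WDT_RELOAD_NORMAL;
    return keep;
}
```

The gap between the two reload values is the grace period during which samples are discarded. Using the figures quoted in the message, 10 failures per second with a ~10 ms timeout bounds the dead time at about 100 ms per second, which is where the 10% worst case comes from.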
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 20:38:59 Siarhei Siamashka wrote:
> On Monday 29 June 2009 20:37:57 ext Russell King - ARM Linux wrote:
> > On Mon, Jun 29, 2009 at 07:36:57PM +0300, Siarhei Siamashka wrote:
> > > On Monday 29 June 2009 17:31:18 ext Jean Pihet wrote:
> > > > I am trying to get the latest IRQ registers from a timer or a work queue but I am running into problems:
> > > > - get_irq_regs() returns NULL in some cases, so it is unusable and even causes crash when trying to get the registers values from the returned ptr
> > > > - I never get user space registers, only kernel
> > > >
> > > > The use case is that the performance unit (PMNC) of the Cortex A8 has some serious bug, in short the performance counters overflow IRQ is to be avoided.
> > > > The solution I am implementing is to read and reset the counters from a work queue that is triggered by a timer.
> > >
> > > Regarding this oprofile related part. I wonder how you can get oprofile working properly (providing non-bogus results) without performance counters overflow IRQ generation?
> >
> > I don't think you can - triggering capture on overflow is precisely how oprofile works.
> >
> > The erratum talks about polling for overflow. By doing this, you are in a well defined part of the kernel, which is obviously going to be shown as a hot path for every counter, thus making oprofile useless for kernel work.
> >
> > Deferring the interrupt to a workqueue doesn't resolve the problem either. The problem has nothing to do with what happens after the interrupt occurs - it's about interrupts themselves being lost.
> >
> > I think just accepting that this erratum breaks oprofile is the only realistic solution. ;(
>
> I also thought about the same initially. But the problem still looks like it can be workarounded, admittedly in quite a dirty way.
>
> We just need to use not a periodic timer, but kind of a watchdog (this can be implemented with OMAP GPTIMER). As long as PMU interrupts are coming fast, watchdog is frequently reset and never shows up anywhere. Everything is working nice. Now if PMU gets broken, watchdog gets triggered eventually and recovers PMU state. As PMU could get broken something like 10 times per second in the worst case in my experiments, having ~10 ms for a watchdog trigger period seemed to be a reasonable empirical value. So in this conditions, PMU will be in a nonworking state approximately less than 10% of the time in the worst practical case. Not very nice, but not completely ugly either.

The accuracy is not very good.

> Another problematic condition is when PMU is fine, but is not generating events naturally (for example we have configured it for cache misses, but are burning cpu in a loop which is not accessing memory at all). In this case a watchdog will be triggered periodically for no reason, generating the noise in profiling statistics. This noise needs to be filtered out, and seems like it is possible to do it. The trick is to reset watchdog counter to a lower value than it is typically reset in PMU IRQ handler. This way, whenever PMU interrupt is generated, we check if watchdog counter is below the normal threshold. If it is lower, then we know that watchdog interrupt was triggered recently and this sample can be ignored. The difference between normal watchdog counter reset value and the value which gets set on watchdog interrupts should provide sufficient time to get out of the watchdog interrupt handler and its related code, so that it does not show up in statistics that much.
>
> A working proof of concept patch was submitted there: http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
>
> Sorry for not posting it to one of the kernel mailing lists, but I thought that beagleboard mailing list was a good place to find users who may want to try it and evaluate if it has any practical value. Maybe it was not a very wise decision.
>
> Unfortunately I'm not a kernel hacker and cleaning up the patch may take too much time and efforts, taking into account my current knowledge. I would be happy if somebody else with more hands-on kernel experience could make a clean and usable Cortex-A8 PMU workaround. I don't care about getting some part of credit for it or not, the end result is more important :)

I am OK to help.

> One of the obvious problems with the patch (other than race conditions) is that it is using OMAP-specific GPTIMER. Is there something more portable in the kernel to provide similar functionality? Or are there any Cortex-A8 r1 cores other than OMAP3 in the wild?

You can use a 'struct timer_list' with setup_timer, mod_timer and del_timer_sync. Another API is the high resolution timers (hrtimers), but I do not think we need such a high precision timer here.

Jean
Re: get_irq_regs() from soft IRQ
On Monday 29 June 2009 21:49:59 ext Jean Pihet wrote:
> [...]
> > We just need to use not a periodic timer, but kind of a watchdog (this can be implemented with OMAP GPTIMER). As long as PMU interrupts are coming fast, watchdog is frequently reset and never shows up anywhere. Everything is working nice. Now if PMU gets broken, watchdog gets triggered eventually and recovers PMU state. As PMU could get broken something like 10 times per second in the worst case in my experiments, having ~10 ms for a watchdog trigger period seemed to be a reasonable empirical value. So in this conditions, PMU will be in a nonworking state approximately less than 10% of the time in the worst practical case. Not very nice, but not completely ugly either.
>
> The accuracy is not very good.

Yes, but that is the worst case. In the normal case, when the PMU is not broken or only very rarely broken, the statistics would be quite good. One of the reasons I dropped work on this patch was also the fact that in some cases the Cortex-A8 PMU even works reliably enough :) Adding some suspicious, weird extra logic may not be welcomed by people who are quite satisfied even with the current oprofile state on Cortex-A8 chips (number-crunching applications with a relatively low number of syscalls, which hence rarely touch any coprocessor registers, are mostly unaffected).

An adaptive watchdog trigger period may be better (try to predict when the next PMU interrupt is normally going to happen and tune the watchdog timeout at runtime), but it may also be more complex and may theoretically still misbehave in some cases.

> > Another problematic condition is when PMU is fine, but is not generating events naturally (for example we have configured it for cache misses, but are burning cpu in a loop which is not accessing memory at all). In this case a watchdog will be triggered periodically for no reason, generating the noise in profiling statistics. This noise needs to be filtered out, and seems like it is possible to do it. The trick is to reset watchdog counter to a lower value than it is typically reset in PMU IRQ handler. This way, whenever PMU interrupt is generated, we check if watchdog counter is below the normal threshold. If it is lower, then we know that watchdog interrupt was triggered recently and this sample can be ignored. The difference between normal watchdog counter reset value and the value which gets set on watchdog interrupts should provide sufficient time to get out of the watchdog interrupt handler and its related code, so that it does not show up in statistics that much.

And I forgot to mention here: very low frequency events (with a frequency lower than that of the watchdog) may be quite problematic and still distort the statistics, because they will be filtered out. Tuning all the magic values may turn out to be hell.

But at the very least, all the watchdog interrupts (both false alarms and real cases of PMU breakage) can be counted and taken into account. These statistics could be reported to the user somehow, so that (s)he can decide whether the final profiling statistics can be trusted and see for how much time the PMU was actually broken.

> > A working proof of concept patch was submitted there: http://groups.google.com/group/beagleboard/msg/dd361f3b43fdeff0
> >
> > Sorry for not posting it to one of the kernel mailing lists, but I thought that beagleboard mailing list was a good place to find users who may want to try it and evaluate if it has any practical value. Maybe it was not a very wise decision.
> >
> > Unfortunately I'm not a kernel hacker and cleaning up the patch may take too much time and efforts, taking into account my current knowledge. I would be happy if somebody else with more hands-on kernel experience could make a clean and usable Cortex-A8 PMU workaround. I don't care about getting some part of credit for it or not, the end result is more important :)
>
> I am ok to help
>
> > One of the obvious problems with the patch (other than race conditions) is that it is using OMAP-specific GPTIMER. Is there something more portable in the kernel to provide similar functionality? Or are there any Cortex-A8 r1 cores other than OMAP3 in the wild?
>
> You can use a 'struct timer_list' and the setup_timer, mod_timer, del_timer_sync. Another API is the high resolution timers (HRT) but I do not think we need such a high precision timer here.

Thanks

--
Best regards,
Siarhei Siamashka