Re: [linux-sunxi] Re: Strange behavior with missing H3 interrupts
On 04/11/2018 05:40 AM, Maxime Ripard wrote:
On Tue, Apr 10, 2018 at 12:33:27PM -0700, Martin Kelly wrote:
On 04/10/2018 06:54 AM, Maxime Ripard wrote:
On Fri, Apr 06, 2018 at 02:52:35PM +0100, Andre Przywara wrote:
On 05/04/18 20:48, Martin Kelly wrote:
On 04/05/2018 06:07 AM, Maxime Ripard wrote:
On Wed, Apr 04, 2018 at 02:50:25PM -0700, Martin Kelly wrote:

Hi,

I've noticed strange behavior on my H3 (nanopi neo air) and am wondering if anyone has suggestions for further debugging it, as I'm getting stumped.

Specifically, I have configured a device (Invensense MPU9250) to deliver interrupts at 10 Hz to PG_EINT11. For some reason, though, the interrupt handler is being called at only about 6 Hz.

Looking at a logic analyzer, I see the hardware is interrupting at 10 Hz as it should, but sometimes the interrupts are just missed from the kernel side. So you might see a 200 ms gap between calls to the IRQ handler, but 100 ms between hardware IRQ events.

This really looks like you're just missing the edge. Interrupt handlers in Linux run with the interrupts disabled, so if you happen to have another interrupt running at the time where your device is emitting its own, you'll miss it.

But the software/kernel shouldn't matter in that case, should it? It is actually the port controller hardware registering the interrupt cause, and then forwarding this to the GIC, and that to the CPU. So once the Allwinner port controller has sampled the IRQ, it sets the pending bit in the PG_EINT_STATUS_REG; from then on the interrupt cannot be lost anymore. Unless it's configured as a line level IRQ on the pin controller side, where a lowered line (the end of the pulse) would mean the pending state is cleared again. So it should really be edge on the pinctrl side. Or am I missing something?

That's how I would expect a proper controller to behave, yeah, but I've never experienced one behaving like that.
It should be pretty easy to test, you just need to read the pending register once the interrupts are re-enabled.

Would this be in the arm-gic code or in the sunxi-pinctrl code?

The interrupts are masked at the CPU level, so even before the GIC. However, you can always add a hack to read the PIO registers whenever the interrupts are re-enabled.

When I instrument the sunxi-pinctrl code, I see lots of calls to sunxi_pinctrl_irq_ack but no calls to sunxi_pinctrl_irq_mask, so the controller is not seeing the interrupts at all.

If I remember this properly, _irq_mask will only be called when you disable that interrupt in particular, through a call to disable_irq for example.

Yes, based on what I'm seeing, I agree. I tried a hack to readl the PIO INT STATUS register for the interrupt I'm dealing with, and it appears it's not being set to pending, AFAICT.

If it can use level interrupts, you probably should use that instead.

Well, it's always level between the pinctrl and the GIC, and even if it would be edge, the GIC would store this state until the CPU acknowledges it. PSTATE.I=0 shouldn't have an effect. I would actually expect it to be the other way around: configuring as *level* on the pinctrl side allows for IRQ *pulses* to be lost.

The behaviour I've seen on some controllers is that it's actually following the input pin state, which means that if the input pin goes low, the line between the pin controller and the GIC will also go low. And since it's level based, you will not notice it.

Could it be that the pinctrl is clocked too slowly, so it can't sample the pins quickly enough and misses the rather short pulse?

By default it's clocked at 32kHz, which means a period of around 30us. That's indeed not enough if the pulse is around 50us. I guess you could try to play with the input-debounce property and see if it makes things better.

Could you expand on that? Since 30us is shorter than the pulse time, I'm unclear on why the interrupts would still be missed.

Right, never mind, that was a brainfart :) Still, playing with the clock and the clock scaler would be a good idea too, to see if it has any impact.

Ah OK. I will try changing the clock oscillator and see what happens. Thanks very much again for your help!

Maxime

--
You received this message because you are subscribed to the Google Groups "linux-sunxi" group. To unsubscribe from this group and stop receiving emails from it, send an email to linux-sunxi+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [linux-sunxi] Re: Strange behavior with missing H3 interrupts
On 04/10/2018 06:54 AM, Maxime Ripard wrote:
On Fri, Apr 06, 2018 at 02:52:35PM +0100, Andre Przywara wrote:
On 05/04/18 20:48, Martin Kelly wrote:
On 04/05/2018 06:07 AM, Maxime Ripard wrote:
On Wed, Apr 04, 2018 at 02:50:25PM -0700, Martin Kelly wrote:

Hi,

I've noticed strange behavior on my H3 (nanopi neo air) and am wondering if anyone has suggestions for further debugging it, as I'm getting stumped.

Specifically, I have configured a device (Invensense MPU9250) to deliver interrupts at 10 Hz to PG_EINT11. For some reason, though, the interrupt handler is being called at only about 6 Hz.

Looking at a logic analyzer, I see the hardware is interrupting at 10 Hz as it should, but sometimes the interrupts are just missed from the kernel side. So you might see a 200 ms gap between calls to the IRQ handler, but 100 ms between hardware IRQ events.

This really looks like you're just missing the edge. Interrupt handlers in Linux run with the interrupts disabled, so if you happen to have another interrupt running at the time where your device is emitting its own, you'll miss it.

But the software/kernel shouldn't matter in that case, should it? It is actually the port controller hardware registering the interrupt cause, and then forwarding this to the GIC, and that to the CPU. So once the Allwinner port controller has sampled the IRQ, it sets the pending bit in the PG_EINT_STATUS_REG; from then on the interrupt cannot be lost anymore. Unless it's configured as a line level IRQ on the pin controller side, where a lowered line (the end of the pulse) would mean the pending state is cleared again. So it should really be edge on the pinctrl side. Or am I missing something?

That's how I would expect a proper controller to behave, yeah, but I've never experienced one behaving like that. It should be pretty easy to test, you just need to read the pending register once the interrupts are re-enabled.

Would this be in the arm-gic code or in the sunxi-pinctrl code?
When I instrument the sunxi-pinctrl code, I see lots of calls to sunxi_pinctrl_irq_ack but no calls to sunxi_pinctrl_irq_mask, so the controller is not seeing the interrupts at all.

If it can use level interrupts, you probably should use that instead.

Well, it's always level between the pinctrl and the GIC, and even if it would be edge, the GIC would store this state until the CPU acknowledges it. PSTATE.I=0 shouldn't have an effect. I would actually expect it to be the other way around: configuring as *level* on the pinctrl side allows for IRQ *pulses* to be lost.

The behaviour I've seen on some controllers is that it's actually following the input pin state, which means that if the input pin goes low, the line between the pin controller and the GIC will also go low. And since it's level based, you will not notice it.

Could it be that the pinctrl is clocked too slowly, so it can't sample the pins quickly enough and misses the rather short pulse?

By default it's clocked at 32kHz, which means a period of around 30us. That's indeed not enough if the pulse is around 50us. I guess you could try to play with the input-debounce property and see if it makes things better.

Could you expand on that? Since 30us is shorter than the pulse time, I'm unclear on why the interrupts would still be missed.
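
The arithmetic behind the "30us vs 50us" exchange can be checked with a minimal sketch. It assumes the "32kHz" clock is the usual 32.768 kHz oscillator and takes the roughly 50 us pulse width from the thread; the function names are hypothetical, and it only counts raw sampling edges. Whether the H3 debounce logic needs more than one consecutive sample is not something the thread establishes.

```python
import math


def sample_period_us(clock_hz: float) -> float:
    """Length of one sampling clock cycle, in microseconds."""
    return 1e6 / clock_hz


def guaranteed_samples(pulse_us: float, clock_hz: float) -> int:
    """Minimum number of sampling edges that land inside a pulse,
    regardless of the phase between the pulse and the clock."""
    return math.floor(pulse_us / sample_period_us(clock_hz))


# Assumed values: 32.768 kHz sampling clock, 50 us pulse (from the thread),
# plus a shorter 20 us pulse for comparison.
CLOCK_HZ = 32_768
print(f"period: {sample_period_us(CLOCK_HZ):.1f} us")
print("edges inside a 50 us pulse:", guaranteed_samples(50, CLOCK_HZ))
print("edges inside a 20 us pulse:", guaranteed_samples(20, CLOCK_HZ))
```

Under these assumptions a pulse longer than one clock period is always sampled at least once, which is consistent with the objection that a 30 us period alone would not explain the misses.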
[linux-sunxi] Re: Strange behavior with missing H3 interrupts
On 04/05/2018 10:50 PM, Maxime Ripard wrote:
On Thu, Apr 05, 2018 at 12:48:56PM -0700, Martin Kelly wrote:
On 04/05/2018 06:07 AM, Maxime Ripard wrote:
On Wed, Apr 04, 2018 at 02:50:25PM -0700, Martin Kelly wrote:

Hi,

I've noticed strange behavior on my H3 (nanopi neo air) and am wondering if anyone has suggestions for further debugging it, as I'm getting stumped.

Specifically, I have configured a device (Invensense MPU9250) to deliver interrupts at 10 Hz to PG_EINT11. For some reason, though, the interrupt handler is being called at only about 6 Hz.

Looking at a logic analyzer, I see the hardware is interrupting at 10 Hz as it should, but sometimes the interrupts are just missed from the kernel side. So you might see a 200 ms gap between calls to the IRQ handler, but 100 ms between hardware IRQ events.

This really looks like you're just missing the edge. Interrupt handlers in Linux run with the interrupts disabled, so if you happen to have another interrupt running at the time where your device is emitting its own, you'll miss it.

If it can use level interrupts, you probably should use that instead.

Yes, I have to agree with your theory. I switched to level interrupts and nothing is missed now.

Ok, great.

Thanks very much for the suggestion!
Re: [linux-sunxi] Re: Strange behavior with missing H3 interrupts
On 04/06/2018 06:52 AM, Andre Przywara wrote:

Hi,

On 05/04/18 20:48, Martin Kelly wrote:
On 04/05/2018 06:07 AM, Maxime Ripard wrote:
On Wed, Apr 04, 2018 at 02:50:25PM -0700, Martin Kelly wrote:

Hi,

I've noticed strange behavior on my H3 (nanopi neo air) and am wondering if anyone has suggestions for further debugging it, as I'm getting stumped.

Specifically, I have configured a device (Invensense MPU9250) to deliver interrupts at 10 Hz to PG_EINT11. For some reason, though, the interrupt handler is being called at only about 6 Hz.

Looking at a logic analyzer, I see the hardware is interrupting at 10 Hz as it should, but sometimes the interrupts are just missed from the kernel side. So you might see a 200 ms gap between calls to the IRQ handler, but 100 ms between hardware IRQ events.

This really looks like you're just missing the edge. Interrupt handlers in Linux run with the interrupts disabled, so if you happen to have another interrupt running at the time where your device is emitting its own, you'll miss it.

But the software/kernel shouldn't matter in that case, should it? It is actually the port controller hardware registering the interrupt cause, and then forwarding this to the GIC, and that to the CPU. So once the Allwinner port controller has sampled the IRQ, it sets the pending bit in the PG_EINT_STATUS_REG; from then on the interrupt cannot be lost anymore. Unless it's configured as a line level IRQ on the pin controller side, where a lowered line (the end of the pulse) would mean the pending state is cleared again. So it should really be edge on the pinctrl side. Or am I missing something?

Yes, this is what is confusing me; I would expect the pending bit to be set and the interrupt to still be serviced if only a single edge occurred while interrupts were disabled (albeit slightly delayed).

Another strange aspect of this is that roughly the same percentage of interrupts are missed at low (10 Hz) and higher (500 Hz) frequencies: 40% or so.
This would align well with a sampling rate mismatch. I am also seeing some very strange extreme cases of nearly a full second without interrupts, which seems hard to explain under any theory I can come up with.

Here is a screenshot from my logic analyzer:

https://ibb.co/cUQSBc

D0 is the I2C SCL
D1 is the I2C SDA
D2 is the interrupt (falling edge in this case)

The interrupt handler causes I2C traffic, so when you see the traffic shortly after the interrupt, the interrupt handler fired, while in the other cases it did not.

If it can use level interrupts, you probably should use that instead.

Well, it's always level between the pinctrl and the GIC, and even if it would be edge, the GIC would store this state until the CPU acknowledges it. PSTATE.I=0 shouldn't have an effect. I would actually expect it to be the other way around: configuring as *level* on the pinctrl side allows for IRQ *pulses* to be lost.

Could it be that the pinctrl is clocked too slowly, so it can't sample the pins quickly enough and misses the rather short pulse?

Cheers,
Andre.

Yes, I have to agree with your theory. I switched to level interrupts and nothing is missed now.

One thing I'm confused about: I had expected that if a single edge-triggered interrupt is missed because interrupts were disabled, it would still be resumed later. From my testing, it appears when interrupts are missed, the entire period is skipped, and we act on the next interrupt instead.

For example, I'm seeing:

0 ms: interrupt, handler fires
100 ms: interrupt, no handler
200 ms: interrupt, handler fires

I never see a slightly-delayed handler, as I would expect if interrupts are being processed after they are re-enabled.

I also noticed that, depending on what's going on in the system and the interrupt frequencies (my tests ranged from 10 to 500 Hz), the pattern of which interrupts are missed changes.
Sometimes at 10 Hz, I see almost an entire second of missed interrupts, and then a few seconds of every interrupt being hit, and then another long batch of missed interrupts. At higher frequencies, the misses are about the same frequency (about 40% misses), but more fine-grained instead of batched up. I'm not sure what to make of these patterns, but 40% miss rate on a 10 Hz signal (at near 0% CPU load) really surprises me.
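
The miss pattern described above (handler at 0 ms, nothing at 100 ms, handler at 200 ms) can be quantified from handler timestamps alone. The following is a minimal sketch, assuming the timestamps are collected separately (for example from a trace or a logic analyzer export) and that the hardware period is known; `count_missed` and `miss_rate` are hypothetical helper names, not part of any kernel or tool API.

```python
def count_missed(handler_times_ms, period_ms):
    """Estimate how many hardware interrupts got no handler call,
    based on the gaps between consecutive handler invocations."""
    missed = 0
    for prev, cur in zip(handler_times_ms, handler_times_ms[1:]):
        # A gap of about N periods means N - 1 events in between were skipped.
        periods = round((cur - prev) / period_ms)
        missed += max(periods - 1, 0)
    return missed


def miss_rate(handler_times_ms, period_ms):
    """Fraction of hardware events in the observed span that were missed."""
    missed = count_missed(handler_times_ms, period_ms)
    total = missed + len(handler_times_ms)
    return missed / total if total else 0.0


# The example from the thread: handlers at 0 ms and 200 ms on a 10 Hz signal.
print(count_missed([0, 200], 100))
# A longer, made-up trace with two skipped periods.
print(miss_rate([0, 200, 300, 500, 600], 100))
```

Gaps that are always close to an integer multiple of the period, with no slightly delayed handlers, would match the observation that whole periods are skipped instead of being serviced late.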
Re: [linux-sunxi] Re: Strange behavior with missing H3 interrupts
Hi,

On 05/04/18 20:48, Martin Kelly wrote:
> On 04/05/2018 06:07 AM, Maxime Ripard wrote:
>> On Wed, Apr 04, 2018 at 02:50:25PM -0700, Martin Kelly wrote:
>>> Hi,
>>>
>>> I've noticed strange behavior on my H3 (nanopi neo air) and am
>>> wondering if anyone has suggestions for further debugging it, as
>>> I'm getting stumped.
>>>
>>> Specifically, I have configured a device (Invensense MPU9250) to
>>> deliver interrupts at 10 Hz to PG_EINT11. For some reason, though,
>>> the interrupt handler is being called at only about 6 Hz.
>>>
>>> Looking at a logic analyzer, I see the hardware is interrupting at
>>> 10 Hz as it should, but sometimes the interrupts are just missed
>>> from the kernel side. So you might see a 200 ms gap between calls
>>> to the IRQ handler, but 100 ms between hardware IRQ events.
>>
>> This really looks like you're just missing the edge. Interrupt
>> handlers in Linux run with the interrupts disabled, so if you happen
>> to have another interrupt running at the time where your device is
>> emitting its own, you'll miss it.

But the software/kernel shouldn't matter in that case, should it?
It is actually the port controller hardware registering the interrupt
cause, and then forwarding this to the GIC, and that to the CPU.
So once the Allwinner port controller has sampled the IRQ, it sets the
pending bit in the PG_EINT_STATUS_REG; from then on the interrupt cannot
be lost anymore. Unless it's configured as a line level IRQ on the pin
controller side, where a lowered line (the end of the pulse) would mean
the pending state is cleared again. So it should really be edge on the
pinctrl side. Or am I missing something?

>> If it can use level interrupts, you probably should use that instead.

Well, it's always level between the pinctrl and the GIC, and even if it
would be edge, the GIC would store this state until the CPU acknowledges
it. PSTATE.I=0 shouldn't have an effect.
I would actually expect it to be the other way around: configuring as
*level* on the pinctrl side allows for IRQ *pulses* to be lost.

Could it be that the pinctrl is clocked too slowly, so it can't sample
the pins quickly enough and misses the rather short pulse?

Cheers,
Andre.

> Yes, I have to agree with your theory. I switched to level interrupts
> and nothing is missed now.
>
> One thing I'm confused about: I had expected that if a single
> edge-triggered interrupt is missed because interrupts were disabled, it
> would still be resumed later. From my testing, it appears when
> interrupts are missed, the entire period is skipped, and we act on
> the next interrupt instead.
>
> For example, I'm seeing:
> 0 ms: interrupt, handler fires
> 100 ms: interrupt, no handler
> 200 ms: interrupt, handler fires
>
> I never see a slightly-delayed handler, as I would expect if interrupts
> are being processed after they are re-enabled.
>
> I also noticed that, depending on what's going on in the system and the
> interrupt frequencies (my tests ranged from 10 to 500 Hz), the pattern
> of which interrupts are missed changes. Sometimes at 10 Hz, I see almost
> an entire second of missed interrupts, and then a few seconds of every
> interrupt being hit, and then another long batch of missed interrupts.
> At higher frequencies, the misses are about the same frequency (about
> 40% misses), but more fine-grained instead of batched up. I'm not sure
> what to make of these patterns, but 40% miss rate on a 10 Hz signal (at
> near 0% CPU load) really surprises me.
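
The two behaviours being contrasted in this message can be put side by side in a toy model. This is only a sketch of the argument, not a model of the actual H3 pin controller: all times are hypothetical microsecond values, the function names are made up, and the "CPU masked" window stands in for any period during which the CPU cannot take the interrupt.

```python
from typing import Optional


def edge_latched_delivery(pulse_start: int, pulse_end: int,
                          mask_start: int, mask_end: int) -> Optional[int]:
    """Edge mode as described above: the pending bit is latched at the
    edge and stays set until the CPU takes the interrupt."""
    if mask_start <= pulse_start < mask_end:
        return mask_end        # delivered late, once interrupts are unmasked
    return pulse_start         # delivered right away


def level_following_delivery(pulse_start: int, pulse_end: int,
                             mask_start: int, mask_end: int) -> Optional[int]:
    """Level mode where the IRQ line simply mirrors the pin: the request
    is only visible while the pulse lasts."""
    if not (mask_start <= pulse_start < mask_end):
        return pulse_start     # CPU was able to take it immediately
    if mask_end < pulse_end:
        return mask_end        # pulse still active when interrupts return
    return None                # pulse ended while masked: interrupt lost


# A 50 us pulse arriving while the CPU is masked from 90 us to 300 us.
print(edge_latched_delivery(100, 150, 90, 300))
print(level_following_delivery(100, 150, 90, 300))
```

In this model a short pulse is delayed but not lost in latched edge mode, and lost in level-following mode, which is the expectation stated above. A device that holds its line asserted until it is serviced corresponds to `pulse_end` lying after `mask_end`, in which case the level-following case also delivers the interrupt.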
[linux-sunxi] Re: Strange behavior with missing H3 interrupts
On 04/05/2018 06:07 AM, Maxime Ripard wrote:
On Wed, Apr 04, 2018 at 02:50:25PM -0700, Martin Kelly wrote:

Hi,

I've noticed strange behavior on my H3 (nanopi neo air) and am wondering if anyone has suggestions for further debugging it, as I'm getting stumped.

Specifically, I have configured a device (Invensense MPU9250) to deliver interrupts at 10 Hz to PG_EINT11. For some reason, though, the interrupt handler is being called at only about 6 Hz.

Looking at a logic analyzer, I see the hardware is interrupting at 10 Hz as it should, but sometimes the interrupts are just missed from the kernel side. So you might see a 200 ms gap between calls to the IRQ handler, but 100 ms between hardware IRQ events.

This really looks like you're just missing the edge. Interrupt handlers in Linux run with the interrupts disabled, so if you happen to have another interrupt running at the time where your device is emitting its own, you'll miss it.

If it can use level interrupts, you probably should use that instead.

Maxime

Yes, I have to agree with your theory. I switched to level interrupts and nothing is missed now.

One thing I'm confused about: I had expected that if a single edge-triggered interrupt is missed because interrupts were disabled, it would still be resumed later. From my testing, it appears when interrupts are missed, the entire period is skipped, and we act on the next interrupt instead.

For example, I'm seeing:

0 ms: interrupt, handler fires
100 ms: interrupt, no handler
200 ms: interrupt, handler fires

I never see a slightly-delayed handler, as I would expect if interrupts are being processed after they are re-enabled.

I also noticed that, depending on what's going on in the system and the interrupt frequencies (my tests ranged from 10 to 500 Hz), the pattern of which interrupts are missed changes.
Sometimes at 10 Hz, I see almost an entire second of missed interrupts, and then a few seconds of every interrupt being hit, and then another long batch of missed interrupts. At higher frequencies, the misses are about the same frequency (about 40% misses), but more fine-grained instead of batched up. I'm not sure what to make of these patterns, but 40% miss rate on a 10 Hz signal (at near 0% CPU load) really surprises me.