Re: [PATCH 1/3] usb: dwc3: gadget: Prevent losing events in event cache

John Youn Wed, 12 Apr 2017 00:27:07 -0700

On 04/11/2017 11:14 PM, Felipe Balbi wrote:
>
> Hi,
>
> John Youn <john.y...@synopsys.com> writes:
>>>> John Youn <john.y...@synopsys.com> writes:
>>>>>> Thinh Nguyen <thinh.ngu...@synopsys.com> writes:
>>>>>>> The dwc3 driver can overwite its previous events if its top half IRQ
>>>>>>> handler gets invoked again before processing the events in the cache. We
>>>>>>
>>>>>> interrupts are masked, why would top half get invoked again? Is this,
>>>>>> perhaps, related to DWC3 3.00a which has the "Interrupt line doesn't
>>>>>> lower when masked" problem? We've added a lot of code to workaround that
>>>>>> problem and, apparently, it wasn't enough.
>>>>>
>>>>> No, it is not related to that. We verified with PCIe traces. The
>>>>> interrupt line gets deasserted after we mask it. And we put the
>>>>> masking as close to the beginning of the top-half as possible.
>>>>>
>>>>>>
>>>>>> In any case, there's no way top half would be invoked again in a
>>>>>> properly working DWC3.
>>>>>
>>>>> Yet we still see it sometimes. Usually it doesn't create a problem,
>>>>
>>>> that's fair, but it's not for the reason you're describing :-) There
>>>> might be another problem going on, because since we masked the interrupt
>>>> and cleared all events, IRQ shouldn't be raised at all; unless, as I
>>>> mentioned on the other subthread, the IRQ line is shared.
>>>>
>>>>> but if there happens to be a new event there, we get the failure.
>>>>>
>>>>> We didn't trace into that part of the kernel so we can't explain why.
>>>>> But if there is any chance the interrupt line deassertion wasn't
>>>>> detected in time, whatever part of the kernel that thinks it is still
>>>>> asserted might just call our top-half again. This could be a totally
>>>>> wrong assumption, but it doesn't seem too far-fetched.
>>>>
>>>> The kernel doesn't detect IRQ line assertion/deassertion. CPU gets an
>>>> exception when that happens and calls Kernel IRQ handler vector. That
>>>> will, in turn, figure out which line triggered, call the handler and so
>>>> on.
>>>
>>> We're talking about PCIe though, where interrupt assertion and
>>> deassertion are packets. So I would imagine the kernel has to do
>>> something and there could be some latency associated with that.
>>
>> Also, another thing is that the device uses legacy, level-triggered,
>> PCIe interrupts, so for as long as the interrupt is asserted, the TH
>> is called repeatedly.
>
> yes, and that's why we have:
>
>> static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>> {
>>      struct dwc3 *dwc = evt->dwc;
>>      u32 amount;
>>      u32 count;
>>      u32 reg;
>
>>      if (pm_runtime_suspended(dwc->dev)) {
>>              pm_runtime_get(dwc->dev);
>>              disable_irq_nosync(dwc->irq_gadget);
>>              dwc->pending_events = true;
>>              return IRQ_HANDLED;
>>      }
>>
>>      count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>      count &= DWC3_GEVNTCOUNT_MASK;
>
> check how many events are pending in the event buffer.
>
>>      if (!count)
>>              return IRQ_NONE;
>>
>>      evt->count = count;
>>      evt->flags |= DWC3_EVENT_PENDING;
>>
>>      /* Mask interrupt */
>>      reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
>>      reg |= DWC3_GEVNTSIZ_INTMASK;
>
> mask interrupt generation
>
>>      dwc3_writel(dwc->regs, DWC3_GEVNTSIZ(0), reg);
>>
>>      amount = min(count, evt->length - evt->lpos);
>>      memcpy(evt->cache + evt->lpos, evt->buf + evt->lpos, amount);
>>
>>      if (amount < count)
>>              memcpy(evt->cache, evt->buf, count - amount);
>>
>>      dwc3_writel(dwc->regs, DWC3_GEVNTCOUNT(0), count);
>
> clear ALL events from event buffer. This brings the line down, so we
> shouldn't re-enter.


Right, I get all that, but you're not addressing the fact that the
"interrupts" are just packets on the PCIe and there could be some
latency in detecting the de-assertion. The "line" doesn't go down
immediately when we mask it. I don't see any obvious way to guarantee
that it will be de-asserted by the time we exit TH.

There is both an "assert" and "de-assert" packet. Between those, the
TH gets called continuously. You can see multiple TH before BH
behavior just by omitting the mask and leaving it asserted.

>
>>      return IRQ_WAKE_THREAD;
>> }
>
>> So we mask the interrupt in the TH and a short time later, the
>> interrupt de-assertion packet is sent on PCIe bus and if that's not
>> seen right away we may already have another call to TH before the BH
>> gets scheduled.
>
> not sure this can happen. If that's the case, every PCI driver would
> have all sorts of tricks to cope with this, not only dwc3 :-)

Well, the DWC3 is now doing something destructive in the TH. Otherwise
multiple TH calls wouldn't have affected anything. We should probably
use a queue for the events instead. I think most drivers should be
robust against this type of thing, as you never know when you can
get an interrupt, and "spurious" interrupts are a thing.

>
> Bjorn, is this something that can happen on PCIe?
>
> Quick summary of the problem:
>
> John and Thinh are experiencing a re-entrant top-half handler even
> though we have cleared pending IRQ status _and_ masked Interrupts. SNPS
> is using an FPGA model of the latest DWC3 core under x86.
>
> I have never seen this behavior on ARM or any of the x86 devices
> containing this core (and this includes all the newest x86 cores, see
> drivers/usb/dwc3/dwc3-pci.c for PCI IDs if you care enough :-)
>
> Anyway, from my point of view, this is either a bug in IRQ subsystem
> which only John and Thinh can reproduce at this moment, or a regression
> with DWC3 IP Core :-s
>

I don't think it is necessarily either of those things. I think the
fundamental behavior was always like this but adding the event cache
and having a destructive TH, requiring a single TH before BH caused
the issue to come up.

We can try with various older IP versions tomorrow to try to confirm
this.

Regards,
John
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] usb: dwc3: gadget: Prevent losing events in event cache

Reply via email to