> From: Stephen Warren [mailto:swar...@wwwdotorg.org]
> Sent: Saturday, February 01, 2014 7:44 PM
> 
> On 02/01/2014 03:00 AM, Andre Heider wrote:
> > On Fri, Jan 31, 2014 at 11:48:37PM -0700, Stephen Warren wrote:
> >> On 01/31/2014 11:12 AM, Andre Heider wrote:
> >>> On Mon, Jan 13, 2014 at 01:50:09PM -0800, Paul Zimmerman wrote:
> >>>> The DWC2 driver should now be in good enough shape to move out of
> >>>> staging. I have stress tested it overnight on RPI running mass
> >>>> storage and Ethernet transfers in parallel, and for several days
> >>>> on our proprietary PCI-based platform.
> >> ...
> >>> this looks just fine, but for whatever reason it breaks sdhci on my rpi.
> >>> With today's Linus' master the dwc2 controller seems to initialize fine,
> >>> but I get this upon boot:
> >>>
> >>> [    1.783316] sdhci-bcm2835 20300000.sdhci: sdhci_pltfm_init failed -12
> >>> [    1.794820] sdhci-bcm2835: probe of 20300000.sdhci failed with error -12
> ...
> >> This is due to the following code:
> ...
> >> What ends up happening, simply due to memory allocation order, is that
> >> the memory writes inside usb_settoggle() end up setting the SDHCI struct
> >> platform_device's num_resources to 0, so that its call to
> >> platform_get_resource() fails.
> >>
> >> With the DWC2 move patch reverted, some other random piece of memory is
> >> being corrupted, which just happens not to cause any visible problem.
> >> Likely it's some other struct platform_device that's already had its
> >> resources read by the time DWC2 probes and corrupts them.
> >>
> >> (Yes, this was hard to find!)
> >
> > Nice work, but how did you pinpoint this? Am I missing some option/tool
> > or did I just not stare for long enough?
> 
> Well, there was a clear place where an issue was present; the resource
> lookup in sdhci_pltfm_init() was failing, so I put a bunch of printfs
> into that function to dump out the data platform_get_resource() used.
> This clearly pointed at num_resources==0 being the problem. Next, I
> dumped the same data from the code in drivers/of that sets it up, and it
> was OK there, so I knew it was getting over-written somewhere. I then
> basically added hundreds of calls to the same data dumping function
> throughout kernel functions like really_probe() to track down the
> location of the problem. Luckily, the behaviour was stable, so I wasn't
> chasing a race/timing condition. Eventually I narrowed the window to the
> few lines of code I mentioned in _dwc2_hcd_endpoint_reset(). It would
> have been much harder if it was e.g. the USB HW DMAing to memory that
> caused the corruption, so I was lucky:-)
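
To illustrate the checkpoint approach Stephen describes, here is a minimal
userspace sketch in C. It is not the kernel code: struct fake_pdev, the
step_* functions, and dump_watched() are hypothetical stand-ins, and the
"corruption" is simulated by a direct assignment rather than a real stray
write. The idea is only to show how dumping the same data after each step
narrows the window in which the overwrite happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the fields being watched; this is NOT the
 * kernel's struct platform_device. */
struct fake_pdev {
    const char *name;
    unsigned int num_resources;
};

static struct fake_pdev watched = { "20300000.sdhci", 2 };
static const unsigned int expected_num_resources = 2;

/* Checkpoint: print the watched data and report whether it is intact. */
static int dump_watched(const char *where)
{
    int ok = (watched.num_resources == expected_num_resources);

    printf("%-16s %s: num_resources=%u%s\n", where, watched.name,
           watched.num_resources, ok ? "" : "  <-- corrupted");
    return ok;
}

/* Hypothetical boot steps. step_c simulates the stray write by assigning
 * directly; in the real bug the write came from unrelated code. */
static void step_a(void) { }
static void step_b(void) { }
static void step_c(void) { watched.num_resources = 0; }
static void step_d(void) { }

typedef void (*step_fn)(void);

static const step_fn steps[] = { step_a, step_b, step_c, step_d };
static const char *const step_names[] = {
    "after step_a", "after step_b", "after step_c", "after step_d"
};

/* Run every step with a checkpoint after it. Returns the index of the
 * first step after which the data is wrong, or -1 if it stays intact. */
static int find_first_corruption(void)
{
    size_t n = sizeof(steps) / sizeof(steps[0]);
    int ok = dump_watched("before any step");

    assert(ok); /* the data must be good at the start */
    (void)ok;

    for (size_t i = 0; i < n; i++) {
        steps[i]();
        if (!dump_watched(step_names[i]))
            return (int)i;
    }
    return -1;
}
```

Calling find_first_corruption() from a test program reports the first step
after which the checkpoint fails. Once a step is identified, the same
checkpoints can be moved inside it to narrow the window further. This only
works because the behaviour is repeatable from run to run, as noted above.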

Nice work Stephen, thanks. I will try to come up with a patch to fix this
ASAP, along the lines of what Alan suggested.

-- 
Paul
