On 02/01/2014 03:00 AM, Andre Heider wrote:
> On Fri, Jan 31, 2014 at 11:48:37PM -0700, Stephen Warren wrote:
>> On 01/31/2014 11:12 AM, Andre Heider wrote:
>>> On Mon, Jan 13, 2014 at 01:50:09PM -0800, Paul Zimmerman wrote:
>>>> The DWC2 driver should now be in good enough shape to move out of
>>>> staging. I have stress tested it overnight on RPI running mass
>>>> storage and Ethernet transfers in parallel, and for several days
>>>> on our proprietary PCI-based platform.
>> ...
>>> this looks just fine, but for whatever reason it breaks sdhci on my rpi.
>>> With today's Linus' master the dwc2 controller seems to initialize fine,
>>> but I get this upon boot:
>>>
>>> [    1.783316] sdhci-bcm2835 20300000.sdhci: sdhci_pltfm_init failed -12
>>> [    1.794820] sdhci-bcm2835: probe of 20300000.sdhci failed with error -12
...
>> This is due to the following code:
...
>> What ends up happening, simply due to memory allocation order, is that
>> the memory writes inside usb_settoggle() end up setting the SDHCI struct
>> platform_device's num_resources to 0, so that its call to
>> platform_get_resource() fails.
>>
>> With the DWC2 move patch reverted, some other random piece of memory is
>> being corrupted, which just happens not to cause any visible problem.
>> Likely it's some other struct platform_device that's already had its
>> resources read by the time DWC2 probes and corrupts them.
>>
>> (Yes, this was hard to find!)
> 
> Nice work, but how did you pinpoint this? Am I missing some option/tool
> or did I just not stare for long enough?

Well, there was a clear place where an issue was present: the resource
lookup in sdhci_pltfm_init() was failing, so I put a bunch of printfs
into that function to dump out the data platform_get_resource() used.
This clearly pointed at num_resources==0 being the problem. Next, I
dumped the same data from the code in drivers/of that sets it up, and it
was OK there, so I knew it was getting overwritten somewhere. I then
basically added hundreds of calls to the same data-dumping function
throughout kernel functions like really_probe() to track down the
location of the problem. Luckily, the behaviour was stable, so I wasn't
chasing a race/timing condition. Eventually I narrowed the window to the
few lines of code I mentioned in _dwc2_hcd_endpoint_reset(). It would
have been much harder if it had been e.g. the USB HW DMAing to memory
that caused the corruption, so I was lucky :-)
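
For anyone wanting to try the same approach, here is a minimal userspace
sketch of the checkpoint technique described above. It is not the code
used on the RPi: the arena layout, the stage functions, and every name
in it (checkpoint(), buggy_stage(), find_corruption(), ...) are
hypothetical stand-ins. It only illustrates the idea of recording a
known-good value once, then calling the same check after each step so
that the first failing call brackets the write that did the damage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for memory that two unrelated drivers end up
 * sharing purely because of allocation order: words 0..1 belong to a
 * "USB" toggle array, word 2 plays the role of num_resources.
 */
enum { TOGGLE_WORDS = 2, WATCHED = 2, ARENA_WORDS = 3 };
static uint32_t arena[ARENA_WORDS];

static uint32_t expected;      /* known-good value captured at setup */
static const char *first_bad;  /* first checkpoint that saw a mismatch */

/* Call this after every step; only the first mismatch is recorded. */
static void checkpoint(const char *where)
{
    if (!first_bad && arena[WATCHED] != expected) {
        first_bad = where;
        fprintf(stderr, "%s: watched=%u, expected %u\n", where,
                (unsigned)arena[WATCHED], (unsigned)expected);
    }
}

static void setup(void)
{
    memset(arena, 0, sizeof(arena));
    arena[WATCHED] = 2;
    expected = arena[WATCHED];
    first_bad = NULL;
}

/* A well-behaved step: touches only its own toggle word. */
static void good_stage(void)
{
    arena[0] |= 1u;
}

/* Simulated bug: the index runs past the toggle words. */
static void buggy_stage(unsigned idx)
{
    assert(idx < ARENA_WORDS); /* keep the simulation well defined */
    arena[idx] = 0;
}

/* Run all steps with a checkpoint after each; return the first bad one. */
const char *find_corruption(void)
{
    setup();
    checkpoint("after setup");
    good_stage();
    checkpoint("after good_stage");
    buggy_stage(TOGGLE_WORDS);
    checkpoint("after buggy_stage");
    good_stage();
    checkpoint("after second good_stage");
    return first_bad;
}
```

Calling find_corruption() returns the label of the first checkpoint at
which the watched word no longer matched. To narrow things further you
add more checkpoints between the last good label and the first bad one.
As noted above, this only works when the corruption is reproducible and
comes from CPU writes that happen between two checkpoints.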