Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-05-03 Thread Ard Biesheuvel
On 3 May 2018 at 11:41, Marc Zyngier  wrote:
> On 03/05/18 09:42, Faiz Abbas wrote:
>> Hi Marc,
>>
>> On Thursday 03 May 2018 01:44 PM, Marc Zyngier wrote:
>>> On 03/05/18 05:49, Faiz Abbas wrote:
 Hi Everyone,

 On Wednesday 11 April 2018 07:32 PM, Marc Zyngier wrote:
> On Wed, 11 Apr 2018 14:11:52 +0100,
> Bockholdt Arne wrote:
>>
>> Hi all,
>>
>> is there anything new, I've just tried the new stable 4.16.1 kernel
>> without any change. The Renesys USB3 controller is still not
>> functional. I'm willing to test any patch that is based on a stable
>> kernel version because the machine is in production use.
>
> Have you tested the branch[1] I mentioned in my previous email?
> Without your feedback, I cannot really make much progress on this.
>
> Thanks,
>
>M.
>
> [1] https://www.spinics.net/lists/linux-usb/msg167301.html
>

 I was also having problems with a Renesas card (01:00.0 USB controller
 [0c03]: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller
 [1912:0014]) on a dra72x-evm. The kernel would just hang because of the
 xhci reset.

 Log:https://pastebin.ubuntu.com/p/dYYV9DZMwx/

 The patches on Marc's branch have fixed this for me as well. Thanks for
 the fix.
>>> OK. I'll rebase this on a more recent version of the kernel, make it
>>> conditional on having an iommu (as it seems to be the only affected
>>> configuration), and post that to a wider audience.
>>
>> Just to be sure, you are talking about the original 32 bit DMA issue
>> being conditional on iommu right? Because dra72x doesn't use iommu.
>
> I'm talking about making the whole workaround dependent on the USB
> controller being behind an iommu. No iommu, no workaround (because it is
> likely that there is no problem in that case).
>

That is still only a partial solution, unfortunately.

On non-x86 systems with memory above and below the 4 GB mark, it is
undefined which side the firmware will choose and which side the
kernel will choose (although I suppose the kernel may attempt to
preserve 32-bit addressable memory for peripherals that are only
32-bit capable)

This would be much easier to fix at the UEFI stub level (but still in
kernel code, not firmware), by checking for PCI I/O protocol instances
with the correct VID/PID and check whether the
EFI_PCI_IO_ATTRIBUTE_DUAL_ADDRESS_CYCLE attribute is set, but that is
also only a partial solution.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression due to "Workaround for uPD72020x USB3 chips"

2018-05-02 Thread Ard Biesheuvel
On 2 May 2018 at 11:18, Domenico Andreoli <domenico.andre...@linux.com> wrote:
> On Wed, May 02, 2018 at 10:22:05AM +0200, Ard Biesheuvel wrote:
>> On 2 May 2018 at 10:06, Domenico Andreoli <domenico.andre...@linux.com> 
>> wrote:
>> > Dear all,
>> >
>> >   my home machine stopped to boot starting from kernel version 4.12.7.
>> >
>> > The last message I read is about resetting some USB3 bus. It's 100%
>> > reproducible also with any recent kernel up to 4.17.0-rc3.
>> >
>> > I bisected down to the following commit:
>> >
>> > commit 0e1f0eaed6c20db41ff61e024b361ee3ec9d686c (tag: my_broken_xhci)
>> > Author: Marc Zyngier <marc.zyng...@arm.com>
>> > Date:   Tue Aug 1 20:11:08 2017 -0500
>> >
>> > xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
>> >
>>
>> Hello Domenico,
>
> Sir..
>
>>
>> Long time no see :-)
>
> Instead I see you pretty much everywhere :P
>
>>
>> Please refer to the following thread
>>
>> https://marc.info/?l=linux-usb=151872295419486
>>
>> for some discussion on this topic, and a possible workaround.
>
> Interesting, thanks for the pointer!
>
> I'm happy to learn I've this nice piece of HW in my machine. It might
> have at least some other issue while trying to hibernate (there is a
> fair chance for it to abort the process because of some "business").
>
> I tried the usb/uPD720202-reset branch from Marc (with some massaging
> due to quirks now being >32) and my system boots again. I think it's
> more because of the PCI reset is removed, I am not affected by the
> original issue it was meant to fix.
>

Indeed. The majority of x86 machines never perform DMA above 4 GB in
UEFI, so the issue does not occur, although it could still affect a
kernel with IOMMU on kexec'ed from a kernel with IOMMU off.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-04-11 Thread Ard Biesheuvel
On 12 April 2018 at 07:05, Bockholdt Arne
 wrote:
>
> On Wed, 2018-04-11 at 15:02 +0100, Marc Zyngier wrote:
>
> On Wed, 11 Apr 2018 14:11:52 +0100,
> Bockholdt Arne wrote:
>
>
> Hi all,
>
> is there anything new, I've just tried the new stable 4.16.1 kernel
> without any change. The Renesys USB3 controller is still not
> functional. I'm willing to test any patch that is based on a stable
> kernel version because the machine is in production use.
>
>
> Have you tested the branch[1] I mentioned in my previous email?
> Without your feedback, I cannot really make much progress on this.
>
> Thanks,
>
> M.
>
> [1] https://www.spinics.net/lists/linux-usb/msg167301.html
>
>
> It seems the repo is based on 3.9 kernel and not on a current stable branch,
> isn't it? It's rather old, I'm not sure my setup will work with this old
> kernel.
>

Are you sure you pulled the correct branch? usb/uPD720202-reset has
the following on top of *v4.16-rc6*

Revert "xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue"
xhci: Add quirk to zero 64bit registers on Renesas PCIe controllers

(https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=usb/uPD720202-reset)

> Do you have a patch for a less dated kernel?
>
> Thank you,
>
> Arne
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-03-25 Thread Ard Biesheuvel


> On 25 Mar 2018, at 15:14, Marc Zyngier <marc.zyng...@arm.com> wrote:
> 
> On Sun, 25 Mar 2018 14:26:58 +0100
> Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> 
>>> On 25 March 2018 at 13:52, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>> On Sun, 25 Mar 2018 13:38:19 +0100,
>>> Ard Biesheuvel wrote:  
>>>> 
>>>>> On 25 March 2018 at 13:31, Marc Zyngier <marc.zyng...@arm.com> wrote:  
>>>>> On Sun, 25 Mar 2018 12:57:55 +0100,
>>>>> Ard Biesheuvel wrote:  
>>>>>> 
>>>>>>> On 25 March 2018 at 12:51, Marc Zyngier <marc.zyng...@arm.com> wrote:  
>>>>>>> On Sun, 25 Mar 2018 11:48:35 +0100,
>>>>>>> Ard Biesheuvel wrote:
>>>>>>> 
>>>>>>> Hi Ard,
>>>>>>> 
>>>>>>> [...]
>>>>>>> 
>>>>>>>>> I finally found some time to work on this, and came up with an
>>>>>>>>> alternative approach (it turns out that this chip is even more
>>>>>>>>> braindead than I thought).
>>>>>>>>> 
>>>>>>>>> It is slightly scary, in the sense that the USB controller seems to
>>>>>>>>> perform memory accesses even when halted, and can generate faults,
>>>>>>>>> but it works just fine on my system. And with this, we can drop the
>>>>>>>>> hard reset at boot time. I'm still on the fence to limit it to systems
>>>>>>>>> with an iommu though.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Hi Marc,
>>>>>>>> 
>>>>>>>> I take it you tested this on Cello?  
>>>>>>> 
>>>>>>> Tested on Cello indeed (I should have mentioned that the first place).
>>>>>>> 
>>>>>>>> There, it might make sense to
>>>>>>>> limit this to systems with an IOMMU, but not in the general case, I
>>>>>>>> think. The reason is that it is not guaranteed that the firmware will
>>>>>>>> use 32-bit addressable allocations for these data structures, even if
>>>>>>>> the kernel is able to without an IOMMU. (UEFI on arm64 will not prefer
>>>>>>>> 32-bit addressable memory for PCI DMA if it is available, and usually
>>>>>>>> serves heap allocations [such as the ones used for these data
>>>>>>>> structures] starting at the top of DRAM)  
>>>>>>> 
>>>>>>> My main worry is that this controller will happily try and DMA from
>>>>>>> zero as we wipe the 64bit registers, even when halted. On Seattle (and
>>>>>>> thus Cello), this is just fine as there is nothing there, and the
>>>>>>> controller aborts with the HSE bit set.
>>>>>>> 
>>>>>>> On other systems, where memory actually exists at 0, who knows what
>>>>>>> this is going to do? On the other hand, this is not worse than the
>>>>>>> current situation, where we could end-up with any odd address...
>>>>>>> 
>>>>>> 
>>>>>> Is the PCI_COMMAND_MASTER bit enabled at this point? What happens if
>>>>>> you clear it first?  
>>>>> 
>>>>> Tried that. No difference whatsoever, as I still get a fault with the
>>>>> device accessing address 0, and being caught by the iommu.
>>>>> 
>>>> 
>>>> Wow so this device is more broken than I thought.  
>>> 
>>> That's my impression as well. I came to the conclusion that the only
>>> way to make it behave is to crash it first, and then to reset it.
>>> 
>> 
>> OK, so what if it doesn't crash? Without an IOMMU, it is quite likely
>> that putting zeroes in the lower half of a 64-bit memory address
>> produces a physical address that is valid, and so the device will
>> still misbehave in that case.
>> 
>> If making it crash is what kicks it into submission, could we perhaps
>> use U64_MAX instead?
> 
> Just tried that. It gets into some even uglier state, and never
> recovers. Even doing U64_MAX, fault, and then zero doesn't help. The
> problem is that it dies with something in the top 32 bits, and that's
> pretty final.
> 
> If really feels that without an iommu in the path, this device is
> completely unsafe, and should never be fed 64bit addresses.
> 

... unless we do the pci reset in the exitbootservices handler in uefi, which 
is probably the most robust way of handling this (or wire up the iommu)

i have cello smmu patches if you’re interested--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-03-25 Thread Ard Biesheuvel
On 25 March 2018 at 13:52, Marc Zyngier <marc.zyng...@arm.com> wrote:
> On Sun, 25 Mar 2018 13:38:19 +0100,
> Ard Biesheuvel wrote:
>>
>> On 25 March 2018 at 13:31, Marc Zyngier <marc.zyng...@arm.com> wrote:
>> > On Sun, 25 Mar 2018 12:57:55 +0100,
>> > Ard Biesheuvel wrote:
>> >>
>> >> On 25 March 2018 at 12:51, Marc Zyngier <marc.zyng...@arm.com> wrote:
>> >> > On Sun, 25 Mar 2018 11:48:35 +0100,
>> >> > Ard Biesheuvel wrote:
>> >> >
>> >> > Hi Ard,
>> >> >
>> >> > [...]
>> >> >
>> >> >> > I finally found some time to work on this, and came up with an
>> >> >> > alternative approach (it turns out that this chip is even more
>> >> >> > braindead than I thought).
>> >> >> >
>> >> >> > It is slightly scary, in the sense that the USB controller seems to
>> >> >> > perform memory accesses even when halted, and can generate faults,
>> >> >> > but it works just fine on my system. And with this, we can drop the
>> >> >> > hard reset at boot time. I'm still on the fence to limit it to 
>> >> >> > systems
>> >> >> > with an iommu though.
>> >> >> >
>> >> >>
>> >> >> Hi Marc,
>> >> >>
>> >> >> I take it you tested this on Cello?
>> >> >
>> >> > Tested on Cello indeed (I should have mentioned that the first place).
>> >> >
>> >> >> There, it might make sense to
>> >> >> limit this to systems with an IOMMU, but not in the general case, I
>> >> >> think. The reason is that it is not guaranteed that the firmware will
>> >> >> use 32-bit addressable allocations for these data structures, even if
>> >> >> the kernel is able to without an IOMMU. (UEFI on arm64 will not prefer
>> >> >> 32-bit addressable memory for PCI DMA if it is available, and usually
>> >> >> serves heap allocations [such as the ones used for these data
>> >> >> structures] starting at the top of DRAM)
>> >> >
>> >> > My main worry is that this controller will happily try and DMA from
>> >> > zero as we wipe the 64bit registers, even when halted. On Seattle (and
>> >> > thus Cello), this is just fine as there is nothing there, and the
>> >> > controller aborts with the HSE bit set.
>> >> >
>> >> > On other systems, where memory actually exists at 0, who knows what
>> >> > this is going to do? On the other hand, this is not worse than the
>> >> > current situation, where we could end-up with any odd address...
>> >> >
>> >>
>> >> Is the PCI_COMMAND_MASTER bit enabled at this point? What happens if
>> >> you clear it first?
>> >
>> > Tried that. No difference whatsoever, as I still get a fault with the
>> > device accessing address 0, and being caught by the iommu.
>> >
>>
>> Wow so this device is more broken than I thought.
>
> That's my impression as well. I came to the conclusion that the only
> way to make it behave is to crash it first, and then to reset it.
>

OK, so what if it doesn't crash? Without an IOMMU, it is quite likely
that putting zeroes in the lower half of a 64-bit memory address
produces a physical address that is valid, and so the device will
still misbehave in that case.

If making it crash is what kicks it into submission, could we perhaps
use U64_MAX instead?
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-03-25 Thread Ard Biesheuvel
On 25 March 2018 at 13:31, Marc Zyngier <marc.zyng...@arm.com> wrote:
> On Sun, 25 Mar 2018 12:57:55 +0100,
> Ard Biesheuvel wrote:
>>
>> On 25 March 2018 at 12:51, Marc Zyngier <marc.zyng...@arm.com> wrote:
>> > On Sun, 25 Mar 2018 11:48:35 +0100,
>> > Ard Biesheuvel wrote:
>> >
>> > Hi Ard,
>> >
>> > [...]
>> >
>> >> > I finally found some time to work on this, and came up with an
>> >> > alternative approach (it turns out that this chip is even more
>> >> > braindead than I thought).
>> >> >
>> >> > It is slightly scary, in the sense that the USB controller seems to
>> >> > perform memory accesses even when halted, and can generate faults,
>> >> > but it works just fine on my system. And with this, we can drop the
>> >> > hard reset at boot time. I'm still on the fence to limit it to systems
>> >> > with an iommu though.
>> >> >
>> >>
>> >> Hi Marc,
>> >>
>> >> I take it you tested this on Cello?
>> >
>> > Tested on Cello indeed (I should have mentioned that the first place).
>> >
>> >> There, it might make sense to
>> >> limit this to systems with an IOMMU, but not in the general case, I
>> >> think. The reason is that it is not guaranteed that the firmware will
>> >> use 32-bit addressable allocations for these data structures, even if
>> >> the kernel is able to without an IOMMU. (UEFI on arm64 will not prefer
>> >> 32-bit addressable memory for PCI DMA if it is available, and usually
>> >> serves heap allocations [such as the ones used for these data
>> >> structures] starting at the top of DRAM)
>> >
>> > My main worry is that this controller will happily try and DMA from
>> > zero as we wipe the 64bit registers, even when halted. On Seattle (and
>> > thus Cello), this is just fine as there is nothing there, and the
>> > controller aborts with the HSE bit set.
>> >
>> > On other systems, where memory actually exists at 0, who knows what
>> > this is going to do? On the other hand, this is not worse than the
>> > current situation, where we could end-up with any odd address...
>> >
>>
>> Is the PCI_COMMAND_MASTER bit enabled at this point? What happens if
>> you clear it first?
>
> Tried that. No difference whatsoever, as I still get a fault with the
> device accessing address 0, and being caught by the iommu.
>

Wow so this device is more broken than I thought.

In UEFI, we rely on clearing the bus master bit to ensure that all
hardware stops doing DMA when ExitBootServices is called, but this is
clearly not enough.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-03-25 Thread Ard Biesheuvel
On 25 March 2018 at 12:51, Marc Zyngier <marc.zyng...@arm.com> wrote:
> On Sun, 25 Mar 2018 11:48:35 +0100,
> Ard Biesheuvel wrote:
>
> Hi Ard,
>
> [...]
>
>> > I finally found some time to work on this, and came up with an
>> > alternative approach (it turns out that this chip is even more
>> > braindead than I thought).
>> >
>> > It is slightly scary, in the sense that the USB controller seems to
>> > perform memory accesses even when halted, and can generate faults,
>> > but it works just fine on my system. And with this, we can drop the
>> > hard reset at boot time. I'm still on the fence to limit it to systems
>> > with an iommu though.
>> >
>>
>> Hi Marc,
>>
>> I take it you tested this on Cello?
>
> Tested on Cello indeed (I should have mentioned that the first place).
>
>> There, it might make sense to
>> limit this to systems with an IOMMU, but not in the general case, I
>> think. The reason is that it is not guaranteed that the firmware will
>> use 32-bit addressable allocations for these data structures, even if
>> the kernel is able to without an IOMMU. (UEFI on arm64 will not prefer
>> 32-bit addressable memory for PCI DMA if it is available, and usually
>> serves heap allocations [such as the ones used for these data
>> structures] starting at the top of DRAM)
>
> My main worry is that this controller will happily try and DMA from
> zero as we wipe the 64bit registers, even when halted. On Seattle (and
> thus Cello), this is just fine as there is nothing there, and the
> controller aborts with the HSE bit set.
>
> On other systems, where memory actually exists at 0, who knows what
> this is going to do? On the other hand, this is not worse than the
> current situation, where we could end-up with any odd address...
>

Is the PCI_COMMAND_MASTER bit enabled at this point? What happens if
you clear it first?
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-03-25 Thread Ard Biesheuvel
On 25 March 2018 at 11:37, Marc Zyngier  wrote:
> On Fri, 02 Mar 2018 17:38:26 +,
> Bockholdt Arne wrote:
>
> Hi Arne,
>
>>
>> On Thu, 2018-03-01 at 17:37 +, Marc Zyngier wrote:
>> > On 01/03/18 08:01, Bockholdt Arne wrote:
>> > >
>> > > On Thu, 2018-02-15 at 19:29 +, Marc Zyngier wrote:
>> > > > [+ Ard, who helped me chasing the initial issue]
>> > > >
>> > > > On 15/02/18 06:43, Bockholdt Arne wrote:
>> > > > > Hi all,
>> > > > >
>> > > > > on our Intel Atom C2578 server with a SuperMicro A1SAi board
>> > > > > and a
>> > > > > Renesas uPD720201 USB 3.0 host controller the controller has
>> > > > > stopped
>> > > > > working since kernel 4.13.x. Before that kernel the dmesg
>> > > > > messages
>> > > > > from
>> > > > > XHCI were:
>> > > > >
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: xHCI Host
>> > > > > Controller
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: new USB bus
>> > > > > registered,
>> > > > > assigned bus number 2
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: hcc params
>> > > > > 0x014051cf
>> > > > > hci version 0x100 quirks 0x0010
>> > > > > dmesg-4.12.1-serverv4.log:usb usb2: Manufacturer: Linux 4.12.1-
>> > > > > serverv4
>> > > > > xhci-hcd
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: xHCI Host
>> > > > > Controller
>> > > > > dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: new USB bus
>> > > > > registered,
>> > > > > assigned bus number 3
>> > > > > dmesg-4.12.1-serverv4.log:usb usb3: Manufacturer: Linux 4.12.1-
>> > > > > serverv4
>> > > > > xhci-hcd
>> > > > >
>> > > > > After that the message look like that:
>> > > > >
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: Resetting
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: Refused to
>> > > > > change
>> > > > > power
>> > > > > state, currently in D3
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: xHCI Host
>> > > > > Controller
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: new USB bus
>> > > > > registered,
>> > > > > assigned bus number 2
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: Host halt
>> > > > > failed,
>> > > > > -19
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: can't setup:
>> > > > > -19
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: USB bus 2
>> > > > > deregistered
>> > > > > dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: init
>> > > > > :03:00.0
>> > > > > fail, -19
>> > > > >
>> > > > > I've tested it with 4.15.3 too, it's still the same. I've
>> > > > > narrowed
>> > > > > it
>> > > > > down to the following patch:
>> > > > >
>> > > > > commit 8466489ef5ba48272ba4fa4ea9f8f403306de4c7
>> > > > > Author: Marc Zyngier > > > > > @arm
>> > > > > .com>>
>> > > > > Date:   Tue Aug 1 20:11:08 2017 -0500
>> > > > >
>> > > > > xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA
>> > > > > issue
>> > > > >
>> > > > > Reverting the patch on top of 4.15.3 restores the USB3
>> > > > > functionality on
>> > > > > our server. Please let me know if there is anything I can do to
>> > > > > fix
>> > > > > the
>> > > > > problem. Thank you.
>> > > >
>> > > > Hi Arne,
>> > > >
>> > > > This looks pretty bad. I suspect that once reset, the firmware is
>> > > > lost.
>> > > > I'll try to write a patch dumping some information about it.
>> > > >
>> > > > Ard: Do you know if the Cello board has a SPI flash connected to
>> > > > the
>> > > > Renesas chip, from which it would load the firmware?
>> > > >
>> > > > Another possibility is that power management kicks in, and that
>> > > > the
>> > > > endpoint is stuck there. Could also be firmware related, but not
>> > > > only.
>> > > > I'd welcome any idea on the subject, as I cannot reproduce this
>> > > > issue
>> > > > on
>> > > > the HW I have.
>> > > >
>> > > > It we cannot work out what exactly is causing this, we may have
>> > > > to
>> > > > default to not resetting the part and rely on a command-line
>> > > > option
>> > > > to
>> > > > do it... I can't say I'm a fan.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > M.
>> > > >
>> > >
>> > > Hi Marc,
>> > >
>> > > I've tested it with 4.15.7 and it's still there. Is there anything
>> > > that
>> > > I can do to help fixing this problem?
>> >
>> > Would you mind trying the following patch and let me know if it
>> > helps?
>> > It is not exactly pretty, but we can polish it if that solves your
>> > issue.
>
> [...]
>
>> I've applied your patch on top of 4.15.7 and tried it on the server,
>> unfortunately the problem is still there. Here's the output from dmesg:
>>
>> [1.570115] xhci_hcd :03:00.0: Found a 64bit address in ERSTBA 4
>> [1.570120] xhci_hcd :03:00.0: Resetting
>> [2.668066] xhci_hcd :03:00.0: Refused to change power state,
>> currently in D3
>> [2.668215] xhci_hcd :03:00.0: xHCI Host Controller
>> [2.668225] xhci_hcd :03:00.0: new 

Re: patch 8466489ef5ba48272ba4fa4ea9f8f403306de4c7 breaks Renesas USB3 controller functionality

2018-02-15 Thread Ard Biesheuvel
On 15 February 2018 at 19:29, Marc Zyngier  wrote:
> [+ Ard, who helped me chasing the initial issue]
>
> On 15/02/18 06:43, Bockholdt Arne wrote:
>> Hi all,
>>
>> on our Intel Atom C2578 server with a SuperMicro A1SAi board and a
>> Renesas uPD720201 USB 3.0 host controller the controller has stopped
>> working since kernel 4.13.x. Before that kernel the dmesg messages from
>> XHCI were:
>>
>> dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: xHCI Host Controller
>> dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: new USB bus registered,
>> assigned bus number 2
>> dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: hcc params 0x014051cf
>> hci version 0x100 quirks 0x0010
>> dmesg-4.12.1-serverv4.log:usb usb2: Manufacturer: Linux 4.12.1-serverv4
>> xhci-hcd
>> dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: xHCI Host Controller
>> dmesg-4.12.1-serverv4.log:xhci_hcd :03:00.0: new USB bus registered,
>> assigned bus number 3
>> dmesg-4.12.1-serverv4.log:usb usb3: Manufacturer: Linux 4.12.1-serverv4
>> xhci-hcd
>>
>> After that the message look like that:
>>
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: Resetting
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: Refused to change power
>> state, currently in D3
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: xHCI Host Controller
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: new USB bus registered,
>> assigned bus number 2
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: Host halt failed, -19
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: can't setup: -19
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: USB bus 2 deregistered
>> dmesg-4.13.1-serverv4.log:xhci_hcd :03:00.0: init :03:00.0 fail, -19
>>
>> I've tested it with 4.15.3 too, it's still the same. I've narrowed it
>> down to the following patch:
>>
>> commit 8466489ef5ba48272ba4fa4ea9f8f403306de4c7
>> Author: Marc Zyngier >
>> Date:   Tue Aug 1 20:11:08 2017 -0500
>>
>> xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue
>>
>> Reverting the patch on top of 4.15.3 restores the USB3 functionality on
>> our server. Please let me know if there is anything I can do to fix the
>> problem. Thank you.
> Hi Arne,
>
> This looks pretty bad. I suspect that once reset, the firmware is lost.
> I'll try to write a patch dumping some information about it.
>
> Ard: Do you know if the Cello board has a SPI flash connected to the
> Renesas chip, from which it would load the firmware?
>

No. The system firmware includes a firmware image for the Renesas, and
uploads it via the PCIe config space before attaching the ordinary
xhci pci driver.

https://lists.01.org/pipermail/edk2-devel/2017-April/010013.html

This is actually rather annoying, because we cannot distribute the
firmware image itself, so rebuilding the firmware from source involves
finding your own copy.

> Another possibility is that power management kicks in, and that the
> endpoint is stuck there. Could also be firmware related, but not only.
> I'd welcome any idea on the subject, as I cannot reproduce this issue on
> the HW I have.
>
> It we cannot work out what exactly is causing this, we may have to
> default to not resetting the part and rely on a command-line option to
> do it... I can't say I'm a fan.
>

Given that the Cello implementation does not have non-volatile storage
either, and the fact that the firmware upload occurs only once, my
suspicion is that something else is going on here.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201

2017-12-12 Thread Ard Biesheuvel
On 12 December 2017 at 16:47, Daniel Thompson
<daniel.thomp...@linaro.org> wrote:
> When plugging in a USB webcam I see the following message:
> xhci_hcd :04:00.0: WARN Successful completion on short TX: needs
> XHCI_TRUST_TX_LENGTH quirk?
> handle_tx_event: 913 callbacks suppressed
>
> All is quiet again with this patch (and I've done a fair but of soak
> testing with the camera since).
>
> Cc: <sta...@vger.kernel.org>
> Signed-off-by: Daniel Thompson <daniel.thomp...@linaro.org>

I have been setting this quirk manually for ages on my kernel command
line, for the same reason (MS HD Webcam), and with the same positive
result, so

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

> ---
>  drivers/usb/host/xhci-pci.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 7ef1274ef7f7..1aad89b8aba0 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -177,6 +177,9 @@ static void xhci_pci_quirks(struct device *dev, struct 
> xhci_hcd *xhci)
> xhci->quirks |= XHCI_TRUST_TX_LENGTH;
> xhci->quirks |= XHCI_BROKEN_STREAMS;
> }
> +   if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
> +   pdev->device == 0x0014)
> +   xhci->quirks |= XHCI_TRUST_TX_LENGTH;
> if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
> pdev->device == 0x0015)
> xhci->quirks |= XHCI_RESET_ON_RESUME;
> --
> 2.14.2
>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possible regression between 4.9 and 4.13

2017-08-30 Thread Ard Biesheuvel
On 30 August 2017 at 09:55, Mason  wrote:
> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>
>> To get back to the original issue here, the hardware seems to have died,
>> the driver stops talking to it, and all is good.  The "regression" here
>> is that we now properly can determine that the hardware is crap.
>
> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
> still possible to plug the drive back in.
>
> Since 4.12, once I unplug the drive, the whole USB3 card is marked
> as dead (all 4 ports), and I can no longer plug anything in (not even
> the USB2 drive that didn't have any issues, IIRC).
>
> It seems a bit premature to "mark as dead" something that remains
> functional, doesn't it?
>
> Disclaimer, there are many variables in this setup, and I've only
> tested a small fraction of the problem space: only one system,
> only one USB3 board, only one USB3 Flash drive.
>

Please don't forget to mention that this is quirky hardware that
depends on BROKEN because it multiplexes MMIO and config space
accesses in the same memory window without any locking whatsoever
(which would be difficult to do in the first place because we don't
use accessors for MMIO in the kernel).

So how likely is it that you are attempting to read from the xhci BAR
window while a config space access is in progress? Any way to
instrument this in your driver?

>> So, how do you think we should proceed, delay a bit longer before saying
>> the device is gone?  How long is "long enough"?  How many bus errors are
>> we allowed to tolerate (hint, the PCI spec says none...)
>>
>> Maybe someone wants to get to the root problem here, why is the hardware
>> suddenly reporting all 1s?
>
> I'm afraid I won't be able to make any progress on this front,
> unless I can get my hands on a PCIe packet analyzer.
>
> Regards.
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Workaround for uPD72020x USB3 chips

2017-08-01 Thread Ard Biesheuvel
On 1 August 2017 at 22:44, Bjorn Helgaas  wrote:
> On Thu, Jul 13, 2017 at 10:26:40AM +0200, Greg Kroah-Hartman wrote:
>> On Wed, Jul 12, 2017 at 10:12:34PM -0500, Bjorn Helgaas wrote:
>> > On Mon, Jul 10, 2017 at 04:52:28PM +0100, Marc Zyngier wrote:
>> > > Ard and myself have just spent quite some time lately trying to pin
>> > > down an issue in the DMA code which was taking the form of a PCIe USB3
>> > > controller issuing a DMA access at some bizarre address, and being
>> > > caught red-handed by the IOMMU.
>> > >
>> > > After much head scratching and most of a week-end spent on tracing the
>> > > damn thing, I'm now convinced that the DMA code is fine, the XHCI
>> > > driver is correct, but that the HW (a Renesas uPD720202 chip) is a
>> > > nasty piece of work.
>> > >
>> > > The issue is as follow:
>> > >
>> > > - EFI initializes the controller using physical addresses above the
>> > >   4GB limit (this is on an arm64 box where the memory starts at
>> > >   0x80_...).
>> > >
>> > > - The kernel takes over, sends a XHCI reset to the controller, and
>> > >   because we have an IOMMU sitting between the controller and memory,
>> > >   provides *virtual* addresses. Trying to make things a bit faster for
>> > >   our controller, it issues IOVAs in the low 4GB range).
>> > >
>> > > - Low and behold, the controller is now issuing transactions with a
>> > >   0x80 prefix in front of our IOVA. Yes, the same prefix that was
>> > >   programmed during the EFI configuration. IOMMU fault, not happy.
>> > >
>> > > If the kernel is hacked to only generate IOVAs that are more than
>> > > 32bit wide, the HW behaves correctly. The only way I can explain this
>> > > behaviour is that the HW latches the top 32bit of the ERST (it is
>> > > always the ERST IOVA that appears in my traces) in some internal
>> > > register, and that the XHCI reset fails to clear it. Writing zero in
>> > > the top bits is not enough to clear it either.
>> > >
>> > > So far, the only solution we have for this lovely piece of kit is to
>> > > force a PCI reset at probe time, which puts it right. The patches are
>> > > pretty ugly, but that's the best I could come up with so far.
>> > >
>> > > Tested on a pair of AMD Opteron 1100 boxes with Renesas uPD720201 and
>> > > uPD720202 controllers.
>> > >
>> > > Marc Zyngier (2):
>> > >   PCI: Implement pci_reset_function_locked
>> > >   usb: host: pci_quirks: Force hard reset of Renesas uPD72020x USB
>> > > controller
>> > >
>> > >  drivers/pci/pci.c | 35 +++
>> > >  drivers/usb/host/pci-quirks.c | 20 
>> > >  drivers/usb/host/pci-quirks.h |  1 +
>> > >  drivers/usb/host/xhci-pci.c   |  7 +++
>> > >  include/linux/pci.h   |  1 +
>> > >  5 files changed, 64 insertions(+)
>> >
>> > I provisionally applied this to pci/virtualization.  I'd like to have an
>> > XHCI ack before going further, though.
>>
>> The xhci maintainer is on vacation, let's wait a week for him to get
>> back to get this.  Given the long time that this has been broken on this
>> hardware, I think we can wait another week just fine :)
>
> Ping, in case Mathias is back from vacation :)

Actually, Mathias already acked these changes, in reply to 2/2
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Workaround for uPD72020x USB3 chips

2017-07-13 Thread Ard Biesheuvel
On 13 July 2017 at 04:12, Bjorn Helgaas  wrote:
> On Mon, Jul 10, 2017 at 04:52:28PM +0100, Marc Zyngier wrote:
>> Ard and myself have just spent quite some time lately trying to pin
>> down an issue in the DMA code which was taking the form of a PCIe USB3
>> controller issuing a DMA access at some bizarre address, and being
>> caught red-handed by the IOMMU.
>>
>> After much head scratching and most of a week-end spent on tracing the
>> damn thing, I'm now convinced that the DMA code is fine, the XHCI
>> driver is correct, but that the HW (a Renesas uPD720202 chip) is a
>> nasty piece of work.
>>
>> The issue is as follow:
>>
>> - EFI initializes the controller using physical addresses above the
>>   4GB limit (this is on an arm64 box where the memory starts at
>>   0x80_...).
>>
>> - The kernel takes over, sends a XHCI reset to the controller, and
>>   because we have an IOMMU sitting between the controller and memory,
>>   provides *virtual* addresses. Trying to make things a bit faster for
>>   our controller, it issues IOVAs in the low 4GB range).
>>
>> - Low and behold, the controller is now issuing transactions with a
>>   0x80 prefix in front of our IOVA. Yes, the same prefix that was
>>   programmed during the EFI configuration. IOMMU fault, not happy.
>>
>> If the kernel is hacked to only generate IOVAs that are more than
>> 32bit wide, the HW behaves correctly. The only way I can explain this
>> behaviour is that the HW latches the top 32bit of the ERST (it is
>> always the ERST IOVA that appears in my traces) in some internal
>> register, and that the XHCI reset fails to clear it. Writing zero in
>> the top bits is not enough to clear it either.
>>
>> So far, the only solution we have for this lovely piece of kit is to
>> force a PCI reset at probe time, which puts it right. The patches are
>> pretty ugly, but that's the best I could come up with so far.
>>
>> Tested on a pair of AMD Opteron 1100 boxes with Renesas uPD720201 and
>> uPD720202 controllers.
>>
>> Marc Zyngier (2):
>>   PCI: Implement pci_reset_function_locked
>>   usb: host: pci_quirks: Force hard reset of Renesas uPD72020x USB
>> controller
>>
>>  drivers/pci/pci.c | 35 +++
>>  drivers/usb/host/pci-quirks.c | 20 
>>  drivers/usb/host/pci-quirks.h |  1 +
>>  drivers/usb/host/xhci-pci.c   |  7 +++
>>  include/linux/pci.h   |  1 +
>>  5 files changed, 64 insertions(+)
>
> I provisionally applied this to pci/virtualization.  I'd like to have an
> XHCI ack before going further, though.
>
> I assume this only affects boxes where the firmware uses addresses above
> 4GB, i.e., not very many?  So this is v4.14 material?  Or do you think it's
> important for v4.13?
>

As I mentioned, it would be nice if this could at least go into v4.11
and later, given that v4.11 contains a patch that switches all PCI
devices to 32-bit addressing only when the IOMMU is involved in DMA,
and this is what triggered the issue on arm64 boards with such a PCI
card and no DRAM below 4 GB.

Thanks,
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Workaround for uPD72020x USB3 chips

2017-07-12 Thread Ard Biesheuvel
On 10 July 2017 at 18:21, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> On 10 July 2017 at 16:52, Marc Zyngier <marc.zyng...@arm.com> wrote:
>> Ard and myself have just spent quite some time lately trying to pin
>> down an issue in the DMA code which was taking the form of a PCIe USB3
>> controller issuing a DMA access at some bizarre address, and being
>> caught red-handed by the IOMMU.
>>
>> After much head scratching and most of a week-end spent on tracing the
>> damn thing, I'm now convinced that the DMA code is fine, the XHCI
>> driver is correct, but that the HW (a Renesas uPD720202 chip) is a
>> nasty piece of work.
>>
>> The issue is as follow:
>>
>> - EFI initializes the controller using physical addresses above the
>>   4GB limit (this is on an arm64 box where the memory starts at
>>   0x80_...).
>>
>> - The kernel takes over, sends a XHCI reset to the controller, and
>>   because we have an IOMMU sitting between the controller and memory,
>>   provides *virtual* addresses. Trying to make things a bit faster for
>>   our controller, it issues IOVAs in the low 4GB range).
>>
>> - Low and behold, the controller is now issuing transactions with a
>>   0x80 prefix in front of our IOVA. Yes, the same prefix that was
>>   programmed during the EFI configuration. IOMMU fault, not happy.
>>
>> If the kernel is hacked to only generate IOVAs that are more than
>> 32bit wide, the HW behaves correctly. The only way I can explain this
>> behaviour is that the HW latches the top 32bit of the ERST (it is
>> always the ERST IOVA that appears in my traces) in some internal
>> register, and that the XHCI reset fails to clear it. Writing zero in
>> the top bits is not enough to clear it either.
>>
>
> To clarify, this seems to be an issue in the internal DMA logic of the
> controller. The ESRT base address register *is* cleared by the XHCI
> reset, i.e., it reads back as all zeroes. However, any 32-bit value we
> write there is extended with the high word written by the UEFI in the
> actual DMA transactions that take place.
>
>> So far, the only solution we have for this lovely piece of kit is to
>> force a PCI reset at probe time, which puts it right. The patches are
>> pretty ugly, but that's the best I could come up with so far.
>>
>> Tested on a pair of AMD Opteron 1100 boxes with Renesas uPD720201 and
>> uPD720202 controllers.
>>
>> Marc Zyngier (2):
>>   PCI: Implement pci_reset_function_locked
>>   usb: host: pci_quirks: Force hard reset of Renesas uPD72020x USB
>> controller
>>

Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] Workaround for uPD72020x USB3 chips

2017-07-10 Thread Ard Biesheuvel
On 10 July 2017 at 16:52, Marc Zyngier  wrote:
> Ard and myself have just spent quite some time lately trying to pin
> down an issue in the DMA code which was taking the form of a PCIe USB3
> controller issuing a DMA access at some bizarre address, and being
> caught red-handed by the IOMMU.
>
> After much head scratching and most of a week-end spent on tracing the
> damn thing, I'm now convinced that the DMA code is fine, the XHCI
> driver is correct, but that the HW (a Renesas uPD720202 chip) is a
> nasty piece of work.
>
> The issue is as follow:
>
> - EFI initializes the controller using physical addresses above the
>   4GB limit (this is on an arm64 box where the memory starts at
>   0x80_...).
>
> - The kernel takes over, sends a XHCI reset to the controller, and
>   because we have an IOMMU sitting between the controller and memory,
>   provides *virtual* addresses. Trying to make things a bit faster for
>   our controller, it issues IOVAs in the low 4GB range).
>
> - Low and behold, the controller is now issuing transactions with a
>   0x80 prefix in front of our IOVA. Yes, the same prefix that was
>   programmed during the EFI configuration. IOMMU fault, not happy.
>
> If the kernel is hacked to only generate IOVAs that are more than
> 32bit wide, the HW behaves correctly. The only way I can explain this
> behaviour is that the HW latches the top 32bit of the ERST (it is
> always the ERST IOVA that appears in my traces) in some internal
> register, and that the XHCI reset fails to clear it. Writing zero in
> the top bits is not enough to clear it either.
>

To clarify, this seems to be an issue in the internal DMA logic of the
controller. The ESRT base address register *is* cleared by the XHCI
reset, i.e., it reads back as all zeroes. However, any 32-bit value we
write there is extended with the high word written by the UEFI in the
actual DMA transactions that take place.

> So far, the only solution we have for this lovely piece of kit is to
> force a PCI reset at probe time, which puts it right. The patches are
> pretty ugly, but that's the best I could come up with so far.
>
> Tested on a pair of AMD Opteron 1100 boxes with Renesas uPD720201 and
> uPD720202 controllers.
>
> Marc Zyngier (2):
>   PCI: Implement pci_reset_function_locked
>   usb: host: pci_quirks: Force hard reset of Renesas uPD72020x USB
> controller
>
>  drivers/pci/pci.c | 35 +++
>  drivers/usb/host/pci-quirks.c | 20 
>  drivers/usb/host/pci-quirks.h |  1 +
>  drivers/usb/host/xhci-pci.c   |  7 +++
>  include/linux/pci.h   |  1 +
>  5 files changed, 64 insertions(+)
>

This issue was uncovered by commit 122fac030e91 ("iommu/dma: Implement
PCI allocation optimisation"), which appeared in v4.11. So if this
approach is considered appropriate, it would be nice if we could tag
it for v4.11-stable as well.

Thanks,
Ard.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Panic in quirk_usb_early_handoff

2017-03-04 Thread Ard Biesheuvel

> On 4 Mar 2017, at 17:29, Mason <slash@free.fr> wrote:
> 
>> On 04/03/2017 18:16, Ard Biesheuvel wrote:
>> 
>> After pc, the link register is the most likely to legally point into
>> the kernel .text section so it makes sense imo to decode the address
>> into a function name plus offset.
> 
> Does gcc ever use the link register as a general purpose register?

Yes.

> (In which case, it is very likely to contain "garbage" as far as
> function addresses are concerned.)
> 
>> Educating people about the architecture's calling convention and
>> associated caveats is not the job of the panic handler.
> 
> That's a weird statement.
> 

By your own admission (in various threads and in #armlinux on IRC), you are not 
an expert in the topics you seek help about. Yet, that does not seem to stop 
you from sharing your opinions vocally, how 'weird' or 'useless' some things 
are.

As for the lr, I attempted to explain that in some cases, annotating its value 
can be useful. Adding an explanation to or letting the panic handler reason 
about whether this is currently the case is not so useful imo.--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Panic in quirk_usb_early_handoff

2017-03-04 Thread Ard Biesheuvel

> On 4 Mar 2017, at 16:57, Mason <slash@free.fr> wrote:
> 
>> On 04/03/2017 09:07, Ard Biesheuvel wrote:
>>> On 4 March 2017 at 00:24, Mason wrote:
>>>> On 03/03/2017 20:02, Robin Murphy wrote:
>>>> 
>>>>> On 03/03/17 17:15, Mason wrote:
>>>>> 
>>>>> [1.261813] Unable to handle kernel paging request at virtual address 
>>>>> d08611e4
>>>>> [1.269167] pgd = c0004000
>>>>> [1.271979] [d08611e4] *pgd=8f804811, *pte=, *ppte=
>>>>> [1.278394] Internal error: Oops: 7 [#1] PREEMPT SMP ARM
>>>>> [1.283815] Modules linked in:
>>>>> [1.286970] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.7-1-rc2 #157
>>>>> [1.293614] Hardware name: Sigma Tango DT
>>>>> [1.297726] task: cf82c9c0 task.stack: cf838000
>>>>> [1.302364] PC is at quirk_usb_early_handoff+0x3e8/0x790
>>>>> [1.307790] LR is at ioremap_page_range+0xf8/0x1a8
>>>>> [1.312688] pc : []lr : []psr: 000e0013
>>>>> [1.312688] sp : cf839d78  ip :   fp : cf839e38
>>>>> [1.324399] r10: c10248a0  r9 :   r8 : d08611e4
>>>>> [1.329733] r7 : d084e000  r6 : 2000  r5 : 000c0300  r4 : cfb4e800
>>>>> [1.336377] r3 : 000131e4  r2 :   r1 : 91001e13  r0 : d084e000
>>>> 
>>>> ...and again. And always at the same PC, too.
>>> 
>>> By the way, isn't LR supposed to point to the caller of the
>>> current function? ("LR is at ioremap_page_range")
>>> 
>>> If so, why does it not appear in the back trace?
>> 
>> lr is supposed to point to the return address at function entry. After
>> that, all bets are off, really, since ARM usually pops the return
>> address from the stack straight into the pc register. So in this case,
>> it looks like it still contains the address that the most recent leaf
>> function returned to (or another function that actually restores the
>> return address into lr before branching to it). But it could easily
>> contain garbage as well.
> 
> If there is only a tiny chance that LR contains genuinely useful
> information, then what is the rationale for providing the info
> at all in the panic message?
> 
> I would argue that no info is better than info that is wrong
> most of the time.
> 

After pc, the link register is the most likely to legally point into the kernel 
.text section so it makes sense imo to decode the address into a function name 
plus offset.

Educating people about the architecture's calling convention and associated 
caveats is not the job of the panic handler.

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Panic in quirk_usb_early_handoff

2017-03-04 Thread Ard Biesheuvel
On 4 March 2017 at 00:24, Mason  wrote:
> On 03/03/2017 20:02, Robin Murphy wrote:
>
>> On 03/03/17 17:15, Mason wrote:
>>
>>> [1.261813] Unable to handle kernel paging request at virtual address 
>>> d08611e4
>>> [1.269167] pgd = c0004000
>>> [1.271979] [d08611e4] *pgd=8f804811, *pte=, *ppte=
>>> [1.278394] Internal error: Oops: 7 [#1] PREEMPT SMP ARM
>>> [1.283815] Modules linked in:
>>> [1.286970] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.7-1-rc2 #157
>>> [1.293614] Hardware name: Sigma Tango DT
>>> [1.297726] task: cf82c9c0 task.stack: cf838000
>>> [1.302364] PC is at quirk_usb_early_handoff+0x3e8/0x790
>>> [1.307790] LR is at ioremap_page_range+0xf8/0x1a8
>>> [1.312688] pc : []lr : []psr: 000e0013
>>> [1.312688] sp : cf839d78  ip :   fp : cf839e38
>>> [1.324399] r10: c10248a0  r9 :   r8 : d08611e4
>>> [1.329733] r7 : d084e000  r6 : 2000  r5 : 000c0300  r4 : cfb4e800
>>> [1.336377] r3 : 000131e4  r2 :   r1 : 91001e13  r0 : d084e000
>>
>> ...and again. And always at the same PC, too.
>
> By the way, isn't LR supposed to point to the caller of the
> current function? ("LR is at ioremap_page_range")
>
> If so, why does it not appear in the back trace?
>

lr is supposed to point to the return address at function entry. After
that, all bets are off, really, since ARM usually pops the return
address from the stack straight into the pc register. So in this case,
it looks like it still contains the address that the most recent leaf
function returned to (or another function that actually restores the
return address into lr before branching to it). But it could easily
contain garbage as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html