Re: PCIe legacy interrupts blocked on Intel Apollo Lake platforms

2017-10-22 Thread Daniel Drake
Hi,

On Wed, Oct 18, 2017 at 7:54 PM, Andy Shevchenko
 wrote:
> While Rafael is looking for a solution, can you in meantime gather the
> following on the affected hardware and share it via some resource on
> Internet?
>
> 1. % acpidump -o tables.dat # tables.dat is point of interest

https://gist.githubusercontent.com/dsd/ef9b9da4c634f57de89f917c43703272/raw/391db13df07ceab78ccc2ca1d5f4e5ccd3fb10d8/acpi%2520tables

> 2. % lspci -vv -nk # output of the command

https://gist.githubusercontent.com/dsd/ef9b9da4c634f57de89f917c43703272/raw/391db13df07ceab78ccc2ca1d5f4e5ccd3fb10d8/pci

> 3. % dmidecode # output of the command

https://gist.githubusercontent.com/dsd/ef9b9da4c634f57de89f917c43703272/raw/391db13df07ceab78ccc2ca1d5f4e5ccd3fb10d8/dmidecode

> 4. % grep -H 15 /sys/bus/acpi/devices/*/status # output of the command

https://gist.githubusercontent.com/dsd/ef9b9da4c634f57de89f917c43703272/raw/391db13df07ceab78ccc2ca1d5f4e5ccd3fb10d8/grep%2520-H%252015%2520acpi%2520status

> 5. % dmesg # when kernel command line has the 'ignore_loglevel
> initcall_debug' added

https://gist.githubusercontent.com/dsd/ef9b9da4c634f57de89f917c43703272/raw/391db13df07ceab78ccc2ca1d5f4e5ccd3fb10d8/dmesg


All of the above files in a zip:
https://gist.github.com/dsd/ef9b9da4c634f57de89f917c43703272/archive/391db13df07ceab78ccc2ca1d5f4e5ccd3fb10d8.zip

Please let me know how we can help further!

Daniel


Re: PCIe legacy interrupts blocked on Intel Apollo Lake platforms

2017-10-18 Thread Andy Shevchenko
On Wed, Oct 18, 2017 at 11:36 AM, Daniel Drake  wrote:
> [retitling and re-summarizing in hope of attention from Intel]
>
> Andy / Rafael,
>
> Thomas Gleixner suggested that you might be able to help with a nasty
> issue related to Intel Apollo Lake platforms - or you can put us in
> contact with another relevant person at Intel.

While Rafael is looking for a solution, can you in the meantime gather the
following on the affected hardware and share it via some resource on the
Internet?

1. % acpidump -o tables.dat # tables.dat is point of interest
2. % lspci -vv -nk # output of the command
3. % dmidecode # output of the command
4. % grep -H 15 /sys/bus/acpi/devices/*/status # output of the command
5. % dmesg # when kernel command line has the 'ignore_loglevel
initcall_debug' added
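
As an extra cross-check (not one of the items above, and only a user-space
sketch with an example BDF): lspci -vv already reports the function-level
DisINTx state, but the snippet below reads the PCI Command register via sysfs
and prints bit 10 (Interrupt Disable) directly. A platform/BIOS-level block
may of course live somewhere else entirely.

/*
 * User-space sketch only; the BDF below is an example, substitute the
 * wifi function reported by lspci.  Reads the first bytes of PCI config
 * space via sysfs and prints the Command register's Interrupt Disable bit.
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const char *cfg = "/sys/bus/pci/devices/0000:01:00.0/config";
	uint8_t buf[6];
	uint16_t cmd;
	FILE *f = fopen(cfg, "rb");

	if (!f) {
		perror(cfg);
		return 1;
	}
	if (fread(buf, 1, sizeof(buf), f) != sizeof(buf)) {
		fclose(f);
		return 1;
	}
	fclose(f);

	cmd = buf[4] | (buf[5] << 8);	/* Command register at offset 0x04 */
	printf("Command = 0x%04x, INTx Disable = %u\n", cmd, (cmd >> 10) & 1);
	return 0;
}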

-- 
With Best Regards,
Andy Shevchenko


Re: PCIe legacy interrupts blocked on Intel Apollo Lake platforms

2017-10-18 Thread Rafael J. Wysocki
On Wednesday, October 18, 2017 10:36:49 AM CEST Daniel Drake wrote:
> [retitling and re-summarizing in hope of attention from Intel]
> 
> Andy / Rafael,
> 
> Thomas Gleixner suggested that you might be able to help with a nasty
> issue related to Intel Apollo Lake platforms - or you can put us in
> contact with another relevant person at Intel.
> 
> On Thu, Oct 5, 2017 at 6:13 PM, Thomas Gleixner  wrote:
> >> We have tried taking the mini-PCIe wifi module out of one of the affected
> >> Acer products and moved it to another computer, where it is working fine
> >> with legacy interrupts. So this suggests that the wifi module itself is OK,
> >> but we are facing a hardware limitation or BIOS limitation on the affected
> >> products. In the Dell thread it says "Some platform(BIOS) blocks legacy
> >> interrupts (INTx)".
> >>
> >> If you have any suggestions for how we might solve this without getting 
> >> into
> >> the MSI mess then that would be much appreciated. If the BIOS blocks the
> >> interrupts, can Linux unblock them?
> >
> > I'm pretty sure we can. Cc'ed Rafael and Andy. They might know, if not they
> > certainly know whom to ask @Intel.
> 
> To summarize the issue:
> 
> At least 8 new Acer consumer laptop products based on Intel Apollo
> Lake are unable to deliver legacy interrupts from the ath9k miniPCIe
> wifi card to the host. This results in wifi connectivity being
> unusable on those systems.
> 
> This also seems to affect the 4 Dell systems included in this patch series:
>   https://lkml.org/lkml/2017/9/26/55
> at least 2 of which are also Apollo Lake (can't find specs for the other 2).
> 
> We know that the wifi module itself is OK, since we can take it to
> another laptop and it delivers legacy interrupts just fine.
> 
> We know that this is not a fundamental limitation of Intel Apollo
> Lake, since we have other Apollo Lake products in hand with ath9k wifi
> modules and legacy interrupts work fine there.
> 
> We would like to switch to MSI interrupts instead, but unfortunately
> ath9k seems to have a hardware bug in that it corrupts the MSI message
> data, see:
> 
>   ath9k hardware corrupts MSI Message Data, raises wrong interrupt
>   https://marc.info/?l=linux-pci&m=150238260826803&w=2
> 
> We have explored workarounds for this on the Linux side, but that has
> turned out to be unattractive and impractical:
> 
>   [PATCH] PCI MSI: allow alignment restrictions on vector allocation
>   https://marc.info/?t=15063128321&r=1&w=2
> 
> Interrupt remapping could probably help us avoid this MSI problem, but
> unfortunately that's not available on the affected platforms:
> 
>   DMAR table missing, Intel IOMMU not available
>   https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023717.html
> 
> So now, digging for other options, I would like to explore the theory
> mentioned on the Dell thread that the BIOS is blocking legacy
> interrupts on these platforms. The question for Intel is: if the BIOS
> is blocking legacy interrupts, can Linux unblock them? (and how)
> 
> I have Apollo Lake platforms here which exhibit the issue, and also
> other Apollo Lake platforms that work fine, so let me know where I can
> help look for differences (register dumps etc)

Thanks for the very useful summary of the problem; I'll do my best to find
out what can be done to address it.

Thanks,
Rafael



RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-27 Thread Bharat Kumar Gogada
> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>
> +tglx
>
> On 13/07/16 09:33, Bharat Kumar Gogada wrote:
> >> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
> >> call
> >>
> >> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
> >>>> Subject: Re: PCIe MSI address is not written at
> >>>> pci_enable_msi_range call
> >>>>
> >>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> >>>>> Hi Marc,
> >>>>>
> >>>>> Thanks for the reply.
> >>>>>
> >>>>> From PCIe Spec:
> >>>>> MSI Enable Bit:
> >>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control
> >>>>> register (see Section 6.8.2.3) is 0, the function is permitted to
> >>>>> use MSI to request service and is prohibited from using its INTx# pin.
> >>>>>
> >>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be
> >>>>> used
> >>>> which means MSI address and data fields are available/programmed.
> >>>>>
> >>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
> >>>>> latches
> >>>> onto MSI address and MSI data values.
> >>>>>
> >>>>> With current MSI implementation in kernel, our SoC is latching on
> >>>>> to incorrect address and data values, as address/data are updated
> >>>>> much later
> >>>> than MSI Enable bit.
> >>>>
> >>>> As a side question, how does setting the affinity work on this
> >>>> end-point if this involves changing the address programmed in the
> >>>> MSI
> >> registers?
> >>>> Do you expect the enabled bit to be toggled to around the write?
> >>>>
> >>>
> >>> Yes,
> >>
> >> Well, that's pretty annoying, as this will not work either. But maybe
> >> your MSI controller has a single doorbell? You haven't mentioned which
> HW that is...
> >>
> > The MSI address/data is located in config space, in our SoC for the
> > logic behind PCIe to become aware of new address/data  MSI enable
> transition is used (0 to 1).
> > The logic cannot keep polling these registers in configuration space as it
> would consume power.
> >
> > So the logic uses the transition in MSI enable to latch on to address/data.
>
> A couple of additional questions:
>
> Does your HW support MSI masking? And if it does, does it resample the
> address/data on unmask?
>
No, we do not support masking.

Regards,
Bharat






Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-20 Thread Marc Zyngier
+tglx

On 13/07/16 09:33, Bharat Kumar Gogada wrote:
>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>>
>> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
>>>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
>>>> call
>>>>
>>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
>>>>> Hi Marc,
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>> From PCIe Spec:
>>>>> MSI Enable Bit:
>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
>>>>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
>>>>> request service and is prohibited from using its INTx# pin.
>>>>>
>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
>>>> which means MSI address and data fields are available/programmed.
>>>>>
>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
>>>>> latches
>>>> onto MSI address and MSI data values.
>>>>>
>>>>> With current MSI implementation in kernel, our SoC is latching on to
>>>>> incorrect address and data values, as address/data are updated much
>>>>> later
>>>> than MSI Enable bit.
>>>>
>>>> As a side question, how does setting the affinity work on this
>>>> end-point if this involves changing the address programmed in the MSI
>> registers?
>>>> Do you expect the enabled bit to be toggled to around the write?
>>>>
>>>
>>> Yes,
>>
>> Well, that's pretty annoying, as this will not work either. But maybe your 
>> MSI
>> controller has a single doorbell? You haven't mentioned which HW that is...
>>
> The MSI address/data is located in config space, in our SoC for the logic 
> behind PCIe
> to become aware of new address/data  MSI enable transition is used (0 to 1).
> The logic cannot keep polling these registers in configuration space as it 
> would consume power.
> 
> So the logic uses the transition in MSI enable to latch on to address/data.

A couple of additional questions:

Does your HW support MSI masking? And if it does, does it resample the
address/data on unmask?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Marc Zyngier
On 13/07/16 16:34, Bharat Kumar Gogada wrote:
>> On 13/07/16 10:36, Bharat Kumar Gogada wrote:
>>>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
>>>> call
>>>>
>>>> On 13/07/16 10:10, Bharat Kumar Gogada wrote:
>>>>>> Subject: Re: PCIe MSI address is not written at
>>>>>> pci_enable_msi_range call
>>>>>>
>>>>>> On 13/07/16 09:33, Bharat Kumar Gogada wrote:
>>>>>>>> Subject: Re: PCIe MSI address is not written at
>>>>>>>> pci_enable_msi_range call
>>>>>>>>
>>>>>>>> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
>>>>>>>>>> Subject: Re: PCIe MSI address is not written at
>>>>>>>>>> pci_enable_msi_range call
>>>>>>>>>>
>>>>>>>>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
>>>>>>>>>>> Hi Marc,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>>
>>>>>>>>>>> From PCIe Spec:
>>>>>>>>>>> MSI Enable Bit:
>>>>>>>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control
>>>>>>>>>>> register (see Section 6.8.2.3) is 0, the function is permitted
>>>>>>>>>>> to use MSI to request service and is prohibited from using its
>>>>>>>>>>> INTx#
>>>> pin.
>>>>>>>>>>>
>>>>>>>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be
>>>>>>>>>>> used
>>>>>>>>>> which means MSI address and data fields are
>> available/programmed.
>>>>>>>>>>>
>>>>>>>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
>>>>>>>>>>> latches
>>>>>>>>>> onto MSI address and MSI data values.
>>>>>>>>>>>
>>>>>>>>>>> With current MSI implementation in kernel, our SoC is latching
>>>>>>>>>>> on to incorrect address and data values, as address/data are
>>>>>>>>>>> updated much later
>>>>>>>>>> than MSI Enable bit.
>>>>>>>>>>
>>>>>>>>>> As a side question, how does setting the affinity work on this
>>>>>>>>>> end-point if this involves changing the address programmed in
>>>>>>>>>> the MSI
>>>>>>>> registers?
>>>>>>>>>> Do you expect the enabled bit to be toggled to around the write?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes,
>>>>>>>>
>>>>>>>> Well, that's pretty annoying, as this will not work either. But
>>>>>>>> maybe your
>>>>>> MSI
>>>>>>>> controller has a single doorbell? You haven't mentioned which HW
>>>>>>>> that
>>>>>> is...
>>>>>>>>
>>>>>>> The MSI address/data is located in config space, in our SoC for
>>>>>>> the logic
>>>>>> behind PCIe
>>>>>>> to become aware of new address/data  MSI enable transition is used
>>>>>>> (0 to
>>>>>> 1).
>>>>>>> The logic cannot keep polling these registers in configuration
>>>>>>> space as it
>>>>>> would consume power.
>>>>>>>
>>>>>>> So the logic uses the transition in MSI enable to latch on to
>> address/data.
>>>>>>
>>>>>> I understand the "why". I'm just wondering if your SoC needs to
>>>>>> have the MSI address changed when changing the affinity of the MSI?
>>>>>> What MSI controller are you using? Is it in mainline?
>>>>>>
>>>>> Can you please give more information on MSI affinity ?
>>>>> For cpu affinity for interrupts we would use MSI-X.
>>>>>
>>>>> We are using GIC 400 v2.
>>>>
>>>> None of that is relevant. GIC400 doesn't have the faintest notion of
>>>> what an MSI is, and MSI-X vs MSI is an end-point property.
>>>>
>>>> Please answer these questions: does your MSI controller have a unique
>>>> doorbell, or multiple doorbells? Does it use wired interrupts (SPIs)
>>>> connected to the GIC? Is the support code for this MSI controller in
>> mainline or not?
>>>>
>>>
>>> It has single doorbell.
>>> The MSI decoding is part of our PCIe bridge, and it has SPI to GIC.
>>> Our root driver is in mainline drivers/pci/host/pcie-xilinx-nwl.c
>>
>> OK, so you're not affected by this affinity setting issue. Please let me 
>> know if
>> the patch I sent yesterday improve things for you once you have a chance to
>> test it.
>>
> Hi Marc,
> 
> I tested with the patch you provided, now it is working for us.

Thanks, I'll repost this as a proper patch with your Tested-by.

> Can you please point to any doc related to affinity in MSI, until now we
> came across affinity for MSI-X. I will explore more on it.

I don't have anything at hand, but simply look at how MSI (and MSI-X) is
implemented on x86, for example: each CPU has its own doorbell, and
changing the affinity of an MSI is done by changing the target address of
that interrupt. And it doesn't seem that the kernel switches the Enable
bit off and on for those.
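
For a concrete (if simplified) picture, here is a small user-space sketch of
how x86 encodes the target CPU in the MSI address; the helper below is made up
for illustration, but the 0xFEExxxxx window and the destination APIC ID field
follow the Intel SDM layout. Changing affinity rewrites this address (and
possibly the vector in the message data) rather than toggling MSI Enable.

/*
 * Illustrative sketch, not kernel code: on x86 the MSI address selects the
 * target local APIC.  Bits 31:20 are the fixed 0xFEE window and bits 19:12
 * carry the destination APIC ID (per the Intel SDM).
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t x86_msi_addr(uint8_t dest_apic_id)
{
	return 0xfee00000u | ((uint32_t)dest_apic_id << 12);
}

int main(void)
{
	/*
	 * Retargeting the interrupt to another CPU changes the doorbell
	 * address the device writes to (and possibly the vector in the
	 * message data); MSI Enable is not toggled for that.
	 */
	printf("APIC ID 0 doorbell: 0x%08x\n", x86_msi_addr(0));
	printf("APIC ID 3 doorbell: 0x%08x\n", x86_msi_addr(3));
	return 0;
}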

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Bharat Kumar Gogada
> On 13/07/16 10:36, Bharat Kumar Gogada wrote:
> >> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
> >> call
> >>
> >> On 13/07/16 10:10, Bharat Kumar Gogada wrote:
> >>>> Subject: Re: PCIe MSI address is not written at
> >>>> pci_enable_msi_range call
> >>>>
> >>>> On 13/07/16 09:33, Bharat Kumar Gogada wrote:
> >>>>>> Subject: Re: PCIe MSI address is not written at
> >>>>>> pci_enable_msi_range call
> >>>>>>
> >>>>>> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
> >>>>>>>> Subject: Re: PCIe MSI address is not written at
> >>>>>>>> pci_enable_msi_range call
> >>>>>>>>
> >>>>>>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> >>>>>>>>> Hi Marc,
> >>>>>>>>>
> >>>>>>>>> Thanks for the reply.
> >>>>>>>>>
> >>>>>>>>> From PCIe Spec:
> >>>>>>>>> MSI Enable Bit:
> >>>>>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control
> >>>>>>>>> register (see Section 6.8.2.3) is 0, the function is permitted
> >>>>>>>>> to use MSI to request service and is prohibited from using its
> >>>>>>>>> INTx#
> >> pin.
> >>>>>>>>>
> >>>>>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be
> >>>>>>>>> used
> >>>>>>>> which means MSI address and data fields are
> available/programmed.
> >>>>>>>>>
> >>>>>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
> >>>>>>>>> latches
> >>>>>>>> onto MSI address and MSI data values.
> >>>>>>>>>
> >>>>>>>>> With current MSI implementation in kernel, our SoC is latching
> >>>>>>>>> on to incorrect address and data values, as address/data are
> >>>>>>>>> updated much later
> >>>>>>>> than MSI Enable bit.
> >>>>>>>>
> >>>>>>>> As a side question, how does setting the affinity work on this
> >>>>>>>> end-point if this involves changing the address programmed in
> >>>>>>>> the MSI
> >>>>>> registers?
> >>>>>>>> Do you expect the enabled bit to be toggled to around the write?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes,
> >>>>>>
> >>>>>> Well, that's pretty annoying, as this will not work either. But
> >>>>>> maybe your
> >>>> MSI
> >>>>>> controller has a single doorbell? You haven't mentioned which HW
> >>>>>> that
> >>>> is...
> >>>>>>
> >>>>> The MSI address/data is located in config space, in our SoC for
> >>>>> the logic
> >>>> behind PCIe
> >>>>> to become aware of new address/data  MSI enable transition is used
> >>>>> (0 to
> >>>> 1).
> >>>>> The logic cannot keep polling these registers in configuration
> >>>>> space as it
> >>>> would consume power.
> >>>>>
> >>>>> So the logic uses the transition in MSI enable to latch on to
> address/data.
> >>>>
> >>>> I understand the "why". I'm just wondering if your SoC needs to
> >>>> have the MSI address changed when changing the affinity of the MSI?
> >>>> What MSI controller are you using? Is it in mainline?
> >>>>
> >>> Can you please give more information on MSI affinity ?
> >>> For cpu affinity for interrupts we would use MSI-X.
> >>>
> >>> We are using GIC 400 v2.
> >>
> >> None of that is relevant. GIC400 doesn't have the faintest notion of
> >> what an MSI is, and MSI-X vs MSI is an end-point property.
> >>
> >> Please answer these questions: does your MSI controller have a unique
> >> doorbell, or multiple doorbells? Does it use wired interrupts (SPIs)
> >> connected to the GIC? Is the support code for this MSI controller in
> mainline or not?
> >>
> >
> > It has single doorbell.
> > The MSI decoding is part of our PCIe bridge, and it has SPI to GIC.
> > Our root driver is in mainline drivers/pci/host/pcie-xilinx-nwl.c
>
> OK, so you're not affected by this affinity setting issue. Please let me know 
> if
> the patch I sent yesterday improve things for you once you have a chance to
> test it.
>
Hi Marc,

I tested with the patch you provided; it is now working for us.

Can you please point to any doc related to affinity in MSI? Until now we had
only come across affinity for MSI-X. I will explore more on it.

Thanks for your help.

Regards,
Bharat






Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Marc Zyngier
On 13/07/16 10:36, Bharat Kumar Gogada wrote:
>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>>
>> On 13/07/16 10:10, Bharat Kumar Gogada wrote:
>>>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
>>>> call
>>>>
>>>> On 13/07/16 09:33, Bharat Kumar Gogada wrote:
>>>>>> Subject: Re: PCIe MSI address is not written at
>>>>>> pci_enable_msi_range call
>>>>>>
>>>>>> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
>>>>>>>> Subject: Re: PCIe MSI address is not written at
>>>>>>>> pci_enable_msi_range call
>>>>>>>>
>>>>>>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
>>>>>>>>> Hi Marc,
>>>>>>>>>
>>>>>>>>> Thanks for the reply.
>>>>>>>>>
>>>>>>>>> From PCIe Spec:
>>>>>>>>> MSI Enable Bit:
>>>>>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control
>>>>>>>>> register (see Section 6.8.2.3) is 0, the function is permitted
>>>>>>>>> to use MSI to request service and is prohibited from using its INTx#
>> pin.
>>>>>>>>>
>>>>>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be
>>>>>>>>> used
>>>>>>>> which means MSI address and data fields are available/programmed.
>>>>>>>>>
>>>>>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
>>>>>>>>> latches
>>>>>>>> onto MSI address and MSI data values.
>>>>>>>>>
>>>>>>>>> With current MSI implementation in kernel, our SoC is latching
>>>>>>>>> on to incorrect address and data values, as address/data are
>>>>>>>>> updated much later
>>>>>>>> than MSI Enable bit.
>>>>>>>>
>>>>>>>> As a side question, how does setting the affinity work on this
>>>>>>>> end-point if this involves changing the address programmed in the
>>>>>>>> MSI
>>>>>> registers?
>>>>>>>> Do you expect the enabled bit to be toggled to around the write?
>>>>>>>>
>>>>>>>
>>>>>>> Yes,
>>>>>>
>>>>>> Well, that's pretty annoying, as this will not work either. But
>>>>>> maybe your
>>>> MSI
>>>>>> controller has a single doorbell? You haven't mentioned which HW
>>>>>> that
>>>> is...
>>>>>>
>>>>> The MSI address/data is located in config space, in our SoC for the
>>>>> logic
>>>> behind PCIe
>>>>> to become aware of new address/data  MSI enable transition is used
>>>>> (0 to
>>>> 1).
>>>>> The logic cannot keep polling these registers in configuration space
>>>>> as it
>>>> would consume power.
>>>>>
>>>>> So the logic uses the transition in MSI enable to latch on to 
>>>>> address/data.
>>>>
>>>> I understand the "why". I'm just wondering if your SoC needs to have
>>>> the MSI address changed when changing the affinity of the MSI? What
>>>> MSI controller are you using? Is it in mainline?
>>>>
>>> Can you please give more information on MSI affinity ?
>>> For cpu affinity for interrupts we would use MSI-X.
>>>
>>> We are using GIC 400 v2.
>>
>> None of that is relevant. GIC400 doesn't have the faintest notion of what an
>> MSI is, and MSI-X vs MSI is an end-point property.
>>
>> Please answer these questions: does your MSI controller have a unique
>> doorbell, or multiple doorbells? Does it use wired interrupts (SPIs) 
>> connected
>> to the GIC? Is the support code for this MSI controller in mainline or not?
>>
> 
> It has single doorbell.
> The MSI decoding is part of our PCIe bridge, and it has SPI to GIC.
> Our root driver is in mainline drivers/pci/host/pcie-xilinx-nwl.c

OK, so you're not affected by this affinity setting issue. Please let me
know if the patch I sent yesterday improves things for you once you have
a chance to test it.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Bharat Kumar Gogada
> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>
> On 13/07/16 10:10, Bharat Kumar Gogada wrote:
> >> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
> >> call
> >>
> >> On 13/07/16 09:33, Bharat Kumar Gogada wrote:
> >>>> Subject: Re: PCIe MSI address is not written at
> >>>> pci_enable_msi_range call
> >>>>
> >>>> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
> >>>>>> Subject: Re: PCIe MSI address is not written at
> >>>>>> pci_enable_msi_range call
> >>>>>>
> >>>>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> >>>>>>> Hi Marc,
> >>>>>>>
> >>>>>>> Thanks for the reply.
> >>>>>>>
> >>>>>>> From PCIe Spec:
> >>>>>>> MSI Enable Bit:
> >>>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control
> >>>>>>> register (see Section 6.8.2.3) is 0, the function is permitted
> >>>>>>> to use MSI to request service and is prohibited from using its INTx#
> pin.
> >>>>>>>
> >>>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be
> >>>>>>> used
> >>>>>> which means MSI address and data fields are available/programmed.
> >>>>>>>
> >>>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
> >>>>>>> latches
> >>>>>> onto MSI address and MSI data values.
> >>>>>>>
> >>>>>>> With current MSI implementation in kernel, our SoC is latching
> >>>>>>> on to incorrect address and data values, as address/data are
> >>>>>>> updated much later
> >>>>>> than MSI Enable bit.
> >>>>>>
> >>>>>> As a side question, how does setting the affinity work on this
> >>>>>> end-point if this involves changing the address programmed in the
> >>>>>> MSI
> >>>> registers?
> >>>>>> Do you expect the enabled bit to be toggled to around the write?
> >>>>>>
> >>>>>
> >>>>> Yes,
> >>>>
> >>>> Well, that's pretty annoying, as this will not work either. But
> >>>> maybe your
> >> MSI
> >>>> controller has a single doorbell? You haven't mentioned which HW
> >>>> that
> >> is...
> >>>>
> >>> The MSI address/data is located in config space, in our SoC for the
> >>> logic
> >> behind PCIe
> >>> to become aware of new address/data  MSI enable transition is used
> >>> (0 to
> >> 1).
> >>> The logic cannot keep polling these registers in configuration space
> >>> as it
> >> would consume power.
> >>>
> >>> So the logic uses the transition in MSI enable to latch on to 
> >>> address/data.
> >>
> >> I understand the "why". I'm just wondering if your SoC needs to have
> >> the MSI address changed when changing the affinity of the MSI? What
> >> MSI controller are you using? Is it in mainline?
> >>
> > Can you please give more information on MSI affinity ?
> > For cpu affinity for interrupts we would use MSI-X.
> >
> > We are using GIC 400 v2.
>
> None of that is relevant. GIC400 doesn't have the faintest notion of what an
> MSI is, and MSI-X vs MSI is an end-point property.
>
> Please answer these questions: does your MSI controller have a unique
> doorbell, or multiple doorbells? Does it use wired interrupts (SPIs) connected
> to the GIC? Is the support code for this MSI controller in mainline or not?
>

It has a single doorbell.
The MSI decoding is part of our PCIe bridge, and it has an SPI to the GIC.
Our root driver is in mainline drivers/pci/host/pcie-xilinx-nwl.c

Regards,
Bharat





Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Marc Zyngier
On 13/07/16 10:10, Bharat Kumar Gogada wrote:
>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>>
>> On 13/07/16 09:33, Bharat Kumar Gogada wrote:
>>>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>>>>
>>>> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
>>>>>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
>>>>>> call
>>>>>>
>>>>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
>>>>>>> Hi Marc,
>>>>>>>
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> From PCIe Spec:
>>>>>>> MSI Enable Bit:
>>>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
>>>>>>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
>>>>>>> request service and is prohibited from using its INTx# pin.
>>>>>>>
>>>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
>>>>>> which means MSI address and data fields are available/programmed.
>>>>>>>
>>>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
>>>>>>> latches
>>>>>> onto MSI address and MSI data values.
>>>>>>>
>>>>>>> With current MSI implementation in kernel, our SoC is latching on to
>>>>>>> incorrect address and data values, as address/data are updated much
>>>>>>> later
>>>>>> than MSI Enable bit.
>>>>>>
>>>>>> As a side question, how does setting the affinity work on this
>>>>>> end-point if this involves changing the address programmed in the MSI
>>>> registers?
>>>>>> Do you expect the enabled bit to be toggled to around the write?
>>>>>>
>>>>>
>>>>> Yes,
>>>>
>>>> Well, that's pretty annoying, as this will not work either. But maybe your
>> MSI
>>>> controller has a single doorbell? You haven't mentioned which HW that
>> is...
>>>>
>>> The MSI address/data is located in config space, in our SoC for the logic
>> behind PCIe
>>> to become aware of new address/data  MSI enable transition is used (0 to
>> 1).
>>> The logic cannot keep polling these registers in configuration space as it
>> would consume power.
>>>
>>> So the logic uses the transition in MSI enable to latch on to address/data.
>>
>> I understand the "why". I'm just wondering if your SoC needs to have
>> the MSI address changed when changing the affinity of the MSI? What MSI
>> controller are you using? Is it in mainline?
>>
> Can you please give more information on MSI affinity ?
> For cpu affinity for interrupts we would use MSI-X.
> 
> We are using GIC 400 v2.

None of that is relevant. GIC400 doesn't have the faintest notion of
what an MSI is, and MSI-X vs MSI is an end-point property.

Please answer these questions: does your MSI controller have a unique
doorbell, or multiple doorbells? Does it use wired interrupts (SPIs)
connected to the GIC? Is the support code for this MSI controller in
mainline or not?

I'm trying to work out what I can do to help you.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Bharat Kumar Gogada
> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>
> On 13/07/16 09:33, Bharat Kumar Gogada wrote:
> >> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
> >>
> >> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
> >>>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
> >>>> call
> >>>>
> >>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> >>>>> Hi Marc,
> >>>>>
> >>>>> Thanks for the reply.
> >>>>>
> >>>>> From PCIe Spec:
> >>>>> MSI Enable Bit:
> >>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
> >>>>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
> >>>>> request service and is prohibited from using its INTx# pin.
> >>>>>
> >>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
> >>>> which means MSI address and data fields are available/programmed.
> >>>>>
> >>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
> >>>>> latches
> >>>> onto MSI address and MSI data values.
> >>>>>
> >>>>> With current MSI implementation in kernel, our SoC is latching on to
> >>>>> incorrect address and data values, as address/data are updated much
> >>>>> later
> >>>> than MSI Enable bit.
> >>>>
> >>>> As a side question, how does setting the affinity work on this
> >>>> end-point if this involves changing the address programmed in the MSI
> >> registers?
> >>>> Do you expect the enabled bit to be toggled to around the write?
> >>>>
> >>>
> >>> Yes,
> >>
> >> Well, that's pretty annoying, as this will not work either. But maybe your
> MSI
> >> controller has a single doorbell? You haven't mentioned which HW that
> is...
> >>
> > The MSI address/data is located in config space, in our SoC for the logic
> behind PCIe
> > to become aware of new address/data  MSI enable transition is used (0 to
> 1).
> > The logic cannot keep polling these registers in configuration space as it
> would consume power.
> >
> > So the logic uses the transition in MSI enable to latch on to address/data.
>
> I understand the "why". I'm just wondering if your SoC needs to have
> the MSI address changed when changing the affinity of the MSI? What MSI
> controller are you using? Is it in mainline?
>
Can you please give more information on MSI affinity?
For CPU affinity for interrupts we would use MSI-X.

We are using GIC 400 v2.

Regards,
Bharat





Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Marc Zyngier
On 13/07/16 09:33, Bharat Kumar Gogada wrote:
>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>>
>> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
>>>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
>>>> call
>>>>
>>>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
>>>>> Hi Marc,
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>> From PCIe Spec:
>>>>> MSI Enable Bit:
>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
>>>>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
>>>>> request service and is prohibited from using its INTx# pin.
>>>>>
>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
>>>> which means MSI address and data fields are available/programmed.
>>>>>
>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
>>>>> latches
>>>> onto MSI address and MSI data values.
>>>>>
>>>>> With current MSI implementation in kernel, our SoC is latching on to
>>>>> incorrect address and data values, as address/data are updated much
>>>>> later
>>>> than MSI Enable bit.
>>>>
>>>> As a side question, how does setting the affinity work on this
>>>> end-point if this involves changing the address programmed in the MSI
>> registers?
>>>> Do you expect the enabled bit to be toggled to around the write?
>>>>
>>>
>>> Yes,
>>
>> Well, that's pretty annoying, as this will not work either. But maybe your 
>> MSI
>> controller has a single doorbell? You haven't mentioned which HW that is...
>>
> The MSI address/data is located in config space, in our SoC for the logic 
> behind PCIe
> to become aware of new address/data  MSI enable transition is used (0 to 1).
> The logic cannot keep polling these registers in configuration space as it 
> would consume power.
> 
> So the logic uses the transition in MSI enable to latch on to address/data.

I understand the "why". I'm just wondering if your SoC needs to have
the MSI address changed when changing the affinity of the MSI? What MSI
controller are you using? Is it in mainline?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Bharat Kumar Gogada
> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>
> On 13/07/16 07:22, Bharat Kumar Gogada wrote:
> >> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range
> >> call
> >>
> >> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> >>> Hi Marc,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> From PCIe Spec:
> >>> MSI Enable Bit:
> >>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
> >>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
> >>> request service and is prohibited from using its INTx# pin.
> >>>
> >>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
> >> which means MSI address and data fields are available/programmed.
> >>>
> >>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
> >>> latches
> >> onto MSI address and MSI data values.
> >>>
> >>> With current MSI implementation in kernel, our SoC is latching on to
> >>> incorrect address and data values, as address/data are updated much
> >>> later
> >> than MSI Enable bit.
> >>
> >> As a side question, how does setting the affinity work on this
> >> end-point if this involves changing the address programmed in the MSI
> registers?
> >> Do you expect the enabled bit to be toggled to around the write?
> >>
> >
> > Yes,
>
> Well, that's pretty annoying, as this will not work either. But maybe your MSI
> controller has a single doorbell? You haven't mentioned which HW that is...
>
The MSI address/data is located in config space. In our SoC, the MSI Enable
transition (0 to 1) is used for the logic behind PCIe to become aware of the
new address/data; the logic cannot keep polling these registers in
configuration space, as that would consume power.

So the logic uses the transition in MSI Enable to latch on to the address/data.

> > Would anybody change MSI address in between wouldn't it cause race
> condition ?
>
> Changing the affinity of an interrupt is always racy, and the kernel deals 
> with
> it.
>

Regards,
Bharat





Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-13 Thread Marc Zyngier
On 13/07/16 07:22, Bharat Kumar Gogada wrote:
>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>>
>> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
>>> Hi Marc,
>>>
>>> Thanks for the reply.
>>>
>>> From PCIe Spec:
>>> MSI Enable Bit:
>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
>>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
>>> request service and is prohibited from using its INTx# pin.
>>>
>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
>> which means MSI address and data fields are available/programmed.
>>>
>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware latches
>> onto MSI address and MSI data values.
>>>
>>> With current MSI implementation in kernel, our SoC is latching on to
>>> incorrect address and data values, as address/data are updated much later
>> than MSI Enable bit.
>>
>> As a side question, how does setting the affinity work on this end-point if 
>> this
>> involves changing the address programmed in the MSI registers?
>> Do you expect the enabled bit to be toggled to around the write?
>>
> 
> Yes,

Well, that's pretty annoying, as this will not work either. But maybe
your MSI controller has a single doorbell? You haven't mentioned which
HW that is...

> Would anybody change MSI address in between wouldn't it cause race condition ?

Changing the affinity of an interrupt is always racy, and the kernel
deals with it.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-12 Thread Bharat Kumar Gogada
> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>
> On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> > Hi Marc,
> >
> > Thanks for the reply.
> >
> > From PCIe Spec:
> > MSI Enable Bit:
> > If 1 and the MSI-X Enable bit in the MSI-X Message Control register
> > (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
> > request service and is prohibited from using its INTx# pin.
> >
> > From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
> which means MSI address and data fields are available/programmed.
> >
> > In our SoC whenever MSI Enable goes from 0 --> 1 the hardware latches
> onto MSI address and MSI data values.
> >
> > With current MSI implementation in kernel, our SoC is latching on to
> > incorrect address and data values, as address/data are updated much later
> than MSI Enable bit.
>
> As a side question, how does setting the affinity work on this end-point if 
> this
> involves changing the address programmed in the MSI registers?
> Do you expect the enabled bit to be toggled to around the write?
>

Yes.
Would anybody change the MSI address in between? Wouldn't that cause a race condition?

Thanks & Regards,
Bharat





Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-12 Thread Marc Zyngier
On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> Hi Marc,
> 
> Thanks for the reply.
> 
> From PCIe Spec:
> MSI Enable Bit:
> If 1 and the MSI-X Enable bit in the MSI-X Message
> Control register (see Section 6.8.2.3) is 0, the
> function is permitted to use MSI to request service
> and is prohibited from using its INTx# pin.
> 
> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used which 
> means MSI address and data fields are available/programmed.
> 
> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware latches onto 
> MSI address and MSI data values.
> 
> With current MSI implementation in kernel, our SoC is latching on to 
> incorrect address and data values, as address/data
> are updated much later than MSI Enable bit.

As a side question, how does setting the affinity work on this end-point
if this involves changing the address programmed in the MSI registers?
Do you expect the enable bit to be toggled around the write?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-12 Thread Marc Zyngier
On 12/07/16 10:11, Bharat Kumar Gogada wrote:
>> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>>
>> On 11/07/16 11:51, Bharat Kumar Gogada wrote:
>>>>> Hi Marc,
>>>>>
>>>>> Thanks for the reply.
>>>>>
>>>>> From PCIe Spec:
>>>>> MSI Enable Bit:
>>>>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
>>>>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
>>>>> request service and is prohibited from using its INTx# pin.
>>>>>
>>>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
>>>> which means MSI address and data fields are available/programmed.
>>>>>
>>>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
>>>>> latches
>>>> onto MSI address and MSI data values.
>>>>>
>>>>> With current MSI implementation in kernel, our SoC is latching on to
>>>> incorrect address and data values, as address/data
>>>>> are updated much later than MSI Enable bit.
>>>>
>>>> Interesting. It looks like we're doing something wrong in the MSI flow.
>>>> Can you confirm that this is limited to MSI and doesn't affect MSI-X?
>>>>
>>> I think it's the same issue irrespective of MSI or MSI-X as we are
>>> enabling these interrupts before providing the  vectors.
>>>
>>> So we always have a hole when MSI/MSI-X is 1, and software driver has
>>> not registered the irq, and End Point may raise an interrupt (may be
>>> due to error) in this point of time.
>>
>> Looking at the MSI-X part of the code, there is this:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/pci
>> /msi.c#n764
>>
>> which hints that it may not be possible to do otherwise. Damned if you do,
>> damned if you don't.
>>
> MSI-X might not have problem then, how to resolve the issue with MSI ?

Can you give this patch a go and let me know if that works for you?

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a080f44..565e2a4 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1277,6 +1277,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 	if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
 		pci_msi_domain_update_chip_ops(info);
 
+	info->flags |= MSI_FLAG_ACTIVATE_EARLY;
+
 	domain = msi_create_irq_domain(fwnode, info, parent);
 	if (!domain)
 		return NULL;
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8b425c6..513b7c7 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -270,6 +270,8 @@ enum {
 	MSI_FLAG_MULTI_PCI_MSI	= (1 << 3),
 	/* Support PCI MSIX interrupts */
 	MSI_FLAG_PCI_MSIX	= (1 << 4),
+	/* Needs early activate, required for PCI */
+	MSI_FLAG_ACTIVATE_EARLY	= (1 << 5),
 };
 
 int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 38e89ce..4ed2cca 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -361,6 +361,13 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
 	else
 		dev_dbg(dev, "irq [%d-%d] for MSI\n",
 			virq, virq + desc->nvec_used - 1);
+
+		if (info->flags & MSI_FLAG_ACTIVATE_EARLY) {
+			struct irq_data *irq_data;
+
+			irq_data = irq_domain_get_irq_data(domain, desc->irq);
+			irq_domain_activate_irq(irq_data);
+		}
 	}
 
 	return 0;


Thanks,

M.

-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-12 Thread Bharat Kumar Gogada
> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>
> On 11/07/16 11:51, Bharat Kumar Gogada wrote:
> >>> Hi Marc,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> From PCIe Spec:
> >>> MSI Enable Bit:
> >>> If 1 and the MSI-X Enable bit in the MSI-X Message Control register
> >>> (see Section 6.8.2.3) is 0, the function is permitted to use MSI to
> >>> request service and is prohibited from using its INTx# pin.
> >>>
> >>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
> >> which means MSI address and data fields are available/programmed.
> >>>
> >>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware
> >>> latches
> >> onto MSI address and MSI data values.
> >>>
> >>> With current MSI implementation in kernel, our SoC is latching on to
> >> incorrect address and data values, as address/data
> >>> are updated much later than MSI Enable bit.
> >>
> >> Interesting. It looks like we're doing something wrong in the MSI flow.
> >> Can you confirm that this is limited to MSI and doesn't affect MSI-X?
> >>
> > I think it's the same issue irrespective of MSI or MSI-X as we are
> > enabling these interrupts before providing the  vectors.
> >
> > So we always have a hole when MSI/MSI-X is 1, and software driver has
> > not registered the irq, and End Point may raise an interrupt (may be
> > due to error) in this point of time.
>
> Looking at the MSI-X part of the code, there is this:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/pci
> /msi.c#n764
>
> which hints that it may not be possible to do otherwise. Damned if you do,
> damned if you don't.
>
MSI-X might not have problem then, how to resolve the issue with MSI ?

Thanks & Regards,
Bharat





Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-11 Thread Marc Zyngier
On 11/07/16 11:51, Bharat Kumar Gogada wrote:
>>> Hi Marc,
>>>
>>> Thanks for the reply.
>>>
>>> From PCIe Spec:
>>> MSI Enable Bit:
>>> If 1 and the MSI-X Enable bit in the MSI-X Message
>>> Control register (see Section 6.8.2.3) is 0, the
>>> function is permitted to use MSI to request service
>>> and is prohibited from using its INTx# pin.
>>>
>>> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
>> which means MSI address and data fields are available/programmed.
>>>
>>> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware latches
>> onto MSI address and MSI data values.
>>>
>>> With current MSI implementation in kernel, our SoC is latching on to
>> incorrect address and data values, as address/data
>>> are updated much later than MSI Enable bit.
>>
>> Interesting. It looks like we're doing something wrong in the MSI flow.
>> Can you confirm that this is limited to MSI and doesn't affect MSI-X?
>>
> I think it's the same issue irrespective of MSI or MSI-X as we are
> enabling these interrupts before providing the  vectors.
> 
> So we always have a hole when MSI/MSI-X is 1, and software driver has
> not registered the irq, and End Point may raise an interrupt (may be
> due to error) in this point of time.

Looking at the MSI-X part of the code, there is this:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/msi.c#n764

which hints that it may not be possible to do otherwise. Damned if you
do, damned if you don't.

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-11 Thread Bharat Kumar Gogada
> > Hi Marc,
> >
> > Thanks for the reply.
> >
> > From PCIe Spec:
> > MSI Enable Bit:
> > If 1 and the MSI-X Enable bit in the MSI-X Message
> > Control register (see Section 6.8.2.3) is 0, the
> > function is permitted to use MSI to request service
> > and is prohibited from using its INTx# pin.
> >
> > From Endpoint perspective, MSI Enable = 1 indicates MSI can be used
> which means MSI address and data fields are available/programmed.
> >
> > In our SoC whenever MSI Enable goes from 0 --> 1 the hardware latches
> onto MSI address and MSI data values.
> >
> > With current MSI implementation in kernel, our SoC is latching on to
> incorrect address and data values, as address/data
> > are updated much later than MSI Enable bit.
>
> Interesting. It looks like we're doing something wrong in the MSI flow.
> Can you confirm that this is limited to MSI and doesn't affect MSI-X?
>

I think it's the same issue irrespective of MSI or MSI-X, as we are enabling
these interrupts before providing the vectors.

So we always have a hole when MSI/MSI-X is 1 and the software driver has not
registered the irq, and the End Point may raise an interrupt (maybe due to an
error) at this point in time.

Thanks & Regards,
Bharat





Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-11 Thread Marc Zyngier
[Please don't top-post]

On 11/07/16 10:33, Bharat Kumar Gogada wrote:
> Hi Marc,
> 
> Thanks for the reply.
> 
> From PCIe Spec:
> MSI Enable Bit:
> If 1 and the MSI-X Enable bit in the MSI-X Message
> Control register (see Section 6.8.2.3) is 0, the
> function is permitted to use MSI to request service
> and is prohibited from using its INTx# pin.
> 
> From Endpoint perspective, MSI Enable = 1 indicates MSI can be used which 
> means MSI address and data fields are available/programmed.
> 
> In our SoC whenever MSI Enable goes from 0 --> 1 the hardware latches onto 
> MSI address and MSI data values.
> 
> With current MSI implementation in kernel, our SoC is latching on to 
> incorrect address and data values, as address/data
> are updated much later than MSI Enable bit.

Interesting. It looks like we're doing something wrong in the MSI flow.
Can you confirm that this is limited to MSI and doesn't affect MSI-X?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-11 Thread Bharat Kumar Gogada
Hi Marc,

Thanks for the reply.

From PCIe Spec:
MSI Enable Bit:
If 1 and the MSI-X Enable bit in the MSI-X Message
Control register (see Section 6.8.2.3) is 0, the
function is permitted to use MSI to request service
and is prohibited from using its INTx# pin.

From the Endpoint perspective, MSI Enable = 1 indicates MSI can be used, which
means the MSI address and data fields are available/programmed.

In our SoC, whenever MSI Enable goes from 0 --> 1 the hardware latches onto the
MSI address and MSI data values.

With the current MSI implementation in the kernel, our SoC latches on to
incorrect address and data values, as address/data are updated much later than
the MSI Enable bit.
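
To make that ordering concrete, here is a minimal sketch of the driver-side
flow as it stands (generic PCI driver, hypothetical names, no error cleanup);
it is only meant to show where the Enable bit is set relative to the
address/data writes.

/*
 * Sketch only, with hypothetical names; not a real driver, it just shows
 * the ordering of operations described above.
 */
#include <linux/pci.h>
#include <linux/interrupt.h>

static irqreturn_t demo_irq(int irq, void *data)
{
	return IRQ_HANDLED;
}

static int demo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	int ret;

	ret = pci_enable_device(pdev);
	if (ret)
		return ret;

	/* Step 1: the MSI Enable bit gets set here ... */
	ret = pci_enable_msi_range(pdev, 1, 1);
	if (ret < 0)
		return ret;

	/*
	 * Step 2: ... but the MSI address/data registers are only written
	 * when the irq is activated below, so a device that latches
	 * address/data on the 0 -> 1 Enable transition captures stale values.
	 */
	return request_irq(pdev->irq, demo_irq, 0, "demo", pdev);
}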

Thanks & Regards,
Bharat

> -Original Message-
> From: Marc Zyngier [mailto:marc.zyng...@arm.com]
> Sent: Monday, July 11, 2016 2:18 PM
> To: Bharat Kumar Gogada ; linux-
> p...@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Arnd Bergmann ; Bjorn Helgaas
> 
> Subject: Re: PCIe MSI address is not written at pci_enable_msi_range call
>
> On 11/07/16 03:32, Bharat Kumar Gogada wrote:
> > Hi,
> >
> > I have a query.
> > I see that when we use PCI_MSI_IRQ_DOMAIN to handle MSI's, MSI
> address is not being
> > written in to end point's PCI_MSI_ADDRESS_LO/HI at the call
> pci_enable_msi_range.
> >
> > Instead it is being written at the time end point requests irq.
> >
> > Can any one tell the reason why is it handled in this manner ?
>
> Because there is no real need to do it earlier, and in some case you
> cannot allocate MSIs at that stage. pci_enable_msi_range only works out
> how many vectors are required. At least one MSI controller (GICv3 ITS)
> needs to know how many vectors are required before they can be provided
> to the end-point.
>
> Do you see any issue with this?
>
> Thanks,
>
>   M.
> --
> Jazz is not dead. It just smells funny...





Re: PCIe MSI address is not written at pci_enable_msi_range call

2016-07-11 Thread Marc Zyngier
On 11/07/16 03:32, Bharat Kumar Gogada wrote:
> Hi,
> 
> I have a query.
> I see that when we use PCI_MSI_IRQ_DOMAIN to handle MSI's, MSI address is not 
> being
> written in to end point's PCI_MSI_ADDRESS_LO/HI at the call 
> pci_enable_msi_range.
> 
> Instead it is being written at the time end point requests irq.
> 
> Can any one tell the reason why is it handled in this manner ?

Because there is no real need to do it earlier, and in some cases you
cannot allocate MSIs at that stage. pci_enable_msi_range only works out
how many vectors are required. At least one MSI controller (GICv3 ITS)
needs to know how many vectors are required before they can be provided
to the end-point.

Do you see any issue with this?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: PCIe EndPoint DMA driver with DMA Framework

2016-06-14 Thread Bharat Kumar Gogada
Hi Vinod/Kaya,

> On 6/13/2016 1:25 AM, Vinod Koul wrote:
> >> We are planning to write a PCIe EndPoint DMA driver with DMA
> >> Framework
> >> > targeting x86 machine.  (
> >> >
> "https://www.kernel.org/doc/Documentation/dmaengine/provider.txt";)
> >> > Our DMA controller is part of PCIe End Point.  We are targeting to
> >> > measure PCIe performance with this Framework driver.
> >> >
> >> > But when I see DMA Framework drivers is kernel source "drivers/dma"
> >> > most of the drivers are platform drivers.
> > wrong, there are bunch of PCI X86 driver. Look closely dw, ioat etc
>
> I usually see endpoint specific DMA code to reside in the endpoint device
> driver not in the dmaengine directory.
>
> I think the main question is who the consumer of this DMA controller is like
> Vinod asked.
>
> If it is a general purpose DMA controller then dmaengine would be the right
> place.
>
> If it is specific to your endpoint, then it should be set up and used in your
> endpoint device driver.
>
Our DMA controller is specific to the EP.

Thanks a lot, Kaya and Vinod, for the clarifications.

Bharat





Re: PCIe EndPoint DMA driver with DMA Framework

2016-06-13 Thread Sinan Kaya
On 6/13/2016 1:25 AM, Vinod Koul wrote:
>> We are planning to write a PCIe EndPoint DMA driver with DMA Framework
>> > targeting x86 machine.  (
>> > "https://www.kernel.org/doc/Documentation/dmaengine/provider.txt";) Our DMA
>> > controller is part of PCIe End Point.  We are targeting to measure PCIe
>> > performance with this Framework driver.
>> > 
>> > But when I see DMA Framework drivers is kernel source "drivers/dma" most
>> > of the drivers are platform drivers.
> wrong, there are bunch of PCI X86 driver. Look closely dw, ioat etc

I usually see endpoint-specific DMA code reside in the endpoint device
driver, not in the dmaengine directory.

I think the main question is who the consumer of this DMA controller is, as
Vinod asked.

If it is a general purpose DMA controller then dmaengine would be the right
place. 

If it is specific to your endpoint, then it should be set up and used in your
endpoint device driver.

-- 
Sinan Kaya
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project


Re: PCIe EndPoint DMA driver with DMA Framework

2016-06-12 Thread Vinod Koul
On Fri, Jun 10, 2016 at 03:39:15PM +, Bharat Kumar Gogada wrote:
> Hi,
> 

PLEASE wrap your replies to 80 chars. I have reflowed them below.

> We are planning to write a PCIe EndPoint DMA driver with the DMA Framework
> targeting an x86 machine
> ("https://www.kernel.org/doc/Documentation/dmaengine/provider.txt"). Our DMA
> controller is part of a PCIe End Point. We are targeting to measure PCIe
> performance with this Framework driver.
> 
> But when I look at the DMA Framework drivers in the kernel source "drivers/dma",
> most of the drivers are platform drivers.

Wrong, there are a bunch of PCI x86 drivers. Look closely: dw, ioat, etc.
> 
> So DMA Framework is mainly targeted for platform drivers?

First, it is the dmaengine framework.

And your assumption is wrong. By the way, did you see anything in the dmaengine
APIs to suggest that the dmaengine framework is only suited to platform drivers?
The framework does not care which type of device you have.

> 
> With current design model we need to have one DMA controller driver and
> PCIe EP client driver?

That depends on what you are trying to do, but yes, the dmaengine driver
will provide DMA services and a client needs to use them.

> 
> In which part of kernel source PCIe EP client driver will go?

Wherever the thing that client is doing DMA for belongs. If it's networking,
then it should go in networking.

What exactly are you trying to do?

> 
> Can we use DMA Framework on x86 ?

And asking the same question multiple times does not change the answer, which is
yes.

> 
> Thanks & Regards, Bharat
> 
> 
> 
> This email and any attachments are intended for the sole use of the named
> recipient(s) and contain(s) confidential information that may be
> proprietary, privileged or copyrighted under applicable law. If you are
> not the intended recipient, do not read, copy, or forward this email
> message or any attachments. Delete this email message and any attachments
> immediately.

What confidential information do you have here?

-- 
~Vinod
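
To illustrate the provider/client split being discussed, here is a rough sketch of
a dmaengine client that requests any memcpy-capable channel and runs a single
transfer. The function names and buffer handling are hypothetical; the dmaengine
calls (dma_request_channel(), dmaengine_prep_dma_memcpy(), dmaengine_submit(),
dma_async_issue_pending()) are the standard client API:

#include <linux/dmaengine.h>
#include <linux/dma-mapping.h>
#include <linux/completion.h>

static void example_xfer_done(void *arg)
{
	complete(arg);
}

/* Run one DMA memcpy between two already-mapped bus addresses. */
static int example_dma_memcpy(dma_addr_t dst, dma_addr_t src, size_t len)
{
	DECLARE_COMPLETION_ONSTACK(done);
	struct dma_async_tx_descriptor *tx;
	struct dma_chan *chan;
	dma_cap_mask_t mask;
	dma_cookie_t cookie;

	dma_cap_zero(mask);
	dma_cap_set(DMA_MEMCPY, mask);

	/* Any provider that registered a memcpy-capable channel will do. */
	chan = dma_request_channel(mask, NULL, NULL);
	if (!chan)
		return -ENODEV;

	tx = dmaengine_prep_dma_memcpy(chan, dst, src, len, DMA_PREP_INTERRUPT);
	if (!tx) {
		dma_release_channel(chan);
		return -EIO;
	}

	tx->callback = example_xfer_done;
	tx->callback_param = &done;
	cookie = dmaengine_submit(tx);
	dma_async_issue_pending(chan);

	wait_for_completion(&done);
	dma_release_channel(chan);

	return dma_submit_error(cookie) ? -EIO : 0;
}

The provider side (the endpoint's DMA controller driver) would only live under
drivers/dma/ if it is a general-purpose engine, as Sinan notes above.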


RE: PCIe EP DMA driver with DMA Framework

2016-06-08 Thread Bharat Kumar Gogada
Sorry, please ignore the footer; I forgot to configure some settings.
> Hi All,
>
> We are planning to write a PCIe EndPoint DMA driver with the DMA Framework
> on an x86 machine
> ("https://www.kernel.org/doc/Documentation/dmaengine/provider.txt").
> We are targeting to measure PCIe performance with this Framework driver.
>
> But I did not find any PCIe driver with DMA Framework. Please let me know if
> any such PCIe driver exists.
>
> If there isn't any such driver, please let me know if there are any limitations
> with using the DMA Framework with PCIe, or whether the DMA Engine Framework
> is targeted at a different purpose and not at PCIe?
>
>
> Thanks
> Bharat
>
>
>
>
>
> This email and any attachments are intended for the sole use of the named
> recipient(s) and contain(s) confidential information that may be proprietary,
> privileged or copyrighted under applicable law. If you are not the intended
> recipient, do not read, copy, or forward this email message or any
> attachments. Delete this email message and any attachments immediately.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html





Re: PCIe regression with DRA7xx in 4.4-rc1

2015-11-24 Thread Kishon Vijay Abraham I
Hi,

On Tuesday 24 November 2015 05:44 PM, Jisheng Zhang wrote:
> 
> 
> On Tue, 24 Nov 2015 17:31:07 +0530
> Kishon Vijay Abraham I wrote:
> 
>> Hi,
>>
>> I'm seeing a regression with ("PCI:
>> designware: Make driver arch-agnostic").
>>
>> Logs using a SATA PCIe card [1]. The PCIe card enumerates fine but after 
>> that I
>> observe "ata3.00: qc timeout (cmd 0xec), ata3.00: failed to IDENTIFY (I/O
>> error, err_mask=0x4)"
>>
>> Logs using a Ethenet PCIe card [2]. Again here the PCIe card enumerates fine
>> but when I give ifconfig up, it fails.
>>
>> If I just revert commit , the PCIe
>> cards starts to work fine again
> 
> FYI, maybe the patch can fix the regression.
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/387362.html

Yes, that fixed it.

Thanks
Kishon

> 
>>
>> Logs using a SATA PCIe card [3]. Here the KINGSTON SSD gets detected fine.
>> Logs using a Ethernet PCIe card [4]. I'm able to do ping tests now.
>>
>> Actually I'm not able to find any obvious problems with the patch and the irq
>> number and the memory resource also looks fine. Any idea what could be the 
>> problem?
>>
>> [1] -> http://pastebin.ubuntu.com/13491456/
>> [2] -> http://pastebin.ubuntu.com/13491526/
>>
>> [3] -> http://pastebin.ubuntu.com/13491658/
>> [4] -> http://pastebin.ubuntu.com/13491593/
>>
>> Thanks
>> Kishon
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


Re: PCIe regression with DRA7xx in 4.4-rc1

2015-11-24 Thread Kishon Vijay Abraham I
Hi,

On Tuesday 24 November 2015 05:38 PM, Gabriele Paoloni wrote:
> Hi Kishon
> 
>> -Original Message-
>> From: Kishon Vijay Abraham I [mailto:kis...@ti.com]
>> Sent: 24 November 2015 12:01
>> To: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
>> o...@vger.kernel.org; james.mo...@arm.com; gabriel.fernan...@st.com;
>> minghuan.l...@freescale.com; Wangzhou (B); Gabriele Paoloni; a...@arndb.de;
>> bhelg...@google.com; pratyush.an...@gmail.com; Nori, Sekhar;
>> jingooh...@gmail.com; linux-arm-ker...@lists.infradead.org
>> Subject: PCIe regression with DRA7xx in 4.4-rc1
>>
>> Hi,
>>
>> I'm seeing a regression with ("PCI:
>> designware: Make driver arch-agnostic").
>>
>> Logs using a SATA PCIe card [1]. The PCIe card enumerates fine but after that
>> I
>> observe "ata3.00: qc timeout (cmd 0xec), ata3.00: failed to IDENTIFY (I/O
>> error, err_mask=0x4)"
>>
> 
> May this be related to the bug flagged in:
> 
> [PATCH] PCI: designware: remove wrong io_base assignment
> 
> [...]
> diff --git a/drivers/pci/host/pcie-designware.c b/drivers/pci/host/pcie-designware.c
> index 540f077c37ea..02a7452bdf23 100644
> --- a/drivers/pci/host/pcie-designware.c
> +++ b/drivers/pci/host/pcie-designware.c
> @@ -440,7 +440,6 @@ int dw_pcie_host_init(struct pcie_port *pp)
>  					 ret, pp->io);
>  				continue;
>  			}
> -			pp->io_base = pp->io->start;
>  			break;
>  		case IORESOURCE_MEM:
>  			pp->mem = win->res;
> 


yes, this indeed solved the bug.

Thanks
Kishon


Re: PCIe regression with DRA7xx in 4.4-rc1

2015-11-24 Thread Jisheng Zhang


On Tue, 24 Nov 2015 17:31:07 +0530
Kishon Vijay Abraham I wrote:

> Hi,
> 
> I'm seeing a regression with ("PCI:
> designware: Make driver arch-agnostic").
> 
> Logs using a SATA PCIe card [1]. The PCIe card enumerates fine but after that 
> I
> observe "ata3.00: qc timeout (cmd 0xec), ata3.00: failed to IDENTIFY (I/O
> error, err_mask=0x4)"
> 
> Logs using an Ethernet PCIe card [2]. Again here the PCIe card enumerates fine
> but when I give ifconfig up, it fails.
> 
> If I just revert commit , the PCIe
> cards start to work fine again

FYI, maybe this patch can fix the regression:
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/387362.html

> 
> Logs using a SATA PCIe card [3]. Here the KINGSTON SSD gets detected fine.
> Logs using an Ethernet PCIe card [4]. I'm able to do ping tests now.
> 
> Actually I'm not able to find any obvious problems with the patch and the irq
> number and the memory resource also looks fine. Any idea what could be the 
> problem?
> 
> [1] -> http://pastebin.ubuntu.com/13491456/
> [2] -> http://pastebin.ubuntu.com/13491526/
> 
> [3] -> http://pastebin.ubuntu.com/13491658/
> [4] -> http://pastebin.ubuntu.com/13491593/
> 
> Thanks
> Kishon
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



RE: PCIe regression with DRA7xx in 4.4-rc1

2015-11-24 Thread Gabriele Paoloni
Hi Kishon

> -Original Message-
> From: Kishon Vijay Abraham I [mailto:kis...@ti.com]
> Sent: 24 November 2015 12:01
> To: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> o...@vger.kernel.org; james.mo...@arm.com; gabriel.fernan...@st.com;
> minghuan.l...@freescale.com; Wangzhou (B); Gabriele Paoloni; a...@arndb.de;
> bhelg...@google.com; pratyush.an...@gmail.com; Nori, Sekhar;
> jingooh...@gmail.com; linux-arm-ker...@lists.infradead.org
> Subject: PCIe regression with DRA7xx in 4.4-rc1
> 
> Hi,
> 
> I'm seeing a regression with ("PCI:
> designware: Make driver arch-agnostic").
> 
> Logs using a SATA PCIe card [1]. The PCIe card enumerates fine but after that
> I
> observe "ata3.00: qc timeout (cmd 0xec), ata3.00: failed to IDENTIFY (I/O
> error, err_mask=0x4)"
> 

Might this be related to the bug flagged in:

[PATCH] PCI: designware: remove wrong io_base assignment

[...]
diff --git a/drivers/pci/host/pcie-designware.c b/drivers/pci/host/pcie-designware.c
index 540f077c37ea..02a7452bdf23 100644
--- a/drivers/pci/host/pcie-designware.c
+++ b/drivers/pci/host/pcie-designware.c
@@ -440,7 +440,6 @@ int dw_pcie_host_init(struct pcie_port *pp)
 					 ret, pp->io);
 				continue;
 			}
-			pp->io_base = pp->io->start;
 			break;
 		case IORESOURCE_MEM:
 			pp->mem = win->res;
-- 
1.7.9.5
[...]

Can you try to see if applying the patch above solves the issue?

Thanks

Gab

> Logs using an Ethernet PCIe card [2]. Again here the PCIe card enumerates fine
> but when I give ifconfig up, it fails.
> 
> If I just revert commit , the PCIe
> cards start to work fine again
> 
> Logs using a SATA PCIe card [3]. Here the KINGSTON SSD gets detected fine.
> Logs using an Ethernet PCIe card [4]. I'm able to do ping tests now.
> 
> Actually I'm not able to find any obvious problems with the patch and the irq
> number and the memory resource also looks fine. Any idea what could be the
> problem?
> 
> [1] -> http://pastebin.ubuntu.com/13491456/
> [2] -> http://pastebin.ubuntu.com/13491526/
> 
> [3] -> http://pastebin.ubuntu.com/13491658/
> [4] -> http://pastebin.ubuntu.com/13491593/
> 
> Thanks
> Kishon


RE: PCIe host controller behind IOMMU on ARM

2015-11-13 Thread Phil Edworthy
On 13 November 2015 14:00, Arnd Bergmann wrote:
> On Friday 13 November 2015 13:03:11 Phil Edworthy wrote:
> >
> > > > Then pci_device_add() sets the devices coherent_dma_mask to 4GiB
> before
> > > > calling of_pci_dma_configure(). I assume it does this on the basis that 
> > > > this is
> a
> > > > good default for PCI drivers that don't call dma_set_mask().
> > > > So if arch_setup_dma_ops() walks up the parents to limit the mask, 
> > > > you'll
> hit
> > > > this mask.
> > >
> > > arch_setup_dma_ops() does not walk up the hierarchy, of_dma_configure()
> > > does this before calling arch_setup_dma_ops(). The PCI devices start out
> > > with the 32-bit mask, but the limit should be whatever PCI host uses.
> > Ok, so of_dma_configure() could walk up the tree and restrict the dma
> > mask to whatever parents limit it to. Then it could be overridden by
> > a dma-ranges entry in the DT node, right?
> 
> No, the dma-ranges properties tell you what the allowed masks are,
> this is what of_dma_configure() looks at.
Ok, I understand now.


> > If so, one problem I can see is PCI controllers already use the
> > dma-ranges binding but with 3 address cells since it also specifies
> > the PCI address range.
> >
> > I noticed that of_dma_get_range() skips straight to the parent node.
> > Shouldn't it attempt to get the dma-ranges for the device's node
> > first?
> 
> No, the dma-ranges explain the capabilities of the bus, this is
> what you have to look at. The device itself may have additional
> restrictions, but those are what the driver knows based on the
> compatibility value when it passes the device specific mask into
> dma_set_mask()
Ok, this is making sense now.


> > I mean most hardware is limited by the peripheral's
> > capabilities, not the bus. If fact, of_dma_get_range() gets the number
> > of address and size cells from the device node, but gets the dma-ranges
> > from the parent. That seems a little odd to me.
> 
> of_dma_get_range() calls of_n_addr_cells()/of_n_size_cells(), which get
> the #address-cells/#size-cells property from the parent device (except
> for the root, which is special).
Right, I should have checked what of_n_addr/size_cell() actually did.


> > The only other problem I can see is that currently all PCI drivers can
> > try to set their dma mask to 64 bits. At the moment that succeeds
> > because there are no checks.
> 
> Right, this is the main bug we need to fix.
Yep.


> > Until devices using them have their DTs
> > updated with dma-ranges, we would be limiting them to a 32 bit mask. I
> > guess that's not much of an issue in practice.
> 
> Correct. I've tried to tell everyone about this when they added device
> nodes for DMA capable devices. In most cases, they want 32-bit masks
> anyway.

Thanks for your help & patience, much appreciated!
Phil


Re: PCIe host controller behind IOMMU on ARM

2015-11-13 Thread Arnd Bergmann
On Friday 13 November 2015 13:03:11 Phil Edworthy wrote:
> 
> > > Then pci_device_add() sets the devices coherent_dma_mask to 4GiB before
> > > calling of_pci_dma_configure(). I assume it does this on the basis that 
> > > this is a
> > > good default for PCI drivers that don't call dma_set_mask().
> > > So if arch_setup_dma_ops() walks up the parents to limit the mask, you'll 
> > > hit
> > > this mask.
> > 
> > arch_setup_dma_ops() does not walk up the hierarchy, of_dma_configure()
> > does this before calling arch_setup_dma_ops(). The PCI devices start out
> > with the 32-bit mask, but the limit should be whatever PCI host uses.
> Ok, so of_dma_configure() could walk up the tree and restrict the dma
> mask to whatever parents limit it to. Then it could be overridden by
> a dma-ranges entry in the DT node, right?

No, the dma-ranges properties tell you what the allowed masks are,
this is what of_dma_configure() looks at.

> If so, one problem I can see is PCI controllers already use the
> dma-ranges binding but with 3 address cells since it also specifies
> the PCI address range.
> 
> I noticed that of_dma_get_range() skips straight to the parent node.
> Shouldn't it attempt to get the dma-ranges for the device's node
> first?

No, the dma-ranges explain the capabilities of the bus, this is
what you have to look at. The device itself may have additional
restrictions, but those are what the driver knows based on the
compatibility value when it passes the device specific mask into
dma_set_mask()

> I mean most hardware is limited by the peripheral's
> capabilities, not the bus. If fact, of_dma_get_range() gets the number
> of address and size cells from the device node, but gets the dma-ranges
> from the parent. That seems a little odd to me.

of_dma_get_range() calls of_n_addr_cells()/of_n_size_cells(), which get
the #address-cells/#size-cells property from the parent device (except
for the root, which is special).

> The only other problem I can see is that currently all PCI drivers can
> try to set their dma mask to 64 bits. At the moment that succeeds
> because there are no checks.

Right, this is the main bug we need to fix.

> Until devices using them have their DTs
> updated with dma-ranges, we would be limiting them to a 32 bit mask. I
> guess that's not much of an issue in practice.

Correct. I've tried to tell everyone about this when they added device
nodes for DMA capable devices. In most cases, they want 32-bit masks
anyway.

Arnd


RE: PCIe host controller behind IOMMU on ARM

2015-11-13 Thread Phil Edworthy
Hi Arnd,

On 12 November 2015 16:17, Arnd Bergmann wrote:
> On Thursday 12 November 2015 15:33:41 Phil Edworthy wrote:
> > On 12 November 2015 09:49, Arnd Bergmann wrote:
> > > On Thursday 12 November 2015 09:26:33 Phil Edworthy wrote:
> > > > On 11 November 2015 18:25, LIviu wrote:
> > > > > On Mon, Nov 09, 2015 at 12:32:13PM +, Phil Edworthy wrote:
> > >
> > > of_dma_configure calls of_dma_get_range to do all this for the PCIe host,
> > > and then calls arch_setup_dma_ops() so the architecture specific code can
> > > enforce the limits in dma_set_mask and pick an appropriate set of dma
> > > operations. The missing part is in the implementation of
> arch_setup_dma_ops,
> > > which currently happily ignores the base and limit.
> > I don't think it's as simple as that, though I could be wrong!
> >
> > First off, of_dma_configure() sets a default coherent_dma_mask to 4GiB.
> > This default is set for the 'platform soc' device. For my own testing I 
> > increased
> > this to DMA_BIT_MASK(63). Note that setting it to DMA_BIT_MASK(64) causes
> > boot failure that I haven't looked into.
> 
> Most platform devices actually need the 32-bit mask, so we intentionally
> followed what PCI does here and default to that and require platform drivers
> to explicitly ask for a larger mask if they need it.
Ok, that makes sense.


> > Then pci_device_add() sets the devices coherent_dma_mask to 4GiB before
> > calling of_pci_dma_configure(). I assume it does this on the basis that 
> > this is a
> > good default for PCI drivers that don't call dma_set_mask().
> > So if arch_setup_dma_ops() walks up the parents to limit the mask, you'll 
> > hit
> > this mask.
> 
> arch_setup_dma_ops() does not walk up the hierarchy, of_dma_configure()
> does this before calling arch_setup_dma_ops(). The PCI devices start out
> with the 32-bit mask, but the limit should be whatever PCI host uses.
Ok, so of_dma_configure() could walk up the tree and restrict the dma
mask to whatever parents limit it to. Then it could be overridden by
a dma-ranges entry in the DT node, right?
If so, one problem I can see is PCI controllers already use the
dma-ranges binding but with 3 address cells since it also specifies
the PCI address range.

I noticed that of_dma_get_range() skips straight to the parent node.
Shouldn't it attempt to get the dma-ranges for the device's node
first? I mean most hardware is limited by the peripheral's
capabilities, not the bus. If fact, of_dma_get_range() gets the number
of address and size cells from the device node, but gets the dma-ranges
from the parent. That seems a little odd to me.

The only other problem I can see is that currently all PCI drivers can
try to set their dma mask to 64 bits. At the moment that succeeds
because there are no checks. Until devices using them have their DTs
updated with dma-ranges, we would be limiting them to a 32 bit mask. I
guess that's not much of an issue in practice.


> > Finally, dma_set_mask_and_coherent() is called from the PCI card driver
> > but it doesn't check the parents dma masks either.
> 
> The way I think this should work is that arch_setup_dma_ops() stores the
> allowed mask in the struct device, and that dma_set_mask compares the
> mask against that.
That makes sense.

Thanks for your help,
Phil


Re: PCIe host controller behind IOMMU on ARM

2015-11-12 Thread Arnd Bergmann
On Thursday 12 November 2015 15:33:41 Phil Edworthy wrote:
> On 12 November 2015 09:49, Arnd Bergmann wrote:
> > On Thursday 12 November 2015 09:26:33 Phil Edworthy wrote:
> > > On 11 November 2015 18:25, LIviu wrote:
> > > > On Mon, Nov 09, 2015 at 12:32:13PM +, Phil Edworthy wrote:
> > 
> > of_dma_configure calls of_dma_get_range to do all this for the PCIe host,
> > and then calls arch_setup_dma_ops() so the architecture specific code can
> > enforce the limits in dma_set_mask and pick an appropriate set of dma
> > operations. The missing part is in the implementation of arch_setup_dma_ops,
> > which currently happily ignores the base and limit.
> I don't think it's as simple as that, though I could be wrong!
> 
> First off, of_dma_configure() sets a default coherent_dma_mask to 4GiB.
> This default is set for the 'platform soc' device. For my own testing I 
> increased
> this to DMA_BIT_MASK(63). Note that setting it to DMA_BIT_MASK(64) causes
> boot failure that I haven't looked into.

Most platform devices actually need the 32-bit mask, so we intentionally
followed what PCI does here and default to that and require platform drivers
to explicitly ask for a larger mask if they need it.

> Then pci_device_add() sets the devices coherent_dma_mask to 4GiB before
> calling of_pci_dma_configure(). I assume it does this on the basis that this 
> is a
> good default for PCI drivers that don't call dma_set_mask().
> So if arch_setup_dma_ops() walks up the parents to limit the mask, you'll hit
> this mask.

arch_setup_dma_ops() does not walk up the hierarchy, of_dma_configure()
does this before calling arch_setup_dma_ops(). The PCI devices start out
with the 32-bit mask, but the limit should be whatever PCI host uses.

> Finally, dma_set_mask_and_coherent() is called from the PCI card driver
> but it doesn't check the parents dma masks either.

The way I think this should work is that arch_setup_dma_ops() stores the
allowed mask in the struct device, and that dma_set_mask compares the
mask against that.

Arnd
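
A sketch of the check Arnd is proposing, to make the idea concrete. The
bus_dma_limit field and the function name are illustrative only - at the time of
this thread no such field existed in struct device, which is exactly the missing
piece being discussed:

#include <linux/device.h>
#include <linux/dma-mapping.h>

/*
 * Illustration only: assumes arch_setup_dma_ops() has stored the limit
 * derived from the parent's dma-ranges in a hypothetical dev->bus_dma_limit.
 */
static int example_dma_set_mask(struct device *dev, u64 mask)
{
	if (!dev->dma_mask)
		return -EIO;

	/* Refuse masks that the bus hierarchy cannot actually address. */
	if (dev->bus_dma_limit && mask > dev->bus_dma_limit)
		return -EIO;

	*dev->dma_mask = mask;
	return 0;
}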


RE: PCIe host controller behind IOMMU on ARM

2015-11-12 Thread Phil Edworthy
Hi Arnd,

On 12 November 2015 09:49, Arnd Bergmann wrote:
> On Thursday 12 November 2015 09:26:33 Phil Edworthy wrote:
> > On 11 November 2015 18:25, LIviu wrote:
> > > On Mon, Nov 09, 2015 at 12:32:13PM +, Phil Edworthy wrote:
> 
> > > I think you're mixing things a bit or not explaining them very well. 
> > > Having the
> > > PCIe controller limited to 32-bit AXI does not mean that the PCIe bus 
> > > cannot
> > > carry 64-bit addresses. It depends on how they get translated by the host
> bridge
> > > or its associated ATS block. I can't see why you can't have a setup where
> > > the CPU addresses are 32-bit but the PCIe bus addresses are all 64-bit.
> > > You just have to be careful on how you setup your mem64 ranges so that
> they
> > > don't
> > > overlap with the 32-bit ranges when translated.
> > From a HW point of view I agree that we can setup the PCI host bridge such 
> > that
> > it uses 64-bit PCI address, with 32-bit cpu addresses. Though in practice 
> > doesn't
> > this mean that the dma ops used by card drivers has to be provided by our 
> > PCI
> > host bridge driver so we can apply the translation to those PCI addresses?
> > This comes back to my point below about how to do this. Adding a bus 
> > notifier
> > to do this may be too late, and arm64 doesn't implement set_dma_ops().
> >
> > > And no, you should not limit at the card driver the DMA_BIT_MASK() unless
> the
> > > card is not capable of supporting more than 32-bit addresses.
> > If there was infrastructure that checked all parents dma-ranges when the
> > dma_set_mask() function is called as Arnd pointed out, this would nicely 
> > solve
> > the problem.
> 
> of_dma_configure calls of_dma_get_range to do all this for the PCIe host,
> and then calls arch_setup_dma_ops() so the architecture specific code can
> enforce the limits in dma_set_mask and pick an appropriate set of dma
> operations. The missing part is in the implementation of arch_setup_dma_ops,
> which currently happily ignores the base and limit.
I don't think it's as simple as that, though I could be wrong!

First off, of_dma_configure() sets a default coherent_dma_mask to 4GiB.
This default is set for the 'platform soc' device. For my own testing I 
increased
this to DMA_BIT_MASK(63). Note that setting it to DMA_BIT_MASK(64) causes
boot failure that I haven't looked into.

Then pci_device_add() sets the device's coherent_dma_mask to 4GiB before
calling of_pci_dma_configure(). I assume it does this on the basis that this is a
good default for PCI drivers that don't call dma_set_mask().
So if arch_setup_dma_ops() walks up the parents to limit the mask, you'll hit
this mask.

Finally, dma_set_mask_and_coherent() is called from the PCI card driver
but it doesn't check the parents dma masks either.

Thanks
Phil



Re: PCIe host controller behind IOMMU on ARM

2015-11-12 Thread liviu.du...@arm.com
On Thu, Nov 12, 2015 at 09:26:33AM +, Phil Edworthy wrote:
> Hi Liviu, Arnd,
> 
> On 11 November 2015 18:25, LIviu wrote:
> > On Mon, Nov 09, 2015 at 12:32:13PM +, Phil Edworthy wrote:
> > > Hi Liviu, Will,
> > >
> > > On 04 November 2015 15:19, Phil wrote:
> > > > On 04 November 2015 15:02, Liviu wrote:
> > > > > On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> > > > > > Hi Liviu,
> > > > > >
> > > > > > On 04 November 2015 14:24, Liviu wrote:
> > > > > > > On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I am trying to hook up a PCIe host controller that sits behind 
> > > > > > > > an
> > IOMMU,
> > > > > > > > but having some problems.
> > > > > > > >
> > > > > > > > I'm using the pcie-rcar PCIe host controller and it works fine 
> > > > > > > > without
> > > > > > > > the IOMMU, and I can attach the IOMMU to the controller such 
> > > > > > > > that
> > any
> > > > > calls
> > > > > > > > to dma_alloc_coherent made by the controller driver uses the
> > > > iommu_ops
> > > > > > > > version of dma_ops.
> > > > > > > >
> > > > > > > > However, I can't see how to make the endpoints to utilise the
> > dma_ops
> > > > that
> > > > > > > > the controller uses. Shouldn't the endpoints inherit the 
> > > > > > > > dma_ops from
> > the
> > > > > > > > controller?
> > > > > > >
> > > > > > > No, not directly.
> > > > > > >
> > > > > > > > Any pointers for this?
> > > > > > >
> > > > > > > You need to understand the process through which a driver for
> > endpoint
> > > > get
> > > > > > > an address to be passed down to the device. Have a look at
> > > > > > > Documentation/DMA-API-HOWTO.txt, there is a nice explanation 
> > > > > > > there.
> > > > > > > (Hint: EP driver needs to call dma_map_single).
> > > > > > >
> > > > > > > Also, you need to make sure that the bus address that ends up 
> > > > > > > being set
> > > > into
> > > > > > > the endpoint gets translated correctly by the host controller 
> > > > > > > into an
> > address
> > > > > > > that the IOMMU can then translate into physical address.
> > > > > > Sure, though since this is bog standard Intel PCIe ethernet card 
> > > > > > which
> > works
> > > > > > fine when the IOMMU is effectively unused, I don’t think there is a
> > problem
> > > > > > with that.
> > > > > >
> > > > > > The driver for the PCIe controller sets up the IOMMU mapping ok 
> > > > > > when I
> > > > > > do a test call to dma_alloc_coherent() in the controller's driver. 
> > > > > > i.e. when I
> > > > > > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > > > > > __iommu_alloc_buffer() and __alloc_iova().
> > > > > >
> > > > > > When an endpoint driver allocates and maps a dma coherent buffer it
> > > > > > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> > > > >
> > > > > Why do you think that? Remember that the only thing attached to the
> > IOMMU
> > > > is
> > > > > the
> > > > > host controller. The endpoint is on the PCIe bus, which gets a 
> > > > > different
> > > > > translation
> > > > > that the IOMMU knows nothing about. If it helps you to visualise it 
> > > > > better,
> > think
> > > > > of the host controller as another IOMMU device. It's the ops of the 
> > > > > host
> > > > > controller
> > > > > that should be invoked, not the IOMMU's.
> > > > Ok, that makes sense. I'll have a think and poke it a bit more...
> > 
> > Hi Phil,
> > 
> > Not trying to ignore your email, but I thought this is more in Will's 
> > backyard.
> > 
> > > Somewhat related to this, since our PCIe controller HW is limited to
> > > 32-bit AXI address range, before trying to hook up the IOMMU I have
> > > tried to limit the dma_mask for PCI cards to DMA_BIT_MASK(32). The
> > > reason being that Linux uses a 1 to 1 mapping between PCI addresses
> > > and cpu (phys) addresses when there isn't an IOMMU involved, so I
> > > think that we need to limit the PCI address space used.
> > 
> > I think you're mixing things a bit or not explaining them very well. Having 
> > the
> > PCIe controller limited to 32-bit AXI does not mean that the PCIe bus cannot
> > carry 64-bit addresses. It depends on how they get translated by the host 
> > bridge
> > or its associated ATS block. I can't see why you can't have a setup where
> > the CPU addresses are 32-bit but the PCIe bus addresses are all 64-bit.
> > You just have to be careful on how you setup your mem64 ranges so that they
> > don't
> > overlap with the 32-bit ranges when translated.
> From a HW point of view I agree that we can setup the PCI host bridge such 
> that
> it uses 64-bit PCI address, with 32-bit cpu addresses. Though in practice 
> doesn't
> this mean that the dma ops used by card drivers has to be provided by our PCI
> host bridge driver so we can apply the translation to those PCI addresses?

I thought all addresses that are set into the cards go through
pcibios_resource_to_bus(), which gives you the PCI address 

Re: PCIe host controller behind IOMMU on ARM

2015-11-12 Thread Arnd Bergmann
On Thursday 12 November 2015 09:26:33 Phil Edworthy wrote:
> On 11 November 2015 18:25, LIviu wrote:
> > On Mon, Nov 09, 2015 at 12:32:13PM +, Phil Edworthy wrote:

> > I think you're mixing things a bit or not explaining them very well. Having 
> > the
> > PCIe controller limited to 32-bit AXI does not mean that the PCIe bus cannot
> > carry 64-bit addresses. It depends on how they get translated by the host 
> > bridge
> > or its associated ATS block. I can't see why you can't have a setup where
> > the CPU addresses are 32-bit but the PCIe bus addresses are all 64-bit.
> > You just have to be careful on how you setup your mem64 ranges so that they
> > don't
> > overlap with the 32-bit ranges when translated.
> From a HW point of view I agree that we can setup the PCI host bridge such 
> that
> it uses 64-bit PCI address, with 32-bit cpu addresses. Though in practice 
> doesn't
> this mean that the dma ops used by card drivers has to be provided by our PCI
> host bridge driver so we can apply the translation to those PCI addresses?
> This comes back to my point below about how to do this. Adding a bus notifier
> to do this may be too late, and arm64 doesn't implement set_dma_ops().
> 
> > And no, you should not limit at the card driver the DMA_BIT_MASK() unless 
> > the
> > card is not capable of supporting more than 32-bit addresses.
> If there was infrastructure that checked all parents dma-ranges when the
> dma_set_mask() function is called as Arnd pointed out, this would nicely solve
> the problem.

of_dma_configure calls of_dma_get_range to do all this for the PCIe host,
and then calls arch_setup_dma_ops() so the architecture specific code can
enforce the limits in dma_set_mask and pick an appropriate set of dma
operations. The missing part is in the implementation of arch_setup_dma_ops,
which currently happily ignores the base and limit.

Arnd


RE: PCIe host controller behind IOMMU on ARM

2015-11-12 Thread Phil Edworthy
Hi Liviu, Arnd,

On 11 November 2015 18:25, LIviu wrote:
> On Mon, Nov 09, 2015 at 12:32:13PM +, Phil Edworthy wrote:
> > Hi Liviu, Will,
> >
> > On 04 November 2015 15:19, Phil wrote:
> > > On 04 November 2015 15:02, Liviu wrote:
> > > > On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> > > > > Hi Liviu,
> > > > >
> > > > > On 04 November 2015 14:24, Liviu wrote:
> > > > > > On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am trying to hook up a PCIe host controller that sits behind an
> IOMMU,
> > > > > > > but having some problems.
> > > > > > >
> > > > > > > I'm using the pcie-rcar PCIe host controller and it works fine 
> > > > > > > without
> > > > > > > the IOMMU, and I can attach the IOMMU to the controller such that
> any
> > > > calls
> > > > > > > to dma_alloc_coherent made by the controller driver uses the
> > > iommu_ops
> > > > > > > version of dma_ops.
> > > > > > >
> > > > > > > However, I can't see how to make the endpoints to utilise the
> dma_ops
> > > that
> > > > > > > the controller uses. Shouldn't the endpoints inherit the dma_ops 
> > > > > > > from
> the
> > > > > > > controller?
> > > > > >
> > > > > > No, not directly.
> > > > > >
> > > > > > > Any pointers for this?
> > > > > >
> > > > > > You need to understand the process through which a driver for
> endpoint
> > > get
> > > > > > an address to be passed down to the device. Have a look at
> > > > > > Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
> > > > > > (Hint: EP driver needs to call dma_map_single).
> > > > > >
> > > > > > Also, you need to make sure that the bus address that ends up being 
> > > > > > set
> > > into
> > > > > > the endpoint gets translated correctly by the host controller into 
> > > > > > an
> address
> > > > > > that the IOMMU can then translate into physical address.
> > > > > Sure, though since this is bog standard Intel PCIe ethernet card which
> works
> > > > > fine when the IOMMU is effectively unused, I don’t think there is a
> problem
> > > > > with that.
> > > > >
> > > > > The driver for the PCIe controller sets up the IOMMU mapping ok when I
> > > > > do a test call to dma_alloc_coherent() in the controller's driver. 
> > > > > i.e. when I
> > > > > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > > > > __iommu_alloc_buffer() and __alloc_iova().
> > > > >
> > > > > When an endpoint driver allocates and maps a dma coherent buffer it
> > > > > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> > > >
> > > > Why do you think that? Remember that the only thing attached to the
> IOMMU
> > > is
> > > > the
> > > > host controller. The endpoint is on the PCIe bus, which gets a different
> > > > translation
> > > > that the IOMMU knows nothing about. If it helps you to visualise it 
> > > > better,
> think
> > > > of the host controller as another IOMMU device. It's the ops of the host
> > > > controller
> > > > that should be invoked, not the IOMMU's.
> > > Ok, that makes sense. I'll have a think and poke it a bit more...
> 
> Hi Phil,
> 
> Not trying to ignore your email, but I thought this is more in Will's 
> backyard.
> 
> > Somewhat related to this, since our PCIe controller HW is limited to
> > 32-bit AXI address range, before trying to hook up the IOMMU I have
> > tried to limit the dma_mask for PCI cards to DMA_BIT_MASK(32). The
> > reason being that Linux uses a 1 to 1 mapping between PCI addresses
> > and cpu (phys) addresses when there isn't an IOMMU involved, so I
> > think that we need to limit the PCI address space used.
> 
> I think you're mixing things a bit or not explaining them very well. Having 
> the
> PCIe controller limited to 32-bit AXI does not mean that the PCIe bus cannot
> carry 64-bit addresses. It depends on how they get translated by the host 
> bridge
> or its associated ATS block. I can't see why you can't have a setup where
> the CPU addresses are 32-bit but the PCIe bus addresses are all 64-bit.
> You just have to be careful on how you setup your mem64 ranges so that they
> don't
> overlap with the 32-bit ranges when translated.
From a HW point of view I agree that we can setup the PCI host bridge such that
it uses 64-bit PCI address, with 32-bit cpu addresses. Though in practice 
doesn't
this mean that the dma ops used by card drivers has to be provided by our PCI
host bridge driver so we can apply the translation to those PCI addresses?
This comes back to my point below about how to do this. Adding a bus notifier
to do this may be too late, and arm64 doesn't implement set_dma_ops().

> And no, you should not limit at the card driver the DMA_BIT_MASK() unless the
> card is not capable of supporting more than 32-bit addresses.
If there was infrastructure that checked all parents dma-ranges when the
dma_set_mask() function is called as Arnd pointed out, this would nicely solve
the problem.

> > Since pci_setup_device() sets up dma_mas

Re: PCIe host controller behind IOMMU on ARM

2015-11-11 Thread Arnd Bergmann
On Wednesday 11 November 2015 18:24:56 liviu.du...@arm.com wrote:
> 
> > Somewhat related to this, since our PCIe controller HW is limited to
> > 32-bit AXI address range, before trying to hook up the IOMMU I have
> > tried to limit the dma_mask for PCI cards to DMA_BIT_MASK(32). The
> > reason being that Linux uses a 1 to 1 mapping between PCI addresses
> > and cpu (phys) addresses when there isn't an IOMMU involved, so I
> > think that we need to limit the PCI address space used.
> 
> I think you're mixing things a bit or not explaining them very well. Having 
> the
> PCIe controller limited to 32-bit AXI does not mean that the PCIe bus cannot
> carry 64-bit addresses. It depends on how they get translated by the host 
> bridge
> or its associated ATS block. I can't see why you can't have a setup where
> the CPU addresses are 32-bit but the PCIe bus addresses are all 64-bit.
> You just have to be careful on how you setup your mem64 ranges so that they 
> don't
> overlap with the 32-bit ranges when translated.
> 
> And no, you should not limit at the card driver the DMA_BIT_MASK() unless the
> card is not capable of supporting more than 32-bit addresses.

I think we are missing one crucial bit of infrastructure on ARM64 at
the moment: the dma_set_mask() function should fail if a driver asks
for a mask that is larger than the dma-ranges property of the parent
device (or any device higher up in the hierarchy) allows.

Drivers that want a larger mask should try that first, and then fall
back to a 32-bit mask, which is guaranteed to work.

Arnd
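
The fallback pattern Arnd describes, as it would look in an endpoint driver's
probe path (a minimal sketch; the wrapper name is made up, while
dma_set_mask_and_coherent() is the standard API):

#include <linux/pci.h>
#include <linux/dma-mapping.h>

static int example_set_dma_mask(struct pci_dev *pdev)
{
	/*
	 * Ask for 64-bit DMA first; this should fail if any parent bus
	 * (per its dma-ranges) cannot address that much.
	 */
	if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
		return 0;

	/* Fall back to the 32-bit mask, which is guaranteed to work. */
	return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
}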


Re: PCIe host controller behind IOMMU on ARM

2015-11-11 Thread liviu.du...@arm.com
On Mon, Nov 09, 2015 at 12:32:13PM +, Phil Edworthy wrote:
> Hi Liviu, Will,
> 
> On 04 November 2015 15:19, Phil wrote:
> > On 04 November 2015 15:02, Liviu wrote:
> > > On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> > > > Hi Liviu,
> > > >
> > > > On 04 November 2015 14:24, Liviu wrote:
> > > > > On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am trying to hook up a PCIe host controller that sits behind an 
> > > > > > IOMMU,
> > > > > > but having some problems.
> > > > > >
> > > > > > I'm using the pcie-rcar PCIe host controller and it works fine 
> > > > > > without
> > > > > > the IOMMU, and I can attach the IOMMU to the controller such that 
> > > > > > any
> > > calls
> > > > > > to dma_alloc_coherent made by the controller driver uses the
> > iommu_ops
> > > > > > version of dma_ops.
> > > > > >
> > > > > > However, I can't see how to make the endpoints to utilise the 
> > > > > > dma_ops
> > that
> > > > > > the controller uses. Shouldn't the endpoints inherit the dma_ops 
> > > > > > from the
> > > > > > controller?
> > > > >
> > > > > No, not directly.
> > > > >
> > > > > > Any pointers for this?
> > > > >
> > > > > You need to understand the process through which a driver for endpoint
> > get
> > > > > an address to be passed down to the device. Have a look at
> > > > > Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
> > > > > (Hint: EP driver needs to call dma_map_single).
> > > > >
> > > > > Also, you need to make sure that the bus address that ends up being 
> > > > > set
> > into
> > > > > the endpoint gets translated correctly by the host controller into an 
> > > > > address
> > > > > that the IOMMU can then translate into physical address.
> > > > Sure, though since this is bog standard Intel PCIe ethernet card which 
> > > > works
> > > > fine when the IOMMU is effectively unused, I don’t think there is a 
> > > > problem
> > > > with that.
> > > >
> > > > The driver for the PCIe controller sets up the IOMMU mapping ok when I
> > > > do a test call to dma_alloc_coherent() in the controller's driver. i.e. 
> > > > when I
> > > > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > > > __iommu_alloc_buffer() and __alloc_iova().
> > > >
> > > > When an endpoint driver allocates and maps a dma coherent buffer it
> > > > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> > >
> > > Why do you think that? Remember that the only thing attached to the IOMMU
> > is
> > > the
> > > host controller. The endpoint is on the PCIe bus, which gets a different
> > > translation
> > > that the IOMMU knows nothing about. If it helps you to visualise it 
> > > better, think
> > > of the host controller as another IOMMU device. It's the ops of the host
> > > controller
> > > that should be invoked, not the IOMMU's.
> > Ok, that makes sense. I'll have a think and poke it a bit more...

Hi Phil,

Not trying to ignore your email, but I thought this is more in Will's backyard.

> Somewhat related to this, since our PCIe controller HW is limited to
> 32-bit AXI address range, before trying to hook up the IOMMU I have
> tried to limit the dma_mask for PCI cards to DMA_BIT_MASK(32). The
> reason being that Linux uses a 1 to 1 mapping between PCI addresses
> and cpu (phys) addresses when there isn't an IOMMU involved, so I
> think that we need to limit the PCI address space used.

I think you're mixing things a bit or not explaining them very well. Having the
PCIe controller limited to 32-bit AXI does not mean that the PCIe bus cannot
carry 64-bit addresses. It depends on how they get translated by the host bridge
or its associated ATS block. I can't see why you can't have a setup where
the CPU addresses are 32-bit but the PCIe bus addresses are all 64-bit.
You just have to be careful on how you setup your mem64 ranges so that they 
don't
overlap with the 32-bit ranges when translated.

And no, you should not limit at the card driver the DMA_BIT_MASK() unless the
card is not capable of supporting more than 32-bit addresses.

> 
> Since pci_setup_device() sets up dma_mask, I added a bus notifier in the
> PCIe controller driver so I can change the mask, if needed, on the
> BUS_NOTIFY_BOUND_DRIVER action.
> However, I think there is the potential for card drivers to allocate and
> map buffers before the bus notifier get called. Additionally, I've seen
> drivers change their behaviour based on the success or failure of
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)), so the
> driver could, theoretically at least, operate in a way that is not
> compatible with a more restricted dma_mask (though I can't think
> of any way this would not work with hardware I've seen).
> 
> So, I think that using a bus notifier is the wrong way to go, but I don’t
> know what other options I have. Any suggestions?

I would first have a look at how the PCIe bus addresses are translated by the
host controller. 

Bes

RE: PCIe host controller behind IOMMU on ARM

2015-11-09 Thread Phil Edworthy
Hi Liviu, Will,

On 04 November 2015 15:19, Phil wrote:
> On 04 November 2015 15:02, Liviu wrote:
> > On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> > > Hi Liviu,
> > >
> > > On 04 November 2015 14:24, Liviu wrote:
> > > > On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> > > > > Hi,
> > > > >
> > > > > I am trying to hook up a PCIe host controller that sits behind an 
> > > > > IOMMU,
> > > > > but having some problems.
> > > > >
> > > > > I'm using the pcie-rcar PCIe host controller and it works fine without
> > > > > the IOMMU, and I can attach the IOMMU to the controller such that any
> > calls
> > > > > to dma_alloc_coherent made by the controller driver uses the
> iommu_ops
> > > > > version of dma_ops.
> > > > >
> > > > > However, I can't see how to make the endpoints to utilise the dma_ops
> that
> > > > > the controller uses. Shouldn't the endpoints inherit the dma_ops from 
> > > > > the
> > > > > controller?
> > > >
> > > > No, not directly.
> > > >
> > > > > Any pointers for this?
> > > >
> > > > You need to understand the process through which a driver for endpoint
> get
> > > > an address to be passed down to the device. Have a look at
> > > > Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
> > > > (Hint: EP driver needs to call dma_map_single).
> > > >
> > > > Also, you need to make sure that the bus address that ends up being set
> into
> > > > the endpoint gets translated correctly by the host controller into an 
> > > > address
> > > > that the IOMMU can then translate into physical address.
> > > Sure, though since this is bog standard Intel PCIe ethernet card which 
> > > works
> > > fine when the IOMMU is effectively unused, I don’t think there is a 
> > > problem
> > > with that.
> > >
> > > The driver for the PCIe controller sets up the IOMMU mapping ok when I
> > > do a test call to dma_alloc_coherent() in the controller's driver. i.e. 
> > > when I
> > > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > > __iommu_alloc_buffer() and __alloc_iova().
> > >
> > > When an endpoint driver allocates and maps a dma coherent buffer it
> > > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> >
> > Why do you think that? Remember that the only thing attached to the IOMMU
> is
> > the
> > host controller. The endpoint is on the PCIe bus, which gets a different
> > translation
> > that the IOMMU knows nothing about. If it helps you to visualise it better, 
> > think
> > of the host controller as another IOMMU device. It's the ops of the host
> > controller
> > that should be invoked, not the IOMMU's.
> Ok, that makes sense. I'll have a think and poke it a bit more...
Somewhat related to this, since our PCIe controller HW is limited to
32-bit AXI address range, before trying to hook up the IOMMU I have
tried to limit the dma_mask for PCI cards to DMA_BIT_MASK(32). The
reason being that Linux uses a 1 to 1 mapping between PCI addresses
and cpu (phys) addresses when there isn't an IOMMU involved, so I
think that we need to limit the PCI address space used.

Since pci_setup_device() sets up dma_mask, I added a bus notifier in the
PCIe controller driver so I can change the mask, if needed, on the
BUS_NOTIFY_BOUND_DRIVER action.
However, I think there is the potential for card drivers to allocate and
map buffers before the bus notifier gets called. Additionally, I've seen
drivers change their behaviour based on the success or failure of
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)), so the
driver could, theoretically at least, operate in a way that is not
compatible with a more restricted dma_mask (though I can't think
of any way this would not work with hardware I've seen).

So, I think that using a bus notifier is the wrong way to go, but I don’t
know what other options I have. Any suggestions?

Thanks for your help
Phil
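
For reference, a minimal sketch of the notifier workaround described above, under
the assumption that the host controller driver registers it against pci_bus_type
(the names are illustrative; bus_register_notifier() and BUS_NOTIFY_BOUND_DRIVER
are the standard driver-core API):

#include <linux/device.h>
#include <linux/notifier.h>
#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Clamp PCI device DMA masks to 32 bits once a driver has bound. */
static int example_pci_bus_notify(struct notifier_block *nb,
				  unsigned long action, void *data)
{
	struct device *dev = data;

	if (action == BUS_NOTIFY_BOUND_DRIVER && dev_is_pci(dev)) {
		dev->coherent_dma_mask = DMA_BIT_MASK(32);
		if (dev->dma_mask)
			*dev->dma_mask = DMA_BIT_MASK(32);
	}

	return NOTIFY_DONE;
}

static struct notifier_block example_pci_bus_nb = {
	.notifier_call = example_pci_bus_notify,
};

/* Registered from the host controller probe, e.g.:
 *	bus_register_notifier(&pci_bus_type, &example_pci_bus_nb);
 */

As noted above, by the time BUS_NOTIFY_BOUND_DRIVER fires the bound driver may
already have set its own (larger) mask, which is the weakness being raised.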


RE: PCIe host controller behind IOMMU on ARM

2015-11-04 Thread Phil Edworthy
Hi Will,

On 04 November 2015 15:30, Will wrote:
> On Wed, Nov 04, 2015 at 03:19:13PM +, Phil Edworthy wrote:
> > On 04 November 2015 15:02, Liviu wrote:
> > > On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> > > > Sure, though since this is bog standard Intel PCIe ethernet card which 
> > > > works
> > > > fine when the IOMMU is effectively unused, I don’t think there is a 
> > > > problem
> > > > with that.
> > > >
> > > > The driver for the PCIe controller sets up the IOMMU mapping ok when I
> > > > do a test call to dma_alloc_coherent() in the controller's driver. i.e. 
> > > > when I
> > > > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > > > __iommu_alloc_buffer() and __alloc_iova().
> > > >
> > > > When an endpoint driver allocates and maps a dma coherent buffer it
> > > > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> > >
> > > Why do you think that? Remember that the only thing attached to the
> IOMMU is
> > > the
> > > host controller. The endpoint is on the PCIe bus, which gets a different
> > > translation
> > > that the IOMMU knows nothing about. If it helps you to visualise it 
> > > better,
> think
> > > of the host controller as another IOMMU device. It's the ops of the host
> > > controller
> > > that should be invoked, not the IOMMU's.
> > Ok, that makes sense. I'll have a think and poke it a bit more...
> 
> Take a look at of_iommu_configure, which is currently lacking support
> for PCI devices. It should be using a variant on the device-tree bindings
> already in use for describing MSI device IDs, so that we can translate
> the RequesterID of the endpoint into an ID that the IOMMU can understand.
Whilst we want to introduce isolation at some point, right now we would like
to use the IOMMU as this PCIe controller only uses a 32-bit AXI address.

Thanks
Phil


Re: PCIe host controller behind IOMMU on ARM

2015-11-04 Thread Will Deacon
On Wed, Nov 04, 2015 at 03:19:13PM +, Phil Edworthy wrote:
> On 04 November 2015 15:02, Liviu wrote:
> > On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> > > Sure, though since this is bog standard Intel PCIe ethernet card which 
> > > works
> > > fine when the IOMMU is effectively unused, I don’t think there is a 
> > > problem
> > > with that.
> > >
> > > The driver for the PCIe controller sets up the IOMMU mapping ok when I
> > > do a test call to dma_alloc_coherent() in the controller's driver. i.e. 
> > > when I
> > > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > > __iommu_alloc_buffer() and __alloc_iova().
> > >
> > > When an endpoint driver allocates and maps a dma coherent buffer it
> > > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> > 
> > Why do you think that? Remember that the only thing attached to the IOMMU is
> > the
> > host controller. The endpoint is on the PCIe bus, which gets a different
> > translation
> > that the IOMMU knows nothing about. If it helps you to visualise it better, 
> > think
> > of the host controller as another IOMMU device. It's the ops of the host
> > controller
> > that should be invoked, not the IOMMU's.
> Ok, that makes sense. I'll have a think and poke it a bit more...

Take a look at of_iommu_configure, which is currently lacking support
for PCI devices. It should be using a variant on the device-tree bindings
already in use for describing MSI device IDs, so that we can translate
the RequesterID of the endpoint into an ID that the IOMMU can understand.

Will


RE: PCIe host controller behind IOMMU on ARM

2015-11-04 Thread Phil Edworthy
Hi Liviu,

On 04 November 2015 15:02, Liviu wrote:
> On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> > Hi Liviu,
> >
> > On 04 November 2015 14:24, Liviu wrote:
> > > On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> > > > Hi,
> > > >
> > > > I am trying to hook up a PCIe host controller that sits behind an IOMMU,
> > > > but having some problems.
> > > >
> > > > I'm using the pcie-rcar PCIe host controller and it works fine without
> > > > the IOMMU, and I can attach the IOMMU to the controller such that any
> calls
> > > > to dma_alloc_coherent made by the controller driver uses the iommu_ops
> > > > version of dma_ops.
> > > >
> > > > However, I can't see how to make the endpoints to utilise the dma_ops 
> > > > that
> > > > the controller uses. Shouldn't the endpoints inherit the dma_ops from 
> > > > the
> > > > controller?
> > >
> > > No, not directly.
> > >
> > > > Any pointers for this?
> > >
> > > You need to understand the process through which a driver for endpoint get
> > > an address to be passed down to the device. Have a look at
> > > Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
> > > (Hint: EP driver needs to call dma_map_single).
> > >
> > > Also, you need to make sure that the bus address that ends up being set 
> > > into
> > > the endpoint gets translated correctly by the host controller into an 
> > > address
> > > that the IOMMU can then translate into physical address.
> > Sure, though since this is bog standard Intel PCIe ethernet card which works
> > fine when the IOMMU is effectively unused, I don’t think there is a problem
> > with that.
> >
> > The driver for the PCIe controller sets up the IOMMU mapping ok when I
> > do a test call to dma_alloc_coherent() in the controller's driver. i.e. 
> > when I
> > do this, it ends up in arm_iommu_alloc_attrs(), which calls
> > __iommu_alloc_buffer() and __alloc_iova().
> >
> > When an endpoint driver allocates and maps a dma coherent buffer it
> > also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.
> 
> Why do you think that? Remember that the only thing attached to the IOMMU is
> the
> host controller. The endpoint is on the PCIe bus, which gets a different
> translation
> that the IOMMU knows nothing about. If it helps you to visualise it better, 
> think
> of the host controller as another IOMMU device. It's the ops of the host
> controller
> that should be invoked, not the IOMMU's.
Ok, that makes sense. I'll have a think and poke it a bit more...

Thanks for your comments
Phil


Re: PCIe host controller behind IOMMU on ARM

2015-11-04 Thread liviu.du...@arm.com
On Wed, Nov 04, 2015 at 02:48:38PM +, Phil Edworthy wrote:
> Hi Liviu,
> 
> On 04 November 2015 14:24, Liviu wrote:
> > On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> > > Hi,
> > >
> > > I am trying to hook up a PCIe host controller that sits behind an IOMMU,
> > > but having some problems.
> > >
> > > I'm using the pcie-rcar PCIe host controller and it works fine without
> > > the IOMMU, and I can attach the IOMMU to the controller such that any 
> > > calls
> > > to dma_alloc_coherent made by the controller driver uses the iommu_ops
> > > version of dma_ops.
> > >
> > > However, I can't see how to make the endpoints to utilise the dma_ops that
> > > the controller uses. Shouldn't the endpoints inherit the dma_ops from the
> > > controller?
> > 
> > No, not directly.
> > 
> > > Any pointers for this?
> > 
> > You need to understand the process through which a driver for endpoint get
> > an address to be passed down to the device. Have a look at
> > Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
> > (Hint: EP driver needs to call dma_map_single).
> > 
> > Also, you need to make sure that the bus address that ends up being set into
> > the endpoint gets translated correctly by the host controller into an 
> > address
> > that the IOMMU can then translate into physical address.
> Sure, though since this is bog standard Intel PCIe ethernet card which works
> fine when the IOMMU is effectively unused, I don’t think there is a problem
> with that.
> 
> The driver for the PCIe controller sets up the IOMMU mapping ok when I
> do a test call to dma_alloc_coherent() in the controller's driver. i.e. when I
> do this, it ends up in arm_iommu_alloc_attrs(), which calls
> __iommu_alloc_buffer() and __alloc_iova().
> 
> When an endpoint driver allocates and maps a dma coherent buffer it
> also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.

Why do you think that? Remember that the only thing attached to the IOMMU is the
host controller. The endpoint is on the PCIe bus, which gets a different 
translation
that the IOMMU knows nothing about. If it helps you to visualise it better, 
think
of the host controller as another IOMMU device. It's the ops of the host 
controller
that should be invoked, not the IOMMU's.

Best regards,
Liviu

> 
> Thanks
> Phil

-- 

| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---
¯\_(ツ)_/¯


RE: PCIe host controller behind IOMMU on ARM

2015-11-04 Thread Phil Edworthy
Hi Liviu,

On 04 November 2015 14:24, Liviu wrote:
> On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> > Hi,
> >
> > I am trying to hook up a PCIe host controller that sits behind an IOMMU,
> > but having some problems.
> >
> > I'm using the pcie-rcar PCIe host controller and it works fine without
> > the IOMMU, and I can attach the IOMMU to the controller such that any calls
> > to dma_alloc_coherent made by the controller driver uses the iommu_ops
> > version of dma_ops.
> >
> > However, I can't see how to make the endpoints to utilise the dma_ops that
> > the controller uses. Shouldn't the endpoints inherit the dma_ops from the
> > controller?
> 
> No, not directly.
> 
> > Any pointers for this?
> 
> You need to understand the process through which a driver for endpoint get
> an address to be passed down to the device. Have a look at
> Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
> (Hint: EP driver needs to call dma_map_single).
> 
> Also, you need to make sure that the bus address that ends up being set into
> the endpoint gets translated correctly by the host controller into an address
> that the IOMMU can then translate into physical address.
Sure, though since this is a bog-standard Intel PCIe ethernet card which works
fine when the IOMMU is effectively unused, I don’t think there is a problem
with that.

The driver for the PCIe controller sets up the IOMMU mapping ok when I
do a test call to dma_alloc_coherent() in the controller's driver. i.e. when I
do this, it ends up in arm_iommu_alloc_attrs(), which calls
__iommu_alloc_buffer() and __alloc_iova().

When an endpoint driver allocates and maps a dma coherent buffer it
also needs to end up in arm_iommu_alloc_attrs(), but it doesn't.

Thanks
Phil


Re: PCIe host controller behind IOMMU on ARM

2015-11-04 Thread liviu.du...@arm.com
On Wed, Nov 04, 2015 at 01:57:48PM +, Phil Edworthy wrote:
> Hi,
> 
> I am trying to hook up a PCIe host controller that sits behind an IOMMU,
> but having some problems.
> 
> I'm using the pcie-rcar PCIe host controller and it works fine without
> the IOMMU, and I can attach the IOMMU to the controller such that any calls
> to dma_alloc_coherent made by the controller driver uses the iommu_ops
> version of dma_ops.
> 
However, I can't see how to make the endpoints utilise the dma_ops that
> the controller uses. Shouldn't the endpoints inherit the dma_ops from the
> controller? 

No, not directly.

> Any pointers for this?

You need to understand the process through which a driver for an endpoint gets
an address to be passed down to the device. Have a look at
Documentation/DMA-API-HOWTO.txt, there is a nice explanation there.
(Hint: EP driver needs to call dma_map_single).

Also, you need to make sure that the bus address that ends up being set into
the endpoint gets translated correctly by the host controller into an address
that the IOMMU can then translate into physical address.

Best regards,
Liviu


> 
> Thanks
> Phil
> 

-- 

| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---
¯\_(ツ)_/¯


Re: PCIe bus (re-)numbering

2015-09-29 Thread Yinghai Lu
On Tue, Sep 29, 2015 at 7:04 AM, Ruud  wrote:
>
> (for illustration, lspci -tv from a 3.2 kernel, hand edited as the
> original picture has already discovered the subordinate busses)
> +-02.0-[04-17]00.0-[05-17]--+-04.0-[06-16]00.0-[07-16]---
>  |   |
>   \-08.0-[11-16]
>

please try to post lspci -tv and whole boot log.

Yinghai


Re: PCIe bus (re-)numbering

2015-09-29 Thread Ruud
>>
>> Thus the procedure that works is:
>> 1) Chassis off
>> 2) Boot linux
>> 3) Chassis on
>> 4) setpci busnrs to 0
>> 5) remove switch
>> 6) rescan
>
> 4, 5 is reversed?

I can not run setpci on a removed device, afaik; for that reason I
reset the busses before removing the switch (not physically removing it, but
echo 1 >../remove).

For the kernel crash issue I will try to explain better, and try to
get a kernel crash dump to post some logging. It seems that the switch
will still fire hotplug events even after being removed (see lspci -tv
below). I zeroed the busnumber for bus 04 (02.0-[04-17]) without
removing the underlying bus 05 device 00.0 and switches below.

(for illustration, lspci -tv from a 3.2 kernel, hand edited as the
original picture has already discovered the subordinate busses)
+-02.0-[04-17]00.0-[05-17]--+-04.0-[06-16]00.0-[07-16]---
 |   |
  \-08.0-[11-16]

Sorry for the delay, but I didn't manage to create a crashdump yet :(
and the BMC is not cooperative in getting the kernel serial console
working either.
I will not go into details on my attempts to get the kernel crash logging
to work, as it is out of scope for the discussion.

Best regards, Ruud


Re: PCIe bus (re-)numbering

2015-09-21 Thread Yinghai Lu
On Mon, Sep 21, 2015 at 7:06 AM, Ruud  wrote:
> 2015-09-21 9:49 GMT+02:00 Ruud :

> Test result based upon
>
> http://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
> 7e3fad2 (for-pci-v4.3-rc1)
>
> Unfortunately your script hangs the system, I modified it for my
> situation (only one root complex, 1 PCI switch will be affected).
> It seems too aggressive on the BARs. I tried to troubleshoot but don't
> understand the details on
>
>   /sbin/setpci -s $NAME 0x20.l=0
>   /sbin/setpci -s $NAME 0x24.l=0
>   /sbin/setpci -s $NAME 0x28.l=0
>   /sbin/setpci -s $NAME 0x2c.l=0
>
> why you zero these (besides the bus numbers).

so the realloc will start from clean.
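
For reference (not part of the original mail): those offsets are the standard
Type 1, i.e. PCI-to-PCI bridge, header registers, so clearing them wipes the
bridge's forwarding windows as well as its bus range and lets the reallocation
start from a clean slate. A commented restatement of the setpci writes quoted
above, with a placeholder bridge address:

# Type 1 (bridge) header registers being cleared:
#   0x18 (word)  = Primary + Secondary Bus Number
#   0x1a (byte)  = Subordinate Bus Number
#   0x20 (dword) = Memory Base / Memory Limit (non-prefetchable window)
#   0x24 (dword) = Prefetchable Memory Base / Limit
#   0x28 (dword) = Prefetchable Base Upper 32 Bits
#   0x2c (dword) = Prefetchable Limit Upper 32 Bits
BRIDGE=0000:00:02.0   # placeholder bridge address
for REG in 0x20.l 0x24.l 0x28.l 0x2c.l 0x18.w 0x1a.b; do
  /sbin/setpci -s "$BRIDGE" "$REG"=0
done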

>
> I got the full system working by just zeroing the busnumbers on the
> switch at root complex level, remove the switch and rescan it.

Good.

> In this
> procedure I need to take care of some physical aspects: I can not
> turn on the chassis when the PCIe switch is removed, some interrupt
> comes through and will result in a kernel panic.

?

>
> Thus the procedure that works is:
> 1) Chassis off
> 2) Boot linux
> 3) Chassis on
> 4) setpci busnrs to 0
> 5) remove switch
> 6) rescan

4, 5 is reversed?

>
> Cards recognised and functional, memory regions look ok, running out
> of iospace but not much can be done about that.
>
> Same procedure on 3.2.0 kernel (debian stock)  results in all cards
> being recognized (enum ok) but fails to assign BAR's (tries to fit
> them below 4G)

Could be resource allocation problem.

Thanks

Yinghai


Re: PCIe bus (re-)numbering

2015-09-21 Thread Yinghai Lu
On Mon, Sep 21, 2015 at 12:49 AM, Ruud  wrote:
> Sorry for the rookie question: which version do you mean by "current
> upstream", the current main trunk rc?.

current linus tree like v4.2 or v4.3-rc1 etc


Re: PCIe bus (re-)numbering

2015-09-21 Thread Ruud
2015-09-21 9:49 GMT+02:00 Ruud :
>>   /sbin/setpci -s $NAME 0x1a.b=0
>>   N=`find /sys/devices/pci:"$BUS"/"$NAME"/remove -name "remove"`
>>   echo $N
>>   echo -n 1 > "$N"
>>   sleep 1s
>> done
>> done
>>
>
> Thanks for the script!
>
>>
>>>
>>> I will test next monday.
>>


Test result based upon

http://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
7e3fad2 (for-pci-v4.3-rc1)

Unfortunately your script hangs the system, I modified it for my
situation (only one root complex, 1 PCI switch will be affected).
It seems too aggressive on the BARs. I tried to troubleshoot but don't
understand the details on

  /sbin/setpci -s $NAME 0x20.l=0
  /sbin/setpci -s $NAME 0x24.l=0
  /sbin/setpci -s $NAME 0x28.l=0
  /sbin/setpci -s $NAME 0x2c.l=0

why you zero these (besides the bus numbers).

I got the full system working by just zeroing the busnumbers on the
switch at root complex level, removing the switch and rescanning it. In this
procedure I need to take care of some physical aspects: I can not
turn on the chassis when the PCIe switch is removed, some interrupt
comes through and will result in a kernel panic.

Thus the procedure that works is:
1) Chassis off
2) Boot linux
3) Chassis on
4) setpci busnrs to 0
5) remove switch
6) rescan

Cards recognised and functional, memory regions look ok, running out
of iospace but not much can be done about that.

The same procedure on a 3.2.0 kernel (debian stock) results in all cards
being recognized (enum ok) but fails to assign BARs (it tries to fit
them below 4G).


Best regards,

Ruud


Re: PCIe bus (re-)numbering

2015-09-21 Thread Ruud
>   /sbin/setpci -s $NAME 0x1a.b=0
>   N=`find /sys/devices/pci:"$BUS"/"$NAME"/remove -name "remove"`
>   echo $N
>   echo -n 1 > "$N"
>   sleep 1s
> done
> done
>

Thanks for the script!

>
>>
>> I will test next monday.
>
> Good. Please check current upstream and my tree for-pci-v4.3-rc1 branch.


Sorry for the rookie question: which version do you mean by "current
upstream", the current main trunk rc?.
I am currently using for-pci-v4.3-rc1 for testing.

>> Are these settings in the binary driver? I do not see that much need
>> for a driver to use the geographical addressing after the BAR's have
>> been set. I thus wondered if it is feasible to hide the geographical
>> addressing from the driver and offer an API for it from the PCIe layer
>> to the drivers...
>
> Card firmware.  Assume those card firmware would trap pci conf read cycle
> and compare something inside.
> The only workaround that I found is reset the link to make firmware rebooting.
> but that will have problem if you are use the disk as root etc.
>

I do not completely understand the remark you make, are you suggesting
the firmware running on the PCIe card (embedded in a CPU on the card
itself) is acting on the content of the busnumber or do you mean x86
extension rom code acting on the busnumber.

Although it is probably irrelevant for the problem I like to solve, I
still like to understand all details.


> Thanks
>
> Yinghai

BR,

Ruud


Re: PCIe bus (re-)numbering

2015-09-20 Thread Yinghai Lu
On Sun, Sep 20, 2015 at 2:17 AM, Ruud  wrote:
>
> The current procedure I follow is to boot with two PCIe switches in the host.
> (one at the root complex level, intel based, one level above PLX
> based, and the whole tree in the chassis).
>
> - I turn off the chassis (as it conflicts with the BIOS :( )
> - Reboot into linux.
> - remove the intel based switch (has no relevant childs) (echo 1
>>.../remove  sorry for the missing numbers its weekend)
> - turn on chassis
> - rescan starting at the root complex  (echo 1 > .../rescan )
>
> During the rescan, it will map in the original busnumber-range which
> is too small. I understand from your email that by clearing the
> busnumber range in the switch (perhaps both host switces), the kernel
> will pick a different range which is not clamped in by the other
> busnumbers of surrounding other switches?

Yes.

Only need to clear root port.

here are the scripts that I used to test busn_alloc and other mmio
resource allocation.
The system could have 8 peer pci root buses.

#
# for x4-4, x4-8, x5-4, x5-8

BUSES='00 20 40 60 80 a0 c0 e0'
DEV_FUNCS='02.0 03.2'

echo "Remove all child devices at first"
for BUS in $BUSES; do
for DEV_FUNC in $DEV_FUNCS; do
  NAME=:"$BUS":"$DEV_FUNC"
  LINE=`/sbin/lspci -nn -s $NAME | wc -l`
  if [ $LINE -eq 0 ]; then
continue
  fi
  echo $NAME
  NA=`find /sys/devices/pci:"$BUS"/"$NAME"/*/remove -name "remove"`
  for N in $NA; do
echo $N
echo -n 1 > "$N"
sleep 1s
  done
done
done

sleep 5s

echo "Clear bridge mmio BARs and busn"
for BUS in $BUSES; do
for DEV_FUNC in $DEV_FUNCS; do
  NAME=:"$BUS":"$DEV_FUNC"
  LINE=`/sbin/lspci -nn -s $NAME | wc -l`
  if [ $LINE -eq 0 ]; then
continue
  fi
  echo $NAME
  /sbin/setpci -s $NAME 0x20.l=0
  /sbin/setpci -s $NAME 0x24.l=0
  /sbin/setpci -s $NAME 0x28.l=0
  /sbin/setpci -s $NAME 0x2c.l=0
  /sbin/setpci -s $NAME 0x18.w=0
  /sbin/setpci -s $NAME 0x1a.b=0
  N=`find /sys/devices/pci:"$BUS"/"$NAME"/remove -name "remove"`
  echo $N
  echo -n 1 > "$N"
  sleep 1s
done
done


>
> I will test next monday.

Good. Please check current upstream and my tree for-pci-v4.3-rc1 branch.

>
> What I did get to work is the following procedure:
>> - I turn off the chassis (as it conflicts with the BIOS :(  )
> - Boot linux with parameter pci=assign-busses (BIOS will have
> configured the switches in the host without a serious busnumber range)
> - remove the intel based switch (has no relevant childs) (echo 1
>>.../remove  sorry for the missing numbers its weekend)
> - turn on chassis
> - rescan starting at the root complex  (echo 1 > .../rescan )
> During rescan the numbering is messed up, and dmesg fills up with
> ethernet renaming "errors"; I didn't dare to look at other side-effects.

assign-busses may be too destructive.  May make some card firmware
not happy.

>
>>
>>>
>>
>> Do you mean changing bus number without unloading driver ?
>>
>> No, you can not do that.
>>
>> some device firmware like lsi cards, if you change it's primary bus number,
>> the device will stop working, but that is another problem.
>>
>
> Are these settings in the binary driver? I do not see that much need
> for a driver to use the geographical addressing after the BAR's have
> been set. I thus wondered if it is feasible to hide the geographical
> addressing from the driver and offer an API for it from the PCIe layer
> to the drivers...

Card firmware.  Assume that the card firmware would trap the pci config read
cycle and compare something inside.
The only workaround that I found is to reset the link to make the firmware reboot,
but that will be a problem if you are using the disk as root etc.

Thanks

Yinghai


Re: PCIe bus (re-)numbering

2015-09-20 Thread Ruud
>> The current algorithm seems to allocate 8 extra busnumbers at the
>> hotplug switch, but clearly 8 is not sufficient for the whole tree
>> when it is discovered after initial numbering has been assigned. As
>> the PCIe routing requires the bus numbers to be consecutive as it
>> describes ranges there are not that many allocation strategies for bus
>> numbers. It is impossible to predict at boot-time which switch will
>> require lots of busses and which do not.
>
> Well, if you need more than 8 bus numbers then the practical way is
> booting with the pcie switch present and later only doing hot-remove and
> hot-add instead of cold hot-add.

The current procedure I follow is to boot with two PCIe switches in the host.
(one at the root complex level, intel based, one level above PLX
based, and the whole tree in the chassis).

- I turn off the chassis (as it conflicts with the BIOS :( )
- Reboot into linux.
- remove the intel based switch (has no relevant childs) (echo 1
>.../remove  sorry for the missing numbers its weekend)
- turn on chassis
- rescan starting at the root complex  (echo 1 > .../rescan )

During the rescan, it will map in the original busnumber-range which
is too small. I understand from your email that by clearing the
busnumber range in the switch (perhaps both host switces), the kernel
will pick a different range which is not clamped in by the other
busnumbers of surrounding other switches?

I will test next monday.

What I did get to work is the following procedure:

- I turn off the chassis (as it conflicts with the BIOS :(  )
- Reboot into GRUB
- turn on chassis
- Boot linux with parameter pci=assign-busses (BIOS will have
configured the switches in the host without a serious busnumber range)
This procedure is very inconvenient as the host is operated headless.

What almost works is the following procedure:

- I turn off the chassis (as it conflicts with the BIOS :(  )
- Boot linux with parameter pci=assign-busses (BIOS will have
configured the switches in the host without a serious busnumber range)
- remove the intel based switch (has no relevant childs) (echo 1
>.../remove  sorry for the missing numbers its weekend)
- turn on chassis
- rescan starting at the root complex  (echo 1 > .../rescan )
During rescan the numbering is messed up, and dmesg fills up with
ethernet renaming "errors"; I didn't dare to look at other side-effects.

>
>>
>
> Do you mean changing bus number without unloading driver ?
>
> No, you can not do that.
>
> some device firmware like lsi cards, if you change it's primary bus number,
> the device will stop working, but that is another problem.
>

Are these settings in the binary driver? I do not see that much need
for a driver to use the geographical addressing after the BARs have
been set. I thus wondered if it is feasible to hide the geographical
addressing from the driver and offer an API for it from the PCIe layer
to the drivers...

Just a thought.

Best regards,

Ruud


Re: PCIe bus (re-)numbering

2015-09-19 Thread Yinghai Lu
On Sat, Sep 19, 2015 at 1:20 AM, Ruud  wrote:
> Hello all,
>
> Not a patch, not a complaint: a start of a discussion on PCIe bus
> renumbering and bus numbering in general..
>
>
> The current algorithm seems to allocate 8 extra busnumbers at the
> hotplug switch, but clearly 8 is not sufficient for the whole tree
> when it is discovered after initial numbering has been assigned. As
> the PCIe routing requires the bus numbers to be consecutive as it
> describes ranges there are not that many allocation strategies for bus
> numbers. It is impossible to predict at boot-time which switch will
> require lots of busses and which do not.

Well, if you need more than 8 bus numbers then the practical way is
booting with the pcie switch present and later only doing hot-remove and
hot-add instead of cold hot-add.

>
> A solution is static assignment (e.g. as described by
> http://article.gmane.org/gmane.linux.kernel.pci/45212), but it seems
> not convenient to me.

Interesting, one patch in that thread looks like it tries to use the bus range
blindly, so it is a no-go.

>
> I got the impression the most elegant way is to renumber, but at the
> same time I doubt. Would the BIOS become confused? Currently the
> kernel becomes confused as it renumbers the ethernet interfaces when
> the bus-numbers change. Several drivers seem to be locked to the
> device by its geographical routing (aka bus << 16 | device << 11 |
> function << 8 ). I got the impression that this is the root of the
> evil as the bus need not be as constant as expected.

Do you mean changing bus number without unloading driver ?

No, you can not do that.

some device firmware, like lsi cards: if you change its primary bus number,
the device will stop working, but that is another problem.

For cold hot-add of several pcie switches, the right way would be to have a script (a sketch of these steps follows the list):
1. remove related children devices.
2. use setpci to clear bus number register
3. remove bridge devices
4. do pci rescan.
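
A rough sketch of those four steps (this is not Yinghai's script; the root-port
address below is a placeholder chosen for illustration):

#!/bin/sh
# Cold hot-add recovery: clear the stale state on one root port and rescan.
PORT=0000:00:02.0   # placeholder address of the root port above the new switches

# 1. remove the child devices below the port
for f in /sys/bus/pci/devices/"$PORT"/*/remove; do
  [ -e "$f" ] && echo 1 > "$f"
done

# 2. clear the port's bus number registers (Type 1 header, offsets 0x18/0x1a)
/sbin/setpci -s "$PORT" 0x18.w=0
/sbin/setpci -s "$PORT" 0x1a.b=0

# 3. remove the bridge (port) device itself
echo 1 > /sys/bus/pci/devices/"$PORT"/remove

# 4. rescan from the root so the tree below is re-enumerated
echo 1 > /sys/bus/pci/rescan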

Yinghai


Re: Pcie bus enumeration and 64bit issues

2015-09-16 Thread Yinghai Lu
On Wed, Sep 16, 2015 at 12:08 PM, Ruudgoogle  wrote:
>
> For a big system I use an external pcie enclosure. Unfortunately the bios
> fails to properly initialise the system. As a workaround I plan to start the
> chassis after the linux kernel has booted. This leads to some other problems
> I would like to discuss here / get pointers to kernel code to read.
>
> 1) The chassis adds several pcie busses; I would like to reserve a range of
> busnumbers at the last pcie switch in the host. Something like that seems to
> be done for cardbus, but not for pcie switches in general. The reason is
> clear: allocating big chunks of busnumbers will exhaust that resource.
> Looking for advice!
>
> 2) For unclear reasons the linux kernel maps the 64bit BARs below 4G and that
> range gets exhausted; how can I check if a 64bit mmio pool is present / do I
> need to specify it manually, or should the bios indicate a range? Looking for
> advice as well.
>

I have one patch set that would do pci busn allocation.

So please check

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
for-pci-v4.3-rc1

and it also has the latest resource allocation enhancements.

please post boot log with "debug ignore_loglevel" along with output from
lspci -vvv
lspci -tv

Thanks

Yinghai


Re: PCIe patch for ARM64

2015-06-19 Thread Pratyush Anand
On Fri, Jun 19, 2015 at 11:20 AM, Bharat Kumar Gogada
 wrote:
> Hi
>
> I am developing a PCIe root port driver for a Xilinx device. We have used the
> following patch for ARM64 support: https://lkml.org/lkml/2014/7/3/764.
> The link describes it as a temporary patch and says mainline will be updated soon
> with those changes. We have not seen the changes till now. Can you let us
> know when we can expect the changes in mainline? We are planning to push our
> drivers to mainline.

Please have a look at the latest kernel and see what is
available for ARM64 PCIe.

You may see drivers/pci/host/pci-xgene.c as reference.

I see that there is already a driver for the Xilinx PCIe controller on the
ARM platform:
drivers/pci/host/pcie-xilinx.c

If it is the same device on ARM64, then you need to take a similar approach
to what is being discussed for DesignWare on ARM64.

http://marc.info/?l=devicetree&m=143394380815743&w=2

~Pratyush


Re: PCIe 32-bit MMIO exhaustion

2015-03-19 Thread Bjorn Helgaas
On Wed, Mar 04, 2015 at 11:01:59AM -0600, Bjorn Helgaas wrote:
> On Wed, Mar 04, 2015 at 03:12:04PM +0800, Daniel J Blueman wrote:
> > Your patch solves the conflicts nicely [1] with:
> > 
> > From f835b16b0758a1dde6042a0e4c8aa5a2e8be5f21 Mon Sep 17 00:00:00 2001
> > From: Daniel J Blueman 
> > Date: Wed, 4 Mar 2015 14:53:00 +0800
> > Subject: [PATCH] Mark PCI BARs with address 0 as unset
> > 
> > Allow the kernel to activate the unset flag for PCI BAR resources if
> > the firmware assigns address 0 (invalid as legacy IO is in this range).
> > 
> > This allows preventing conflicts with legacy IO/ACPI PNP resources in
> > this range.
> > 
> > Signed-off-by: Daniel J Blueman 
> > ---
> >  drivers/pci/probe.c | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 8d2f400..ef43652 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -281,6 +281,13 @@ int __pci_read_base(struct pci_dev *dev, enum
> > pci_bar_type type,
> > pcibios_resource_to_bus(dev->bus, &inverted_region, res);
> > 
> > /*
> > +* If firmware doesn't assign a valid PCI address (as legacy IO is below
> > +* PCI IO), mark resource unset to prevent later resource conflicts
> > +*/
> > +   if (region.start == 0)
> > +   res->flags |= IORESOURCE_UNSET;
> 
> It's true that an uninitialized BAR should contain zero.  But an
> initialized BAR may also contain zero, since zero is a valid PCI memory or
> I/O address, so I don't really want to preclude that here.  On large
> systems with host bridges that support address translation, it would be
> reasonable to have something like this:
> 
>   pci_bus 0001:00: root bus resource [mem 0x1-0x1] (bus 
> address [0x-0x])
> 
> In that case, an initialized BAR may contain zero and that should not be an
> error.
> 
> On your system, I don't think you advertise an I/O aperture to bus 0001:00.
> I'd like to make the PCI core smart enough to notice that and just ignore
> any I/O BARs on that bus.
> 
> There's an argument for doing this immediately, here inside
> __pci_read_base(): we could look for an upstream window that contains the
> BAR we're reading.  I'd like to be able to do that someday, but I'm not
> sure we have enough of the upstream topology set up to do that.
> 
> Can you try the patch below, which tries to do it a little later?
> 
> > +   /*
> >  * If "A" is a BAR value (a bus address), "bus_to_resource(A)" is
> >  * the corresponding resource address (the physical address used by
> >  * the CPU.  Converting that resource address back to a bus address
> > 
> > [1] https://resource.numascale.com/dmesg-4.0.0-rc2.txt
> 
> This URL doesn't work for me.

Ping?

> commit 66c15b678466cb217f2615d4078d12a2ee4c99ac
> Author: Bjorn Helgaas 
> Date:   Wed Mar 4 10:47:35 2015 -0600
> 
> PCI: Mark invalid BARs as unassigned
> 
> If a BAR is not inside any upstream bridge window, or if it conflicts with
> another resource, mark it as IORESOURCE_UNSET so we don't try to use it.
> We may be able to assign a different address for it.
> 
> Signed-off-by: Bjorn Helgaas 
> 
> diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
> index b7c3a5ea1fca..232f9254c11a 100644
> --- a/drivers/pci/setup-res.c
> +++ b/drivers/pci/setup-res.c
> @@ -120,6 +120,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
>   if (!root) {
>   dev_info(&dev->dev, "can't claim BAR %d %pR: no compatible 
> bridge window\n",
>resource, res);
> + res->flags |= IORESOURCE_UNSET;
>   return -EINVAL;
>   }
>  
> @@ -127,6 +128,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
>   if (conflict) {
>   dev_info(&dev->dev, "can't claim BAR %d %pR: address conflict 
> with %s %pR\n",
>resource, res, conflict->name, conflict);
> + res->flags |= IORESOURCE_UNSET;
>   return -EBUSY;
>   }
>  


Re: PCIe 32-bit MMIO exhaustion

2015-03-04 Thread Bjorn Helgaas
On Wed, Mar 04, 2015 at 03:12:04PM +0800, Daniel J Blueman wrote:
> Your patch solves the conflicts nicely [1] with:
> 
> From f835b16b0758a1dde6042a0e4c8aa5a2e8be5f21 Mon Sep 17 00:00:00 2001
> From: Daniel J Blueman 
> Date: Wed, 4 Mar 2015 14:53:00 +0800
> Subject: [PATCH] Mark PCI BARs with address 0 as unset
> 
> Allow the kernel to activate the unset flag for PCI BAR resources if
> the firmware assigns address 0 (invalid as legacy IO is in this range).
> 
> This allows preventing conflicts with legacy IO/ACPI PNP resources in
> this range.
> 
> Signed-off-by: Daniel J Blueman 
> ---
>  drivers/pci/probe.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8d2f400..ef43652 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -281,6 +281,13 @@ int __pci_read_base(struct pci_dev *dev, enum
> pci_bar_type type,
>   pcibios_resource_to_bus(dev->bus, &inverted_region, res);
> 
>   /*
> +  * If firmware doesn't assign a valid PCI address (as legacy IO is below
> +  * PCI IO), mark resource unset to prevent later resource conflicts
> +  */
> + if (region.start == 0)
> + res->flags |= IORESOURCE_UNSET;

It's true that an uninitialized BAR should contain zero.  But an
initialized BAR may also contain zero, since zero is a valid PCI memory or
I/O address, so I don't really want to preclude that here.  On large
systems with host bridges that support address translation, it would be
reasonable to have something like this:

  pci_bus 0001:00: root bus resource [mem 0x1-0x1] (bus address 
[0x-0x])

In that case, an initialized BAR may contain zero and that should not be an
error.

On your system, I don't think you advertise an I/O aperture to bus 0001:00.
I'd like to make the PCI core smart enough to notice that and just ignore
any I/O BARs on that bus.

There's an argument for doing this immediately, here inside
__pci_read_base(): we could look for an upstream window that contains the
BAR we're reading.  I'd like to be able to do that someday, but I'm not
sure we have enough of the upstream topology set up to do that.

Can you try the patch below, which tries to do it a little later?

> + /*
>* If "A" is a BAR value (a bus address), "bus_to_resource(A)" is
>* the corresponding resource address (the physical address used by
>* the CPU.  Converting that resource address back to a bus address
> 
> [1] https://resource.numascale.com/dmesg-4.0.0-rc2.txt

This URL doesn't work for me.

Bjorn


commit 66c15b678466cb217f2615d4078d12a2ee4c99ac
Author: Bjorn Helgaas 
Date:   Wed Mar 4 10:47:35 2015 -0600

PCI: Mark invalid BARs as unassigned

If a BAR is not inside any upstream bridge window, or if it conflicts with
another resource, mark it as IORESOURCE_UNSET so we don't try to use it.
We may be able to assign a different address for it.

Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index b7c3a5ea1fca..232f9254c11a 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -120,6 +120,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
if (!root) {
dev_info(&dev->dev, "can't claim BAR %d %pR: no compatible 
bridge window\n",
 resource, res);
+   res->flags |= IORESOURCE_UNSET;
return -EINVAL;
}
 
@@ -127,6 +128,7 @@ int pci_claim_resource(struct pci_dev *dev, int resource)
if (conflict) {
dev_info(&dev->dev, "can't claim BAR %d %pR: address conflict 
with %s %pR\n",
 resource, res, conflict->name, conflict);
+   res->flags |= IORESOURCE_UNSET;
return -EBUSY;
}
 


Re: PCIe 32-bit MMIO exhaustion

2015-03-03 Thread Daniel J Blueman

On 04/03/2015 06:38, Bjorn Helgaas wrote:

[+cc linux-pci, linux-acpi]

On Tue, Feb 24, 2015 at 12:37:39PM +0800, Daniel J Blueman wrote:

Hi Bjorn, Jiang,

On 29/01/2015 23:23, Bjorn Helgaas wrote:

Hi Daniel,

On Wed, Jan 28, 2015 at 2:42 AM, Daniel J Blueman  wrote:

With systems with a large number of PCI devices, we're seeing lack of 32-bit
MMIO space, eg one quad-port NetXtreme-2 adapter takes 128MB of space [1].

An errata to the PCIe 2.1 spec provides guidance on limitations with 64-bit
non-prefetchable BARs (since bridges have only 32-bit non-prefetchable
ranges) stating that vendors can enable the prefetchable bit in BARs under
certain circumstances to allow 64-bit allocation [2].

The problem with that, is that vendors can't know apriori what hosts their
products will be in, so can't just advertise prefetchable 64-bit BARs. What
can be done, is system firmware can use the 64-bit prefetchable BAR in
bridges, and assign a 64-bit non-prefetchable device BAR into that area,
where it is safe to do so (following the guidance).

At present, linux denies such allocations [3] and disables the BARs. It
seems a practical solution to allow them if the firmware believes it is
safe.


This particular message ([3]):


pci 0002:01:00.0: BAR 0: [mem size 0x2000 64bit] conflicts with PCI Bus
0002:00 [mem 0x1002000-0x10027ff pref]


is misleading at best and likely a symptom of a bug.  We printed the
*size* of BAR 0, not an address, which means we haven't assigned space
for the BAR.  That means it should not conflict with anything.

We already do revert to firmware assignments in some situations when
Linux can't figure out how to assign things itself.  But apparently
not in *this* situation.

Without seeing the whole picture, it's hard for me to figure out
what's going on here.  Could you open a bug report at
http://bugzilla.kernel.org (category drivers/PCI) and attach a
complete dmesg and "lspci -vv" output?  Then we can look at what
firmware did and what Linux thought was wrong with it.


Done a while back:
https://bugzilla.kernel.org/show_bug.cgi?id=92671

An interesting question popped up: I find the kernel doesn't accept
IO BARs and bridge windows after address 0x, though the PCI spec
and modern hardware allows 32-bit decode.

Thus for practical reasons, our NumaConnect firmware doesn't setup
IO BARs/windows beyond the first PCI domain (which is the only one
with legacy support, and no drivers seem to require their IO BARs
anyway), ...


If we don't handle IO ports above 0x, I think that's broken.  I'm
pretty sure we do handle that on ia64 (it's done by assigning 64K of IO
space to each host bridge, and I think it's typically translated by the
bridge so each root bus sees a 0-64K space on PCI).  We should be able to
do something similar on x86, but it may not be implemented there yet.


and we get conflicts and warnings [1]:

pnp 00:00: disabling [io  0x0061] because it overlaps 0001:05:00.0
BAR 0 [io  0x-0x00ff]
pci 0001:03:00.0: BAR 13: no space for [io  size 0x1000]
pci 0001:03:00.0: BAR 13: failed to assign [io  size 0x1000]

Is there a cleaner way of dealing with this, in our firmware and/or
the kernel? Eg, I guess if IO BARs aren't assigned (value 0) on PCI
domains without IO bridge windows in the ACPI AML, no need to
conflict/attempt assignment?


Yes, we should be able to deal with this better.

The complaint about disabling the pnp 00:00 resource is bogus because the
PCI 0001:05:00.0 BAR is not assigned and should never be enabled, so this
is not a real conflict.  My intent is that the PCI resource corresponding
to this BAR should have the IORESOURCE_UNSET bit set.  That will prevent
pci_enable_resources() from setting the PCI_COMMAND_IO bit, which is what
would enable the BAR.

Can you try the patch below?  I don't think it will work right off the bat
because I think the fact that we print "[io  0x-0x00ff]" instead of
"[io  size 0x0100]" means we don't have IORESOURCE_UNSET set in the PCI
resource.  But maybe you can figure out where it *should* be getting
set?

Bjorn


commit fd4888cf942a2ae9cdefc46d1fba86b2c7ec2dbf
Author: Bjorn Helgaas 
Date:   Tue Mar 3 16:13:56 2015 -0600

 PNP: Don't check for overlaps with unassigned PCI BARs

 After 0509ad5e1a7d ("PNP: disable PNP motherboard resources that overlap
 PCI BARs"), we disable and warn about PNP resources that overlap PCI BARs.
 But we assume that all PCI BARs are valid, which is incorrect, because a
 BAR may not have any space assigned to it.  In that case, we will not
 enable the BAR, so no other resource can conflict with it.

 Ignore PCI BARs that are unassigned, as indicated by IORESOURCE_UNSET.

 Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index ebf0d6710b5a..943c1cb9566c 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -246,13 +246,16 @@ static void quirk_system_pci_resources(struct pnp_dev 
*dev)
 */
for_

Re: PCIe 32-bit MMIO exhaustion

2015-03-03 Thread Bjorn Helgaas
[+cc linux-pci, linux-acpi]

On Tue, Feb 24, 2015 at 12:37:39PM +0800, Daniel J Blueman wrote:
> Hi Bjorn, Jiang,
> 
> On 29/01/2015 23:23, Bjorn Helgaas wrote:
> >Hi Daniel,
> >
> >On Wed, Jan 28, 2015 at 2:42 AM, Daniel J Blueman  
> >wrote:
> >>With systems with a large number of PCI devices, we're seeing lack of 32-bit
> >>MMIO space, eg one quad-port NetXtreme-2 adapter takes 128MB of space [1].
> >>
> >>An errata to the PCIe 2.1 spec provides guidance on limitations with 64-bit
> >>non-prefetchable BARs (since bridges have only 32-bit non-prefetchable
> >>ranges) stating that vendors can enable the prefetchable bit in BARs under
> >>certain circumstances to allow 64-bit allocation [2].
> >>
> >>The problem with that, is that vendors can't know apriori what hosts their
> >>products will be in, so can't just advertise prefetchable 64-bit BARs. What
> >>can be done, is system firmware can use the 64-bit prefetchable BAR in
> >>bridges, and assign a 64-bit non-prefetchable device BAR into that area,
> >>where it is safe to do so (following the guidance).
> >>
> >>At present, linux denies such allocations [3] and disables the BARs. It
> >>seems a practical solution to allow them if the firmware believes it is
> >>safe.
> >
> >This particular message ([3]):
> >
> >>pci 0002:01:00.0: BAR 0: [mem size 0x2000 64bit] conflicts with PCI Bus
> >>0002:00 [mem 0x1002000-0x10027ff pref]
> >
> >is misleading at best and likely a symptom of a bug.  We printed the
> >*size* of BAR 0, not an address, which means we haven't assigned space
> >for the BAR.  That means it should not conflict with anything.
> >
> >We already do revert to firmware assignments in some situations when
> >Linux can't figure out how to assign things itself.  But apparently
> >not in *this* situation.
> >
> >Without seeing the whole picture, it's hard for me to figure out
> >what's going on here.  Could you open a bug report at
> >http://bugzilla.kernel.org (category drivers/PCI) and attach a
> >complete dmesg and "lspci -vv" output?  Then we can look at what
> >firmware did and what Linux thought was wrong with it.
> 
> Done a while back:
> https://bugzilla.kernel.org/show_bug.cgi?id=92671
> 
> An interesting question popped up: I find the kernel doesn't accept
> IO BARs and bridge windows after address 0x, though the PCI spec
> and modern hardware allows 32-bit decode.
> 
> Thus for practical reasons, our NumaConnect firmware doesn't setup
> IO BARs/windows beyond the first PCI domain (which is the only one
> with legacy support, and no drivers seem to require their IO BARs
> anyway), ...

If we don't handle IO ports above 0x, I think that's broken.  I'm
pretty sure we do handle that on ia64 (it's done by assigning 64K of IO
space to each host bridge, and I think it's typically translated by the
bridge so each root bus sees a 0-64K space on PCI).  We should be able to
do something similar on x86, but it may not be implemented there yet.

> and we get conflicts and warnings [1]:
> 
> pnp 00:00: disabling [io  0x0061] because it overlaps 0001:05:00.0
> BAR 0 [io  0x-0x00ff]
> pci 0001:03:00.0: BAR 13: no space for [io  size 0x1000]
> pci 0001:03:00.0: BAR 13: failed to assign [io  size 0x1000]
> 
> Is there a cleaner way of dealing with this, in our firmware and/or
> the kernel? Eg, I guess if IO BARs aren't assigned (value 0) on PCI
> domains without IO bridge windows in the ACPI AML, no need to
> conflict/attempt assignment?

Yes, we should be able to deal with this better.

The complaint about disabling the pnp 00:00 resource is bogus because the
PCI 0001:05:00.0 BAR is not assigned and should never be enabled, so this
is not a real conflict.  My intent is that the PCI resource corresponding
to this BAR should have the IORESOURCE_UNSET bit set.  That will prevent
pci_enable_resources() from setting the PCI_COMMAND_IO bit, which is what
would enable the BAR.
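
As an aside (not part of Bjorn's mail; the device address is the one from the
log quoted above, and COMMAND is the standard setpci name for config offset
0x04), the bit in question can be inspected and cleared from userspace:

# Bit 0 of the command register is I/O Space Enable (PCI_COMMAND_IO).
/sbin/setpci -s 0001:05:00.0 COMMAND
# Clear only the I/O enable bit, leaving the other command bits alone:
/sbin/setpci -s 0001:05:00.0 COMMAND=0x0000:0x0001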

Can you try the patch below?  I don't think it will work right off the bat
because I think the fact that we print "[io  0x-0x00ff]" instead of
"[io  size 0x0100]" means we don't have IORESOURCE_UNSET set in the PCI
resource.  But maybe you can figure out where it *should* be getting
set?

Bjorn


commit fd4888cf942a2ae9cdefc46d1fba86b2c7ec2dbf
Author: Bjorn Helgaas 
Date:   Tue Mar 3 16:13:56 2015 -0600

PNP: Don't check for overlaps with unassigned PCI BARs

After 0509ad5e1a7d ("PNP: disable PNP motherboard resources that overlap
PCI BARs"), we disable and warn about PNP resources that overlap PCI BARs.
But we assume that all PCI BARs are valid, which is incorrect, because a
BAR may not have any space assigned to it.  In that case, we will not
enable the BAR, so no other resource can conflict with it.

Ignore PCI BARs that are unassigned, as indicated by IORESOURCE_UNSET.

Signed-off-by: Bjorn Helgaas 

diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index ebf0d6710b5a..943c1cb9566c 100644
--- a/drivers/pnp/q

Re: PCIe 32-bit MMIO exhaustion

2015-02-23 Thread Daniel J Blueman

Hi Bjorn, Jiang,

On 29/01/2015 23:23, Bjorn Helgaas wrote:

Hi Daniel,

On Wed, Jan 28, 2015 at 2:42 AM, Daniel J Blueman  wrote:

With systems with a large number of PCI devices, we're seeing lack of 32-bit
MMIO space, eg one quad-port NetXtreme-2 adapter takes 128MB of space [1].

An errata to the PCIe 2.1 spec provides guidance on limitations with 64-bit
non-prefetchable BARs (since bridges have only 32-bit non-prefetchable
ranges) stating that vendors can enable the prefetchable bit in BARs under
certain circumstances to allow 64-bit allocation [2].

The problem with that, is that vendors can't know apriori what hosts their
products will be in, so can't just advertise prefetchable 64-bit BARs. What
can be done, is system firmware can use the 64-bit prefetchable BAR in
bridges, and assign a 64-bit non-prefetchable device BAR into that area,
where it is safe to do so (following the guidance).

At present, linux denies such allocations [3] and disables the BARs. It
seems a practical solution to allow them if the firmware believes it is
safe.


This particular message ([3]):


pci 0002:01:00.0: BAR 0: [mem size 0x2000 64bit] conflicts with PCI Bus
0002:00 [mem 0x1002000-0x10027ff pref]


is misleading at best and likely a symptom of a bug.  We printed the
*size* of BAR 0, not an address, which means we haven't assigned space
for the BAR.  That means it should not conflict with anything.

We already do revert to firmware assignments in some situations when
Linux can't figure out how to assign things itself.  But apparently
not in *this* situation.

Without seeing the whole picture, it's hard for me to figure out
what's going on here.  Could you open a bug report at
http://bugzilla.kernel.org (category drivers/PCI) and attach a
complete dmesg and "lspci -vv" output?  Then we can look at what
firmware did and what Linux thought was wrong with it.


Done a while back:
https://bugzilla.kernel.org/show_bug.cgi?id=92671

An interesting question popped up: I find the kernel doesn't accept IO 
BARs and bridge windows after address 0x, though the PCI spec and 
modern hardware allows 32-bit decode.


Thus for practical reasons, our NumaConnect firmware doesn't setup IO 
BARs/windows beyond the first PCI domain (which is the only one with 
legacy support, and no drivers seem to require their IO BARs anyway), 
and we get conflicts and warnings [1]:


pnp 00:00: disabling [io  0x0061] because it overlaps 0001:05:00.0 BAR 0 
[io  0x-0x00ff]

pci 0001:03:00.0: BAR 13: no space for [io  size 0x1000]
pci 0001:03:00.0: BAR 13: failed to assign [io  size 0x1000]

Is there a cleaner way of dealing with this, in our firmware and/or the 
kernel? Eg, I guess if IO BARs aren't assigned (value 0) on PCI domains 
without IO bridge windows in the ACPI AML, no need to conflict/attempt 
assignment?


Many thanks!
  Daniel

[1] https://bugzilla.kernel.org/attachment.cgi?id=165831
--
Daniel J Blueman
Principal Software Engineer, Numascale


Re: PCIe 32-bit MMIO exhaustion

2015-01-29 Thread Bjorn Helgaas
[+cc Yinghai]

Hi Daniel,

On Wed, Jan 28, 2015 at 2:42 AM, Daniel J Blueman  wrote:
> With systems with a large number of PCI devices, we're seeing lack of 32-bit
> MMIO space, eg one quad-port NetXtreme-2 adapter takes 128MB of space [1].
>
> An errata to the PCIe 2.1 spec provides guidance on limitations with 64-bit
> non-prefetchable BARs (since bridges have only 32-bit non-prefetchable
> ranges) stating that vendors can enable the prefetchable bit in BARs under
> certain circumstances to allow 64-bit allocation [2].
>
> The problem with that, is that vendors can't know apriori what hosts their
> products will be in, so can't just advertise prefetchable 64-bit BARs. What
> can be done, is system firmware can use the 64-bit prefetchable BAR in
> bridges, and assign a 64-bit non-prefetchable device BAR into that area,
> where it is safe to do so (following the guidance).
>
> At present, linux denies such allocations [3] and disables the BARs. It
> seems a practical solution to allow them if the firmware believes it is
> safe.

This particular message ([3]):

> pci 0002:01:00.0: BAR 0: [mem size 0x2000 64bit] conflicts with PCI Bus
> 0002:00 [mem 0x1002000-0x10027ff pref]

is misleading at best and likely a symptom of a bug.  We printed the
*size* of BAR 0, not an address, which means we haven't assigned space
for the BAR.  That means it should not conflict with anything.

We already do revert to firmware assignments in some situations when
Linux can't figure out how to assign things itself.  But apparently
not in *this* situation.

Without seeing the whole picture, it's hard for me to figure out
what's going on here.  Could you open a bug report at
http://bugzilla.kernel.org (category drivers/PCI) and attach a
complete dmesg and "lspci -vv" output?  Then we can look at what
firmware did and what Linux thought was wrong with it.

Bjorn

> --- [1]
>
> :01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> Subsystem: Dell Device 1f26
> Flags: bus master, fast devsel, latency 0, IRQ 24
> Memory at e600 (64-bit, non-prefetchable) [size=32M]
> Capabilities: [48] Power Management version 3
> Capabilities: [50] Vital Product Data
> Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
> Capabilities: [a0] MSI-X: Enable+ Count=9 Masked-
> Capabilities: [ac] Express Endpoint, MSI 00
> Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-e8
> Capabilities: [110] Advanced Error Reporting
> Capabilities: [150] Power Budgeting 
> Capabilities: [160] Virtual Channel
> Kernel driver in use: bnx2
>
> :01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> Subsystem: Dell Device 1f26
> Flags: bus master, fast devsel, latency 0, IRQ 25
> Memory at e800 (64-bit, non-prefetchable) [size=32M]
> Capabilities: [48] Power Management version 3
> Capabilities: [50] Vital Product Data
> Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
> Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
> Capabilities: [ac] Express Endpoint, MSI 00
> Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-ea
> Capabilities: [110] Advanced Error Reporting
> Capabilities: [150] Power Budgeting 
> Capabilities: [160] Virtual Channel
> Kernel driver in use: bnx2
>
> :02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> Subsystem: Dell Device 1f26
> Flags: bus master, fast devsel, latency 0, IRQ 28
> Memory at ea00 (64-bit, non-prefetchable) [size=32M]
> Capabilities: [48] Power Management version 3
> Capabilities: [50] Vital Product Data
> Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
> Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
> Capabilities: [ac] Express Endpoint, MSI 00
> Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-ec
> Capabilities: [110] Advanced Error Reporting
> Capabilities: [150] Power Budgeting 
> Capabilities: [160] Virtual Channel
> Kernel driver in use: bnx2
>
> :02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709
> Gigabit Ethernet (rev 20)
> Subsystem: Dell Device 1f26
> Flags: bus master, fast devsel, latency 0, IRQ 29
> Memory at ec00 (64-bit, non-prefetchable) [size=32M]
> Capabilities: [48] Power Management version 3
> Capabilities: [50] Vital Product Data
> Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
> Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
> Capabilities: [ac] Express Endpoint, MSI 00
> Capabilities: [100] Device Serial Number d4-ae-52-ff-fe-ea-5c-ee
>   

Re: PCIe PASID (Process Address Space ID) and iommu code

2014-10-16 Thread Joerg Roedel
On Wed, Oct 15, 2014 at 09:50:58PM -0600, Bjorn Helgaas wrote:
> [+cc Joerg, Suravee, Jay, iommu list, linux-pci]
> 
> On Wed, Oct 15, 2014 at 5:44 PM, Kallol Biswas  wrote:
> > Resending, as message got bounced for html content.
> > 
> > Hi,
> > PCIe has introduced PASID TLP Prefix.  There are two ECNs on this.
> >
> > It seems that AMD iommu code makes use of PASID. Is there a device that
> > utilizes this TLP prefix?

Yes, recent radeon GPUs can make use of the TLP prefix.

> > PASID allocation and management within a device is not clear to me. How
> > does device know which PASID to issue for which virtual address? Who makes
> > the association? Must be software/OS, but how? There is no table for this
> > like MSI-X table.

The setup of the PASIDs is device specific, so there is no standard for
setting up PASID address spaces on devices.


HTH,

Joerg



Re: PCIe PASID (Process Address Space ID) and iommu code

2014-10-15 Thread Bjorn Helgaas
[+cc Joerg, Suravee, Jay, iommu list, linux-pci]

On Wed, Oct 15, 2014 at 5:44 PM, Kallol Biswas  wrote:
> Resending, as message got bounced for html content.
> 
> Hi,
> PCIe has introduced PASID TLP Prefix.  There are two ECNs on this.
>
> It seems that AMD iommu code makes use of PASID. Is there a device that
> utilizes this TLP prefix?
>
> PASID allocation and management within a device is not clear to me. How
> does device know which PASID to issue for which virtual address? Who makes
> the association? Must be software/OS, but how? There is no table for this
> like MSI-X table.
>
> Any pointer/documentation will be appreciated.
>
> Regards,


Re: PCIe PASID (Process Address Space ID) and iommu code

2014-10-15 Thread Kallol Biswas
Hi,
PCIe has introduced PASID TLP Prefix.  There are two ECNs on this.

It seems that AMD iommu code makes use of PASID. Is there a device that
utilizes this TLP prefix?

PASID allocation and management within a device is not clear to me. How
does the device know which PASID to issue for which virtual address? Who makes
the association? Must be software/OS, but how? There is no table for this
like MSI-X table.

Any pointer/documentation will be appreciated.

Regards,


Re: PCIe bus enumeration

2014-08-07 Thread Federico Vaga
On Tuesday 08 July 2014 14:27:00 Bjorn Helgaas wrote:
> On Tue, Jul 8, 2014 at 1:20 PM, Federico Vaga 
 wrote:
> > On Tuesday 08 July 2014 12:23:39 Bjorn Helgaas wrote:
> >> On Tue, Jul 8, 2014 at 1:15 AM, Federico Vaga
> > 
> >  wrote:
> >> >> > So, It looks like that some BIOS disable the bridge when
> >> >> > there
> >> >> > is
> >> >> > nothing behind it. Why? Power save? :/
> >> >> 
> >> >> Could be power savings, or possibly to conserve bus numbers,
> >> >> which
> >> >> are a limited resource.
> >> > 
> >> > what is the maximum number of buses?
> >> 
> >> 256.
> > 
> > Well, it is not a small number. I will ask directly to the company
> > who sell this crate and ask them what is going on in the BIOS
> 
> Yeah, it's not usually a problem until you get to the really big
> machines.  The BIOS vendor could give you a much better reason; I'm
> only speculating.

Just to complete the discussion (I forgot to do it). The vendor point 
me to the correct BIOS configuration to keep all the PCIe port enable 
even if there is nothing in the slot. Now the bus number enumeration 
seems "constant"

Thank you

-- 
Federico Vaga


Re: PCIe bus enumeration

2014-07-08 Thread Bjorn Helgaas
On Tue, Jul 8, 2014 at 1:20 PM, Federico Vaga  wrote:
> On Tuesday 08 July 2014 12:23:39 Bjorn Helgaas wrote:
>> On Tue, Jul 8, 2014 at 1:15 AM, Federico Vaga
>  wrote:
>> >> > So, It looks like that some BIOS disable the bridge when there
>> >> > is
>> >> > nothing behind it. Why? Power save? :/
>> >>
>> >> Could be power savings, or possibly to conserve bus numbers,
>> >> which
>> >> are a limited resource.
>> >
>> > what is the maximum number of buses?
>>
>> 256.
>
> Well, it is not a small number. I will ask directly to the company who
> sell this crate and ask them what is going on in the BIOS

Yeah, it's not usually a problem until you get to the really big
machines.  The BIOS vendor could give you a much better reason; I'm
only speculating.

>> > At this point I'm a little bit confused about the definition "slot
>> > numbers" :) You mean the 22, 25, ...
>>
>> Right.  Bus numbers are under software control, to some degree (as a
>> general rule, an x86 BIOS assigns them and Linux leaves them alone,
>> but they *can* be changed so they aren't a good thing to rely on).
>> The bus number of a root bus is usually determined by hardware or
>> by an arch-specific host bridge driver.  The bus number below a
>> PCI-PCI bridge is determined by the bridge's "secondary bus number"
>> register, which software can change.
>>
>> Slot numbers are based on the Physical Slot Number in the PCIe Slot
>> Capability register.  This is set by some hardware mechanism such as
>> pin strapping or a serial EEPROM.  Software can't change it, so you
>> can rely on it to be constant.  (There's also a mechanism for
>> getting a slot number from ACPI, but that should also return a
>> constant value).  The problem is that I don't think the Linux slot
>> number support is very good, so I'm sure there's plenty of stuff
>> that we *should* be able to do that we can't do *yet*.
>
> Mh, I understand. Let's say that I have time to spend on this problem
> (I do not know) and contributing to the PCI subsystem. How many
> differences are there between 3.2, 3.6, 3.16/next? We are using
> 3.2/3.6 at the moment, but probably you should expect that it will
> work on the last version :)

There are quite a few differences, including a fair amount of work on
the hotplug drivers.  The problem in this area is that pciehp (the
PCIe hotplug driver) and acpiphp (the ACPI hotplug driver) both can
register slot numbers and it's sort of ugly to figure out which one to
use in a given situation.  Neither can be a loadable module anymore,
which simplifies things a little bit, but it's still ugly.

Bjorn


Re: PCIe bus enumeration

2014-07-08 Thread Federico Vaga
On Tuesday 08 July 2014 12:23:39 Bjorn Helgaas wrote:
> On Tue, Jul 8, 2014 at 1:15 AM, Federico Vaga 
 wrote:
> >> > So, It looks like that some BIOS disable the bridge when there
> >> > is
> >> > nothing behind it. Why? Power save? :/
> >> 
> >> Could be power savings, or possibly to conserve bus numbers,
> >> which
> >> are a limited resource.
> > 
> > what is the maximum number of buses?
> 
> 256.

Well, it is not a small number. I will ask directly to the company who 
sell this crate and ask them what is going on in the BIOS

> > At this point I'm a little bit confused about the definition "slot
> > numbers" :) You mean the 22, 25, ...
> 
> Right.  Bus numbers are under software control, to some degree (as a
> general rule, an x86 BIOS assigns them and Linux leaves them alone,
> but they *can* be changed so they aren't a good thing to rely on).
> The bus number of a root bus is usually determined by hardware or
> by an arch-specific host bridge driver.  The bus number below a
> PCI-PCI bridge is determined by the bridge's "secondary bus number"
> register, which software can change.
> 
> Slot numbers are based on the Physical Slot Number in the PCIe Slot
> Capability register.  This is set by some hardware mechanism such as
> pin strapping or a serial EEPROM.  Software can't change it, so you
> can rely on it to be constant.  (There's also a mechanism for
> getting a slot number from ACPI, but that should also return a
> constant value).  The problem is that I don't think the Linux slot
> number support is very good, so I'm sure there's plenty of stuff
> that we *should* be able to do that we can't do *yet*.

Mh, I understand. Let's say that I have time to spend on this problem 
(I do not know) and contributing to the PCI subsystem. How many 
differences are there between 3.2, 3.6, 3.16/next? We are using 
3.2/3.6 at the moment, but probably you should expect that it will 
work on the last version :)

-- 
Federico Vaga


Re: PCIe bus enumeration

2014-07-08 Thread Bjorn Helgaas
On Tue, Jul 8, 2014 at 1:15 AM, Federico Vaga  wrote:
>> > So, It looks like that some BIOS disable the bridge when there is
>> > nothing behind it. Why? Power save? :/
>>
>> Could be power savings, or possibly to conserve bus numbers, which
>> are a limited resource.
>
> what is the maximum number of buses?

256.

>> Well, it's true that it's hard to get constant *bus numbers*, but
>> it's never really been a good idea to rely on those, because
>> they're assigned at the discretion of the OS, and there are reasons
>> why the OS might want to reallocate them, e.g., to accommodate a
>> deep hot-plugged hierarchy.  If you shift focus to *slot numbers*,
>> then I think there's a lot more we can do.
>
> At this point I'm a little bit confused about the definition "slot
> numbers" :) You mean the 22, 25, ...

Right.  Bus numbers are under software control, to some degree (as a
general rule, an x86 BIOS assigns them and Linux leaves them alone,
but they *can* be changed so they aren't a good thing to rely on).
The bus number of a root bus is usually determined by hardware or by
an arch-specific host bridge driver.  The bus number below a PCI-PCI
bridge is determined by the bridge's "secondary bus number" register,
which software can change.

Slot numbers are based on the Physical Slot Number in the PCIe Slot
Capability register.  This is set by some hardware mechanism such as
pin strapping or a serial EEPROM.  Software can't change it, so you
can rely on it to be constant.  (There's also a mechanism for getting
a slot number from ACPI, but that should also return a constant
value).  The problem is that I don't think the Linux slot number
support is very good, so I'm sure there's plenty of stuff that we
*should* be able to do that we can't do *yet*.
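
As a rough sketch (using the standard pcie_capability_read_dword() helper;
the function name below is made up for illustration), that field can be read
like this:

#include <linux/pci.h>

/* Illustrative only: return the Physical Slot Number of a port's slot. */
static u32 example_physical_slot_number(struct pci_dev *port)
{
	u32 sltcap = 0;

	pcie_capability_read_dword(port, PCI_EXP_SLTCAP, &sltcap);

	/* Physical Slot Number is bits 31:19 of Slot Capabilities */
	return (sltcap & PCI_EXP_SLTCAP_PSN) >> 19;
}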

Bjorn


Re: PCIe bus enumeration

2014-07-08 Thread Federico Vaga
(I'm changing my email address to the work one. Initially it was just
my personal curiosity, but now you are helping me with my work, so I
think it is correct this way.)

> > So, It looks like that some BIOS disable the bridge when there is
> > nothing behind it. Why? Power save? :/
> 
> Could be power savings, or possibly to conserve bus numbers, which
> are a limited resource.

what is the maximum number of buses?

> >> If you can get to an EFI shell on this box, you might be able to
> >> confirm this with the "pci" command.  Booting Linux with
> >> "pci=earlydump" is similar in that it dumps PCI config space
> >> before
> >> we change anything.
> > 
> > yes I confirm, the bridge are not there if I don't plug the card.
> > 
> >> To solve this problem, I think you need slot information even
> >> when
> >> there's no hotplug.  This has been raised before [1, 2], and I
> >> think it's a good idea, but nobody has implemented it yet.
> > 
> > Yes, but if the BIOS disable the bridge there is nothing we can
> > do.
> 
> Well, it's true that it's hard to get constant *bus numbers*, but
> it's never really been a good idea to rely on those, because
> they're assigned at the discretion of the OS, and there are reasons
> why the OS might want to reallocate them, e.g., to accommodate a
> deep hot-plugged hierarchy.  If you shift focus to *slot numbers*,
> then I think there's a lot more we can do.

At this point I'm a little bit confused about the definition of "slot 
numbers" :) You mean the 22, 25, ...

> >> Another curious thing is that you refer to "slot 10", but there's
> >> no obvious connection between that and the "slot 21" in the PCIe
> >> capability of the Root Port leading to that slot.  But I guess
> >> you said the slots are in a backplane (they're not an integral
> >> part of the motherboard).  In that case, there's no way for the
> >> motherboard to know what the labels on the backplane are.
> > 
> > It is written on the backplane. I said slot 10 because I'm
> > counting
> > the available slot, but on the backplane they are 22, 25, and
> > other
> > no-consecutive numbers.
> 
> The 22, 25, etc., are in the same range as the slot numbers in the
> PCIe Slot Capabilities registers, so maybe the backplane is
> constructed to make this possible.  The external PCIe chassis I'm
> familiar with have one fast link on a cable leading to the box, with
> a PCIe switch inside the box.  The upstream port is connected to
> the incoming link, and there's a downstream port connected to each
> slot. In this case, the slot numbers in the downstream ports' Slot
> Capabilities registers can be made to match the silkscreen labels
> on the board since everything is fixed by the hardware.
> 
> Your backplane sounds a little different (you have Ports on the root
> bus leading directly to slots in the backplane, so I assume those
> Ports are on the motherboard, not the backplane), but maybe the
> motherboard & backplane are designed as a unit so the Port slot
> numbers could match the backplane.

Yes, the backplane is almost "empty", except for the 9 PCIe backplane,
which has PCI bridges on it. At the moment I cannot physically check
this kind of backplane, but from the lspci output I understand that
there is a bridge on the backplane, because the motherboard is the
same.

> 
> > If I use `biosdecode` I can get that information, but only for the
> > "first level" of bridges. On some backplane I have PCI bridges
> > behind bridges, and in this case biosdecode doesn't help: it just
> > tell me about the bridge on the motherboard.
> 
> What specific biosdecode information are you using? 

I was looking at the "PCI interrupt routing", but it seems that it 
returns only information about the last bridge in the interrupt's 
routing. Here is an example with a different backplane (9 PCIe).

It seems fine for backplanes without a PCI bridge on the backplane.

I attached two files, one for each type of backplane.


Maybe I'm just misunderstanding the output of biosdecode. I didn't 
find an explanation of its output: I'm guessing the meaning.

> There's a fair
> amount of stuff in the PCI-to-PCI bridge spec about slot and chassis
> numbering, including some about expansion chassis.  I doubt that
> Linux implements all that, so there's probably room for a lot of
> improvement.  I attached your lspci output to the bugzilla
> (https://bugzilla.kernel.org/show_bug.cgi?id=72681).  Maybe you
> could attach the biosdecode info there, too, and we could see if
> there's a way we can make this easier.

ok

-- 
Federico Vaga

-[:00]-+-00.0
   +-01.0-[05]00.0
   +-02.0
   +-03.0
   +-03.2
   +-03.3
   +-19.0
   +-1a.0
   +-1a.1
   +-1a.2
   +-1a.7
   +-1b.0
   +-1c.0-[04]--
   +-1c.4-[03]00.0
   +-1d.0
   +-1d.1
   +-1d.2
   +-1d.7
   +-1e.0-[01-02]0c.0-[02]--
   +-1f.0
   +-1f.2

Re: PCIe bus enumeration

2014-07-07 Thread Bjorn Helgaas
On Mon, Jul 7, 2014 at 1:29 AM, Federico Vaga  wrote:
> On Friday 04 July 2014 15:26:12 Bjorn Helgaas wrote:
>> On Fri, Jul 04, 2014 at 09:55:20AM +0200, Federico Vaga wrote:
>> > > I assume these ports don't support hotplug.  If they *did*
>> > > support
>> > > hotplug, those ports would have to exist because they handle the
>> > > hotplug events (presence detect, etc.)
>> >
>> > I asked: yes, they do not support hotplug
>> >
>> > > If you can collect the complete "lspci -vv" output from your
>> > > machine (with a device plugged in, so we can see the port
>> > > leading to it), that will help make this more concrete.  And
>> > > maybe one with no devices plugged in, so we can see exactly
>> > > what changes.
>> >
>> > I attached two files with the output. I putted a card in slot 10
>> > and took the output, then moved the card on slot 11 and took the
>> > output.
>> >
>> > As you can see with diff the bridge behind the slot disappear when
>> > it is empty.
>>
>> Perfect, thanks!  For some reason, it really helps me to be able to
>> stare at the actual data.  Here's the situation with slot 10
>> occupied:
>>
>>   00:01.0 82Q35 Root Port to [bus 05]  PCIe SltCap slot #21
>>   05:00.0 CERN/ECP/EDU Device  slot 10
>>   00:1c.0 82801I Express Port 1 to [bus 04]PCIe SltCap slot #22
>>   00:1c.3 (not present at all)
>>   00:1c.4 82801I Express Port 5 to [bus 03]PCIe SltCap slot #0
>>   03:00.0 Realtek NIC
>>
>> and here it is with slot 11 occupied:
>>
>>   00:01.0 (not present at all)
>>   00:1c.0 82801I Express Port 1 to [bus 05]PCIe SltCap slot #22
>>   00:1c.3 82801I Express Port 4 to [bus 04]PCIe SltCap slot #25
>>   04:00.0 CERN/ECP/EDU Device  slot 11
>>   00:1c.4 82801I Express Port 5 to [bus 03]PCIe SltCap slot #0
>>   03:00.0 Realtek NIC
>>
>> I'm pretty sure this is a function of your BIOS.  There are often
>> device-specific ways to enable or disable individual devices (like
>> the root ports here), and the BIOS is likely disabling these ports
>> when there is nothing below them.  I don't know why it would turn
>> off 00:1c.3 when its slot is empty, but it doesn't turn off
>> 00:1c.0, which also leads to an empty slot. But I don't think
>> Linux is involved in this, and if the BIOS disables devices, there
>> really isn't anything Linux can do about it.
>
> It seems to happen also on some "classic" PC. I didn't experiment it
> by myself, some friends reported me this behavior in the recent past.
>
> So, It looks like that some BIOS disable the bridge when there is
> nothing behind it. Why? Power save? :/

Could be power savings, or possibly to conserve bus numbers, which are
a limited resource.

>> If you can get to an EFI shell on this box, you might be able to
>> confirm this with the "pci" command.  Booting Linux with
>> "pci=earlydump" is similar in that it dumps PCI config space before
>> we change anything.
>
> yes I confirm, the bridge are not there if I don't plug the card.
>
>> To solve this problem, I think you need slot information even when
>> there's no hotplug.  This has been raised before [1, 2], and I
>> think it's a good idea, but nobody has implemented it yet.
>
> Yes, but if the BIOS disable the bridge there is nothing we can do.

Well, it's true that it's hard to get constant *bus numbers*, but it's
never really been a good idea to rely on those, because they're
assigned at the discretion of the OS, and there are reasons why the OS
might want to reallocate them, e.g., to accommodate a deep hot-plugged
hierarchy.  If you shift focus to *slot numbers*, then I think there's
a lot more we can do.

>> Another curious thing is that you refer to "slot 10", but there's no
>> obvious connection between that and the "slot 21" in the PCIe
>> capability of the Root Port leading to that slot.  But I guess you
>> said the slots are in a backplane (they're not an integral part of
>> the motherboard).  In that case, there's no way for the motherboard
>> to know what the labels on the backplane are.
>
> It is written on the backplane. I said slot 10 because I'm counting
> the available slot, but on the backplane they are 22, 25, and other
> no-consecutive numbers.

The 22, 25, etc., are in the same range as the slot numbers in the
PCIe Slot Capabilities registers, so maybe the backplane is
constructed to make this possible.  The external PCIe chassis I'm
familiar with have one fast link on a cable leading to the box, with a
PCIe switch inside the box.  The upstream port is connected to the
incoming link, and there's a downstream port connected to each slot.
In this case, the slot numbers in the downstream ports' Slot
Capabilities registers can be made to match the silkscreen labels on
the board since everything is fixed by the hardware.

Your backplane sounds a little different (you have Ports on the root
bus leading directly to slots in the backplane, so I assume those
Ports are on the motherboard, not the backplane), but maybe the
motherboard & backplane are designed as a unit so the Port slot
numbers could match the backplane.

Re: PCIe bus enumeration

2014-07-07 Thread Federico Vaga
On Friday 04 July 2014 15:26:12 Bjorn Helgaas wrote:
> On Fri, Jul 04, 2014 at 09:55:20AM +0200, Federico Vaga wrote:
> > > I assume these ports don't support hotplug.  If they *did*
> > > support
> > > hotplug, those ports would have to exist because they handle the
> > > hotplug events (presence detect, etc.)
> > 
> > I asked: yes, they do not support hotplug
> > 
> > > If you can collect the complete "lspci -vv" output from your
> > > machine (with a device plugged in, so we can see the port
> > > leading to it), that will help make this more concrete.  And
> > > maybe one with no devices plugged in, so we can see exactly
> > > what changes.
> > 
> > I attached two files with the output. I putted a card in slot 10
> > and took the output, then moved the card on slot 11 and took the
> > output.
> > 
> > As you can see with diff the bridge behind the slot disappear when
> > it is empty.
> 
> Perfect, thanks!  For some reason, it really helps me to be able to
> stare at the actual data.  Here's the situation with slot 10
> occupied:
> 
>   00:01.0 82Q35 Root Port to [bus 05]  PCIe SltCap slot #21
>   05:00.0 CERN/ECP/EDU Device  slot 10
>   00:1c.0 82801I Express Port 1 to [bus 04]PCIe SltCap slot #22
>   00:1c.3 (not present at all)
>   00:1c.4 82801I Express Port 5 to [bus 03]PCIe SltCap slot #0
>   03:00.0 Realtek NIC
> 
> and here it is with slot 11 occupied:
> 
>   00:01.0 (not present at all)
>   00:1c.0 82801I Express Port 1 to [bus 05]PCIe SltCap slot #22
>   00:1c.3 82801I Express Port 4 to [bus 04]PCIe SltCap slot #25
>   04:00.0 CERN/ECP/EDU Device  slot 11
>   00:1c.4 82801I Express Port 5 to [bus 03]PCIe SltCap slot #0
>   03:00.0 Realtek NIC
> 
> I'm pretty sure this is a function of your BIOS.  There are often
> device-specific ways to enable or disable individual devices (like
> the root ports here), and the BIOS is likely disabling these ports
> when there is nothing below them.  I don't know why it would turn
> off 00:1c.3 when its slot is empty, but it doesn't turn off
> 00:1c.0, which also leads to an empty slot. But I don't think
> Linux is involved in this, and if the BIOS disables devices, there
> really isn't anything Linux can do about it.

It seems to happen on some "classic" PCs as well. I didn't experience it 
myself; some friends reported this behavior to me in the recent past.

So, it looks like some BIOSes disable the bridge when there is 
nothing behind it. Why? Power saving? :/

> If you can get to an EFI shell on this box, you might be able to
> confirm this with the "pci" command.  Booting Linux with
> "pci=earlydump" is similar in that it dumps PCI config space before
> we change anything.

Yes, I confirm: the bridges are not there if I don't plug in the card.

> To solve this problem, I think you need slot information even when
> there's no hotplug.  This has been raised before [1, 2], and I
> think it's a good idea, but nobody has implemented it yet.

Yes, but if the BIOS disables the bridge there is nothing we can do.

> Another curious thing is that you refer to "slot 10", but there's no
> obvious connection between that and the "slot 21" in the PCIe
> capability of the Root Port leading to that slot.  But I guess you
> said the slots are in a backplane (they're not an integral part of
> the motherboard).  In that case, there's no way for the motherboard
> to know what the labels on the backplane are.

It is written on the backplane. I said slot 10 because I'm counting 
the available slots, but on the backplane they are 22, 25, and other 
non-consecutive numbers.

If I use `biosdecode` I can get that information, but only for the 
"first level" of bridges. On some backplanes I have PCI bridges behind 
bridges, and in that case biosdecode doesn't help: it just tells me 
about the bridge on the motherboard.

At the moment, I'm using the PCI bridge address to make the 
association with a specific slot. When they are on, they always have 
the same address. A colleague made a map between physical slots and PCI 
bridge addresses; from this we can extract the bus number and identify 
the cards. But, well, I was looking for better solutions :)

-- 
Federico Vaga


Re: PCIe bus enumeration

2014-07-04 Thread Bjorn Helgaas
On Fri, Jul 04, 2014 at 09:55:20AM +0200, Federico Vaga wrote:
> > I assume these ports don't support hotplug.  If they *did* support
> > hotplug, those ports would have to exist because they handle the
> > hotplug events (presence detect, etc.)
> 
> I asked: yes, they do not support hotplug
> 
> > If you can collect the complete "lspci -vv" output from your machine
> > (with a device plugged in, so we can see the port leading to it),
> > that will help make this more concrete.  And maybe one with no
> > devices plugged in, so we can see exactly what changes.
> 
> I attached two files with the output. I putted a card in slot 10 and 
> took the output, then moved the card on slot 11 and took the output.
> 
> As you can see with diff the bridge behind the slot disappear when it 
> is empty.

Perfect, thanks!  For some reason, it really helps me to be able to stare
at the actual data.  Here's the situation with slot 10 occupied:

  00:01.0 82Q35 Root Port to [bus 05]  PCIe SltCap slot #21
  05:00.0 CERN/ECP/EDU Device  slot 10
  00:1c.0 82801I Express Port 1 to [bus 04]PCIe SltCap slot #22
  00:1c.3 (not present at all)
  00:1c.4 82801I Express Port 5 to [bus 03]PCIe SltCap slot #0
  03:00.0 Realtek NIC

and here it is with slot 11 occupied:

  00:01.0 (not present at all)
  00:1c.0 82801I Express Port 1 to [bus 05]PCIe SltCap slot #22
  00:1c.3 82801I Express Port 4 to [bus 04]PCIe SltCap slot #25
  04:00.0 CERN/ECP/EDU Device  slot 11
  00:1c.4 82801I Express Port 5 to [bus 03]PCIe SltCap slot #0
  03:00.0 Realtek NIC

I'm pretty sure this is a function of your BIOS.  There are often
device-specific ways to enable or disable individual devices (like the root
ports here), and the BIOS is likely disabling these ports when there is
nothing below them.  I don't know why it would turn off 00:1c.3 when its
slot is empty, but it doesn't turn off 00:1c.0, which also leads to an
empty slot.  But I don't think Linux is involved in this, and if the BIOS
disables devices, there really isn't anything Linux can do about it.

If you can get to an EFI shell on this box, you might be able to confirm
this with the "pci" command.  Booting Linux with "pci=earlydump" is similar
in that it dumps PCI config space before we change anything.

To solve this problem, I think you need slot information even when there's
no hotplug.  This has been raised before [1, 2], and I think it's a good
idea, but nobody has implemented it yet.

Another curious thing is that you refer to "slot 10", but there's no
obvious connection between that and the "slot 21" in the PCIe capability of
the Root Port leading to that slot.  But I guess you said the slots are in
a backplane (they're not an integral part of the motherboard).  In that
case, there's no way for the motherboard to know what the labels on the
backplane are.

Bjorn

[1] 
http://lkml.kernel.org/r/CAErSpo45sDNPt6=yw-qgqdojyl8+_jnovnenvxrlatga+by...@mail.gmail.com
[2] https://bugzilla.kernel.org/show_bug.cgi?id=72681


Re: PCIe bus enumeration

2014-07-03 Thread Bjorn Helgaas
On Thu, Jul 3, 2014 at 2:40 PM, Federico Vaga  wrote:
> On Thursday 03 July 2014 13:43:14 Bjorn Helgaas wrote:

>> The /sys/bus/pci/slots/*/address files might help.  On my system, I
>> have:
>>
>>   $ grep . /sys/bus/pci/slots/*/address /dev/null
>>   /sys/bus/pci/slots/5/address::03:00
>
> My slots directory is empty on 3.2, 3.6, 3.14. I have to compile the
> kernel with a
> particular configuration? Use a kernel parameter?

Should be built-in, no parameter needed.  I think this is from
pci_create_slot() in drivers/pci/slot.c.  That's called from
register_slot() (drivers/acpi/pci_slot.c, which obviously depends on
the BIOS) and indirectly from pciehp (which doesn't depend on the BIOS
and reads a slot number from the PCIe capability).  "lspci -vv" will
show you this slot number in the SltCap (if the port supports a slot),
e.g.,

  00:1c.3 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 4
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
Slot #3, PowerLimit 10.000W; Interlock- NoCompl+

Since you don't see these, my guess is that your ports don't indicate
that they support a slot, e.g., they might look like this:

  00:1c.3 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 4
Capabilities: [40] Express (v2) Root Port (Slot-), MSI 00

The "Slot-" means the port doesn't have a slot, and lspci won't show
you the SltCap register, and I think the kernel won't put anything in
/sys/bus/pci/slots.
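
For a quick illustration (standard kernel helper and macros; the function
name is made up), the Slot+/Slot- that lspci prints is the Slot Implemented
bit in the PCI Express Capabilities register:

#include <linux/pci.h>

/* Illustrative only: does this root/downstream port claim to have a slot? */
static bool example_port_implements_slot(struct pci_dev *port)
{
	u16 flags = 0;

	pcie_capability_read_word(port, PCI_EXP_FLAGS, &flags);
	return !!(flags & PCI_EXP_FLAGS_SLOT);	/* bit 8: Slot Implemented */
}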

>> "lspci -v" also shows:
>>
>>   03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd.
>> Device 5227 (rev 01)
>> Physical Slot: 5
>
> My lspci hasn't the "Physical Slot" field. However, where does it take
> this information?
> From the BIOS I suppose, a recent BIOS.

From looking at the lspci source
(git://git.kernel.org/pub/scm/utils/pciutils/pciutils.git), it looks
like that "Physical Slot" comes from /sys/bus/pci/slots/..., so if you
don't have anything there, you won't see "Physical Slot".

>> I don't think the behavior should be different on PCIe, but maybe if
>> you have an example, it will help me figure out why it is
>> different.  My current machine has three Root Ports (which are
>> treated as PCI-to-PCI bridges), and they all have secondary bus
>> numbers assigned, even though only two have devices below them:
>>
>>+-1c.0-[01]--
>>+-1c.3-[02]00.0
>>+-1c.5-[03]00.0
>
> What I observed is that when several PCIe slot belong to a single PCI
> Bridge, and you
> plug a board in one on these, then it enumerates all secondary buses,
> also the
> empty ones (like your case, all your slot belong to device 1c).
>
> But, if you un-plug the devices on secondary bus 02 and 03, you should
> not see the
> device 1c anymore. This is what is happening with my machine
> [industrial backplane
> with several PCI(e) slots and the motherboard plugged in a special
> slot.].

I think there's something unusual going on with your machine.  I can't
remove the devices on my machine (a laptop), but normally the Root
Ports or Downstream Ports leading to the slots continue to exist even
if the slots are empty.  In your case, it sounds like there's some
hardware that is turning off power to those ports when the slots are
all empty.

I assume these ports don't support hotplug.  If they *did* support
hotplug, those ports would have to exist because they handle the
hotplug events (presence detect, etc.)

If you can collect the complete "lspci -vv" output from your machine
(with a device plugged in, so we can see the port leading to it), that
will help make this more concrete.  And maybe one with no devices
plugged in, so we can see exactly what changes.

Bjorn


Re: PCIe bus enumeration

2014-07-03 Thread Federico Vaga
(Sorry for double emailing; a software update changed my configuration to 
HTML email as default. So, the linux kernel mailing list complains that 
I'm probably spamming.)

On Thursday 03 July 2014 13:43:14 Bjorn Helgaas wrote:
> On Thu, Jul 3, 2014 at 10:45 AM, Federico Vaga 
 wrote:
> > Hello,
> > 
> > (I haven't a deep knowledge of the PCIe specification, maybe I'm
> > just missing something)
> > 
> > is there a way to force the PCI subsystem to assign a bus-number
> > to
> > every PCIe bridge, even if there is nothing connected?
> > 
> > 
> > My aim is to have a bus enumeration constant and independent from
> > what I plugged on the system. So, I can associate a physical slot
> > to linux device address bb:dd.f. Is it possible?

More information that I forgot to add. I'm working on kernel 3.2 and 
3.6.

> The /sys/bus/pci/slots/*/address files might help.  On my system, I
> have:
> 
>   $ grep . /sys/bus/pci/slots/*/address /dev/null
>   /sys/bus/pci/slots/5/address::03:00

My slots directory is empty on 3.2, 3.6, 3.14. Do I have to compile the 
kernel with a particular configuration? Use a kernel parameter?

> "lspci -v" also shows:
> 
>   03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd.
> Device 5227 (rev 01)
> Physical Slot: 5

My lspci doesn't have the "Physical Slot" field. However, where does it take 
this information from?
From the BIOS, I suppose, a recent BIOS.

So if you look at your motherboard you can identify which one is 
slot 5.

> If you want to start with a physical slot number and figure out the
> bb.dd associated with it, the /sys/bus/pci/slots files are probably
> the most straightforward way.
> 
> > I can do the mapping with a simple shell script by discovering the
> > "new" bus number, but I'm wondering if there is a way to have a
> > constant bus enumeration.
> > 
> > 
> > 
> > My Humble Observation
> > -
> > It seems (to me) that for PCI the kernel assigns a bus-number to
> > every PCI bridges and sub-bridges even if there is nothing
> > connected:
> > 
> > 
> > e.g. from lspci -t
> > 
> >   [...]
> >   +-1e.0-[04-05]0c.0-[05]--
> > 
> > 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
> > 04:0c.0 PCI bridge: Texas Instruments PCI2050 PCI-to-PCI Bridge
> > (rev 02)
> 
> Yes.  I think you're talking about the bridge "secondary bus
> number". In this case the 04:0c.0 bridge has secondary bus 05, and
> there are no devices on bus 05.

yep

> > The behavior on PCIe seems different. When there is nothing
> > plugged on a bus, then the kernel doesn't assign any bus-number
> > and it doesn't detect any PCI-Bridge at all. So, when I reboot
> > the system with a new PCIe card the bus enumeration may change.
> 
> I don't think the behavior should be different on PCIe, but maybe if
> you have an example, it will help me figure out why it is
> different.  My current machine has three Root Ports (which are
> treated as PCI-to-PCI bridges), and they all have secondary bus
> numbers assigned, even though only two have devices below them:
> 
>+-1c.0-[01]--
>+-1c.3-[02]00.0
>+-1c.5-[03]00.0

What I observed is that when several PCIe slots belong to a single PCI 
bridge, and you plug a board into one of these, then it enumerates all 
secondary buses, also the empty ones (like your case, where all your 
slots belong to device 1c).

But, if you unplug the devices on secondary buses 02 and 03, you should 
not see device 1c anymore. This is what is happening with my machine 
[an industrial backplane with several PCI(e) slots and the motherboard 
plugged into a special slot].

Even on sysfs the device doesn't appear.

> We have to assign a secondary bus number in order to enumerate below
> the bridge.  We can't even tell whether the bus is empty until we
> enumerate it.

Yep, I read the code and that's what I understood. 

> We should assign a secondary bus number, then
> enumerate the secondary bus (possibly finding nothing).  If we
> don't find anything, I think we currently leave the secondary bus
> number assigned even though the bus is empty.

I'll try to check :)


Thank you Bjorn

-- 
Federico Vaga


Re: PCIe bus enumeration

2014-07-03 Thread Bjorn Helgaas
On Thu, Jul 3, 2014 at 10:45 AM, Federico Vaga  wrote:
> Hello,
>
> (I haven't a deep knowledge of the PCIe specification, maybe I'm just
> missing something)
>
> is there a way to force the PCI subsystem to assign a bus-number to
> every PCIe bridge, even if there is nothing connected?
>
>
> My aim is to have a bus enumeration constant and independent from what
> I plugged on the system. So, I can associate a physical slot to linux
> device address bb:dd.f. Is it possible?

The /sys/bus/pci/slots/*/address files might help.  On my system, I have:

  $ grep . /sys/bus/pci/slots/*/address /dev/null
  /sys/bus/pci/slots/5/address::03:00

"lspci -v" also shows:

  03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd.
Device 5227 (rev 01)
Physical Slot: 5

If you want to start with a physical slot number and figure out the
bb.dd associated with it, the /sys/bus/pci/slots files are probably
the most straightforward way.

> I can do the mapping with a simple shell script by discovering the
> "new" bus number, but I'm wondering if there is a way to have a
> constant bus enumeration.
>
>
>
> My Humble Observation
> -
> It seems (to me) that for PCI the kernel assigns a bus-number to every
> PCI bridges and sub-bridges even if there is nothing connected:
>
>
> e.g. from lspci -t
>
>   [...]
>   +-1e.0-[04-05]0c.0-[05]--
>
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
> 04:0c.0 PCI bridge: Texas Instruments PCI2050 PCI-to-PCI Bridge (rev
> 02)

Yes.  I think you're talking about the bridge "secondary bus number".
In this case the 04:0c.0 bridge has secondary bus 05, and there are no
devices on bus 05.

> The behavior on PCIe seems different. When there is nothing plugged on
> a bus, then the kernel doesn't assign any bus-number and it doesn't
> detect any PCI-Bridge at all. So, when I reboot the system with a new
> PCIe card the bus enumeration may change.

I don't think the behavior should be different on PCIe, but maybe if
you have an example, it will help me figure out why it is different.
My current machine has three Root Ports (which are treated as
PCI-to-PCI bridges), and they all have secondary bus numbers assigned,
even though only two have devices below them:

   +-1c.0-[01]--
   +-1c.3-[02]00.0
   +-1c.5-[03]00.0

We have to assign a secondary bus number in order to enumerate below
the bridge.  We can't even tell whether the bus is empty until we
enumerate it.  We should assign a secondary bus number, then enumerate
the secondary bus (possibly finding nothing).  If we don't find
anything, I think we currently leave the secondary bus number assigned
even though the bus is empty.
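
For reference, the register in question is an ordinary config-space field in
the bridge header; a minimal sketch (function name made up) of reading it
with the standard accessors:

#include <linux/pci.h>

/* Illustrative only: dump a bridge's primary/secondary/subordinate bus numbers. */
static void example_dump_bridge_buses(struct pci_dev *bridge)
{
	u8 primary, secondary, subordinate;

	pci_read_config_byte(bridge, PCI_PRIMARY_BUS, &primary);
	pci_read_config_byte(bridge, PCI_SECONDARY_BUS, &secondary);
	pci_read_config_byte(bridge, PCI_SUBORDINATE_BUS, &subordinate);

	dev_info(&bridge->dev, "primary %02x, secondary %02x, subordinate %02x\n",
		 primary, secondary, subordinate);
}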

Bjorn


Re: PCIE resetting graphic card

2013-07-11 Thread Dave Young
Hi, Takao
On 07/12/2013 10:39 AM, Takao Indoh wrote:
> Hi Dave,
> 
> (2013/07/12 11:04), Dave Young wrote:
>> Hi, Takao
>>
>> I know you are working on the PCIE resetting patches for the iommu kdump
>> issue.
>>
>> You explicitly excluded the graphic card in your patch. I have some
>> questions about this. Why can't we reset the graphic card like other
>> pcie devices?
> 
> As far as I tested, the monitor blacks out after video controller is
> reset, and we cannot know what's going on. So, for now display device is
> not reset in my patch.
> 
> I'm not sure what we need to do to recover graphic card after its reset,
> but my colleague said that we need to run BIOS code to get back legacy
> VGA mode after reset. It seems not to be easy:-(
> 
> Maybe this document is helpful to do this.
> http://www.coreboot.org/images/2/2b/Vgabios.pdf

Thanks for the quick response and the info about vgabios. It's awkward that
we cannot switch back to VGA mode.

-- 
Thanks
Dave


Re: PCIE resetting graphic card

2013-07-11 Thread Takao Indoh
Hi Dave,

(2013/07/12 11:04), Dave Young wrote:
> Hi, Takao
> 
> I know you are working on the PCIE resetting patches for the iommu kdump
> issue.
> 
> You explicitly excluded the graphic card in your patch. I have some
> questions about this. Why can't we reset the graphic card like other
> pcie devices?

As far as I tested, the monitor blacks out after the video controller is
reset, and we cannot know what's going on. So, for now the display device is
not reset in my patch.

I'm not sure what we need to do to recover the graphics card after its reset,
but my colleague said that we need to run BIOS code to get back to legacy
VGA mode after reset. It seems not to be easy :-(

Maybe this document is helpful to do this.
http://www.coreboot.org/images/2/2b/Vgabios.pdf

Thanks,
Takao Indoh

> 
> We have problems, if 1st kernel is in kms mode kdump kernel will have no
> chance to switch back to VGA console. There's no serial port on most of
> recent laptops thus it's difficult to debug kdump issue.
> 
> So if we can reset graphic card as well, and if it works I wonder if the
> 2nd kdump kernel can use vga console with nomodeset?





Re: pcie aspm link setup, grandparent instead of parent?

2013-06-13 Thread Radim Krčmář
2013-06-12 15:54-0600, Bjorn Helgaas:
> [+cc linux-pci, Myron, Joe]

I'll remember it.

> On Wed, Jun 12, 2013 at 11:21 AM, Radim Krčmář  wrote:
> > Hello,
> >
> > as a consequence of hitting a NULL dereference bug[1] while downstream
> > aspm is setting up link_state, I started to wonder why is the code
> > skipping its parent bus in favour of grandparent's link_state.[2]
> >
> > Is this right? (I have no device to test on ...)

No,
pcie_link_state covers both the upstream and the downstream port, so we skip
the upstream port of the current device.

Must the "parent device" (grandparent) have a valid "->self"?
(qemu even allows topologies without the upstream port, so it does not
count for much ...)

> > Thanks.
> >
> > ---
> > 1: https://bugzilla.redhat.com/show_bug.cgi?id=972381
> >The bug is hit because "pdev->bus->parent" has NULL "->parent" and thus
> >NULL "->self".
> > 2: "pdev = bus->self", so "pdev->bus->parent == bus->parent->parent"
> >
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 403a443..d58e282 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -527,7 +527,7 @@ static struct pcie_link_state 
> > *alloc_pcie_link_state(struct pci_dev *pdev)
> > link->pdev = pdev;
> > if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
> > struct pcie_link_state *parent;
> > -   parent = pdev->bus->parent->self->link_state;
> > +   parent = pdev->bus->self->link_state;
> > if (!parent) {
> > kfree(link);
> > return NULL;


Re: pcie aspm link setup, grandparent instead of parent?

2013-06-12 Thread Bjorn Helgaas
[+cc linux-pci, Myron, Joe]

On Wed, Jun 12, 2013 at 11:21 AM, Radim Krčmář  wrote:
> Hello,
>
> as a consequence of hitting a NULL dereference bug[1] while downstream
> aspm is setting up link_state, I started to wonder why is the code
> skipping its parent bus in favour of grandparent's link_state.[2]
>
> Is this right? (I have no device to test on ...)
>
> Thanks.
>
> ---
> 1: https://bugzilla.redhat.com/show_bug.cgi?id=972381
>The bug is hit because "pdev->bus->parent" has NULL "->parent" and thus
>NULL "->self".
> 2: "pdev = bus->self", so "pdev->bus->parent == bus->parent->parent"
>
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 403a443..d58e282 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -527,7 +527,7 @@ static struct pcie_link_state 
> *alloc_pcie_link_state(struct pci_dev *pdev)
> link->pdev = pdev;
> if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
> struct pcie_link_state *parent;
> -   parent = pdev->bus->parent->self->link_state;
> +   parent = pdev->bus->self->link_state;
> if (!parent) {
> kfree(link);
> return NULL;


RE: PCIE: 32 bit prefetchable memory not getting programmed in BARs

2013-03-22 Thread Jay Agarwal
Hi All,

My graphics card has 4 memory requirements:

a. 32 bit non-prefetchable
b. 32 bit prefetchable
c. 64 bit non-prefetchable.
d. 64 bit prefetchable

And I see that a is programmed in BAR0 (offset 0x10), c in BAR3 (offset 0x1C),
and d in BAR1 (offset 0x14), but b is not programmed anywhere.

So is it expected behaviour?

With best,
Jay




Re: PCIe code Upstream

2013-03-22 Thread Thierry Reding
On Fri, Mar 22, 2013 at 05:59:13PM +0530, Jay Agarwal wrote:
> I don't see can't assign things now on Thierry's latest repo() + my tegra3 
> support patch, but some reconfiguring things are still there as below:
> 
>   bridge configuration invalid ([bus 00-00]), reconfiguring
> 
> 
> 
> I am not sure if it's harmless?

That's harmless.

Thierry




Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Bjorn Helgaas
On Fri, Oct 26, 2012 at 7:39 PM, Cyberman Wu  wrote:
> On Sat, Oct 27, 2012 at 12:28 AM, Bjorn Helgaas  wrote:
>> On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf  wrote:
>>
>>> Cyberman: it seems like your bias hack is working for you.  But, as Bjorn
>>> says, this sounds like a driver bug.  What happens if you just revert your
>>> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
>>> just say "if (!res_len)"?  That seems like the true error test.  If that
>>> works, you should submit that change to the community.
>>
>> I don't *think* that is going to be enough, even with the kernel that
>> has some I/O space support, because both devices are assigned
>> identical resources:
>>
>>   pci :01:00.0: BAR 2: assigned [io  0x-0x007f]
>>   pci 0001:01:00.0: BAR 2: assigned [io  0x-0x007f]
>>
>> The I/O space support that's there is broken because we think the same
>> I/O range is available on both root buses, which is probably not the
>> case:
>>
>>   pci_bus :00: resource 0 [io  0x-0x]
>>   pci_bus 0001:00: resource 0 [io  0x-0x]
>>
> That's the problem I want to confirm what I've changed is correct. I've split
> the two RootComplex using separate I/O range, it seems works on our device,
> but since I'm not very clear about Linux kernel, I want some some to check it.
> For mvsas, I've already modified it some thing like Chris said when I began
> using MDE-4.0.0 GA release. I bring it out to see if there have some ideas
> about that issue.

Some architectures do implement multiple I/O ranges.  Typical HP
parisc and ia64 boxes have a PCI host bridge for every slot, so each
slot can be in a separate PCI domain, and each host bridge can support
a separate 64KB I/O port space for its slot.  In that case, the values
in the struct resource will be different from the actual addresses
that appear on the PCI buses.

For example, you might have bridge A leading to bus :00 with [io
0x-0x] and bridge B leading to bus 0001:00 with [io
0x1-0x1].  The I/O port addresses used by drivers don't
overlap, and there's no ambiguity, but if you put an analyzer on bus
0001:00, you'd see port addresses in the 0x-0x range.  If you
moved the analyzer to bus :00, you'd see the same 0x-0x
range of port addresses.  It's up to the architecture implementation
of inb()/outb()/etc. to map an I/O resource address to a host bridge
and a bus port address behind that bridge.
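
Purely as an illustration of that idea (every name below is hypothetical, not
an existing kernel API): give each host bridge its own window in the Linux
I/O resource space, and have the port accessors subtract the window base to
recover the bus-local port number.

#include <linux/io.h>
#include <linux/types.h>

struct example_io_window {
	unsigned long base;	/* e.g. 0x0 for domain 0000, 0x10000 for domain 0001 */
	void __iomem *regs;	/* bridge window that generates the I/O cycles */
};

/* Illustrative only: an inb()-style accessor for one bridge's window. */
static u8 example_inb(struct example_io_window *win, unsigned long port)
{
	unsigned long bus_port = port - win->base;	/* back to 0x0000-0xffff on that bus */

	return readb(win->regs + bus_port);	/* assumes the bridge memory-maps I/O cycles */
}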

The bottom line is that what you want to do seems possible and makes
some sense.  Of course, the diff you posted is useless for upstream
Linux because it's all entangled with MDE and it reverts a lot of the
recent Linux work.  But what you want to do is possible in principle.
It's up to you and Chris to figure out whether and how to rework the
changes to add this functionality cleanly.

Bjorn


Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Cyberman Wu
On Sat, Oct 27, 2012 at 12:28 AM, Bjorn Helgaas  wrote:
> On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf  wrote:
>
>> Cyberman: it seems like your bias hack is working for you.  But, as Bjorn
>> says, this sounds like a driver bug.  What happens if you just revert your
>> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
>> just say "if (!res_len)"?  That seems like the true error test.  If that
>> works, you should submit that change to the community.
>
> I don't *think* that is going to be enough, even with the kernel that
> has some I/O space support, because both devices are assigned
> identical resources:
>
>   pci :01:00.0: BAR 2: assigned [io  0x-0x007f]
>   pci 0001:01:00.0: BAR 2: assigned [io  0x-0x007f]
>
> The I/O space support that's there is broken because we think the same
> I/O range is available on both root buses, which is probably not the
> case:
>
>   pci_bus :00: resource 0 [io  0x-0x]
>   pci_bus 0001:00: resource 0 [io  0x-0x]
>
That's the problem; I want to confirm that what I've changed is correct. I've split
the two root complexes to use separate I/O ranges, and it seems to work on our device,
but since I'm not very familiar with the Linux kernel, I want someone to check it.
For mvsas, I've already modified it somewhat like Chris said when I began
using the MDE-4.0.0 GA release. I bring it up to see if there are some ideas
about that issue.

> If mvsas really doesn't need the I/O BAR, I think it's likely that
> making it use pci_enable_device_mem() will make both devices work even
> without I/O space support in the kernel.
>
>> Bjorn et al: does it seem reasonable to add a bias to the mappings so that
>> we never report a zero value as valid?  This may be sufficiently defensive
>> programming that it's just the right thing to do regardless of whether
>> drivers are technically at fault or not.  If so, what's a good bias?  (I'm
>> inclined to think 64K rather than 4K.)
>
> I/O space is very limited to begin with (many architectures only
> *have* 64K), so I hesitate to add a bias in the PCI core.  But we do
> something similar in arch_remove_reservations(), and I think you could
> implement it that way if you wanted to.
>
> Bjorn



-- 
Cyberman Wu


Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Bjorn Helgaas
On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf  wrote:

> Cyberman: it seems like your bias hack is working for you.  But, as Bjorn
> says, this sounds like a driver bug.  What happens if you just revert your
> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
> just say "if (!res_len)"?  That seems like the true error test.  If that
> works, you should submit that change to the community.

I don't *think* that is going to be enough, even with the kernel that
has some I/O space support, because both devices are assigned
identical resources:

  pci :01:00.0: BAR 2: assigned [io  0x-0x007f]
  pci 0001:01:00.0: BAR 2: assigned [io  0x-0x007f]

The I/O space support that's there is broken because we think the same
I/O range is available on both root buses, which is probably not the
case:

  pci_bus :00: resource 0 [io  0x-0x]
  pci_bus 0001:00: resource 0 [io  0x-0x]

If mvsas really doesn't need the I/O BAR, I think it's likely that
making it use pci_enable_device_mem() will make both devices work even
without I/O space support in the kernel.

> Bjorn et al: does it seem reasonable to add a bias to the mappings so that
> we never report a zero value as valid?  This may be sufficiently defensive
> programming that it's just the right thing to do regardless of whether
> drivers are technically at fault or not.  If so, what's a good bias?  (I'm
> inclined to think 64K rather than 4K.)

I/O space is very limited to begin with (many architectures only
*have* 64K), so I hesitate to add a bias in the PCI core.  But we do
something similar in arch_remove_reservations(), and I think you could
implement it that way if you wanted to.

Bjorn


Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Chris Metcalf
On 10/26/2012 4:03 AM, Bjorn Helgaas wrote:
> [+cc Chris, also a few comments below]

Bjorn, thanks for looping me in.

> On Fri, Oct 26, 2012 at 12:59 AM, Cyberman Wu  wrote:
>> After we upgrade to MDE 4.1.0 from Tilera, we encounter a problem that
>> only on HighPoint 2680 card works, I've
>> tried to fix it, but since most time I'm working in user space, I'm
>> not sure my fix is enough. Their FAE said that
>> the guy who add PCIe I/O space support is on vacation and I can't get
>> help from him now, I hope maybe there
>> will have somebody can help.

I asked internally and I think the response provided by Tilera was more
along the lines of "it may take a bit longer to get you an answer". :-)

> Per http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/01176.html,
> Chris considered adding I/O space support and decided against it at
> that time, partly because it would use up a TRIO PIO region.
>
> I don't know his current thoughts.  Possibly it could be done under a
> config option or something.

In the end, we did code up I/O port support in the root complex driver,
since we had a customer with a device that needed it.  We haven't yet
returned it to the community - getting the basic PCI root complex support
returned (and networking support) took way more time than I had anticipated
so I'm batching up the next round of stuff to return for later.  But
"later" is probably going to come fairly soon since it would make sense to
target the 3.8 merge window.

>> It works now. But I really need some one to confirm whether my
>> modification is enough or not,
>> if there have other potential problems.

Cyberman: it seems like your bias hack is working for you.  But, as Bjorn
says, this sounds like a driver bug.  What happens if you just revert your
changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
just say "if (!res_len)"?  That seems like the true error test.  If that
works, you should submit that change to the community.

Bjorn et al: does it seem reasonable to add a bias to the mappings so that
we never report a zero value as valid?  This may be sufficiently defensive
programming that it's just the right thing to do regardless of whether
drivers are technically at fault or not.  If so, what's a good bias?  (I'm
inclined to think 64K rather than 4K.)

Thanks!

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Bjorn Helgaas
On Fri, Oct 26, 2012 at 3:01 AM, Cyberman Wu  wrote:
> We're not using 3.6.x, we're using is from MDE-4.1.0 from Tilera and
> it patch 3.0.38.

That's fine, but you sent the email to the linux-pci and linux-kernel
lists, and on those lists, we're only concerned with the upstream
Linux kernels, e.g., 3.6.  If you need support for MDE-4.1.0, you need
to talk to whoever supplies that, because we have no idea what it is.

> For mvsas, it seems do think 0 I/O address invalied.

That's a driver bug.  Zero is a perfectly valid I/O address.  On many
systems it's not usable because of platform restrictions, but the
driver has no way to know about those restrictions, and the driver
should still work on the platforms where zero *is* usable.

> When we using MDE-4.0.0 it don't support I/O space, I just bypass
> these check since after
> investigate all code of mvsas it seems that I/O space map to BAR 2 is
> not really used.

If the driver doesn't need I/O space, it'd be a lot simpler to just
change it to use pci_enable_device_mem(), which indicates that we
don't need to enable I/O BARs, and strip out the code that checks
whether the I/O BARs are valid.  Then you wouldn't need to mess with
adding I/O space support in your platform.
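
A rough, self-contained sketch of that suggestion (illustrative only, not an
actual mvsas patch; the function name is made up):

#include <linux/pci.h>

/* Enable only the memory BARs and validate an I/O BAR by flags/length. */
static int example_setup_io_bar(struct pci_dev *pdev, int bar)
{
	int rc;

	rc = pci_enable_device_mem(pdev);	/* I/O BARs need not be enabled */
	if (rc)
		return rc;

	if (!(pci_resource_flags(pdev, bar) & IORESOURCE_IO))
		return -ENODEV;			/* not an I/O BAR at all */

	if (!pci_resource_len(pdev, bar))
		return -ENODEV;			/* BAR not implemented */

	/* pci_resource_start(pdev, bar) may legitimately be 0 here */
	return 0;
}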

Bjorn


Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Cyberman Wu
We're not using 3.6.x; what we're using is from MDE-4.1.0 from Tilera, and
it patches 3.0.38.
From its release notes, PCIe I/O space is already supported.
I provide a diff of pci_gx.c between 3.6.3 and MDE-4.1.0 as a hint,
since I don't know if it's allowed to use attached files on the mailing
list, and their patch for 3.0.38 is bigger than 7MB.

For mvsas, it does seem to think a 0 I/O address is invalid. Here is some
code from drivers/scsi/mvsas/mv_init.c:
int mvs_ioremap(struct mvs_info *mvi, int bar, int bar_ex)
{
        unsigned long res_start, res_len, res_flag, res_flag_ex = 0;
        struct pci_dev *pdev = mvi->pdev;
        if (bar_ex != -1) {
                /*
                 * ioremap main and peripheral registers
                 */
                res_start = pci_resource_start(pdev, bar_ex);
                res_len = pci_resource_len(pdev, bar_ex);
                if (!res_start || !res_len)
                        goto err_out;

                res_flag_ex = pci_resource_flags(pdev, bar_ex);
                if (res_flag_ex & IORESOURCE_MEM) {
                        if (res_flag_ex & IORESOURCE_CACHEABLE)
                                mvi->regs_ex = ioremap(res_start, res_len);
                        else
                                mvi->regs_ex = ioremap_nocache(res_start, res_len);
                } else
                        mvi->regs_ex = (void *)res_start;
                if (!mvi->regs_ex)
                        goto err_out;
        }

        res_start = pci_resource_start(pdev, bar);
        res_len = pci_resource_len(pdev, bar);
        if (!res_start || !res_len)
                goto err_out;

        res_flag = pci_resource_flags(pdev, bar);
        if (res_flag & IORESOURCE_CACHEABLE)
                mvi->regs = ioremap(res_start, res_len);
        else
                mvi->regs = ioremap_nocache(res_start, res_len);

        if (!mvi->regs) {
                if (mvi->regs_ex && (res_flag_ex & IORESOURCE_MEM))
                        iounmap(mvi->regs_ex);
                mvi->regs_ex = NULL;
                goto err_out;
        }

        return 0;
err_out:
        return -1;
}

For 64xx, bar_ex is I/O space, and
        res_start = pci_resource_start(pdev, bar_ex);
        res_len = pci_resource_len(pdev, bar_ex);
        if (!res_start || !res_len)
                goto err_out;
will cause the driver load to fail.

When we were using MDE-4.0.0 it didn't support I/O space, so I just
bypassed these checks; after investigating all of the mvsas code it
seems that the I/O space mapped to BAR 2 is not really used.

When the same card is inserted into an x86 platform, the allocated I/O
space does not start from 0, so it works fine.


On Fri, Oct 26, 2012 at 4:03 PM, Bjorn Helgaas  wrote:
> [+cc Chris, also a few comments below]
>
> On Fri, Oct 26, 2012 at 12:59 AM, Cyberman Wu  wrote:
>> After we upgraded to MDE 4.1.0 from Tilera, we encountered a problem where
>> only one HighPoint 2680 card works. I've tried to fix it, but since most of
>> the time I'm working in user space, I'm not sure my fix is enough. Their FAE
>> said that the guy who added PCIe I/O space support is on vacation and I can't
>> get help from him now, so I hope somebody here can help.
>>
>>
>> Problem we encountered:
>>
>> pci :00:00.0: BAR 8: assigned [mem 0x100c000-0x100c00f]
>> pci :00:00.0: BAR 9: assigned [mem 0x100c010-0x100c01f pref]
>> pci :00:00.0: BAR 7: assigned [io  0x-0x0fff]
>> pci :01:00.0: BAR 6: assigned [mem 0x100c010-0x100c013 pref]
>> pci :01:00.0: BAR 6: set to [mem 0x100c010-0x100c013 pref]
>> (PCI address [0xc010-0xc013])
>> pci :01:00.0: BAR 4: assigned [mem 0x100c000-0x100c000 64bit]
>> pci :01:00.0: BAR 4: set to [mem 0x100c000-0x100c000
>> 64bit] (PCI address [0xc000-0xc000])
>> pci :01:00.0: BAR 2: assigned [io  0x-0x007f]
>> pci :01:00.0: BAR 2: set to [io  0x-0x007f] (PCI address [0x0-0x7f])
>> pci :00:00.0: PCI bridge to [bus 01-01]
>> pci :00:00.0:   bridge window [io  0x-0x0fff]
>> pci :00:00.0:   bridge window [mem 0x100c000-0x100c00f]
>> pci :00:00.0:   bridge window [mem 0x100c010-0x100c01f pref]
>> pci 0001:00:00.0: BAR 8: assigned [mem 0x101c000-0x101c00f]
>> pci 0001:00:00.0: BAR 9: assigned [mem 0x101c010-0x101c01f pref]
>> pci 0001:00:00.0: BAR 7: assigned [io  0x-0x0fff]
>> pci 0001:01:00.0: BAR 6: assigned [mem 0x101c010-0x101c013 pref]
>> pci 0001:01:00.0: BAR 6: set to [mem 0x101c010-0x101c013 pref]
>> (PCI address [0xc010-0xc013])
>> pci 0001:01:00.0: BAR 4: assigned [mem 0x101c000-0x101c000 64bit]
>> pci 0001:01:00.0: BAR 4: set to [mem 0x101c000-0x101c000
>> 64bit] (PCI address [0xc000-0xc000])
>> pci 0001:01:00.0: BAR 2: assigned [io  0x-0x007f]
>> pci 0001:01:00.0: BAR 2: set to [io  0x-0x007f] (PCI ad

Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

2012-10-26 Thread Bjorn Helgaas
[+cc Chris, also a few comments below]

On Fri, Oct 26, 2012 at 12:59 AM, Cyberman Wu  wrote:
> After we upgraded to MDE 4.1.0 from Tilera, we encountered a problem where
> only one HighPoint 2680 card works. I've tried to fix it, but since most of
> the time I'm working in user space, I'm not sure my fix is enough. Their FAE
> said that the guy who added PCIe I/O space support is on vacation and I can't
> get help from him now, so I hope somebody here can help.
>
>
> Problem we encountered:
>
> pci :00:00.0: BAR 8: assigned [mem 0x100c000-0x100c00f]
> pci :00:00.0: BAR 9: assigned [mem 0x100c010-0x100c01f pref]
> pci :00:00.0: BAR 7: assigned [io  0x-0x0fff]
> pci :01:00.0: BAR 6: assigned [mem 0x100c010-0x100c013 pref]
> pci :01:00.0: BAR 6: set to [mem 0x100c010-0x100c013 pref]
> (PCI address [0xc010-0xc013])
> pci :01:00.0: BAR 4: assigned [mem 0x100c000-0x100c000 64bit]
> pci :01:00.0: BAR 4: set to [mem 0x100c000-0x100c000
> 64bit] (PCI address [0xc000-0xc000])
> pci :01:00.0: BAR 2: assigned [io  0x-0x007f]
> pci :01:00.0: BAR 2: set to [io  0x-0x007f] (PCI address [0x0-0x7f])
> pci :00:00.0: PCI bridge to [bus 01-01]
> pci :00:00.0:   bridge window [io  0x-0x0fff]
> pci :00:00.0:   bridge window [mem 0x100c000-0x100c00f]
> pci :00:00.0:   bridge window [mem 0x100c010-0x100c01f pref]
> pci 0001:00:00.0: BAR 8: assigned [mem 0x101c000-0x101c00f]
> pci 0001:00:00.0: BAR 9: assigned [mem 0x101c010-0x101c01f pref]
> pci 0001:00:00.0: BAR 7: assigned [io  0x-0x0fff]
> pci 0001:01:00.0: BAR 6: assigned [mem 0x101c010-0x101c013 pref]
> pci 0001:01:00.0: BAR 6: set to [mem 0x101c010-0x101c013 pref]
> (PCI address [0xc010-0xc013])
> pci 0001:01:00.0: BAR 4: assigned [mem 0x101c000-0x101c000 64bit]
> pci 0001:01:00.0: BAR 4: set to [mem 0x101c000-0x101c000
> 64bit] (PCI address [0xc000-0xc000])
> pci 0001:01:00.0: BAR 2: assigned [io  0x-0x007f]
> pci 0001:01:00.0: BAR 2: set to [io  0x-0x007f] (PCI address [0x0-0x7f])
> pci 0001:00:00.0: PCI bridge to [bus 01-01]
> pci 0001:00:00.0:   bridge window [io  0x-0x0fff]
> pci 0001:00:00.0:   bridge window [mem 0x101c000-0x101c00f]
> pci 0001:00:00.0:   bridge window [mem 0x101c010-0x101c01f pref]
> pci :00:00.0: enabling device (0006 -> 0007)
> pci 0001:00:00.0: enabling device (0006 -> 0007)
> pci_bus :00: resource 0 [io  0x-0x]
> pci_bus :00: resource 1 [mem 0x100c000-0x100]
> pci_bus :01: resource 0 [io  0x-0x0fff]
> pci_bus :01: resource 1 [mem 0x100c000-0x100c00f]
> pci_bus :01: resource 2 [mem 0x100c010-0x100c01f pref]
> pci_bus 0001:00: resource 0 [io  0x-0x]
> pci_bus 0001:00: resource 1 [mem 0x101c000-0x101]
> pci_bus 0001:01: resource 0 [io  0x-0x0fff]
> pci_bus 0001:01: resource 1 [mem 0x101c000-0x101c00f]
> pci_bus 0001:01: resource 2 [mem 0x101c010-0x101c01f pref]
> ..
> mvsas :01:00.0: mvsas: driver version 0.8.2
> mvsas :01:00.0: enabling device ( -> 0003)
> mvsas :01:00.0: enabling bus mastering
> mvsas :01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
> mvsas :01:00.0: Phy3 : No sig fis
> scsi0 : mvsas
> ..
> mvsas 0001:01:00.0: mvsas: driver version 0.8.2
> mvsas 0001:01:00.0: enabling device ( -> 0003)
> mvsas 0001:01:00.0: enabling bus mastering
> mvsas 0001:01:00.0: BAR 2: can't reserve [io  0x-0x007f]
> mvsas: probe of 0001:01:00.0 failed with error -16
>
>
> My modification:
>
> --- 
> /opt/tilera/TileraMDE-4.1.0.148119/tilegx/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c
>  2012-10-22
> 14:56:59.783096378 +0800
> +++ Tilera_src/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c2012-10-26
> 13:55:02.731947886 +0800
> @@ -368,6 +368,10 @@
> int num_trio_shims = 0;
> int ctl_index = 0;
> int i, j;
> +// Modified by Cyberman Wu on Oct 25th, 2012.
> +   resource_size_t io_mem_start;
> +   resource_size_t io_mem_end;
> +   resource_size_t io_mem_size;
>
> if (!pci_probe) {
> pr_info("PCI: disabled by boot argument\n");
> @@ -457,6 +461,18 @@
> }
>
>  out:
> +   // Use IO memory space 0~0x for every controller will
> +   // cause device on controller other than the first failed to
> +   // load driver if it using IO regions.
> +   // Is reserve the first 4K IO address space OK? Tilera use
> +   // IO space address begin from 0, but some drivers in Linux
> +   // recognize 0 address a error, say, mvsas, so for compatiblity
> +   // reserve some address from 0 should be better?

It's not that mvsas thinks I/O address 0 is invalid, it's just that we
already assigned [io 0x-0x007f] to the device at :01:00.0:

  pci :01:00.0: BAR 2: set

Re: PCIE ASPM support hangs my laptop pretty often

2008-02-18 Thread Shaohua Li

On Wed, 2008-02-06 at 01:40 +0800, Дамјан Георгиевски wrote:
> I've patched my kernel with the PCIe ASPM and after setting
> echo powersave > /sys/module/pcie_aspm/parameters/policy
> 
> I started to experience random hangs of my laptop.
> Hardware info:
> Thinkpad x60s 1704-5UG
> also tested on a firends X60s 1702-F6U
> 
> Kernel is 2.6.24 + these patches:
>  tuxonice 3.0-rc5
>  thinkpad_acpi v0.19-20080107
>  tp_smapi 0.36
Hi,
Sorry for the long delay, I'm just back from vacation. Some devices or
chipsets don't work well with ASPM. This is one of the reasons why the
default policy of the patch is to follow the BIOS setting. Ideally, drivers
should disable ASPM for specific devices; the patch provides an API
(pci_disable_link_state) for this too. As Auke suggested, you can use
the per-device interface to control individual links and see which device
is broken. If you find one, please report it to the driver maintainer and me,
and we can disable ASPM in the driver.
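
For reference, a minimal sketch of how a driver might call that API from its
probe routine (illustrative only; example_probe is a made-up name, and the
pci-aspm.h header location is my assumption for this patch set):

#include <linux/pci.h>
#include <linux/pci-aspm.h>     /* assumed header declaring pci_disable_link_state() */

/* A driver that knows its device misbehaves when the link enters L1
 * can veto that state for its own link at probe time. */
static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int rc = pci_enable_device(pdev);

        if (rc)
                return rc;

        /* still allow L0s, but never let this device's link enter L1 */
        pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);

        return 0;
}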

Thanks,
Shaohua



Re: PCIE ASPM support hangs my laptop pretty often

2008-02-06 Thread Greg KH
On Wed, Feb 06, 2008 at 01:46:22PM -0800, Kok, Auke wrote:
> Rafael J. Wysocki wrote:
> > On Wednesday, 6 of February 2008, Pavel Machek wrote:
> >> On Tue 2008-02-05 16:22:55, Kok, Auke wrote:
> >>> Дамјан Георгиевски wrote:
>  I've patched my kernel with the PCIe ASPM and after setting
>  echo powersave > /sys/module/pcie_aspm/parameters/policy
> 
>  I started to experience random hangs of my laptop.
>  Hardware info:
>  Thinkpad x60s 1704-5UG
> >>> the x60's chipset doesn't support ASPM properly afaik... bad idea.
> >> Well, the code shouldn't then cause a crash of the machine :)
> > The user enabled it specifically (where it is disabled by default)
> >
> > ASPM has been crashing e1000(e), which is why I've recently merged a 
> > patch
> > to disable L1 ASPM for the onboard 82573 nic on those platforms.
> >
> > this new infrastructure should work in the default configuration - 
> > enabling
> > ASPM where this system leaves it disabled is expected to give problems
> > unless you know what you are doing.
>  In my defense, the patch documentation didn't say it doesn't work with 
>  my 
>  hardware, nor that it hangs the chipset :) and the promised 1.3w surely 
>  looked nice.
> 
>  So, are there any benefits of ASPM if I have it in the kernel but it's 
>  set to 
>  default? I got the impression that "default" means not much power 
>  savings?
> >>> did the Kconfig not come with a big fat (EXPERIMENTAL) ?
> >> (EXPERIMENTAL) is something different from (KNOWN BROKEN).
> >>
> >> If we know about broken setups, we should probably be blacklisting
> >> them.
> > 
> > Well, the ASPM thing seems to break every single setup I've tested.  So,
> > perhaps we should whitelist the working ones?
> 
> greg KH is reverting this patch altogether in mainline; maybe the original 
> writer
> can accommodate some of the comments in the rewrite.

It's already reverted.

thanks,

greg k-h


Re: PCIE ASPM support hangs my laptop pretty often

2008-02-06 Thread Kok, Auke
Rafael J. Wysocki wrote:
> On Wednesday, 6 of February 2008, Pavel Machek wrote:
>> On Tue 2008-02-05 16:22:55, Kok, Auke wrote:
>>> ?? ??? wrote:
 I've patched my kernel with the PCIe ASPM and after setting
 echo powersave > /sys/module/pcie_aspm/parameters/policy

 I started to experience random hangs of my laptop.
 Hardware info:
 Thinkpad x60s 1704-5UG
>>> the x60's chipset doesn't support ASPM properly afaik... bad idea.
>> Well, the code shouldn't then cause a crash of the machine :)
> The user enabled it specifically (where it is disabled by default)
>
> ASPM has been crashing e1000(e), which is why I've recently merged a patch
> to disable L1 ASPM for the onboard 82573 nic on those platforms.
>
> this new infrastructure should work in the default configuration - 
> enabling
> ASPM where this system leaves it disabled is expected to give problems
> unless you know what you are doing.
 In my defense, the patch documentation didn't say it doesn't work with my 
 hardware, nor that it hangs the chipset :) and the promised 1.3w surely 
 looked nice.

 So, are there any benefits of ASPM if I have it in the kernel but it's set 
 to 
 default? I got the impression that "default" means not much power savings?
>>> did the Kconfig not come with a big fat (EXPERIMENTAL) ?
>> (EXPERIMENTAL) is something different from (KNOWN BROKEN).
>>
>> If we know about broken setups, we should probably be blacklisting
>> them.
> 
> Well, the ASPM thing seems to break every single setup I've tested.  So,
> perhaps we should whitelist the working ones?

greg KH is reverting this patch altogether in mainline; maybe the original 
writer
can accommodate some of the comments in the rewrite.

Auke




Re: PCIE ASPM support hangs my laptop pretty often

2008-02-06 Thread Rafael J. Wysocki
On Wednesday, 6 of February 2008, Pavel Machek wrote:
> On Tue 2008-02-05 16:22:55, Kok, Auke wrote:
> > ?? ??? wrote:
> > > I've patched my kernel with the PCIe ASPM and after setting
> > > echo powersave > /sys/module/pcie_aspm/parameters/policy
> > >
> > > I started to experience random hangs of my laptop.
> > > Hardware info:
> > > Thinkpad x60s 1704-5UG
> >  the x60's chipset doesn't support ASPM properly afaik... bad idea.
> > >>> Well, the code shouldn't then cause a crash of the machine :)
> > >> The user enabled it specifically (where it is disabled by default)
> > >>
> > >> ASPM has been crashing e1000(e), which is why I've recently merged a 
> > >> patch
> > >> to disable L1 ASPM for the onboard 82573 nic on those platforms.
> > >>
> > >> this new infrastructure should work in the default configuration - 
> > >> enabling
> > >> ASPM where this system leaves it disabled is expected to give problems
> > >> unless you know what you are doing.
> > > 
> > > In my defense, the patch documentation didn't say it doesn't work with my 
> > > hardware, nor that it hangs the chipset :) and the promised 1.3w surely 
> > > looked nice.
> > > 
> > > So, are there any benefits of ASPM if I have it in the kernel but it's 
> > > set to 
> > > default? I got the impression that "default" means not much power savings?
> > 
> > did the Kconfig not come with a big fat (EXPERIMENTAL) ?
> 
> (EXPERIMENTAL) is something different from (KNOWN BROKEN).
> 
> If we know about broken setups, we should probably be blacklisting
> them.

Well, the ASPM thing seems to break every single setup I've tested.  So,
perhaps we should whitelist the working ones?

Rafael


Re: PCIE ASPM support hangs my laptop pretty often

2008-02-06 Thread Pavel Machek
On Tue 2008-02-05 16:22:55, Kok, Auke wrote:
> ?? ??? wrote:
> > I've patched my kernel with the PCIe ASPM and after setting
> > echo powersave > /sys/module/pcie_aspm/parameters/policy
> >
> > I started to experience random hangs of my laptop.
> > Hardware info:
> > Thinkpad x60s 1704-5UG
>  the x60's chipset doesn't support ASPM properly afaik... bad idea.
> >>> Well, the code shouldn't then cause a crash of the machine :)
> >> The user enabled it specifically (where it is disabled by default)
> >>
> >> ASPM has been crashing e1000(e), which is why I've recently merged a patch
> >> to disable L1 ASPM for the onboard 82573 nic on those platforms.
> >>
> >> this new infrastructure should work in the default configuration - enabling
> >> ASPM where this system leaves it disabled is expected to give problems
> >> unless you know what you are doing.
> > 
> > In my defense, the patch documentation didn't say it doesn't work with my 
> > hardware, nor that it hangs the chipset :) and the promised 1.3w surely 
> > looked nice.
> > 
> > So, are there any benefits of ASPM if I have it in the kernel but it's set 
> > to 
> > default? I got the impression that "default" means not much power savings?
> 
> did the Kconfig not come with a big fat (EXPERIMENTAL) ?

(EXPERIMENTAL) is something different from (KNOWN BROKEN).

If we know about broken setups, we should probably be blacklisting
them.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: PCIE ASPM support hangs my laptop pretty often

2008-02-05 Thread Kok, Auke
Дамјан Георгиевски wrote:
> I've patched my kernel with the PCIe ASPM and after setting
> echo powersave > /sys/module/pcie_aspm/parameters/policy
>
> I started to experience random hangs of my laptop.
> Hardware info:
> Thinkpad x60s 1704-5UG
 the x60's chipset doesn't support ASPM properly afaik... bad idea.
>>> Well, the code shouldn't then cause a crash of the machine :)
>> The user enabled it specifically (where it is disabled by default)
>>
>> ASPM has been crashing e1000(e), which is why I've recently merged a patch
>> to disable L1 ASPM for the onboard 82573 nic on those platforms.
>>
>> this new infrastructure should work in the default configuration - enabling
>> ASPM where this system leaves it disabled is expected to give problems
>> unless you know what you are doing.
> 
> In my defense, the patch documentation didn't say it doesn't work with my 
> hardware, nor that it hangs the chipset :) and the promised 1.3w surely 
> looked nice.
> 
> So, are there any benefits of ASPM if I have it in the kernel but it's set to 
> default? I got the impression that "default" means not much power savings?

did the Kconfig not come with a big fat (EXPERIMENTAL) ?

It actually depends on each device on the PCI Express bus. Most PCIe ports
support it, but each device has the option of advertising that capability
or not.

Both the platform and each device on the PCIe bus are involved. Some SATA
chipsets work great with it, and some don't even advertise the capability...
it's really hit and miss.

Your report is great of course, no doubt about it. I hope that people understand
that this feature can seriously break things at the bus level. It makes me feel 
a
lot better about the issues we had with some of our network cards and ASPM :)

Once we get some feeling for how well ASPM works in the field, we might
have to blacklist certain platforms or devices.

You could (for instance) try to see which devices on your buses support ASPM
and work on per-device ASPM parameters (which is one of the things I suggested
before), so that we get an idea of which device is misbehaving with ASPM on
your system.
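
For what it's worth, whether a function even advertises ASPM support can be
read straight out of its PCI Express Link Capabilities register. A minimal
sketch (illustrative only, not part of the ASPM patch or any existing driver;
report_aspm_support is a made-up helper name):

#include <linux/pci.h>

/* Print which ASPM states (L0s/L1) a PCIe function claims to support.
 * Bits 11:10 of the Link Capabilities register encode ASPM support. */
static void report_aspm_support(struct pci_dev *pdev)
{
        int pos = pci_find_capability(pdev, PCI_CAP_ID_EXP);
        u32 lnkcap;

        if (!pos)
                return;         /* not a PCI Express function */

        pci_read_config_dword(pdev, pos + PCI_EXP_LNKCAP, &lnkcap);
        dev_info(&pdev->dev, "ASPM support advertised: L0s %s, L1 %s\n",
                 (lnkcap & (1 << 10)) ? "yes" : "no",
                 (lnkcap & (1 << 11)) ? "yes" : "no");
}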

Cheers,

Auke



Re: PCIE ASPM support hangs my laptop pretty often

2008-02-05 Thread Дамјан Георгиевски
> >>> I've patched my kernel with the PCIe ASPM and after setting
> >>> echo powersave > /sys/module/pcie_aspm/parameters/policy
> >>>
> >>> I started to experience random hangs of my laptop.
> >>> Hardware info:
> >>> Thinkpad x60s 1704-5UG
> >>
> >> the x60's chipset doesn't support ASPM properly afaik... bad idea.
> >
> > Well, the code shouldn't then cause a crash of the machine :)
>
> The user enabled it specifically (where it is disabled by default)
>
> ASPM has been crashing e1000(e), which is why I've recently merged a patch
> to disable L1 ASPM for the onboard 82573 nic on those platforms.
>
> this new infrastructure should work in the default configuration - enabling
> ASPM where this system leaves it disabled is expected to give problems
> unless you know what you are doing.

In my defense, the patch documentation didn't say it doesn't work with my 
hardware, nor that it hangs the chipset :) and the promised 1.3w surely 
looked nice.

So, are there any benefits of ASPM if I have it in the kernel but it's set to 
default? I got the impression that "default" means not much power savings?


-- 
Damjan Georgievski
Free Software Macedonia


Re: PCIE ASPM support hangs my laptop pretty often

2008-02-05 Thread Kok, Auke
Greg KH wrote:
> On Tue, Feb 05, 2008 at 10:46:23AM -0800, Arjan van de Ven wrote:
>> On Tue, 5 Feb 2008 18:40:04 +0100
>> Дамјан Георгиевски <[EMAIL PROTECTED]> wrote:
>>
>>> I've patched my kernel with the PCIe ASPM and after setting
>>> echo powersave > /sys/module/pcie_aspm/parameters/policy
>>>
>>> I started to experience random hangs of my laptop.
>>> Hardware info:
>>> Thinkpad x60s 1704-5UG
>> the x60's chipset doesn't support ASPM properly afaik... bad idea.
> 
> Well, the code shouldn't then cause a crash of the machine :)

The user enabled it specifically (where it is disabled by default)

ASPM has been crashing e1000(e), which is why I've recently merged a patch to
disable L1 ASPM for the onboard 82573 nic on those platforms.

this new infrastructure should work in the default configuration - enabling ASPM
where this system leaves it disabled is expected to give problems unless you 
know
what you are doing.

Auke

