Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g

2014-01-10 Thread Yinghai Lu
On Fri, Jan 10, 2014 at 1:41 AM, Guo Chao  wrote:
> On Wed, Jan 08, 2014 at 03:34:54PM -0800, Yinghai Lu wrote:
> Just FYI, a Mellanox network card started failing at exactly this patch.
>
> 3.13-rc7 + Bjorn's series is OK. With this patch applied, the Mellanox
> driver complains:
>
>  |mlx4_core 0003:05:00.0: Multiple PFs not yet supported.  Skipping PF.
>  |mlx4_core: probe of 0003:05:00.0 failed with error -22
>
> This is caused by an MMIO read from BAR 0 (64-bit non-prefetchable)
> returning a non-zero value.
>
> Resource assignment, as far as we can see, works fine. The noticeable
> effect of this patch is putting the ROM BAR under the non-prefetchable
> window. I tried to revert this effect by adding MEM_64 to its ROM
> resource, and it works again (the system does not expose an above-4G
> aperture yet). Not sure what the root cause is; it looks like a
> driver/firmware/hardware defect.

Interesting. Can you post the boot log with "debug ignore_loglevel initcall_debug",
both with and without this patch?

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g

2014-01-10 Thread Guo Chao
On Wed, Jan 08, 2014 at 03:34:54PM -0800, Yinghai Lu wrote:
> On Sun, Dec 22, 2013 at 5:14 PM, Yinghai Lu  wrote:
> > On Sun, Dec 22, 2013 at 4:00 PM, Bjorn Helgaas  wrote:
> >> On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu  wrote:
> >>
> >> Let me see if I can figure out what you're trying to do here.  Please
> >> correct me if I'm wrong:
> >>
> > >>> When one of the child resources does not support MEM_64, MEM_64 for
> > >>> the bridge gets reset, which pulls the whole pref resource on the
> > >>> bridge below 4G.
> >>
> >> When we allocate space for a bridge's prefetchable window, we
> >> currently look at the devices behind the bridge and put the window
> >> below 4GB if any of those children has a 32-bit prefetchable BAR.
> >>
> >> This maximizes the use of prefetch, at the cost of using more 32-bit
> >> address space.
> >
> > Yes. We have a problem on 8-socket or 32-socket systems, where the
> > 32-bit space is limited, but we have plenty of 64-bit MMIO space
> > above 4G for prefetchable resources.
> >
> >>
> > >>> If the bridge supports pref mem64, its pref mem64 window will only be
> > >>> allocated to children that support it.
> > >>> Child resources that only support pref mem32 will be allocated
> > >>> from non-pref mem instead.
> >>
> >> You are changing this so that we will always try to put a bridge's
> >> 64-bit prefetchable window above 4GB, regardless of what devices are
> >> behind the bridge.  If a device behind the bridge has a 32-bit
> >> prefetchable BAR, we will place that BAR in the bridge's 32-bit
> >> non-prefetchable window.
> >
> > Yes, so we can keep IORESOURCE_MEM_64 in the flags for PREF.
> >
> >>
> >> This minimizes the use of the 32-bit address space, at the cost of not
> >> being able to use prefetch as much.
> >>
> > >>> If the bridge only supports 32-bit pref MMIO, all children's pref
> > >>> MMIO will still be placed under it.
> >>
> >> Obviously, if a bridge has a prefetchable window that's only 32 bits,
> >> 64-bit prefetchable BARs behind the bridge will have to be in that
> >> 32-bit prefetchable window or the 32-bit non-prefetchable window.  And
> >> if the bridge has no prefetchable window at all, every memory BAR
> >> behind the bridge will have to be in the 32-bit non-prefetchable
> >> window.
> >
> > Yes.
> >
> >>
> >> I'll look at the actual patch later; I just want to make sure I
> >> understand your intent first.
> 
> Hi, Bjorn,
> 
> Can you check this one and add it to your pci/resource branch?
> With it we can close the loop on 64-bit MMIO resource allocation.
> 

Just FYI, a Mellanox network card started failing at exactly this patch.

3.13-rc7 + Bjorn's series is OK. With this patch applied, the Mellanox
driver complains:

 |mlx4_core 0003:05:00.0: Multiple PFs not yet supported.  Skipping PF.
 |mlx4_core: probe of 0003:05:00.0 failed with error -22

This is caused by an MMIO read from BAR 0 (64-bit non-prefetchable)
returning a non-zero value.

Resource assignment, as far as we can see, works fine. The noticeable
effect of this patch is putting the ROM BAR under the non-prefetchable
window. I tried to revert this effect by adding MEM_64 to its ROM
resource, and it works again (the system does not expose an above-4G
aperture yet). Not sure what the root cause is; it looks like a
driver/firmware/hardware defect.

Thanks
Guo Chao

> Thanks
> 
> Yinghai
> 



Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g

2014-01-08 Thread Yinghai Lu
On Sun, Dec 22, 2013 at 5:14 PM, Yinghai Lu  wrote:
> On Sun, Dec 22, 2013 at 4:00 PM, Bjorn Helgaas  wrote:
>> On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu  wrote:
>>
>> Let me see if I can figure out what you're trying to do here.  Please
>> correct me if I'm wrong:
>>
>>> When one of the child resources does not support MEM_64, MEM_64 for
>>> the bridge gets reset, which pulls the whole pref resource on the
>>> bridge below 4G.
>>
>> When we allocate space for a bridge's prefetchable window, we
>> currently look at the devices behind the bridge and put the window
>> below 4GB if any of those children has a 32-bit prefetchable BAR.
>>
>> This maximizes the use of prefetch, at the cost of using more 32-bit
>> address space.
>
> Yes. We have a problem on 8-socket or 32-socket systems, where the
> 32-bit space is limited, but we have plenty of 64-bit MMIO space
> above 4G for prefetchable resources.
>
>>
>>> If the bridge supports pref mem64, its pref mem64 window will only be
>>> allocated to children that support it.
>>> Child resources that only support pref mem32 will be allocated
>>> from non-pref mem instead.
>>
>> You are changing this so that we will always try to put a bridge's
>> 64-bit prefetchable window above 4GB, regardless of what devices are
>> behind the bridge.  If a device behind the bridge has a 32-bit
>> prefetchable BAR, we will place that BAR in the bridge's 32-bit
>> non-prefetchable window.
>
> Yes, so we can keep IORESOURCE_MEM_64 in the flags for PREF.
>
>>
>> This minimizes the use of the 32-bit address space, at the cost of not
>> being able to use prefetch as much.
>>
>>> If the bridge only supports 32-bit pref MMIO, all children's pref
>>> MMIO will still be placed under it.
>>
>> Obviously, if a bridge has a prefetchable window that's only 32 bits,
>> 64-bit prefetchable BARs behind the bridge will have to be in that
>> 32-bit prefetchable window or the 32-bit non-prefetchable window.  And
>> if the bridge has no prefetchable window at all, every memory BAR
>> behind the bridge will have to be in the 32-bit non-prefetchable
>> window.
>
> Yes.
>
>>
>> I'll look at the actual patch later; I just want to make sure I
>> understand your intent first.

Hi, Bjorn,

Can you check this one and add it to your pci/resource branch?
With it we can close the loop on 64-bit MMIO resource allocation.

Thanks

Yinghai


Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g

2013-12-22 Thread Yinghai Lu
On Sun, Dec 22, 2013 at 4:00 PM, Bjorn Helgaas  wrote:
> On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu  wrote:
>
> Let me see if I can figure out what you're trying to do here.  Please
> correct me if I'm wrong:
>
>> When one of the child resources does not support MEM_64, MEM_64 for
>> the bridge gets reset, which pulls the whole pref resource on the
>> bridge below 4G.
>
> When we allocate space for a bridge's prefetchable window, we
> currently look at the devices behind the bridge and put the window
> below 4GB if any of those children has a 32-bit prefetchable BAR.
>
> This maximizes the use of prefetch, at the cost of using more 32-bit
> address space.

Yes. We have a problem on 8-socket or 32-socket systems, where the
32-bit space is limited, but we have plenty of 64-bit MMIO space
above 4G for prefetchable resources.
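A back-of-the-envelope sketch of why the 32-bit space runs out, under
loudly hypothetical assumptions: the firmware leaves a fixed MMIO hole
below 4G, and that hole is split evenly among root buses, one per socket.
mmio32_share() is an illustrative helper, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of a below-4G MMIO hole available to each
 * root bus, assuming the hole is split evenly, one root bus per socket. */
static uint64_t mmio32_share(uint64_t hole_bytes, unsigned int root_buses)
{
    assert(root_buses > 0);
    return hole_bytes / root_buses;
}
```

With an assumed 1 GiB hole, mmio32_share(1ULL << 30, 8) leaves 128 MiB
per socket and mmio32_share(1ULL << 30, 32) only 32 MiB, so every
prefetchable BAR that can live above 4G relieves that pressure.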

>
>> If the bridge supports pref mem64, its pref mem64 window will only be
>> allocated to children that support it.
>> Child resources that only support pref mem32 will be allocated
>> from non-pref mem instead.
>
> You are changing this so that we will always try to put a bridge's
> 64-bit prefetchable window above 4GB, regardless of what devices are
> behind the bridge.  If a device behind the bridge has a 32-bit
> prefetchable BAR, we will place that BAR in the bridge's 32-bit
> non-prefetchable window.

Yes, so we can keep IORESOURCE_MEM_64 in the flags for PREF.

>
> This minimizes the use of the 32-bit address space, at the cost of not
> being able to use prefetch as much.
>
>> If the bridge only supports 32-bit pref MMIO, all children's pref
>> MMIO will still be placed under it.
>
> Obviously, if a bridge has a prefetchable window that's only 32 bits,
> 64-bit prefetchable BARs behind the bridge will have to be in that
> 32-bit prefetchable window or the 32-bit non-prefetchable window.  And
> if the bridge has no prefetchable window at all, every memory BAR
> behind the bridge will have to be in the 32-bit non-prefetchable
> window.

Yes.

>
> I'll look at the actual patch later; I just want to make sure I
> understand your intent first.

Thanks

Yinghai


Re: [PATCH v6 2/2] PCI: Try best to allocate pref mmio 64bit above 4g

2013-12-22 Thread Bjorn Helgaas
On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu  wrote:

Let me see if I can figure out what you're trying to do here.  Please
correct me if I'm wrong:

> When one of the child resources does not support MEM_64, MEM_64 for
> the bridge gets reset, which pulls the whole pref resource on the
> bridge below 4G.

When we allocate space for a bridge's prefetchable window, we
currently look at the devices behind the bridge and put the window
below 4GB if any of those children has a 32-bit prefetchable BAR.

This maximizes the use of prefetch, at the cost of using more 32-bit
address space.

> If the bridge supports pref mem64, its pref mem64 window will only be
> allocated to children that support it.
> Child resources that only support pref mem32 will be allocated
> from non-pref mem instead.

You are changing this so that we will always try to put a bridge's
64-bit prefetchable window above 4GB, regardless of what devices are
behind the bridge.  If a device behind the bridge has a 32-bit
prefetchable BAR, we will place that BAR in the bridge's 32-bit
non-prefetchable window.

This minimizes the use of the 32-bit address space, at the cost of not
being able to use prefetch as much.
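To make the trade-off concrete, here is a simplified model (not the
kernel's sizing code) of how much below-4GB space a bridge's children
consume under the old and the new rule. It assumes the bridge's
prefetchable window is 64-bit capable and that the host bridge has an
above-4GB aperture, and it ignores alignment and window rounding.
struct bar and both functions are hypothetical; the flag values are
assumed to match include/linux/ioport.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; values assumed to match include/linux/ioport.h. */
#define IORESOURCE_PREFETCH 0x00002000UL
#define IORESOURCE_MEM_64   0x00100000UL

/* A child memory BAR: its resource flags and its size in bytes. */
struct bar {
    unsigned long flags;
    uint64_t size;
};

static int is_pref(const struct bar *b)
{
    return (b->flags & IORESOURCE_PREFETCH) != 0;
}

static int is_mem64(const struct bar *b)
{
    return (b->flags & IORESOURCE_MEM_64) != 0;
}

/* Old rule: a single 32-bit prefetchable BAR pulls the whole prefetchable
 * window below 4GB, so every prefetchable BAR then uses 32-bit space.
 * Non-prefetchable BARs always sit in the 32-bit non-prefetchable window. */
static uint64_t below_4g_old(const struct bar *bars, size_t n)
{
    uint64_t nonpref = 0, pref = 0;
    int any_pref32 = 0;

    for (size_t i = 0; i < n; i++) {
        if (is_pref(&bars[i])) {
            pref += bars[i].size;
            if (!is_mem64(&bars[i]))
                any_pref32 = 1;
        } else {
            nonpref += bars[i].size;
        }
    }
    return nonpref + (any_pref32 ? pref : 0);
}

/* New rule: only 64-bit prefetchable BARs go to the (above-4GB)
 * prefetchable window; everything else uses 32-bit space. */
static uint64_t below_4g_new(const struct bar *bars, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(is_pref(&bars[i]) && is_mem64(&bars[i])))
            total += bars[i].size;
    }
    return total;
}
```

For a hypothetical set of children (a 256 MiB 64-bit prefetchable BAR,
a 1 MiB 32-bit prefetchable BAR and a 16 MiB non-prefetchable BAR), the
old rule consumes 273 MiB below 4GB and the new rule 17 MiB; the price
is that the 1 MiB BAR is no longer mapped prefetchable.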

> If the bridge only supports 32-bit pref MMIO, all children's pref
> MMIO will still be placed under it.

Obviously, if a bridge has a prefetchable window that's only 32 bits,
64-bit prefetchable BARs behind the bridge will have to be in that
32-bit prefetchable window or the 32-bit non-prefetchable window.  And
if the bridge has no prefetchable window at all, every memory BAR
behind the bridge will have to be in the 32-bit non-prefetchable
window.
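The three bridge cases can be summarised as a small decision function.
This is an illustrative sketch, not code from the patch: enum bridge_pref,
enum window and pick_window() are hypothetical names, the flag values are
assumed to match include/linux/ioport.h, and the fallback to the
non-prefetchable window when a BAR does not fit in its first-choice
window is not modelled.

```c
/* Local stand-ins; values assumed to match include/linux/ioport.h. */
#define IORESOURCE_PREFETCH 0x00002000UL
#define IORESOURCE_MEM_64   0x00100000UL

/* Kind of prefetchable window the bridge implements. */
enum bridge_pref { BRIDGE_PREF_NONE, BRIDGE_PREF_32, BRIDGE_PREF_64 };

/* Bridge window a child memory BAR is placed in first. */
enum window { WIN_NONPREF32, WIN_PREF };

static enum window pick_window(enum bridge_pref bridge, unsigned long flags)
{
    int pref = (flags & IORESOURCE_PREFETCH) != 0;
    int mem64 = (flags & IORESOURCE_MEM_64) != 0;

    switch (bridge) {
    case BRIDGE_PREF_64:
        /* Patched rule: only 64-bit prefetchable BARs use this window. */
        return (pref && mem64) ? WIN_PREF : WIN_NONPREF32;
    case BRIDGE_PREF_32:
        /* Any prefetchable BAR, 32- or 64-bit, may use the 32-bit window. */
        return pref ? WIN_PREF : WIN_NONPREF32;
    case BRIDGE_PREF_NONE:
    default:
        /* No prefetchable window: every memory BAR is non-prefetchable. */
        return WIN_NONPREF32;
    }
}
```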

I'll look at the actual patch later; I just want to make sure I
understand your intent first.

Bjorn
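
The quoted patch moves type_mask in find_free_bus_resource() from a
hard-coded local to a parameter, so each caller decides which flag bits
must match. The userspace model here is assumed to mirror that matching
rule (flags masked by type_mask must equal type, and windows that already
have a parent are skipped); struct res and find_free_window() are
hypothetical simplifications, with flag values assumed to match
include/linux/ioport.h. Including IORESOURCE_MEM_64 in the mask, for
example, would stop a 32-bit prefetchable window from matching a 64-bit
request; whether the patch's callers do that is not visible in the
quoted hunks.

```c
#include <stddef.h>

/* Local stand-ins; values assumed to match include/linux/ioport.h. */
#define IORESOURCE_IO       0x00000100UL
#define IORESOURCE_MEM      0x00000200UL
#define IORESOURCE_PREFETCH 0x00002000UL
#define IORESOURCE_MEM_64   0x00100000UL

/* Minimal stand-in for struct resource: only the fields used here. */
struct res {
    unsigned long flags;
    const struct res *parent; /* non-NULL once the window is assigned */
};

/* Return the first unassigned bridge window whose flags, restricted to
 * type_mask, equal type; NULL if there is none. */
static struct res *find_free_window(struct res *win, size_t n,
                                    unsigned long type_mask,
                                    unsigned long type)
{
    for (size_t i = 0; i < n; i++) {
        if ((win[i].flags & type_mask) == type && win[i].parent == NULL)
            return &win[i];
    }
    return NULL;
}
```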

> -v2: Add release bridge res support with bridge mem res for pref_mem children res.
> -v3: Refresh so that it can be applied before the for_each_dev_res patchset.
> -v4: Fix non-pref 64-bit MMIO support (issue found by Guo Chao).
>
> Signed-off-by: Yinghai Lu 
> Tested-by: Guo Chao 
> ---
>  drivers/pci/setup-bus.c | 138
>  drivers/pci/setup-res.c |  20 ++-
>  2 files changed, 111 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 138bdd6..b29504f 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -713,12 +713,11 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
> bus resource of a given type. Note: we intentionally skip
> the bus resources which have already been assigned (that is,
> have non-NULL parent resource). */
> -static struct resource *find_free_bus_resource(struct pci_bus *bus, unsigned long type)
> +static struct resource *find_free_bus_resource(struct pci_bus *bus,
> +unsigned long type_mask, unsigned long type)
>  {
> int i;
> struct resource *r;
> -   unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
> - IORESOURCE_PREFETCH;
>
> pci_bus_for_each_resource(bus, r, i) {
> if (r == &ioport_resource || r == &iomem_resource)
> @@ -815,7 +814,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
> resource_size_t add_size, struct list_head *realloc_head)
>  {
> struct pci_dev *dev;
> -   struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
> +   struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
> +   IORESOURCE_IO);
> resource_size_t size = 0, size0 = 0, size1 = 0;
> resource_size_t children_add_size = 0;
> resource_size_t min_align, align;
> @@ -915,15 +915,17 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
>   * guarantees that all child resources fit in this size.
>   */
>  static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> -unsigned long type, resource_size_t min_size,
> -   resource_size_t add_size,
> -   struct list_head *realloc_head)
> +unsigned long type, unsigned long type2,
> +unsigned long type3,
> +resource_size_t min_size, resource_size_t add_size,
> +struct list_head *realloc_head)
>  {
> struct pci_dev *dev;
> resource_size_t min_align, align, size, size0, size1;
> resource_size_t aligns[12]; /* Alignments from 1Mb to 2Gb */
> int order, max_order;
> -   struct resource *b_res = find_free_bus_resource(bus, type);
> +   struct resource *b_res = find_free_bus_resource(bus,
> +mask | IORESOURCE_PREFETCH, type);
> unsigned int mem64_mask = 0;
> resource_size_t children_add_size = 0;
>
> @@ -944,7 +946,9 @@ static int pbus_size_mem(struct pci_bus