Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-16 Thread Thierry Reding
On Wed, Jan 16, 2013 at 11:25:56AM +, Russell King - ARM Linux wrote:
> On Wed, Jan 16, 2013 at 11:18:22AM +0100, Thierry Reding wrote:
> > err = ioremap_page_range(virt, virt + SZ_64K - 1, phys,
> 
> Why -1 here?

Right, I forgot that "end" in these functions always means one byte past
the end. Removing the -1 seems to get past the remapping at least.
Reading the configuration space through the mapping doesn't work yet,
though. I'll investigate some more.

Thanks for pointing this out, Russell.

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-16 Thread Russell King - ARM Linux
On Wed, Jan 16, 2013 at 11:18:22AM +0100, Thierry Reding wrote:
>   err = ioremap_page_range(virt, virt + SZ_64K - 1, phys,

Why -1 here?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-16 Thread Thierry Reding
On Thu, Jan 10, 2013 at 12:24:17PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 10, 2013 at 08:03:27PM +0100, Thierry Reding wrote:
> 
> > > > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > > > simple 2d array of busnr*16 of pointers would do the trick. A more
> > > > clever solution would be to allocate contiguous virtual memory and
> > > > split that up..
>  
> > > Oh, I see. I'm not very familiar with the internals of remapping, so
> > > I'll need to do some more reading. Thanks for the hints.
> > 
> > I forgot to ask. What's the advantage of having a contiguous virtual
> > memory area and splitting it up versus remapping each chunk separately?
> 
> Not a lot, really, but it saves you from the pointer array and
> associated overhead. IIRC it is fairly easy to do in the kernel.
> 
> Arnd's version is good too, but you would be restricted to aligned
> powers of two for the bus number range in the DT, which is probably
> not that big a deal either?

I've been trying to make this work, but this implementation always
triggers a BUG_ON() in lib/ioremap.c, line 27:

27  BUG_ON(!pte_none(*pte));

which seems to indicate that the page is already mapped, right?

Below is the relevant code:

struct tegra_pcie_bus {
struct vm_struct *area;
struct list_head list;
unsigned int nr;
};

static struct tegra_pcie_bus *tegra_pcie_bus_alloc(struct tegra_pcie *pcie,
   unsigned int busnr)
{
unsigned long flags = VM_READ | VM_WRITE | VM_IO | VM_PFNMAP |
  VM_DONTEXPAND | VM_DONTDUMP;
phys_addr_t cs = pcie->cs->start;
struct tegra_pcie_bus *bus;
struct vm_struct *vm;
unsigned int i;
int err;

bus = devm_kzalloc(pcie->dev, sizeof(*bus), GFP_KERNEL);
if (!bus)
return ERR_PTR(-ENOMEM);

INIT_LIST_HEAD(&bus->list);
bus->nr = busnr;

bus->area = get_vm_area(SZ_1M, VM_IOREMAP);
if (!bus->area) {
err = -ENOMEM;
goto free;
}

for (i = 0; i < 16; i++) {
unsigned long virt = (unsigned long)bus->area->addr +
 i * SZ_64K;
phys_addr_t phys = cs + busnr * SZ_64K + i * SZ_1M;

err = ioremap_page_range(virt, virt + SZ_64K - 1, phys,
 vm_get_page_prot(flags));
if (err < 0) {
dev_err(pcie->dev, "ioremap_page_range() failed: %d\n",
err);
goto unmap;
}
}

return bus;

unmap:
vunmap(bus->area->addr);
free_vm_area(bus->area);
free:
devm_kfree(pcie->dev, bus);
return ERR_PTR(err);
}

Anybody see what's wrong with that?
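For reference, the chunk arithmetic in the loop above can be sanity-checked
in plain user space. This sketch (constants and the bus number are made up)
models the layout with half-open [start, end) ranges, which is the
convention ioremap_page_range() expects:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K 0x10000UL
#define SZ_1M  0x100000UL

struct chunk {
	unsigned long virt_start, virt_end; /* half-open virtual range */
	uint64_t phys;                      /* physical base of the chunk */
};

/* Compute the 16 per-bus chunks: contiguous in virtual space,
 * 1 MiB apart in physical space (one per extended-register page). */
static void layout_bus(struct chunk chunks[16], unsigned long area,
		       uint64_t cs, unsigned int busnr)
{
	for (unsigned int i = 0; i < 16; i++) {
		chunks[i].virt_start = area + i * SZ_64K;
		/* exclusive end: virt + SZ_64K, not virt + SZ_64K - 1 */
		chunks[i].virt_end = chunks[i].virt_start + SZ_64K;
		chunks[i].phys = cs + busnr * SZ_64K + (uint64_t)i * SZ_1M;
	}
}
```

With exclusive ends the 16 chunks tile exactly 1 MiB of virtual space with
no gap or overlap between iterations.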




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Jason Gunthorpe
On Thu, Jan 10, 2013 at 09:20:07PM +0100, Thierry Reding wrote:

> > Arnd's version is good too, but you would be restricted to aligned
> > powers of two for the bus number range in the DT, which is probably
> > not that big a deal either?
> 
> Stephen suggested on IRC that we could try to keep a bit of dynamicity
> in the allocation scheme if we create the bus mapping when the first
> device on the bus is probed and discard the mapping if no devices are
> found.

You probably don't need to mess around with 'discard on empty'; the
kernel should only access bus numbers that are in the range of the
subordinate busses of the bridges. So if you establish a mapping on a
bus-by-bus basis at first access, it should be fine and very close to
minimal.

Jason


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Thierry Reding
On Thu, Jan 10, 2013 at 12:24:17PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 10, 2013 at 08:03:27PM +0100, Thierry Reding wrote:
> 
> > > > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > > > simple 2d array of busnr*16 of pointers would do the trick. A more
> > > > clever solution would be to allocate contiguous virtual memory and
> > > > split that up..
>  
> > > Oh, I see. I'm not very familiar with the internals of remapping, so
> > > I'll need to do some more reading. Thanks for the hints.
> > 
> > I forgot to ask. What's the advantage of having a contiguous virtual
> > memory area and splitting it up versus remapping each chunk separately?
> 
> Not a lot, really, but it saves you from the pointer array and
> associated overhead. IIRC it is fairly easy to do in the kernel.

I've been investigating this a bit, and one problem is that it would
prevent the driver from ever being built as a module, because the
necessary functions aren't exported and I'm not sure exporting them
would be acceptable. Currently PCI host controller drivers with MSI
support can't be built as modules anyway because the MSI infrastructure
requires them to be built in, but I briefly discussed this with Bjorn at
some point and it should be easy to remove that requirement.

> Arnd's version is good too, but you would be restricted to aligned
> powers of two for the bus number range in the DT, which is probably
> not that big a deal either?

Stephen suggested on IRC that we could try to keep a bit of dynamicity
in the allocation scheme if we create the bus mapping when the first
device on the bus is probed and discard the mapping if no devices are
found.

Sounds like a good plan to me. Does anybody see any potential pitfalls?

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Jason Gunthorpe
On Thu, Jan 10, 2013 at 08:03:27PM +0100, Thierry Reding wrote:

> > > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > > simple 2d array of busnr*16 of pointers would do the trick. A more
> > > clever solution would be to allocate contiguous virtual memory and
> > > split that up..
 
> > Oh, I see. I'm not very familiar with the internals of remapping, so
> > I'll need to do some more reading. Thanks for the hints.
> 
> I forgot to ask. What's the advantage of having a contiguous virtual
> memory area and splitting it up versus remapping each chunk separately?

Not a lot, really, but it saves you from the pointer array and
associated overhead. IIRC it is fairly easy to do in the kernel.

Arnd's version is good too, but you would be restricted to aligned
powers of two for the bus number range in the DT, which is probably
not that big a deal either?

Jason


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Thierry Reding
On Thu, Jan 10, 2013 at 07:55:05PM +0100, Thierry Reding wrote:
> On Thu, Jan 10, 2013 at 11:20:07AM -0700, Jason Gunthorpe wrote:
> > On Thu, Jan 10, 2013 at 11:25:44AM +0100, Thierry Reding wrote:
> > > On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> > > > On Thursday 10 January 2013, Thierry Reding wrote:
> > > > > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > > > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > > > > You could decrease the size of the mapping to only span the bus
> > > > > > numbers that are configured for use via DT.
> > > > > 
> > > > > That won't work, unfortunately. The mapping is such that the bus
> > > > > number is not encoded in the uppermost bits, the extended register
> > > > > number is. So the only thing that we could do is decrease the size
> > > > > of the extended register space for *all* devices.
> > > > 
> > > > But you could still use a method to map 16 separate areas per bus,
> > > > each 65536 bytes long, which results in 1MB per bus. That is probably
> > > > ok, since very few systems have more than a handful of buses in
> > > > practice.
> > > > 
> > > > In theory, doing static mappings on a per-page basis would let you
> > > > do 16 devices at a time, but it's probably not worth doing at this
> > > > fine granularity.
> > > 
> > > I don't understand how this would help. The encoding is like this:
> > > 
> > >   [27:24] extended register number
> > >   [23:16] bus number
> > >   [15:11] device number
> > >   [10: 8] function number
> > >   [ 7: 0] register number
> > > 
> > > So it doesn't matter whether I use separate areas per bus or not. As
> > > soon as the whole extended configuration space needs to be accessed a
> > > whopping 28 bits (256 MiB) are required.
> > 
> > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > simple 2d array of busnr*16 of pointers would do the trick. A more
> > clever solution would be to allocate contiguous virtual memory and
> > split that up..
> 
> Oh, I see. I'm not very familiar with the internals of remapping, so
> I'll need to do some more reading. Thanks for the hints.

I forgot to ask. What's the advantage of having a contiguous virtual
memory area and splitting it up versus remapping each chunk separately?

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Thierry Reding
On Thu, Jan 10, 2013 at 06:26:55PM +, Arnd Bergmann wrote:
> On Thursday 10 January 2013, Thierry Reding wrote:
> > I don't understand how this would help. The encoding is like this:
> > 
> > [27:24] extended register number
> > [23:16] bus number
> > [15:11] device number
> > [10: 8] function number
> > [ 7: 0] register number
> > 
> > So it doesn't matter whether I use separate areas per bus or not. As
> > soon as the whole extended configuration space needs to be accessed a
> > whopping 28 bits (256 MiB) are required.
> > 
> > What you propose would work if only regular configuration space is
> > supported. I'm not sure if that's an option.
> 
> I mean something like:
> 
> struct tegra_bus_private {
>   ...
>   void __iomem *config_space[16];
> };
> 
> 
> void tegra_scan_bus(struct pci_bus *bus)
> {
>   int i;
>   struct tegra_bus_private *priv = bus->dev->private;
> 
>   for (i=0; i<16; i++)
>   priv->config_space[i] = ioremap(config_space_phys +
>65536 * bus->primary + i * SZ_1M, 65536);
> 
>   ...
> }

Okay, I see. It's a bit kludgy, but I guess so was the I/O map cache.
It'll take some more time to work this out and test, but I'll give it
a shot.

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Thierry Reding
On Thu, Jan 10, 2013 at 11:20:07AM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 10, 2013 at 11:25:44AM +0100, Thierry Reding wrote:
> > On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> > > On Thursday 10 January 2013, Thierry Reding wrote:
> > > > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > > > You could decrease the size of the mapping to only span the bus
> > > > > numbers that are configured for use via DT.
> > > > 
> > > > That won't work, unfortunately. The mapping is such that the bus number
> > > > is not encoded in the uppermost bits, the extended register number is.
> > > > So the only thing that we could do is decrease the size of the extended
> > > > register space for *all* devices.
> > > 
> > > But you could still use a method to map 16 separate areas per bus, each 65536
> > > bytes long, which results in 1MB per bus. That is probably ok, since
> > > very few systems have more than a handful of buses in practice.
> > > 
> > > In theory, doing static mappings on a per-page basis would let you
> > > do 16 devices at a time, but it's probably not worth doing at this
> > > fine granularity.
> > 
> > I don't understand how this would help. The encoding is like this:
> > 
> > [27:24] extended register number
> > [23:16] bus number
> > [15:11] device number
> > [10: 8] function number
> > [ 7: 0] register number
> > 
> > So it doesn't matter whether I use separate areas per bus or not. As
> > soon as the whole extended configuration space needs to be accessed a
> > whopping 28 bits (256 MiB) are required.
> 
> You'd piece a mapping together, each bus requires 16 64k mappings, a
> simple 2d array of busnr*16 of pointers would do the trick. A more
> clever solution would be to allocate contiguous virtual memory and
> split that up..

Oh, I see. I'm not very familiar with the internals of remapping, so
I'll need to do some more reading. Thanks for the hints.

> > > Actually, AER probably needs this, and I believe some broken devices 
> > > need to mask interrupts using the PCI command word in the config space,
> > > so it can happen.
> > 
> > Ugh... that would kill any such dynamic mapping approach. Perhaps if we
> > could mark a device as requiring a static mapping we could pin that
> > cache entry. But that doesn't sound very encouraging.
> 
> AER applies to pretty much every PCI-E device these days.

So I guess there's no way around a segmented static mapping as you
suggested, right?

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Arnd Bergmann
On Thursday 10 January 2013, Thierry Reding wrote:
> I don't understand how this would help. The encoding is like this:
> 
> [27:24] extended register number
> [23:16] bus number
> [15:11] device number
> [10: 8] function number
> [ 7: 0] register number
> 
> So it doesn't matter whether I use separate areas per bus or not. As
> soon as the whole extended configuration space needs to be accessed a
> whopping 28 bits (256 MiB) are required.
> 
> What you propose would work if only regular configuration space is
> supported. I'm not sure if that's an option.

I mean something like:

struct tegra_bus_private {
...
void __iomem *config_space[16];
};


void tegra_scan_bus(struct pci_bus *bus)
{
int i;
struct tegra_bus_private *priv = bus->dev->private;

for (i=0; i<16; i++)
priv->config_space[i] = ioremap(config_space_phys +
 65536 * bus->primary + i * SZ_1M, 65536);

...
}

Arnd


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Jason Gunthorpe
On Thu, Jan 10, 2013 at 11:25:44AM +0100, Thierry Reding wrote:
> On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> > On Thursday 10 January 2013, Thierry Reding wrote:
> > > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > > You could decrease the size of the mapping to only span the bus
> > > > numbers that are configured for use via DT.
> > > 
> > > That won't work, unfortunately. The mapping is such that the bus number
> > > is not encoded in the uppermost bits, the extended register number is.
> > > So the only thing that we could do is decrease the size of the extended
> > > register space for *all* devices.
> > 
> > But you could still use a method to map 16 separate areas per bus, each 65536
> > bytes long, which results in 1MB per bus. That is probably ok, since
> > very few systems have more than a handful of buses in practice.
> > 
> > In theory, doing static mappings on a per-page basis would let you
> > do 16 devices at a time, but it's probably not worth doing at this
> > fine granularity.
> 
> I don't understand how this would help. The encoding is like this:
> 
>   [27:24] extended register number
>   [23:16] bus number
>   [15:11] device number
>   [10: 8] function number
>   [ 7: 0] register number
> 
> So it doesn't matter whether I use separate areas per bus or not. As
> soon as the whole extended configuration space needs to be accessed a
> whopping 28 bits (256 MiB) are required.

You'd piece a mapping together, each bus requires 16 64k mappings, a
simple 2d array of busnr*16 of pointers would do the trick. A more
clever solution would be to allocate contiguous virtual memory and
split that up..

> > Actually, AER probably needs this, and I believe some broken devices 
> > need to mask interrupts using the PCI command word in the config space,
> > so it can happen.
> 
> Ugh... that would kill any such dynamic mapping approach. Perhaps if we
> could mark a device as requiring a static mapping we could pin that
> cache entry. But that doesn't sound very encouraging.

AER applies to pretty much every PCI-E device these days.

Jason


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Thierry Reding
On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> On Thursday 10 January 2013, Thierry Reding wrote:
> > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > You could decrease the size of the mapping to only span the bus
> > > numbers that are configured for use via DT.
> > 
> > That won't work, unfortunately. The mapping is such that the bus number
> > is not encoded in the uppermost bits, the extended register number is.
> > So the only thing that we could do is decrease the size of the extended
> > register space for *all* devices.
> 
> But you could still use a method to map 16 separate areas per bus, each 65536
> bytes long, which results in 1MB per bus. That is probably ok, since
> very few systems have more than a handful of buses in practice.
> 
> In theory, doing static mappings on a per-page basis would let you
> do 16 devices at a time, but it's probably not worth doing at this
> fine granularity.

I don't understand how this would help. The encoding is like this:

[27:24] extended register number
[23:16] bus number
[15:11] device number
[10: 8] function number
[ 7: 0] register number

So it doesn't matter whether I use separate areas per bus or not. As
soon as the whole extended configuration space needs to be accessed a
whopping 28 bits (256 MiB) are required.

What you propose would work if only regular configuration space is
supported. I'm not sure if that's an option.
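For what it's worth, the encoding quoted above can be written down as a
small helper; this is purely illustrative (the function name is invented,
not Tegra's actual accessor), but it shows why the full extended space
spans 2^28 bytes while each bus's extended chunks sit 1 MiB apart:

```c
#include <assert.h>
#include <stdint.h>

/* Build a config-space byte offset from the layout:
 *   [27:24] extended register number (reg bits 11:8)
 *   [23:16] bus number
 *   [15:11] device number
 *   [10: 8] function number
 *   [ 7: 0] register number (low byte)
 */
static uint32_t cfg_offset(unsigned int bus, unsigned int dev,
			   unsigned int fn, unsigned int reg)
{
	return ((reg & 0xf00) << 16) |	/* bits 11:8 of reg -> [27:24] */
	       ((bus & 0xff) << 16) |
	       ((dev & 0x1f) << 11) |
	       ((fn & 0x7) << 8) |
	       (reg & 0xff);
}
```

The maximum offset (bus 255, device 31, function 7, register 0xfff) comes
out to 0x0fffffff, i.e. the full 256 MiB window, while register 0x100 of
bus 0 already sits 16 MiB into the window because the extended bits are
uppermost.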

> > > Are there any concerns about these config registers being accessed
> > > from a context where a new mapping can't be made? Interrupt? Machine
> > > Check? PCI-E Advanced Error Reporting?
> > 
> > I haven't checked but I would expect configuration space accesses to not
> > happen in interrupt context. Usually they are limited to enumeration and
> > driver probe.
> 
> Actually, AER probably needs this, and I believe some broken devices 
> need to mask interrupts using the PCI command word in the config space,
> so it can happen.

Ugh... that would kill any such dynamic mapping approach. Perhaps if we
could mark a device as requiring a static mapping we could pin that
cache entry. But that doesn't sound very encouraging.

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-10 Thread Arnd Bergmann
On Thursday 10 January 2013, Thierry Reding wrote:
> On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > You could decrease the size of the mapping to only span the bus
> > numbers that are configured for use via DT.
> 
> That won't work, unfortunately. The mapping is such that the bus number
> is not encoded in the uppermost bits, the extended register number is.
> So the only thing that we could do is decrease the size of the extended
> register space for *all* devices.

But you could still use a method to map 16 separate areas per bus, each 65536
bytes long, which results in 1MB per bus. That is probably ok, since
very few systems have more than a handful of buses in practice.

In theory, doing static mappings on a per-page basis would let you
do 16 devices at a time, but it's probably not worth doing at this
fine granularity.

> > Are there any concerns about these config registers being accessed
> > from a context where a new mapping can't be made? Interrupt? Machine
> > Check? PCI-E Advanced Error Reporting?
> 
> I haven't checked but I would expect configuration space accesses to not
> happen in interrupt context. Usually they are limited to enumeration and
> driver probe.

Actually, AER probably needs this, and I believe some broken devices 
need to mask interrupts using the PCI command word in the config space,
so it can happen.

Arnd


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Thierry Reding
On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > On 01/09/2013 03:10 PM, Arnd Bergmann wrote:
> > > On Wednesday 09 January 2013, Thierry Reding wrote:
> > >> What happens on Tegra is that we need to map 256 MiB of physical memory
> > >> to access all the PCIe extended configuration space. However, ioremap()
> > >> on such a large region fails if not enough vmalloc() space is available.
> > >>
> > >> This was observed when somebody tested this on CardHu, which has 1 GiB
> > >> of RAM and therefore remapping the full 256 MiB fails.
> > ...
> > > Have you checked if the hardware supports an alternative config
> > > space access mechanism that does not depend on a huge address range?
> > > A lot of them provide an index/data register pair somewhere, as the
> > > original PC implementation did.
> > 
> > That would be nice, but I've talked to the HW engineers, and there's no
> > indication that any alternative mechanism exists.
> 
> It seems to be the convention that extended config space is often only
> accessible through MMIO; that was true on x86 last I checked, too.
> 
> You could decrease the size of the mapping to only span the bus
> numbers that are configured for use via DT.

That won't work, unfortunately. The mapping is such that the bus number
is not encoded in the uppermost bits, the extended register number is.
So the only thing that we could do is decrease the size of the extended
register space for *all* devices.

> Are there any concerns about these config registers being accessed
> from a context where a new mapping can't be made? Interrupt? Machine
> Check? PCI-E Advanced Error Reporting?

I haven't checked but I would expect configuration space accesses to not
happen in interrupt context. Usually they are limited to enumeration and
driver probe.

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Thierry Reding
On Wed, Jan 09, 2013 at 10:10:49PM +, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Thierry Reding wrote:
> > What happens on Tegra is that we need to map 256 MiB of physical memory
> > to access all the PCIe extended configuration space. However, ioremap()
> > on such a large region fails if not enough vmalloc() space is available.
> > 
> > This was observed when somebody tested this on CardHu, which has 1 GiB
> > of RAM and therefore remapping the full 256 MiB fails.
> 
> Hmm, config space accesses are fairly rare and generally not expected
> to be fast, and 256 MB is really a huge waste of virtual address space,
> so I agree that just ioremapping the entire space is not a good
> solution.
> 
> However, it's not clear that a cache is necessary. Have you measured
> a significant performance benefit of this implementation over just
> ioremapping and unmapping a single page for every config space access?

No, I hadn't actually implemented it that way. I figured I might just as
well implement something generic, with the added benefit that most
remapping operations would be cached automatically: the PCI enumeration
algorithms usually access the configuration space of a single device at
a time, which is the best case for an LRU-based cache.

> Even if we actually want a cache, how about a private implementation
> that just remembers a single page in LRU? I doubt that there are
> more drivers that would benefit from a generalized version that you
> provide.

I can move the code to the Tegra PCIe driver, but there's quite a bit of
code that isn't actually related to the PCI functionality and I'd really
like to avoid cluttering the driver with this implementation. Keeping it
in a more central location will certainly increase the code's visibility
and make it easier for other potential users to find.

Also I just noticed that I hadn't actually added a parameter to the
iomap_cache_create() function to specify the maximum number of pages, so
currently the code only uses a single page anyway. It should be trivial
to change. I guess performance was good enough with a single page that I
didn't have a reason to increase the maximum number of pages.

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Jason Gunthorpe
On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> On 01/09/2013 03:10 PM, Arnd Bergmann wrote:
> > On Wednesday 09 January 2013, Thierry Reding wrote:
> >> What happens on Tegra is that we need to map 256 MiB of physical memory
> >> to access all the PCIe extended configuration space. However, ioremap()
> >> on such a large region fails if not enough vmalloc() space is available.
> >>
> >> This was observed when somebody tested this on CardHu, which has 1 GiB
> >> of RAM and therefore remapping the full 256 MiB fails.
> ...
> > Have you checked if the hardware supports an alternative config
> > space access mechanism that does not depend on a huge address range?
> > A lot of them provide an index/data register pair somewhere, as the
> > original PC implementation did.
> 
> That would be nice, but I've talked to the HW engineers, and there's no
> indication that any alternative mechanism exists.

It seems to be the convention that extended config space is often only
accessible through MMIO; that was true on x86 last I checked, too.

You could decrease the size of the mapping to only span the bus
numbers that are configured for use via DT.

Are there any concerns about these config registers being accessed
from a context where a new mapping can't be made? Interrupt? Machine
Check? PCI-E Advanced Error Reporting?

Cheers,
Jason


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Stephen Warren
On 01/09/2013 03:10 PM, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Thierry Reding wrote:
>> What happens on Tegra is that we need to map 256 MiB of physical memory
>> to access all the PCIe extended configuration space. However, ioremap()
>> on such a large region fails if not enough vmalloc() space is available.
>>
>> This was observed when somebody tested this on CardHu, which has 1 GiB
>> of RAM and therefore remapping the full 256 MiB fails.
...
> Have you checked if the hardware supports an alternative config
> space access mechanism that does not depend on a huge address range?
> A lot of them provide an index/data register pair somewhere, as the
> original PC implementation did.

That would be nice, but I've talked to the HW engineers, and there's no
indication that any alternative mechanism exists.


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Arnd Bergmann
On Wednesday 09 January 2013, Thierry Reding wrote:
> What happens on Tegra is that we need to map 256 MiB of physical memory
> to access all the PCIe extended configuration space. However, ioremap()
> on such a large region fails if not enough vmalloc() space is available.
> 
> This was observed when somebody tested this on CardHu, which has 1 GiB
> of RAM and therefore remapping the full 256 MiB fails.

Hmm, config space accesses are fairly rare and generally not expected
to be fast, and 256 MB is really a huge waste of virtual address space,
so I agree that just ioremapping the entire space is not a good
solution.

However, it's not clear that a cache is necessary. Have you measured
a significant performance benefit of this implementation over just
ioremapping and unmapping a single page for every config space access?
Have you checked if the hardware supports an alternative config
space access mechanism that does not depend on a huge address range?
A lot of them provide an index/data register pair somewhere, as the
original PC implementation did.
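The index/data pair mentioned here is the classic PC "configuration
mechanism #1": write an address to I/O port 0xCF8, then read or write
the data at port 0xCFC. As a reference, building the index-port value
looks like this (no port I/O is performed in this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Build the value written to the 0xCF8 index port for a given
 * bus/device/function/register (configuration mechanism #1). */
static uint32_t pci_cf8(unsigned int bus, unsigned int dev,
			unsigned int fn, unsigned int reg)
{
	return 0x80000000u |		/* enable bit */
	       ((bus & 0xff) << 16) |
	       ((dev & 0x1f) << 11) |
	       ((fn & 0x07) << 8) |
	       (reg & 0xfc);		/* dword-aligned register */
}
```

Note that this mechanism only reaches the first 256 bytes of config
space, which is exactly why extended config space usually requires a
memory-mapped window instead.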

Even if we actually want a cache, how about a private implementation
that just remembers a single page in LRU? I doubt that there are
more drivers that would benefit from a generalized version that you
provide.

Arnd


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Thierry Reding
On Wed, Jan 09, 2013 at 09:28:47PM +, Russell King - ARM Linux wrote:
> On Wed, Jan 09, 2013 at 09:43:05PM +0100, Thierry Reding wrote:
> > The I/O map cache is used to map large regions of physical memory in
> > smaller chunks to avoid running out of vmalloc()/ioremap() space.
> > 
> > Signed-off-by: Thierry Reding 
> 
> We already have a means where we record the mappings which ioremap()
> creates.  If you look at /proc/vmallocinfo, you'll notice lines such
> as:
> 
> 0xf7b72000-0xf7b74000    8192 e1000_probe+0x291/0xa68 [e1000e]
> phys=fc025000 ioremap
> 
> which gives you the virtual address range, physical address and type
> of the mapping.  Why do we need a duplicated data structure?
> 
> Moreover, you seem to suggest that you want to break up a large
> ioremap() mapping into several smaller mappings.  Why?  The idea
> behind ioremap() is that this relationship holds true:
> 
> >   ptr = ioremap(cookie, size);
> 
> For any 'n' in the range 0 .. size, the location shall be accessible
> via ptr + n when using the IO space accessors.  If you're going to
> break up a mapping into several smaller ones, this no longer holds
> true.
> 
> If the problem is that you're ioremapping huge address ranges because
> you're passing larger-than-required resources to devices, then that's
> part of the problem too.

The resource is not larger than required. It's just that in order to
address the extended configuration space of all PCI devices we require a
full 256 MiB window.

What this patch tries to achieve is to allow a driver to take a large
region and ioremap() it page by page on an as-needed basis.
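As Jason suggested earlier in the thread, this can be pictured as a lazily populated table of chunk mappings, one per bus. The following is a user-space sketch under that assumption, with ioremap() again stubbed by malloc() and the names invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE (64 * 1024)			/* 64 KiB per mapping */
#define NUM_CHUNKS (256 * 1024 * 1024 / CHUNK_SIZE)	/* 256 MiB window */

/* Nothing is mapped until a chunk is first touched, so only the buses
 * that are actually probed consume vmalloc()/ioremap() space. */
static void *chunks[NUM_CHUNKS];
static unsigned int mapped_chunks;

static void *chunk_map(uintptr_t base, uintptr_t offset)
{
	size_t idx = offset / CHUNK_SIZE;

	if (!chunks[idx]) {
		/* a real implementation would call
		 * ioremap(base + idx * CHUNK_SIZE, CHUNK_SIZE) here */
		(void)base;
		chunks[idx] = malloc(CHUNK_SIZE);
		mapped_chunks++;
	}
	return (char *)chunks[idx] + offset % CHUNK_SIZE;
}
```

With 64 KiB chunks, a system that only ever touches a handful of buses maps a few hundred KiB instead of the full 256 MiB.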

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Thierry Reding
On Wed, Jan 09, 2013 at 09:19:56PM +, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Thierry Reding wrote:
> > The I/O map cache is used to map large regions of physical memory in
> > smaller chunks to avoid running out of vmalloc()/ioremap() space.
> > 
> > Signed-off-by: Thierry Reding 
> 
> Can you explain why this is necessary and which problem it solves?
> The implementation looks reasonable, but it's not clear to me if
> we really need this new interface.
> 
> In what cases does it actually save ioremap space?

I briefly described this in the cover letter of this series and didn't
realize I hadn't put more information in this patch's commit message.

What happens on Tegra is that we need to map 256 MiB of physical memory
to access all the PCIe extended configuration space. However, ioremap()
on such a large region fails if not enough vmalloc() space is available.

This was observed when somebody tested this on CardHu, which has 1 GiB
of RAM, and therefore remapping the full 256 MiB fails.
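The 256 MiB figure follows from the standard ECAM layout for extended configuration space: 4 KiB per function, 8 functions per device, 32 devices per bus, 256 buses. A small sketch of the address arithmetic, assuming that standard layout:

```c
#include <assert.h>
#include <stdint.h>

/* ECAM offset within the window:
 * 4 KiB * 8 functions * 32 devices * 256 buses = 256 MiB total. */
static uintptr_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
			     uint16_t reg)
{
	return ((uintptr_t)bus << 20) |
	       ((uintptr_t)(dev & 0x1f) << 15) |
	       ((uintptr_t)(fn & 0x7) << 12) |
	       (reg & 0xfff);
}
```

Each bus thus occupies a 1 MiB slice, which is why the chunked approach discussed upthread maps sixteen 64 KiB regions per bus.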

Thierry




Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Russell King - ARM Linux
On Wed, Jan 09, 2013 at 09:43:05PM +0100, Thierry Reding wrote:
> The I/O map cache is used to map large regions of physical memory in
> smaller chunks to avoid running out of vmalloc()/ioremap() space.
> 
> Signed-off-by: Thierry Reding 

We already have a means where we record the mappings which ioremap()
creates.  If you look at /proc/vmallocinfo, you'll notice lines such
as:

0xf7b72000-0xf7b74000    8192 e1000_probe+0x291/0xa68 [e1000e] phys=fc025000 ioremap

which gives you the virtual address range, physical address and type
of the mapping.  Why do we need a duplicated data structure?

Moreover, you seem to suggest that you want to break up a large
ioremap() mapping into several smaller mappings.  Why?  The idea
behind ioremap() is that this relationship holds true:

ptr = ioremap(cookie, size);

For any 'n' in the range 0 .. size, the location shall be accessible
via ptr + n when using the IO space accessors.  If you're going to
break up a mapping into several smaller ones, this no longer holds
true.

If the problem is that you're ioremapping huge address ranges because
you're passing larger-than-required resources to devices, then that's
part of the problem too.


Re: [PATCH 05/14] lib: Add I/O map cache implementation

2013-01-09 Thread Arnd Bergmann
On Wednesday 09 January 2013, Thierry Reding wrote:
> The I/O map cache is used to map large regions of physical memory in
> smaller chunks to avoid running out of vmalloc()/ioremap() space.
> 
> Signed-off-by: Thierry Reding 

Can you explain why this is necessary and which problem it solves?
The implementation looks reasonable, but it's not clear to me if
we really need this new interface.

In what cases does it actually save ioremap space?

Arnd