Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 16, 2013 at 11:25:56AM +, Russell King - ARM Linux wrote:
> On Wed, Jan 16, 2013 at 11:18:22AM +0100, Thierry Reding wrote:
> > 		err = ioremap_page_range(virt, virt + SZ_64K - 1, phys,
> 
> Why -1 here?

Right, I forgot that end in these functions always means one byte after
the end. Removing the -1 seems to get past the remapping at least;
reading the configuration space through the mapping still doesn't work,
though. I'll investigate some more.

Thanks for pointing this out Russell.

Thierry
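The half-open convention that tripped things up here can be checked with plain userspace arithmetic (a sketch, not kernel code; the underscore-suffixed names are local stand-ins for the kernel's PAGE_SIZE and SZ_64K):

```c
#include <assert.h>

#define PAGE_SIZE_ 4096UL
#define SZ_64K_ (64UL * 1024)

/* Number of whole pages a half-open [start, end) range covers, given
 * page-aligned bounds -- the convention ioremap_page_range() expects,
 * where 'end' is one byte past the last address to map. */
static unsigned long pages_in_range(unsigned long start, unsigned long end)
{
	return (end - start) / PAGE_SIZE_;
}
```

Passing `virt + SZ_64K - 1` as `end` therefore describes a range one byte short of the intended 16 pages, which is why dropping the `-1` is the fix.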
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 16, 2013 at 11:18:22AM +0100, Thierry Reding wrote:
> 		err = ioremap_page_range(virt, virt + SZ_64K - 1, phys,

Why -1 here?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 12:24:17PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 10, 2013 at 08:03:27PM +0100, Thierry Reding wrote:
> > > > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > > > simple 2d array of busnr*16 of pointers would do the trick. A more
> > > > clever solution would be to allocate contiguous virtual memory and
> > > > split that up..
> > 
> > > Oh, I see. I'm not very familiar with the internals of remapping, so
> > > I'll need to do some more reading. Thanks for the hints.
> > 
> > I forgot to ask. What's the advantage of having a contiguous virtual
> > memory area and splitting it up versus remapping each chunk separately?
> 
> Not a lot, really, but it saves you from the pointer array and
> associated overhead. IIRC it is fairly easy to do in the kernel.
> 
> Arnd's version is good too, but you would be restricted to aligned
> powers of two for the bus number range in the DT, which is probably
> not that big a deal either?

I've been trying to make this work, but this implementation always
triggers a BUG_ON() in lib/ioremap.c, line 27:

	27		BUG_ON(!pte_none(*pte));

which seems to indicate that the page is already mapped, right?

Below is the relevant code:

	struct tegra_pcie_bus {
		struct vm_struct *area;
		struct list_head list;
		unsigned int nr;
	};

	static struct tegra_pcie_bus *tegra_pcie_bus_alloc(struct tegra_pcie *pcie,
							   unsigned int busnr)
	{
		unsigned long flags = VM_READ | VM_WRITE | VM_IO | VM_PFNMAP |
				      VM_DONTEXPAND | VM_DONTDUMP;
		phys_addr_t cs = pcie->cs->start;
		struct tegra_pcie_bus *bus;
		struct vm_struct *vm;
		unsigned int i;
		int err;

		bus = devm_kzalloc(pcie->dev, sizeof(*bus), GFP_KERNEL);
		if (!bus)
			return ERR_PTR(-ENOMEM);

		INIT_LIST_HEAD(&bus->list);
		bus->nr = busnr;

		bus->area = get_vm_area(SZ_1M, VM_IOREMAP);
		if (!bus->area) {
			err = -ENOMEM;
			goto free;
		}

		for (i = 0; i < 16; i++) {
			unsigned long virt = (unsigned long)bus->area->addr +
					     i * SZ_64K;
			phys_addr_t phys = cs + busnr * SZ_64K + i * SZ_1M;

			err = ioremap_page_range(virt, virt + SZ_64K - 1, phys,
						 vm_get_page_prot(flags));
			if (err < 0) {
				dev_err(pcie->dev, "ioremap_page_range() failed: %d\n",
					err);
				goto unmap;
			}
		}

		return bus;

	unmap:
		vunmap(bus->area->addr);
		free_vm_area(bus->area);
	free:
		devm_kfree(pcie->dev, bus);
		return ERR_PTR(err);
	}

Anybody see what's wrong with that?
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 09:20:07PM +0100, Thierry Reding wrote:
> > Arnd's version is good too, but you would be restricted to aligned
> > powers of two for the bus number range in the DT, which is probably
> > not that big a deal either?
> 
> Stephen suggested on IRC that we could try to keep a bit of dynamicity
> in the allocation scheme if we create the bus mapping when the first
> device on the bus is probed and discard the mapping if no devices are
> found.

You probably don't need to mess around with 'discard on empty'; the
kernel should only access bus numbers that are in the range of the
subordinate busses of the bridges. So if you establish a mapping on a
bus-by-bus basis at first access, it should be fine and very close to
minimal.

Jason
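The map-on-first-access scheme Jason describes can be sketched in userspace C (hypothetical names throughout; malloc() stands in for the get_vm_area()/ioremap_page_range() sequence that would establish the real mapping):

```c
#include <stdlib.h>

#define TEGRA_NR_BUSES 256
#define BUS_WINDOW_SIZE (16 * 64 * 1024)	/* 16 chunks of 64 KiB per bus */

static void *bus_mappings[TEGRA_NR_BUSES];

/* Return the mapping for 'busnr', creating it lazily on first access.
 * Buses that are never touched by the core never get a window, which is
 * the "very close to minimal" property without any discard logic. */
static void *bus_map_get(unsigned int busnr)
{
	if (busnr >= TEGRA_NR_BUSES)
		return NULL;

	if (!bus_mappings[busnr])
		bus_mappings[busnr] = malloc(BUS_WINDOW_SIZE);

	return bus_mappings[busnr];
}
```

Repeated lookups for the same bus return the cached mapping, so only the first config access per bus pays the setup cost.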
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 12:24:17PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 10, 2013 at 08:03:27PM +0100, Thierry Reding wrote:
> > > > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > > > simple 2d array of busnr*16 of pointers would do the trick. A more
> > > > clever solution would be to allocate contiguous virtual memory and
> > > > split that up..
> > 
> > > Oh, I see. I'm not very familiar with the internals of remapping, so
> > > I'll need to do some more reading. Thanks for the hints.
> > 
> > I forgot to ask. What's the advantage of having a contiguous virtual
> > memory area and splitting it up versus remapping each chunk separately?
> 
> Not a lot, really, but it saves you from the pointer array and
> associated overhead. IIRC it is fairly easy to do in the kernel.

I've been investigating this a bit, and one problem is that it will
prevent the driver from ever building as a module because the necessary
functions aren't exported, and I'm not sure exporting them would be
acceptable. Currently PCI host controller drivers with MSI support can't
be built as modules because the MSI infrastructure requires it, but I
briefly discussed this with Bjorn at some point and it should be easy to
remove that requirement.

> Arnd's version is good too, but you would be restricted to aligned
> powers of two for the bus number range in the DT, which is probably
> not that big a deal either?

Stephen suggested on IRC that we could try to keep a bit of dynamicity
in the allocation scheme if we create the bus mapping when the first
device on the bus is probed and discard the mapping if no devices are
found.

Sounds like a good plan to me. Does anybody see any potential pitfalls?

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 08:03:27PM +0100, Thierry Reding wrote:
> > > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > > simple 2d array of busnr*16 of pointers would do the trick. A more
> > > clever solution would be to allocate contiguous virtual memory and
> > > split that up..
> 
> > Oh, I see. I'm not very familiar with the internals of remapping, so
> > I'll need to do some more reading. Thanks for the hints.
> 
> I forgot to ask. What's the advantage of having a contiguous virtual
> memory area and splitting it up versus remapping each chunk separately?

Not a lot, really, but it saves you from the pointer array and
associated overhead. IIRC it is fairly easy to do in the kernel.

Arnd's version is good too, but you would be restricted to aligned
powers of two for the bus number range in the DT, which is probably
not that big a deal either?

Jason
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 07:55:05PM +0100, Thierry Reding wrote:
> On Thu, Jan 10, 2013 at 11:20:07AM -0700, Jason Gunthorpe wrote:
> > On Thu, Jan 10, 2013 at 11:25:44AM +0100, Thierry Reding wrote:
> > > On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> > > > On Thursday 10 January 2013, Thierry Reding wrote:
> > > > > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > > > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > > > > You could decrease the size of the mapping to only span the bus
> > > > > > numbers that are configured for use via DT.
> > > > > 
> > > > > That won't work, unfortunately. The mapping is such that the bus
> > > > > number is not encoded in the uppermost bits, the extended register
> > > > > number is. So the only thing that we could do is decrease the size
> > > > > of the extended register space for *all* devices.
> > > > 
> > > > But you could still use a method to map 16 separate areas per bus,
> > > > each 65536 bytes long, which results in 1MB per bus. That is
> > > > probably ok, since very few systems have more than a handful of
> > > > buses in practice.
> > > > 
> > > > In theory, doing static mappings on a per-page basis would let you
> > > > do 16 devices at a time, but it's probably not worth doing at this
> > > > fine granularity.
> > > 
> > > I don't understand how this would help. The encoding is like this:
> > > 
> > >   [27:24] extended register number
> > >   [23:16] bus number
> > >   [15:11] device number
> > >   [10: 8] function number
> > >   [ 7: 0] register number
> > > 
> > > So it doesn't matter whether I use separate areas per bus or not. As
> > > soon as the whole extended configuration space needs to be accessed a
> > > whopping 28 bits (256 MiB) are required.
> > 
> > You'd piece a mapping together, each bus requires 16 64k mappings, a
> > simple 2d array of busnr*16 of pointers would do the trick. A more
> > clever solution would be to allocate contiguous virtual memory and
> > split that up..
> 
> Oh, I see. I'm not very familiar with the internals of remapping, so
> I'll need to do some more reading. Thanks for the hints.

I forgot to ask. What's the advantage of having a contiguous virtual
memory area and splitting it up versus remapping each chunk separately?

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 06:26:55PM +, Arnd Bergmann wrote:
> On Thursday 10 January 2013, Thierry Reding wrote:
> > I don't understand how this would help. The encoding is like this:
> > 
> >   [27:24] extended register number
> >   [23:16] bus number
> >   [15:11] device number
> >   [10: 8] function number
> >   [ 7: 0] register number
> > 
> > So it doesn't matter whether I use separate areas per bus or not. As
> > soon as the whole extended configuration space needs to be accessed a
> > whopping 28 bits (256 MiB) are required.
> > 
> > What you propose would work if only regular configuration space is
> > supported. I'm not sure if that's an option.
> 
> I mean something like:
> 
> 	struct tegra_bus_private {
> 		...
> 		void __iomem *config_space[16];
> 	};
> 
> 	void tegra_scan_bus(struct pci_bus *bus)
> 	{
> 		int i;
> 		struct tegra_bus_private *priv = bus->dev->private;
> 
> 		for (i = 0; i < 16; i++)
> 			priv->config_space[i] = ioremap(config_space_phys +
> 							65536 * bus->primary +
> 							i * SZ_1M, 65536);
> 		...
> 	}

Okay, I see. It's a bit kludgy, but I guess so was the I/O map cache.
It'll take some more time to work this out and test, but I'll give it a
shot.

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 11:20:07AM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 10, 2013 at 11:25:44AM +0100, Thierry Reding wrote:
> > On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> > > On Thursday 10 January 2013, Thierry Reding wrote:
> > > > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > > > You could decrease the size of the mapping to only span the bus
> > > > > numbers that are configured for use via DT.
> > > > 
> > > > That won't work, unfortunately. The mapping is such that the bus
> > > > number is not encoded in the uppermost bits, the extended register
> > > > number is. So the only thing that we could do is decrease the size
> > > > of the extended register space for *all* devices.
> > > 
> > > But you could still use a method to map 16 separate areas per bus,
> > > each 65536 bytes long, which results in 1MB per bus. That is
> > > probably ok, since very few systems have more than a handful of
> > > buses in practice.
> > > 
> > > In theory, doing static mappings on a per-page basis would let you
> > > do 16 devices at a time, but it's probably not worth doing at this
> > > fine granularity.
> > 
> > I don't understand how this would help. The encoding is like this:
> > 
> >   [27:24] extended register number
> >   [23:16] bus number
> >   [15:11] device number
> >   [10: 8] function number
> >   [ 7: 0] register number
> > 
> > So it doesn't matter whether I use separate areas per bus or not. As
> > soon as the whole extended configuration space needs to be accessed a
> > whopping 28 bits (256 MiB) are required.
> 
> You'd piece a mapping together, each bus requires 16 64k mappings, a
> simple 2d array of busnr*16 of pointers would do the trick. A more
> clever solution would be to allocate contiguous virtual memory and
> split that up..

Oh, I see. I'm not very familiar with the internals of remapping, so
I'll need to do some more reading. Thanks for the hints.

> > > Actually, AER probably needs this, and I believe some broken devices
> > > need to mask interrupts using the PCI command word in the config
> > > space, so it can happen.
> > 
> > Ugh... that would kill any such dynamic mapping approach. Perhaps if we
> > could mark a device as requiring a static mapping we could pin that
> > cache entry. But that doesn't sound very encouraging.
> 
> AER applies to pretty much every PCI-E device these days.

So, given that, there's no way around a segmented static mapping as you
suggested, right?

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thursday 10 January 2013, Thierry Reding wrote:
> I don't understand how this would help. The encoding is like this:
> 
>   [27:24] extended register number
>   [23:16] bus number
>   [15:11] device number
>   [10: 8] function number
>   [ 7: 0] register number
> 
> So it doesn't matter whether I use separate areas per bus or not. As
> soon as the whole extended configuration space needs to be accessed a
> whopping 28 bits (256 MiB) are required.
> 
> What you propose would work if only regular configuration space is
> supported. I'm not sure if that's an option.

I mean something like:

	struct tegra_bus_private {
		...
		void __iomem *config_space[16];
	};

	void tegra_scan_bus(struct pci_bus *bus)
	{
		int i;
		struct tegra_bus_private *priv = bus->dev->private;

		for (i = 0; i < 16; i++)
			priv->config_space[i] = ioremap(config_space_phys +
							65536 * bus->primary +
							i * SZ_1M, 65536);
		...
	}

Arnd
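With a per-bus array like Arnd's config_space[16], an access still has to select the right area and the offset within it. Splitting an encoded configuration address accordingly might look like this (a userspace sketch derived from the bit layout quoted above; the function names are made up for illustration):

```c
#include <stdint.h>

/* For a per-bus array of 16 mapped 64 KiB areas (one per extended
 * register group), pick the area via bits [27:24] of the encoded
 * configuration address... */
static unsigned int cfg_area_index(uint32_t addr)
{
	return (addr >> 24) & 0xf;	/* [27:24] extended register number */
}

/* ...and address into it with the low 16 bits, which carry the
 * device, function and register numbers. */
static uint32_t cfg_area_offset(uint32_t addr)
{
	return addr & 0xffff;		/* [15:0] dev/fn/register bits */
}
```

An access would then read from `config_space[cfg_area_index(addr)] + cfg_area_offset(addr)`, with the bus number already implied by which pci_bus the array belongs to.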
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 11:25:44AM +0100, Thierry Reding wrote:
> On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> > On Thursday 10 January 2013, Thierry Reding wrote:
> > > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > > You could decrease the size of the mapping to only span the bus
> > > > numbers that are configured for use via DT.
> > > 
> > > That won't work, unfortunately. The mapping is such that the bus
> > > number is not encoded in the uppermost bits, the extended register
> > > number is. So the only thing that we could do is decrease the size
> > > of the extended register space for *all* devices.
> > 
> > But you could still use a method to map 16 separate areas per bus,
> > each 65536 bytes long, which results in 1MB per bus. That is probably
> > ok, since very few systems have more than a handful of buses in
> > practice.
> > 
> > In theory, doing static mappings on a per-page basis would let you
> > do 16 devices at a time, but it's probably not worth doing at this
> > fine granularity.
> 
> I don't understand how this would help. The encoding is like this:
> 
>   [27:24] extended register number
>   [23:16] bus number
>   [15:11] device number
>   [10: 8] function number
>   [ 7: 0] register number
> 
> So it doesn't matter whether I use separate areas per bus or not. As
> soon as the whole extended configuration space needs to be accessed a
> whopping 28 bits (256 MiB) are required.

You'd piece a mapping together, each bus requires 16 64k mappings, a
simple 2d array of busnr*16 of pointers would do the trick. A more
clever solution would be to allocate contiguous virtual memory and
split that up..

> > Actually, AER probably needs this, and I believe some broken devices
> > need to mask interrupts using the PCI command word in the config
> > space, so it can happen.
> 
> Ugh... that would kill any such dynamic mapping approach. Perhaps if we
> could mark a device as requiring a static mapping we could pin that
> cache entry. But that doesn't sound very encouraging.

AER applies to pretty much every PCI-E device these days.

Jason
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thu, Jan 10, 2013 at 09:17:19AM +, Arnd Bergmann wrote:
> On Thursday 10 January 2013, Thierry Reding wrote:
> > On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > > You could decrease the size of the mapping to only span the bus
> > > numbers that are configured for use via DT.
> > 
> > That won't work, unfortunately. The mapping is such that the bus number
> > is not encoded in the uppermost bits, the extended register number is.
> > So the only thing that we could do is decrease the size of the extended
> > register space for *all* devices.
> 
> But you could still use a method to map 16 separate areas per bus, each
> 65536 bytes long, which results in 1MB per bus. That is probably ok,
> since very few systems have more than a handful of buses in practice.
> 
> In theory, doing static mappings on a per-page basis would let you do
> 16 devices at a time, but it's probably not worth doing at this fine
> granularity.

I don't understand how this would help. The encoding is like this:

  [27:24] extended register number
  [23:16] bus number
  [15:11] device number
  [10: 8] function number
  [ 7: 0] register number

So it doesn't matter whether I use separate areas per bus or not. As
soon as the whole extended configuration space needs to be accessed a
whopping 28 bits (256 MiB) are required.

What you propose would work if only regular configuration space is
supported. I'm not sure if that's an option.

> > > Are there any concerns about these config registers being accessed
> > > from a context where a new mapping can't be made? Interrupt? Machine
> > > Check? PCI-E Advanced Error Reporting?
> > 
> > I haven't checked but I would expect configuration space accesses to not
> > happen in interrupt context. Usually they are limited to enumeration and
> > driver probe.
> 
> Actually, AER probably needs this, and I believe some broken devices
> need to mask interrupts using the PCI command word in the config space,
> so it can happen.

Ugh... that would kill any such dynamic mapping approach. Perhaps if we
could mark a device as requiring a static mapping we could pin that
cache entry. But that doesn't sound very encouraging.

Thierry
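Packing an access into the layout quoted above is plain bit arithmetic; a userspace sketch (tegra_cfg_offset is a hypothetical name for illustration, not the driver's actual helper):

```c
#include <stdint.h>

/* Pack a configuration space access into the quoted offset layout:
 * [27:24] extended register, [23:16] bus, [15:11] device,
 * [10:8] function, [7:0] register.  'reg' is the full 12-bit PCIe
 * register number; its upper four bits land in the extended field. */
static uint32_t tegra_cfg_offset(unsigned int bus, unsigned int dev,
				 unsigned int fn, unsigned int reg)
{
	return ((reg & 0xf00) << 16) |	/* bits [11:8] -> [27:24] */
	       ((bus & 0xff) << 16) |
	       ((dev & 0x1f) << 11) |
	       ((fn  & 0x7)  << 8)  |
	       (reg & 0xff);
}
```

This makes the 28-bit problem visible: touching extended registers of any device moves bits into [27:24], so the window must span 256 MiB regardless of how few buses exist.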
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Thursday 10 January 2013, Thierry Reding wrote:
> On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> > On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > You could decrease the size of the mapping to only span the bus
> > numbers that are configured for use via DT.
> 
> That won't work, unfortunately. The mapping is such that the bus number
> is not encoded in the uppermost bits, the extended register number is.
> So the only thing that we could do is decrease the size of the extended
> register space for *all* devices.

But you could still use a method to map 16 separate areas per bus, each
65536 bytes long, which results in 1MB per bus. That is probably ok,
since very few systems have more than a handful of buses in practice.

In theory, doing static mappings on a per-page basis would let you do
16 devices at a time, but it's probably not worth doing at this fine
granularity.

> > Are there any concerns about these config registers being accessed
> > from a context where a new mapping can't be made? Interrupt? Machine
> > Check? PCI-E Advanced Error Reporting?
> 
> I haven't checked but I would expect configuration space accesses to not
> happen in interrupt context. Usually they are limited to enumeration and
> driver probe.

Actually, AER probably needs this, and I believe some broken devices
need to mask interrupts using the PCI command word in the config space,
so it can happen.

Arnd
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 04:17:58PM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> > On 01/09/2013 03:10 PM, Arnd Bergmann wrote:
> > > On Wednesday 09 January 2013, Thierry Reding wrote:
> > >> What happens on Tegra is that we need to map 256 MiB of physical
> > >> memory to access all the PCIe extended configuration space. However,
> > >> ioremap() on such a large region fails if not enough vmalloc() space
> > >> is available.
> > >>
> > >> This was observed when somebody tested this on CardHu which has
> > >> 1 GiB of RAM and therefore remapping the full 256 MiB fails.
> > ...
> > > Have you checked if the hardware supports an alternative config
> > > space access mechanism that does not depend on a huge address range?
> > > A lot of them provide an index/data register pair somewhere, as the
> > > original PC implementation did.
> > 
> > That would be nice, but I've talked to the HW engineers, and there's no
> > indication that any alternative mechanism exists.
> 
> It seems to be convention that extended config space is often only
> accessible through mmio space; that was true on x86 last I checked
> too.
> 
> You could decrease the size of the mapping to only span the bus
> numbers that are configured for use via DT.

That won't work, unfortunately. The mapping is such that the bus number
is not encoded in the uppermost bits, the extended register number is.
So the only thing that we could do is decrease the size of the extended
register space for *all* devices.

> Are there any concerns about these config registers being accessed
> from a context where a new mapping can't be made? Interrupt? Machine
> Check? PCI-E Advanced Error Reporting?

I haven't checked but I would expect configuration space accesses to not
happen in interrupt context. Usually they are limited to enumeration and
driver probe.

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 10:10:49PM +, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Thierry Reding wrote:
> > What happens on Tegra is that we need to map 256 MiB of physical memory
> > to access all the PCIe extended configuration space. However, ioremap()
> > on such a large region fails if not enough vmalloc() space is available.
> > 
> > This was observed when somebody tested this on CardHu which has 1 GiB
> > of RAM and therefore remapping the full 256 MiB fails.
> 
> Hmm, config space accesses are fairly rare and generally not expected
> to be fast, and 256 MB is really a huge waste of virtual address space,
> so I agree that just ioremapping the entire space is not a good
> solution.
> 
> However, it's not clear that a cache is necessary. Have you measured
> a significant performance benefit of this implementation over just
> ioremapping and unmapping a single page for every config space access?

No, I had not actually implemented it that way because I figured I might
just as well implement something generic, with the added benefit that
most remapping operations would be cached automatically: the PCI
enumeration algorithms usually access the configuration space of a
single device at a time, so this actually maps to the best case for an
LRU based cache approach.

> Even if we actually want a cache, how about a private implementation
> that just remembers a single page in LRU? I doubt that there are
> more drivers that would benefit from a generalized version that you
> provide.

I can move the code to the Tegra PCIe driver, but there's quite a bit of
code that isn't actually related to the PCI functionality and I'd really
like to avoid cluttering the driver with this implementation. Keeping it
in a more central location will certainly increase the code's visibility
and make it easier for other potential users to find.

Also I just noticed that I hadn't actually added a parameter to the
iomap_cache_create() function to specify the maximum number of pages, so
currently the code only uses a single page anyway. It should be trivial
to change. I guess performance was good enough with a single page that I
didn't have a reason to increase the maximum number of pages.

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 04:12:31PM -0700, Stephen Warren wrote:
> On 01/09/2013 03:10 PM, Arnd Bergmann wrote:
> > On Wednesday 09 January 2013, Thierry Reding wrote:
> >> What happens on Tegra is that we need to map 256 MiB of physical memory
> >> to access all the PCIe extended configuration space. However, ioremap()
> >> on such a large region fails if not enough vmalloc() space is available.
> >>
> >> This was observed when somebody tested this on CardHu which has 1 GiB
> >> of RAM and therefore remapping the full 256 MiB fails.
> ...
> > Have you checked if the hardware supports an alternative config
> > space access mechanism that does not depend on a huge address range?
> > A lot of them provide an index/data register pair somewhere, as the
> > original PC implementation did.
> 
> That would be nice, but I've talked to the HW engineers, and there's no
> indication that any alternative mechanism exists.

It seems to be convention that extended config space is often only
accessible through mmio space; that was true on x86 last I checked
too.

You could decrease the size of the mapping to only span the bus
numbers that are configured for use via DT.

Are there any concerns about these config registers being accessed
from a context where a new mapping can't be made? Interrupt? Machine
Check? PCI-E Advanced Error Reporting?

Cheers,
Jason
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On 01/09/2013 03:10 PM, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Thierry Reding wrote:
>> What happens on Tegra is that we need to map 256 MiB of physical memory
>> to access all the PCIe extended configuration space. However, ioremap()
>> on such a large region fails if not enough vmalloc() space is available.
>>
>> This was observed when somebody tested this on CardHu which has 1 GiB
>> of RAM and therefore remapping the full 256 MiB fails.
...
> Have you checked if the hardware supports an alternative config
> space access mechanism that does not depend on a huge address range?
> A lot of them provide an index/data register pair somewhere, as the
> original PC implementation did.

That would be nice, but I've talked to the HW engineers, and there's no
indication that any alternative mechanism exists.
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wednesday 09 January 2013, Thierry Reding wrote:
> What happens on Tegra is that we need to map 256 MiB of physical memory
> to access all the PCIe extended configuration space. However, ioremap()
> on such a large region fails if not enough vmalloc() space is available.
> 
> This was observed when somebody tested this on CardHu which has 1 GiB
> of RAM and therefore remapping the full 256 MiB fails.

Hmm, config space accesses are fairly rare and generally not expected
to be fast, and 256 MB is really a huge waste of virtual address space,
so I agree that just ioremapping the entire space is not a good
solution.

However, it's not clear that a cache is necessary. Have you measured
a significant performance benefit of this implementation over just
ioremapping and unmapping a single page for every config space access?

Have you checked if the hardware supports an alternative config
space access mechanism that does not depend on a huge address range?
A lot of them provide an index/data register pair somewhere, as the
original PC implementation did.

Even if we actually want a cache, how about a private implementation
that just remembers a single page in LRU? I doubt that there are
more drivers that would benefit from a generalized version that you
provide.

Arnd
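The "just remember a single page" idea can be sketched in userspace C (hypothetical names; malloc()/free() stand in for the ioremap()/iounmap() pair a real driver would use):

```c
#include <stdlib.h>

/* Single-entry LRU: cache the most recently mapped physical page and
 * only remap when a different page is requested.  Consecutive accesses
 * to the same device's config space then reuse one mapping. */
static unsigned long cached_phys = ~0UL;
static void *cached_virt;
static unsigned int remap_count;	/* visible for the example only */

static void *cfg_map_page(unsigned long phys_page)
{
	if (phys_page != cached_phys) {
		free(cached_virt);		/* iounmap(cached_virt) */
		cached_virt = malloc(4096);	/* ioremap(phys_page, PAGE_SIZE) */
		cached_phys = phys_page;
		remap_count++;
	}

	return cached_virt;
}
```

Since enumeration walks one device at a time, most accesses hit the cached page and remap_count grows far more slowly than the access count.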
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 09:28:47PM +, Russell King - ARM Linux wrote:
> On Wed, Jan 09, 2013 at 09:43:05PM +0100, Thierry Reding wrote:
> > The I/O map cache is used to map large regions of physical memory in
> > smaller chunks to avoid running out of vmalloc()/ioremap() space.
> > 
> > Signed-off-by: Thierry Reding
> 
> We already have a means where we record the mappings which ioremap()
> creates. If you look at /proc/vmallocinfo, you'll notice lines such
> as:
> 
> 0xf7b72000-0xf7b74000    8192 e1000_probe+0x291/0xa68 [e1000e] phys=fc025000 ioremap
> 
> which gives you the virtual address range, physical address and type
> of the mapping. Why do we need a duplicated data structure?
> 
> Moreover, you seem to suggest that you want to break up a large
> ioremap() mapping into several smaller mappings. Why? The idea
> behind ioremap() is that this relationship holds true:
> 
> 	ptr = ioremap(cookie, size);
> 
> For any 'n' in the range 0 .. size, the location cookie + n shall be
> accessible via ptr + n when using the IO space accessors. If you're
> going to break up a mapping into several smaller ones, this no longer
> holds true.
> 
> If the problem is that you're ioremapping huge address ranges because
> you're passing larger-than-required resources to devices, then that's
> part of the problem too.

The resource is not larger than required. It's just that in order to
address the extended configuration space of all PCI devices we require a
full 256 MiB window. What this patch tries to achieve is to allow a
driver to take a large region and ioremap() it page by page on an
as-needed basis.

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 09:19:56PM +, Arnd Bergmann wrote:
> On Wednesday 09 January 2013, Thierry Reding wrote:
> > The I/O map cache is used to map large regions of physical memory in
> > smaller chunks to avoid running out of vmalloc()/ioremap() space.
> > 
> > Signed-off-by: Thierry Reding
> 
> Can you explain why this is necessary and which problem it solves?
> The implementation looks reasonable, but it's not clear to me if
> we really need this new interface.
> 
> In what cases does it actually save ioremap space?

I briefly described this in the cover letter of this series and didn't
realize I hadn't put more information in this patch's commit message.

What happens on Tegra is that we need to map 256 MiB of physical memory
to access all the PCIe extended configuration space. However, ioremap()
on such a large region fails if not enough vmalloc() space is available.

This was observed when somebody tested this on CardHu which has 1 GiB
of RAM and therefore remapping the full 256 MiB fails.

Thierry
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wed, Jan 09, 2013 at 09:43:05PM +0100, Thierry Reding wrote:
> The I/O map cache is used to map large regions of physical memory in
> smaller chunks to avoid running out of vmalloc()/ioremap() space.
> 
> Signed-off-by: Thierry Reding

We already have a means where we record the mappings which ioremap()
creates. If you look at /proc/vmallocinfo, you'll notice lines such
as:

0xf7b72000-0xf7b74000    8192 e1000_probe+0x291/0xa68 [e1000e] phys=fc025000 ioremap

which gives you the virtual address range, physical address and type
of the mapping. Why do we need a duplicated data structure?

Moreover, you seem to suggest that you want to break up a large
ioremap() mapping into several smaller mappings. Why? The idea
behind ioremap() is that this relationship holds true:

	ptr = ioremap(cookie, size);

For any 'n' in the range 0 .. size, the location cookie + n shall be
accessible via ptr + n when using the IO space accessors. If you're
going to break up a mapping into several smaller ones, this no longer
holds true.

If the problem is that you're ioremapping huge address ranges because
you're passing larger-than-required resources to devices, then that's
part of the problem too.
Re: [PATCH 05/14] lib: Add I/O map cache implementation
On Wednesday 09 January 2013, Thierry Reding wrote:
> The I/O map cache is used to map large regions of physical memory in
> smaller chunks to avoid running out of vmalloc()/ioremap() space.
> 
> Signed-off-by: Thierry Reding

Can you explain why this is necessary and which problem it solves?
The implementation looks reasonable, but it's not clear to me if
we really need this new interface.

In what cases does it actually save ioremap space?

Arnd