Re: [openib-general] Suggestions for how to remove bus_to_virt()
Ralph Campbell wrote:
> On Fri, 2006-07-14 at 15:35 -0700, David Miller wrote:
> ...
>> The dma_mapping_ops idea will never get accepted by folks like Linus, for reasons I've outlined in previous emails in this thread. So, it's best to look elsewhere for solutions to your problem, such as the ideas used by the USB and IEEE1394 device layers.
>
> The USB code won't work in my case because the USB system is the one doing the memory allocation and IOMMU setup so it can remember the kernel virtual address or physical pages used to create the mapping.

Side note: The same is true of the DMA stuff in the ieee1394 subsystem. And the SCSI subsystem doesn't allocate (all) buffers but leaves DMA mapping and unmapping to the low-level drivers --- i.e. Ralph can't rip bus_to_virt replacements from there either, because:

> In my case, the infiniband (SRP) code is doing the mapping and only passing the dma_addr_t to the device driver at which point I have no way to convert it back to a kernel virtual address. I need to either change the IB device API to include mapping functions or intercept the dma_* functions so I can save the inputs.

On the other hand, ieee1394/dma is the rather obvious example of a generic layer which keeps track of the virtual address and bus address of mapped memory regions, for the layers above or below to use as they need.

Ralph, do you think you can arrange your required API change as a pure _extension_ of the IB API? I.e. add fields to data structs or add fields to callback templates or add calls into the SRP layer... (I haven't bothered to look at the API yet.)
--
Stefan Richter
-=-=-==- -=== -
http://arcgraph.de/sr/
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
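The bookkeeping idea mentioned in this message (a layer that remembers both the virtual and the bus address of each mapped region) can be sketched roughly as follows. This is a user-space illustration only, not the drivers/ieee1394/dma.[hc] API: `fake_bus_addr_t`, `struct mapped_region`, `region_bus_to_virt()` and `regions_bus_to_virt()` are hypothetical names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a bus/DMA address type (assumption for this sketch). */
typedef uint64_t fake_bus_addr_t;

/* One mapped region: both views of the same memory are recorded. */
struct mapped_region {
	void *virt;		/* kernel virtual base address */
	fake_bus_addr_t bus;	/* bus address of the same base */
	size_t len;		/* length of the region in bytes */
};

/* Translate a bus address inside one region; NULL if out of range. */
static void *region_bus_to_virt(const struct mapped_region *r,
				fake_bus_addr_t addr)
{
	if (addr < r->bus || addr - r->bus >= r->len)
		return NULL;
	return (char *)r->virt + (size_t)(addr - r->bus);
}

/* Search a small table of regions (linear scan, fine for a sketch). */
static void *regions_bus_to_virt(const struct mapped_region *tab, size_t n,
				 fake_bus_addr_t addr)
{
	for (size_t i = 0; i < n; i++) {
		void *p = region_bus_to_virt(&tab[i], addr);
		if (p)
			return p;
	}
	return NULL;
}
```

Whoever creates the mapping fills in the table; any layer that later sees only a bus address can then recover the virtual address, including addresses that point into the middle of a region.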
Re: [openib-general] Suggestions for how to remove bus_to_virt()
On Thu, 2006-07-13 at 08:46 +0300, Muli Ben-Yehuda wrote:
> On Wed, Jul 12, 2006 at 05:40:13PM -0700, David Miller wrote:
>> From: Roland Dreier [EMAIL PROTECTED]
>> Date: Wed, 12 Jul 2006 17:11:26 -0700
>>
>>> A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess.
>>
>> Clean but expensive, you should not force the rest of the kernel to eat the cost of something you want to do when it's totally unnecessary for most other users.
>>
>> For example, x86 never needs to do anything other than a direct virt_to_phys translation to produce a DMA address, no matter what bus the device is on. It's a single simple integer adjustment that can be done inline in about 2 or 3 instructions at most.
>
> It's possible that even x86 will support multiple IOMMUs in the future - for example, the Calgary IOMMU support we recently added to x86-64 could be modified to work on plain x86 as well.
>
> I like the idea of a per-device DMA-API implementation, but only if it can be done in a way that is zero cost to the majority of the users of the API. We already have dynamic dma_ops on x86-64 to support nommu, swiotlb, gart and Calgary cleanly, extending it to use a per-device dma-ops isn't too difficult.
>
> Cheers,
> Muli

A per-device DMA-API would solve my problem. It would be a fairly invasive changeset though. The basic idea would be to add a struct dma_mapping_ops * to struct device and change all the inline dma_* routines to something like:

	static inline dma_addr_t
	dma_map_single(struct device *hwdev, void *ptr, size_t size,
		       int direction)
	{
		BUG_ON(!valid_dma_direction(direction));
		/* Use the per-device ops if set, else the global ones. */
		return hwdev->dma_ops ?
			hwdev->dma_ops->map_single(hwdev, ptr, size, direction) :
			dma_ops->map_single(hwdev, ptr, size, direction);
	}

Note that the current design only supports one IOMMU type in a system. This could support multiple IOMMU types at the same time.

Another possibility is to only do this for the infiniband subsystem. The idea would be to replace calls to dma_* with ib_dma_*, which would be defined as above.
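To make the ib_dma_* variant a little more concrete, here is a minimal user-space sketch of the dispatch pattern: use the per-device operations when the device supplies them, otherwise fall back to a default. It is not kernel code. `fake_dma_addr_t`, `struct fake_device`, `struct fake_dma_ops`, `default_map_single()`, `ipath_like_map_single()` and `FAKE_BUS_OFFSET` are all assumptions made up for the example, and the idea that an ipath-style driver would simply return the virtual address is a guess at one possible use, not something stated in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's dma_addr_t (assumption for this sketch). */
typedef uint64_t fake_dma_addr_t;

struct fake_device;

/* Hypothetical per-device mapping operations. */
struct fake_dma_ops {
	fake_dma_addr_t (*map_single)(struct fake_device *dev, void *ptr,
				      size_t size, int direction);
};

struct fake_device {
	const struct fake_dma_ops *dma_ops;	/* NULL: use the default */
};

/* Pretend platform default: bus address = virtual address + offset. */
#define FAKE_BUS_OFFSET 0x1000u

static fake_dma_addr_t default_map_single(struct fake_device *dev, void *ptr,
					   size_t size, int direction)
{
	(void)dev; (void)size; (void)direction;
	return (fake_dma_addr_t)(uintptr_t)ptr + FAKE_BUS_OFFSET;
}

/* A driver that copies with the CPU could hand back the pointer itself. */
static fake_dma_addr_t ipath_like_map_single(struct fake_device *dev,
					      void *ptr, size_t size,
					      int direction)
{
	(void)dev; (void)size; (void)direction;
	return (fake_dma_addr_t)(uintptr_t)ptr;
}

static const struct fake_dma_ops ipath_like_ops = {
	.map_single = ipath_like_map_single,
};

/* The wrapper: per-device override if present, otherwise the default. */
static fake_dma_addr_t ib_dma_map_single(struct fake_device *dev, void *ptr,
					  size_t size, int direction)
{
	return dev->dma_ops ?
		dev->dma_ops->map_single(dev, ptr, size, direction) :
		default_map_single(dev, ptr, size, direction);
}
```

Because the override lives behind an IB-specific wrapper, only IB consumers pay for the extra pointer test; the generic dma_* inlines are left untouched.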
Re: [openib-general] Suggestions for how to remove bus_to_virt()
From: Ralph Campbell [EMAIL PROTECTED]
Date: Fri, 14 Jul 2006 15:27:07 -0700

> Note that the current design only supports one IOMMU type in a system. This could support multiple IOMMU types at the same time.

This is not true; the framework allows multiple such types, and in fact Sparc64 takes advantage of this. We have about 4 or 5 different PCI controllers, and the IOMMUs are slightly different in each.

Even with the standard PCI DMA mapping calls, we can gather the platform private information necessary to program the IOMMU appropriately for a given chipset.

The dma_mapping_ops idea will never get accepted by folks like Linus, for reasons I've outlined in previous emails in this thread. So, it's best to look elsewhere for solutions to your problem, such as the ideas used by the USB and IEEE1394 device layers.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
On Fri, 2006-07-14 at 15:35 -0700, David Miller wrote:
> From: Ralph Campbell [EMAIL PROTECTED]
> Date: Fri, 14 Jul 2006 15:27:07 -0700
>
>> Note that the current design only supports one IOMMU type in a system. This could support multiple IOMMU types at the same time.
>
> This is not true; the framework allows multiple such types, and in fact Sparc64 takes advantage of this. We have about 4 or 5 different PCI controllers, and the IOMMUs are slightly different in each.

I see. It looks like dma_map_single() is an inline call to pci_map_single(), which is a function call that can then look at the device and tell what IOMMU function to call.

> Even with the standard PCI DMA mapping calls, we can gather the platform private information necessary to program the IOMMU appropriately for a given chipset.
>
> The dma_mapping_ops idea will never get accepted by folks like Linus, for reasons I've outlined in previous emails in this thread. So, it's best to look elsewhere for solutions to your problem, such as the ideas used by the USB and IEEE1394 device layers.

The USB code won't work in my case because the USB system is the one doing the memory allocation and IOMMU setup so it can remember the kernel virtual address or physical pages used to create the mapping.

In my case, the infiniband (SRP) code is doing the mapping and only passing the dma_addr_t to the device driver, at which point I have no way to convert it back to a kernel virtual address. I need to either change the IB device API to include mapping functions or intercept the dma_* functions so I can save the inputs.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
David Miller wrote:
> If you need device level DMA mapping semantics, create them for your device type. This is what USB does, btw.

Ralph, two other examples where drivers provide some sort of address lookup are:

 - drivers/ieee1394/dma.[hc]
   AFAIK this deals with housekeeping of ringbuffers as used by 1394 controllers for isochronous transmit and receive. Users of this little API are dv1394, video1394, ohci1394.

 - patch "dc395x: dynamically map scatter-gather for PIO" by Guennadi Liakhovetski,
   http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cdb8c2a6d848deb9eeef42974478fbb51b8c
   This mapping is not specific to SCSI. The user is a driver which mixes PIO and DMA.

I don't know if these have any similarity to your requirements though.

(I too need to come up with either a portable replacement of bus_to_virt or with a fundamentally different implementation but haven't started my project yet. This occurrence of bus_to_virt is in drivers/ieee1394/sbp2 but #ifdef'd out by default.)
--
Stefan Richter
-=-=-==- -=== -==--
http://arcgraph.de/sr/
Re: [openib-general] Suggestions for how to remove bus_to_virt()
>> A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess.
>
> Clean but expensive, you should not force the rest of the kernel to eat the cost of something you want to do when it's totally unnecessary for most other users.

OK, fair enough.

> For example, x86 never needs to do anything other than a direct virt_to_phys translation to produce a DMA address, no matter what bus the device is on. It's a single simple integer adjustment that can be done inline in about 2 or 3 instructions at most.

<pedantic>Except x86 needs to handle systems with IOMMUs now...</pedantic>

> If you need device level DMA mapping semantics, create them for your device type. This is what USB does, btw.

Makes sense -- Ralph, I would suggest looking at USB as a model.

 - R.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
Thanks to all for the pointers and suggestions. It will probably take me a while to follow up on these and make another proposal.

On Thu, 2006-07-13 at 09:02 -0700, Roland Dreier wrote:
>>> A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess.
>>
>> Clean but expensive, you should not force the rest of the kernel to eat the cost of something you want to do when it's totally unnecessary for most other users.
>
> OK, fair enough.
>
>> For example, x86 never needs to do anything other than a direct virt_to_phys translation to produce a DMA address, no matter what bus the device is on. It's a single simple integer adjustment that can be done inline in about 2 or 3 instructions at most.
>
> <pedantic>Except x86 needs to handle systems with IOMMUs now...</pedantic>
>
>> If you need device level DMA mapping semantics, create them for your device type. This is what USB does, btw.
>
> Makes sense -- Ralph, I would suggest looking at USB as a model.
>
>  - R.
[openib-general] Suggestions for how to remove bus_to_virt()
I have been looking at how to eliminate the bus_to_virt() and phys_to_virt() calls used by the ib_ipath driver. I am looking for suggestions on how to proceed.

The current IB core to IB device driver interface relies on a kernel module being able to call ib_get_dma_mr() to allocate a memory region which represents all of device addressable memory. The kernel module is then expected to call dma_map_single(), dma_map_sg(), etc. to convert physical or virtual addresses into device addresses. If the system has an IOMMU, there may be several physical pages mapped to a single contiguous device address region. This device address and length (possibly an array of them) is then passed to the IB device driver so the IB device can DMA data to or from memory.

The ib_ipath driver cannot tell the HW to DMA data directly to the device (IOMMU) addresses and must copy the data. This means the driver needs to reverse the IOMMU mapping and somehow obtain kernel virtual addresses so it can memcpy() the data to the correct location.

Currently, the ib_ipath driver requires that the mapping be one-to-one since there is no practical way to reverse IOMMU mappings. I believe it is generally agreed that trying to change the dma_map_* interface to include functions of this sort is not the right approach to take.

One solution is to change the IB device driver interface so that kernel virtual addresses are passed to the IB device driver and the device driver is responsible for calling dma_map_single(), etc. I believe this will be unacceptable to the OpenFabrics community since it would require the driver to allocate large amounts of memory (#QPs * #MaxWRs * sizeof(dma_addr_t + length)) to store the information needed to undo the mapping when the DMA is complete. The current IB code allocates the storage for dma_unmap_single(), etc. as extra elements in structures already needed, so it isn't a large overhead and it is based on the actual number of requests posted instead of the maximums allowed.

Another solution is to change the IB device driver interface to add a function which tells the caller what type of addresses the device expects. Kernel modules would then be required to pass either a dma_map_xxx() address or a kernel virtual address based on the driver's preference. The current set of IB consumers either start with kmalloc/vmalloc memory (such as the MAD layer) or a list of physical pages (such as ISER and SRP). The current code could therefore be changed fairly easily, except for ISER/SRP when a struct page doesn't have a direct kernel address (high pages); in that case it would need to call kmap()/kunmap().

I plan to implement this last approach unless someone has a better idea. I would like to get some buy-in before I spend a lot of time coding only to be rejected when finished.
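The second proposal in this message (the device reports which kind of address it expects, and the consumer supplies that kind) might look roughly like the following user-space sketch. Nothing here is an existing IB interface: `enum fake_addr_type`, `struct fake_ib_device`, its `addr_type` hook, `fake_dma_map()`, `consumer_addr_for()` and `FAKE_BUS_OFFSET` are hypothetical names used only to illustrate the control flow.

```c
#include <stdint.h>

/* Hypothetical: the two address forms a device might ask for. */
enum fake_addr_type {
	FAKE_ADDR_DMA,		/* result of a dma_map_xxx()-style call */
	FAKE_ADDR_KERNEL_VIRT	/* plain kernel virtual address */
};

/* Hypothetical device carrying the proposed query hook. */
struct fake_ib_device {
	enum fake_addr_type (*addr_type)(const struct fake_ib_device *dev);
};

static enum fake_addr_type wants_dma(const struct fake_ib_device *dev)
{
	(void)dev;
	return FAKE_ADDR_DMA;
}

static enum fake_addr_type wants_virt(const struct fake_ib_device *dev)
{
	(void)dev;
	return FAKE_ADDR_KERNEL_VIRT;
}

/* Stand-in for a real mapping call: pretend bus = virtual + offset. */
#define FAKE_BUS_OFFSET 0x1000u

static uint64_t fake_dma_map(void *buf)
{
	return (uint64_t)(uintptr_t)buf + FAKE_BUS_OFFSET;
}

/* Consumer side: hand the device the kind of address it asked for. */
static uint64_t consumer_addr_for(const struct fake_ib_device *dev, void *buf)
{
	if (dev->addr_type(dev) == FAKE_ADDR_KERNEL_VIRT)
		return (uint64_t)(uintptr_t)buf;
	return fake_dma_map(buf);
}
```

Note that every consumer ends up carrying a branch like the one in `consumer_addr_for()`, which is the cost of this approach.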
Re: [openib-general] Suggestions for how to remove bus_to_virt()
From: Ralph Campbell [EMAIL PROTECTED]
Date: Wed, 12 Jul 2006 16:29:27 -0700

> Currently, the ib_ipath driver requires that the mapping be one-to-one since there is no practical way to reverse IOMMU mappings.

You can maintain a hash table that maps DMA addresses back to kernel mappings. Depending upon your situation, you can optimize this to use very small keys if you have some kind of other identification method for your buffers.

That would be for dynamic mappings. If you were using consistent DMA memory, which I gather you're not, you could use the PCI DMA pool mechanism.
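A hash table of the kind suggested here could be sketched as below. This is a user-space illustration under stated assumptions, not kernel code: `fake_dma_addr_t`, `struct dma_revmap` and the `revmap_*()` functions are hypothetical names, the table uses a small fixed pool instead of kernel allocators, there is no locking, and the hash simply drops the low 12 bits on the assumption of 4 KiB pages.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t fake_dma_addr_t;	/* stand-in for dma_addr_t */

#define MAP_BUCKETS 64			/* must be a power of two */
#define MAP_ENTRIES 128

struct map_entry {
	fake_dma_addr_t dma;		/* key: address handed to the device */
	void *virt;			/* value: kernel virtual address */
	struct map_entry *next;		/* chain within one bucket */
	int in_use;
};

/* Must start out zero-initialized. */
struct dma_revmap {
	struct map_entry *bucket[MAP_BUCKETS];
	struct map_entry pool[MAP_ENTRIES];
};

static unsigned int revmap_hash(fake_dma_addr_t dma)
{
	/* Drop the assumed 4 KiB page offset, then mask to a bucket. */
	return (unsigned int)((dma >> 12) & (MAP_BUCKETS - 1));
}

/* Record a mapping; returns 0 on success, -1 if the pool is full. */
static int revmap_insert(struct dma_revmap *m, fake_dma_addr_t dma, void *virt)
{
	for (size_t i = 0; i < MAP_ENTRIES; i++) {
		struct map_entry *e = &m->pool[i];
		if (!e->in_use) {
			unsigned int h = revmap_hash(dma);
			e->dma = dma;
			e->virt = virt;
			e->in_use = 1;
			e->next = m->bucket[h];
			m->bucket[h] = e;
			return 0;
		}
	}
	return -1;
}

/* Reverse lookup: DMA address back to the virtual address, or NULL. */
static void *revmap_lookup(const struct dma_revmap *m, fake_dma_addr_t dma)
{
	const struct map_entry *e;

	for (e = m->bucket[revmap_hash(dma)]; e; e = e->next)
		if (e->dma == dma)
			return e->virt;
	return NULL;
}

/* Forget a mapping, e.g. when the buffer is unmapped. */
static void revmap_remove(struct dma_revmap *m, fake_dma_addr_t dma)
{
	struct map_entry **pp = &m->bucket[revmap_hash(dma)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->dma == dma) {
			struct map_entry *e = *pp;
			*pp = e->next;
			e->in_use = 0;
			return;
		}
	}
}
```

The sketch matches only the exact start address of a mapping; looking up an address in the middle of a buffer would need the length stored as well and a range comparison.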
Re: [openib-general] Suggestions for how to remove bus_to_virt()
> One solution is to change the IB device driver interface so that kernel virtual addresses are passed to the IB device driver and the device driver is responsible for calling dma_map_single(), etc. I believe this will be unacceptable to the OpenFabrics community

Actually it's worse than unacceptable -- I don't see how this can work at all. The problem is that addresses are not just passed directly to the local HCA; they also might be embedded in protocol messages that are sent to a remote HCA and then used by the remote HCA to initiate RDMA.

For example, the SRP driver uses ib_get_dma_mr() to get an R_Key, which it then sends to the target along with a DMA address. The target uses that R_Key/address to RDMA data directly to or from the host. There's no good way for the low-level driver to handle the DMA mapping, since the address is embedded in a protocol message that the low-level driver knows nothing about.

> Another solution is to change the IB device driver interface to add a function which tells the caller what type of addresses the device expects. Kernel modules would then be required to pass either a dma_map_xxx() address or a kernel virtual address based on the driver's preference. The current set of IB consumers either start with kmalloc/vmalloc memory (such as the MAD layer) or a list of physical pages (such as ISER and SRP). The current code could therefore be fairly easily changed except for ISER/SRP when a struct page doesn't have a direct kernel address (high pages) and would need to call kmap()/kunmap() in that case.

I have a few problems with this: first, it's unfortunate that every consumer needs two code paths to handle the two possibilities; second, I don't see how it handles the case of SRP's use of the ib_get_dma_mr() R_Key as above anyway; third, expecting consumers to kmap pages for a long time across work request execution is a bad idea.

Maybe the least bad solution would be to add rdma_XXX wrappers around the dma mapping functions that RDMA consumers use; then most low-level drivers could just pass them through to the DMA mapping API, while the ipath driver could handle things itself. The problem with that is that it ends up wrapping a huge API -- for example, you need both dma_map_single and dma_map_sg at least, plus someone might want to use dma_alloc_coherent memory, not to mention the dma_pool stuff, etc.

A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess.

(BTW, vmalloc memory should not be used for DMA, since it's not guaranteed to be DMA-able -- so anyone doing that is just wrong)

 - R.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
From: Roland Dreier [EMAIL PROTECTED]
Date: Wed, 12 Jul 2006 17:11:26 -0700

> A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess.

Clean but expensive, you should not force the rest of the kernel to eat the cost of something you want to do when it's totally unnecessary for most other users.

For example, x86 never needs to do anything other than a direct virt_to_phys translation to produce a DMA address, no matter what bus the device is on. It's a single simple integer adjustment that can be done inline in about 2 or 3 instructions at most.

Once you start allowing overrides then even x86 starts to eat the stupid costs of dereferencing some kind of device ops method.

That doesn't make any sense, and that's why the DMA API works the way it does now. It's a platform or bus operation, not a device one.

If you need device level DMA mapping semantics, create them for your device type. This is what USB does, btw.
Re: [openib-general] Suggestions for how to remove bus_to_virt()
On Wed, Jul 12, 2006 at 05:40:13PM -0700, David Miller wrote:
> From: Roland Dreier [EMAIL PROTECTED]
> Date: Wed, 12 Jul 2006 17:11:26 -0700
>
>> A cleaner solution would be to make the dma_ API really use the device it's passed anyway, and allow drivers to override the standard PCI stuff nicely. But that would be major surgery, I guess.
>
> Clean but expensive, you should not force the rest of the kernel to eat the cost of something you want to do when it's totally unnecessary for most other users.
>
> For example, x86 never needs to do anything other than a direct virt_to_phys translation to produce a DMA address, no matter what bus the device is on. It's a single simple integer adjustment that can be done inline in about 2 or 3 instructions at most.

It's possible that even x86 will support multiple IOMMUs in the future - for example, the Calgary IOMMU support we recently added to x86-64 could be modified to work on plain x86 as well.

I like the idea of a per-device DMA-API implementation, but only if it can be done in a way that is zero cost to the majority of the users of the API. We already have dynamic dma_ops on x86-64 to support nommu, swiotlb, gart and Calgary cleanly, so extending it to use a per-device dma-ops isn't too difficult.

Cheers,
Muli