Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-15 Thread Stefan Richter
Ralph Campbell wrote:
 On Fri, 2006-07-14 at 15:35 -0700, David Miller wrote:
...
 The dma_mapping_ops idea will never get accepted by folks like Linus,
 for reasons I've outlined in previous emails in this thread.  So, it's
 best to look elsewhere for solutions to your problem, such as the
 ideas used by the USB and IEEE 1394 device layers.
 
 The USB code won't work in my case because the USB system is
 the one doing the memory allocation and IOMMU setup so it
 can remember the kernel virtual address or physical pages used
 to create the mapping.

Side note: The same is true with the DMA stuff in the ieee1394
subsystem. And the SCSI subsystem doesn't allocate (all) buffers but
leaves DMA mapping and unmapping to the low-level drivers --- i.e. Ralph
can't rip bus_to_virt replacements from there either, because:

 In my case, the infiniband (SRP) code is doing the mapping and
 only passing the dma_addr_t to the device driver at which point
 I have no way to convert it back to a kernel virtual address.
 I need to either change the IB device API to include mapping functions
 or intercept the dma_* functions so I can save the inputs.

On the other hand, ieee1394/dma is the rather obvious example of a
generic layer which keeps track of the virtual address and bus address
of mapped memory regions, for layers above or below to use as they need.
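
To make the bookkeeping idea concrete, here is a minimal userspace sketch. It is not the ieee1394/dma code: the names `mapped_region`, `region_bus_to_virt` and `regions_bus_to_virt` are hypothetical, and `bus_addr_t` is only a stand-in for the kernel's `dma_addr_t`. It assumes each region is contiguous on the bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t bus_addr_t;   /* stand-in for the kernel's dma_addr_t */

/* One mapped region: where it lives for the CPU and for the device. */
struct mapped_region {
    void      *virt;   /* start of the buffer as the CPU sees it */
    bus_addr_t bus;    /* start of the buffer as the device sees it */
    size_t     len;    /* length in bytes */
};

/* Translate a bus address inside the region back to a CPU pointer.
 * Returns NULL when the address lies outside the region. */
static void *region_bus_to_virt(const struct mapped_region *r, bus_addr_t addr)
{
    if (addr < r->bus || addr - r->bus >= r->len)
        return NULL;
    return (char *)r->virt + (addr - r->bus);
}

/* Linear search over a small table of regions (hypothetical helper). */
static void *regions_bus_to_virt(const struct mapped_region *tbl, size_t n,
                                 bus_addr_t addr)
{
    for (size_t i = 0; i < n; i++) {
        void *p = region_bus_to_virt(&tbl[i], addr);
        if (p)
            return p;
    }
    return NULL;
}
```

Because both addresses are recorded when the mapping is created, no reverse translation through the IOMMU is ever needed. A real layer would likely want a better index than a linear scan.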

Ralph, do you think you can arrange your required API change as a pure
_extension_ of the IB API? I.e. add fields to data structs or add fields
to callback templates or add calls into the SRP layer... (I haven't
bothered to look at the API yet.)
-- 
Stefan Richter
-=-=-==- -=== -
http://arcgraph.de/sr/

___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general



Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-14 Thread Ralph Campbell
On Thu, 2006-07-13 at 08:46 +0300, Muli Ben-Yehuda wrote:
 On Wed, Jul 12, 2006 at 05:40:13PM -0700, David Miller wrote:
  From: Roland Dreier [EMAIL PROTECTED]
  Date: Wed, 12 Jul 2006 17:11:26 -0700
  
   A cleaner solution would be to make the dma_ API really use the device
   it's passed anyway, and allow drivers to override the standard PCI
   stuff nicely.  But that would be major surgery, I guess.
  
  Clean but expensive, you should not force the rest of the kernel
  to eat the cost of something you want to do when it's totally
  unnecessary for most other users.
  
  For example, x86 never needs to do anything other than a direct
  virt_to_phys translation to produce a DMA address, no matter what
  bus the device is on.  It's a single simple integer adjustment
  that can be done inline in about 2 or 3 instructions at most.
 
 It's possible that even x86 will support multiple IOMMUs in the future
 - for example, the Calgary IOMMU support we recently added to x86-64
 could be modified to work on plain x86 as well.
 
 I like the idea of a per-device DMA-API implementation, but only if it
 can be done in a way that is zero cost to the majority of the users of
 the API. We already have dynamic dma_ops on x86-64 to support nommu,
 swiotlb, gart and Calgary cleanly, extending it to use a per-device
 dma-ops isn't too difficult.
 
 Cheers,
 Muli

A per-device DMA-API would solve my problem.
It would be a fairly invasive changeset though.
The basic idea would be to add a struct dma_mapping_ops *
to struct device and change all the inline dma_* routines
to something like:

static inline dma_addr_t
dma_map_single(struct device *hwdev, void *ptr, size_t size,
               int direction)
{
        BUG_ON(!valid_dma_direction(direction));
        /* Use the device's own ops when set, else the global dma_ops. */
        return hwdev->dma_ops ?
                hwdev->dma_ops->map_single(hwdev, ptr, size, direction) :
                dma_ops->map_single(hwdev, ptr, size, direction);
}

Note that the current design only supports one IOMMU type in a system.
This could support multiple IOMMU types at the same time.

Another possibility is to only do this for the infiniband subsystem.
The idea would be to replace calls to dma_* with ib_dma_* which
would be defined as above.





Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-14 Thread David Miller
From: Ralph Campbell [EMAIL PROTECTED]
Date: Fri, 14 Jul 2006 15:27:07 -0700

 Note that the current design only supports one IOMMU type in a system.
 This could support multiple IOMMU types at the same time.

This is not true; the framework allows multiple such types
and in fact Sparc64 takes advantage of this.  We have about
4 or 5 different PCI controllers, and the IOMMUs are slightly
different in each.

Even with the standard PCI DMA mapping calls, we can gather the
platform private information necessary to program the IOMMU
appropriately for a given chipset.

The dma_mapping_ops idea will never get accepted by folks like Linus,
for reasons I've outlined in previous emails in this thread.  So, it's
best to look elsewhere for solutions to your problem, such as the
ideas used by the USB and IEEE 1394 device layers.




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-14 Thread Ralph Campbell
On Fri, 2006-07-14 at 15:35 -0700, David Miller wrote:
 From: Ralph Campbell [EMAIL PROTECTED]
 Date: Fri, 14 Jul 2006 15:27:07 -0700
 
  Note that the current design only supports one IOMMU type in a system.
  This could support multiple IOMMU types at the same time.
 
 This is not true; the framework allows multiple such types
 and in fact Sparc64 takes advantage of this.  We have about
 4 or 5 different PCI controllers, and the IOMMUs are slightly
 different in each.

I see. It looks like dma_map_single() is an inline call to
pci_map_single() which is a function call that can then
look at the device and tell what IOMMU function to call.

 Even with the standard PCI DMA mapping calls, we can gather the
 platform private information necessary to program the IOMMU
 appropriately for a given chipset.
 
 The dma_mapping_ops idea will never get accepted by folks like Linus,
 for reasons I've outlined in previous emails in this thread.  So, it's
 best to look elsewhere for solutions to your problem, such as the
 ideas used by the USB and IEEE 1394 device layers.

The USB code won't work in my case because the USB system is
the one doing the memory allocation and IOMMU setup so it
can remember the kernel virtual address or physical pages used
to create the mapping.

In my case, the infiniband (SRP) code is doing the mapping and
only passing the dma_addr_t to the device driver at which point
I have no way to convert it back to a kernel virtual address.
I need to either change the IB device API to include mapping functions
or intercept the dma_* functions so I can save the inputs.





Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-13 Thread Stefan Richter
David Miller wrote:
 If you need device level DMA mapping semantics, create them for your
 device type.  This is what USB does, btw.

Ralph,
two other examples where drivers provide some sort of address lookup are:

 - drivers/ieee1394/dma.[hc]
   AFAIK this deals with housekeeping of ringbuffers as used by
   1394 controllers for isochronous transmit and receive. Users of
   this little API are dv1394, video1394, ohci1394.

 - patch dc395x: dynamically map scatter-gather for PIO by
   Guennadi Liakhovetski,
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cdb8c2a6d848deb9eeef42974478fbb51b8c
   This mapping is not specific to SCSI. The user is a driver which
   mixes PIO and DMA.

I don't know if these have any similarity to your requirements though.

(I too need to come up with either a portable replacement of bus_to_virt
or with a fundamentally different implementation but haven't started my
project yet. This occurrence of bus_to_virt is in drivers/ieee1394/sbp2
but #ifdef'd out by default.)
-- 
Stefan Richter
-=-=-==- -=== -==--
http://arcgraph.de/sr/




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-13 Thread Roland Dreier
   A cleaner solution would be to make the dma_ API really use the device
   it's passed anyway, and allow drivers to override the standard PCI
   stuff nicely.  But that would be major surgery, I guess.

  Clean but expensive, you should not force the rest of the kernel
  to eat the cost of something you want to do when it's totally
  unnecessary for most other users.

OK, fair enough.

  For example, x86 never needs to do anything other than a direct
  virt_to_phys translation to produce a DMA address, no matter what
  bus the device is on.  It's a single simple integer adjustment
  that can be done inline in about 2 or 3 instructions at most.

<pedantic>Except x86 needs to handle systems with IOMMUs now...</pedantic>

  If you need device level DMA mapping semantics, create them for your
  device type.  This is what USB does, btw.

Makes sense -- Ralph, I would suggest looking at USB as a model.

 - R.




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-13 Thread Ralph Campbell
Thanks to all for the pointers and suggestions.
It will probably take me a while to follow up on these
and make another proposal.

On Thu, 2006-07-13 at 09:02 -0700, Roland Dreier wrote:
A cleaner solution would be to make the dma_ API really use the device
it's passed anyway, and allow drivers to override the standard PCI
stuff nicely.  But that would be major surgery, I guess.
 
   Clean but expensive, you should not force the rest of the kernel
   to eat the cost of something you want to do when it's totally
   unnecessary for most other users.
 
 OK, fair enough.
 
   For example, x86 never needs to do anything other than a direct
   virt_to_phys translation to produce a DMA address, no matter what
   bus the device is on.  It's a single simple integer adjustment
   that can be done inline in about 2 or 3 instructions at most.
 
 <pedantic>Except x86 needs to handle systems with IOMMUs now...</pedantic>
 
   If you need device level DMA mapping semantics, create them for your
   device type.  This is what USB does, btw.
 
 Makes sense -- Ralph, I would suggest looking at USB as a model.
 
  - R.





[openib-general] Suggestions for how to remove bus_to_virt()

2006-07-12 Thread Ralph Campbell
I have been looking at how to eliminate the bus_to_virt() and
phys_to_virt() calls used by the ib_ipath driver.
I am looking for suggestions on how to proceed.

The current IB core to IB device driver interface relies
on a kernel module being able to call ib_get_dma_mr() to allocate
a memory region which represents all of device addressable memory.
The kernel module is then expected to call dma_map_single(),
dma_map_sg(), etc. to convert physical or virtual addresses into
device addresses.  If the system has an IOMMU, there may be several
physical pages mapped to a single contiguous device address region.
This device address and length (possibly an array of them) is then
passed to the IB device driver so the IB device can DMA data
to or from memory.

The ib_ipath driver cannot tell the HW to DMA data directly to the
device (IOMMU) addresses and must copy the data.  This means the driver
needs to reverse the IOMMU mapping and somehow obtain kernel virtual
addresses so it can memcpy() the data to the correct location.
Currently, the ib_ipath driver requires that the mapping be one-to-one
since there is no practical way to reverse IOMMU mappings.

I believe it is generally agreed that trying to change the dma_map_*
interface to include functions of this sort is not the right approach
to take.

One solution is to change the IB device driver interface so that
kernel virtual addresses are passed to the IB device driver and
the device driver is responsible for calling dma_map_single(), etc.
I believe this will be unacceptable to the OpenFabrics community
since it would require the driver to allocate large amounts of memory
(#QPs * #MaxWRs * sizeof(dma_addr_t + length)) to store the
information needed to undo the mapping when the DMA is complete.
The current IB code allocates the storage for dma_unmap_single(), etc.
as extra elements in structures already needed so it isn't a large
overhead and it is based on the actual number of requests posted
instead of the maximums allowed.
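
To put a rough number on that formula, here is a small sketch. The record layout and the counts are made-up illustrative values, not figures from ib_ipath or any real HCA. If the per-request record occupies 16 bytes, then 1024 QPs with 1024 work requests each would need 16 MiB reserved up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request record needed to undo a mapping later. */
struct unmap_record {
    uint64_t dma_addr;   /* stand-in for dma_addr_t, assumed 64-bit here */
    uint32_t length;     /* length of the mapped buffer */
};

/* Worst-case storage: #QPs * #MaxWRs * sizeof(record). */
static size_t worst_case_bytes(size_t num_qps, size_t max_wrs_per_qp)
{
    return num_qps * max_wrs_per_qp * sizeof(struct unmap_record);
}
```

The point of the comparison is that this amount scales with the configured maximums, whereas the current scheme scales with the requests actually posted.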

Another solution is to change the IB device driver interface to add
a function which tells the caller what type of addresses the device
expects.  Kernel modules would then be required to pass either a
dma_map_xxx() address or a kernel virtual address based on the
driver's preference.
The current set of IB consumers either start with kmalloc/vmalloc
memory (such as the MAD layer) or a list of physical pages
(such as ISER and SRP). The current code could therefore be
fairly easily changed except for ISER/SRP when a struct page
doesn't have a direct kernel address (high pages) and would
need to call kmap()/kunmap() in that case.

I plan to implement this last approach unless someone has
a better idea.  I would like to get some buy-in before
I spend a lot of time coding only to be rejected when finished.






Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-12 Thread David Miller
From: Ralph Campbell [EMAIL PROTECTED]
Date: Wed, 12 Jul 2006 16:29:27 -0700

 Currently, the ib_ipath driver requires that the mapping be
 one-to-one since there is no practical way to reverse IOMMU
 mappings.

You can maintain a hash table that maps DMA addresses back to kernel
mappings.  Depending upon your situation, you can optimize this to use
very small keys if you have some kind of other identification method
for your buffers.

That would be for dynamic mappings.
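
A minimal userspace sketch of such a table follows, under several assumptions: all `rmap_*` names are hypothetical, `bus_addr_t` stands in for `dma_addr_t`, the table has a fixed size, and each DMA address is mapped at most once at a time. It resolves only the start address of a mapping, not addresses inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t bus_addr_t;              /* stand-in for dma_addr_t */

#define RMAP_SLOTS 256                    /* arbitrary fixed capacity */
enum { SLOT_EMPTY = 0, SLOT_USED, SLOT_DEAD };

struct rmap_entry {
    bus_addr_t bus;                       /* key: address given to the device */
    void      *virt;                      /* value: kernel virtual address */
    int        state;                     /* SLOT_EMPTY, SLOT_USED or SLOT_DEAD */
};

struct rmap { struct rmap_entry slot[RMAP_SLOTS]; };

/* Multiplicative hash reduced to a slot index. */
static size_t rmap_hash(bus_addr_t bus)
{
    return (size_t)(((bus * 0x9E3779B97F4A7C15ull) >> 32) % RMAP_SLOTS);
}

/* Record a mapping at map time; returns 0, or -1 if the table is full. */
static int rmap_insert(struct rmap *m, bus_addr_t bus, void *virt)
{
    size_t h = rmap_hash(bus);
    for (size_t i = 0; i < RMAP_SLOTS; i++) {
        struct rmap_entry *e = &m->slot[(h + i) % RMAP_SLOTS];
        if (e->state != SLOT_USED) {
            e->bus = bus;
            e->virt = virt;
            e->state = SLOT_USED;
            return 0;
        }
    }
    return -1;
}

/* Linear probing; deleted slots are skipped, empty slots end the search. */
static struct rmap_entry *rmap_find(struct rmap *m, bus_addr_t bus)
{
    size_t h = rmap_hash(bus);
    for (size_t i = 0; i < RMAP_SLOTS; i++) {
        struct rmap_entry *e = &m->slot[(h + i) % RMAP_SLOTS];
        if (e->state == SLOT_EMPTY)
            return NULL;
        if (e->state == SLOT_USED && e->bus == bus)
            return e;
    }
    return NULL;
}

/* Reverse lookup: DMA address back to the kernel virtual address. */
static void *rmap_lookup(struct rmap *m, bus_addr_t bus)
{
    struct rmap_entry *e = rmap_find(m, bus);
    return e ? e->virt : NULL;
}

/* Forget a mapping at unmap time. */
static void rmap_remove(struct rmap *m, bus_addr_t bus)
{
    struct rmap_entry *e = rmap_find(m, bus);
    if (e)
        e->state = SLOT_DEAD;
}
```

The insert would happen wherever the mapping is created and the remove wherever it is torn down. In a kernel setting the table would also need locking, which this sketch omits.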

If you were using consistent DMA memory, which I gather you're not,
you could use the PCI DMA pool mechanism.




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-12 Thread Roland Dreier
  One solution is to change the IB device driver interface so that
  kernel virtual addresses are passed to the IB device driver and
  the device driver is responsible for calling dma_map_single(), etc.
  I believe this will be unacceptable to the OpenFabrics community

Actually it's worse than unacceptable -- I don't see how this can work
at all.  The problem is that addresses are not just passed directly to
the local HCA; they also might be embedded in protocol messages that
are sent to a remote HCA and then used by the remote HCA to initiate
RDMA.

For example, the SRP driver uses ib_get_dma_mr() to get an R_Key,
which it then sends to the target along with a DMA address.  The
target uses that R_Key/address to RDMA data directly to or from the
host.  There's no good way for the low-level driver to handle the DMA
mapping, since the address is embedded in a protocol message that the
low-level driver knows nothing about.

  Another solution is to change the IB device driver interface to add
  a function which tells the caller what type of addresses the device
  expects.  Kernel modules would then be required to pass either a
  dma_map_xxx() address or a kernel virtual address based on the
  driver's preference.
  The current set of IB consumers either start with kmalloc/vmalloc
  memory (such as the MAD layer) or a list of physical pages
  (such as ISER and SRP). The current code could therefore be
  fairly easily changed except for ISER/SRP when a struct page
  doesn't have a direct kernel address (high pages) and would
  need to call kmap()/kunmap() in that case.

I have a few problems with this: first, it's unfortunate that every
consumer needs two code paths to handle the two possibilities; second,
I don't see how it handles the case of SRP's use of the
ib_get_dma_mr() R_Key as above anyway; third, expecting consumers to
kmap pages for a long time across work request execution is a bad
idea.

Maybe the least bad solution would be to add rdma_XXX wrappers around
the dma mapping functions that RDMA consumers use; then most low-level
drivers could just pass them through to the DMA mapping API, while the
ipath driver could handle things itself.
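
A rough userspace model of that wrapper idea, for one call only: every name here (`rdma_dev`, `rdma_dma_ops`, `rdma_map_single`, `platform_map_single`, `copy_map_single`) is hypothetical, `bus_addr_t` stands in for `dma_addr_t`, and the "platform" mapping is faked with a fixed offset purely so the example runs outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t bus_addr_t;               /* stand-in for dma_addr_t */

struct rdma_dev;                           /* hypothetical device type */

/* Optional per-device overrides; NULL means "pass through". */
struct rdma_dma_ops {
    bus_addr_t (*map_single)(struct rdma_dev *dev, void *ptr, size_t size);
};

struct rdma_dev {
    const struct rdma_dma_ops *dma_ops;
};

/* Stand-in for the platform's dma_map_single(): a fake fixed offset. */
#define FAKE_BUS_OFFSET 0x80000000ull
static bus_addr_t platform_map_single(void *ptr, size_t size)
{
    (void)size;
    return (bus_addr_t)(uintptr_t)ptr + FAKE_BUS_OFFSET;
}

/* The wrapper that consumers would call instead of dma_map_single(). */
static bus_addr_t rdma_map_single(struct rdma_dev *dev, void *ptr, size_t size)
{
    if (dev->dma_ops && dev->dma_ops->map_single)
        return dev->dma_ops->map_single(dev, ptr, size);
    return platform_map_single(ptr, size);
}

/* A copy-based driver can return the virtual address itself as the
 * "DMA address", so it never has to reverse a mapping. */
static bus_addr_t copy_map_single(struct rdma_dev *dev, void *ptr, size_t size)
{
    (void)dev;
    (void)size;
    return (bus_addr_t)(uintptr_t)ptr;
}

static const struct rdma_dma_ops copy_dma_ops = {
    .map_single = copy_map_single,
};
```

Only the devices that install `dma_ops` pay for the override. The same pattern would have to be repeated for each wrapped call.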

The problem with that is that it ends up wrapping a huge API -- for
example, you need both dma_map_single and dma_map_sg at least, plus
someone might want to use dma_alloc_coherent memory, not to mention
the dma_pool stuff, etc.

A cleaner solution would be to make the dma_ API really use the device
it's passed anyway, and allow drivers to override the standard PCI
stuff nicely.  But that would be major surgery, I guess.

(BTW, vmalloc memory should not be used for DMA, since it's not
guaranteed to be DMA-able -- so anyone doing that is just wrong)

 - R.




Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-12 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Wed, 12 Jul 2006 17:11:26 -0700

 A cleaner solution would be to make the dma_ API really use the device
 it's passed anyway, and allow drivers to override the standard PCI
 stuff nicely.  But that would be major surgery, I guess.

Clean but expensive, you should not force the rest of the kernel
to eat the cost of something you want to do when it's totally
unnecessary for most other users.

For example, x86 never needs to do anything other than a direct
virt_to_phys translation to produce a DMA address, no matter what
bus the device is on.  It's a single simple integer adjustment
that can be done inline in about 2 or 3 instructions at most.

Once you start allowing overrides then even x86 starts to eat the
stupid costs of dereferencing some kind of device ops method.

That doesn't make any sense, and that's why the DMA API works the way
it does now.  It's a platform or bus operation, not a device one.
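
The cost difference can be sketched in userspace as follows. This is an illustration only: `PAGE_OFFSET_DEMO` is an arbitrary constant, and `map_direct`, `map_overridable`, `struct dev` and `struct dev_ops` are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t bus_addr_t;               /* stand-in for dma_addr_t */

#define PAGE_OFFSET_DEMO 0xC0000000ull     /* illustrative constant only */

/* Direct-mapped case: one subtraction that the compiler can inline. */
static inline bus_addr_t map_direct(uintptr_t kvaddr)
{
    return (bus_addr_t)kvaddr - PAGE_OFFSET_DEMO;
}

struct dev_ops {
    bus_addr_t (*map)(uintptr_t kvaddr);
};

struct dev {
    const struct dev_ops *ops;             /* NULL when nothing is overridden */
};

/* Overridable case: a pointer load, a test and possibly an indirect
 * call on every mapping, even for devices that never override. */
static inline bus_addr_t map_overridable(const struct dev *d, uintptr_t kvaddr)
{
    return d->ops ? d->ops->map(kvaddr) : map_direct(kvaddr);
}
```

Both paths give the same answer for a device without overrides; the second simply does more work to get there.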

If you need device level DMA mapping semantics, create them for your
device type.  This is what USB does, btw.






Re: [openib-general] Suggestions for how to remove bus_to_virt()

2006-07-12 Thread Muli Ben-Yehuda
On Wed, Jul 12, 2006 at 05:40:13PM -0700, David Miller wrote:
 From: Roland Dreier [EMAIL PROTECTED]
 Date: Wed, 12 Jul 2006 17:11:26 -0700
 
  A cleaner solution would be to make the dma_ API really use the device
  it's passed anyway, and allow drivers to override the standard PCI
  stuff nicely.  But that would be major surgery, I guess.
 
 Clean but expensive, you should not force the rest of the kernel
 to eat the cost of something you want to do when it's totally
 unnecessary for most other users.
 
 For example, x86 never needs to do anything other than a direct
 virt_to_phys translation to produce a DMA address, no matter what
 bus the device is on.  It's a single simple integer adjustment
 that can be done inline in about 2 or 3 instructions at most.

It's possible that even x86 will support multiple IOMMUs in the future
- for example, the Calgary IOMMU support we recently added to x86-64
could be modified to work on plain x86 as well.

I like the idea of a per-device DMA-API implementation, but only if it
can be done in a way that is zero cost to the majority of the users of
the API. We already have dynamic dma_ops on x86-64 to support nommu,
swiotlb, gart and Calgary cleanly, extending it to use a per-device
dma-ops isn't too difficult.

Cheers,
Muli
