Mark Johnson wrote:


Andrew Gallatin wrote:
Mark Johnson wrote:


Andrew Gallatin wrote:

How about what I've suggested in the past?  Let's copy the one
networking thing that Mac OS X got right (at least before Adi started
at Apple :) and keep network buffers pre-mapped in the IOMMU at a
system level.

This means we invent a new mblk data allocator which allocates buffers
which are "optimized" for network drivers.  On IOMMU-less systems,
these buffers have their physical address associated with them
(somehow).  On IOMMU-full systems, these buffers are a static pool
which is pre-allocated at boot time, and pre-mapped in the IOMMU.
Rather than their physical addresses, they have their IOMMU address
associated with them.  Drivers continue to use the existing DDI DMA
interface, but it just gets much, much faster because 90% of the code
path goes away for data blocks allocated by this fastpath.  There are
a few cases where this can break down (multiple IOMMUs per system,
IOMMU exhaustion), and then things would only devolve to what we have
now.

So a global cache of pre-allocated and mapped mblks?
If the NIC doesn't support a 64-bit DMA address, you
make it go through the slow path?

PCI DAC has been around for more than 10 years.  Any NIC which
cannot handle 64-bit DMA deserves to take the slow path at
the very least.

The tx path uses the cache at the top of the stack
and gld or a new ddi_dma_mblk_bind knows how to get
to the cookies?

You might just be able to use the existing DMA DDI.  For
example, put all optimized buffers in a particular virtual
address space in kernel VM, and then the DDI routines
would be able to tell if it was from the optimized pool
just by looking at the pointer.

Yeah, you could hash the address. But why not just
have a pointer to an array of ddi_dma_cookie structures
hanging off the mblk_t?

If it's non-null, you have your cookies.  If
it's NULL, NIC driver calls bind.

Or takes the bcopy slow path. :-)

Btw, on the matter of DAC: many, many NICs out there (not 10GbE) don't support DAC. For some that do (e.g. Marvell), DAC comes at an additional cost (2x the descriptor cost). It would be *nice* if we could support those as well. Some relatively easy solutions:

*) For rx, allow me to specify DMA attributes to be associated with mblk allocation

*) For tx, just keep the addresses in low space for now. There should be enough room to carve out a few hundred MB of VA space for packet buffers below the 4GB limit; I don't think we need to support gigabytes of these, after all. When the pool is exhausted, the system could internally resort to bcopy.

   -- Garrett



MRJ





_______________________________________________
networking-discuss mailing list
networking-disc...@opensolaris.org

_______________________________________________
driver-discuss mailing list
driver-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
