On Thu, Apr 14, 2016 at 02:18:32PM -0700, Benjamin Serebrin wrote:
> On Thu, Apr 14, 2016 at 2:05 PM, Adam Morrison <m...@cs.technion.ac.il> wrote:
> > On Thu, Apr 14, 2016 at 9:26 PM, Benjamin Serebrin via iommu
> > <iommu@lists.linux-foundation.org> wrote:
> >
> >> It was pointed out that DMA_32 or _24 (or any other non-64-bit size)
> >> could be starved if the magazines on all cores are full and the depot
> >> is empty.  (This gets more probable with increased core count.)  You
> >> could try one more time: call free_iova_rcaches() and try alloc_iova
> >> again before giving up.
> >
> > That's not safe, unfortunately.  free_iova_rcaches() is meant to be
> > called only when the domain is dying and the CPUs won't access the
> > rcaches.
> 
> Fair enough.  Is it possible to make this safe, cleanly and without
> too much locking during the normal case?
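For illustration, one possible shape for a safe retry (a sketch only; it
assumes a per-CPU flush helper, here called free_cpu_cached_iovas(), that
empties a remote CPU's magazines under that magazine's lock, instead of
the whole-domain free_iova_rcaches(); the variable names follow
intel-iommu.c's allocation path):

	struct iova *iova;
	unsigned int cpu;

	iova = alloc_iova(&domain->iovad, nrpages, limit_pfn, true);
	if (!iova) {
		/*
		 * Bounded mask starved by full magazines and an empty
		 * depot: push every CPU's cached ranges back into the
		 * rbtree and retry once before failing the mapping.
		 */
		for_each_online_cpu(cpu)
			free_cpu_cached_iovas(cpu, &domain->iovad);
		iova = alloc_iova(&domain->iovad, nrpages, limit_pfn, true);
	}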
> 
> > It's tempting to make the rcaches work only for DMA_64 allocations.
> > This would also solve the problem of respecting the pfn_limit when
> > allocating, which Shaohua Li pointed out.  Sadly, intel-iommu.c
> > converts DMA_64 to DMA_32 by default, apparently to avoid dual address
> > cycles on the PCI bus.  I wonder about the importance of this, though,
> > as it doesn't seem that anything equivalent happens when iommu=off.
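For reference, the conversion being described lives in intel-iommu.c's
IOVA allocation path; paraphrased (not verbatim kernel code), it looks
roughly like this, with dmar_forcedac being the intel_iommu=forcedac
override:

	/*
	 * Unless DAC is forced, give a 64-bit-capable device a first try
	 * below 4GB, i.e. DMA_64 is narrowed to DMA_32 by default.
	 */
	if (dma_mask > DMA_BIT_MASK(32) && !dmar_forcedac) {
		iova = alloc_iova(&domain->iovad, nrpages,
				  IOVA_PFN(DMA_BIT_MASK(32)), 1);
		if (iova)
			return iova;
	}
	/* Fall back to the device's full DMA mask. */
	iova = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);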
> 
> I agree.  It's tempting to make all DMA_64 allocations grow up from
> 4G, leaving the entire 32 bit space free for small allocations.  I'd
> be willing to argue that that should be the default, with some
> override for anyone who finds it objectionable.
> 
> Dual address cycle is really "4 more bytes in the TLP header" on PCIe;
> a 32-bit address takes 3 doublewords (12 bytes) while a 64-bit address
> takes 4 DW (16 bytes).  What's 25% of a read request between friends?
> And every read request is answered by a completion TLP with a 3DW
> header plus its data, so the extra address bytes become an
> uninteresting fraction of the aggregate bandwidth consumed.  Similarly
> for writes, the additional address bytes don't cost a large percentage.
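To put a rough number on the aggregate point (ignoring link framing and
DLLP overhead, and taking a 256-byte read as an arbitrary example): with
a 64-bit address the request TLP is 4DW (16 bytes) and the completion is
a 3DW (12-byte) header plus 256 bytes of data, i.e. 284 bytes on the
wire, versus 280 bytes with a 32-bit address.  The extra 4 bytes are
about 1.4% of that transaction's traffic, and proportionally less for
larger reads.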
> 
> That being said, it's a rare device that needs more than 4GB of active
> address space, and it's a rare system that needs to mix a
> performance-critical DMA_32 (or _24) and _64 device in the same page
> table.

I'm not sure the TLP overhead argument covers everything.  The IOMMU is
not only for PCIe devices: there are PCIe-to-PCI-X/PCI bridges, and any
PCI device can reside behind one.  Such a device might not handle DMA_64
at all, and DAC has real overhead on PCI-X, IIRC, which somebody might
care about.  So let's not break such devices.

Thanks,
Shaohua