On Thu, Apr 14, 2016 at 02:18:32PM -0700, Benjamin Serebrin wrote:
> On Thu, Apr 14, 2016 at 2:05 PM, Adam Morrison <m...@cs.technion.ac.il> wrote:
> > On Thu, Apr 14, 2016 at 9:26 PM, Benjamin Serebrin via iommu
> > <iommu@lists.linux-foundation.org> wrote:
> >
> >> It was pointed out that DMA_32 or _24 (or any other non-64 size)
> >> could be starved if the magazines on all cores are full and the
> >> depot is empty. (This gets more probable with increased core
> >> count.) You could try one more time: call free_iova_rcaches() and
> >> try alloc_iova again before giving up.
> >
> > That's not safe, unfortunately. free_iova_rcaches() is meant to be
> > called only when the domain is dying and the CPUs won't access the
> > rcaches.
>
> Fair enough. Is it possible to make this safe, cleanly and without
> too much locking during the normal case?
>
> > It's tempting to make the rcaches work only for DMA_64 allocations.
> > This would also solve the problem of respecting the pfn_limit when
> > allocating, which Shaohua Li pointed out. Sadly, intel-iommu.c
> > converts DMA_64 to DMA_32 by default, apparently to avoid dual
> > address cycles on the PCI bus. I wonder about the importance of
> > this, though, as it doesn't seem that anything equivalent happens
> > when iommu=off.
>
> I agree. It's tempting to make all DMA_64 allocations grow up from
> 4G, leaving the entire 32-bit space free for small allocations. I'd
> be willing to argue that that should be the default, with some
> override for anyone who finds it objectionable.
>
> Dual address cycle is really "4 more bytes in the TLP header" on PCIe;
> a 32-bit address takes 3 doublewords (12 bytes) while a 64-bit address
> takes 4 DW (16 bytes). What's 25% of a read request between friends?
> And every read request has a read response 3DW TLP plus its data, so
> the aggregate bandwidth consumed is getting uninteresting. Similarly
> for writes, the additional address bytes don't cost a large
> percentage.
>
> That being said, it's a rare device that needs more than 4GB of active
> address space, and it's a rare system that needs to mix a
> performance-critical DMA_32 (or 24) and _64 device in the same page
> table.
I'm not sure about the TLP overhead. The IOMMU is not only for PCIe
devices: there are PCIe-to-PCI-X/PCI bridges, and any PCI device can
reside behind them. Such a device might not handle DMA_64 at all. DAC
also has overhead for PCI-X devices, IIRC, which somebody might care
about. So let's not break such devices.

Thanks,
Shaohua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu