On Fri, Jan 29, 2016 at 2:28 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2016-01-29 at 14:08 -0800, Alexander Duyck wrote:
>
>> It also means DMA becomes dramatically slower as it introduces a
>> partial write access for the start of every frame. It is why we had
>> set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
>> when unaligned then reading IP unaligned headers.
>
> Well, I guess that if you have an arch where DMA accesses are slow and
> NET_IP_ALIGN = 2, you are out of luck. This is why some platforms are
> better than others.

The other bit you forgot to mention was an IOMMU. That is another
per-architecture thing that can really slow us down. Back when I
rewrote the receive path I was dealing with a number of performance
complaints on PowerPC. The approach I took with the Intel drivers was
supposed to be the best compromise for IOMMU, DMA alignment, and IP
header alignment.

>>
>> The gain on recvmsg would probably be minimal. The only time I have
>> seen any significant speed-up for copying is if you can get both ends
>> aligned to something like 16B.
>
> On modern intel cpus, this does not matter at all, sure. It took a while
> before "rep movsb" finally did the right thing.
>
> memcpy() and friends implementations are much slower on some older
> arches (when dealing with unaligned src/dst)
>
> arch/mips/lib/memcpy.S is a gem ;)

Yeah. I can imagine. The fact is you can't make everybody happy, so I
am good with just trying to support the majority of architectures as
best as possible. If a few have to take a performance hit for an
unaligned memcpy, then so be it.

- Alex