On Sun, Jun 12, 2016 at 02:29:42PM +0000, Jianfeng Tan wrote:
> Compile DPDK with clang, below line in virtio_rxtx.c could be
> optimized with four "VMOVAPS ymm, m256".
>   memset(&rxvq->fake_mbuf, 0, sizeof(rxvq->fake_mbuf));
>
> This instruction requires memory address is 32-byte aligned.
> Or, it leads to segfault. Although only tested with Clang 3.6.0,
> it can be reproduced in any compilers, which do aggressive
> optimization, aka, change memset of known length to VMOVAPS.
>
> The fact that struct rte_mbuf is cache line aligned, can only make
> sure fake_mbuf is aligned compared to the start address of struct
> virtnet_rx. Unfortunately, this address is not necessarily aligned
> because it's allocated by:
>   rxvq = (struct virtnet_rx *)RTE_PTR_ADD(vq, sz_vq);
>
> When sz_vq is not aligned, then rxvq cannot be allocated with an
> aligned address, and then rxvq->fake_mbuf (addr of rxvq + cache line
> size) is not an aligned address.
>
> The fix is very simple that making sz_vq 32-byte aligned. Here we
> make it cache line aligned for future optimization.
>
> Fixes: a900472aedef ("virtio: split virtio Rx/Tx queue")
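For the archive, the alignment arithmetic described above can be sketched
as follows. This is a minimal standalone illustration, not the code from
the patch: the sketch_* names and SKETCH_CACHE_LINE_SIZE are hypothetical
stand-ins (in the tree the equivalents would be DPDK's own alignment
macros and RTE_CACHE_LINE_SIZE), and a 64-byte cache line is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; DPDK uses RTE_CACHE_LINE_SIZE. */
#define SKETCH_CACHE_LINE_SIZE 64u

/* Round sz up to the next multiple of align (align must be a power of two). */
static inline size_t sketch_align_ceil(size_t sz, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (sz + align - 1) & ~(align - 1);
}

/* Address of the Rx structure placed sz_vq bytes past the start of vq,
 * mirroring the RTE_PTR_ADD(vq, sz_vq) expression quoted above. */
static inline uintptr_t sketch_rxvq_addr(uintptr_t vq_addr, size_t sz_vq)
{
	return vq_addr + sz_vq;
}

/* Non-zero when addr is a multiple of align (align is a power of two). */
static inline int sketch_is_aligned(uintptr_t addr, size_t align)
{
	return (addr & (align - 1)) == 0;
}
```

With a hypothetical vq base of 0x1000 and sz_vq of 200, the resulting
rxvq address is not 32-byte aligned, so an aligned 256-bit store into
fake_mbuf would fault. Rounding sz_vq up to the cache line size first
(200 becomes 256) puts rxvq, and therefore fake_mbuf, on an aligned
address.
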
Folded this fix (and the other fix) into the above commit, so that we
could have a clean working tree.

Thanks for the good catch!

	--yliu