Hi Neil,
> The ixgbe pmd currently can't be built without enabling sse instructions at
> compile time.
Actually it can, all you have to do is set RTE_IXGBE_INC_VECTOR=n in your
config.
> While sse extensions provide better performance, theres no reason
> that we can't still create builds to run on systems that don't support sse.
> If
> we modify the ixgbe code to use the __builtin_shuffle and __builtin_popcountll
> functions, I've confirmed that the gcc compiler emits the appropriate sse
> instructions when the provided -march parameter indicates a machine that
> includes sse support, and emits generic code when see isn't available.
I don't think it is ok to blindly replace _mm_shuffle_epi8 with
__builtin_shuffle.
They are not identical.
I tried your patch on IVB box (gcc 4.8.3, CONFIG_RTE_MACHINE="native").
The result is - ixgbe_recv_pkts_vec() functionality is broken.
See below for more details.
So my vote is NACK.
Konstantin
1. Code changes:
uint16_t
ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
uint16_t nb_pkts)
{
...
__m128i shuf_msk;
...
/* mask to shuffle from desc. to mbuf */
shuf_msk = _mm_set_epi8(
7, 6, 5, 4, /* octet 4~7, 32bits rss */
0xFF, 0xFF, /* skip high 16 bits vlan_macip, zero out */
15, 14, /* octet 14~15, low 16 bits vlan_macip */
0xFF, 0xFF, /* skip high 16 bits pkt_len, zero out */
13, 12, /* octet 12~13, low 16 bits pkt_len */
0xFF, 0xFF, /* skip nb_segs and in_port, zero out */
13, 12 /* octet 12~13, 16 bits data_len */
);
...
for (...) {
__m128i descs[RTE_IXGBE_DESCS_PER_LOOP];
__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
...
descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
...
- pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
+ pkt_mb4 = __builtin_shuffle(descs[3], shuf_msk);
...
2. Code generated before the patch (valid one):
...
vmovdqa 0x4978d(%rip),%xmm0 /* load shuf_msk */
...
vmovdqu 0x30(%rdx),%xmm4 /* load desc[3] */
....
vpshufb %xmm0,%xmm4,%xmm8
....
3. Code generated after the patch applied (broken one):
...
vmovdqu 0x30(%rdx),%xmm3
...
vpunpcklqdq %xmm3,%xmm3,%xmm3 /* !!! ERROR - should be vpshufb !!!! */
4. What happens here?
My understanding:
GCC treats __m128i as vector of two 64bit integers:
/lib/gcc/x86_64-redhat-linux/4.8.3/include/emmintrin.h:typedef long long
__m128i __attribute__ ((__vector_size__ (16), __may_alias__));