rte_memcpy.h for both SSE and AVX platforms

Wodkowski, PawelX Mon, 26 Jan 2015 14:43:04 +0000

Hi,

I must say: great work.


I have some small comments:

> +/**
> + * Macro for copying unaligned block from one location to another,
> + * 47 bytes leftover maximum,
> + * locations should not overlap.
> + * Requirements:
> + * - Store is aligned
> + * - Load offset is <offset>, which must be immediate value within [1, 15]
> + * - For <src>, make sure <offset> bit backwards & <16 - offset> bit forwards are available for loading
> + * - <dst>, <src>, <len> must be variables
> + * - __m128i <xmm0> ~ <xmm8> must be pre-defined
> + */
> +#define MOVEUNALIGNED_LEFT47(dst, src, len, offset) \
> +{ \
...
> +}

Why not use do { ... } while (0) or ({ ... })? A bare { ... } block could have
unpredictable side effects, for example when the macro is used in an if/else
without braces.
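
To illustrate the point, here is a minimal sketch. The macros COPY_BARE and
COPY_SAFE and the helper copy_if() are hypothetical names made up for this
example; they are not part of the patch. It shows why a bare-brace macro body
is fragile and how do { ... } while (0) makes the expansion behave like a
single statement:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro, not from the patch: a bare-brace body.
 * "if (c) COPY_BARE(d, s, n); else ..." fails to compile, because the
 * ';' after the closing '}' ends the if statement before the else. */
#define COPY_BARE(dst, src, len) \
{                                \
    memcpy((dst), (src), (len)); \
}

/* Same body wrapped in do { ... } while (0): the expansion plus the
 * trailing ';' forms exactly one statement, so it is safe in if/else. */
#define COPY_SAFE(dst, src, len) \
do {                             \
    memcpy((dst), (src), (len)); \
} while (0)

/* Copies src to dst only when enabled; returns 1 if copied, else 0. */
static int copy_if(int enabled, void *dst, const void *src, size_t len)
{
    if (enabled)
        COPY_SAFE(dst, src, len);   /* COPY_BARE here would not compile */
    else
        return 0;

    return 1;
}
```

Swapping COPY_SAFE for COPY_BARE inside copy_if() should produce an
"else without a previous if" style error, which is the kind of surprise
the do/while wrapper avoids.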

Second:
Why do you completely replace
#define rte_memcpy(dst, src, n)              \
        ({ (__builtin_constant_p(n)) ?       \
        memcpy((dst), (src), (n)) :          \
        rte_memcpy_func((dst), (src), (n)); })

with an inline rte_memcpy()? This construction can help the compiler decide
which version to use: the (static?) inline implementation or a call to the
external function.
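
A minimal sketch of how that dispatch behaves, assuming GCC or Clang (it
relies on statement expressions and __builtin_constant_p). The names
my_memcpy, my_memcpy_func, copy_header and copy_payload are hypothetical
stand-ins; the real rte_memcpy_func is not shown in this thread:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the out-of-line copy routine. */
static void *my_memcpy_func(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Same shape as the macro quoted above: if n is a compile-time
 * constant, plain memcpy() is used so the compiler can expand it
 * inline; otherwise the function is called. */
#define my_memcpy(dst, src, n)                \
    ({ (__builtin_constant_p(n)) ?            \
       memcpy((dst), (src), (n)) :            \
       my_memcpy_func((dst), (src), (n)); })

/* Constant size: the memcpy() branch is chosen. */
static void copy_header(void *dst, const void *src)
{
    my_memcpy(dst, src, 16);
}

/* Run-time size: the my_memcpy_func() branch is normally chosen
 * (after inlining with a constant argument the compiler may still
 * pick the memcpy() branch). */
static void copy_payload(void *dst, const void *src, size_t n)
{
    my_memcpy(dst, src, n);
}
```

Either branch gives the same result; the macro only lets the compiler pick
the cheaper path when the size is known at compile time.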

Did you try the 'extern inline' form? It could help reduce compilation time.

[dpdk-dev] [PATCH 4/4] lib/librte_eal: Optimized memcpy in arch/x86/rte_memcpy.h for both SSE and AVX platforms
