[dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics

2016-03-07 Thread Ravi Kerur
v1:
This patch adds memcmp functionality using AVX and SSE
intrinsics provided by Intel. For the other architectures
supported by DPDK, the regular memcmp() function is used.

Compiled and tested on Ubuntu 14.04 (non-NUMA) and 15.10 (NUMA)
systems.

Signed-off-by: Ravi Kerur 
---
 .../common/include/arch/arm/rte_memcmp.h   |  60 ++
 .../common/include/arch/ppc_64/rte_memcmp.h|  62 ++
 .../common/include/arch/tile/rte_memcmp.h  |  60 ++
 .../common/include/arch/x86/rte_memcmp.h   | 786 +
 lib/librte_eal/common/include/generic/rte_memcmp.h | 175 +
 5 files changed, 1143 insertions(+)
 create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/arch/tile/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_memcmp.h

diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcmp.h 
b/lib/librte_eal/common/include/arch/arm/rte_memcmp.h
new file mode 100644
index 000..fcbacb4
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcmp.h
@@ -0,0 +1,60 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 RehiveTech. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IBM Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#ifndef _RTE_MEMCMP_ARM_H_
+#define _RTE_MEMCMP_ARM_H_
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "generic/rte_memcmp.h"
+
+#define rte_memcmp(dst, src, n)  \
+   ({ (__builtin_constant_p(n)) ?   \
+   memcmp((dst), (src), (n)) :  \
+   rte_memcmp_func((dst), (src), (n)); })
+
+static inline int
+rte_memcmp_func(void *dst, const void *src, size_t n)
+{
+   return memcmp(dst, src, n);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MEMCMP_ARM_H_ */
diff --git a/lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h 
b/lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h
new file mode 100644
index 000..5839a2d
--- /dev/null
+++ b/lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h
@@ -0,0 +1,62 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright (C) IBM Corporation 2016.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in
+ *   the documentation and/or other materials provided with the
+ *   distribution.
+ * * Neither the name of IBM Corporation nor the names of its
+ *   contributors may be used to endorse or promote products derived
+ *   from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+*/

[dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics

2016-05-25 Thread Thomas Monjalon
2016-03-07 15:00, Ravi Kerur:
> v1:
> This patch adds memcmp functionality using AVX and SSE
> intrinsics provided by Intel. For other architectures
> supported by DPDK regular memcmp function is used.

Anyone to review this patch please? Zhihong?


[dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics

2016-05-26 Thread Wang, Zhihong


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
> Sent: Tuesday, March 8, 2016 7:01 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and
> SSE intrinsics
> 
> v1:
> This patch adds memcmp functionality using AVX and SSE
> intrinsics provided by Intel. For other architectures
> supported by DPDK regular memcmp function is used.
> 
> Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA)
> systems.
> 
[...]

> + if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
> + __m128i idx =
> + _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 
> 3, 2, 1, 0);

line over 80 characters ;)

> +
> + /*
> +  * Reverse byte order
> +  */
> + xmm0 = _mm_shuffle_epi8(xmm0, idx);
> + xmm1 = _mm_shuffle_epi8(xmm1, idx);
> +
> + /*
> + * Compare unsigned bytes with instructions for signed bytes
> + */
> + xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
> + xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
> +
> + return _mm_movemask_epi8(xmm0 > xmm1) -
> _mm_movemask_epi8(xmm1 > xmm0);
> + }
> +
> + return 0;
> +}

[...]

> +static inline int
> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
> +{
> + const uint8_t *src_1 = (const uint8_t *)_src_1;
> + const uint8_t *src_2 = (const uint8_t *)_src_2;
> + int ret = 0;
> +
> + if (n < 16)
> + return rte_memcmp_regular(src_1, src_2, n);
[...]
> +
> + while (n > 512) {
> + ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);

Thanks for the great work!

Seems to me there's a big improvement area before going into detailed
instruction layout tuning that -- No unalignment handling here for large
size memcmp.

So almost without a doubt the performance will be low in micro-architectures
like Sandy Bridge if the start address is unaligned, which might be a
common case.

> + if (unlikely(ret != 0))
> + return ret;
> +
> + ret = rte_cmp256(src_1 + 1 * 256, src_2 + 1 * 256);
> + if (unlikely(ret != 0))
> + return ret;
> +
> + src_1 = src_1 + 512;
> + src_2 = src_2 + 512;
> + n -= 512;
> + }
> + goto CMP_BLOCK_LESS_THAN_512;
> +}
> +
> +#else /* RTE_MACHINE_CPUFLAG_AVX2 */




Re: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics

2018-12-20 Thread Ferruh Yigit
On 5/26/2016 9:57 AM, zhihong.wang at intel.com (Wang, Zhihong) wrote:
>> -Original Message-
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
>> Sent: Tuesday, March 8, 2016 7:01 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and
>> SSE intrinsics
>>
>> v1:
>> This patch adds memcmp functionality using AVX and SSE
>> intrinsics provided by Intel. For other architectures
>> supported by DPDK regular memcmp function is used.
>>
>> Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA)
>> systems.
>>
> [...]
> 
>> +if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
>> +__m128i idx =
>> +_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 
>> 3, 2, 1, 0);
> 
> line over 80 characters ;)
> 
>> +
>> +/*
>> + * Reverse byte order
>> + */
>> +xmm0 = _mm_shuffle_epi8(xmm0, idx);
>> +xmm1 = _mm_shuffle_epi8(xmm1, idx);
>> +
>> +/*
>> +* Compare unsigned bytes with instructions for signed bytes
>> +*/
>> +xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
>> +xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
>> +
>> +return _mm_movemask_epi8(xmm0 > xmm1) -
>> _mm_movemask_epi8(xmm1 > xmm0);
>> +}
>> +
>> +return 0;
>> +}
> 
> [...]
> 
>> +static inline int
>> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
>> +{
>> +const uint8_t *src_1 = (const uint8_t *)_src_1;
>> +const uint8_t *src_2 = (const uint8_t *)_src_2;
>> +int ret = 0;
>> +
>> +if (n < 16)
>> +return rte_memcmp_regular(src_1, src_2, n);
> [...]
>> +
>> +while (n > 512) {
>> +ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);
> 
> Thanks for the great work!
> 
> Seems to me there's a big improvement area before going into detailed
> instruction layout tuning that -- No unalignment handling here for large
> size memcmp.
> 
> So almost without a doubt the performance will be low in micro-architectures
> like Sandy Bridge if the start address is unaligned, which might be a
> common case.

This patch has been waiting for comments for a long time, since May 2016. Updating patch
status to rejected.

Anyone planning to work on vectorized version of rte_memcmp() can benefit from
this patch:
https://patches.dpdk.org/patch/11156/
https://patches.dpdk.org/patch/11157/