Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types

Aaron Conole Wed, 05 Jun 2019 10:09:54 -0700

Jerin Jacob Kollanukkaran <[email protected]> writes:

>> -----Original Message-----
>> From: Jerin Jacob Kollanukkaran
>> Sent: Wednesday, April 10, 2019 8:10 PM
>> To: [email protected]; [email protected]
>> Cc: [email protected]; [email protected]
>> Subject: Re: [EXT] [PATCH 1/3] acl: fix arm argument types
>> 
>> On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> > -------------------------------------------------------------------
>> > ---
>> > Compiler complains of argument type mismatch, like:
>> 
>> Can you share more details on how to reproduce this issue?
>> 
>> We already have
>> CFLAGS_acl_run_neon.o += -flax-vector-conversions in the Makefile.
>> 
>> If you are taking out -flax-vector-conversions, the correct way to fix this
>> is to use vreinterpret*.
>> 
>> To me, the code looks clean if unnecessary casting is avoided.
>
>
> Considering that the following patch is now part of dpdk.org, I think we may
> not need this patch; dropping it avoids a lot of typecasting.
>
> https://git.dpdk.org/dpdk/commit/?id=e53ce4e4137974f46743e74bd9ab912e0166c8b1


Correct, the lax conversions aren't needed.

>
>
>
>> 
>> 
>> >
>> >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>> >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-conversions
>> >       to permit conversions between vectors with differing element types
>> >       or numbers of subparts
>> >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>> >      ^
>> >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type for
>> >       argument 2 of ‘vbicq_s32’
>> > Signed-off-by: Aaron Conole <[email protected]>
>> > ---
>> >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
>> > --
>> >  1 file changed, 27 insertions(+), 19 deletions(-)
>> >
>> >
>> >
>> >  /*
>> > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>> >    acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>> >    acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>> >
>> > +  memset(&input0, 0, sizeof(input0));
>> > +  memset(&input1, 0, sizeof(input1));
>> 
>> Why is this memset only required for arm64? If it is a real issue, shouldn't
>> it be required for x86 and ppc?
>> 

Something for this part is still needed (see for example:
https://travis-ci.com/DPDK/dpdk/jobs/205675369).

I have two alternative approaches, but neither has even been compile tested.
(There is also the obvious '-Wno-maybe-uninitialized', but I dislike that
approach because it would affect all routines.)

1.  Something like this:

@@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdupq_n_s32(0), 0);
+               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), vdupq_n_s32(0), 0);
 
                input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
                input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdupq_n_s32(0), 0);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
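
For illustration only, a minimal sketch of the idea behind option 1: lane 0
is written into a freshly zeroed vector, so the first vsetq_lane_s32() call
never reads a possibly uninitialized variable. gather4() is a hypothetical
helper, not something in librte_acl, and the #else branch is a stand-in
emulation (an assumption, not the real NEON API) so the sketch also builds
off ARM. It uses vdupq_n_s32(), since vsetq_lane_s32() takes a 128-bit
int32x4_t.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/* Hypothetical stand-ins so the sketch builds off ARM; not the real NEON API. */
typedef struct { int32_t lane[4]; } int32x4_t;

static inline int32x4_t vdupq_n_s32(int32_t v)
{
    int32x4_t r = { { v, v, v, v } };
    return r;
}

static inline int32x4_t vsetq_lane_s32(int32_t v, int32x4_t vec, int lane)
{
    assert(lane >= 0 && lane < 4);
    vec.lane[lane] = v;
    return vec;
}

static inline int32_t vgetq_lane_s32(int32x4_t vec, int lane)
{
    assert(lane >= 0 && lane < 4);
    return vec.lane[lane];
}
#endif

/* gather4 is a hypothetical helper, not a DPDK function. */
static inline int32x4_t gather4(const int32_t src[4])
{
    /* Lane 0 goes into a zero vector, so nothing uninitialized is read. */
    int32x4_t v = vsetq_lane_s32(src[0], vdupq_n_s32(0), 0);

    v = vsetq_lane_s32(src[1], v, 1);
    v = vsetq_lane_s32(src[2], v, 2);
    v = vsetq_lane_s32(src[3], v, 3);
    return v;
}
```

In the real loop, GET_NEXT_4BYTES(parms, n) would take the place of src[n].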

---------

2.  Something like this:

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index a055a8240..0eb42865a 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -165,7 +165,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
        uint64_t index_array[8];
        struct completion cmplt[8];
        struct parms parms[8];
-       int32x4_t input0, input1;
+       static int32x4_t ZERO_VAL;
+       int32x4_t input0 = ZERO_VAL, input1 = ZERO_VAL;
 
        acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
                     total_packets, categories, ctx->trans_table);
@@ -181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdupq_n_s32(0), 0);
-               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), vdupq_n_s32(0), 0);
+               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
+               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
 
                input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
                input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -227,7 +228,8 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
        uint64_t index_array[4];
        struct completion cmplt[4];
        struct parms parms[4];
-       int32x4_t input;
+       static int32x4_t ZERO_VAL;
+       int32x4_t input = ZERO_VAL;
 
        acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
                     total_packets, categories, ctx->trans_table);
@@ -242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdupq_n_s32(0), 0);
+               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
---
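
For illustration only, a minimal sketch of the idea behind option 2: the
vector is initialized at its declaration from a static object, which C
zero-initializes, and the gather loop itself stays unchanged. vec4,
set_lane() and gather_rounds() are hypothetical stand-ins (for int32x4_t,
vsetq_lane_s32() and the search loop respectively), chosen so the sketch
builds on any architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for int32x4_t. */
typedef struct { int32_t lane[4]; } vec4;

/* Hypothetical stand-in for vsetq_lane_s32(): overwrite a single lane. */
static inline vec4 set_lane(int32_t v, vec4 vec, int lane)
{
    assert(lane >= 0 && lane < 4);
    vec.lane[lane] = v;
    return vec;
}

/* Gather 4 values per round; return the vector from the final round. */
static inline vec4 gather_rounds(const int32_t *src, int rounds)
{
    /* Objects with static storage duration are zero-initialized in C. */
    static vec4 ZERO_VAL;
    vec4 input = ZERO_VAL; /* defined before the first read below */
    int i;

    for (i = 0; i < rounds; i++) {
        input = set_lane(src[4 * i + 0], input, 0);
        input = set_lane(src[4 * i + 1], input, 1);
        input = set_lane(src[4 * i + 2], input, 2);
        input = set_lane(src[4 * i + 3], input, 3);
    }
    return input;
}
```

With zero rounds the function returns an all-zero vector rather than
indeterminate contents, which is the property the initializer buys.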

WDYT?
