<snip>

> 
> Thanks for your suggestions, we found that the -fno-tree-vectorize option
> works.
> PS: This option was not actually applied in our earliest test.
> 
> Solution:
> 1. Use the -fno-tree-vectorize option to prevent the compiler from generating
>    auto-vectorized code, so that the slow path will work fine.
> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
>    arm/meson.build:
>         'part_number_config': {
>                 'generic': {'machine_args': ['-march=armv8-a+crc',
>                                              '-march=armv8-a+sve+crc',
>                                              '-moutline-atomics']}
>         }
>    If the compiler doesn't support '-march=armv8-a+sve+crc', it will fall back
>    to '-march=armv8-a+crc'.
>    If the compiler supports '-march=armv8-a+sve+crc', it will compile the
>    SVE-related code, so the IO path can support SVE.
> 
> Based on the above, we could achieve our initial target.
The 'generic' target is for generating a binary that would work on all Armv8
machines. If you are building with '-march=armv8-a+sve+crc', the IO-Path would
not work on non-SVE machines.
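
For illustration, the runtime-dispatch approach described in the quoted reply
further down (baseline slow path, fast-path function pointer attached
according to CPU support) could look roughly like the minimal sketch below.
This is not DPDK code: sum_fn_t, sum_baseline, sum_sve_standin, cpu_has_sve
and select_sum are hypothetical names, and the "SVE" variant is plain C so
that the sketch runs on any machine. On arm64 Linux it assumes SVE support is
reported through getauxval(AT_HWCAP) and the HWCAP_SVE bit.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>          /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)  /* arm64 Linux hwcap bit for SVE */
#endif
#endif

/* Hypothetical fast-path signature: sum n 64-bit values. */
typedef uint64_t (*sum_fn_t)(const uint64_t *v, size_t n);

/* Baseline variant; would be built with -march=armv8-a+crc. */
static uint64_t sum_baseline(const uint64_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/*
 * Stand-in for the SVE variant. In a real build this would live in a
 * separate source file compiled with +sve; here it is plain C (an
 * assumption made so the sketch is runnable everywhere).
 */
static uint64_t sum_sve_standin(const uint64_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Returns 1 if the running CPU reports SVE, 0 otherwise. */
static int cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return 0;  /* not arm64 Linux: treat SVE as unavailable */
#endif
}

/* Slow-path setup: attach the fast-path pointer once, at init time. */
static sum_fn_t select_sum(int has_sve)
{
    return has_sve ? sum_sve_standin : sum_baseline;
}
```

At init time the slow path would call select_sum(cpu_has_sve()) once and
store the result; only the file holding the real SVE variant would be
compiled with '+sve', while everything else, including the selection code,
stays at '-march=armv8-a+crc'.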

> 
> 
> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>
> >> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
> >> <fengcheng...@huawei.com> wrote:
> >>>
> >>> Hi, ALL
> >>> We have a question for your help:
> >>>   1. We have two platforms, both of which are ARM64, one of which supports
> >>>      both NEON and SVE, the other only supports NEON.
> >>>   2. We want to run on both platforms with a single binary file, and use the
> >>>      highest vector capability of the corresponding platform whenever
> >>>      possible.
> >>
> >> I see VPP has a similar feature. IMO, it is not present in DPDK.
> >> Basically, in order to do this:
> >> - Compile slow-path code (90% of DPDK) with minimal CPU instruction
> >>   set support.
> >> - Have fastpath functions compiled with different CPU instruction set
> >>   levels.
> >> - In slowpath, attach the fastpath function pointer based on CPU
> >>   instruction-level support.
> > Agree.
> >
> >>
> >>
> >>>   3. So we build the DPDK program with -march=armv8-a+sve+crc (GCC 10.2).
> > This defines the minimum capabilities of the target machine.
> >
> >>>      However, it is found that invalid instructions occur when the program
> >>>      runs on a machine that does not support SVE (pls see below).
> >>>   4. The problem is caused by the introduction of SVE in GCC automatic
> >>>      vector optimization.
> >>>
> >>>   So is there a way to disable GCC automatic vector optimization or use only
> >>>   NEON to perform automatic vector optimization?
> > I do not think this is safe. Once SVE is enabled, the compiler is allowed to
> > use SVE instructions wherever it sees fit.
> >
> >>>
> >>>   BTW: we already tested -fno-tree-vectorize (as in the link below) but found
> >>>   no effect.
> >>>
> >>> https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc
> >>>
> >>>
> >>> The GDB output:
> >>>      EAL: Detected 128 lcore(s)
> >>>      EAL: Detected 4 NUMA nodes
> >>>      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
> >>>
> >>>      Program received signal SIGILL, Illegal instruction.
> >>>      0x0000000000671b88 in eal_adjust_config ()
> >>>      (gdb)
> >>>      (gdb) where
> >>>      #0  0x0000000000671b88 in eal_adjust_config ()
> >>>      #1  0x0000000000682840 in rte_eal_init ()
> >>>      #2  0x000000000051c870 in main ()
> >>>      (gdb)
> >>>
> >>> The disassembly output of eal_adjust_config:
> >>>      671b7c:       f8237a81        str     x1, [x20, x3, lsl #3]
> >>>      671b80:       f110001f        cmp     x0, #0x400
> >>>      671b84:       54ffff21        b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
> >>>      671b88:       043357f5        addvl   x21, x19, #-1
> >>>      671b8c:       043457e1        addvl   x1, x20, #-1
> >>>      671b90:       910562b5        add     x21, x21, #0x158
> >>>      671b94:       04e0e3e0        cntd    x0
> >>>      671b98:       914012b5        add     x21, x21, #0x4, lsl #12
> >>>      671b9c:       52800218        mov     w24, #0x10                      // #16
> >>>      671ba0:       25d8e3e1        ptrue   p1.d
> >>>      671ba4:       25f80fe0        whilelo p0.d, wzr, w24
> >>>      671ba8:       a5e04020        ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]
> >>>
> >>>
> >>> Best regards.
> >>>
