<snip>

> >
> >>
> >> Thanks for your suggestions, we found that the -fno-tree-vectorize
> >> option works.
> >> PS: This option was not successfully added in the earliest test.
> >>
> >> Solution:
> >> 1. Use the -fno-tree-vectorize option to prevent the compiler from
> >>    generating auto-vectorized code, so that the slow-path will work fine.
> >> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
> >>    arm/meson.build:
> >>      'part_number_config': {
> >>          'generic': {'machine_args': ['-march=armv8-a+crc',
> >>                                       '-march=armv8-a+sve+crc',
> >>                                       '-moutline-atomics']}
> >>      }
> >>    If the compiler doesn't support '-march=armv8-a+sve+crc', it will
> >>    fall back to '-march=armv8-a+crc'.
> >>    If the compiler supports '-march=armv8-a+sve+crc', it will compile
> >>    the SVE-related code, so the IO-path could support SVE.
> >>
> >> Based on the above, we could achieve the initial target.
> > The 'generic' target is for generating a binary that would work on all ArmV8
> > machines. If you are building with '-march=armv8-a+sve+crc', the IO-Path
> > would not work on non-SVE machines.
> >
>
> The 'generic' target is only used in local CI (note: the two platforms are
> both ARMv8 machines).
>
> In the IO-path, we support NEON and SVE Rx/Tx. The code was written with
> ACLE, so it is not affected by the -fno-tree-vectorize option.
>
> If the compiler supports '-march=armv8-a+sve+crc', then it will compile both
> NEON and SVE related code.

Using '-march=armv8-a+sve+crc' and '-fno-tree-vectorize' does not provide an absolute guarantee that the compiler will not use SVE elsewhere.
The safest way to ensure that only specific functions use SVE is to compile without +sve (e.g. using -march=armv8-a) and use pragmas around the functions that are allowed to use SVE. Ex:

    #pragma GCC push_options
    #pragma GCC target ("+sve")

    void f(int *x)
    {
      for (int i = 0; i < 100; ++i)
        x[i] = i;
    }

    #pragma GCC pop_options

    void g(int *x)
    {
      for (int i = 0; i < 100; ++i)
        x[i] = i;
    }

compiles f() using SVE and g() with standard options. You can also follow the function multiversioning discussed in the other thread.

> At runtime, the driver detects whether the platform supports SVE; if
> not, it selects NEON.
>
> Best regards.
>
> >>
> >>
> >> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
> >>> <snip>
> >>>
> >>>>
> >>>> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
> >>>> <fengcheng...@huawei.com> wrote:
> >>>>>
> >>>>> Hi, ALL
> >>>>>   We have a question for your help:
> >>>>>   1. We have two platforms, both of which are ARM64, one of which
> >>>>>      supports both NEON and SVE, the other only supports NEON.
> >>>>>   2. We want to run on both platforms with a single binary file,
> >>>>>      and use the highest vector capability of the corresponding
> >>>>>      platform whenever possible.
> >>>>
> >>>> I see VPP has a similar feature. IMO, it is not present in DPDK.
> >>>> Basically, in order to do this:
> >>>> - Compile slow-path code (90% of DPDK) with minimal CPU instruction
> >>>>   set support.
> >>>> - Have fastpath functions compiled with different CPU instruction
> >>>>   set levels.
> >>>> - In the slowpath, attach the fastpath function pointer based on
> >>>>   CPU instruction-level support.
> >>> Agree.
> >>>
> >>>>
> >>>>
> >>>>>   3. So we build the DPDK program with -march=armv8-a+sve+crc
> >>>>>      (GCC 10.2).
> >>> This defines the minimum capabilities of the target machine.
> >>>
> >>>>>      However, it is found that invalid instructions occur when the
> >>>>>      program runs on a machine that does not support SVE (pls see
> >>>>>      below).
> >>>>>   4. The problem is caused by the introduction of SVE in GCC
> >>>>>      automatic vector optimization.
> >>>>>
> >>>>>   So is there a way to disable GCC automatic vector optimization,
> >>>>>   or to use only NEON to perform automatic vector optimization?
> >>> I do not think this is safe. Once SVE is enabled, the compiler is
> >>> allowed to use the SVE instructions wherever it sees fit.
> >>>
> >>>>>
> >>>>>   BTW: we already tested -fno-tree-vectorize (see the link below) but
> >>>>>   found no effect.
> >>>>>
> >>>>>   https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc
> >>>>>
> >>>>>
> >>>>>   The GDB output:
> >>>>>     EAL: Detected 128 lcore(s)
> >>>>>     EAL: Detected 4 NUMA nodes
> >>>>>     Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
> >>>>>
> >>>>>     Program received signal SIGILL, Illegal instruction.
> >>>>>     0x0000000000671b88 in eal_adjust_config ()
> >>>>>     (gdb)
> >>>>>     (gdb) where
> >>>>>     #0  0x0000000000671b88 in eal_adjust_config ()
> >>>>>     #1  0x0000000000682840 in rte_eal_init ()
> >>>>>     #2  0x000000000051c870 in main ()
> >>>>>     (gdb)
> >>>>>
> >>>>>   The disassembly output of eal_adjust_config:
> >>>>>     671b7c:  f8237a81  str     x1, [x20, x3, lsl #3]
> >>>>>     671b80:  f110001f  cmp     x0, #0x400
> >>>>>     671b84:  54ffff21  b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
> >>>>>     671b88:  043357f5  addvl   x21, x19, #-1
> >>>>>     671b8c:  043457e1  addvl   x1, x20, #-1
> >>>>>     671b90:  910562b5  add     x21, x21, #0x158
> >>>>>     671b94:  04e0e3e0  cntd    x0
> >>>>>     671b98:  914012b5  add     x21, x21, #0x4, lsl #12
> >>>>>     671b9c:  52800218  mov     w24, #0x10  // #16
> >>>>>     671ba0:  25d8e3e1  ptrue   p1.d
> >>>>>     671ba4:  25f80fe0  whilelo p0.d, wzr, w24
> >>>>>     671ba8:  a5e04020  ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]
> >>>>>
> >>>>>
> >>>>> Best regards.
> >>>>>
> >
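
For reference, here is a minimal sketch of how the pieces discussed in this thread could fit together: the file is built without +sve, one function is wrapped in the target pragma, and the slow path picks the fast-path function pointer at runtime. All names here (fill_fn, fill_baseline, fill_sve, cpu_has_sve, select_fill, fill) are hypothetical and only for illustration. The SVE check assumes Linux on arm64 and uses getauxval(AT_HWCAP) with the HWCAP_SVE bit; on any other platform the sketch simply reports "no SVE" and uses the baseline function. It is not what any DPDK driver actually does.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>          /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)  /* arm64 Linux hwcap bit for SVE */
#endif
#endif

/* Hypothetical fast-path signature, used only for this illustration. */
typedef void (*fill_fn)(int *x, size_t n);

/* Baseline variant: built with the file's default -march (no +sve). */
static void fill_baseline(int *x, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		x[i] = (int)i;
}

#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
#define HAVE_SVE_VARIANT 1
/* Only this function is compiled with SVE enabled. */
#pragma GCC push_options
#pragma GCC target ("+sve")
static void fill_sve(int *x, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		x[i] = (int)i;
}
#pragma GCC pop_options
#endif

/* Returns nonzero when the running system reports SVE support. */
static int cpu_has_sve(void)
{
#if defined(__aarch64__) && defined(__linux__)
	return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
	return 0; /* assumption: treat every other platform as non-SVE */
#endif
}

/* Slow-path setup: choose the fast-path implementation. */
static fill_fn select_fill(void)
{
	if (cpu_has_sve()) {
#ifdef HAVE_SVE_VARIANT
		return fill_sve;
#endif
	}
	return fill_baseline;
}

/* Convenience wrapper: select an implementation and call it. */
static void fill(int *x, size_t n)
{
	fill_fn fn = select_fill();

	assert(x != NULL && fn != NULL);
	fn(x, n);
}
```

In a real driver the selection would be done once at init time and the pointer stored, rather than re-selected per call as the fill() wrapper does here.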