Re: [dpdk-dev] [PATCH v8 2/4] meson: add infra to support machine specific flags

Yongseok Koh Thu, 11 Apr 2019 19:05:39 -0700


> On Apr 11, 2019, at 1:12 PM, Yongseok Koh <[email protected]> wrote:
> 
>> 
>> On Apr 10, 2019, at 11:07 PM, Pavan Nikhilesh Bhagavatula 
>> <[email protected]> wrote:
>> 
>> Hi Yongseok,
>> 
>>> -----Original Message-----
>>> From: Yongseok Koh <[email protected]>
>>> Sent: Wednesday, April 10, 2019 11:08 PM
>>> To: Pavan Nikhilesh Bhagavatula <[email protected]>
>>> Cc: Thomas Monjalon <[email protected]>; dev <[email protected]>; Jerin
>>> Jacob Kollanukkaran <[email protected]>; [email protected]
>>> Subject: [EXT] Re: [dpdk-dev] [PATCH v8 2/4] meson: add infra to support
>>> machine specific flags
>>> 
>>>> On Apr 10, 2019, at 9:13 AM, [email protected] wrote:
>>>> 
>>>> From: Pavan Nikhilesh <[email protected]>
>>>> 
>>>> Currently, RTE_* flags are set based on the implementer ID, but there
>>>> might be some micro-architecture specific differences from the same vendor,
>>>> e.g. RTE_CACHE_LINE_SIZE. Add support to set micro-architecture specific flags.
>>>> 
>>>> Signed-off-by: Pavan Nikhilesh <[email protected]>
>>>> Signed-off-by: Jerin Jacob <[email protected]>
>>>> ---
>>>> config/arm/meson.build | 56 ++++++++++++++++++++++++------------------
>>>> 1 file changed, 32 insertions(+), 24 deletions(-)
>>>> 
>>>> diff --git a/config/arm/meson.build b/config/arm/meson.build
>>>> index 170a4981a..24bce2b39 100644
>>>> --- a/config/arm/meson.build
>>>> +++ b/config/arm/meson.build
>>>> @@ -7,25 +7,6 @@ march_opt = '-march=@0@'.format(machine)
>>>> 
>>>> arm_force_native_march = false
>>>> 
>>>> -machine_args_generic = [
>>>> -  ['default', ['-march=armv8-a+crc+crypto']],
>>>> -  ['native', ['-march=native']],
>>>> -  ['0xd03', ['-mcpu=cortex-a53']],
>>>> -  ['0xd04', ['-mcpu=cortex-a35']],
>>>> -  ['0xd05', ['-mcpu=cortex-a55']],
>>>> -  ['0xd07', ['-mcpu=cortex-a57']],
>>>> -  ['0xd08', ['-mcpu=cortex-a72']],
>>>> -  ['0xd09', ['-mcpu=cortex-a73']],
>>>> -  ['0xd0a', ['-mcpu=cortex-a75']],
>>>> -  ['0xd0b', ['-mcpu=cortex-a76']],
>>>> -]
>>>> -machine_args_cavium = [
>>>> -  ['default', ['-march=armv8-a+crc+crypto','-mcpu=thunderx']],
>>>> -  ['native', ['-march=native']],
>>>> -  ['0xa1', ['-mcpu=thunderxt88']],
>>>> -  ['0xa2', ['-mcpu=thunderxt81']],
>>>> -  ['0xa3', ['-mcpu=thunderxt83']]]
>>>> -
>>>> flags_common_default = [
>>>>    # Accelarate rte_memcpy. Be sure to run unit test (memcpy_perf_autotest)
>>>>    # to determine the best threshold in code. Refer to notes in source file
>>>> @@ -52,12 +33,10 @@ flags_generic = [
>>>>    ['RTE_USE_C11_MEM_MODEL', true],
>>>>    ['RTE_CACHE_LINE_SIZE', 128]]
>>>> flags_cavium = [
>>>> -  ['RTE_MACHINE', '"thunderx"'],
>>>>    ['RTE_CACHE_LINE_SIZE', 128],
>>>>    ['RTE_MAX_NUMA_NODES', 2],
>>>>    ['RTE_MAX_LCORE', 96],
>>>> -  ['RTE_MAX_VFIO_GROUPS', 128],
>>>> -  ['RTE_USE_C11_MEM_MODEL', false]]
>>>> +  ['RTE_MAX_VFIO_GROUPS', 128]]
>>>> flags_dpaa = [
>>>>    ['RTE_MACHINE', '"dpaa"'],
>>>>    ['RTE_USE_C11_MEM_MODEL', true],
>>>> @@ -71,6 +50,27 @@ flags_dpaa2 = [
>>>>    ['RTE_MAX_NUMA_NODES', 1],
>>>>    ['RTE_MAX_LCORE', 16],
>>>>    ['RTE_LIBRTE_DPAA2_USE_PHYS_IOVA', false]]
>>>> +flags_default_extra = []
>>>> +flags_thunderx_extra = [
>>>> +  ['RTE_MACHINE', '"thunderx"'],
>>>> +  ['RTE_USE_C11_MEM_MODEL', false]]
>>>> +
>>>> +machine_args_generic = [
>>>> +  ['default', ['-march=armv8-a+crc+crypto']],
>>>> +  ['native', ['-march=native']],
>>>> +  ['0xd03', ['-mcpu=cortex-a53']],
>>>> +  ['0xd04', ['-mcpu=cortex-a35']],
>>>> +  ['0xd07', ['-mcpu=cortex-a57']],
>>>> +  ['0xd08', ['-mcpu=cortex-a72']],
>>>> +  ['0xd09', ['-mcpu=cortex-a73']],
>>>> +  ['0xd0a', ['-mcpu=cortex-a75']]]
>>>> +
>>>> +machine_args_cavium = [
>>>> +  ['default', ['-march=armv8-a+crc+crypto','-mcpu=thunderx']],
>>>> +  ['native', ['-march=native']],
>>>> +  ['0xa1', ['-mcpu=thunderxt88'], flags_thunderx_extra],
>>>> +  ['0xa2', ['-mcpu=thunderxt81'], flags_thunderx_extra],
>>>> +  ['0xa3', ['-mcpu=thunderxt83'], flags_thunderx_extra]]
>>>> 
>>>> ## Arm implementer ID (ARM DDI 0487C.a, Section G7.2.106, Page G7-5321)
>>>> impl_generic = ['Generic armv8', flags_generic, machine_args_generic]
>>>> @@ -157,8 +157,16 @@ else
>>>>    endif
>>>>    foreach marg: machine[2]
>>>>            if marg[0] == impl_pn
>>>> -                  foreach f: marg[1]
>>>> -                          machine_args += f
>>>> +                  foreach flag: marg[1]
>>>> +                          if cc.has_argument(flag)
>>>> +                                  machine_args += flag
>>>> +                          endif
>>>> +                  endforeach
>>>> +                  # Apply any extra machine specific flags.
>>>> +                  foreach flag: marg.get(2, flags_default_extra)
>>>> +                          if flag.length() > 0
>>>> +                                  dpdk_conf.set(flag[0], flag[1])
>>>> +                          endif
>>> 
>>> Let me continue the discussion from v7 here.
>>> It seems I wasn't clear enough.
>>> 
>>> Let me take an example. If the host is thunderx2 (0xaf) and the compiler is
>>> older than version 7, flags_thunderx2_extra isn't set. This means, for
>>> example, RTE_CACHE_LINE_SIZE will still be 128. Is that what you want?
>>> RTE_CACHE_LINE_SIZE has nothing to do with compiler support, and you might
>>> want to set it regardless of the gcc version. You could skip setting -mcpu
>>> while still setting the extra flags.
>>> 
>> 
>> Thanks for the detailed explanation.
>> I think that, since we have the check to skip the -mcpu flag when cc doesn't
>> support it (cc.has_argument(flag)), it will be safe to remove:
>> `
>>       # Primary part number based mcpu flags are supported
>>       # for gcc versions > 7
>>       if cc.version().version_compare(
>>                       '<7.0') or cmd_output.length() == 0
>>               if not meson.is_cross_build() and arm_force_native_march == true
>>                       impl_pn = 'native'
>>               else
>>                       impl_pn = 'default'
>>               endif
>>       endif
>> `
> 
> +1


I've tested it, but I still have an issue with old gcc.
When -mcpu is skipped because cc.has_argument() fails, no -march is set either,
so the build fails because the CRC extension isn't enabled.
-march should include '+crc'. The error I got was:

> ninja: Entering directory `build'
> [942/1452] Compiling C object 'drivers/drivers...c@sta/net_softnic_rte_eth_softnic_action.c.o'.
> FAILED: drivers/drivers@@tmp_rte_pmd_softnic@sta/net_softnic_rte_eth_softnic_action.c.o
> cc -Idrivers/drivers@@tmp_rte_pmd_softnic@sta -Idrivers -I../drivers
> -Idrivers/net/softnic -I../drivers/net/softnic -Ilib/librte_ethdev
> -I../lib/librte_ethdev -I. -I../ -Iconfig -I../config
> -Ilib/librte_eal/common/include -I../lib/librte_eal/common/include
> -I../lib/librte_eal/linux/eal/include -Ilib/librte_eal/common
> -I../lib/librte_eal/common -Ilib/librte_eal/common/include/arch/arm
> -I../lib/librte_eal/common/include/arch/arm -Ilib/librte_eal
> -I../lib/librte_eal -Ilib/librte_kvargs -I../lib/librte_kvargs
> -Ilib/librte_net -I../lib/librte_net -Ilib/librte_mbuf -I../lib/librte_mbuf
> -Ilib/librte_mempool -I../lib/librte_mempool -Ilib/librte_ring
> -I../lib/librte_ring -Ilib/librte_cmdline -I../lib/librte_cmdline
> -Ilib/librte_meter -I../lib/librte_meter -Idrivers/bus/pci
> -I../drivers/bus/pci -I../drivers/bus/pci/linux -Ilib/librte_pci
> -I../lib/librte_pci -Idrivers/bus/vdev -I../drivers/bus/vdev
> -Ilib/librte_pipeline -I../lib/librte_pipeline -Ilib/librte_port
> -I../lib/librte_port -Ilib/librte_sched -I../lib/librte_sched
> -Ilib/librte_ip_frag -I../lib/librte_ip_frag -Ilib/librte_hash
> -I../lib/librte_hash -Ilib/librte_cryptodev -I../lib/librte_cryptodev
> -Ilib/librte_kni -I../lib/librte_kni -Ilib/librte_table -I../lib/librte_table
> -Ilib/librte_lpm -I../lib/librte_lpm -Ilib/librte_acl -I../lib/librte_acl
> -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O3 -include rte_config.h
> -Wsign-compare -Wcast-qual -fPIC -D_GNU_SOURCE -DALLOW_EXPERIMENTAL_API
> -MD -MQ 'drivers/drivers@@tmp_rte_pmd_softnic@sta/net_softnic_rte_eth_softnic_action.c.o'
> -MF 'drivers/drivers@@tmp_rte_pmd_softnic@sta/net_softnic_rte_eth_softnic_action.c.o.d'
> -o 'drivers/drivers@@tmp_rte_pmd_softnic@sta/net_softnic_rte_eth_softnic_action.c.o'
> -c ../drivers/net/softnic/rte_eth_softnic_action.c
> {standard input}: Assembler messages:
> {standard input}:14: Error: selected processor does not support `crc32cx w3,w3,x0'
> {standard input}:37: Error: selected processor does not support `crc32cx w1,w1,x3'
> {standard input}:40: Error: selected processor does not support `crc32cx w0,w0,x2'


My machine has implementer 0x41 (Arm) and part number 0xd08 (Cortex-A72). gcc is
'4.8.5 20150623 (Red Hat 4.8.5-28)'.
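
As a minimal illustration of why the missing '+crc' matters (a sketch written for this note, not DPDK's own CRC code): under the Arm C Language Extensions (ACLE), the compiler predefines __ARM_FEATURE_CRC32 only when the selected target includes the CRC extension, for example with -march=armv8-a+crc, and without it the assembler rejects crc32c* instructions, which is what the log above shows. Code that checks the macro can fall back to a software loop instead of failing to build. The function name crc32c() is hypothetical, and the hardware branch assumes a compiler that ships arm_acle.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Only pull in the ACLE intrinsics when the compiler says the target
 * has the CRC extension (e.g. -march=armv8-a+crc). */
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* Hypothetical helper: CRC-32C (Castagnoli) of 'len' bytes, using the
 * usual ~0 initial value and final complement. */
uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;

	for (i = 0; i < len; i++) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
		/* Hardware path: compiles to a crc32cb instruction. */
		crc = __crc32cb(crc, buf[i]);
#else
		/* Portable fallback: bit-by-bit, reflected polynomial 0x82F63B78. */
		int bit;

		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
#endif
	}
	return ~crc;
}
```

Built for AArch64 with -march=armv8-a+crc, the loop uses crc32cb; on other targets, or without +crc, it uses the bitwise fallback. Both paths should return the standard CRC-32C check value 0xE3069283 for the string "123456789".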

Thanks,
Yongseok


> 
>> 
>> The command output check can also be removed as it is handled when calling 
>> the command script itself.
> 
> +1
> 
>> 
>> Thoughts?
>> 
>> PS: I think the safest way to set the cache line size is to read the cache type
>> register [1], but sadly only a few of the latest kernels expose it through
>> sysfs (/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size).
> 
> +1
> 
> In summary, +3. LoL
> 
> I'll also submit a patch to change the default cache line size of Cortex-A72
> using the new flags_*_extra[].
> 
> 
> thanks,
> Yongseok
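
Following up on the PS above, here is a minimal C sketch of the sysfs approach, assuming the kernel exposes coherency_line_size at the path quoted in the thread. read_cache_line_size() and CLS_SYSFS_PATH are hypothetical names, and the fallback of 128 mirrors the RTE_CACHE_LINE_SIZE value used in the flags tables of this patch. Reading the cache type register directly is not shown.

```c
#include <stdio.h>

/* Path quoted in the thread; it may be absent on older kernels. */
#define CLS_SYSFS_PATH \
	"/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size"

/* Hypothetical helper: return the coherency line size in bytes read from
 * 'path', or 'fallback' if the file cannot be opened or does not start
 * with a positive integer. */
long read_cache_line_size(const char *path, long fallback)
{
	FILE *f = fopen(path, "r");
	long size = 0;

	if (f == NULL)
		return fallback; /* kernel does not export the attribute */
	if (fscanf(f, "%ld", &size) != 1 || size <= 0)
		size = fallback; /* unreadable or nonsensical contents */
	fclose(f);
	return size;
}
```

For example, read_cache_line_size(CLS_SYSFS_PATH, 128) returns whatever positive value the kernel reports, and 128 when the file does not exist. Since RTE_CACHE_LINE_SIZE is a compile-time constant, a probe like this would only be meaningful for native builds, not cross builds.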
