Hi,

By "the online check" I was referring to the check of "default_set", which is obtained via the initial thread affinity.
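
For illustration, here is a minimal sketch of the steps printed in the example runs below (default_set, cset_non_busy, cpuset, fallback). It is an assumption-laden reconstruction from the log output, not the code of the patch: the helper names read_default_set and ctrl_thread_affinity are made up for this sketch, the online check is left out (as in the example runs), and sched_getaffinity(0, ...) is used to query the calling thread so that the sketch has no pthread link dependency.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: read the calling thread's current affinity, i.e.
 * the "default_set" in the log output. Returns 0 on success, -1 on error. */
static int read_default_set(cpu_set_t *out)
{
    assert(out != NULL);
    CPU_ZERO(out);
    return sched_getaffinity(0, sizeof(*out), out);
}

/* Hypothetical helper: choose the affinity for control threads.
 *   default_set - affinity the process was started with
 *   lcore_cpus  - union of all CPUs claimed by lcores (the "busy" CPUs)
 *   master_set  - affinity of the master lcore, used as the fallback
 *   out         - resulting control thread affinity
 */
static void ctrl_thread_affinity(const cpu_set_t *default_set,
                                 const cpu_set_t *lcore_cpus,
                                 const cpu_set_t *master_set,
                                 cpu_set_t *out)
{
    cpu_set_t non_busy;
    int cpu;

    /* cset_non_busy: every CPU that is not claimed by an lcore. */
    CPU_ZERO(&non_busy);
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (!CPU_ISSET(cpu, lcore_cpus))
            CPU_SET(cpu, &non_busy);

    /* cpuset: CPUs we were started on that no lcore is using. */
    CPU_AND(out, default_set, &non_busy);

    /* Fallback: nothing is left, so share the master lcore's CPUs. */
    if (CPU_COUNT(out) == 0)
        *out = *master_set;
}
```

With default_set = 19 and lcores on 1,2,3,19 the intersection is empty and the fallback yields 1,2,3,19; with default_set = 3 and lcores on 1,2,19 the result is 3, which matches the two runs below.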

I see that pthread_getaffinity_np returns an already ANDed mask. I was under the impression that pthread_getaffinity_np would return the same mask that was set using pthread_setaffinity_np. Looking at the implementation, I see that it has been done this way on this line (https://github.com/torvalds/linux/blob/master/kernel/sched/core.c#L5242) for the last decade. I don't know how this is implemented on FreeBSD or Windows.

Below are some example runs without the online CPU check, running inside the exclusive cpuset 1-3,19,79 with CPU 79 offline. I added a print statement after each consecutive calculation just to verify the different steps.

Nice that you were able to reproduce the bug. Otherwise the fix looks good :) .

= Example runs
echo 0 > /sys/bus/cpu/devices/cpu79/online

== 1. Ctrl threads via fallback
app# LD_LIBRARY_PATH=$PWD/../lib:$LD_LIBRARY_PATH taskset -c 19,79 ./testpmd --master-lcore 0 --lcores "(0,19)@(19,1,2,3)"
EAL: Detected 79 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: default_set: 19
EAL: cset_online: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78
EAL: cset_non_busy: 0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
EAL: cpuset:
EAL: cpuset fallback: 1,2,3,19
...
^Z
app# grep -HE '^(Cpus_allowed_list|Name):' /proc/48803/task/*/status
/proc/48803/task/48803/status:Name: testpmd
/proc/48803/task/48803/status:Cpus_allowed_list: 1-3,19
/proc/48803/task/48804/status:Name: eal-intr-thread
/proc/48803/task/48804/status:Cpus_allowed_list: 1-3,19
/proc/48803/task/48805/status:Name: rte_mp_handle
/proc/48803/task/48805/status:Cpus_allowed_list: 1-3,19
/proc/48803/task/48806/status:Name: lcore-slave-19
/proc/48803/task/48806/status:Cpus_allowed_list: 1-3,19

== 2. Ctrl threads via default_set
app# LD_LIBRARY_PATH=$PWD/../lib:$LD_LIBRARY_PATH taskset -c 3,79 ./testpmd --master-lcore 0 --lcores "(0,19)@(19,1,2)"
EAL: Detected 79 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: default_set: 3
EAL: cset_online: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78
EAL: cset_non_busy: 0,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127
EAL: cpuset: 3
EAL: cpuset fallback: 3
...
^Z
app# grep -HE '^(Cpus_allowed_list|Name):' /proc/54032/task/*/status
/proc/54032/task/54032/status:Name: testpmd
/proc/54032/task/54032/status:Cpus_allowed_list: 1-2,19
/proc/54032/task/54033/status:Name: eal-intr-thread
/proc/54032/task/54033/status:Cpus_allowed_list: 3
/proc/54032/task/54034/status:Name: rte_mp_handle
/proc/54032/task/54034/status:Cpus_allowed_list: 3
/proc/54032/task/54035/status:Name: lcore-slave-19
/proc/54032/task/54035/status:Cpus_allowed_list: 1-2,19

BR
Johan

-----Original Message-----
From: David Marchand [mailto:[email protected]]
Sent: July 30, 2019 15:48
To: Johan Källström <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [PATCH] eal: fix ctrl thread affinity with --lcores

On Tue, Jul 30, 2019 at 1:38 PM Johan Källström <[email protected]> wrote:
> The CPU failsafe is nice to have as you could set the thread affinity to
> offline cpus.

Created a "dpdk" cpuset and put cpus 4-7 into it (my system is mono numa with 8 cpus)
# cd /sys/fs/cgroup/cpuset/
# mkdir dpdk
# cd dpdk
# echo 4-7 > cpuset.cpus
# echo 0 > cpuset.mems

Disabled cpu 5.
# echo 0 > /sys/bus/cpu/devices/cpu5/online

Put my shell that starts testpmd in this dpdk cpuset
# echo 4439 > tasks

EAL refuses an offline core when parsing the thread affinities and this did not change.
$ ./master/app/testpmd --master-lcore 0 --lcores '(0,7)@(7,4,5)' --log-level *:debug --no-huge --no-pci -m 512 -- -i --total-num-mbufs=2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 1 on socket 0
EAL: Detected lcore 2 as core 2 on socket 0
EAL: Detected lcore 3 as core 3 on socket 0
EAL: Detected lcore 4 as core 0 on socket 0
EAL: Detected lcore 6 as core 2 on socket 0
EAL: Detected lcore 7 as core 3 on socket 0
EAL: Support maximum 128 logical core(s) by configuration.
EAL: Detected 7 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: core 5 unavailable
EAL: invalid parameter for --lcores

What did I miss?

>
> Maybe also add the example I gave you to trigger the bug?
> https://protect2.fireeye.com/url?k=51a8b8b7-0d2163b8-51a8f82c-0cc47ad93e1a-2e7d7fab24e99be5&q=1&u=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D322%23c12

I managed to reproduce your error with the setup above (without relying on the cset tool that is not available on rhel afaics), I can add it to the commitlog yes.

> This also shows how to set the default_affinity mask and proves that the
> calculation will result in threads inside the cpuset on Linux.
>
> /Johan
>
> On tis, 2019-07-30 at 11:35 +0200, David Marchand wrote:
> > When using -l/-c options, each lcore is mapped to a physical cpu in a
> > 1:1 fashion.
> > On the contrary, when using --lcores, each lcore has its own cpuset
>
> Use "thread affinity" instead of cpuset when we talk about setting the thread
> affinity.
>
> I know that the term cpuset is used in the data structure, but it is not a
> cpuset as described by 'man cpuset' (on Linux). This comment can be seen as
> cosmetic, but I think that it could be good to have a clear definitions to
> minimize confusion.

Indeed, using cpuset is inappropriate.
I will update the commitlog and the comment.

--
David Marchand

