On 27.11.2018 13:33, Burakov, Anatoly wrote:
> On 27-Nov-18 10:26 AM, Hemant Agrawal wrote:
>>
>> On 11/26/2018 8:55 PM, Asaf Sinai wrote:
>>> +CC Ilia & Sasha.
>>>
>>> -----Original Message-----
>>> From: Burakov, Anatoly <anatoly.bura...@intel.com>
>>> Sent: Monday, November 26, 2018 04:57 PM
>>> To: Ilya Maximets <i.maxim...@samsung.com>; Asaf Sinai
>>> <asa...@radware.com>; dev@dpdk.org; Thomas Monjalon <tho...@monjalon.net>
>>> Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference
>>> in memory pool allocations, when enabling/disabling this configuration
>>>
>>> On 26-Nov-18 2:32 PM, Ilya Maximets wrote:
>>>> On 26.11.2018 17:21, Burakov, Anatoly wrote:
>>>>> On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
>>>>>> On 26.11.2018 16:42, Burakov, Anatoly wrote:
>>>>>>> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>>>>>>>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>>>>>>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>>>>>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>>>>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>>>>>>>> Hi Anatoly,
>>>>>>>>>>>>
>>>>>>>>>>>> We did not check it with "testpmd", only with our application.
>>>>>>>>>>>> From the beginning, we did not enable this configuration
>>>>>>>>>>>> (look at attached files), and everything works fine.
>>>>>>>>>>>> Of course we rebuild DPDK when we change the configuration.
>>>>>>>>>>>> Please note that we use DPDK 17.11.3; maybe this is why it works
>>>>>>>>>>>> fine?
>>>>>>>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are
>>>>>>>>>>> describing. This is not intended behavior. I will look into it.
>>>>>>>>>>>
>>>>>>>>>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>>>>>>>
>>>>>>>>>> Looking at the code, I think this config option needs to be reworked
>>>>>>>>>> and we should clarify what we mean by this option. It appears that
>>>>>>>>>> I've misunderstood what this option was actually intended to do, and I
>>>>>>>>>> also think its naming could be improved because it's confusing and
>>>>>>>>>> misleading.
>>>>>>>>>>
>>>>>>>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it
>>>>>>>>>> merely disables using libnuma to perform memory allocation. This
>>>>>>>>>> looks like intended (if counter-intuitive) behavior - disabling this
>>>>>>>>>> option will simply revert DPDK to working as it did before this
>>>>>>>>>> option was introduced (i.e. best-effort allocation). This is why
>>>>>>>>>> your code still works - because EAL still does allocate memory on
>>>>>>>>>> socket 1, and *knows* that it's socket 1 memory. It still supports
>>>>>>>>>> NUMA.
>>>>>>>>>>
>>>>>>>>>> The commit message for these changes states that the actual purpose
>>>>>>>>>> of this option is to enable "balanced" hugepage allocation. Under
>>>>>>>>>> cgroups limitations, DPDK previously would've exhausted all
>>>>>>>>>> hugepages on the master core's socket before attempting to allocate from
>>>>>>>>>> other sockets, but by the time we've reached cgroups limits on
>>>>>>>>>> numbers of hugepages, we might not have reached socket 1 and thus
>>>>>>>>>> missed out on the pages we could've allocated, but didn't. Using
>>>>>>>>>> libnuma solves this issue, because now we can allocate pages on
>>>>>>>>>> the sockets we want, instead of hoping we won't run out of hugepages
>>>>>>>>>> before we get the memory we need.
>>>>>>>>>>
>>>>>>>>>> From 18.05 onwards, this option works differently (and, arguably,
>>>>>>>>>> incorrectly). More specifically, it disallows allocations on sockets other
>>>>>>>>>> than 0, and it also makes it so that EAL does not check which socket
>>>>>>>>>> the memory *actually* came from. So, not only is allocating memory from
>>>>>>>>>> socket 1 disabled, but allocating from socket 0 may even get you
>>>>>>>>>> memory from socket 1!
>>>>>>>>> I'd consider this a bug.
>>>>>>>>>
>>>>>>>>>> +CC Thomas
>>>>>>>>>>
>>>>>>>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer,
>>>>>>>>>> because it makes it seem like this option disables NUMA support,
>>>>>>>>>> which is not the case.
>>>>>>>>>>
>>>>>>>>>> I would also argue that it is not relevant to the 18.05+ memory
>>>>>>>>>> subsystem, and should only work in legacy mode, because it is
>>>>>>>>>> *impossible* to make it work right in the new memory subsystem, and
>>>>>>>>>> here's why:
>>>>>>>>>>
>>>>>>>>>> Without libnuma, we have no way of "asking" the kernel to allocate a
>>>>>>>>>> hugepage on a specific socket - instead, any allocation will most
>>>>>>>>>> likely happen on the socket from which the allocation request came. For
>>>>>>>>>> example, if the user program's lcore is on socket 1, an allocation on
>>>>>>>>>> socket 0 will actually allocate a page on socket 1.
>>>>>>>>>>
>>>>>>>>>> If we don't check the page's NUMA node affinity (which is what
>>>>>>>>>> currently happens), we get performance degradation because we may
>>>>>>>>>> unintentionally allocate memory on the wrong NUMA node. If we do check
>>>>>>>>>> for this, then allocation of memory on socket 1 from an lcore on
>>>>>>>>>> socket 0 will almost never succeed, because the kernel will always give
>>>>>>>>>> us pages on socket 0.
>>>>>>>>>>
>>>>>>>>>> Put simply, there is no sane way to make this option work for the
>>>>>>>>>> new memory subsystem - IMO it should be dropped, and libnuma should
>>>>>>>>>> be made a hard dependency on Linux.
>>>>>>>>> I agree that the new memory model cannot work without libnuma,
>>>>>>>>> i.e. it will lead to unpredictable memory allocations with no
>>>>>>>>> respect to the requested socket_ids. I also agree that
>>>>>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for the legacy memory
>>>>>>>>> model.
>>>>>>>>> It looks like we have no choice other than to just drop the option
>>>>>>>>> and make the code unconditional, i.e. have a hard dependency on libnuma.
>>>>>>>>>
>>>>>>>> We could probably compile this code and have a hard dependency
>>>>>>>> only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
>>>>>>> Well, as long as legacy mode stays supported, we have to keep the
>>>>>>> option. The "drop" part was referring to supporting it under the new
>>>>>>> memory system, not a literal drop from config files.
>>>>>> The option was introduced because we didn't want to introduce the
>>>>>> new hard dependency. Since we'll have it anyway, I'm not sure if
>>>>>> keeping the option for legacy mode makes any sense.
>>>>> Oh yes, you're right. Drop it is!
>>>>>
>>>>>>> As for using RTE_MAX_NUMA_NODES, I don't think it's merited.
>>>>>>> Distributions cannot deliver different DPDK versions based on the
>>>>>>> number of sockets on a particular machine - so it would have to be a
>>>>>>> hard dependency for distributions anyway (does any distribution ship
>>>>>>> DPDK without libnuma?).
>>>>>> At least ARMv7 builds commonly do not ship the libnuma package.
>>>>> Do you mean libnuma builds for ARMv7 are not available? Or do you mean
>>>>> the libnuma package is not installed by default?
>>>>>
>>>>> If it's the latter, then I believe it's not installed by default
>>>>> anywhere, but if using the distribution version of DPDK, libnuma will be
>>>>> taken care of via the package manager. Presumably building from source can be
>>>>> taken care of with pkg-config/meson.
>>>>>
>>>>> Or do you mean ARMv7 does not have libnuma for their arch at all, in any
>>>>> distro?
>>>> libnuma builds for ARMv7 are not available in most of the distros. I
>>>> didn't check all, but here are the results for Ubuntu:
>>>>
>>>> https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
>>>>
>>>> You may see that Ubuntu 18.04 (bionic) has no libnuma package for
>>>> 'armhf' and also 'powerpc' platforms.
>>>>
>>> That's a difficulty. Do these platforms support NUMA? In other words, could
>>> we replace this flag with just outright disabling NUMA support?
>>
>> Many platforms don't support NUMA, so they don't really need libnuma.
>>
>> Mandating libnuma will also break several things:
>>
>> - cross-building for ARM on x86, which is among the preferred build
>> methods for many in the ARM community.
>>
>> - many embedded SoCs lack NUMA support and use a smaller
>> rootfs (e.g. Yocto). It will be a burden to add libnuma there.
>>
>
> OK, point taken.
>
> So, the alternative would be to have the ability to outright disable NUMA
> support (either with a new option, or by reworking this one - I would prefer a
> new one, since this one is confusingly named). Meaning, report all cores as
> socket 0, report all hardware as socket 0, report all memory as socket 0 and
> never care about NUMA nodes anywhere.
>
> Would that work? E.g. by default, make libnuma a hard dependency on x86 Linux
> (but allow disabling it), but disable it everywhere else?
I think you could just rename RTE_EAL_NUMA_AWARE_HUGEPAGES to something
like RTE_EAL_NUMA_SUPPORT and keep all the defaults as they are, i.e.

* globally disabled
* enabled for linux
* disabled for armv7a, dpaa, dpaa2 and stingray.

Meson could handle everything dynamically.

>>
>>>
>>>>>>> For those compiling from source - are there any supported
>>>>>>> distributions which don't package libnuma? I don't see much sense
>>>>>>> in keeping libnuma optional, IMO. This is of course up to the tech
>>>>>>> board to decide, but IMO the "without libnuma it's basically
>>>>>>> broken" argument is very strong :)
>>>>>>>
>>>>>
>>>
>>> --
>>> Thanks,
>>> Anatoly
>
>