On 26.11.2018 17:21, Burakov, Anatoly wrote:
> On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
>> On 26.11.2018 16:42, Burakov, Anatoly wrote:
>>> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>>>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>>>> Hi Anatoly,
>>>>>>>>
>>>>>>>> We did not check it with "testpmd", only with our application.
>>>>>>>> From the beginning, we did not enable this configuration (look at
>>>>>>>> attached files), and everything works fine.
>>>>>>>> Of course we rebuild DPDK whenever we change the configuration.
>>>>>>>> Please note that we use DPDK 17.11.3, maybe this is why it works fine?
>>>>>>>
>>>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are
>>>>>>> describing. This is not intended behavior. I will look into it.
>>>>>>>
>>>>>>
>>>>>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>>>
>>>>>> Looking at the code, i think this config option needs to be reworked and
>>>>>> we should clarify what we mean by this option. It appears that i've
>>>>>> misunderstood what this option was actually intended to do, and i also
>>>>>> think its naming could be improved because it's confusing and misleading.
>>>>>>
>>>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it merely
>>>>>> disables using libnuma to perform memory allocation. This looks like
>>>>>> intended (if counter-intuitive) behavior - disabling this option will
>>>>>> simply revert DPDK to working as it did before this option was
>>>>>> introduced (i.e. best-effort allocation). This is why your code still
>>>>>> works - because EAL still does allocate memory on socket 1, and *knows*
>>>>>> that it's socket 1 memory. It still supports NUMA.
>>>>>>
>>>>>> The commit message for these changes states that the actual purpose of
>>>>>> this option is to enable "balanced" hugepage allocation. In case of
>>>>>> cgroups limitations, previously, DPDK would've exhausted all hugepages
>>>>>> on master core's socket before attempting to allocate from other
>>>>>> sockets, but by the time we've reached cgroups limits on numbers of
>>>>>> hugepages, we might not have reached socket 1 and thus missed out on the
>>>>>> pages we could've allocated, but didn't. Using libnuma solves this
>>>>>> issue, because now we can allocate pages on sockets we want, instead of
>>>>>> hoping we won't run out of hugepages before we get the memory we need.
>>>>>>
>>>>>> In 18.05 onwards, this option works differently (and arguably wrong).
>>>>>> More specifically, it disallows allocations on sockets other than 0, and
>>>>>> it also makes it so that EAL does not check which socket the memory
>>>>>> *actually* came from. So, not only is allocating memory from socket 1
>>>>>> disabled, but allocating from socket 0 may even get you memory from
>>>>>> socket 1!
>>>>>
>>>>> I'd consider this a bug.
>>>>>
>>>>>>
>>>>>> +CC Thomas
>>>>>>
>>>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it
>>>>>> makes it seem like this option disables NUMA support, which is not the
>>>>>> case.
>>>>>>
>>>>>> I would also argue that it is not relevant to 18.05+ memory subsystem,
>>>>>> and should only work in legacy mode, because it is *impossible* to make
>>>>>> it work right in the new memory subsystem, and here's why:
>>>>>>
>>>>>> Without libnuma, we have no way of "asking" the kernel to allocate a
>>>>>> hugepage on a specific socket - instead, any allocation will most likely
>>>>>> happen on the socket the allocating thread is running on. For example,
>>>>>> if the user program's lcore is on socket 1, a request for socket 0
>>>>>> memory will actually allocate a page on socket 1.
>>>>>>
>>>>>> If we don't check for page's NUMA node affinity (which is what currently
>>>>>> happens) - we get performance degradation because we may unintentionally
>>>>>> allocate memory on wrong NUMA node. If we do check for this - then
>>>>>> allocation of memory on socket 1 from lcore on socket 0 will almost
>>>>>> never succeed, because kernel will always give us pages on socket 0.
>>>>>>
>>>>>> Put simply, there is no sane way to make this option work for the new
>>>>>> memory subsystem - IMO it should be dropped, and libnuma should be made
>>>>>> a hard dependency on Linux.
>>>>>
>>>>> I agree that the new memory model cannot work without libnuma, i.e. it
>>>>> will lead to unpredictable memory allocations with no respect for the
>>>>> requested socket_ids. I also agree that
>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for the legacy memory
>>>>> model.
>>>>> It looks like we have no other choice than to just drop the option and
>>>>> make the code unconditional, i.e. have a hard dependency on libnuma.
>>>>>
>>>>
>>>> We could probably compile this code and have the hard dependency only
>>>> for platforms with 'RTE_MAX_NUMA_NODES > 1'.
>>>
>>> Well, as long as legacy mode stays supported, we have to keep the option.
>>> The "drop" part was referring to supporting it under the new memory system,
>>> not a literal drop from config files.
>>
>> The option was introduced because we didn't want to introduce the
>> new hard dependency. Since we'll have it anyway, I'm not sure if
>> keeping the option for legacy mode makes any sense.
>
> Oh yes, you're right. Drop it is!
>
>>
>>>
>>> As for using RTE_MAX_NUMA_NODES, i don't think it's merited. Distributions
>>> cannot deliver different DPDK versions based on the number of sockets on a
>>> particular machine - so it would have to be a hard dependency for
>>> distributions anyway (does any distribution ship DPDK without libnuma?).
>>
>> At least ARMv7 builds commonly do not ship a libnuma package.
>
> Do you mean libnuma builds for ARMv7 are not available? Or do you mean the
> libnuma package is not installed by default?
>
> If it's the latter, then i believe it's not installed by default anywhere,
> but if using distribution version of DPDK, libnuma will be taken care of via
> package manager. Presumably building from source can be taken care of with
> pkg-config/meson.
>
> Or do you mean ARMv7 does not have libnuma for their arch at all, in any
> distro?
libnuma builds for ARMv7 are not available in most of the distros. I didn't
check all of them, but here are the results for Ubuntu:
https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
You can see that Ubuntu 18.04 (bionic) has no libnuma package for the 'armhf'
and 'powerpc' platforms.
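
For reference, the 'RTE_MAX_NUMA_NODES > 1' idea from earlier in the thread
could look roughly like the sketch below. This is only an illustration, not
existing DPDK code: the EAL_REQUIRES_LIBNUMA macro and the
eal_requires_libnuma() helper are hypothetical names, and the fallback value
of 1 is an assumption standing in for a single-node target such as armhf.

```c
/* RTE_MAX_NUMA_NODES normally comes from the DPDK build configuration.
 * The fallback of 1 is an assumption for this sketch (single-node target). */
#ifndef RTE_MAX_NUMA_NODES
#define RTE_MAX_NUMA_NODES 1
#endif

/* Hypothetical macro: require libnuma only when the build can actually
 * see more than one NUMA node. */
#if RTE_MAX_NUMA_NODES > 1
#define EAL_REQUIRES_LIBNUMA 1
#else
#define EAL_REQUIRES_LIBNUMA 0
#endif

/* Hypothetical helper so that callers can query the build-time decision. */
int eal_requires_libnuma(void)
{
    return EAL_REQUIRES_LIBNUMA;
}
```

With the fallback above, eal_requires_libnuma() reports that libnuma is not
needed; a multi-socket build configuration would flip it at compile time.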
>
>>
>>>
>>> For those compiling from source - are there any supported distributions
>>> which don't package libnuma? I don't see much sense in keeping libnuma
>>> optional, IMO. This is of course up to the tech board to decide, but
>>> the "without libnuma it's basically broken" argument is very strong in my
>>> opinion :)
>>>
>>
>
>
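
The cgroup scenario from the quoted text above (hugepages on the master
core's socket being exhausted before socket 1 is ever reached) can be shown
with a toy simulation. This is a sketch under stated assumptions, not EAL
code: the function names, the two-socket layout and the page counts are all
hypothetical, and "balanced" here just models what node-bound allocation via
libnuma makes possible.

```c
#define NUM_SOCKETS 2

/* Best-effort (no libnuma): pages are taken in whatever order they come,
 * modelled here as "all of socket 0 first", until the cgroup limit is hit. */
void alloc_best_effort(const int avail[NUM_SOCKETS], int cgroup_limit,
                       int got[NUM_SOCKETS])
{
    int used = 0;

    for (int s = 0; s < NUM_SOCKETS; s++) {
        got[s] = 0;
        while (got[s] < avail[s] && used < cgroup_limit) {
            got[s]++;
            used++;
        }
    }
}

/* Balanced (node-bound allocation): take only the requested number of
 * pages on each socket, so the cgroup limit is not spent on one socket. */
void alloc_balanced(const int avail[NUM_SOCKETS], const int want[NUM_SOCKETS],
                    int cgroup_limit, int got[NUM_SOCKETS])
{
    int used = 0;

    for (int s = 0; s < NUM_SOCKETS; s++) {
        got[s] = 0;
        while (got[s] < want[s] && got[s] < avail[s] &&
               used < cgroup_limit) {
            got[s]++;
            used++;
        }
    }
}
```

With 8 free pages per socket, a cgroup limit of 4 and a request for 2 pages
per socket, the best-effort variant ends up with 4 pages on socket 0 and none
on socket 1, while the balanced variant gets 2 pages on each socket.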