David Marchand <david.march...@redhat.com> writes:

> On Fri, Feb 21, 2020 at 9:19 AM Song, Keesang <keesang.s...@amd.com> wrote:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>
> Please, get this header removed.
> This is a public mailing list.
>
>
>> Thanks Thomas for bringing this up.
>> I consider this is not a new feature, but rather a fix to address
>> the issue with statically assigned maximum lcore limit on
>> high-density CPU platform such as AMD Epyc.
>> As I see a lot of DPDK adopters are still using LTS 18.11 & 19.11,
>> and they have 1~2 yrs of lifetime left, we like to backport this to
>> LTS 18.11 & 19.11 at least.
>
> It is not a fix.
>
> The use of static arrays is a design choice that goes back to the
> early days in dpdk.
> The addition of --lcores came in after this, but it was introduced for
> a different use case than placing lcores on any physical core.
> And there was no claim that a core > RTE_MAX_LCORE would be usable.
>
>
> When testing on a new hardware, it is normal to observe some limitations.
> Running DPDK on those platforms should be possible: "should be"
> because I do not have access to this hardware and saw neither tests
> reports nor performance numbers.
> Before this patch, the limitation is that on Epyc, cores >
> RTE_MAX_LCORE are not usable.
>
>
> Now, this change is quite constrained.
> If we backport it, I don't expect issues in the main dpdk components
> (based on code review and ovs tests with a RTE_MAX_LCORE set to 16 on
> a 24 cores system).
> There might be issues in some examples or not widely used library
> which uses a physical core id instead of a lcore id.
>
>
> This is the same recurring question "do we allow new features in a
> stable branch?".

Usually, the answer is 'no'.  But we do allow some "new" things to be
backported (pci ids, etc) that might be required to enable older
functionality.  Additionally, I'm sure if some feature were required to
mitigate a CVE, we'd rather favor backporting it.

I guess we could pose a litmus test:

  1. Is the problem this feature solves so widespread that it needs to
     be addressed ASAP?
  2. Is there a known workaround to the problem this is solving?
  3. How intrusive is the feature?
  4. Is it shown to be stable in the mainline (number of fixes, testing,
     etc)?
  5. Is it constrained enough that we know we can support it with even
     higher priority than other things?

Probably other questions that will need to be asked.

And even in that list of question, I'm not sure I'd be able to advocate
backporting this in the upstream branches - it hasn't had much testing.
It's unstable.  It's "difficult" to use.  It is not widespread that
people have so many cores.  The workaround is much simpler than
supporting this (recompile).

>
> --
> David Marchand

Reply via email to