On 9/27/2023 2:48 PM, Stanisław Kardach wrote:
> On Wed, Sep 27, 2023 at 1:55 PM Ferruh Yigit <ferruh.yi...@amd.com> wrote:
>>
>> On 9/21/2023 3:49 PM, Stanisław Kardach wrote:
>>> On Thu, Sep 21, 2023, 15:18 Tummala, Sivaprasad <sivaprasad.tumm...@amd.com> wrote:
>>>
>>> [AMD Official Use Only - General]
>>>
>>> > -----Original Message-----
>>> > From: David Marchand <david.march...@redhat.com>
>>> > Sent: Wednesday, September 20, 2023 1:05 PM
>>> > To: Stanisław Kardach <k...@semihalf.com>; Tummala, Sivaprasad <sivaprasad.tumm...@amd.com>
>>> > Cc: Ruifeng Wang <ruifeng.w...@arm.com>; Min Zhou <zhou...@loongson.cn>; David Christensen <d...@linux.vnet.ibm.com>; Bruce Richardson <bruce.richard...@intel.com>; Konstantin Ananyev <konstantin.v.anan...@yandex.ru>; dev <dev@dpdk.org>; Yigit, Ferruh <ferruh.yi...@amd.com>; Thomas Monjalon <tho...@monjalon.net>
>>> > Subject: Re: [PATCH v2 2/2] eal: remove NUMFLAGS enumeration
>>> >
>>> > On Wed, Sep 20, 2023 at 8:01 AM Stanisław Kardach <k...@semihalf.com> wrote:
>>> > >
>>> > > On Tue, Sep 19, 2023 at 4:47 PM David Marchand <david.march...@redhat.com> wrote:
>>> > > <snip>
>>> > > > > Also I see you're still removing the RTE_CPUFLAG_NUMFLAGS (what I call a last element canary). Why?
>>> > > > > If you're concerned with ABI, then we're talking about an application linking dynamically with DPDK or talking via some RPC channel with another DPDK application. So clashing with this definition does not come into question. One should rather use rte_cpu_get_flag_enabled().
>>> > > > > Also if you want to introduce new features, one would add them to the rte_cpuflags headers, unless you'd like to not add those and keep an undocumented list "above" the last defined element.
>>> > > > > Could you explain a bit more your use-case?
>>> > > >
>>> > > > Hey Stanislaw,
>>> > > >
>>> > > > Talking generically, one problem with such a pattern (having a LAST, or MAX enum) is when an array sized with such a symbol is exposed.
>>> > > > As I mentioned in the past, this can have unwanted effects:
>>> > > > https://patchwork.dpdk.org/project/dpdk/patch/20230919140430.3251493-1-david.march...@redhat.com/
>>> >
>>> > Argh... who broke copy/paste in my browser ?!
>>> > Wrt MAX and arrays, I wanted to point at:
>>> > http://inbox.dpdk.org/dev/CAJFAV8xs5CVdE2xwRtaxk5vE_PiQMV5LY5tKStk3R1gOuRt...@mail.gmail.com/
>>> >
>>> > > I agree, though I'd argue "LAST" and "MAX" semantics are a bit different. "LAST" delimits the known enumeration territory while "MAX" is more of a `constexpr` value type.
>>> > > >
>>> > > > Another issue is when an existing enum meaning changes: from the application pov, the (old) MAX value is incorrect, but for the library pov, a new meaning has been associated.
>>> > > > This may trigger bugs in the application when calling a function that returns such an enum and that never returned this MAX value in the past.
>>> > > >
>>> > > > For at least those two reasons, removing those canary elements is being done in DPDK.
>>> > > >
>>> > > > This specific removal has been announced:
>>> > > > https://patchwork.dpdk.org/project/dpdk/patch/20230919140430.3251493-1-david.march...@redhat.com/
>>> > > Thanks for pointing this out but did you mean to link to the patch again here?
>>> >
>>> > Sorry, same here, bad copy/paste :-(.
>>> >
>>> > The intended link is: https://git.dpdk.org/dpdk/commit/?id=5da7c13521
>>> > The deprecation notice was badly formulated and this patch here is consistent with it.
>>> >
>>> > > >
>>> > > > Now, practically, when I look at the cpuflags API, I don't see us exposed to those two issues wrt rte_cpu_flag_t, so maybe this change is unneeded.
>>> > > > But on the other hand, is it really an issue for an application to lose this (internal) information?
>>> > > I doubt it, maybe it could be used as a sanity check for choosing proper functors in the application. Though the initial description of the reason behind this patch was to not break the ABI and I don't think it does that. What it does is force users to use explicit cpu flag values, which is a good thing. Though if so, then it should be stated in the commit description.
>>> >
>>> > I agree.
>>> > Siva, can you work on a new revision?
>>> >
>>> David, Stanislaw,
>>>
>>> The original motivation of this patch was to avoid ABI breakage with the introduction of the new CPU flag "RTE_CPUFLAG_MONITORX" (http://mails.dpdk.org/archives/test-report/2023-April/382489.html).
>>>
>>> Because of ABI breakage, the feature was postponed to this release.
>>>
>>> https://patchwork.dpdk.org/project/dpdk/patch/20230413115334.43172-3-sivaprasad.tumm...@amd.com/
>>>
>>> This test is flawed, the reason being that NUMFLAGS should not be treated as a flag value but as a canary, and this test does not take that into account.
>>>
>>
>> Hi Stanislaw,
>>
>> Why is the test flawed?
>>
>> The enum is in the public header, and so is the 'RTE_CPUFLAG_NUMFLAGS' enum item, and there are APIs using the enum, so the enum is exchanged between the shared library and the application.
> In a similar way lots of Linux uapi headers contain bits that should not be used directly, even though they are defined there. The reason for that is the C language syntax, not necessarily the intent of a developer.
> Since NUMFLAGS was a canary to make the flag handling code easier, it should not be treated as a "real" value and hence my suggestion of a flawed test. That said, NUMFLAGS does not bring enough value to not remove it. :)
>

Both: it doesn't have enough value to hang on to, and we don't have control over how it is used by the application once it is exposed by the library.

>>
>> A similar thing was discussed before: when an enum is exchanged between the application and a shared library, there is an ABI breakage risk when the enum is extended, and the general tendency is to eliminate the MAX value to reduce the risk.
> Agreed though as I have mentioned before, "MAX" has different semantics than "NUM". Then again since we have rte_cpu_feature_table, we can use RTE_DIM to check the user input.
>

I think their usage and the intention behind having them are the same. Can you please elaborate on the difference between MAX and NUM enum items that are added as the last item in an enum?

>>
>>
>> When an enum value is sent from the library to the application, it is clearer that this can cause an ABI breakage, because the application can receive a value that it is not aware of at build time, which can cause unexpected behavior.
>> Simply think about a case where the application allocated an array of 'RTE_CPUFLAG_NUMFLAGS' size and directly accesses the array index based on the returned enum item value; if the enum is extended in the new version of the shared library, this can cause invalid memory access in the application.
> Using the NUM enum element (which serves as a last item canary) to size an array is not a good idea unless it's returned from a runtime call. Otherwise one hits the issues that you've described.
>

I agree :), but that is a way to describe how it can be a problem.

Also, last time I argued something similar to what you said, that the application should check against the MAX value before using it, but I was told not to assume what the application does. My take from it is: as a library-side developer, expect the worst from the application.

>>
>> When an enum value is sent from the application to the library, I am not quite sure how problematic it is, to be honest. This is the case for the 'rte_cpu_get_flag_enabled()' & 'rte_cpu_get_flag_name()' APIs in question.
>> Only when the application sends 'RTE_CPUFLAG_NUMFLAGS' to 'rte_cpu_get_flag_name()' does it expect NULL to be returned, but this won't happen in the new version of the shared library; not sure if this can cause any problem for the application.
>> But as I mentioned, the general guidance is to eliminate this kind of MAX enum value usage.
>>
>>
>> And for this specific issue, although it is not clear whether the usage of the enum in the 'rte_cpu_get_flag_enabled()' & 'rte_cpu_get_flag_name()' APIs causes ABI breakage, the enum being embedded into the 'struct rte_bbdev_driver_info' struct doesn't leave a question, since this struct is returned from the library to the application and a change in the enum causes an ABI breakage.
> Enum size does not change irrespective of changing its values. So size-wise it's not an ABI breakage. Re-ordering values is an ABI breakage.
>

Agree it is not a size-wise issue. But it is still an issue.

>>
>>
>> Briefly, I think even appending to the end of 'enum rte_cpu_flag_t' causes ABI breakage, and removing 'RTE_CPUFLAG_NUMFLAGS' helps to extend this enum in the future.
>> And an outstanding deprecation notice already exists for this:
>> https://git.dpdk.org/dpdk/tree/doc/guides/rel_notes/deprecation.rst?h=v23.07#n63
>>
>>
>>> Your change did not break the ABI because you have properly added the new flag at the end.
>>> So I would ask to change the commit description to mention that NUMFLAGS is removed to:
>>> 1. Prevent users from treating it as a usable value or an array size.
>>> 2. Prevent false-positive failures in the ABI test.
>>>
>>> Also it would be good to link to the aforementioned ABI test failure to give readers some context when inspecting the git tree.
>>>
>>>
>>>
>>> Can you please add what exactly needs to be reworked in the new version.
>>>
>>> >
>>> > Thanks.
>>> >
>>> > --
>>> > David Marchand
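
To make the two cases discussed above concrete, here is a minimal standalone sketch. It is not DPDK code: every name in it (APP_FLAG_A, APP_NUMFLAGS, lib_flag_names, lib_flag_name(), lib_newest_flag(), app_count_flag()) is hypothetical and only models the situation where an application was built against an older header whose canary is 2, while the newer shared library has appended a third flag. The library side sizes its check from its own table, in the spirit of the RTE_DIM suggestion, and the application side shows the bounds check that a library cannot assume the application performs.

```c
#include <stddef.h>

/* Hypothetical view of the enum the application was compiled against.
 * APP_NUMFLAGS (the canary) is frozen at 2 in the application binary. */
enum { APP_FLAG_A, APP_FLAG_B, APP_NUMFLAGS };

/* Hypothetical newer library: one flag appended, so the value 2 that
 * used to be the canary now names a real flag. */
static const char *const lib_flag_names[] = { "A", "B", "C" };

#define LIB_DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Library side: validate input against the table size, not an enum canary. */
const char *lib_flag_name(unsigned int flag)
{
    if (flag >= LIB_DIM(lib_flag_names))
        return NULL;
    return lib_flag_names[flag];
}

/* Library side: returns a flag value the old application never saw. */
unsigned int lib_newest_flag(void)
{
    return (unsigned int)(LIB_DIM(lib_flag_names) - 1);
}

/* Application side: array sized with the stale canary. */
static int app_counters[APP_NUMFLAGS];

/* Without this range check, a flag value of 2 coming back from the newer
 * library would index past the end of app_counters. */
int app_count_flag(unsigned int flag)
{
    if (flag >= (unsigned int)APP_NUMFLAGS)
        return -1; /* unknown at application build time */
    return ++app_counters[flag];
}
```

With these definitions, lib_flag_name(APP_NUMFLAGS) no longer returns NULL, which is the application-to-library case, and app_count_flag(lib_newest_flag()) has to be rejected by the application itself, which is the library-to-application case where an unchecked array access would go out of bounds.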