Hi, Dumitru Yes, sure.
> On 12 Mar 2025, at 17:51, Dumitru Ceara <[email protected]> wrote: > > Hi Aleksandra, > > I see you posted [0] to add checksum checks for northd stages. > > We plan to bump OVN_INTERNAL_MINOR_VER as a side effect of a different > change anyway [1] - the v5 of that patch will likely be backported to > all stable branches. > > I'm considering moving "[PATCH ovn] northd: Fixed stage name mapping in > lflow table." [2] to "Rejected" in patchwork as [0] will cover these > cases in the future. > > What do you think? > > Thanks, > Dumitru > > [0] > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > [1] > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/#3477523 > [2] > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > > On 3/11/25 5:28 PM, Ilya Maximets wrote: >> On 3/11/25 16:58, Aleksandra Rukomoinikova wrote: >>> *Good evening,* >>> >>> You are absolutely correct about the Bump version being helpful. I observed >>> this >>> while working on the intermediate update. >>> >>> Perhaps we should consider this and calculate the checksum similarly to the >>> base, >>> but specifically for the order of stages in northd? >> >> I'd suggest we add a new PHONY target in the Makefile.am for this. See the >> check-ifconfig target for example. It will run for every build. >> And we can do a checksum comparison trick similarly as we do for database >> schema checksums. For example, we could add something like: >> "The current pipeline checksum is '1317960254 5844'" to the comment above >> the definition of OVN_INTERNAL_MINOR_VER. And our new build target would >> do something like "grep '^\s*PIPELINE_STAGE(' northd/northd.h | cksum", >> then grep out the checksum value from the comment and compare them, failing >> the build if they do not match. >> >> This will ensure that every time someone touches the pipeline definitions, >> they will have to update the checksum above OVN_INTERNAL_MINOR_VER definition >> and so they will consider updating the version itself. >> >> What do you think? >> >> Best regards, Ilya Maximets. >> >>> >>> I must admit, the manual Bump version process is a bit confusing in my >>> opinion. >>> >>>> On 11 Mar 2025, at 14:03, Dumitru Ceara <[email protected]> wrote: >>>> >>>> On 3/11/25 11:40 AM, Ilya Maximets wrote: >>>>> On 3/11/25 11:11, Dumitru Ceara wrote: >>>>>> Hi Aleksandra, Ilya, >>>>>> >>>>>> On 3/11/25 12:01 AM, Ilya Maximets wrote: >>>>>>> On 3/7/25 16:51, Aleksandra Rukomoinikova wrote: >>>>>>>> Hi Ilya! >>>>>>>> >>>>>>>> I’m not entirely clear on your point about the impact of this change >>>>>>>> on performance. >>>>>>>> Formally, without my changes, two strings are already added to the >>>>>>>> hash: action and clone, >>>>>>>> which are usually much longer than just the stage name. >>>>>>>> I don’t think the recalculation of flows based on hashes is >>>>>>>> particularly significant. >>>>>>>> Could you explain the impact? Is this somehow related to datapath >>>>>>>> groups? >>>>>>> >>>>>>> Each lflow is getting hashed and compared against existing ones right >>>>>>> after >>>>>>> creation. At least in the past, that was a fairly CPU intensive >>>>>>> operation. >>>>>>> Especially because in some cases northd generated a large amount of >>>>>>> duplicated >>>>>>> logical flows, so the actual number of comparisons was significantly >>>>>>> larger >>>>>>> than the total number of logical flows that ended up in the southbound >>>>>>> DB. >>>>>>> >>>>>>> So, even a smallest of changes in a way lflows are hashed and compared >>>>>>> would >>>>>>> result in very significant changes in the overall performance of >>>>>>> ovn-northd. >>>>>>> >>>>>>> However, over time we optimized different aspects of lflow generation >>>>>>> including >>>>>>> optimized lflow generation for datapath groups in places where northd >>>>>>> knows >>>>>>> that all the flows will indeed be grouped. And there are not so many >>>>>>> cases >>>>>>> now where duplicates can be generated, so it seems to be less of a >>>>>>> concern in >>>>>>> the latest versions of OVN. >>>>>>> >>>>>>>> >>>>>>>> I also thought about an alternative approach where we would know the >>>>>>>> hashes for each >>>>>>>> stage name in advance and simply add this hash instead of computing it >>>>>>>> from the string, >>>>>>>> since it’s a stateful value. However, ovn_stage_to_str doesn’t access >>>>>>>> the database but >>>>>>>> retrieves the string from memory, and this didn’t seem like a >>>>>>>> significant optimization to me. >>>>>>>> >>>>>>>> I conducted some research on the impact of this patch on the >>>>>>>> performance of upstream version. >>>>>>>> In a setup with 6000 logical switches and 15,000 ports, I collected >>>>>>>> some metrics regarding performance. >>>>>>>> >>>>>>>> The average utilization metrics for hash maps during recomputation >>>>>>>> were: >>>>>>>> • Without the patch: >>>>>>>> hmap_expand: 5928.750/sec (average), 98.8125/sec (overall average) >>>>>>>> >>>>>>>> • With the patch: >>>>>>>> hmap_expand: 5921.117/sec (average), 98.6853/sec (overall average) >>>>>>>> >>>>>>> >>>>>>> This one is not a very useful metric. At least for this measurement. >>>>>>> >>>>>>>> Statistics for lflows_to_sb: >>>>>>>> • With the patch: 154.000000 msec >>>>>>>> • Without the patch: 151.000000 msec >>>>>>>> >>>>>>>> These results seem to fall within the range of normal deviation. >>>>>>> >>>>>>> It seems, you're right. I did a few tests of my own and I also see an >>>>>>> insignificant performance difference in most cases. It is a little >>>>>>> surprising >>>>>>> to not see much difference, but maybe it shouldn't bee. All the >>>>>>> changes we made >>>>>>> in the last few years indeed decreased the importance of the lflow >>>>>>> hashing and >>>>>>> comparisons. >>>>>>> >>>>>>> I would like to run a larger set of ovn-heater tests though before >>>>>>> concluding that >>>>>>> there is actually no real difference. However, ... >>>>>>> >>>>>>> It seems like an OVN_INTERNAL_MINOR_VER wasn't increased for a long >>>>>>> time now and >>>>>>> it supposed to be increased every time logical pipeline stages are >>>>>>> added/modified. >>>>>>> That might be the root of the issue you're trying to solve. >>>>>>> >>>>>>> When the internal version changes, that should trigger updates for all >>>>>>> the stage >>>>>>> names and hints and the source locators for all the logical flows, see >>>>>>> the >>>>>>> 'if (ovn_internal_version_changed)' branch in the sync_lflow_to_sb(). >>>>>>> >>>>>>> I think, we just need to bump that number and maybe add some build >>>>>>> assertions, >>>>>>> so we do not forget about it again... >>>>>>> >>>>>> >>>>>> That's what the comment above the OVN_INTERNAL_MINOR_VER definition >>>>>> says, yes: >>>>>> >>>>>> /* Increment this for any logical flow changes, if an existing OVN >>>>>> action is >>>>>> * modified or a stage is added to a logical pipeline. >>>>>> * >>>>>> * This value is also used to handle some backward compatibility during >>>>>> * upgrading. It should never decrease or rewind. */ >>>>>> >>>>>> So we could try to add a compile time check to make sure we didn't >>>>>> forget to bump this. However, ... :) >>>>>> >>>>>>> What do you think? Dumitru, what's your take on this? >>>>>>> >>>>>> >>>>>> The definition of ovn_get_internal_version() is: >>>>>> >>>>>> /* Returns the OVN version. The caller must free the returned value. */ >>>>>> char * >>>>>> ovn_get_internal_version(void) >>>>>> { >>>>>> return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION, >>>>>> sbrec_get_db_version(), >>>>>> N_OVNACTS, OVN_INTERNAL_MINOR_VER); >>>>>> } >>>>>> >>>>>> I'd expect production upgrades move to a new version so >>>>>> OVN_PACKAGE_VERSION >>>>>> should change, or am I missing something? >>>>> >>>>> AFAICT, we backport pipeline stage changes from time to time, and the >>>>> internal >>>>> version will not change in this case, if the downstream consumer of those >>>>> changes (distribution packages) do not have custom patches to bump the >>>>> internal >>>>> minor version. Same will be true for someone who is following a bleeding >>>>> edge >>>>> main branch for one reason or another. >>>>> >>>>> So, it's true that this issue should not appear while updating between >>>>> official >>>>> OVN upstream releases (minor or major), but it may be seen mid-way, and >>>>> so it >>>>> can be seen in distributions. >>>>> >>>> >>>> That's a good point. >>>> >>>>> That's also assuming that 'ovn_internal_version_changed' logic actually >>>>> works >>>>> as expected (I didn't test). >>>>> >>>> >>>> It seems to work.. it's a bit awkward because it checks if the version >>>> in NB.NB_Global.options:northd_internal_version changed. I would've >>>> expected it to check the SB.SB_Global table. However, that's not wrong >>>> as ovn-northd (old version too) syncs the value from SB options to NB >>>> options. >>>> >>>> Aleksandra, I guess, we need to understand in which cases you hit this >>>> problem? Is it after an upgrade? Were you upgrading between >>>> minor/major upstream OVN versions? Or upgrading mid-way like Ilya >>>> mentioned above. >>>> >>>> If it's the former we must have a hidden bug somewhere. >>>> >>>> If it's the latter maybe it's enough to just bump OVN_INTERNAL_MINOR_VER >>>> as Ilya suggested and add a compile time check. >>>> >>>> What do you think? >>>> >>>>>> >>>>>>> Best regards, Ilya Maximets. >>>>>>> >>>>>> >>>>>> Thanks, >>>>>> Dumitru >>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On 6 Mar 2025, at 16:50, Ilya Maximets <[email protected]> wrote: >>>>>>>>> >>>>>>>>> On 3/6/25 09:41, Alexandra Rukomoinikova wrote: >>>>>>>>>> When changing the order of stages in the LogicalFlow >>>>>>>>>> table in SB, some rules will be added with an incorrect >>>>>>>>>> stage-name in the external-ids of existing datapath groups. >>>>>>>>>> When adding a stage, new lflows in a new table with the same >>>>>>>>>> match and action that already existed with the same table-id >>>>>>>>>> will remain in the database with unupdated stage names. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Alexandra Rukomoinikova <[email protected]> >>>>>>>>>> --- >>>>>>>>>> lib/ovn-util.c | 5 ++++- >>>>>>>>>> lib/ovn-util.h | 2 +- >>>>>>>>>> northd/lflow-mgr.c | 4 +++- >>>>>>>>>> 3 files changed, 8 insertions(+), 3 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c >>>>>>>>>> index c65b36bb5..d1572d5ec 100644 >>>>>>>>>> --- a/lib/ovn-util.c >>>>>>>>>> +++ b/lib/ovn-util.c >>>>>>>>>> @@ -646,16 +646,19 @@ sbrec_logical_flow_hash(const struct >>>>>>>>>> sbrec_logical_flow *lf) >>>>>>>>>> { >>>>>>>>>> return ovn_logical_flow_hash(lf->table_id, >>>>>>>>>> >>>>>>>>>> ovn_pipeline_from_name(lf->pipeline), >>>>>>>>>> + smap_get_def(&lf->external_ids, >>>>>>>>>> + "stage-name", ""), >>>>>>>>>> lf->priority, lf->match, >>>>>>>>>> lf->actions); >>>>>>>>>> } >>>>>>>>> >>>>>>>>> Hi, Alexandra. Thanks for the patch! >>>>>>>>> >>>>>>>>> Though, IIRC, it was an intentional decision to not include the stage >>>>>>>>> name in the hash/comparison to avoid a significant performance impact >>>>>>>>> that this hashmap lookup and the string hashing bring. >>>>>>>>> >>>>>>>>> The issue only affects debugging capabilities, but the performance >>>>>>>>> impact of hashing the name on each database update and comparing extra >>>>>>>>> strings on lflow generation as well as re-writing most of the flows >>>>>>>>> on stage updates is significant and affects every operation in northd. >>>>>>>>> >>>>>>>>> Did you run some scale tests with this change? What's the impact in >>>>>>>>> current OVN? >>>>>>>>> >>>>>>>>> Best regards, Ilya Maximets. >>> >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
