Hi, Dumitru

Yes, sure.

> On 12 Mar 2025, at 17:51, Dumitru Ceara <[email protected]> wrote:
> 
> Hi Aleksandra,
> 
> I see you posted [0] to add checksum checks for northd stages.
> 
> We plan to bump OVN_INTERNAL_MINOR_VER as a side effect of a different
> change anyway [1] - the v5 of that patch will likely be backported to
> all stable branches.
> 
> I'm considering moving "[PATCH ovn] northd: Fixed stage name mapping in
> lflow table." [2] to "Rejected" in patchwork as [0] will cover these
> cases in the future.
> 
> What do you think?
> 
> Thanks,
> Dumitru
> 
> [0]
> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
> [1]
> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/#3477523
> [2]
> https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/
> 
> On 3/11/25 5:28 PM, Ilya Maximets wrote:
>> On 3/11/25 16:58, Aleksandra Rukomoinikova wrote:
>>> *Good evening,*
>>> 
>>> You are absolutely correct about the Bump version being helpful. I observed 
>>> this
>>> while working on the intermediate update.
>>> 
>>> Perhaps we should consider this and calculate the checksum similarly to the 
>>> base,
>>> but specifically for the order of stages in northd?
>> 
>> I'd suggest we add a new PHONY target in the Makefile.am for this.  See the
>> check-ifconfig target for example.  It will run for every build.
>> And we can do a checksum comparison trick similarly as we do for database
>> schema checksums.  For example, we could add something like:
>> "The current pipeline checksum is '1317960254 5844'" to the comment above
>> the definition of OVN_INTERNAL_MINOR_VER.  And our new build target would
>> do something like "grep '^\s*PIPELINE_STAGE(' northd/northd.h | cksum",
>> then grep out the checksum value from the comment and compare them, failing
>> the build if they do not match.
>> 
>> This will ensure that every time someone touches the pipeline definitions,
>> they will have to update the checksum above OVN_INTERNAL_MINOR_VER definition
>> and so they will consider updating the version itself.
>> 
>> What do you think?
>> 
>> Best regards, Ilya Maximets.
>> 
>>> 
>>> I must admit, the manual Bump version process is a bit confusing in my 
>>> opinion.
>>> 
>>>> On 11 Mar 2025, at 14:03, Dumitru Ceara <[email protected]> wrote:
>>>> 
>>>> On 3/11/25 11:40 AM, Ilya Maximets wrote:
>>>>> On 3/11/25 11:11, Dumitru Ceara wrote:
>>>>>> Hi Aleksandra, Ilya,
>>>>>> 
>>>>>> On 3/11/25 12:01 AM, Ilya Maximets wrote:
>>>>>>> On 3/7/25 16:51, Aleksandra Rukomoinikova wrote:
>>>>>>>> Hi Ilya!
>>>>>>>> 
>>>>>>>> I’m not entirely clear on your point about the impact of this change 
>>>>>>>> on performance.
>>>>>>>> Formally, without my changes, two strings are already added to the 
>>>>>>>> hash: action and clone,
>>>>>>>> which are usually much longer than just the stage name.
>>>>>>>> I don’t think the recalculation of flows based on hashes is 
>>>>>>>> particularly significant. 
>>>>>>>> Could you explain the impact? Is this somehow related to datapath 
>>>>>>>> groups?
>>>>>>> 
>>>>>>> Each lflow is getting hashed and compared against existing ones right 
>>>>>>> after
>>>>>>> creation.  At least in the past, that was a fairly CPU intensive 
>>>>>>> operation.
>>>>>>> Especially because in some cases northd generated a large amount of 
>>>>>>> duplicated
>>>>>>> logical flows, so the actual number of comparisons was significantly 
>>>>>>> larger
>>>>>>> than the total number of logical flows that ended up in the southbound 
>>>>>>> DB.
>>>>>>> 
>>>>>>> So, even a smallest of changes in a way lflows are hashed and compared 
>>>>>>> would
>>>>>>> result in very significant changes in the overall performance of 
>>>>>>> ovn-northd.
>>>>>>> 
>>>>>>> However, over time we optimized different aspects of lflow generation 
>>>>>>> including
>>>>>>> optimized lflow generation for datapath groups in places where northd 
>>>>>>> knows
>>>>>>> that all the flows will indeed be grouped.  And there are not so many 
>>>>>>> cases
>>>>>>> now where duplicates can be generated, so it seems to be less of a 
>>>>>>> concern in
>>>>>>> the latest versions of OVN.
>>>>>>> 
>>>>>>>> 
>>>>>>>> I also thought about an alternative approach where we would know the 
>>>>>>>> hashes for each
>>>>>>>> stage name in advance and simply add this hash instead of computing it 
>>>>>>>> from the string,
>>>>>>>> since it’s a stateful value. However, ovn_stage_to_str doesn’t access 
>>>>>>>> the database but
>>>>>>>> retrieves the string from memory, and this didn’t seem like a 
>>>>>>>> significant optimization to me.
>>>>>>>> 
>>>>>>>> I conducted some research on the impact of this patch on the 
>>>>>>>> performance of upstream version.
>>>>>>>> In a setup with 6000 logical switches and 15,000 ports, I collected 
>>>>>>>> some metrics regarding performance.
>>>>>>>> 
>>>>>>>> The average utilization metrics for hash maps during recomputation 
>>>>>>>> were:
>>>>>>>>  • Without the patch:
>>>>>>>> hmap_expand: 5928.750/sec (average), 98.8125/sec (overall average)
>>>>>>>> 
>>>>>>>>  • With the patch:
>>>>>>>> hmap_expand: 5921.117/sec (average), 98.6853/sec (overall average)
>>>>>>>> 
>>>>>>> 
>>>>>>> This one is not a very useful metric.  At least for this measurement.
>>>>>>> 
>>>>>>>> Statistics for lflows_to_sb:
>>>>>>>>  • With the patch: 154.000000 msec
>>>>>>>>  • Without the patch: 151.000000 msec
>>>>>>>> 
>>>>>>>> These results seem to fall within the range of normal deviation.
>>>>>>> 
>>>>>>> It seems, you're right.  I did a few tests of my own and I also see an
>>>>>>> insignificant performance difference in most cases.  It is a little 
>>>>>>> surprising
>>>>>>> to not see much difference, but maybe it shouldn't bee.  All the 
>>>>>>> changes we made
>>>>>>> in the last few years indeed decreased the importance of the lflow 
>>>>>>> hashing and
>>>>>>> comparisons.
>>>>>>> 
>>>>>>> I would like to run a larger set of ovn-heater tests though before 
>>>>>>> concluding that
>>>>>>> there is actually no real difference.  However, ...
>>>>>>> 
>>>>>>> It seems like an OVN_INTERNAL_MINOR_VER wasn't increased for a long 
>>>>>>> time now and
>>>>>>> it supposed to be increased every time logical pipeline stages are 
>>>>>>> added/modified.
>>>>>>> That might be the root of the issue you're trying to solve.
>>>>>>> 
>>>>>>> When the internal version changes, that should trigger updates for all 
>>>>>>> the stage
>>>>>>> names and hints and the source locators for all the logical flows, see 
>>>>>>> the
>>>>>>> 'if (ovn_internal_version_changed)' branch in the sync_lflow_to_sb().
>>>>>>> 
>>>>>>> I think, we just need to bump that number and maybe add some build 
>>>>>>> assertions,
>>>>>>> so we do not forget about it again...
>>>>>>> 
>>>>>> 
>>>>>> That's what the comment above the OVN_INTERNAL_MINOR_VER definition
>>>>>> says, yes:
>>>>>> 
>>>>>> /* Increment this for any logical flow changes, if an existing OVN 
>>>>>> action is
>>>>>> * modified or a stage is added to a logical pipeline.
>>>>>> *
>>>>>> * This value is also used to handle some backward compatibility during
>>>>>> * upgrading. It should never decrease or rewind. */
>>>>>> 
>>>>>> So we could try to add a compile time check to make sure we didn't
>>>>>> forget to bump this.  However, ... :)
>>>>>> 
>>>>>>> What do you think?  Dumitru, what's your take on this?
>>>>>>> 
>>>>>> 
>>>>>> The definition of ovn_get_internal_version() is:
>>>>>> 
>>>>>> /* Returns the OVN version. The caller must free the returned value. */
>>>>>> char *
>>>>>> ovn_get_internal_version(void)
>>>>>> {
>>>>>>    return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
>>>>>>                     sbrec_get_db_version(),
>>>>>>                     N_OVNACTS, OVN_INTERNAL_MINOR_VER);
>>>>>> }
>>>>>> 
>>>>>> I'd expect production upgrades move to a new version so 
>>>>>> OVN_PACKAGE_VERSION
>>>>>> should change, or am I missing something?
>>>>> 
>>>>> AFAICT, we backport pipeline stage changes from time to time, and the 
>>>>> internal
>>>>> version will not change in this case, if the downstream consumer of those
>>>>> changes (distribution packages) do not have custom patches to bump the 
>>>>> internal
>>>>> minor version.  Same will be true for someone who is following a bleeding 
>>>>> edge
>>>>> main branch for one reason or another.
>>>>> 
>>>>> So, it's true that this issue should not appear while updating between 
>>>>> official
>>>>> OVN upstream releases (minor or major), but it may be seen mid-way, and 
>>>>> so it
>>>>> can be seen in distributions.
>>>>> 
>>>> 
>>>> That's a good point.
>>>> 
>>>>> That's also assuming that 'ovn_internal_version_changed' logic actually 
>>>>> works
>>>>> as expected (I didn't test).
>>>>> 
>>>> 
>>>> It seems to work.. it's a bit awkward because it checks if the version
>>>> in NB.NB_Global.options:northd_internal_version changed.  I would've
>>>> expected it to check the SB.SB_Global table.  However, that's not wrong
>>>> as ovn-northd (old version too) syncs the value from SB options to NB
>>>> options.
>>>> 
>>>> Aleksandra, I guess, we need to understand in which cases you hit this
>>>> problem?  Is it after an upgrade?  Were you upgrading between
>>>> minor/major upstream OVN versions?  Or upgrading mid-way like Ilya
>>>> mentioned above.
>>>> 
>>>> If it's the former we must have a hidden bug somewhere.
>>>> 
>>>> If it's the latter maybe it's enough to just bump OVN_INTERNAL_MINOR_VER
>>>> as Ilya suggested and add a compile time check.
>>>> 
>>>> What do you think?
>>>> 
>>>>>> 
>>>>>>> Best regards, Ilya Maximets.
>>>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Dumitru
>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 6 Mar 2025, at 16:50, Ilya Maximets <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> On 3/6/25 09:41, Alexandra Rukomoinikova wrote:
>>>>>>>>>> When changing the order of stages in the LogicalFlow
>>>>>>>>>> table in SB, some rules will be added with an incorrect
>>>>>>>>>> stage-name in the external-ids of existing datapath groups.
>>>>>>>>>> When adding a stage, new lflows in a new table with the same
>>>>>>>>>> match and action that already existed with the same table-id
>>>>>>>>>> will remain in the database with unupdated stage names.
>>>>>>>>>> 
>>>>>>>>>> Signed-off-by: Alexandra Rukomoinikova <[email protected]>
>>>>>>>>>> ---
>>>>>>>>>> lib/ovn-util.c     | 5 ++++-
>>>>>>>>>> lib/ovn-util.h     | 2 +-
>>>>>>>>>> northd/lflow-mgr.c | 4 +++-
>>>>>>>>>> 3 files changed, 8 insertions(+), 3 deletions(-)
>>>>>>>>>> 
>>>>>>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>>>>>>>> index c65b36bb5..d1572d5ec 100644
>>>>>>>>>> --- a/lib/ovn-util.c
>>>>>>>>>> +++ b/lib/ovn-util.c
>>>>>>>>>> @@ -646,16 +646,19 @@ sbrec_logical_flow_hash(const struct 
>>>>>>>>>> sbrec_logical_flow *lf)
>>>>>>>>>> {
>>>>>>>>>>     return ovn_logical_flow_hash(lf->table_id,
>>>>>>>>>>                                  
>>>>>>>>>> ovn_pipeline_from_name(lf->pipeline),
>>>>>>>>>> +                                 smap_get_def(&lf->external_ids,
>>>>>>>>>> +                                              "stage-name", ""),
>>>>>>>>>>                                  lf->priority, lf->match,
>>>>>>>>>>                                  lf->actions);
>>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> Hi, Alexandra.  Thanks for the patch!
>>>>>>>>> 
>>>>>>>>> Though, IIRC, it was an intentional decision to not include the stage
>>>>>>>>> name in the hash/comparison to avoid a significant performance impact
>>>>>>>>> that this hashmap lookup and the string hashing bring.
>>>>>>>>> 
>>>>>>>>> The issue only affects debugging capabilities, but the performance
>>>>>>>>> impact of hashing the name on each database update and comparing extra
>>>>>>>>> strings on lflow generation as well as re-writing most of the flows
>>>>>>>>> on stage updates is significant and affects every operation in northd.
>>>>>>>>> 
>>>>>>>>> Did you run some scale tests with this change?  What's the impact in
>>>>>>>>> current OVN?
>>>>>>>>> 
>>>>>>>>> Best regards, Ilya Maximets.
>>> 
>> 
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to