On 3/11/25 11:11, Dumitru Ceara wrote:
> Hi Aleksandra, Ilya,
>
> On 3/11/25 12:01 AM, Ilya Maximets wrote:
>> On 3/7/25 16:51, Aleksandra Rukomoinikova wrote:
>>> Hi Ilya!
>>>
>>> I’m not entirely clear on your point about the impact of this change on
>>> performance.
>>> Formally, without my changes, two strings are already added to the hash:
>>> action and clone,
>>> which are usually much longer than just the stage name.
>>> I don’t think the recalculation of flows based on hashes is particularly
>>> significant.
>>> Could you explain the impact? Is this somehow related to datapath groups?
>>
>> Each lflow is getting hashed and compared against existing ones right after
>> creation. At least in the past, that was a fairly CPU intensive operation.
>> Especially because in some cases northd generated a large amount of
>> duplicated
>> logical flows, so the actual number of comparisons was significantly larger
>> than the total number of logical flows that ended up in the southbound DB.
>>
>> So, even a smallest of changes in a way lflows are hashed and compared would
>> result in very significant changes in the overall performance of ovn-northd.
>>
>> However, over time we optimized different aspects of lflow generation
>> including
>> optimized lflow generation for datapath groups in places where northd knows
>> that all the flows will indeed be grouped. And there are not so many cases
>> now where duplicates can be generated, so it seems to be less of a concern in
>> the latest versions of OVN.
>>
>>>
>>> I also thought about an alternative approach where we would know the hashes
>>> for each
>>> stage name in advance and simply add this hash instead of computing it from
>>> the string,
>>> since it’s a stateful value. However, ovn_stage_to_str doesn’t access the
>>> database but
>>> retrieves the string from memory, and this didn’t seem like a significant
>>> optimization to me.
>>>
>>> I conducted some research on the impact of this patch on the performance of
>>> upstream version.
>>> In a setup with 6000 logical switches and 15,000 ports, I collected some
>>> metrics regarding performance.
>>>
>>> The average utilization metrics for hash maps during recomputation were:
>>> • Without the patch:
>>> hmap_expand: 5928.750/sec (average), 98.8125/sec (overall average)
>>>
>>> • With the patch:
>>> hmap_expand: 5921.117/sec (average), 98.6853/sec (overall average)
>>>
>>
>> This one is not a very useful metric. At least for this measurement.
>>
>>> Statistics for lflows_to_sb:
>>> • With the patch: 154.000000 msec
>>> • Without the patch: 151.000000 msec
>>>
>>> These results seem to fall within the range of normal deviation.
>>
>> It seems, you're right. I did a few tests of my own and I also see an
>> insignificant performance difference in most cases. It is a little
>> surprising
>> to not see much difference, but maybe it shouldn't bee. All the changes we
>> made
>> in the last few years indeed decreased the importance of the lflow hashing
>> and
>> comparisons.
>>
>> I would like to run a larger set of ovn-heater tests though before
>> concluding that
>> there is actually no real difference. However, ...
>>
>> It seems like an OVN_INTERNAL_MINOR_VER wasn't increased for a long time now
>> and
>> it supposed to be increased every time logical pipeline stages are
>> added/modified.
>> That might be the root of the issue you're trying to solve.
>>
>> When the internal version changes, that should trigger updates for all the
>> stage
>> names and hints and the source locators for all the logical flows, see the
>> 'if (ovn_internal_version_changed)' branch in the sync_lflow_to_sb().
>>
>> I think, we just need to bump that number and maybe add some build
>> assertions,
>> so we do not forget about it again...
>>
>
> That's what the comment above the OVN_INTERNAL_MINOR_VER definition
> says, yes:
>
> /* Increment this for any logical flow changes, if an existing OVN action is
> * modified or a stage is added to a logical pipeline.
> *
> * This value is also used to handle some backward compatibility during
> * upgrading. It should never decrease or rewind. */
>
> So we could try to add a compile time check to make sure we didn't
> forget to bump this. However, ... :)
>
>> What do you think? Dumitru, what's your take on this?
>>
>
> The definition of ovn_get_internal_version() is:
>
> /* Returns the OVN version. The caller must free the returned value. */
> char *
> ovn_get_internal_version(void)
> {
> return xasprintf("%s-%s-%d.%d", OVN_PACKAGE_VERSION,
> sbrec_get_db_version(),
> N_OVNACTS, OVN_INTERNAL_MINOR_VER);
> }
>
> I'd expect production upgrades move to a new version so OVN_PACKAGE_VERSION
> should change, or am I missing something?
AFAICT, we backport pipeline stage changes from time to time, and the internal
version will not change in this case, if the downstream consumer of those
changes (distribution packages) do not have custom patches to bump the internal
minor version. Same will be true for someone who is following a bleeding edge
main branch for one reason or another.
So, it's true that this issue should not appear while updating between official
OVN upstream releases (minor or major), but it may be seen mid-way, and so it
can be seen in distributions.
That's also assuming that 'ovn_internal_version_changed' logic actually works
as expected (I didn't test).
>
>> Best regards, Ilya Maximets.
>>
>
> Thanks,
> Dumitru
>
>>>
>>>
>>>> On 6 Mar 2025, at 16:50, Ilya Maximets <[email protected]> wrote:
>>>>
>>>> On 3/6/25 09:41, Alexandra Rukomoinikova wrote:
>>>>> When changing the order of stages in the LogicalFlow
>>>>> table in SB, some rules will be added with an incorrect
>>>>> stage-name in the external-ids of existing datapath groups.
>>>>> When adding a stage, new lflows in a new table with the same
>>>>> match and action that already existed with the same table-id
>>>>> will remain in the database with unupdated stage names.
>>>>>
>>>>> Signed-off-by: Alexandra Rukomoinikova <[email protected]>
>>>>> ---
>>>>> lib/ovn-util.c | 5 ++++-
>>>>> lib/ovn-util.h | 2 +-
>>>>> northd/lflow-mgr.c | 4 +++-
>>>>> 3 files changed, 8 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>>>>> index c65b36bb5..d1572d5ec 100644
>>>>> --- a/lib/ovn-util.c
>>>>> +++ b/lib/ovn-util.c
>>>>> @@ -646,16 +646,19 @@ sbrec_logical_flow_hash(const struct
>>>>> sbrec_logical_flow *lf)
>>>>> {
>>>>> return ovn_logical_flow_hash(lf->table_id,
>>>>> ovn_pipeline_from_name(lf->pipeline),
>>>>> + smap_get_def(&lf->external_ids,
>>>>> + "stage-name", ""),
>>>>> lf->priority, lf->match,
>>>>> lf->actions);
>>>>> }
>>>>
>>>> Hi, Alexandra. Thanks for the patch!
>>>>
>>>> Though, IIRC, it was an intentional decision to not include the stage
>>>> name in the hash/comparison to avoid a significant performance impact
>>>> that this hashmap lookup and the string hashing bring.
>>>>
>>>> The issue only affects debugging capabilities, but the performance
>>>> impact of hashing the name on each database update and comparing extra
>>>> strings on lflow generation as well as re-writing most of the flows
>>>> on stage updates is significant and affects every operation in northd.
>>>>
>>>> Did you run some scale tests with this change? What's the impact in
>>>> current OVN?
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev