On 8/29/25 4:02 PM, Dumitru Ceara wrote: > On 8/29/25 12:22 PM, Ilya Maximets wrote: >> On 8/29/25 9:27 AM, Dumitru Ceara wrote: >>> On 8/28/25 11:55 PM, Ilya Maximets wrote: >>>> On 8/28/25 11:41 PM, Ilya Maximets wrote: >>>>> On 8/13/25 7:20 PM, Mark Michelson via dev wrote: >>>>>> Prior to this commit, if you wanted to find the corresponding northbound >>>>>> UUID for a southbound Logical Datapath, you could find it in one of two >>>>>> places: >>>>>> >>>>>> For logical switches, it was in external-ids:logical-switch. >>>>>> For logical routers, it was in external-ids:logical-router. >>>>>> >>>>>> With this commit, we are separating the type and UUID into separate >>>>>> fields. This way, no matter the type of the datapath, you can find the >>>>>> UUID. And you can find the type of the datapath without having to >>>>>> potentially check multiple external-ids keys to do so. >>>>>> >>>>>> These fields are going to be used pretty heavily by northd in upcoming >>>>>> patches, so instead of making them external-ids, these are now >>>>>> full-fledged columns on the southbound Datapath_Binding. The UUID of the >>>>>> northbound logical datapath is in a column called "nb_uuid" and the type >>>>>> of the logical datapath is stored in an enumerated column called "type". >>>>>> >>>>>> Note that there is a seemingly-unrelated change in the check packet >>>>>> length test in tests/ovn.at. This test started failing because its use >>>>>> of `grep` was picking up the new "nb_uuid" column accidentally. The >>>>>> change to the test uses `fetch_column` since it is more precise. >>>>>> >>>>>> Signed-off-by: Mark Michelson <[email protected]> >>>>>> Acked-by: Ales Musil <[email protected]> >>>>>> --- >>>>>> V17: >>>>>> - cherry picked from v15 >>>>>> --- >>>>>> controller/local_data.c | 2 +- >>>>>> ic/ovn-ic.c | 5 +++-- >>>>>> lib/ovn-util.c | 45 +++++++++++++++++++++++++++++++++++++++++ >>>>>> lib/ovn-util.h | 8 ++++++++ >>>>>> northd/northd.c | 38 +++++++++++++++++----------------- >>>>>> northd/northd.h | 5 +++-- >>>>>> ovn-sb.ovsschema | 11 ++++++++-- >>>>>> ovn-sb.xml | 21 ++++++++++--------- >>>>>> tests/ovn-controller.at | 4 ++++ >>>>>> tests/ovn-northd.at | 6 +++--- >>>>>> tests/ovn.at | 6 ++---- >>>>>> utilities/ovn-sbctl.c | 4 ++-- >>>>>> utilities/ovn-trace.c | 3 +-- >>>>>> 13 files changed, 112 insertions(+), 46 deletions(-) >>>>> >>>>> <snip> >>>>> >>>>>> /* Returns an "enum ovn_stage" built from the arguments. >>>>>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema >>>>>> index 4c24f5b51..8bc7bbba3 100644 >>>>>> --- a/ovn-sb.ovsschema >>>>>> +++ b/ovn-sb.ovsschema >>>>>> @@ -1,7 +1,7 @@ >>>>>> { >>>>>> "name": "OVN_Southbound", >>>>>> - "version": "21.2.0", >>>>>> - "cksum": "29145795 34859", >>>>>> + "version": "21.3.0", >>>>>> + "cksum": "3179330761 35286", >>>>>> "tables": { >>>>>> "SB_Global": { >>>>>> "columns": { >>>>>> @@ -196,6 +196,13 @@ >>>>>> "load_balancers": {"type": {"key": {"type": "uuid"}, >>>>>> "min": 0, >>>>>> "max": "unlimited"}}, >>>>>> + "type": {"type": {"key": {"type": "string", >>>>>> + "enum": ["set", >>>>>> + ["logical-switch", >>>>>> + >>>>>> "logical-router"]]}}}, >>>>> >>>>> Hi, Mark, Dumitru, Ales. >>>>> >>> >>> Hi Ilya, >>> >>>>> This change breaks upgrades, at least in active-backup database >>>>> configuration. >>>>> >>>>> When the old database is converted to the new schema, the default value >>>>> for >>>>> the column is an empty string. But this is against the schema definition. >>>>> This is not a problem for the database itself, as it doesn't check these >>>>> constraints, but if we have the backup database connected, it will check >>>>> the >>>>> constraints upon receiving the new data and will fail: >>>>> >>>>> |replication|INFO|OVN_Southbound: Monitor reply received. Resetting the >>>>> database. >>>>> |00045|ovsdb_error|ERR|unexpected ovsdb error: constraint violation: "" >>>>> is not >>>>> one of the allowed values ([logical-router, >>>>> logical-switch]) >>>>> >>>>> At this point the backup server will disconnect and turn into active >>>>> server >>>>> as ti thinks that the origin is broken. We probably need to allow this >>>>> column to not have a value specified. >>>>> >>>>> It's fairly easy to reporoduce - get an older database and start a sandbox >>>>> with it. The sb2 will error out and disconnect. >>>>> >>> >>> Nice catch! I managed to reproduce the problem as following your >>> suggestion locally: I started a 25.03 sandbox, ran ./ovn-setup.sh, saved >>> the NB/SB and then started a sandbox on main with the saved DBs. >>> >>>>> Also, northd and ovn-controller should be able to handle this case where >>>>> the column is not yet set after upgrade. I didn't check if that works or >>>>> not. >>> >>> Yes, that part is handled correctly, see datapath_get_nb_uuid_and_type(). >>> >>> Are you planning to post a patch to fix this? I assume we just need to >>> make the column value optional in the SB schema. >> >> I'll probably not have time for this today, so feel free to work on it. >> Otherwise, I can find some time early next week. >> > > No worries, I posted a patch here: > https://patchwork.ozlabs.org/project/ovn/patch/[email protected]/ > >>> >>>> >>>> And it seems like northd is in the infinite lop of re-writing entire sb. >>>> Not sure if it is related or some other issue. >>>> >>> >>> Could it be that you didn't have the Static_Mac_Binding fix in your >>> local main branch? I don't see the infinite loop in my reproducer. >>> >>> https://github.com/ovn-org/ovn/commit/e8a0ade >> >> I saw this on the latest main: >> https://github.com/ovn-org/ovn/commit/b6ee938b9 >> >> So, both of the static MAC binding changes are there. >> > > Upon further debugging there is a problem with static MAC bindings but > that's not what is causing the loop. Ales just posted a separate patch > for that. > >> It feels like northd removes datapaths with all the acssociated resources >> and then re-adds them on the next iteration. E.g. I see the same static >> mac binding deleted in one transaction and added back in the next and >> so on in the loop, same with the logical flows and some other stuff. >> >>> >>> If not, it might be good to share the DBs if possible. >> >> I saw this in one of the OpenShift DBs from OVN 24.03. I can share more >> details on this particular database privately. >> > > Thanks for the databases, I managed to reproduce this locally. > Debugging further, I'll update once I know more.
Hi Ilya, I turned the datapath schema fix patch into a series which now includes all the issues that we found with the databases you spotted the bug on initially: https://patchwork.ozlabs.org/project/ovn/list/?series=471679&state=* Thanks again for the bug report, great catch! Regards, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
