On Thu, Jan 29, 2026 at 09:11, Mairtin O'Loingsigh
<[email protected]> wrote:
>
> On Wed, Jan 28, 2026 at 03:55:26PM -0300, Tiago Matos Carvalho Reis wrote:
> > Hi everyone,
> >
> > I have been working on implementing incremental processing in OVN-IC and
> > encountered a design issue regarding how OVN-IC handles multi-AZ writes.
> >
> > The Issue
> > In a scenario where multiple AZs are connected via OVN-IC, certain events
> > trigger all AZs to attempt writing the same data to the ISB/INB
> > simultaneously. This race condition leads to a constraint violation, which
> > causes the transaction to fail and forces a full recompute.
> >
> > Example:
> > A clear example of this can be seen in ovn-ic.c:ts_run:
> >
> >     if (ctx->ovnisb_txn) {
> >         /* Create ISB Datapath_Binding */
> >         ICNBREC_TRANSIT_SWITCH_FOR_EACH (ts, ctx->ovninb_idl) {
> >             const struct icsbrec_datapath_binding *isb_dp =
> >                 shash_find_and_delete(isb_ts_dps, ts->name);
> >             if (!isb_dp) {
> >                 /* Allocate tunnel key */
> >                 int64_t dp_key = allocate_dp_key(dp_tnlids, vxlan_mode,
> >                                                  "transit switch datapath");
> >                 if (!dp_key) {
> >                     continue;
> >                 }
> >
> >                 isb_dp = icsbrec_datapath_binding_insert(ctx->ovnisb_txn);
> >                 icsbrec_datapath_binding_set_transit_switch(isb_dp,
> >                                                             ts->name);
> >                 icsbrec_datapath_binding_set_tunnel_key(isb_dp, dp_key);
> >             } else if (dp_key_refresh) {
> >                 /* Refresh tunnel key since encap mode has changed. */
> >                 int64_t dp_key = allocate_dp_key(dp_tnlids, vxlan_mode,
> >                                                  "transit switch datapath");
> >                 if (dp_key) {
> >                     icsbrec_datapath_binding_set_tunnel_key(isb_dp, dp_key);
> >                 }
> >             }
> >
> >             if (!isb_dp->type) {
> >                 icsbrec_datapath_binding_set_type(isb_dp, "transit-switch");
> >             }
> >
> >             if (!isb_dp->nb_ic_uuid) {
> >                 icsbrec_datapath_binding_set_nb_ic_uuid(isb_dp,
> >                                                         &ts->header_.uuid,
> >                                                         1);
> >             }
> >         }
> >
> >         struct shash_node *node;
> >         SHASH_FOR_EACH (node, isb_ts_dps) {
> >             icsbrec_datapath_binding_delete(node->data);
> >         }
> >     }
> >
> > When a new transit-switch is created, every AZ attempts to create the same
> > datapath_binding on the ISB. Only one request succeeds; the others fail
> > with a "constraint-violation."
> >
> > Impact:
> > This behavior negates the performance benefits of implementing incremental
> > processing, as the system falls back to a full recompute upon these
> > failures.
> >
> > For development purposes, I am currently ignoring these errors, but the
> > ideal way of fixing this issue is to have a mechanism where only a single
> > AZ handles the writes but this would require implementing some consensus
> > protocol.
> >
> > Does anyone have any advice on how we can fix this issue?
> ovn-ic in each AZ enumerates all existing ISB datapaths in the
> enumerate_datapaths function, then attempts to add the missing datapaths.
> Since multiple AZs attempt to add the same missing entry, all but the first
> fail, causing transaction errors. Currently, ovn-ic then enumerates the ISB
> datapaths again, sees the entry that succeeded and continues to create NB
> in the local AZ. This does cause a transaction error on all but one AZ
> whenever a transit router is added, but we currently don't have a
> mechanism to manage this gracefully across multiple AZs.

Hi Mairtin, thanks for the reply.

Since there is no mechanism to decide which AZ should insert the data, the
only good solution I came up with, short of implementing a full-fledged
consensus algorithm like Raft to elect a leader AZ, is to add an option in
IC_NB_Global that manually designates a specific AZ as the leader, and to
check in the code whether the local AZ is that leader.

Example:
$ ovn-ic-nbctl set IC_NB_Global . options:leader=az1

In the code:

const struct icnbrec_ic_nb_global *icnb_global =
    icnbrec_ic_nb_global_table_first(ic_nb_global_table);

const struct nbrec_nb_global *nb_global =
    nbrec_nb_global_table_first(nb_global_table);

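/* smap_get() returns NULL when the key is not set, so check before strcmp(). */
const char *leader = smap_get(&icnb_global->options, "leader");
if (leader && !strcmp(leader, nb_global->name)) {
    /* This AZ is the configured leader: insert logic here. */
}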
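
To make the intended semantics concrete, here is a minimal standalone sketch
of the check. Both helpers (az_is_leader and az_may_write_isb) are
hypothetical names, not existing ovn-ic functions, and plain strings stand in
for the IDL rows. Falling back to today's behavior (every AZ writes) when
options:leader is unset is my assumption, not something that has been decided.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true only if 'leader' (the value of the proposed
 * options:leader key, NULL when unset) names the local AZ 'az_name'. */
static bool
az_is_leader(const char *leader, const char *az_name)
{
    if (!leader || !az_name) {
        return false;
    }
    return !strcmp(leader, az_name);
}

/* Hypothetical policy: with no leader configured, keep the current
 * behavior where every AZ attempts the ISB write (assumption). */
static bool
az_may_write_isb(const char *leader, const char *az_name)
{
    if (!leader) {
        return true;
    }
    return az_is_leader(leader, az_name);
}
```

With leader=az1, only the ovn-ic instance whose NB_Global name is az1 would
take the insert path; the other AZs would just wait for the row to show up
in the ISB.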

Do you have any opinion on this approach?

> >
> > Thanks,
> > Tiago Matos
> >
>
>
> Hi Tiago,
>
> I ran into similar issues when adding transit router support and have
> added a comment above. I have also been working on OVN-IC related
> features, so if you would like to discuss the above issue further, or
> other OVN-IC work, I would be glad to help.
>
> Regards,
> Mairtin
>


Regards,
Tiago Matos
