On 2/22/21 10:37 AM, Numan Siddique wrote:
On Mon, Feb 22, 2021 at 2:53 PM Numan Siddique <num...@ovn.org> wrote:

On Mon, Feb 22, 2021 at 2:17 PM Dumitru Ceara <dce...@redhat.com> wrote:

On 2/21/21 12:34 PM, num...@ovn.org wrote:
From: Numan Siddique <num...@ovn.org>

In one of the scaled deployments, ovn-controller is asserting with the
below stack trace

      ***
       (gdb) bt
         0  raise () from /lib64/libc.so.6
         1  abort () from /lib64/libc.so.6
         2  ovs_abort_valist ("%s: assertion %s failed in %s()") at 
lib/util.c:419
         3  vlog_abort_valist ("%s: assertion %s failed in %s()") at 
lib/vlog.c:1249
         4  vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263
         5  ovs_assert_failure (where="controller/ofctrl.c:1216",
                                function="flood_remove_flows_for_sb_uuid",
                                condition="ovs_list_is_empty(&f->list_node)") 
at lib/util.c:86
         6  flood_remove_flows_for_sb_uuid (sb_uuid=...134,
              flood_remove_nodes=...ef0) at controller/ofctrl.c:1216
         9  ofctrl_flood_remove_flows (flood_remove_nodes=...ef0) at 
controller/ofctrl.c:1280
         10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP,
              ref_name= "5564_pg_14...aac") at controller/lflow.c:522
         11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP)
              at controller/ovn-controller.c:2220
         12 engine_compute () at lib/inc-proc-eng.c:306
         13 engine_run_node (recompute_allowed=true) at lib/inc-proc-eng.c:352
         14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377
         15 main () at controller/ovn-controller.c:2887
      ***

The reason for this assert is different from the assertion bug fixed in
the commit 858d1dd716("ofctrl: Fix the assert seen when flood removing
flows.")

The above assertion is seen if lflow.c calls ofctrl_add_or_append_flow()
twice for the same flow uuid 'S'.  When this function is called for
the first time, a new desired flow 'f' is created and 'f->references'
will have [(S, f)].  When it is called for the 2nd time, the list
'f->references' is updated as [(S, f), (S,f)].  Later when
flood_remove_flows_for_sb_uuid() is called for 'S', the above assertion
is seen.

This scenarion can happen when ovn-controller receives an update message
from SB ovsdb-server in which a new logical flow 'F' belonging to a
datapath 'D' is added which would result in conjunction flows.  If this
datapath 'D' is added to 'local_datapaths' in the same loop, then
logical flow 'F' is first processed in lflow_add_flows_for_datapath()
and the second time in lflow_handle_changed_flows().

This issue can be fixed in 2 ways
1. Flood remove all the tracked flows in lflow_handle_changed_flows()
     first and then process the modified or newly added logical flows.

2. In the function ofctrl_add_or_append_flow(), check if there is
     already a reference for the flow uuid 'S' in the existing desired
     flows and only append the actions if it doesn't exist.

This patch has taken the approach (2) as that seems better and
effecient.

Signed-off-by: Numan Siddique <num...@ovn.org>
---

Hi Numan,

Nice work tracking this down!

This is not a full review, just an initial question to see if there's
also an option #3 to fix this issue:

Can we try to avoid processing the same lflow twice?  E.g., in
lflow_add_flows_for_datapath(), call consider_logical_flow__() only for
logical flows that are not on the table's 'track_list', so have not
changed in the SB.

Hi Dumitru,

Thanks for the comments. I think this would require running  a for
loop of tracked
flows for every logical flow for the datapath and checking if it is in
the tracked list or not.
I think we could do this too. But in my opinion we should support
ofctrl_add_or_append_flow() to be called twice for the same uuid.
When ofctrl_add_or_append_flow() is called twice, it should result in a no-op.

We could end up in a similar scenario later when incrementally processing
other changes.

In my opinion we can take (2) to not result in this assert scenario.
We can also take your suggested approach (3) so that we don't call
ofctrl_add_or_append_flow() twice for the reported scenario.

What do you think?

One problem I see with #3 is that, this makes the function
lflow_add_flows_for_datapath()
assume that the function lflow_handle_changed_flows() will be called
later and also enforces
the order of the I-P engine nodes registered to the flow_output.  In my opinion
we should avoid this assumption.


That's a fair point indeed. But if we go for (2) we also increase complexity in the already complex ofctrl_*() code.

I'm not sure yet what's best :)

Regards,
Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to