On Thu, Oct 31, 2019 at 9:21 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On Thu, Oct 31, 2019 at 6:16 AM Han Zhou <hz...@ovn.org> wrote:
> >
> > On Tue, Oct 29, 2019 at 8:11 AM Dumitru Ceara <dce...@redhat.com> wrote:
> > >
> > > The incremental processing engine might stop a run before the
> > > en_runtime_data node is processed. In such cases the ed_runtime_data
> > > fields might contain pointers to already deleted SB records. For
> > > example, if a port binding corresponding to a patch port is removed
> > > from the SB database and the incremental processing engine aborts
> > > before the en_runtime_data node is processed, then the corresponding
> > > local_datapath hashtable entry in ed_runtime_data is stale and will
> > > store a pointer to the already freed sbrec_port_binding record.
> > >
> > > This will cause invalid memory accesses in various places (e.g.,
> > > pinctrl_run() -> prepare_ipv6_ras()).
> > >
> > > To fix the issue we need a way to track how each node was processed
> > > during an engine run. This commit transforms the 'changed' field in
> > > struct engine_node into a 'state' field. Possible node states are:
> > > - "Stale":   data in the node is not up to date with the DB.
> > > - "Updated": data in the node is valid but was updated during
> > >              the last run of the engine.
> > > - "Valid":   data in the node is valid and didn't change during
> > >              the last run of the engine.
> > > - "Aborted": during the last run, processing was aborted for
> > >              this node.
> > >
> > > The commit also simplifies the logic of calling engine_run and
> > > engine_need_run in order to reduce the number of external variables
> > > required to track the result of the last engine execution.
> > >
> > > Functions that need to be called from the main loop and depend on
> > > various data contents of the engine's nodes are now called only if
> > > the data is up to date.
> > >
> > > CC: Han Zhou <hzh...@ebay.com>
> > > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency
> > > caused by recompute.")
> > > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
> >
> > Thanks Dumitru for the great fix! This was my omission.
> > I realized that the problem should have existed even before commit
> > a6b7d9f4f047 (so the commit message can be updated), because there is a
> > chance the engine would not run at all. This mostly happens when a
> > previous SB update from this node is pending and the corresponding
> > ovsdb notification is coming back. At that moment, since the
> > transaction's reply has not arrived yet, the txn is always NULL and
> > engine_run() is always skipped. If the transaction itself deleted some
> > data that happens to be used by pinctrl_run() in this iteration, then
> > it would hit the dangling pointer problem. I am not sure why I never
> > encountered such a case, though.
>
> Hi Han,
>
> Right, I selected the wrong commit. I'll update the commit message. I
> think it's harder to see the issue because you need to be waiting for
> a transaction reply and at the same time deleting port bindings from
> the SB database.
>
> > In general, I think one option to fix this is to never store references
> > to another node's data. For example, always replicate the IDL data into
> > the engine node's data. In practice, this is both inefficient and
> > complex: inefficient because of the data copy, and complex because the
> > data needs to be deep copied.
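To make the hazard under discussion concrete, here is a minimal,
self-contained sketch of how a borrowed pointer in cached runtime data
dangles once the owner frees the row. The types are hypothetical stand-ins,
not the real IDL or ovn-controller structs:

    /* sb_row plays the role of an IDL row such as sbrec_port_binding;
     * runtime_entry plays the role of a local_datapath entry in
     * ed_runtime_data. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct sb_row {
        char *logical_port;
    };

    struct runtime_entry {
        const struct sb_row *row;    /* Borrowed pointer; the "IDL" owns it. */
    };

    int main(void)
    {
        struct sb_row *row = malloc(sizeof *row);
        row->logical_port = strdup("patch-port");

        struct runtime_entry entry = { .row = row };

        /* The "IDL" processes a deletion and frees the row.  If the engine
         * run aborts before the runtime-data node is reprocessed, 'entry'
         * is not cleaned up and keeps the stale pointer. */
        free(row->logical_port);
        free(row);

        /* What pinctrl_run() would then do: a use-after-free read. */
        printf("%s\n", entry.row->logical_port);
        return 0;
    }

Built with AddressSanitizer, this reports a heap-use-after-free at the final
printf, which is the class of crash the patch prevents by tracking node
state.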
> > The other option is the one implemented by your patch: to maintain the
> > state of each engine node and make sure node data is accessed only when
> > it is valid. I think this is a more practical approach.
>
> I think this will be fixed indirectly when/if pinctrl moves to a
> different process. Until then this seems like the easiest solution.
>
> > I also like the way you handle the case when engine_run is not invoked,
> > by passing run_id in the checks.
> >
> > However, there are still some places where some of the engine nodes'
> > data is used without checking engine_node_data_valid(), such as
> > ofctrl_run() using ed_runtime_data.pending_ct_zones, and
> > ofctrl_inject_pkt() using &ed_addr_sets.addr_sets and
> > &ed_port_groups.port_groups, etc. However, those data are actually
> > valid all the time - they don't maintain any reference pointers to
> > their inputs. So it may not be a real problem. However, it can be
> > confusing why at some places we validate but at other places we don't.
> > So here is an improvement idea. Basically, we should always validate
> > the engine node data before accessing it, and the validation mechanism
> > can be improved to reflect each node's real state. We can add an
> > optional interface is_data_valid() for engine_node, which can override
> > the default engine_node_data_valid() behavior. We can let
> > engine_node_data_valid() call the node's is_data_valid() interface if
> > it is implemented, and otherwise fall back to the default behavior,
> > which validates only based on the node state and run_id. For
> > runtime_data, since part of it is always valid while another part may
> > be invalid, I think we should split pending_ct_zones out as a separate
> > engine node. This improvement is not urgent, so I think it is ok to add
> > it as a follow-up patch.
>
> True, I missed those cases. It does sound better to have an optional
> is_data_valid() node callback. I'll add it in v2. If it's not too much
> work I'll also move pending_ct_zones to a separate engine node.
> Otherwise we can add it as a follow-up.
>
> > Regarding the current patch, I have some other comments below:
> >
> > > +                /* We need to make sure that at least the runtime data
> > > +                 * (e.g., local datapaths, ct zones) are fresh before
> > > +                 * calling ofctrl_put and pinctrl_run to avoid using
> > > +                 * potentially freed references to database records.
> > > +                 */
> > > +                if (engine_node_data_valid(&en_runtime_data,
> > > +                                           engine_run_id)) {
> > > +                    ofctrl_put(&ed_flow_output.flow_table,
> > > +                               &ed_runtime_data.pending_ct_zones,
> > > +                               sbrec_meter_table_get(ovnsb_idl_loop.idl),
> > > +                               get_nb_cfg(sbrec_sb_global_table_get(
> > > +                                   ovnsb_idl_loop.idl)),
> > > +                               engine_node_data_changed(&en_flow_output,
> > > +                                                        engine_run_id));
> >
> > I think ofctrl_put() needs to be invoked every iteration, because it
> > takes care of reinstalling flows to OVS when ovs-vswitchd is restarted.
> > See "need_reinstall_flows" in ofctrl.c.
> > For now, I think it is ok to leave this function out of the
> > if (engine_node_data_valid()) check, because pending_ct_zones is always
> > valid. As an improvement, we should move pending_ct_zones to a separate
> > engine node so that it always passes the check.
>
> Ack.
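For illustration, here is a sketch of what the validity check plus the
proposed optional is_data_valid() override could look like. The names mirror
the discussion above, but the exact conditions are assumptions, not the code
from the patch:

    #include <stdbool.h>
    #include <stdint.h>

    enum engine_node_state { EN_STALE, EN_UPDATED, EN_VALID, EN_ABORTED };

    struct engine_node {
        uint64_t run_id;
        enum engine_node_state state;
        /* Proposed optional override for nodes whose data is safe to read
         * at any time, e.g. if pending_ct_zones became its own node. */
        bool (*is_data_valid)(struct engine_node *);
    };

    static bool
    engine_node_data_valid(struct engine_node *node, uint64_t run_id)
    {
        if (node->is_data_valid) {
            return node->is_data_valid(node);
        }
        /* Default: only trust data that was brought up to date in the
         * current engine run. */
        return node->run_id == run_id
               && (node->state == EN_UPDATED || node->state == EN_VALID);
    }

With pending_ct_zones split into its own node whose override always returns
true, callers like ofctrl_put() would pass the check on every iteration, as
suggested above.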
> > > > > > > > > > > > @@ -2095,17 +2118,16 @@ main(int argc, char *argv[]) > > > } > > > > > > } > > > - if (old_engine_run_id == engine_run_id || !engine_run_done) { > > > - if (!engine_run_done || > > > engine_need_run(&en_flow_output)) { > > > - VLOG_DBG("engine did not run, force recompute next > > > time: " > > > - "br_int %p, chassis %p", br_int, chassis); > > > - engine_set_force_recompute(true); > > > - poll_immediate_wake(); > > > - } else { > > > - VLOG_DBG("engine did not run, and it was not needed" > > > - " either: br_int %p, chassis %p", > > > - br_int, chassis); > > > - } > > > + if (engine_need_run(&en_flow_output, engine_run_id)) { > > > + VLOG_DBG("engine did not run, force recompute next time: > > > " > > > + "br_int %p, chassis %p", br_int, chassis); > > > + engine_set_force_recompute(true); > > > + poll_immediate_wake(); > > > + } else if (engine_aborted(&en_flow_output)) { > > > + VLOG_DBG("engine was aborted, force recompute next time: > > > " > > > + "br_int %p, chassis %p", br_int, chassis); > > > + engine_set_force_recompute(true); > > > + poll_immediate_wake(); > > > > > > The logic is clearer than before. However, can we have the DBG log that > > tells when engine didn't run and was not needed? It could be an else if > > here, like: > > Sure, I'll add it. > > > } else if (!engine_has_run(&en_flow_output, engine_run_id)) > > { > > VLOG_DBG(...) > > > > > } else { > > > engine_set_force_recompute(false); > > > } > > > > > > > > > > > > > @@ -82,6 +82,20 @@ struct engine_node_input { > > > bool (*change_handler)(struct engine_node *node); > > > }; > > > > > > +enum engine_node_state { > > > + EN_STALE, /* Data in the node is not up to date with the DB. */ > > > + EN_UPDATED, /* Data in the node is valid but was updated during the > > > + * last run. > > > + */ > > > + EN_VALID, /* Data in the node is valid and didn't change during > > > the > > > + * last run. > > > + */ > > > + EN_ABORTED, /* During the last run, processing was aborted for > > > + * this node. > > > + */ > > > + EN_STATE_MAX, > > > +}; > > > + > > > > > > Do we really need STALE state? I think it is same as ABORTED. It seems the > > node state would never be STALE, right? Even though, I think I am ok for > > separating them for clarity. Just want to make sure you thought about it. > > We don't actually end up with the final state STALE because in > engine_run() we always move from STALE to VALID if the data didn't > change and we didn't abort execution. It did help me follow the logic > of the code though so I would keep it. > > > > > > > > > struct engine_node { > > > /* A unique id to distinguish each iteration of the engine_run(). */ > > > uint64_t run_id; > > > @@ -102,8 +116,8 @@ struct engine_node { > > > * node. */ > > > void *data; > > > > > > - /* Whether the data changed in the last engine run. */ > > > - bool changed; > > > + /* State of the node after the last engine run. */ > > > + enum engine_node_state state; > > > > > > /* Method to initialize data. It may be NULL. */ > > > void (*init)(struct engine_node *); > > > @@ -121,9 +135,9 @@ struct engine_node { > > > void engine_init(struct engine_node *); > > > > > > /* Execute the processing recursively, which should be called in the main > > > - * loop. Returns true if the execution is compelte, false if it is > > > aborted, > > > - * which could happen when engine_abort_recompute is set. */ > > > -bool engine_run(struct engine_node *, uint64_t run_id); > > > + * loop. Updates the engine node's states accordingly. 
> > > + */
> > > +void engine_run(struct engine_node *, uint64_t run_id);
> > >
> > >  /* Clean up the data for the engine nodes recursively. It calls each node's
> > >   * cleanup() method if not NULL. It should be called before the program
> > > @@ -132,7 +146,7 @@ void engine_cleanup(struct engine_node *);
> > >
> > >  /* Check if engine needs to run, i.e. any change to be processed. */
> >
> > We need to update the comment: Check if engine needs to run but didn't.
>
> Ack.
>
> > >  bool
> > > -engine_need_run(struct engine_node *);
> > > +engine_need_run(struct engine_node *, uint64_t run_id);
> >
> > Thanks,
> > Han
>
> I'll send a follow up v2 soon.
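Before the v2 announcement below, an outline of the per-node state machine
the thread converged on may be useful. This is a sketch with stubbed helpers
and assumed transition points, not the patch itself:

    #include <stdbool.h>
    #include <stdint.h>

    enum engine_node_state { EN_STALE, EN_UPDATED, EN_VALID, EN_ABORTED };

    struct engine_node {
        uint64_t run_id;
        enum engine_node_state state;
    };

    /* Stubs standing in for the real input processing and change
     * tracking. */
    static bool process_node_inputs(struct engine_node *node)
    { (void) node; return true; }
    static bool node_data_changed(struct engine_node *node)
    { (void) node; return false; }

    static void
    engine_run_node(struct engine_node *node, uint64_t run_id)
    {
        node->run_id = run_id;
        node->state = EN_STALE;      /* Not yet up to date with the DB. */

        if (!process_node_inputs(node)) {
            /* E.g. recompute aborted via engine_abort_recompute. */
            node->state = EN_ABORTED;
            return;
        }

        /* A node never finishes a successful run as EN_STALE, matching the
         * observation above that STALE is only an intermediate state. */
        node->state = node_data_changed(node) ? EN_UPDATED : EN_VALID;
    }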
Hi Han,

v2 is available here: https://patchwork.ozlabs.org/patch/1188930/

In the end I decided not to add a custom "is_data_valid" handler per node
and instead to separate out, in ovn-controller, the data that can be
accessed at any time (regardless of engine_run success). I had initially
tried a custom handler but found the code a bit harder to follow.

What do you think?

Thanks,
Dumitru

>
> Thanks,
> Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev