On Thu, Oct 31, 2019 at 6:16 AM Han Zhou <hz...@ovn.org> wrote:
>
> On Tue, Oct 29, 2019 at 8:11 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >
> > The incremental processing engine might stop a run before the
> > en_runtime_data node is processed. In such cases the ed_runtime_data
> > fields might contain pointers to already deleted SB records. For
> > example, if a port binding corresponding to a patch port is removed from
> > the SB database and the incremental processing engine aborts before the
> > en_runtime_data node is processed, then the corresponding local_datapath
> > hashtable entry in ed_runtime_data is stale and will store a pointer to
> > the already freed sbrec_port_binding record.
> >
> > This will cause invalid memory accesses in various places (e.g.,
> > pinctrl_run() -> prepare_ipv6_ras()).
> >
> > To fix the issue we need a way to track how each node was processed
> > during an engine run. This commit transforms the 'changed' field in
> > struct engine_node into a 'state' field. Possible node states are:
> > - "Stale":   data in the node is not up to date with the DB.
> > - "Updated": data in the node is valid but was updated during
> >              the last run of the engine.
> > - "Valid":   data in the node is valid and didn't change during
> >              the last run of the engine.
> > - "Aborted": during the last run, processing was aborted for
> >              this node.
> >
> > The commit also simplifies the logic of calling engine_run and
> > engine_need_run in order to reduce the number of external variables
> > required to track the result of the last engine execution.
> >
> > Functions that need to be called from the main loop and depend on
> > various data contents of the engine's nodes are now called only if
> > the data is up to date.
> >
> > CC: Han Zhou <hzh...@ebay.com>
> > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency
> > caused by recompute.")
> > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>
> Thanks Dumitru for the great fix!
> This was my omission. I realized that the problem should have existed
> even before commit a6b7d9f4f047 (so the commit message can be updated),
> because there are chances the engine would not run at all, which is
> mostly caused by a previous SB update from this node and the
> corresponding ovsdb notification coming back. At that moment, since the
> transaction's reply has not come yet, the txn is always NULL and
> engine_run() is always skipped. If the transaction itself deleted some
> data, which happens to be used by pinctrl_run() in this iteration, then
> it would hit the dangling pointer problem. I am not sure why I never
> encountered such a case though.
Hi Han,

Right, I selected the wrong commit. I'll update the commit message. I
think it's harder to see the issue because you need to be waiting for a
transaction reply and at the same time deleting port bindings from the
SB database.

> In general I think one option to fix this is to never store references
> to another node's data. For example, always replicate the IDL data to
> the engine node's data. In practice, this is both inefficient and
> complex. It is inefficient because of the data copy. It is complex
> because the data needs to be deep copied.
>
> The other option is the one implemented by your patch: maintain the
> state of each engine node and make sure node data is accessed only
> when it is valid. I think this is the more practical approach.

I think this will be fixed indirectly when/if pinctrl moves to a
different process. Until then this seems like the easiest solution.

> I also like the way you handle the case when engine_run is not
> invoked, by passing run_id in the checks.
>
> However, there are still some places where some of the engine nodes'
> data are used without checking engine_node_data_valid(), such as
> ofctrl_run() using ed_runtime_data.pending_ct_zones, and
> ofctrl_inject_pkt() using &ed_addr_sets.addr_sets and
> &ed_port_groups.port_groups, etc. However, those data are actually
> valid all the time - they don't maintain any reference pointers to
> their inputs. So it may not be a real problem. However, it can be
> confusing why at some places we validate but at other places we don't.
> So here is an improvement idea. Basically, we should always validate
> the engine node data before accessing it, and the validation mechanism
> can be improved to reflect each node's real state. We can add an
> optional interface is_data_valid() for engine_node, which can override
> the default engine_node_data_valid() behavior.
> We can let engine_node_data_valid() call the node's is_data_valid()
> interface if it is implemented, otherwise fall back to the default
> behavior, which validates only from the node state and run_id. For
> runtime_data, since part of it is always valid while another part may
> be invalid, I think we should split pending_ct_zones out as a separate
> engine node. This improvement is not urgent, so I think it is ok to
> add it as a follow-up patch.

True, I missed those cases. It does sound better to have an optional
is_data_valid() node callback. I'll add it in v2. If it's not too much
work I'll also move pending_ct_zones to a separate engine node.
Otherwise we can add it as a follow-up.

> Regarding the current patch, I have some other comments below:
>
> > +            /* We need to make sure that at least the runtime data
> > +             * (e.g., local datapaths, ct zones) are fresh before
> > +             * calling ofctrl_put and pinctrl_run to avoid using
> > +             * potentially freed references to database records.
> > +             */
> > +            if (engine_node_data_valid(&en_runtime_data,
> > +                                       engine_run_id)) {
> > +                ofctrl_put(&ed_flow_output.flow_table,
> > +                           &ed_runtime_data.pending_ct_zones,
> > +                           sbrec_meter_table_get(ovnsb_idl_loop.idl),
> > +                           get_nb_cfg(sbrec_sb_global_table_get(
> > +                                          ovnsb_idl_loop.idl)),
> > +                           engine_node_data_changed(&en_flow_output,
> > +                                                    engine_run_id));
>
> I think ofctrl_put() needs to be invoked every iteration, because it
> takes care of reinstalling flows to OVS when ovs-vswitchd is
> restarted. See "need_reinstall_flows" in ofctrl.c.
> For now, I think it is ok to leave this function out of the
> if (engine_node_data_valid()) check, because pending_ct_zones is
> always valid. For improvement, we should move pending_ct_zones to a
> separate engine node so that it always passes the check.

Ack.
> > @@ -2095,17 +2118,16 @@ main(int argc, char *argv[])
> >                  }
> >              }
> > -            if (old_engine_run_id == engine_run_id || !engine_run_done) {
> > -                if (!engine_run_done || engine_need_run(&en_flow_output)) {
> > -                    VLOG_DBG("engine did not run, force recompute next time: "
> > -                             "br_int %p, chassis %p", br_int, chassis);
> > -                    engine_set_force_recompute(true);
> > -                    poll_immediate_wake();
> > -                } else {
> > -                    VLOG_DBG("engine did not run, and it was not needed"
> > -                             " either: br_int %p, chassis %p",
> > -                             br_int, chassis);
> > -                }
> > +            if (engine_need_run(&en_flow_output, engine_run_id)) {
> > +                VLOG_DBG("engine did not run, force recompute next time: "
> > +                         "br_int %p, chassis %p", br_int, chassis);
> > +                engine_set_force_recompute(true);
> > +                poll_immediate_wake();
> > +            } else if (engine_aborted(&en_flow_output)) {
> > +                VLOG_DBG("engine was aborted, force recompute next time: "
> > +                         "br_int %p, chassis %p", br_int, chassis);
> > +                engine_set_force_recompute(true);
> > +                poll_immediate_wake();
>
> The logic is clearer than before. However, can we keep the DBG log that
> tells when the engine didn't run and was not needed either? It could be
> an else if here, like:
>
>     } else if (!engine_has_run(&en_flow_output, engine_run_id)) {
>         VLOG_DBG(...)

Sure, I'll add it.

> >              } else {
> >                  engine_set_force_recompute(false);
> >              }
>
> > @@ -82,6 +82,20 @@ struct engine_node_input {
> >      bool (*change_handler)(struct engine_node *node);
> >  };
> >
> > +enum engine_node_state {
> > +    EN_STALE,   /* Data in the node is not up to date with the DB. */
> > +    EN_UPDATED, /* Data in the node is valid but was updated during the
> > +                 * last run.
> > +                 */
> > +    EN_VALID,   /* Data in the node is valid and didn't change during the
> > +                 * last run.
> > +                 */
> > +    EN_ABORTED, /* During the last run, processing was aborted for
> > +                 * this node.
> > +                 */
> > +    EN_STATE_MAX,
> > +};
>
> Do we really need the STALE state? I think it is the same as ABORTED.
> It seems the node state would never be STALE, right? Even so, I think
> I am ok with separating them for clarity. Just want to make sure you
> thought about it.

We don't actually end up with a final state of STALE because in
engine_run() we always move from STALE to VALID if the data didn't
change and we didn't abort execution. It did help me follow the logic
of the code though, so I would keep it.

> >  struct engine_node {
> >      /* A unique id to distinguish each iteration of the engine_run(). */
> >      uint64_t run_id;
> > @@ -102,8 +116,8 @@ struct engine_node {
> >       * node. */
> >      void *data;
> >
> > -    /* Whether the data changed in the last engine run. */
> > -    bool changed;
> > +    /* State of the node after the last engine run. */
> > +    enum engine_node_state state;
> >
> >      /* Method to initialize data. It may be NULL. */
> >      void (*init)(struct engine_node *);
> > @@ -121,9 +135,9 @@ struct engine_node {
> >  void engine_init(struct engine_node *);
> >
> >  /* Execute the processing recursively, which should be called in the main
> > - * loop. Returns true if the execution is compelte, false if it is aborted,
> > - * which could happen when engine_abort_recompute is set. */
> > -bool engine_run(struct engine_node *, uint64_t run_id);
> > + * loop. Updates the engine node's states accordingly.
> > + */
> > +void engine_run(struct engine_node *, uint64_t run_id);
> >
> >  /* Clean up the data for the engine nodes recursively. It calls each node's
> >   * cleanup() method if not NULL. It should be called before the program
> > @@ -132,7 +146,7 @@ void engine_cleanup(struct engine_node *);
> >
> >  /* Check if engine needs to run, i.e. any change to be processed. */
>
> We need to update the comment: check if the engine needs to run but
> didn't.

Ack.

> >  bool
> > -engine_need_run(struct engine_node *);
> > +engine_need_run(struct engine_node *, uint64_t run_id);
>
> Thanks,
> Han

I'll send a follow-up v2 soon.
Thanks,
Dumitru