On Thu, Oct 31, 2019 at 6:16 AM Han Zhou <hz...@ovn.org> wrote:
>
>
>
> On Tue, Oct 29, 2019 at 8:11 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >
> > The incremental processing engine might stop a run before the
> > en_runtime_data node is processed. In such cases the ed_runtime_data
> > fields might contain pointers to already deleted SB records. For
> > example, if a port binding corresponding to a patch port is removed from
> > the SB database and the incremental processing engine aborts before the
> > en_runtime_data node is processed then the corresponding local_datapath
> > hashtable entry in ed_runtime_data is stale and will store a pointer to
> > the already freed sbrec_port_binding record.
> >
> > This will cause invalid memory accesses in various places (e.g.,
> > pinctrl_run() -> prepare_ipv6_ras()).
> >
> > To fix the issue we need a way to track how each node was processed
> > during an engine run. This commit transforms the 'changed' field in
> > struct engine_node into a 'state' field. Possible node states are:
> > - "Stale": data in the node is not up to date with the DB.
> > - "Updated": data in the node is valid but was updated during
> >   the last run of the engine.
> > - "Valid": data in the node is valid and didn't change during
> >   the last run of the engine.
> > - "Aborted": during the last run, processing was aborted for
> >   this node.
> >
> > The commit also simplifies the logic of calling engine_run and
> > engine_need_run in order to reduce the number of external variables
> > required to track the result of the last engine execution.
> >
> > Functions that need to be called from the main loop and depend on
> > various data contents of the engine's nodes are now called only if
> > the data is up to date.
> >
> > CC: Han Zhou <hzh...@ebay.com>
> > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency caused by recompute.")
> > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
>
>
> Thanks Dumitru for the great fix! This was my omission.
> I realized that the problem should have existed even before commit
> a6b7d9f4f047 (so the commit message can be updated), because there are
> cases where the engine would not run at all, mostly when a previous SB
> update from this node is in flight and the corresponding ovsdb
> notification is coming back. At that moment, since the transaction's
> reply hasn't arrived yet, the txn is always NULL and engine_run() is
> always skipped. If the transaction itself deleted some data that happens
> to be used by pinctrl_run() in this iteration, then it would hit the
> dangling pointer problem. I am not sure why I never encountered such a
> case, though.
>

Hi Han,

Right, I selected the wrong commit. I'll update the commit message. I
think it's harder to see the issue because you need to be waiting for
a transaction reply and at the same time deleting port bindings from
the SB database.
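
To make the failing path concrete, here's a simplified view of where the
stale pointer lives (field list trimmed down; the actual struct in
ovn-controller.h has more fields):

/* Runtime data caches raw IDL row pointers, so once the SB port
 * binding row is freed, consumers such as prepare_ipv6_ras() read
 * freed memory until en_runtime_data is reprocessed. */
struct local_datapath {
    const struct sbrec_datapath_binding *datapath;   /* Raw IDL row. */
    /* Plus cached sbrec_port_binding pointers for peer (patch)
     * ports - these are what go stale in the scenario above. */
};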

> In general I think one option to fix this is to never store references to
> another node's data. For example, always replicate the IDL data into the
> engine node's data. In practice, this is both inefficient and complex: it
> is inefficient because of the data copies, and complex because the data
> needs to be deep copied.
>
> The other option is the one implemented by your patch: maintain a state
> for each engine node and make sure node data is accessed only when it is
> valid. I think this is a more practical approach.
>

I think this will be fixed indirectly when/if pinctrl moves to a
different process. Until then this seems like the easiest solution.

> I also like the way you handle the case when engine_run is not invoked, by 
> passing run_id in the checks.
>
> However, there are still some places where engine node data is used
> without checking engine_node_data_valid(), such as ofctrl_run() using
> ed_runtime_data.pending_ct_zones, and ofctrl_inject_pkt() using
> &ed_addr_sets.addr_sets and &ed_port_groups.port_groups, etc. Those data
> are actually valid all the time - they don't hold any reference pointers
> to their inputs - so it may not be a real problem. Still, it can be
> confusing why we validate at some places but not at others. So here is an
> improvement idea: we should always validate engine node data before
> accessing it, and the validation mechanism can be improved to reflect
> each node's real state. We can add an optional interface is_data_valid()
> to engine_node, which can override the default engine_node_data_valid()
> behavior. engine_node_data_valid() would call the node's is_data_valid()
> interface if it is implemented, otherwise fall back to the default
> behavior, which validates based only on the node state and run_id. For
> runtime_data, since part of it is always valid while the other part may
> be invalid, I think we should split pending_ct_zones out as a separate
> engine node. This improvement is not urgent, so I think it is ok to add
> it as a follow-up patch.

True, I missed those cases. It does sound better to have an optional
is_data_valid() node callback. I'll add it in v2. If it's not too much
work I'll also move pending_ct_zones to a separate engine node;
otherwise we can add it as a follow-up.
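
For the record, here's roughly what I have in mind for the callback (a
minimal sketch, assuming the 'state' and 'run_id' fields from this
patch; exact names and wording to be settled in v2):

/* Optional method to validate the node's data. If NULL, the default
 * check based on 'state' and 'run_id' is used. */
bool (*is_data_valid)(struct engine_node *);

bool
engine_node_data_valid(struct engine_node *node, uint64_t run_id)
{
    if (node->is_data_valid) {
        /* Node-specific validation, e.g., for nodes whose data never
         * references their inputs. */
        return node->is_data_valid(node);
    }
    /* Default: valid only if the node was processed in this iteration
     * and wasn't left stale or aborted. */
    return node->run_id == run_id
           && (node->state == EN_UPDATED || node->state == EN_VALID);
}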

>
> Regarding the current patch, I have some other comments below:
>
> >
> >
> > +                    /* We need to make sure that at least the runtime data
> > +                     * (e.g., local datapaths, ct zones) are fresh before
> > +                     * calling ofctrl_put and pinctrl_run to avoid using
> > +                     * potentially freed references to database records.
> > +                     */
> > +                    if (engine_node_data_valid(&en_runtime_data,
> > +                                               engine_run_id)) {
> > +                        ofctrl_put(&ed_flow_output.flow_table,
> > +                                   &ed_runtime_data.pending_ct_zones,
> > +                                   sbrec_meter_table_get(ovnsb_idl_loop.idl),
> > +                                   get_nb_cfg(sbrec_sb_global_table_get(
> > +                                                   ovnsb_idl_loop.idl)),
> > +                                   engine_node_data_changed(&en_flow_output,
> > +                                                            engine_run_id));
>
>
> I think ofctrl_put() needs to be invoked every iteration, because it takes 
> care of reinstalling flows to OVS when ovs-vswitchd is restarted. See 
> "need_reinstall_flows" in ofctrl.c.
> For now, I think it is ok to leave this call outside the
> engine_node_data_valid() check, because pending_ct_zones is always valid.
> As an improvement, we should move pending_ct_zones to a separate engine
> node so that it always passes the check.

Ack.
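
To make it concrete, here's how I plan to structure this part of the
main loop in v2 (a sketch only, assuming the current variable names):

/* ofctrl_put() stays unconditional so that flows are reinstalled
 * after an ovs-vswitchd restart; only the runtime-data consumers
 * stay behind the validity check. */
ofctrl_put(&ed_flow_output.flow_table,
           &ed_runtime_data.pending_ct_zones,
           sbrec_meter_table_get(ovnsb_idl_loop.idl),
           get_nb_cfg(sbrec_sb_global_table_get(ovnsb_idl_loop.idl)),
           engine_node_data_changed(&en_flow_output, engine_run_id));

if (engine_node_data_valid(&en_runtime_data, engine_run_id)) {
    /* Safe to dereference cached SB rows here. */
    pinctrl_run(/* same arguments as today */);
}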

>
> >
> >
> > @@ -2095,17 +2118,16 @@ main(int argc, char *argv[])
> >                  }
> >
> >              }
> > -            if (old_engine_run_id == engine_run_id || !engine_run_done) {
> > -                if (!engine_run_done || engine_need_run(&en_flow_output)) {
> > -                    VLOG_DBG("engine did not run, force recompute next 
> > time: "
> > -                             "br_int %p, chassis %p", br_int, chassis);
> > -                    engine_set_force_recompute(true);
> > -                    poll_immediate_wake();
> > -                } else {
> > -                    VLOG_DBG("engine did not run, and it was not needed"
> > -                             " either: br_int %p, chassis %p",
> > -                             br_int, chassis);
> > -                }
> > +            if (engine_need_run(&en_flow_output, engine_run_id)) {
> > +                VLOG_DBG("engine did not run, force recompute next time: "
> > +                            "br_int %p, chassis %p", br_int, chassis);
> > +                engine_set_force_recompute(true);
> > +                poll_immediate_wake();
> > +            } else if (engine_aborted(&en_flow_output)) {
> > +                VLOG_DBG("engine was aborted, force recompute next time: "
> > +                         "br_int %p, chassis %p", br_int, chassis);
> > +                engine_set_force_recompute(true);
> > +                poll_immediate_wake();
>
>
> The logic is clearer than before. However, can we keep the DBG log that
> tells when the engine didn't run and was not needed? It could be an else
> if here, like:

Sure, I'll add it.

>                 } else if (!engine_has_run(&en_flow_output, engine_run_id)) {
>                      VLOG_DBG(...)
>
> >              } else {
> >                  engine_set_force_recompute(false);
> >              }
> >
>
>
> >
> > @@ -82,6 +82,20 @@ struct engine_node_input {
> >      bool (*change_handler)(struct engine_node *node);
> >  };
> >
> > +enum engine_node_state {
> > +    EN_STALE,     /* Data in the node is not up to date with the DB. */
> > +    EN_UPDATED,   /* Data in the node is valid but was updated during the
> > +                   * last run.
> > +                   */
> > +    EN_VALID,     /* Data in the node is valid and didn't change during the
> > +                   * last run.
> > +                   */
> > +    EN_ABORTED,   /* During the last run, processing was aborted for
> > +                   * this node.
> > +                   */
> > +    EN_STATE_MAX,
> > +};
> > +
>
>
> Do we really need the STALE state? I think it is the same as ABORTED. It
> seems the node state would never end up as STALE, right? Even so, I am ok
> with separating them for clarity. Just want to make sure you thought
> about it.

We don't actually end up with STALE as a final state because in
engine_run() we always move from STALE to VALID if the data didn't
change and we didn't abort execution. It did help me follow the logic
of the code, though, so I would keep it.
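
Roughly, the per-node transitions in engine_run() look like this
(simplified sketch; 'recompute_aborted' and 'node_data_changed' are
placeholder names, and the real code also walks the inputs and change
handlers):

node->state = EN_STALE;            /* While the run is in progress. */
if (recompute_aborted) {
    node->state = EN_ABORTED;      /* Force recompute next time. */
} else if (node_data_changed) {
    node->state = EN_UPDATED;      /* Valid, changed this run. */
} else {
    node->state = EN_VALID;        /* Valid, unchanged this run. */
}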

>
> >
> >  struct engine_node {
> >      /* A unique id to distinguish each iteration of the engine_run(). */
> >      uint64_t run_id;
> > @@ -102,8 +116,8 @@ struct engine_node {
> >       * node. */
> >      void *data;
> >
> > -    /* Whether the data changed in the last engine run. */
> > -    bool changed;
> > +    /* State of the node after the last engine run. */
> > +    enum engine_node_state state;
> >
> >      /* Method to initialize data. It may be NULL. */
> >      void (*init)(struct engine_node *);
> > @@ -121,9 +135,9 @@ struct engine_node {
> >  void engine_init(struct engine_node *);
> >
> >  /* Execute the processing recursively, which should be called in the main
> > - * loop. Returns true if the execution is compelte, false if it is aborted,
> > - * which could happen when engine_abort_recompute is set. */
> > -bool engine_run(struct engine_node *, uint64_t run_id);
> > + * loop. Updates the engine node's states accordingly.
> > + */
> > +void engine_run(struct engine_node *, uint64_t run_id);
> >
> >  /* Clean up the data for the engine nodes recursively. It calls each node's
> >   * cleanup() method if not NULL. It should be called before the program
> > @@ -132,7 +146,7 @@ void engine_cleanup(struct engine_node *);
> >
> >  /* Check if engine needs to run, i.e. any change to be processed. */
>
>
> We need to update the comment: Check if engine needs to run but didn't.

Ack.
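
I.e., something like (wording to be polished in v2):

/* Check if the engine needs to run, i.e., any change is pending, but
 * didn't run in the last iteration. */
bool engine_need_run(struct engine_node *, uint64_t run_id);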

>
> >  bool
> > -engine_need_run(struct engine_node *);
> > +engine_need_run(struct engine_node *, uint64_t run_id);
> >
>
> Thanks,
> Han

I'll send a v2 soon.

Thanks,
Dumitru
