On Thu, Oct 31, 2019 at 9:21 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On Thu, Oct 31, 2019 at 6:16 AM Han Zhou <hz...@ovn.org> wrote:
> >
> >
> >
> > On Tue, Oct 29, 2019 at 8:11 AM Dumitru Ceara <dce...@redhat.com> wrote:
> > >
> > > The incremental processing engine might stop a run before the
> > > en_runtime_data node is processed. In such cases the ed_runtime_data
> > > fields might contain pointers to already deleted SB records. For
> > > example, if a port binding corresponding to a patch port is removed from
> > > the SB database and the incremental processing engine aborts before the
> > > en_runtime_data node is processed then the corresponding local_datapath
> > > hashtable entry in ed_runtime_data is stale and will store a pointer to
> > > the already freed sbrec_port_binding record.
> > >
> > > This will cause invalid memory accesses in various places (e.g.,
> > > pinctrl_run() -> prepare_ipv6_ras()).
> > >
> > > To fix the issue we need a way to track how each node was processed
> > > during an engine run. This commit transforms the 'changed' field in
> > > struct engine_node into a 'state' field. Possible node states are:
> > > - "Stale": data in the node is not up to date with the DB.
> > > - "Updated": data in the node is valid but was updated during
> > >   the last run of the engine.
> > > - "Valid": data in the node is valid and didn't change during
> > >   the last run of the engine.
> > > - "Aborted": during the last run, processing was aborted for
> > >   this node.
> > >
> > > The commit also simplifies the logic of calling engine_run and
> > > engine_need_run in order to reduce the number of external variables
> > > required to track the result of the last engine execution.
> > >
> > > Functions that need to be called from the main loop and depend on
> > > various data contents of the engine's nodes are now called only if
> > > the data is up to date.
> > >
> > > CC: Han Zhou <hzh...@ebay.com>
> > > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency
> > > caused by recompute.")
> > > Signed-off-by: Dumitru Ceara <dce...@redhat.com>
> >
> >
> > Thanks Dumitru for the great fix! This was my omission.
> > I realized that the problem existed even before commit a6b7d9f4f047
> > (so the commit message can be updated), because there are cases where
> > the engine does not run at all. This mostly happens when this node has
> > made an SB update and the corresponding ovsdb notification comes back:
> > since the transaction's reply has not arrived yet, txn is NULL and
> > engine_run() is skipped. If the transaction itself deleted some data
> > that happens to be used by pinctrl_run() in the same iteration, then we
> > would hit the dangling pointer problem. I am not sure why I never
> > encountered such a case though.
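> >
> > For reference, the relevant part of the main loop looks roughly like
> > this (simplified sketch, most arguments elided, not the exact code):
> >
> >     struct ovsdb_idl_txn *ovnsb_idl_txn =
> >         ovsdb_idl_loop_run(&ovnsb_idl_loop);
> >     ...
> >     if (ovnsb_idl_txn) {
> >         engine_run(&en_flow_output, ++engine_run_id);
> >     }
> >     ...
> >     /* If engine_run() was skipped above, pinctrl_run() may dereference
> >      * SB records that the engine never refreshed. */
> >     pinctrl_run(ovnsb_idl_txn, ...);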
> >
>
> Hi Han,
>
> Right, I selected the wrong commit. I'll update the commit message. I
> think it's harder to see the issue because you need to be waiting for
> a transaction reply and at the same time deleting port bindings from
> the SB database.
>
> > In general, one option to fix this is to never store references to
> > another node's data, e.g., always replicate the IDL data into the
> > engine node's own data. In practice this is both inefficient and
> > complex: inefficient because of the data copying, and complex because
> > the data would need to be deep copied.
> >
> > The other option is the one implemented by your patch: maintain a state
> > for each engine node and make sure node data is accessed only when it
> > is valid. I think this is the more practical approach.
> >
>
> I think this will be fixed indirectly when/if pinctrl moves to a
> different process. Until then this seems like the easiest solution.
>
> > I also like the way you handle the case when engine_run is not
> > invoked, by passing run_id in the checks.
> >
> > However, there are still some places where engine node data is used
> > without checking engine_node_data_valid(), such as ofctrl_run() using
> > ed_runtime_data.pending_ct_zones, and ofctrl_inject_pkt() using
> > &ed_addr_sets.addr_sets and &ed_port_groups.port_groups. That data is
> > actually valid all the time - it doesn't hold any reference pointers to
> > its inputs - so it may not be a real problem. Still, it can be
> > confusing that we validate in some places but not in others. So here is
> > an improvement idea: we should always validate the engine node data
> > before accessing it, and the validation mechanism can be improved to
> > reflect each node's real state. We can add an optional interface
> > is_data_valid() to engine_node, which can override the default
> > engine_node_data_valid() behavior: engine_node_data_valid() calls the
> > node's is_data_valid() if it is implemented, and otherwise falls back
> > to the default behavior, which validates only from the node state and
> > run_id. For runtime_data, since part of it is always valid while
> > another part may be invalid, I think we should split pending_ct_zones
> > out as a separate engine node. This improvement is not urgent, so I
> > think it is ok to add it as a follow-up patch.
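> >
> > A rough sketch of what I mean (just to illustrate the idea, not the
> > exact code):
> >
> >     /* In struct engine_node: optional validation method, may be NULL. */
> >     bool (*is_data_valid)(struct engine_node *);
> >
> >     bool
> >     engine_node_data_valid(struct engine_node *node, uint64_t run_id)
> >     {
> >         if (node->is_data_valid) {
> >             /* Node-specific notion of validity. */
> >             return node->is_data_valid(node);
> >         }
> >         /* Default: data is valid only if the node completed the last
> >          * engine run without being aborted. */
> >         return node->run_id == run_id
> >                && (node->state == EN_UPDATED || node->state == EN_VALID);
> >     }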
>
> True, I missed those cases. It does sound better to have an optional
> is_data_valid() node callback. I'll add it in v2. If it's not too much
> work I'll also move pending_ct_zones to a separate engine node.
> Otherwise we can add it as a follow-up.
>
> >
> > Regarding the current patch, I have some other comments below:
> >
> > >
> > >
> > > +                    /* We need to make sure that at least the runtime data
> > > +                     * (e.g., local datapaths, ct zones) are fresh before
> > > +                     * calling ofctrl_put and pinctrl_run to avoid using
> > > +                     * potentially freed references to database records.
> > > +                     */
> > > +                    if (engine_node_data_valid(&en_runtime_data,
> > > +                                               engine_run_id)) {
> > > +                        ofctrl_put(&ed_flow_output.flow_table,
> > > +                                   &ed_runtime_data.pending_ct_zones,
> > > +                                   sbrec_meter_table_get(ovnsb_idl_loop.idl),
> > > +                                   get_nb_cfg(sbrec_sb_global_table_get(
> > > +                                                   ovnsb_idl_loop.idl)),
> > > +                                   engine_node_data_changed(&en_flow_output,
> > > +                                                            engine_run_id));
> >
> >
> > I think ofctrl_put() needs to be invoked every iteration, because it
> > takes care of reinstalling flows to OVS when ovs-vswitchd is restarted.
> > See "need_reinstall_flows" in ofctrl.c.
> > For now, I think it is ok to leave this call out of the
> > engine_node_data_valid() check, because pending_ct_zones is always
> > valid. As an improvement, we should move pending_ct_zones to a separate
> > engine node so that it always passes the check, as sketched below.
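> >
> > Something like this (the node name is hypothetical, and I'm assuming
> > the ENGINE_NODE macro we already use for the other nodes):
> >
> >     /* pending_ct_zones in its own node: it holds no references to
> >      * other nodes' data, so it is valid in every iteration. */
> >     struct ed_type_ct_zones {
> >         struct shash pending_ct_zones;
> >     };
> >     static struct ed_type_ct_zones ed_ct_zones;
> >     ENGINE_NODE(ct_zones, "ct_zones");
> >
> >     /* ofctrl_put() can then be invoked unconditionally, every
> >      * iteration: */
> >     ofctrl_put(&ed_flow_output.flow_table,
> >                &ed_ct_zones.pending_ct_zones,
> >                ...);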
>
> Ack.
>
> >
> > >
> > >
> > > @@ -2095,17 +2118,16 @@ main(int argc, char *argv[])
> > >                  }
> > >
> > >              }
> > > -            if (old_engine_run_id == engine_run_id || !engine_run_done) {
> > > -                if (!engine_run_done || engine_need_run(&en_flow_output)) {
> > > -                    VLOG_DBG("engine did not run, force recompute next time: "
> > > -                             "br_int %p, chassis %p", br_int, chassis);
> > > -                    engine_set_force_recompute(true);
> > > -                    poll_immediate_wake();
> > > -                } else {
> > > -                    VLOG_DBG("engine did not run, and it was not needed"
> > > -                             " either: br_int %p, chassis %p",
> > > -                             br_int, chassis);
> > > -                }
> > > +            if (engine_need_run(&en_flow_output, engine_run_id)) {
> > > +                VLOG_DBG("engine did not run, force recompute next time: "
> > > +                         "br_int %p, chassis %p", br_int, chassis);
> > > +                engine_set_force_recompute(true);
> > > +                poll_immediate_wake();
> > > +            } else if (engine_aborted(&en_flow_output)) {
> > > +                VLOG_DBG("engine was aborted, force recompute next time: "
> > > +                         "br_int %p, chassis %p", br_int, chassis);
> > > +                engine_set_force_recompute(true);
> > > +                poll_immediate_wake();
> >
> >
> > The logic is clearer than before. However, can we keep the DBG log
> > that tells when the engine didn't run and was not needed either? It
> > could be an "else if" here, like:
>
> Sure, I'll add it.
>
> >                 } else if (!engine_has_run(&en_flow_output, engine_run_id)) {
> >                      VLOG_DBG(...)
> >
> > >              } else {
> > >                  engine_set_force_recompute(false);
> > >              }
> > >
> >
> >
> > >
> > > @@ -82,6 +82,20 @@ struct engine_node_input {
> > >      bool (*change_handler)(struct engine_node *node);
> > >  };
> > >
> > > +enum engine_node_state {
> > > +    EN_STALE,     /* Data in the node is not up to date with the DB. */
> > > +    EN_UPDATED,   /* Data in the node is valid but was updated during the
> > > +                   * last run.
> > > +                   */
> > > +    EN_VALID,     /* Data in the node is valid and didn't change during the
> > > +                   * last run.
> > > +                   */
> > > +    EN_ABORTED,   /* During the last run, processing was aborted for
> > > +                   * this node.
> > > +                   */
> > > +    EN_STATE_MAX,
> > > +};
> > > +
> >
> >
> > Do we really need the STALE state? I think it is the same as ABORTED.
> > It seems the node state would never end up as STALE, right? Even so, I
> > am ok with separating them for clarity. Just want to make sure you
> > thought about it.
>
> We don't actually end up with the final state STALE because in
> engine_run() we always move from STALE to VALID if the data didn't
> change and we didn't abort execution. It did help me follow the logic
> of the code though, so I would keep it.
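>
> Roughly, the transitions in engine_run() are (a simplified sketch; the
> conditions are just placeholders):
>
>     node->state = EN_STALE;          /* While inputs are re-processed. */
>     if (recompute_aborted) {
>         node->state = EN_ABORTED;
>     } else if (data_changed) {
>         node->state = EN_UPDATED;
>     } else {
>         /* Data didn't change, but it is in sync with the DB again. */
>         node->state = EN_VALID;
>     }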
>
> >
> > >
> > >  struct engine_node {
> > >      /* A unique id to distinguish each iteration of the engine_run(). */
> > >      uint64_t run_id;
> > > @@ -102,8 +116,8 @@ struct engine_node {
> > >       * node. */
> > >      void *data;
> > >
> > > -    /* Whether the data changed in the last engine run. */
> > > -    bool changed;
> > > +    /* State of the node after the last engine run. */
> > > +    enum engine_node_state state;
> > >
> > >      /* Method to initialize data. It may be NULL. */
> > >      void (*init)(struct engine_node *);
> > > @@ -121,9 +135,9 @@ struct engine_node {
> > >  void engine_init(struct engine_node *);
> > >
> > >  /* Execute the processing recursively, which should be called in the main
> > > - * loop. Returns true if the execution is compelte, false if it is aborted,
> > > - * which could happen when engine_abort_recompute is set. */
> > > -bool engine_run(struct engine_node *, uint64_t run_id);
> > > + * loop. Updates the engine node's states accordingly.
> > > + */
> > > +void engine_run(struct engine_node *, uint64_t run_id);
> > >
> > >  /* Clean up the data for the engine nodes recursively. It calls each node's
> > >   * cleanup() method if not NULL. It should be called before the program
> > > @@ -132,7 +146,7 @@ void engine_cleanup(struct engine_node *);
> > >
> > >  /* Check if engine needs to run, i.e. any change to be processed. */
> >
> >
> > We need to update the comment: Check if engine needs to run but didn't.
>
> Ack.
>
> >
> > >  bool
> > > -engine_need_run(struct engine_node *);
> > > +engine_need_run(struct engine_node *, uint64_t run_id);
> > >
> >
> > Thanks,
> > Han
>
> I'll send a follow up v2 soon.

Hi Han,

v2 is available here: https://patchwork.ozlabs.org/patch/1188930/

In the end I decided not to add a custom "is_data_valid" handler per
node and instead, in ovn-controller, to separate out the data that can
be accessed at any time (regardless of engine_run success). I had
initially tried the custom handler approach but found the code a bit
harder to follow.

What do you think?

Thanks,
Dumitru

>
> Thanks,
> Dumitru
