Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix use of dangling pointers in I-P runtime_data.

2019-11-04 Thread Dumitru Ceara
On Thu, Oct 31, 2019 at 9:21 AM Dumitru Ceara  wrote:
>
> On Thu, Oct 31, 2019 at 6:16 AM Han Zhou  wrote:
> >
> >
> >
> > On Tue, Oct 29, 2019 at 8:11 AM Dumitru Ceara  wrote:
> > >
> > > The incremental processing engine might stop a run before the
> > > en_runtime_data node is processed. In such cases the ed_runtime_data
> > > fields might contain pointers to already deleted SB records. For
> > > example, if a port binding corresponding to a patch port is removed from
> > > the SB database and the incremental processing engine aborts before the
> > > en_runtime_data node is processed then the corresponding local_datapath
> > > hashtable entry in ed_runtime_data is stale and will store a pointer to
> > > the already freed sbrec_port_binding record.
> > >
> > > This will cause invalid memory accesses in various places (e.g.,
> > > pinctrl_run() -> prepare_ipv6_ras()).
> > >
> > > To fix the issue we need a way to track how each node was processed
> > > during an engine run. This commit transforms the 'changed' field in
> > > struct engine_node into a 'state' field. Possible node states are:
> > > - "Stale": data in the node is not up to date with the DB.
> > > - "Updated": data in the node is valid but was updated during
> > >   the last run of the engine.
> > > - "Valid": data in the node is valid and didn't change during
> > >   the last run of the engine.
> > > - "Aborted": during the last run, processing was aborted for
> > >   this node.
> > >
> > > The commit also simplifies the logic of calling engine_run and
> > > engine_need_run in order to reduce the number of external variables
> > > required to track the result of the last engine execution.
> > >
> > > Functions that need to be called from the main loop and depend on
> > > various data contents of the engine's nodes are now called only if
> > > the data is up to date.
> > >
> > > CC: Han Zhou 
> > > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency
> > > caused by recompute.")
> > > Signed-off-by: Dumitru Ceara 
> >
> >
> > Thanks Dumitru for the great fix! This was my omission.
> > I realized that the problem should have existed even before commit
> > a6b7d9f4f047 (so the commit message can be updated), because there are
> > chances the engine would not run at all, mostly caused by a previous SB
> > update from this node whose corresponding ovsdb notification is coming
> > back. At that moment, since the transaction's reply has not arrived yet,
> > the txn is always NULL and engine_run() is always skipped. If the
> > transaction itself deleted some data, which happens to be used by
> > pinctrl_run() in this iteration, then it would hit the dangling pointer
> > problem. I am not sure why I never encountered such a case though.
> >
>
> Hi Han,
>
> Right, I selected the wrong commit. I'll update the commit message. I
> think it's harder to see the issue because you need to be waiting for
> a transaction reply and at the same time deleting port bindings from
> the SB database.
>
> > In general I think one option to fix this is to never store references to
> > another node's data. For example, always replicate the IDL data into the
> > engine node's data. In practice, this is both inefficient and complex. It
> > is inefficient because of the data copy. It is complex because the data
> > needs to be deep copied.
> >
> > The other option is the one implemented by your patch: maintain states
> > for each engine node and make sure node data is accessed only when it is
> > valid. I think this is a more practical approach.
> >
>
> I think this will be fixed indirectly when/if pinctrl moves to a
> different process. Until then this seems like the easiest solution.
>
> > I also like the way you handle the case when engine_run is not invoked, by 
> > passing run_id in the checks.
> >
> > However, there are still some places where some of the engine nodes' data
> > is used without checking engine_node_data_valid(), such as ofctrl_run()
> > using ed_runtime_data.pending_ct_zones, or ofctrl_inject_pkt() using
> > &ed_addr_sets.addr_sets and &ed_port_groups.port_groups. However, those
> > data are actually valid all the time - they don't maintain any reference
> > pointers to their inputs. So it may not be a real problem. However, it can
> > be confusing why at some places we validate but at other places we don't.
> > So here is an improvement idea. Basically, we should always validate the
> > engine node data before accessing it, and the validation mechanism can be
> > improved to reflect each node's real state. We can add an optional
> > interface is_data_valid() for engine_node, which can override the default
> > engine_node_data_valid() behavior. We can let engine_node_data_valid()
> > call the node's is_data_valid() interface if it is implemented, and
> > otherwise fall back to the default behavior, which validates only from the
> > node state and run_id. For runtime_data, since part of it is always valid
> > while other parts may be invalid, I think we should split pending_ct_zones
> > out as a separate engine node. This improvement is not urgent, so I think
> > it is ok to add it as a follow-up patch.

Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix use of dangling pointers in I-P runtime_data.

2019-10-31 Thread Dumitru Ceara
On Thu, Oct 31, 2019 at 6:16 AM Han Zhou  wrote:
>
>
>
> On Tue, Oct 29, 2019 at 8:11 AM Dumitru Ceara  wrote:
> >
> > The incremental processing engine might stop a run before the
> > en_runtime_data node is processed. In such cases the ed_runtime_data
> > fields might contain pointers to already deleted SB records. For
> > example, if a port binding corresponding to a patch port is removed from
> > the SB database and the incremental processing engine aborts before the
> > en_runtime_data node is processed then the corresponding local_datapath
> > hashtable entry in ed_runtime_data is stale and will store a pointer to
> > the already freed sbrec_port_binding record.
> >
> > This will cause invalid memory accesses in various places (e.g.,
> > pinctrl_run() -> prepare_ipv6_ras()).
> >
> > To fix the issue we need a way to track how each node was processed
> > during an engine run. This commit transforms the 'changed' field in
> > struct engine_node into a 'state' field. Possible node states are:
> > - "Stale": data in the node is not up to date with the DB.
> > - "Updated": data in the node is valid but was updated during
> >   the last run of the engine.
> > - "Valid": data in the node is valid and didn't change during
> >   the last run of the engine.
> > - "Aborted": during the last run, processing was aborted for
> >   this node.
> >
> > The commit also simplifies the logic of calling engine_run and
> > engine_need_run in order to reduce the number of external variables
> > required to track the result of the last engine execution.
> >
> > Functions that need to be called from the main loop and depend on
> > various data contents of the engine's nodes are now called only if
> > the data is up to date.
> >
> > CC: Han Zhou 
> > Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency
> > caused by recompute.")
> > Signed-off-by: Dumitru Ceara 
>
>
> Thanks Dumitru for the great fix! This was my omission.
> I realized that the problem should have existed even before commit
> a6b7d9f4f047 (so the commit message can be updated), because there are
> chances the engine would not run at all, mostly caused by a previous SB
> update from this node whose corresponding ovsdb notification is coming
> back. At that moment, since the transaction's reply has not arrived yet,
> the txn is always NULL and engine_run() is always skipped. If the
> transaction itself deleted some data, which happens to be used by
> pinctrl_run() in this iteration, then it would hit the dangling pointer
> problem. I am not sure why I never encountered such a case though.
>

Hi Han,

Right, I selected the wrong commit. I'll update the commit message. I
think it's harder to see the issue because you need to be waiting for
a transaction reply and at the same time deleting port bindings from
the SB database.

> In general I think one option to fix this is to never store references to
> another node's data. For example, always replicate the IDL data into the
> engine node's data. In practice, this is both inefficient and complex. It
> is inefficient because of the data copy. It is complex because the data
> needs to be deep copied.
>
> The other option is the one implemented by your patch: maintain states
> for each engine node and make sure node data is accessed only when it is
> valid. I think this is a more practical approach.
>

I think this will be fixed indirectly when/if pinctrl moves to a
different process. Until then this seems like the easiest solution.

> I also like the way you handle the case when engine_run is not invoked, by 
> passing run_id in the checks.
>
> However, there are still some places where some of the engine nodes' data
> is used without checking engine_node_data_valid(), such as ofctrl_run()
> using ed_runtime_data.pending_ct_zones, or ofctrl_inject_pkt() using
> &ed_addr_sets.addr_sets and &ed_port_groups.port_groups. However, those
> data are actually valid all the time - they don't maintain any reference
> pointers to their inputs. So it may not be a real problem. However, it can
> be confusing why at some places we validate but at other places we don't.
> So here is an improvement idea. Basically, we should always validate the
> engine node data before accessing it, and the validation mechanism can be
> improved to reflect each node's real state. We can add an optional
> interface is_data_valid() for engine_node, which can override the default
> engine_node_data_valid() behavior. We can let engine_node_data_valid()
> call the node's is_data_valid() interface if it is implemented, and
> otherwise fall back to the default behavior, which validates only from the
> node state and run_id. For runtime_data, since part of it is always valid
> while other parts may be invalid, I think we should split pending_ct_zones
> out as a separate engine node. This improvement is not urgent, so I think
> it is ok to add it as a follow-up patch.

True, I missed those cases. It does sound better to 

Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix use of dangling pointers in I-P runtime_data.

2019-10-30 Thread Han Zhou
On Wed, Oct 30, 2019 at 10:15 PM Han Zhou  wrote:
>
> >
> >
> > +                /* We need to make sure that at least the runtime data
> > +                 * (e.g., local datapaths, ct zones) are fresh before
> > +                 * calling ofctrl_put and pinctrl_run to avoid using
> > +                 * potentially freed references to database records.
> > +                 */
> > +                if (engine_node_data_valid(&ed_runtime_data,
> > +                                           engine_run_id)) {
> > +                    ofctrl_put(&ed_flow_output.flow_table,
> > +                               &ed_runtime_data.pending_ct_zones,
> > +                               sbrec_meter_table_get(ovnsb_idl_loop.idl),
> > +                               get_nb_cfg(sbrec_sb_global_table_get(
> > +                                              ovnsb_idl_loop.idl)),
> > +                               engine_node_data_changed(&ed_flow_output,
> > +                                                        engine_run_id));
>
>
> I think ofctrl_put() needs to be invoked every iteration, because it
> takes care of reinstalling flows to OVS when ovs-vswitchd is restarted.
> See "need_reinstall_flows" in ofctrl.c.
> For now, I think it is ok to leave this function out of the
> if (engine_node_data_valid()) check, because pending_ct_zones is always
> valid. For improvement, we should move pending_ct_zones to a separate
> engine node so that it always passes the check.
>

I thought about this again and it seems ok. If ovs-vswitchd happened to be
restarted before this iteration and we now skipped ofctrl_put() because
runtime_data is not valid, then the main loop should be woken up again
immediately, ofctrl_put() will eventually be executed, and the flows will
be reinstalled.


Re: [ovs-dev] [PATCH ovn] ovn-controller: Fix use of dangling pointers in I-P runtime_data.

2019-10-30 Thread Han Zhou
On Tue, Oct 29, 2019 at 8:11 AM Dumitru Ceara  wrote:
>
> The incremental processing engine might stop a run before the
> en_runtime_data node is processed. In such cases the ed_runtime_data
> fields might contain pointers to already deleted SB records. For
> example, if a port binding corresponding to a patch port is removed from
> the SB database and the incremental processing engine aborts before the
> en_runtime_data node is processed then the corresponding local_datapath
> hashtable entry in ed_runtime_data is stale and will store a pointer to
> the already freed sbrec_port_binding record.
>
> This will cause invalid memory accesses in various places (e.g.,
> pinctrl_run() -> prepare_ipv6_ras()).
>
> To fix the issue we need a way to track how each node was processed
> during an engine run. This commit transforms the 'changed' field in
> struct engine_node into a 'state' field. Possible node states are:
> - "Stale": data in the node is not up to date with the DB.
> - "Updated": data in the node is valid but was updated during
>   the last run of the engine.
> - "Valid": data in the node is valid and didn't change during
>   the last run of the engine.
> - "Aborted": during the last run, processing was aborted for
>   this node.
>
> The commit also simplifies the logic of calling engine_run and
> engine_need_run in order to reduce the number of external variables
> required to track the result of the last engine execution.
>
> Functions that need to be called from the main loop and depend on
> various data contents of the engine's nodes are now called only if
> the data is up to date.
>
> CC: Han Zhou 
> Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency
> caused by recompute.")
> Signed-off-by: Dumitru Ceara 


Thanks Dumitru for the great fix! This was my omission.
I realized that the problem should have existed even before commit
a6b7d9f4f047 (so the commit message can be updated), because there are
chances the engine would not run at all, mostly caused by a previous SB
update from this node whose corresponding ovsdb notification is coming
back. At that moment, since the transaction's reply has not arrived yet,
the txn is always NULL and engine_run() is always skipped. If the
transaction itself deleted some data, which happens to be used by
pinctrl_run() in this iteration, then it would hit the dangling pointer
problem. I am not sure why I never encountered such a case though.

In general I think one option to fix this is to never store references to
another node's data. For example, always replicate the IDL data into the
engine node's data. In practice, this is both inefficient and complex. It
is inefficient because of the data copy. It is complex because the data
needs to be deep copied.

The other option is the one implemented by your patch: maintain states
for each engine node and make sure node data is accessed only when it is
valid. I think this is a more practical approach.

I also like the way you handle the case when engine_run is not invoked, by
passing run_id in the checks.

However, there are still some places where some of the engine nodes' data
is used without checking engine_node_data_valid(), such as ofctrl_run()
using ed_runtime_data.pending_ct_zones, or ofctrl_inject_pkt() using
&ed_addr_sets.addr_sets and &ed_port_groups.port_groups. However, those
data are actually valid all the time - they don't maintain any reference
pointers to their inputs. So it may not be a real problem. However, it can
be confusing why at some places we validate but at other places we don't.
So here is an improvement idea. Basically, we should always validate the
engine node data before accessing it, and the validation mechanism can be
improved to reflect each node's real state. We can add an optional
interface is_data_valid() for engine_node, which can override the default
engine_node_data_valid() behavior. We can let engine_node_data_valid()
call the node's is_data_valid() interface if it is implemented, and
otherwise fall back to the default behavior, which validates only from the
node state and run_id. For runtime_data, since part of it is always valid
while other parts may be invalid, I think we should split pending_ct_zones
out as a separate engine node. This improvement is not urgent, so I think
it is ok to add it as a follow-up patch.

Regarding the current patch, I have some other comments below:

>
>
> +                /* We need to make sure that at least the runtime data
> +                 * (e.g., local datapaths, ct zones) are fresh before
> +                 * calling ofctrl_put and pinctrl_run to avoid using
> +                 * potentially freed references to database records.
> +                 */
> +                if (engine_node_data_valid(&ed_runtime_data,
> +                                           engine_run_id)) {
> +                    ofctrl_put(&ed_flow_output.flow_table,
> +