Re: [ovs-dev] [PATCH ovn v3 00/27] ddlog 5x performance improvement

2021-05-20 Thread Ben Pfaff
On Thu, May 20, 2021 at 04:50:58PM -0400, Mark Michelson wrote:
> Hi Ben,
> 
> I gave these patches another look, and I can say that for patches 1-10, and
> patch 27:
> 
> Acked-by: Mark Michelson 
> 
> For patch 11, I had suggested previously a configure-time check to detect if
> the correct DDLog version was installed, and you replied with an example
> that you would roll into the next version of the series. But I don't see it
> here. Is it in a different patch series?
> 
> For patches 12-26, I don't feel qualified to definitively state that it's
> the "best" way to optimize things, but the changes all make sense to me and
> don't seem to be incorrect, so for patches 12-26:
> 
> Acked-by: Mark Michelson 
> 
> but I wouldn't be averse to others having a look to be certain.

Thanks.  I've now applied the whole series, except the last patch, to
master.

Somehow I omitted the configure-time check patch.  I'll post it again.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ofproto-dpif-xlate: fix zone set from non-frozen-metadata fields

2021-05-20 Thread 贺鹏
Hi, Ilya



Ilya Maximets  于2021年5月19日周三 下午8:50写道:
>
> On 2/27/21 10:34 AM, Peng He wrote:
> > CT zone could be set from a field that is not included in frozen
> > metadata. Consider the example rules which are typically seen in
> > OpenStack security group rules:
> >
> > priority=100,in_port=1,tcp,ct_state=-trk,action=ct(zone=5,table=0)
> > priority=100,in_port=1,tcp,ct_state=+trk,action=ct(commit,zone=NXM_NX_CT_ZONE[]),2
> >
> > The zone is set from the first rule's ct action. These two rules will
> > generate two megaflows: the first one uses zone=5 to query the CT module,
> > the second one sets the zone-id from the first megaflow and commit to CT.
> >
> > The current implementation will generate a megaflow that does not use
> > ct_zone=5 as a match, but directly commit into the ct using zone=5, as zone 
> > is
> > set by an Imm not a field.
> >
> > Consider a situation that one changes the zone id (for example to 15)
> > in the first rule, however, still keep the second rule unchanged. During
> > this change, there is traffic hitting the two generated megaflows, the
> > revaldiator would revalidate all megaflows, however, the revalidator will
> > not change the second megaflow, because zone=5 is recorded in the
> > megaflow, so the xlate will still translate the commit action into zone=5,
> > and the new traffic will still commit to CT as zone=5, not zone=15,
> > resulting in taffic drops and other issues.
> >
> > Just like OVS set-field convention, if a field X is set by Y
> > (Y is a variable not an Imm), we should also mask Y as a match
> > in the generated megaflow. An exception is that if the zone-id is
> > set by the field that is included in the frozen state (i.e. regs) and this
> > upcall is a resume of a thawed xlate, the un-wildcarding can be skipped,
> > as the recirc_id is a hash of the values in these fields, and it will change
> > following the changes of these fields. When the recirc_id changes,
> > all megaflows with the old recirc id will be invalid later.
> >
> > Fixes: 07659514c3 ("Add support for connection tracking.")
> > Reported-by: Sai Su 
> > Signed-off-by: Peng He 
> > ---
> >  include/openvswitch/meta-flow.h |  1 +
> >  lib/meta-flow.c | 13 ++
> >  ofproto/ofproto-dpif-xlate.c| 12 +
> >  tests/system-traffic.at | 45 +
> >  4 files changed, 71 insertions(+)
> >
> > diff --git a/include/openvswitch/meta-flow.h 
> > b/include/openvswitch/meta-flow.h
> > index 95e52e358..045dce8f5 100644
> > --- a/include/openvswitch/meta-flow.h
> > +++ b/include/openvswitch/meta-flow.h
> > @@ -2305,6 +2305,7 @@ void mf_set_flow_value_masked(const struct mf_field *,
> >const union mf_value *mask,
> >struct flow *);
> >  bool mf_is_tun_metadata(const struct mf_field *);
> > +bool mf_is_frozen_metadata(const struct mf_field *);
> >  bool mf_is_pipeline_field(const struct mf_field *);
> >  bool mf_is_set(const struct mf_field *, const struct flow *);
> >  void mf_mask_field(const struct mf_field *, struct flow_wildcards *);
> > diff --git a/lib/meta-flow.c b/lib/meta-flow.c
> > index c808d205d..e03cd8d0c 100644
> > --- a/lib/meta-flow.c
> > +++ b/lib/meta-flow.c
> > @@ -1788,6 +1788,19 @@ mf_is_tun_metadata(const struct mf_field *mf)
> > mf->id < MFF_TUN_METADATA0 + TUN_METADATA_NUM_OPTS;
> >  }
> >
> > +bool
> > +mf_is_frozen_metadata(const struct mf_field *mf)
> > +{
> > +if (mf->id >= MFF_TUN_ID && mf->id <= MFF_IN_PORT_OXM) {
> > +return true;
> > +}
> > +
> > +if (mf->id >= MFF_REG0 && mf->id < MFF_ETH_SRC) {
> > +return true;
> > +}
> > +return false;
> > +}
> > +
> >  bool
> >  mf_is_pipeline_field(const struct mf_field *mf)
> >  {
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 7108c8a30..14d00db1e 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -6195,6 +6195,18 @@ compose_conntrack_action(struct xlate_ctx *ctx, 
> > struct ofpact_conntrack *ofc,
> >
> >  if (ofc->zone_src.field) {
> >  zone = mf_get_subfield(>zone_src, >xin->flow);
> > +if (ctx->xin->frozen_state) {
> > +/* If the upcall is a resume of a recirculation, we only need 
> > to
> > + * unwildcard the fields that are not in the frozen_metadata, 
> > as
> > + * when the rules update, OVS will generate a new recirc_id,
> > + * which will invalidate the megaflow with old the recirc_id.
> > + */
> > +if (!mf_is_frozen_metadata(ofc->zone_src.field)) {
> > +mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +}
> > +} else {
> > +mf_mask_field(ofc->zone_src.field, ctx->wc);
> > +}
>
> IIUC, in current code above block is equal to just a single line:
>
> mf_mask_field(ofc->zone_src.field, ctx->wc);
>
> That is because zone 

Re: [ovs-dev] [PATCH ovn 3/3] Honor ACL direction when omitting ct for stateless

2021-05-20 Thread Han Zhou
On Thu, May 20, 2021 at 3:22 PM Han Zhou  wrote:
>
>
>
> On Mon, May 17, 2021 at 2:47 PM Ihar Hrachyshka 
wrote:
> >
> > While we *should not* circumvent conntrack when a stateful ACL of higher
> > priority is present on the switch, we should do so only when
> > allow-stateless and allow-stateful directions are the same, otherwise we
> > should still skip conntrack for allow-stateless ACLs.
> >
> > Fixes: 3187b9fef1 ("ovn-northd: introduce new allow-stateless ACL verb")
>
> Is this patch a "fix" or an "optimization"?
>
> >
> > Signed-off-by: Ihar Hrachyshka 
> > ---
> >  northd/lswitch.dl| 88 ---
> >  northd/ovn-northd.c  | 89 ++--
> >  northd/ovn_northd.dl | 32 
> >  tests/ovn-northd.at  | 72 +++
> >  4 files changed, 213 insertions(+), 68 deletions(-)
> >
> > diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> > index b73cfd047..8a4c9154c 100644
> > --- a/northd/lswitch.dl
> > +++ b/northd/lswitch.dl
> > @@ -135,16 +135,39 @@ LogicalSwitchStatelessACL(ls, acl) :-
> >  LogicalSwitchACL(ls, acl),
> >  nb::ACL(._uuid = acl, .action = "allow-stateless").
> >
> > +relation LogicalSwitchStatelessFromACL(ls: uuid, acl: uuid)
> > +
> > +LogicalSwitchStatelessFromACL(ls, acl) :-
> > +LogicalSwitchStatelessACL(ls, acl),
> > +nb::ACL(._uuid = acl, .direction = "from-lport").
> > +
> > +// "Pitfalls of projections" in ddlog-new-feature.rst explains why this
> > +// is an output relation:
> > +output relation LogicalSwitchHasStatelessFromACL(ls: uuid,
has_stateless_from_acl: bool)
> > +
> > +LogicalSwitchHasStatelessFromACL(ls, true) :-
> > +LogicalSwitchStatelessFromACL(ls, _).
> > +
> > +LogicalSwitchHasStatelessFromACL(ls, false) :-
> > +nb::Logical_Switch(._uuid = ls),
> > +not LogicalSwitchStatelessFromACL(ls, _).
> > +
> > +relation LogicalSwitchStatelessToACL(ls: uuid, acl: uuid)
> > +
> > +LogicalSwitchStatelessToACL(ls, acl) :-
> > +LogicalSwitchStatelessACL(ls, acl),
> > +nb::ACL(._uuid = acl, .direction = "to-lport").
> > +
> >  // "Pitfalls of projections" in ddlog-new-feature.rst explains why this
> >  // is an output relation:
> > -output relation LogicalSwitchHasStatelessACL(ls: uuid,
has_stateless_acl: bool)
> > +output relation LogicalSwitchHasStatelessToACL(ls: uuid,
has_stateless_to_acl: bool)
> >
> > -LogicalSwitchHasStatelessACL(ls, true) :-
> > -LogicalSwitchStatelessACL(ls, _).
> > +LogicalSwitchHasStatelessToACL(ls, true) :-
> > +LogicalSwitchStatelessToACL(ls, _).
> >
> > -LogicalSwitchHasStatelessACL(ls, false) :-
> > +LogicalSwitchHasStatelessToACL(ls, false) :-
> >  nb::Logical_Switch(._uuid = ls),
> > -not LogicalSwitchStatelessACL(ls, _).
> > +not LogicalSwitchStatelessToACL(ls, _).
> >
> >  // "Pitfalls of projections" in ddlog-new-feature.rst explains why this
> >  // is an output relation:
> > @@ -223,18 +246,19 @@ LogicalSwitchHasNonRouterPort(ls, false) :-
> >  /* Switch relation collects all attributes of a logical switch */
> >
> >  relation (
> > -ls:nb::Logical_Switch,
> > -has_stateful_acl:  bool,
> > -has_stateless_acl: bool,
> > -has_acls:  bool,
> > -has_lb_vip:bool,
> > -has_dns_records:   bool,
> > -has_unknown_ports: bool,
> > -localnet_ports:Vec<(uuid, string)>,  // UUID and name of each
localnet port.
> > -subnet:Option<(in_addr/*subnet*/, in_addr/*mask*/,
bit<32>/*start_ipv4*/, bit<32>/*total_ipv4s*/)>,
> > -ipv6_prefix:   Option,
> > -mcast_cfg: Ref,
> > -is_vlan_transparent: bool,
> > +ls: nb::Logical_Switch,
> > +has_stateful_acl:   bool,
> > +has_stateless_from_acl: bool,
> > +has_stateless_to_acl:   bool,
> > +has_acls:   bool,
> > +has_lb_vip: bool,
> > +has_dns_records:bool,
> > +has_unknown_ports:  bool,
> > +localnet_ports: Vec<(uuid, string)>,  // UUID and name of
each localnet port.
> > +subnet: Option<(in_addr/*subnet*/,
in_addr/*mask*/, bit<32>/*start_ipv4*/, bit<32>/*total_ipv4s*/)>,
> > +ipv6_prefix:Option,
> > +mcast_cfg:  Ref,
> > +is_vlan_transparent:bool,
> >
> >  /* Does this switch have at least one port with type != "router"?
*/
> >  has_non_router_port: bool
> > @@ -251,22 +275,24 @@ function ipv6_parse_prefix(s: string):
Option {
> >  }
> >  }
> >
> > -(.ls= ls,
> > -.has_stateful_acl  = has_stateful_acl,
> > -.has_stateless_acl = has_stateless_acl,
> > -.has_acls  = has_acls,
> > -.has_lb_vip= has_lb_vip,
> > -.has_dns_records   = has_dns_records,
> > -.has_unknown_ports = has_unknown_ports,
> > -.localnet_ports= localnet_ports,
> > -.subnet= subnet,
> > -

Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Fix race condition while purging

2021-05-20 Thread Jianbo Liu
The 05/11/2021 11:38, Roi Dayan wrote:
> From: Jianbo Liu 
> 
> There is a race condidtion between purger and handler. Handler may
> create new ukey and install it while executing 'ovs-appctl
> revalidator/purge' command. However, before handler calls
> transition_ukey() in handle_upcalls(), purger may get this ukey from
> umap, then evict and delete it. This will trigger ovs_abort in
> transition_ukey() for handler because it is trying to set state to
> EVICTED or OPERATIONAL, but ukey is already in DELETED state.
> To fix this issue, purger must not delete ukey in VISIBLE state.

Friendly ping for review ...

Thanks!
Jianbo

> 
> Fixes: 98bb4286970d ("tests: Add command to purge revalidators of flows.")
> Signed-off-by: Jianbo Liu 
> Reviewed-by: Roi Dayan 
> ---
>  ofproto/ofproto-dpif-upcall.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index ccf97266c0b9..3add505ff652 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -327,6 +327,12 @@ struct ukey_op {
>  struct dpif_op dop;   /* Flow operation. */
>  };
>  
> +enum sweep_type {
> +PURGE_NONE,
> +PURGE_SOFT,
> +PURGE_HARD,
> +};
> +
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>  static struct ovs_list all_udpifs = OVS_LIST_INITIALIZER(_udpifs);
>  
> @@ -345,7 +351,7 @@ static unsigned long udpif_get_n_flows(struct udpif *);
>  static void revalidate(struct revalidator *);
>  static void revalidator_pause(struct revalidator *);
>  static void revalidator_sweep(struct revalidator *);
> -static void revalidator_purge(struct revalidator *);
> +static void revalidator_purge(struct revalidator *, enum sweep_type);
>  static void upcall_unixctl_show(struct unixctl_conn *conn, int argc,
>  const char *argv[], void *aux);
>  static void upcall_unixctl_disable_megaflows(struct unixctl_conn *, int argc,
> @@ -541,7 +547,7 @@ udpif_stop_threads(struct udpif *udpif, bool delete_flows)
>  
>  if (delete_flows) {
>  for (i = 0; i < udpif->n_revalidators; i++) {
> -revalidator_purge(>revalidators[i]);
> +revalidator_purge(>revalidators[i], PURGE_HARD);
>  }
>  }
>  
> @@ -2772,7 +2778,8 @@ revalidator_pause(struct revalidator *revalidator)
>  }
>  
>  static void
> -revalidator_sweep__(struct revalidator *revalidator, bool purge)
> +revalidator_sweep__(struct revalidator *revalidator,
> +enum sweep_type sweep_type)
>  {
>  struct udpif *udpif;
>  uint64_t dump_seq, reval_seq;
> @@ -2803,13 +2810,13 @@ revalidator_sweep__(struct revalidator *revalidator, 
> bool purge)
>  }
>  ukey_state = ukey->state;
>  if (ukey_state == UKEY_OPERATIONAL
> -|| (ukey_state == UKEY_VISIBLE && purge)) {
> +|| (ukey_state == UKEY_VISIBLE && sweep_type == PURGE_HARD)) 
> {
>  struct recirc_refs recircs = RECIRC_REFS_EMPTY_INITIALIZER;
>  bool seq_mismatch = (ukey->dump_seq != dump_seq
>   && ukey->reval_seq != reval_seq);
>  enum reval_result result;
>  
> -if (purge) {
> +if (sweep_type > PURGE_NONE) {
>  result = UKEY_DELETE;
>  } else if (!seq_mismatch) {
>  result = UKEY_KEEP;
> @@ -2856,13 +2863,13 @@ revalidator_sweep__(struct revalidator *revalidator, 
> bool purge)
>  static void
>  revalidator_sweep(struct revalidator *revalidator)
>  {
> -revalidator_sweep__(revalidator, false);
> +revalidator_sweep__(revalidator, PURGE_NONE);
>  }
>  
>  static void
> -revalidator_purge(struct revalidator *revalidator)
> +revalidator_purge(struct revalidator *revalidator, enum sweep_type 
> sweep_type)
>  {
> -revalidator_sweep__(revalidator, true);
> +revalidator_sweep__(revalidator, sweep_type);
>  }
>  
>  /* In reaction to dpif purge, purges all 'ukey's with same 'pmd_id'. */
> @@ -3056,7 +3063,7 @@ upcall_unixctl_purge(struct unixctl_conn *conn, int 
> argc OVS_UNUSED,
>  int n;
>  
>  for (n = 0; n < udpif->n_revalidators; n++) {
> -revalidator_purge(>revalidators[n]);
> +revalidator_purge(>revalidators[n], PURGE_SOFT);
>  }
>  }
>  unixctl_command_reply(conn, "");
> -- 
> 2.8.0
> 

-- 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/3] ovn-northd, ovn-northd-ddlog: New --dry-run option.

2021-05-20 Thread 0-day Robot
Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line lacks whitespace around operator
#70 FILE: northd/ovn-northd-ddlog.c:1053:
  --dry-run start in paused state (do not commit db changes)\n\

WARNING: Line lacks whitespace around operator
#226 FILE: northd/ovn-northd.c:404:
  --dry-run start in paused state (do not commit db changes)\n\

Lines checked: 416, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 1/3] ovn-northd-ddlog: Document --ddlog-record option.

2021-05-20 Thread 0-day Robot
Bleep bloop.  Greetings Ben Pfaff, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
WARNING: Line lacks whitespace around operator
#20 FILE: northd/ovn-northd-ddlog.c:1052:
  --ddlog-record=FILE.TXT   record db changes to replay later for debugging\n\

Lines checked: 47, Warnings: 3, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] physical: do not forward traffic from localport to a localnet one

2021-05-20 Thread Numan Siddique
On Thu, May 20, 2021 at 5:05 PM Mark Michelson  wrote:
>
> Acked-by: Mark Michelson
>
> I have a small nit below, but since it's strictly with a comment, it's
> easy for whoever merges this to correct the comment when they do it.
> There's no reason to spin up a new version of this.
>
> On 5/4/21 1:59 PM, Lorenzo Bianconi wrote:
> > Since the localnet port is available on each hv, do not forward traffic
> > to the localnet port if it is present in order to avoid switch fdb
> > misconfiguration.
> > Related bz: https://bugzilla.redhat.com/show_bug.cgi?id=1942877
> >
> > Signed-off-by: Lorenzo Bianconi 
> > ---
> >   controller/physical.c| 23 +++
> >   include/ovn/logical-fields.h |  4 
> >   tests/ovn.at | 17 +
> >   3 files changed, 44 insertions(+)
> >
> > diff --git a/controller/physical.c b/controller/physical.c
> > index 96c959d18..258842634 100644
> > --- a/controller/physical.c
> > +++ b/controller/physical.c
> > @@ -1193,6 +1193,11 @@ consider_port_binding(struct ovsdb_idl_index 
> > *sbrec_port_binding_by_name,
> >
> >   load_logical_ingress_metadata(binding, _ids, ofpacts_p);
> >
> > +if (!strcmp(binding->type, "localport")) {
> > +/* mark the packet as incoming from a localport */
> > +put_load(1, MFF_LOG_FLAGS, MLF_LOCALPORT_BIT, 1, ofpacts_p);
> > +}
> > +
> >   /* Resubmit to first logical ingress pipeline table. */
> >   put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
> >   ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
> > @@ -1251,6 +1256,24 @@ consider_port_binding(struct ovsdb_idl_index 
> > *sbrec_port_binding_by_name,
> > ofport, flow_table);
> >   }
> >
> > +/* Table 34, priority 160.
>
> Nit: Table 34 is incorrect here. OFTABLE_CHECK_LOOPBACK is table 39 now.
> I think the issue here is that when the tables shifted, nobody updated
> the comments in this file, and so it's wrong here and several other
> places in physical.c
>

I corrected the comment and applied the patch to the main branch.

Numan

> > + * ===
> > + *
> > + * Do not forward local traffic from a localport to a localnet 
> > port.
> > + */
> > +if (!strcmp(binding->type, "localnet")) {
> > +/* do not forward traffic from localport to localnet port */
> > +match_init_catchall();
> > +ofpbuf_clear(ofpacts_p);
> > +match_set_metadata(, htonll(dp_key));
> > +match_set_reg(, MFF_LOG_OUTPORT - MFF_REG0, port_key);
> > +match_set_reg_masked(, MFF_LOG_FLAGS - MFF_REG0,
> > + MLF_LOCALPORT, MLF_LOCALPORT);
> > +ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 160,
> > +binding->header_.uuid.parts[0], ,
> > +ofpacts_p, >header_.uuid);
> > +}
> > +
> >   } else if (!tun && !is_ha_remote) {
> >   /* Remote port connected by localnet port */
> >   /* Table 33, priority 100.
> > diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
> > index d44b30b30..ef97117b9 100644
> > --- a/include/ovn/logical-fields.h
> > +++ b/include/ovn/logical-fields.h
> > @@ -67,6 +67,7 @@ enum mff_log_flags_bits {
> >   MLF_LOOKUP_LB_HAIRPIN_BIT = 7,
> >   MLF_LOOKUP_FDB_BIT = 8,
> >   MLF_SKIP_SNAT_FOR_LB_BIT = 9,
> > +MLF_LOCALPORT_BIT = 10,
> >   };
> >
> >   /* MFF_LOG_FLAGS_REG flag assignments */
> > @@ -107,6 +108,9 @@ enum mff_log_flags {
> >   /* Indicate that a packet must not SNAT in the gateway router when
> >* load-balancing has taken place. */
> >   MLF_SKIP_SNAT_FOR_LB = (1 << MLF_SKIP_SNAT_FOR_LB_BIT),
> > +
> > +/* Indicate the packet has been received from a localport */
> > +MLF_LOCALPORT = (1 << MLF_LOCALPORT_BIT),
> >   };
> >
> >   /* OVN logical fields
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index fe6a7c85b..99764b24b 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -11823,10 +11823,17 @@ OVN_FOR_EACH_NORTHD([
> >   AT_SETUP([ovn -- localport suppress gARP])
> >   ovn_start
> >
> > +send_garp() {
> > +local inport=$1 eth_src=$2 eth_dst=$3 spa=$4 tpa=$5
> > +local 
> > request=${eth_dst}${eth_src}08060001080006040001${eth_src}${spa}${eth_dst}${tpa}
> > +as hv1 ovs-appctl netdev-dummy/receive vif$inport $request
> > +}
> > +
> >   net_add n1
> >   sim_add hv1
> >   as hv1
> >   check ovs-vsctl add-br br-phys
> > +ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> >   ovn_attach n1 br-phys 192.168.0.1
> >
> >   check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> > @@ -11837,6 +11844,7 @@ check ovn-nbctl ls-add ls \
> >   -- lsp-set-addresses lp "00:00:00:00:00:01 10.0.0.1" \
> >   -- lsp-add ls ln \
> >   -- lsp-set-type ln localnet 

[ovs-dev] [PATCH ovn 2/3] ovn-northd, ovn-northd-ddlog: New --dry-run option.

2021-05-20 Thread Ben Pfaff
By being able to start up ovn-northd-ddlog in a paused state, we can
produce a recording for use in debugging without having to restart one
of the real ovn-northd-ddlog processes or disturbing the system.

Signed-off-by: Ben Pfaff 
CC: Ihar Hrachyshka 
---
 NEWS  |  2 +-
 northd/ovn-northd-ddlog.c | 43 ---
 northd/ovn-northd.8.xml   | 25 ---
 northd/ovn-northd.c   | 19 -
 tests/ovn-macros.at   | 22 ++--
 tests/ovn-northd.at   | 16 +++
 6 files changed, 88 insertions(+), 39 deletions(-)

diff --git a/NEWS b/NEWS
index f96ed73f0823..96c9397361a2 100644
--- a/NEWS
+++ b/NEWS
@@ -18,7 +18,7 @@ Post-v21.03.0
 datapath flows with this field used.
   - Introduce a new "allow-stateless" ACL verb to always bypass connection
 tracking. The existing "allow" verb behavior is left intact.
-  - Added support in native DNS to respond to PTR request types.
+  - New --dry-run option for ovn-northd and ovn-northd-ddlog.
 
 OVN v21.03.0 - 12 Mar 2021
 -
diff --git a/northd/ovn-northd-ddlog.c b/northd/ovn-northd-ddlog.c
index c79e15312b64..73bf5290eb35 100644
--- a/northd/ovn-northd-ddlog.c
+++ b/northd/ovn-northd-ddlog.c
@@ -159,7 +159,8 @@ northd_ctx_create(const char *server, const char *database,
   ddlog_prog ddlog, ddlog_delta *delta,
   const char **input_relations,
   const char **output_relations,
-  const char **output_only_relations)
+  const char **output_only_relations,
+  bool paused)
 {
 struct northd_ctx *ctx = xmalloc(sizeof *ctx);
 *ctx = (struct northd_ctx) {
@@ -175,7 +176,7 @@ northd_ctx_create(const char *server, const char *database,
 .db_name = database,
 /* 'output_only_relations' will get filled in later. */
 .lock_name = lock_name,
-.paused = false,
+.paused = paused,
 };
 
 ovsdb_cs_set_remote(ctx->cs, server, true);
@@ -1041,7 +1042,7 @@ static void
 usage(void)
 {
 printf("\
-%s: OVN northbound management daemon\n\
+%s: OVN northbound management daemon (DDlog version)\n\
 usage: %s [OPTIONS]\n\
 \n\
 Options:\n\
@@ -1049,6 +1050,7 @@ Options:\n\
 (default: %s)\n\
   --ovnsb-db=DATABASE   connect to ovn-sb database at DATABASE\n\
 (default: %s)\n\
+  --dry-run start in paused state (do not commit db changes)\n\
   --ddlog-record=FILE.TXT   record db changes to replay later for debugging\n\
   --unixctl=SOCKET  override default control socket name\n\
   -h, --helpdisplay this help message\n\
@@ -1061,22 +1063,25 @@ Options:\n\
 }
 
 static void
-parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
+parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED,
+  bool *pause)
 {
 enum {
 OVN_DAEMON_OPTION_ENUMS,
 VLOG_OPTION_ENUMS,
 SSL_OPTION_ENUMS,
+OPT_DRY_RUN,
 OPT_DDLOG_RECORD
 };
 static const struct option long_options[] = {
-{"ddlog-record", required_argument, NULL, OPT_DDLOG_RECORD},
 {"ovnsb-db", required_argument, NULL, 'd'},
 {"ovnnb-db", required_argument, NULL, 'D'},
 {"unixctl", required_argument, NULL, 'u'},
 {"help", no_argument, NULL, 'h'},
 {"options", no_argument, NULL, 'o'},
 {"version", no_argument, NULL, 'V'},
+{"dry-run", no_argument, NULL, OPT_DRY_RUN},
+{"ddlog-record", required_argument, NULL, OPT_DDLOG_RECORD},
 OVN_DAEMON_LONG_OPTIONS,
 VLOG_LONG_OPTIONS,
 STREAM_SSL_LONG_OPTIONS,
@@ -1097,10 +1102,6 @@ parse_options(int argc OVS_UNUSED, char *argv[] 
OVS_UNUSED)
 VLOG_OPTION_HANDLERS;
 STREAM_SSL_OPTION_HANDLERS;
 
-case OPT_DDLOG_RECORD:
-record_file = optarg;
-break;
-
 case 'd':
 ovnsb_db = optarg;
 break;
@@ -1125,6 +1126,14 @@ parse_options(int argc OVS_UNUSED, char *argv[] 
OVS_UNUSED)
 ovs_print_version(0, 0);
 exit(EXIT_SUCCESS);
 
+case OPT_DRY_RUN:
+*pause = true;
+break;
+
+case OPT_DDLOG_RECORD:
+record_file = optarg;
+break;
+
 default:
 break;
 }
@@ -1148,6 +1157,10 @@ main(int argc, char *argv[])
 struct unixctl_server *unixctl;
 int retval;
 bool exiting;
+struct northd_status status = {
+.locked = false,
+.pause = false,
+};
 
 init_table_ids();
 
@@ -1155,7 +1168,7 @@ main(int argc, char *argv[])
 ovs_cmdl_proctitle_init(argc, argv);
 set_program_name(argv[0]);
 service_start(, );
-parse_options(argc, argv);
+parse_options(argc, argv, );
 
 daemonize_start(false);
 
@@ -1167,10 +1180,6 @@ main(int argc, char *argv[])
 

[ovs-dev] [PATCH ovn 3/3] tests: Don't define tests that will always be skipped.

2021-05-20 Thread Ben Pfaff
The "(northbound|southbound) database reconnection" tests had
dp-groups=yes variants but they were unconditionally skipped at runtime.
There's no point in having them, so this commit drops them.

This changes the changes of the tests without datapath groups to have
"dp-groups=no" in their names.  That seems like an OK change to me,
but it can easily be reverted if people don't like it.

Signed-off-by: Ben Pfaff 
---
 tests/ovn-macros.at | 26 --
 tests/ovn-northd.at | 12 ++--
 2 files changed, 14 insertions(+), 24 deletions(-)

diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index 2217131ab234..cd02b6986cc2 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -546,17 +546,15 @@ OVS_END_SHELL_HELPERS
 
 m4_define([OVN_POPULATE_ARP], [AT_CHECK(ovn_populate_arp__, [0], [ignore])])
 
-m4_define([OVN_FOR_EACH_NORTHD], [dnl
-m4_pushdef([NORTHD_TYPE], [ovn-northd])dnl
-m4_pushdef([NORTHD_USE_DP_GROUPS], [yes])dnl
-$1
-m4_popdef([NORTHD_USE_DP_GROUPS])dnl
-$1
-m4_popdef([NORTHD_TYPE])dnl
-m4_pushdef([NORTHD_TYPE], [ovn-northd-ddlog])dnl
-m4_pushdef([NORTHD_USE_DP_GROUPS], [yes])dnl
-$1
-m4_popdef([NORTHD_USE_DP_GROUPS])dnl
-$1
-m4_popdef([NORTHD_TYPE])dnl
-])
+# Defines a versions of a test with all combinations of northd and
+# datapath groups.
+m4_define([OVN_FOR_EACH_NORTHD],
+  [m4_foreach([NORTHD_TYPE], [ovn-northd, ovn-northd-ddlog],
+ [m4_foreach([NORTHD_USE_DP_GROUPS], [yes, no], [$1
+])])])
+
+# Some tests aren't prepared for dp groups to be enabled.
+m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
+  [m4_foreach([NORTHD_TYPE], [ovn-northd, ovn-northd-ddlog],
+ [m4_foreach([NORTHD_USE_DP_GROUPS], [no], [$1
+])])])
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 28a46702cf48..cf7e8a0604ed 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -734,13 +734,9 @@ wait_row_count Datapath_Binding 2
 AT_CLEANUP
 ])
 
-OVN_FOR_EACH_NORTHD([
+OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS([
 AT_SETUP([ovn -- northbound database reconnection])
 
-dnl This test doesn't take into account flows that are shared between
-dnl datapaths when datapath groups are enabled.
-AT_SKIP_IF([test NORTHD_USE_DP_GROUPS = yes])
-
 ovn_start --backup-northd=none
 
 # Check that ovn-northd is active, by verifying that it creates and
@@ -774,13 +770,9 @@ wait_row_count Logical_Flow $(expr 2 \* $lf)
 AT_CLEANUP
 ])
 
-OVN_FOR_EACH_NORTHD([
+OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS([
 AT_SETUP([ovn -- southbound database reconnection])
 
-dnl This test doesn't take into account flows that are shared between
-dnl datapaths when datapath groups are enabled.
-AT_SKIP_IF([test NORTHD_USE_DP_GROUPS = yes])
-
 ovn_start --backup-northd=none
 
 # Check that ovn-northd is active, by verifying that it creates and
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 1/3] ovn-northd-ddlog: Document --ddlog-record option.

2021-05-20 Thread Ben Pfaff
Signed-off-by: Ben Pfaff 
---
 northd/ovn-northd-ddlog.c |  1 +
 northd/ovn-northd.8.xml   | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/northd/ovn-northd-ddlog.c b/northd/ovn-northd-ddlog.c
index b7d2c8a5ef8d..c79e15312b64 100644
--- a/northd/ovn-northd-ddlog.c
+++ b/northd/ovn-northd-ddlog.c
@@ -1049,6 +1049,7 @@ Options:\n\
 (default: %s)\n\
   --ovnsb-db=DATABASE   connect to ovn-sb database at DATABASE\n\
 (default: %s)\n\
+  --ddlog-record=FILE.TXT   record db changes to replay later for debugging\n\
   --unixctl=SOCKET  override default control socket name\n\
   -h, --helpdisplay this help message\n\
   -o, --options list available options\n\
diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index bca9cb4f4e79..b7214b05e7ed 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -42,6 +42,16 @@
 as the default.  Otherwise, the default is
 unix:@RUNDIR@/ovnsb_db.sock.
   
+  --ddlog-record=file
+  
+This option is for ovn-north-ddlog only.  It causes the
+daemon to record the initial database state and later changes to
+file in the text-based DDlog command format.  The
+ovn_northd_cli program can later replay these changes for
+debugging purposes.  This option has a performance impact.  See
+debugging-ddlog.rst in the OVN documentation for more
+details.
+  
 
 
   database in the above options must be an OVSDB active or
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 0/3] Add --dry-run option to ovn-northd{, -ddlog}

2021-05-20 Thread Ben Pfaff
As discussed in the OVN meeting earlier today.

Ben Pfaff (3):
  ovn-northd-ddlog: Document --ddlog-record option.
  ovn-northd, ovn-northd-ddlog: New --dry-run option.
  tests: Don't define tests that will always be skipped.

 NEWS  |  2 +-
 northd/ovn-northd-ddlog.c | 44 ++-
 northd/ovn-northd.8.xml   | 35 +---
 northd/ovn-northd.c   | 19 
 tests/ovn-macros.at   | 48 +++
 tests/ovn-northd.at   | 28 ---
 6 files changed, 113 insertions(+), 63 deletions(-)

-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd: Fix incorrect datapath flows for Gateway Router with Load Balancers.

2021-05-20 Thread numans
From: Mark Gray 

When sending traffic from a Logical Switch Port to an external
IP address via a Gateway Router with a Load Balancer, Open vSwitch
creates datapath flows with a couple of problems:

1) In the router pipeline, we have the following stages to handle
dnat and unsnat.

 - Stage 4 : lr_in_defrag (dnat zone)
 - Stage 5 : lr_in_unsnat (snat zone)
 - Stage 6 : lr_in_dnat   (dnat zone)

In the reply direction, the order of traversal of the tables
"lr_in_defrag", "lr_in_unsnat" and "lr_in_dnat" add incorrect
datapath flows that check ct_state in the wrong conntrack zone.
This is illustrated below where reply trafic enters the physical host
port (6) and traverses DNAT zone (14), SNAT zone (default), back to the
DNAT zone and then on to Logical Switch Port zone (22). The third
flow is incorrectly checking the state from the SNAT zone instead
of the DNAT zone.

recirc_id(0),in_port(6),ct_state(-new-est-rel-rpl-trk) 
actions:ct_clear,ct(zone=14),recirc(0xf)
recirc_id(0xf),in_port(6) actions:ct(nat),recirc(0x10)
recirc_id(0x10),in_port(6),ct_state(-new+est+trk) 
actions:ct(zone=14,nat),recirc(0x11)
recirc_id(0x11),in_port(6),ct_state(+new-est-rel-rpl+trk) actions: 
ct(zone=22,nat),recirc(0x12)
recirc_id(0x12),in_port(6),ct_state(-new+est-rel+rpl+trk) actions:5

2) Efficiencies can be gained by using the ct_dnat action in the
table "lr_in_defrag" instead of ct_next. This removes the need for the
ct_dnat action for established Load Balancer flows.

This patch resolves these issues.

Co-authored-by: Numan Siddique 
Signed-off-by: Mark Gray 
Signed-off-by: Numan Siddique 
---
 northd/ovn-northd.8.xml | 124 +
 northd/ovn-northd.c |  61 +++--
 northd/ovn_northd.dl|  82 --
 tests/ovn-northd.at | 551 
 tests/ovn.at|   6 +-
 tests/system-ovn.at |   7 +-
 6 files changed, 674 insertions(+), 157 deletions(-)

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index bca9cb4f4e..701134389e 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -2599,39 +2599,7 @@ icmp6 {
   
 
 
-Ingress Table 4: DEFRAG
-
-
-  This is to send packets to connection tracker for tracking and
-  defragmentation.  It contains a priority-0 flow that simply moves traffic
-  to the next table.
-
-
-
-  If load balancing rules with virtual IP addresses (and ports) are
-  configured in OVN_Northbound database for a Gateway router,
-  a priority-100 flow is added for each configured virtual IP address
-  VIP. For IPv4 VIPs the flow matches ip
-   ip4.dst == VIP.  For IPv6 VIPs,
-  the flow matches ip  ip6.dst == VIP.
-  The flow uses the action ct_next; to send IP packets to the
-  connection tracker for packet de-fragmentation and tracking before
-  sending it to the next table.
-
-
-
-  If ECMP routes with symmetric reply are configured in the
-  OVN_Northbound database for a gateway router, a priority-100
-  flow is added for each router port on which symmetric replies are
-  configured. The matching logic for these ports essentially reverses the
-  configured logic of the ECMP route. So for instance, a route with a
-  destination routing policy will instead match if the source IP address
-  matches the static route's prefix. The flow uses the action
-  ct_next to send IP packets to the connection tracker for
-  packet de-fragmentation and tracking before sending it to the next table.
-
-
-Ingress Table 5: UNSNAT
+Ingress Table 4: UNSNAT
 
 
   This is for already established connections' reverse traffic.
@@ -2640,7 +2608,7 @@ icmp6 {
   unSNATted here.
 
 
-Ingress Table 5: UNSNAT on Gateway and Distributed Routers
+Ingress Table 4: UNSNAT on Gateway and Distributed Routers
 
   
 
@@ -2667,7 +2635,7 @@ icmp6 {
   
 
 
-Ingress Table 5: UNSNAT on Gateway Routers
+Ingress Table 4: UNSNAT on Gateway Routers
 
 
   
@@ -2684,9 +2652,10 @@ icmp6 {
   lb_force_snat_ip=router_ip then for every logical router
   port P attached to the Gateway router with the router ip
   B, a priority-110 flow is added with the match
-  inport == P  ip4.dst == 
B or
-  inport == P  ip6.dst == 
B
-  with an action ct_snat; .
+  inport == P 
+  ip4.dst == B or inport == P
+   ip6.dst == B with an action
+  ct_snat; .
 
 
 
@@ -2716,7 +2685,7 @@ icmp6 {
   
 
 
-Ingress Table 5: UNSNAT on Distributed Routers
+Ingress Table 4: UNSNAT on Distributed Routers
 
 
   
@@ -2747,6 +2716,40 @@ icmp6 {
   
 
 
+Ingress Table 5: DEFRAG
+
+
+  This is to send packets to connection tracker for tracking and
+  defragmentation.  It contains a priority-0 flow that simply moves traffic
+  to the next table.
+
+
+
+  If load 

Re: [ovs-dev] [PATCH ovn 3/3] Honor ACL direction when omitting ct for stateless

2021-05-20 Thread Han Zhou
On Mon, May 17, 2021 at 2:47 PM Ihar Hrachyshka  wrote:
>
> While we *should not* circumvent conntrack when a stateful ACL of higher
> priority is present on the switch, we should do so only when
> allow-stateless and allow-stateful directions are the same, otherwise we
> should still skip conntrack for allow-stateless ACLs.
>
> Fixes: 3187b9fef1 ("ovn-northd: introduce new allow-stateless ACL verb")

Is this patch a "fix" or an "optimization"?

>
> Signed-off-by: Ihar Hrachyshka 
> ---
>  northd/lswitch.dl| 88 ---
>  northd/ovn-northd.c  | 89 ++--
>  northd/ovn_northd.dl | 32 
>  tests/ovn-northd.at  | 72 +++
>  4 files changed, 213 insertions(+), 68 deletions(-)
>
> diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> index b73cfd047..8a4c9154c 100644
> --- a/northd/lswitch.dl
> +++ b/northd/lswitch.dl
> @@ -135,16 +135,39 @@ LogicalSwitchStatelessACL(ls, acl) :-
>  LogicalSwitchACL(ls, acl),
>  nb::ACL(._uuid = acl, .action = "allow-stateless").
>
> +relation LogicalSwitchStatelessFromACL(ls: uuid, acl: uuid)
> +
> +LogicalSwitchStatelessFromACL(ls, acl) :-
> +LogicalSwitchStatelessACL(ls, acl),
> +nb::ACL(._uuid = acl, .direction = "from-lport").
> +
> +// "Pitfalls of projections" in ddlog-new-feature.rst explains why this
> +// is an output relation:
> +output relation LogicalSwitchHasStatelessFromACL(ls: uuid,
has_stateless_from_acl: bool)
> +
> +LogicalSwitchHasStatelessFromACL(ls, true) :-
> +LogicalSwitchStatelessFromACL(ls, _).
> +
> +LogicalSwitchHasStatelessFromACL(ls, false) :-
> +nb::Logical_Switch(._uuid = ls),
> +not LogicalSwitchStatelessFromACL(ls, _).
> +
> +relation LogicalSwitchStatelessToACL(ls: uuid, acl: uuid)
> +
> +LogicalSwitchStatelessToACL(ls, acl) :-
> +LogicalSwitchStatelessACL(ls, acl),
> +nb::ACL(._uuid = acl, .direction = "to-lport").
> +
>  // "Pitfalls of projections" in ddlog-new-feature.rst explains why this
>  // is an output relation:
> -output relation LogicalSwitchHasStatelessACL(ls: uuid,
has_stateless_acl: bool)
> +output relation LogicalSwitchHasStatelessToACL(ls: uuid,
has_stateless_to_acl: bool)
>
> -LogicalSwitchHasStatelessACL(ls, true) :-
> -LogicalSwitchStatelessACL(ls, _).
> +LogicalSwitchHasStatelessToACL(ls, true) :-
> +LogicalSwitchStatelessToACL(ls, _).
>
> -LogicalSwitchHasStatelessACL(ls, false) :-
> +LogicalSwitchHasStatelessToACL(ls, false) :-
>  nb::Logical_Switch(._uuid = ls),
> -not LogicalSwitchStatelessACL(ls, _).
> +not LogicalSwitchStatelessToACL(ls, _).
>
>  // "Pitfalls of projections" in ddlog-new-feature.rst explains why this
>  // is an output relation:
> @@ -223,18 +246,19 @@ LogicalSwitchHasNonRouterPort(ls, false) :-
>  /* Switch relation collects all attributes of a logical switch */
>
>  relation (
> -ls:nb::Logical_Switch,
> -has_stateful_acl:  bool,
> -has_stateless_acl: bool,
> -has_acls:  bool,
> -has_lb_vip:bool,
> -has_dns_records:   bool,
> -has_unknown_ports: bool,
> -localnet_ports:Vec<(uuid, string)>,  // UUID and name of each
localnet port.
> -subnet:Option<(in_addr/*subnet*/, in_addr/*mask*/,
bit<32>/*start_ipv4*/, bit<32>/*total_ipv4s*/)>,
> -ipv6_prefix:   Option,
> -mcast_cfg: Ref,
> -is_vlan_transparent: bool,
> +ls: nb::Logical_Switch,
> +has_stateful_acl:   bool,
> +has_stateless_from_acl: bool,
> +has_stateless_to_acl:   bool,
> +has_acls:   bool,
> +has_lb_vip: bool,
> +has_dns_records:bool,
> +has_unknown_ports:  bool,
> +localnet_ports: Vec<(uuid, string)>,  // UUID and name of
each localnet port.
> +subnet: Option<(in_addr/*subnet*/, in_addr/*mask*/,
bit<32>/*start_ipv4*/, bit<32>/*total_ipv4s*/)>,
> +ipv6_prefix:Option,
> +mcast_cfg:  Ref,
> +is_vlan_transparent:bool,
>
>  /* Does this switch have at least one port with type != "router"? */
>  has_non_router_port: bool
> @@ -251,22 +275,24 @@ function ipv6_parse_prefix(s: string):
Option {
>  }
>  }
>
> -(.ls= ls,
> -.has_stateful_acl  = has_stateful_acl,
> -.has_stateless_acl = has_stateless_acl,
> -.has_acls  = has_acls,
> -.has_lb_vip= has_lb_vip,
> -.has_dns_records   = has_dns_records,
> -.has_unknown_ports = has_unknown_ports,
> -.localnet_ports= localnet_ports,
> -.subnet= subnet,
> -.ipv6_prefix   = ipv6_prefix,
> -.mcast_cfg = mcast_cfg,
> -.has_non_router_port = has_non_router_port,
> -.is_vlan_transparent = is_vlan_transparent) :-
> +(.ls = ls,
> +.has_stateful_acl   = has_stateful_acl,
> +

Re: [ovs-dev] [PATCH ovn 2/3] Honor allow priority when stateless present

2021-05-20 Thread Han Zhou
On Mon, May 17, 2021 at 2:47 PM Ihar Hrachyshka  wrote:
>
> For each allow-stateless ACL, a rule was added earlier in the pipeline
> that circumvented setting REGBIT_CONNTRACK_DEFRAG regardless of
> whether other, e.g. allow ACLs with higher priority were present.
>
> Now, when allow-stateless ACLs are present on the switch with other
> allow-related ACLs, for each allow, insert an early pipeline rule that
> would set DEFRAG bit for the corresponding match and priority.
>
> Fixes: 3187b9fef1 ("ovn-northd: introduce new allow-stateless ACL verb")
>

Looks good except that flow changes need to be documented.
Acked-by: Han Zhou 

> Signed-off-by: Ihar Hrachyshka 
> ---
>  northd/ovn-northd.c  |  3 ++
>  northd/ovn_northd.dl |  2 +-
>  tests/ovn-northd.at  | 72 
>  3 files changed, 76 insertions(+), 1 deletion(-)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 26b723165..a410be1de 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -5076,6 +5076,9 @@ build_stateful_override_filters(struct ovn_datapath
*od,
>  apply_to_each_acl_of_action(
>  od, port_groups, lflows, "allow-related",
>  build_stateful_override_filter);
> +apply_to_each_acl_of_action(
> +od, port_groups, lflows, "allow",
> +build_stateful_override_filter);
>  }
>
>  static void
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 7c2402be4..fc703f62e 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -1846,7 +1846,7 @@ for ((.sw = sw@{.ls = ls}, .acl =
, .has_fair_meter = _)) {
>
>  for ((.sw = sw@{.ls = ls}, .acl = , .has_fair_meter
= _)) {
>  if (sw.has_stateless_acl) {
> -if (acl.action == "allow-related") {
> +if ((sw.has_stateful_acl and acl.action == "allow") or
acl.action == "allow-related") {
>  if (acl.direction == "from-lport") {
>  Flow(.logical_datapath = ls._uuid,
>   .stage= s_SWITCH_IN_PRE_ACL(),
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index da75434b6..6c38e1a97 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -2755,6 +2755,78 @@ output("lsp2");
>  AT_CLEANUP
>  ])
>
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn -- ACL priority: allow-stateless vs allow])
> +ovn_start
> +
> +ovn-nbctl ls-add ls
> +ovn-nbctl lsp-add ls lsp1
> +ovn-nbctl lsp-set-addresses lsp1 00:00:00:00:00:01
> +ovn-nbctl lsp-add ls lsp2
> +ovn-nbctl lsp-set-addresses lsp2 00:00:00:00:00:02
> +
> +for direction in from to; do
> +ovn-nbctl acl-add ls ${direction}-lport 3 "tcp" allow
> +done
> +ovn-nbctl --wait=sb sync
> +
> +flow_eth='eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
> +flow_ip='ip.ttl==64 && ip4.src == 42.42.42.1 && ip4.dst == 66.66.66.66'
> +flow_tcp='tcp && tcp.dst == 80'
> +flow_udp='udp && udp.dst == 80'
> +lsp1_inport=$(fetch_column Port_Binding tunnel_key logical_port=lsp1)
> +
> +# TCP packets should not go to conntrack.
> +flow="inport == \"lsp1\" && ${flow_eth} && ${flow_ip} && ${flow_tcp}"
> +AT_CHECK_UNQUOTED([ovn-trace --minimal ls "${flow}"], [0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +output("lsp2");
> +])
> +
> +# Add allow-stateless with lower priority.
> +for direction in from to; do
> +ovn-nbctl acl-add ls ${direction}-lport 1 tcp allow-stateless
> +done
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should still not go to conntrack (allow ACL in effect).
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +output("lsp2");
> +])
> +
> +# Add allow-related to the switch.
> +for direction in from to; do
> +ovn-nbctl acl-add ls ${direction}-lport 3 icmp allow-related
> +done
> +ovn-nbctl --wait=sb sync
> +
> +# TCP packets should now go to conntrack (allow ACL in effect, flipped
its meaning to apply conntrack)
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl
> +#
tcp,reg14=0x${lsp1_inport},vlan_tci=0x,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=42.42.42.1,nw_dst=66.66.66.66,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=0,tp_dst=80,tcp_flags=0
> +ct_next(ct_state=new|trk) {
> +ct_next(ct_state=new|trk) {
> +output("lsp2");
> +};
> +};
> +])
> +
> +# Add allow-stateless with higher priority.
> +for direction in from to; do
> +ovn-nbctl acl-add ls ${direction}-lport 5 tcp allow-stateless
> +done
> +ovn-nbctl --wait=sb sync
> +
> +
> +# TCP packets should no longer go to conntrack (the new allow-stateless
ACL in effect).
> +AT_CHECK_UNQUOTED([ovn-trace --ct new --ct new --minimal ls "${flow}"],
[0], [dnl

Re: [ovs-dev] [PATCH ovn 1/3] Honor allow-related priority when stateless present

2021-05-20 Thread Han Zhou
On Mon, May 17, 2021 at 2:47 PM Ihar Hrachyshka  wrote:
>
> For each allow-stateless ACL, a rule was added earlier in the pipeline
> that circumvented setting REGBIT_CONNTRACK_DEFRAG regardless of
> whether other, e.g. allow-related ACLs with higher priority were
> present.
>
> Now, when allow-stateless ACLs are present on the switch, for each
> allow-related, insert an early pipeline rule that would set DEFRAG bit
> for the corresponding match and priority.
>
> Fixes: 3187b9fef1 ("ovn-northd: introduce new allow-stateless ACL verb")

Thanks for the Fix. I think we need to update northd document for the flow
changes. Please also see some inline comments below.

>
> Signed-off-by: Ihar Hrachyshka 
> ---
>  northd/lswitch.dl|  20 +
>  northd/ovn-northd.c  |  91 +++
>  northd/ovn_northd.dl |  24 ++-
>  tests/ovn-northd.at  | 100 ---
>  4 files changed, 210 insertions(+), 25 deletions(-)
>
> diff --git a/northd/lswitch.dl b/northd/lswitch.dl
> index a1aaebb6d..b73cfd047 100644
> --- a/northd/lswitch.dl
> +++ b/northd/lswitch.dl
> @@ -129,6 +129,23 @@ LogicalSwitchHasStatefulACL(ls, false) :-
>  nb::Logical_Switch(._uuid = ls),
>  not LogicalSwitchStatefulACL(ls, _).
>
> +relation LogicalSwitchStatelessACL(ls: uuid, acl: uuid)
> +
> +LogicalSwitchStatelessACL(ls, acl) :-
> +LogicalSwitchACL(ls, acl),
> +nb::ACL(._uuid = acl, .action = "allow-stateless").
> +
> +// "Pitfalls of projections" in ddlog-new-feature.rst explains why this
> +// is an output relation:
> +output relation LogicalSwitchHasStatelessACL(ls: uuid,
has_stateless_acl: bool)
> +
> +LogicalSwitchHasStatelessACL(ls, true) :-
> +LogicalSwitchStatelessACL(ls, _).
> +
> +LogicalSwitchHasStatelessACL(ls, false) :-
> +nb::Logical_Switch(._uuid = ls),
> +not LogicalSwitchStatelessACL(ls, _).
> +
>  // "Pitfalls of projections" in ddlog-new-feature.rst explains why this
>  // is an output relation:
>  output relation LogicalSwitchHasACLs(ls: uuid, has_acls: bool)
> @@ -208,6 +225,7 @@ LogicalSwitchHasNonRouterPort(ls, false) :-
>  relation (
>  ls:nb::Logical_Switch,
>  has_stateful_acl:  bool,
> +has_stateless_acl: bool,
>  has_acls:  bool,
>  has_lb_vip:bool,
>  has_dns_records:   bool,
> @@ -235,6 +253,7 @@ function ipv6_parse_prefix(s: string):
Option {
>
>  (.ls= ls,
>  .has_stateful_acl  = has_stateful_acl,
> +.has_stateless_acl = has_stateless_acl,
>  .has_acls  = has_acls,
>  .has_lb_vip= has_lb_vip,
>  .has_dns_records   = has_dns_records,
> @@ -247,6 +266,7 @@ function ipv6_parse_prefix(s: string):
Option {
>  .is_vlan_transparent = is_vlan_transparent) :-
>  nb::Logical_Switch[ls],
>  LogicalSwitchHasStatefulACL(ls._uuid, has_stateful_acl),
> +LogicalSwitchHasStatelessACL(ls._uuid, has_stateless_acl),
>  LogicalSwitchHasACLs(ls._uuid, has_acls),
>  LogicalSwitchHasLBVIP(ls._uuid, has_lb_vip),
>  LogicalSwitchHasDNSRecords(ls._uuid, has_dns_records),
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 7464b7fba..26b723165 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -4983,6 +4983,38 @@ skip_port_from_conntrack(struct ovn_datapath *od,
struct ovn_port *op,
>  ds_destroy(_out);
>  }
>
> +static bool
> +apply_to_each_acl_of_action(struct ovn_datapath *od,
> +const struct hmap *port_groups,
> +struct hmap *lflows, const char *action,
> +void (*func)(struct ovn_datapath *,
> + const struct nbrec_acl *,
> + struct hmap *))
> +{
> +bool applied = false;
> +for (size_t i = 0; i < od->nbs->n_acls; i++) {
> +const struct nbrec_acl *acl = od->nbs->acls[i];
> +if (!strcmp(acl->action, action)) {
> +func(od, acl, lflows);
> +applied = true;
> +}
> +}
> +
> +struct ovn_port_group *pg;
> +HMAP_FOR_EACH (pg, key_node, port_groups) {
> +if (ovn_port_group_ls_find(pg, >nbs->header_.uuid)) {
> +for (size_t i = 0; i < pg->nb_pg->n_acls; i++) {
> +const struct nbrec_acl *acl = pg->nb_pg->acls[i];
> +if (!strcmp(acl->action, action)) {
> +func(od, acl, lflows);
> +applied = true;
> +}
> +}
> +}
> +}
> +return applied;
> +}
> +
>  static void
>  build_stateless_filter(struct ovn_datapath *od,
> const struct nbrec_acl *acl,
> @@ -5003,28 +5035,47 @@ build_stateless_filter(struct ovn_datapath *od,
>  }
>  }
>
> -static void
> -build_stateless_filters(struct ovn_datapath *od, struct hmap
*port_groups,
> +static bool
> +build_stateless_filters(struct ovn_datapath 

Re: [ovs-dev] [PATCH ovn] physical: do not forward traffic from localport to a localnet one

2021-05-20 Thread Mark Michelson

Acked-by: Mark Michelson

I have a small nit below, but since it's strictly with a comment, it's 
easy for whoever merges this to correct the comment when they do it. 
There's no reason to spin up a new version of this.


On 5/4/21 1:59 PM, Lorenzo Bianconi wrote:

Since the localnet port is available on each hv, do not forward traffic
to the localnet port if it is present in order to avoid switch fdb
misconfiguration.
Related bz: https://bugzilla.redhat.com/show_bug.cgi?id=1942877

Signed-off-by: Lorenzo Bianconi 
---
  controller/physical.c| 23 +++
  include/ovn/logical-fields.h |  4 
  tests/ovn.at | 17 +
  3 files changed, 44 insertions(+)

diff --git a/controller/physical.c b/controller/physical.c
index 96c959d18..258842634 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -1193,6 +1193,11 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
  
  load_logical_ingress_metadata(binding, _ids, ofpacts_p);
  
+if (!strcmp(binding->type, "localport")) {

+/* mark the packet as incoming from a localport */
+put_load(1, MFF_LOG_FLAGS, MLF_LOCALPORT_BIT, 1, ofpacts_p);
+}
+
  /* Resubmit to first logical ingress pipeline table. */
  put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
  ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
@@ -1251,6 +1256,24 @@ consider_port_binding(struct ovsdb_idl_index 
*sbrec_port_binding_by_name,
ofport, flow_table);
  }
  
+/* Table 34, priority 160.


Nit: Table 34 is incorrect here. OFTABLE_CHECK_LOOPBACK is table 39 now. 
I think the issue here is that when the tables shifted, nobody updated 
the comments in this file, and so it's wrong here and several other 
places in physical.c



+ * ===
+ *
+ * Do not forward local traffic from a localport to a localnet port.
+ */
+if (!strcmp(binding->type, "localnet")) {
+/* do not forward traffic from localport to localnet port */
+match_init_catchall();
+ofpbuf_clear(ofpacts_p);
+match_set_metadata(, htonll(dp_key));
+match_set_reg(, MFF_LOG_OUTPORT - MFF_REG0, port_key);
+match_set_reg_masked(, MFF_LOG_FLAGS - MFF_REG0,
+ MLF_LOCALPORT, MLF_LOCALPORT);
+ofctrl_add_flow(flow_table, OFTABLE_CHECK_LOOPBACK, 160,
+binding->header_.uuid.parts[0], ,
+ofpacts_p, >header_.uuid);
+}
+
  } else if (!tun && !is_ha_remote) {
  /* Remote port connected by localnet port */
  /* Table 33, priority 100.
diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
index d44b30b30..ef97117b9 100644
--- a/include/ovn/logical-fields.h
+++ b/include/ovn/logical-fields.h
@@ -67,6 +67,7 @@ enum mff_log_flags_bits {
  MLF_LOOKUP_LB_HAIRPIN_BIT = 7,
  MLF_LOOKUP_FDB_BIT = 8,
  MLF_SKIP_SNAT_FOR_LB_BIT = 9,
+MLF_LOCALPORT_BIT = 10,
  };
  
  /* MFF_LOG_FLAGS_REG flag assignments */

@@ -107,6 +108,9 @@ enum mff_log_flags {
  /* Indicate that a packet must not SNAT in the gateway router when
   * load-balancing has taken place. */
  MLF_SKIP_SNAT_FOR_LB = (1 << MLF_SKIP_SNAT_FOR_LB_BIT),
+
+/* Indicate the packet has been received from a localport */
+MLF_LOCALPORT = (1 << MLF_LOCALPORT_BIT),
  };
  
  /* OVN logical fields

diff --git a/tests/ovn.at b/tests/ovn.at
index fe6a7c85b..99764b24b 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -11823,10 +11823,17 @@ OVN_FOR_EACH_NORTHD([
  AT_SETUP([ovn -- localport suppress gARP])
  ovn_start
  
+send_garp() {

+local inport=$1 eth_src=$2 eth_dst=$3 spa=$4 tpa=$5
+local 
request=${eth_dst}${eth_src}08060001080006040001${eth_src}${spa}${eth_dst}${tpa}
+as hv1 ovs-appctl netdev-dummy/receive vif$inport $request
+}
+
  net_add n1
  sim_add hv1
  as hv1
  check ovs-vsctl add-br br-phys
+ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
  ovn_attach n1 br-phys 192.168.0.1
  
  check ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys

@@ -11837,6 +11844,7 @@ check ovn-nbctl ls-add ls \
  -- lsp-set-addresses lp "00:00:00:00:00:01 10.0.0.1" \
  -- lsp-add ls ln \
  -- lsp-set-type ln localnet \
+-- lsp-set-addresses ln unknown \
  -- lsp-set-options ln network_name=phys \
  -- lsp-add ls lsp \
  -- lsp-set-addresses lsp "00:00:00:00:00:02 10.0.0.2"
@@ -11870,6 +11878,15 @@ AT_CHECK([
  test 0 -eq $pkts
  ])
  
+spa=$(ip_to_hex 10 0 0 1)

+tpa=$(ip_to_hex 10 0 0 100)
+send_garp 1 0001  $spa $tpa
+
+dnl traffic from localport should not be sent to localnet
+AT_CHECK([tcpdump -r hv1/br-phys_n1-tx.pcap arp[[24:4]]=0x0a64 | wc 
-l],[0],[dnl
+0
+],[ignore])
+
  OVN_CLEANUP([hv1])
 

Re: [ovs-dev] [PATCH ovn v3 00/27] ddlog 5x performance improvement

2021-05-20 Thread Mark Michelson

Hi Ben,

I gave these patches another look, and I can say that for patches 1-10, 
and patch 27:


Acked-by: Mark Michelson 

For patch 11, I had suggested previously a configure-time check to 
detect if the correct DDLog version was installed, and you replied with 
an example that you would roll into the next version of the series. But 
I don't see it here. Is it in a different patch series?


For patches 12-26, I don't feel qualified to definitively state that 
it's the "best" way to optimize things, but the changes all make sense 
to me and don't seem to be incorrect, so for patches 12-26:


Acked-by: Mark Michelson 

but I wouldn't be averse to others having a look to be certain.

Thanks!

On 5/7/21 12:06 AM, Ben Pfaff wrote:

This series of patches greatly improves the performance of
ovn-northd-ddlog with the benchmark added in the final patch.  The
first three patches improve both the benchmark for both versions of
ovn-northd.

Here are the timings that I measure in each case.  All of them
include the benefit of the first three patches.  Without those
patches, the C version takes over 500 seconds and the other take much
longer too; the relative timings aren't affected much, it's just all
slower:

C: 106.8s (0.135s ... 1.043s)
ddlog before optimization patches: 176.0s (0.128s ... 1.804s)
ddlog after optimization patches:   35.2s (0.129s ... 0.147s)

(I haven't re-measured for v3.)

v1->v2:
   - Don't remove --output-only-table option from ovsdb2ddlog2c
 in "ovn-northd-ddlog: Intern selected input relations.".
   - New patches "ovn-nbctl: Daemon mode is no longer experimental."
 and "ovn-nbctl: Recommend ovn-appctl instead of ovs-appctl."
 and make similar changes to new ovn-sbctl manpage.
   - Update ovn-sbctl and ovn-nbctl manpages to reference ovn-appctl
 manpage.
   - Various trivial changes suggested by checkpatch.
   - New patches "ovn-nbctl: Fix memory leak in client mode."
 and "ovn-northd-ddlog: Fix two memory leaks." fix memory leaks
 reported by Numan and found by Address Sanitizer.
   - Fix bug introduced into ovsdb2ddlog2c in "ovn-northd-ddlog: Intern
 selected input relations."

v2->v3:
   - Rebase.
   - Apply Mark Michelson's comments for "ovn-sbctl: Add daemon support."
 and to "tests: Miscellaneous debuggability improvements.".
   - New patch "ovn-nbctl: Don't replicate entire database unnecessarily."

Ben Pfaff (12):
   ovn-northd-ddlog: Fix two memory leaks.
   ovn-nbctl: Fix memory leak in client mode.
   ovn-nbctl: Improve manpage.
   ovn-nbctl: Recommend ovn-appctl instead of ovs-appctl.
   ovn-nbctl: Daemon mode is no longer experimental.
   ovn-nbctl: Refactor into infrastructure and northbound details.
   ovn-dbctl: Fix memory leak in client mode.
   ovn-sbctl: Add daemon support.
   ovn-nbctl: Don't replicate entire database unnecessarily.
   tests: Miscellaneous debuggability improvements.
   ovn-northd-ddlog: Preserve NB_Global more carefully.
   tutorial: Add benchmarking test script to run within sandbox.

Leonid Ryzhyk (15):
   ovn-northd-ddlog: Upgrade to ddlog 0.38.
   ovn-northd-ddlog: Remove `lr` field from `Router`.
   ovn-northd-ddlog: Intern the `Router` table.
   ovn-northd-ddlog: Workaround for slow group_by.
   ovn-northd-ddlog: Intern the Switch table.
   ovn-northd-ddlog: Remove `ls` field from `Switch`.
   ovn-northd-ddlog: Intern the SwitchPort table.
   ovn-northd-ddlog: Intern the RouterPort table.
   ovn-northd-ddlog: Remove unused function.
   ovn-northd-ddlog: Eliminate remaining Ref's.
   ovn-northd-ddlog: Eliminate redundant dereferences.
   ovn-northd-ddlog: Intern selected input relations.
   ovn-northd-ddlog: Intern nb::Logical_Router_Port.
   ovn-northd-ddlog: Intern nb::Logical_Switch_Port.
   ovn-northd-ddlog: Remove Router.static_routes.

  NEWS  |5 +-
  manpages.mk   |   17 -
  northd/helpers.dl |   40 +-
  northd/ipam.dl|   61 +-
  northd/lrouter.dl |  188 +--
  northd/lswitch.dl |  243 ++--
  northd/multicast.dl   |   77 +-
  northd/ovn-nb.dlopts  |   10 +
  northd/ovn-northd-ddlog.c |   23 +-
  northd/ovn-sb.dlopts  |1 +
  northd/ovn_northd.dl  | 1045 ---
  northd/ovsdb2ddlog2c  |4 +-
  tests/ovn-sbctl.at|   76 +-
  tests/ovn.at  |   51 +-
  tutorial/automake.mk  |3 +-
  tutorial/northd_ddlog_test.sh |   81 ++
  utilities/automake.mk |   12 +-
  utilities/ovn-dbctl.c | 1230 +
  utilities/ovn-dbctl.h |   61 +
  utilities/ovn-nbctl.8.xml |  667 +
  utilities/ovn-nbctl.c | 2385 ++---
  utilities/ovn-sbctl.8.in  |  317 -
  utilities/ovn-sbctl.8.xml |  580 
  utilities/ovn-sbctl.c |  669 ++---
  24 files changed, 4464 insertions(+), 3382 deletions(-)
  create mode 

Re: [ovs-dev] [PATCH ovn 2/5] ovn-northd: Add support for CoPP.

2021-05-20 Thread Mark Michelson

On 4/29/21 1:04 PM, Lorenzo Bianconi wrote:

From: Dumitru Ceara 

Add new 'Copp' (Control plane protection) table to OVN Northbound DB:
- this stores mappings between control plane protocol names and meters
   that should be used to rate limit controller-destined traffic for
   those protocols.

Add new 'copp' columns to the following OVN Northbound DB tables:
- Logical_Switch
- Logical_Router

For now, no control plane protection policy is installed for any of
the existing flows that punt packets to ovn-controller. This will be
added in follow-up patches.

Add CLI commands in 'ovn-nbctl' to allow the user to manage Control
Plane Protection Policies at different levels (logical switch,
logical router).

Co-authored-by: Lorenzo Bianconi 
Signed-off-by: Lorenzo Bianconi 
Signed-off-by: Dumitru Ceara 
---
  lib/automake.mk   |   2 +
  lib/copp.c| 144 
  lib/copp.h|  60 ++
  northd/ovn-northd.c   |  44 +++---
  ovn-nb.ovsschema  |  18 +++-
  ovn-nb.xml|  78 ++
  tests/ovn-northd.at   |  49 +++
  utilities/ovn-nbctl.8.xml | 116 ++
  utilities/ovn-nbctl.c | 167 ++
  9 files changed, 665 insertions(+), 13 deletions(-)
  create mode 100644 lib/copp.c
  create mode 100644 lib/copp.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 781be2109..20e296fff 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -9,6 +9,8 @@ lib_libovn_la_SOURCES = \
lib/actions.c \
lib/chassis-index.c \
lib/chassis-index.h \
+   lib/copp.c \
+   lib/copp.h \
lib/ovn-dirs.h \
lib/expr.c \
lib/extend-table.h \
diff --git a/lib/copp.c b/lib/copp.c
new file mode 100644
index 0..ac53a1094
--- /dev/null
+++ b/lib/copp.c
@@ -0,0 +1,144 @@
+/* Copyright (c) 2021, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+
+#include "openvswitch/shash.h"
+#include "db-ctl-base.h"
+#include "smap.h"
+#include "lib/ovn-nb-idl.h"
+#include "lib/copp.h"
+
+static char *copp_proto_names[COPP_PROTO_MAX] = {
+[COPP_ARP]   = "arp",
+[COPP_ARP_RESOLVE]   = "arp-resolve",
+[COPP_DHCPV4_OPTS]   = "dhcpv4-opts",
+[COPP_DHCPV6_OPTS]   = "dhcpv6-opts",
+[COPP_DNS]   = "dns",
+[COPP_EVENT_ELB] = "event-elb",
+[COPP_ICMP4_ERR] = "icmp4-error",
+[COPP_ICMP6_ERR] = "icmp6-error",
+[COPP_IGMP]  = "igmp",
+[COPP_ND_NA] = "nd-na",
+[COPP_ND_NS] = "nd-ns",
+[COPP_ND_NS_RESOLVE] = "nd-ns-resolve",
+[COPP_ND_RA_OPTS]= "nd-ra-opts",
+[COPP_TCP_RESET] = "tcp-reset",
+[COPP_BFD]   = "bfd",
+};
+
+const char *
+copp_proto_get(enum copp_proto proto)
+{
+if (proto >= COPP_PROTO_MAX) {
+return "";
+}
+return copp_proto_names[proto];
+}
+
+const char *
+copp_meter_get(enum copp_proto proto, const struct nbrec_copp *copp,
+   const struct shash *meter_groups)
+{
+if (!copp || proto >= COPP_PROTO_MAX) {
+return NULL;
+}
+
+const char *meter = smap_get(>meters, copp_proto_names[proto]);
+
+if (meter && shash_find(meter_groups, meter)) {
+return meter;
+}
+
+return NULL;
+}
+
+void
+copp_list(struct ctl_context *ctx, const struct nbrec_copp *copp)
+{
+if (!copp) {
+return;
+}
+
+struct smap_node *node;
+
+SMAP_FOR_EACH (node, >meters) {
+ds_put_format(>output, "%s: %s\n", node->key, node->value);
+}
+}
+
+const struct nbrec_copp *
+copp_add_meter(struct ctl_context *ctx, const struct nbrec_copp *copp,
+   const char *proto_name, const char *meter)
+{
+if (!copp) {
+copp = nbrec_copp_insert(ctx->txn);
+}
+
+struct smap meters;
+smap_init();
+smap_clone(, >meters);
+smap_replace(, proto_name, meter);
+nbrec_copp_set_meters(copp, );
+smap_destroy();
+
+return copp;
+}
+
+void
+copp_del_meter(const struct nbrec_copp *copp, const char *proto_name)
+{
+if (!copp) {
+return;
+}
+
+if (proto_name) {
+if (smap_get(>meters, proto_name)) {
+struct smap meters;
+smap_init();
+smap_clone(, >meters);
+smap_remove(, proto_name);
+nbrec_copp_set_meters(copp, );
+smap_destroy();
+}
+

Re: [ovs-dev] [PATCH ovn 1/5] ovn-controller: Add support for Logical_Flow control meters

2021-05-20 Thread Mark Michelson
I think this patch could use some ovn-controller tests to ensure that 
meters configured in the SB end up being applied to the resulting 
controller() actions.


On 4/29/21 1:04 PM, Lorenzo Bianconi wrote:

From: Dumitru Ceara 

Add a new 'controller_meter' column to OVN Southbound Logical_Flow
table. This stores an optional string which should correspond to
the Meter that must be used for rate limiting controller actions
generated by packets hitting the flow.

Add a new 'ofctrl_add_flow_metred' function to create a new 'ovn_flow'


This spelling is odd to me. I think it should be 
"ofctrl_add_flow_metered". I was worried when making this suggestion I 
was being US-centric with my spelling suggestion, but I think other 
variants of English also use the "meter" spelling instead of "metre" 
when referring to a regulating tool. Therefore, I think "metered" is 
preferred over "metred".


(Someone from Canada or from east of the Atlantic can correct me if I'm 
mistaken here).



with an attached controller meter.

Change ofctrl_check_and_add_flow to allow specifying a meter ID for
packets that are punted to controller.

Change consider_logical_flow to parse controller_meter from the logical
flow and use it when building openflow entries.

Add a new 'ctrl_meter_id' field to 'struct ovnact_encode_params' to be
used when encoding controller actions from logical flow actions.

Co-authored-by: Lorenzo Bianconi 
Signed-off-by: Lorenzo Bianconi 
Signed-off-by: Dumitru Ceara 
---
  controller/lflow.c| 40 +++---
  controller/ofctrl.c   | 54 ---
  controller/ofctrl.h   | 21 ++
  controller/physical.c |  7 +++--
  include/ovn/actions.h |  2 ++
  lib/actions.c | 66 +++
  ovn-sb.ovsschema  |  6 ++--
  ovn-sb.xml|  6 
  8 files changed, 139 insertions(+), 63 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index b8424e1fb..f3f901c32 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -557,6 +557,27 @@ update_conj_id_ofs(uint32_t *conj_id_ofs, uint32_t n_conjs)
  return false;
  }
  
+static void

+lflow_parse_ctrl_meter(const struct sbrec_logical_flow *lflow,
+   struct ovn_extend_table *meter_table,
+   uint32_t *meter_id)
+{
+ovs_assert(meter_id);
+*meter_id = NX_CTLR_NO_METER;
+
+if (lflow->controller_meter) {
+*meter_id = ovn_extend_table_assign_id(meter_table,
+   lflow->controller_meter,
+   lflow->header_.uuid);
+if (*meter_id == EXT_TABLE_ID_INVALID) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+VLOG_WARN_RL(, "Unable to assign id for meter: %s",
+ lflow->controller_meter);
+return;
+}
+}
+}
+
  static void
  add_matches_to_flow_table(const struct sbrec_logical_flow *lflow,
const struct sbrec_datapath_binding *dp,
@@ -572,6 +593,13 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
  .dp = dp,
  };
  
+/* Parse any meter to be used if this flow should punt packets to

+ * controller.
+ */
+uint32_t ctrl_meter_id = NX_CTLR_NO_METER;
+lflow_parse_ctrl_meter(lflow, l_ctx_out->meter_table,
+   _meter_id);
+
  /* Encode OVN logical actions into OpenFlow. */
  uint64_t ofpacts_stub[1024 / 8];
  struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(ofpacts_stub);
@@ -595,6 +623,7 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
  .ct_snat_vip_ptable = OFTABLE_CT_SNAT_FOR_VIP,
  .fdb_ptable = OFTABLE_GET_FDB,
  .fdb_lookup_ptable = OFTABLE_LOOKUP_FDB,
+.ctrl_meter_id = ctrl_meter_id,
  };
  ovnacts_encode(ovnacts->data, ovnacts->size, , );
  
@@ -621,9 +650,11 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow,

  }
  }
  if (!m->n) {
-ofctrl_add_flow(l_ctx_out->flow_table, ptable, lflow->priority,
-lflow->header_.uuid.parts[0], >match, ,
->header_.uuid);
+ofctrl_add_flow_metred(l_ctx_out->flow_table, ptable,
+   lflow->priority,
+   lflow->header_.uuid.parts[0], >match,
+   , >header_.uuid,
+   ctrl_meter_id);
  } else {
  uint64_t conj_stubs[64 / 8];
  struct ofpbuf conj;
@@ -641,7 +672,8 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
  
  ofctrl_add_or_append_flow(l_ctx_out->flow_table, ptable,

lflow->priority, 0,
-  >match, , >header_.uuid);
+   

Re: [ovs-dev] Moving of the primary #openvswitch channel to irc.libera.chat ?

2021-05-20 Thread Ihar Hrachyshka
During the irc meeting today, a question of @openstack bot was raised.
We use the bot to manage and log our weekly meetings, and we would
like to continue using it after migration.

I talked to openstack infra folks (specifically, @fungi), and here is
what I am told:
- openstack still hasn't decided whether to move anywhere from freenode;
- if they move, they will move to oftc that was long time ago
designated as their backup;
- they pre-created all official channels for their sub-projects in
oftc; #openvswitch is not openstack project and so is not owned by
them there;
- I am told that even the fact the bot was enabled for freenode
#openvswitch was probably a mistake since they usually require
ownership for the channel to do that. Regardless,
https://review.opendev.org/c/openstack/project-config/+/714174/ was
merged.

It doesn't seem like there are plans to ever support bot on libera
network. They MAY support it on oftc is tech committee decides to move
(still on agenda). If we want to continue using the bot, we will have
to run our own instance (both the bot itself as well as the web server
to publish logs).

It may be easier to move to oftc and not libera if we want to continue
using the bot without taking over its maintenance. Regardless, we may
have to formally pass over ownership of the channel to openstack
infra.

If you ask my personal opinion, I would take a timeout and watch what
other projects (especially openstack that we rely on here) do and then
decide. Freenode is still operating and can serve the project needs
until we have a better picture.

I hope it helps,
Ihar

On Wed, May 19, 2021 at 4:04 PM Ilya Maximets  wrote:
>
> Hi.
>
> Taking into account some very unhealthy things that happened recently
> with FreeNode network and resigning of lots of its stuff [1], we
> probably need to discuss if Open vSwitch project wants to change the
> IRC server for a primary #openvswitch channel.  User's data is the
> main concern, IIUC, as it's unclear what the new management will
> do with the network.
>
> The main alternative now seems to be the Libera.Chat [2] where most of
> the former FreeNode stuff.
>
> Some projects already announced [3][4] movement to Libera.Chat.  Others
> are discussing the possibility [5].
>
> So, I think, it make sense to discuss the future of #openvswitch
> channel too.  Any thoughts?
>
> Will we have an OVN meeting on a different server tomorrow?
>
> Best regards, Ilya Maximets.
>
> [1] https://blog.bofh.it/debian/id_461
> [2] https://libera.chat/
> [3] 
> https://blog.centos.org/2021/05/centos-irc-channels-moving-to-irc-libera-chat/
> [4] https://sourcehut.org/blog/2021-05-19-liberachat/
> [5] https://mail.kde.org/pipermail/kde-community/2021q2/006888.html
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] CT Meeting Minutes - 20-May

2021-05-20 Thread Aaron Conole
Next meeting: 03-June, 2021

Attendees:
* Aaron
* Paolo
* Korol
* Gaetan

Agenda:

A Security concerns, raised by NVidia
  - tcp_loose mode support
- Kernel supports this via sysctl knob
- Userspace has no such knob or function
- Aconole: will start work on this as a separate series
- Q: Should this be configured via ovs-appctl or ovsdb?
  - synproxy support
- Not used in kernel space, not existing in userspace
- Needs more details (what is the use case, etc).  Lots of technical
  details missing

B Usability:
  - Flow HW offloads
 * Nvidia POV, wants to enable HWOL but running into issues
  - Discrepencies between sw & hw datapaths
 * Tcp_loose / tcp_liberal - should these be set via the DB
 * Tcp_liberal / no seq checking is already defaulted in kernel, and
   cannot be modified at run time
 * Aconole: Maybe we can fire an RFC to match this behavior
 * Should tcp_liberal be configurable via the OVSDB, or tied to HWOL?
  - IP Fragmentation behavior in kernel does not match userspace
 * Userspace ct() calls do not lose fragment boundaries, while
   kernel space ct() calls can:
   
https://patchwork.kernel.org/project/netdevbpf/patch/20210319204307.3128280-1-acon...@redhat.com/

Patch statuses
Q: How to increase reviews?
 - Checking with internal teams for more assistance


Individual Patch Status Discussions:
1 
https://patchwork.ozlabs.org/project/openvswitch/patch/161943952616.327630.4676425878426520994.stgit@ebuild/
  - Aaron to reply upstream re: issues with Dumitru's suggestion
  - Paolo V to also reply

2 https://patchwork.ozlabs.org/project/openvswitch/list/?series=217356=*
  - Gaetan to submit a new revision

3 https://patchwork.ozlabs.org/project/openvswitch/list/?series=241223
  - Pending resolution on (1)

4 
https://patchwork.ozlabs.org/project/openvswitch/patch/161980273210.95131.8863959545804983824.st...@fed.void/
  - No change

5 
https://patchwork.ozlabs.org/project/openvswitch/patch/161643132356.178714.585911287654021306.st...@fed.void/
  - Paolo asked someone with SCTP experience to review

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] conntrack: document all-zero IP SNAT behavior and add a test case

2021-05-20 Thread Aaron Conole
Dumitru Ceara  writes:

> On 4/26/21 2:19 PM, Eelco Chaudron wrote:
>> Currently, conntrack in the kernel has an undocumented feature referred
>> to as all-zero IP address NULL SNAT. Basically, when a source port
>> collision is detected during the commit, the source port will be
>> translated to an ephemeral port. If there is no collision, no SNAT is
>> performed.
>> 
>> This patchset documents this behavior and adds a self-test to verify
>> it's not changing.
>> 
>> Signed-off-by: Eelco Chaudron 
>> ---
>> v3: Renamed NULL SNAT to all-zero IP SNAT.
>> v2: Fixed NULL SNAT to only work in the -rpl state to be inline with
>> OpenShift-SDN's behavior.
>
> Hi Eelco,
>
> Would it be possible to add this capability to the list of kernel
> Datapath.capabilities ovsdb column? [0]
>
> Given that the patch to add userspace datapath support for all-zero IP
> SNAT is not accepted yet [1], and even if it does it will likely not be
> backported to LTS because it's a feature, this would make it easier for
> OVN (for example ovn-controller) to determine at runtime if it should
> use all-zero IP SNAT or not.

I think it would be rather difficult to add this to the userspace patch
being proposed.

Actually, if we want something where datapath can detect whether such
feature is supported, there isn't a good test.  For instance, normally
we would attempt to insert a flow that has the characteristics we desire
and look for a failure to insert.  In the case of all-zero SNAT,
both datapaths can "accept" it, so it becomes difficult.

Also, the netlink datapath under linux has had this support since it was
introduced, so we're really just documenting it here.  It can still be
used in older releases.  OTOH, userspace doesn't have such support.
Both datapaths will accept flows with SNAT set to 0, as far as I can
tell.  This means there's not an easy way to distinguish them.

Maybe we need a change to this patchset to reject such flows from the
netdev datapath, and then we can modify [1] to remove such limitation.
Then it can be detected by the datapath capabilities that already exist.

Maybe I missed something, or misunderstood something.

Thoughts?

> [0]
> https://github.com/openvswitch/ovs/commit/27501802d09f782b8133031c1eae3394ae5ce147
>
> [1] https://patchwork.ozlabs.org/project/openvswitch/list/?series=241223
>
> Thanks,
> Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd: Avoid memory reallocation while building lb rules.

2021-05-20 Thread Ilya Maximets
This is one of the hottest points in the northd in case of big number
of load balancers and we're reallocating matches and actions several
times for each vIP for each load balancer.

Fix that by re-using the allocated memory and just clearing dynamic
strings for all subsequnet IPs.

Signed-off-by: Ilya Maximets 
---
 northd/ovn-northd.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 0e5092a87..9fc4deb76 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -6013,13 +6013,17 @@ static void
 build_lb_rules(struct ovn_datapath *od, struct hmap *lflows,
struct ovn_northd_lb *lb)
 {
+struct ds action = DS_EMPTY_INITIALIZER;
+struct ds match = DS_EMPTY_INITIALIZER;
+
 for (size_t i = 0; i < lb->n_vips; i++) {
 struct ovn_lb_vip *lb_vip = >vips[i];
 struct ovn_northd_lb_vip *lb_vip_nb = >vips_nb[i];
-
-struct ds action = DS_EMPTY_INITIALIZER;
 const char *ip_match = NULL;
 
+ds_clear();
+ds_clear();
+
 /* Store the original destination IP to be used when generating
  * hairpin flows.
  */
@@ -6055,7 +6059,6 @@ build_lb_rules(struct ovn_datapath *od, struct hmap 
*lflows,
 build_lb_vip_actions(lb_vip, lb_vip_nb, ,
  lb->selection_fields, true);
 
-struct ds match = DS_EMPTY_INITIALIZER;
 ds_put_format(, "ct.new && %s.dst == %s", ip_match,
   lb_vip->vip_str);
 if (lb_vip->vip_port) {
@@ -6068,10 +6071,9 @@ build_lb_rules(struct ovn_datapath *od, struct hmap 
*lflows,
 ds_cstr(), ds_cstr(),
 >nlb->header_);
 }
-
-ds_destroy();
-ds_destroy();
 }
+ds_destroy();
+ds_destroy();
 }
 
 static void
-- 
2.26.3

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 2/2] ovn-northd: Fix IPv6 ECMP symmetric reply flows

2021-05-20 Thread Ilya Maximets
On 5/20/21 6:30 PM, Mark Gray wrote:
> When adding ECMP routes with symmetric replies, ovn-northd
> adds a priority 100 flow in the table "lr_in_ip_routing" which bypasses
> the ECMP routes for replies by using the ct.rpl field. Lower priority
> flows are also added in this table for each route to IPv4/6 network. These
> flows have a priority that is calculated by build_route_match() as a function
> of the length of the prefix of the route. For long prefixes the priority of
> these flows exceeds that of the symmetric reply flow causing unexpected
> behaviour. This is particularly seen with IPv6 due to the long length of IPv6
> prefixes.
> 
> This patch changes the priority of the symmetric reply flow to 300
> which will ensure it will have a greater priority than the route
> flows even at the maximum prefix length of 128.
> 
> This patch also adds additional tests for this condition.
> 
> Reported-at: https://bugzilla.redhat.com/1959008
> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
> Signed-off-by: Mark Gray 
> ---
>  northd/ovn-northd.8.xml |   2 +-
>  northd/ovn-northd.c |   2 +-
>  tests/ovn.at| 127 +++
>  tests/system-ovn.at | 162 
>  4 files changed, 291 insertions(+), 2 deletions(-)

Shouldn't there be a corresponding ddlog part?

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 5/5] dpif-netlink: Fix send of uninitialized memory in ct limit requests.

2021-05-20 Thread Ilya Maximets
On 5/20/21 7:46 PM, Ilya Maximets wrote:
> On 5/20/21 6:55 PM, Mark Gray wrote:
>> On 04/04/2021 18:31, Ilya Maximets wrote:
>>> ct limit requests never initializes the whole 'struct ovs_zone_limit'
>>> sending uninitialized stack memory to kernel:
>>>
>>>  Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
>>> at 0x5E23867: sendmsg (in /usr/lib64/libpthread-2.28.so)
>>> by 0x54F761: nl_sock_transact_multiple__ (netlink-socket.c:858)
>>> by 0x54FB6E: nl_sock_transact_multiple.part.9 (netlink-socket.c:1079)
>>> by 0x54FCC0: nl_sock_transact_multiple (netlink-socket.c:1044)
>>> by 0x54FCC0: nl_sock_transact (netlink-socket.c:1108)
>>> by 0x550B6F: nl_transact (netlink-socket.c:1804)
>>> by 0x53BEA2: dpif_netlink_ct_get_limits (dpif-netlink.c:3052)
>>> by 0x588B57: dpctl_ct_get_limits (dpctl.c:2178)
>>> by 0x586FF2: dpctl_unixctl_handler (dpctl.c:2870)
>>> by 0x52C241: process_command (unixctl.c:310)
>>> by 0x52C241: run_connection (unixctl.c:344)
>>> by 0x52C241: unixctl_server_run (unixctl.c:395)
>>> by 0x407526: main (ovs-vswitchd.c:128)
>>>   Address 0x10b87480 is 32 bytes inside a block of size 4,096 alloc'd
>>> at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
>>> by 0x52CDE4: xmalloc (util.c:138)
>>> by 0x4F7E07: ofpbuf_init (ofpbuf.c:123)
>>> by 0x4F7E07: ofpbuf_new (ofpbuf.c:151)
>>> by 0x53BDE3: dpif_netlink_ct_get_limits (dpif-netlink.c:3025)
>>> by 0x588B57: dpctl_ct_get_limits (dpctl.c:2178)
>>> by 0x586FF2: dpctl_unixctl_handler (dpctl.c:2870)
>>> by 0x52C241: process_command (unixctl.c:310)
>>> by 0x52C241: run_connection (unixctl.c:344)
>>> by 0x52C241: unixctl_server_run (unixctl.c:395)
>>> by 0x407526: main (ovs-vswitchd.c:128)
>>>   Uninitialised value was created by a stack allocation
>>> at 0x46AAA0: ct_dpif_get_limits (ct-dpif.c:197)
>>>
>>> Fix that by using designated initializers that will clear all the
>>> non-specified fields.
>>>
>>> CC: Yi-Hung Wei 
>>> Fixes: 906ff9d229ee ("dpif-netlink: Implement conntrack zone limit")
>>> Signed-off-by: Ilya Maximets 
>>> ---
>>>  lib/dpif-netlink.c | 24 ++--
>>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>>> index ceb56c685..ee96f4011 100644
>>> --- a/lib/dpif-netlink.c
>>> +++ b/lib/dpif-netlink.c
>>> @@ -2923,8 +2923,6 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
>>> OVS_UNUSED,
>>> const uint32_t *default_limits,
>>> const struct ovs_list *zone_limits)
>>>  {
>>> -struct ovs_zone_limit req_zone_limit;
>>> -
>>>  if (ovs_ct_limit_family < 0) {
>>>  return EOPNOTSUPP;
>>>  }
>>> @@ -2941,8 +2939,10 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
>>> OVS_UNUSED,
>>>  size_t opt_offset;
>>>  opt_offset = nl_msg_start_nested(request, 
>>> OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
>>>  if (default_limits) {
>>> -req_zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
>>> -req_zone_limit.limit = *default_limits;
>>> +struct ovs_zone_limit req_zone_limit = {
>>> +.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
>>> +.limit   = *default_limits,
>>> +};
>>>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>>>  }
>>>  
>>> @@ -2950,8 +2950,10 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
>>> OVS_UNUSED,
>>>  struct ct_dpif_zone_limit *zone_limit;
>>>  
>>>  LIST_FOR_EACH (zone_limit, node, zone_limits) {
>>> -req_zone_limit.zone_id = zone_limit->zone;
>>> -req_zone_limit.limit = zone_limit->limit;
>>> +struct ovs_zone_limit req_zone_limit = {
>>> +.zone_id = zone_limit->zone,
>>> +.limit   = zone_limit->limit,
>>> +};
>>>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>>>  }
>>>  }
>>> @@ -3035,8 +3037,9 @@ dpif_netlink_ct_get_limits(struct dpif *dpif 
>>> OVS_UNUSED,
>>>  size_t opt_offset = nl_msg_start_nested(request,
>>>  
>>> OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
>>>  
>>> -struct ovs_zone_limit req_zone_limit;
>>> -req_zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
>>> +struct ovs_zone_limit req_zone_limit = {
>>> +.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
>>> +};
>>>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>>>  
>>>  struct ct_dpif_zone_limit *zone_limit;
>>> @@ -3086,8 +3089,9 @@ dpif_netlink_ct_del_limits(struct dpif *dpif 
>>> OVS_UNUSED,
>>>  
>>>  struct ct_dpif_zone_limit *zone_limit;
>>>  LIST_FOR_EACH (zone_limit, node, zone_limits) {
>>> -struct ovs_zone_limit req_zone_limit;
>>> -req_zone_limit.zone_id = zone_limit->zone;
>>> +struct ovs_zone_limit req_zone_limit = {
>>> +

Re: [ovs-dev] [PATCH ovn 0/2] Fix ECMP symmetric replies for IPv6

2021-05-20 Thread Mark Michelson

Looks good to me.

For the series:
Acked-by: Mark Michelson 

On 5/20/21 12:30 PM, Mark Gray wrote:



Mark Gray (2):
   ovn-nb.xml: Fix typo
   ovn-northd: Fix IPv6 ECMP symmetric reply flows

  northd/ovn-northd.8.xml |   2 +-
  northd/ovn-northd.c |   2 +-
  ovn-nb.xml  |   2 +-
  tests/ovn.at| 127 +++
  tests/system-ovn.at | 162 
  5 files changed, 292 insertions(+), 3 deletions(-)



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 5/5] dpif-netlink: Fix send of uninitialized memory in ct limit requests.

2021-05-20 Thread Ilya Maximets
On 5/20/21 6:55 PM, Mark Gray wrote:
> On 04/04/2021 18:31, Ilya Maximets wrote:
>> ct limit requests never initializes the whole 'struct ovs_zone_limit'
>> sending uninitialized stack memory to kernel:
>>
>>  Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
>> at 0x5E23867: sendmsg (in /usr/lib64/libpthread-2.28.so)
>> by 0x54F761: nl_sock_transact_multiple__ (netlink-socket.c:858)
>> by 0x54FB6E: nl_sock_transact_multiple.part.9 (netlink-socket.c:1079)
>> by 0x54FCC0: nl_sock_transact_multiple (netlink-socket.c:1044)
>> by 0x54FCC0: nl_sock_transact (netlink-socket.c:1108)
>> by 0x550B6F: nl_transact (netlink-socket.c:1804)
>> by 0x53BEA2: dpif_netlink_ct_get_limits (dpif-netlink.c:3052)
>> by 0x588B57: dpctl_ct_get_limits (dpctl.c:2178)
>> by 0x586FF2: dpctl_unixctl_handler (dpctl.c:2870)
>> by 0x52C241: process_command (unixctl.c:310)
>> by 0x52C241: run_connection (unixctl.c:344)
>> by 0x52C241: unixctl_server_run (unixctl.c:395)
>> by 0x407526: main (ovs-vswitchd.c:128)
>>   Address 0x10b87480 is 32 bytes inside a block of size 4,096 alloc'd
>> at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
>> by 0x52CDE4: xmalloc (util.c:138)
>> by 0x4F7E07: ofpbuf_init (ofpbuf.c:123)
>> by 0x4F7E07: ofpbuf_new (ofpbuf.c:151)
>> by 0x53BDE3: dpif_netlink_ct_get_limits (dpif-netlink.c:3025)
>> by 0x588B57: dpctl_ct_get_limits (dpctl.c:2178)
>> by 0x586FF2: dpctl_unixctl_handler (dpctl.c:2870)
>> by 0x52C241: process_command (unixctl.c:310)
>> by 0x52C241: run_connection (unixctl.c:344)
>> by 0x52C241: unixctl_server_run (unixctl.c:395)
>> by 0x407526: main (ovs-vswitchd.c:128)
>>   Uninitialised value was created by a stack allocation
>> at 0x46AAA0: ct_dpif_get_limits (ct-dpif.c:197)
>>
>> Fix that by using designated initializers that will clear all the
>> non-specified fields.
>>
>> CC: Yi-Hung Wei 
>> Fixes: 906ff9d229ee ("dpif-netlink: Implement conntrack zone limit")
>> Signed-off-by: Ilya Maximets 
>> ---
>>  lib/dpif-netlink.c | 24 ++--
>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>
>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>> index ceb56c685..ee96f4011 100644
>> --- a/lib/dpif-netlink.c
>> +++ b/lib/dpif-netlink.c
>> @@ -2923,8 +2923,6 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
>> OVS_UNUSED,
>> const uint32_t *default_limits,
>> const struct ovs_list *zone_limits)
>>  {
>> -struct ovs_zone_limit req_zone_limit;
>> -
>>  if (ovs_ct_limit_family < 0) {
>>  return EOPNOTSUPP;
>>  }
>> @@ -2941,8 +2939,10 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
>> OVS_UNUSED,
>>  size_t opt_offset;
>>  opt_offset = nl_msg_start_nested(request, OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
>>  if (default_limits) {
>> -req_zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
>> -req_zone_limit.limit = *default_limits;
>> +struct ovs_zone_limit req_zone_limit = {
>> +.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
>> +.limit   = *default_limits,
>> +};
>>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>>  }
>>  
>> @@ -2950,8 +2950,10 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
>> OVS_UNUSED,
>>  struct ct_dpif_zone_limit *zone_limit;
>>  
>>  LIST_FOR_EACH (zone_limit, node, zone_limits) {
>> -req_zone_limit.zone_id = zone_limit->zone;
>> -req_zone_limit.limit = zone_limit->limit;
>> +struct ovs_zone_limit req_zone_limit = {
>> +.zone_id = zone_limit->zone,
>> +.limit   = zone_limit->limit,
>> +};
>>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>>  }
>>  }
>> @@ -3035,8 +3037,9 @@ dpif_netlink_ct_get_limits(struct dpif *dpif 
>> OVS_UNUSED,
>>  size_t opt_offset = nl_msg_start_nested(request,
>>  
>> OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
>>  
>> -struct ovs_zone_limit req_zone_limit;
>> -req_zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
>> +struct ovs_zone_limit req_zone_limit = {
>> +.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
>> +};
>>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>>  
>>  struct ct_dpif_zone_limit *zone_limit;
>> @@ -3086,8 +3089,9 @@ dpif_netlink_ct_del_limits(struct dpif *dpif 
>> OVS_UNUSED,
>>  
>>  struct ct_dpif_zone_limit *zone_limit;
>>  LIST_FOR_EACH (zone_limit, node, zone_limits) {
>> -struct ovs_zone_limit req_zone_limit;
>> -req_zone_limit.zone_id = zone_limit->zone;
>> +struct ovs_zone_limit req_zone_limit = {
>> +.zone_id = zone_limit->zone,
> 
> Does this set 'limit' and 'count' automatically to 0? Is this behaviour
> specified by the relevant c standard or 

Re: [ovs-dev] [PATCH v3] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-05-20 Thread Vasu Dasari
Thank you Eelco for testing the patch.

My responses are inline:

*Vasu Dasari*


On Thu, May 20, 2021 at 5:20 AM Eelco Chaudron  wrote:

>
>
> On 14 May 2021, at 21:33, Vasu Dasari wrote:
>
> > Currently there is an option to add/flush/show ARP/ND neighbor. This
> covers L3
> > side.  For L2 side, there is only fdb show command. This patch gives an
> option
> > to add/del an fdb entry via ovs-appctl. ovs-appctl command looks like
> this:
> >
> > To add:
> > ovs-appctl fdb/add
> > ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05
> >
> > To del:
> > ovs-appctl fdb/del
> > ovs-appctl fdb/del br0 p1 0 50:54:00:00:00:05
> >
> > Static entry should not age. To indicate that entry being programmed is
> a static entry,
> > 'expires' field in 'struct mac_entry' will be set to a
> MAC_ENTRY_AGE_STATIC_ENTRY. A
> > check for this value is made while deleting mac entry as part of regular
> aging process.
> > Another check as part of mac-update process, when a packet with same
> source mac as this
> > entry arrives on the configured port will not modify the expires field
> >
> > Added two new APIs to provide convenient interface to add and delete
> static-macs.
> > void xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t
> in_port,
> >struct eth_addr dl_src, int vlan);
> > void xlate_delete_static_mac_entry(const struct ofproto_dpif *,
> >   struct eth_addr dl_src, int vlan);
> >
> > Signed-off-by: Vasu Dasari 
> > Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752
>
> I did some testing and found the issues below, once fixed, I’ll do a full
> code review.

When you do an FDB flush, it also clears the static FDB entries. I think
> this is wrong, as all hardware vendors I know will retain the static FDB
> entries.
>

I took the liberty of having fdb/flush clear static entries as well. I do
not mind changing the code to delete only dynamic ones. Will take care of
this.


> When you create a static entry for lets say Port A, when a packet with the
> same MAC comes from Port B the entry will be updated with Port B. This
> should not happen for static entries.
>
> I agree. I tested this case too. It fails. Will fix it.

> When you add a static MAC entry, the command just returns "OK". Other
> commands do not return anything on a successful addition. You should either
> follow the same behavior or be more verbose (Static FDB successfully
> added?) on your return.
>
> Sure, will make the command return error string only if it fails,
otherwise will be silent.

> Also, it might be nice to be more verbose when you replace an existing
> static or dynamic FDB entry, i.e. especially if the physical port is
> different (mac move case).
>
>
Sure, will add more detailed output.

> Cheers,
>
>
> Eelco
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 5/5] dpif-netlink: Fix send of uninitialized memory in ct limit requests.

2021-05-20 Thread Mark Gray
On 04/04/2021 18:31, Ilya Maximets wrote:
> ct limit requests never initializes the whole 'struct ovs_zone_limit'
> sending uninitialized stack memory to kernel:
> 
>  Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
> at 0x5E23867: sendmsg (in /usr/lib64/libpthread-2.28.so)
> by 0x54F761: nl_sock_transact_multiple__ (netlink-socket.c:858)
> by 0x54FB6E: nl_sock_transact_multiple.part.9 (netlink-socket.c:1079)
> by 0x54FCC0: nl_sock_transact_multiple (netlink-socket.c:1044)
> by 0x54FCC0: nl_sock_transact (netlink-socket.c:1108)
> by 0x550B6F: nl_transact (netlink-socket.c:1804)
> by 0x53BEA2: dpif_netlink_ct_get_limits (dpif-netlink.c:3052)
> by 0x588B57: dpctl_ct_get_limits (dpctl.c:2178)
> by 0x586FF2: dpctl_unixctl_handler (dpctl.c:2870)
> by 0x52C241: process_command (unixctl.c:310)
> by 0x52C241: run_connection (unixctl.c:344)
> by 0x52C241: unixctl_server_run (unixctl.c:395)
> by 0x407526: main (ovs-vswitchd.c:128)
>   Address 0x10b87480 is 32 bytes inside a block of size 4,096 alloc'd
> at 0x4C30F0B: malloc (vg_replace_malloc.c:307)
> by 0x52CDE4: xmalloc (util.c:138)
> by 0x4F7E07: ofpbuf_init (ofpbuf.c:123)
> by 0x4F7E07: ofpbuf_new (ofpbuf.c:151)
> by 0x53BDE3: dpif_netlink_ct_get_limits (dpif-netlink.c:3025)
> by 0x588B57: dpctl_ct_get_limits (dpctl.c:2178)
> by 0x586FF2: dpctl_unixctl_handler (dpctl.c:2870)
> by 0x52C241: process_command (unixctl.c:310)
> by 0x52C241: run_connection (unixctl.c:344)
> by 0x52C241: unixctl_server_run (unixctl.c:395)
> by 0x407526: main (ovs-vswitchd.c:128)
>   Uninitialised value was created by a stack allocation
> at 0x46AAA0: ct_dpif_get_limits (ct-dpif.c:197)
> 
> Fix that by using designated initializers that will clear all the
> non-specified fields.
> 
> CC: Yi-Hung Wei 
> Fixes: 906ff9d229ee ("dpif-netlink: Implement conntrack zone limit")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/dpif-netlink.c | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index ceb56c685..ee96f4011 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -2923,8 +2923,6 @@ dpif_netlink_ct_set_limits(struct dpif *dpif OVS_UNUSED,
> const uint32_t *default_limits,
> const struct ovs_list *zone_limits)
>  {
> -struct ovs_zone_limit req_zone_limit;
> -
>  if (ovs_ct_limit_family < 0) {
>  return EOPNOTSUPP;
>  }
> @@ -2941,8 +2939,10 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
> OVS_UNUSED,
>  size_t opt_offset;
>  opt_offset = nl_msg_start_nested(request, OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
>  if (default_limits) {
> -req_zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
> -req_zone_limit.limit = *default_limits;
> +struct ovs_zone_limit req_zone_limit = {
> +.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
> +.limit   = *default_limits,
> +};
>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>  }
>  
> @@ -2950,8 +2950,10 @@ dpif_netlink_ct_set_limits(struct dpif *dpif 
> OVS_UNUSED,
>  struct ct_dpif_zone_limit *zone_limit;
>  
>  LIST_FOR_EACH (zone_limit, node, zone_limits) {
> -req_zone_limit.zone_id = zone_limit->zone;
> -req_zone_limit.limit = zone_limit->limit;
> +struct ovs_zone_limit req_zone_limit = {
> +.zone_id = zone_limit->zone,
> +.limit   = zone_limit->limit,
> +};
>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>  }
>  }
> @@ -3035,8 +3037,9 @@ dpif_netlink_ct_get_limits(struct dpif *dpif OVS_UNUSED,
>  size_t opt_offset = nl_msg_start_nested(request,
>  
> OVS_CT_LIMIT_ATTR_ZONE_LIMIT);
>  
> -struct ovs_zone_limit req_zone_limit;
> -req_zone_limit.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE;
> +struct ovs_zone_limit req_zone_limit = {
> +.zone_id = OVS_ZONE_LIMIT_DEFAULT_ZONE,
> +};
>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>  
>  struct ct_dpif_zone_limit *zone_limit;
> @@ -3086,8 +3089,9 @@ dpif_netlink_ct_del_limits(struct dpif *dpif OVS_UNUSED,
>  
>  struct ct_dpif_zone_limit *zone_limit;
>  LIST_FOR_EACH (zone_limit, node, zone_limits) {
> -struct ovs_zone_limit req_zone_limit;
> -req_zone_limit.zone_id = zone_limit->zone;
> +struct ovs_zone_limit req_zone_limit = {
> +.zone_id = zone_limit->zone,

Does this set 'limit' and 'count' automatically to 0? Is this behaviour
specified by the relevant c standard or is it compiler dependent?

> +};
>  nl_msg_put(request, _zone_limit, sizeof req_zone_limit);
>  }
>  

Re: [ovs-dev] [PATCH 4/5] ofproto-dpif: Fix use of uninitialized attributes of timeout policy.

2021-05-20 Thread Mark Gray
On 04/04/2021 18:31, Ilya Maximets wrote:
> 'cdtp' is allocated on a stack and it has uninitialized 'present'
> field that specifies which attributes are actually set.  This
> causes use of uninitialized attributes.
> 
>  Conditional jump or move depends on uninitialised value(s)
> at 0x539651: dpif_netlink_get_nl_tp_udp_attrs (dpif-netlink.c:3206)
> by 0x539651: dpif_netlink_get_nl_tp_attrs (dpif-netlink.c:3234)
> by 0x539651: dpif_netlink_ct_set_timeout_policy (dpif-netlink.c:3370)
> by 0x42B615: ct_add_timeout_policy_to_dpif (ofproto-dpif.c:5421)
> by 0x42B615: ct_set_zone_timeout_policy (ofproto-dpif.c:5525)
> by 0x40BD98: ct_zones_reconfigure (bridge.c:756)
> by 0x40BD98: datapath_reconfigure (bridge.c:804)
> by 0x40D737: bridge_reconfigure (bridge.c:903)
> by 0x411720: bridge_run (bridge.c:3331)
> by 0x40751C: main (ovs-vswitchd.c:127)
> 
> Clearing the whole structure to avoid this kind of problems.
> 
> CC: Yi-Hung Wei 
> Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy 
> tables")
> Signed-off-by: Ilya Maximets 
> ---
>  ofproto/ofproto-dpif.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index fd0b2fdea..47db9bb57 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -5413,6 +5413,8 @@ ct_add_timeout_policy_to_dpif(struct dpif *dpif,
>  struct ct_dpif_timeout_policy cdtp;
>  struct simap_node *node;
>  
> +memset(, 0, sizeof cdtp);
> +
>  cdtp.id = ct_tp->tp_id;
>  SIMAP_FOR_EACH (node, _tp->tp) {
>  ct_dpif_set_timeout_policy_attr_by_name(, node->name, 
> node->data);
> 
LGTM

Acked-by: Mark D. Gray 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/5] netdev-linux: Fix use of uninitialized LAG master name.

2021-05-20 Thread Mark Gray
On 04/04/2021 18:31, Ilya Maximets wrote:
> 'if_indextoname' may fail leaving the 'master_name' uninitialized:
> 
>  Conditional jump or move depends on uninitialised value(s)
> at 0x4C34329: strlen (vg_replace_strmem.c:459)
> by 0x51C638: hash_string (hash.h:342)
> by 0x51C638: hash_name (shash.c:28)
> by 0x51CC51: shash_find (shash.c:231)
> by 0x51CD38: shash_find_data (shash.c:245)
> by 0x4A797F: netdev_from_name (netdev.c:2013)
> by 0x544148: netdev_linux_update_lag (netdev-linux.c:676)
> by 0x544148: netdev_linux_run (netdev-linux.c:769)
> by 0x4A5997: netdev_run (netdev.c:186)
> by 0x40752B: main (ovs-vswitchd.c:129)
>   Uninitialised value was created by a stack allocation
> at 0x543AFA: netdev_linux_run (netdev-linux.c:722)
> 
> CC: John Hurley 
> Fixes: d22f8927c3c9 ("netdev-linux: monitor and offload LAG slaves to TC")
> Signed-off-by: Ilya Maximets 
> ---
>  lib/netdev-linux.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 15b25084b..2b8283e94 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -672,7 +672,9 @@ netdev_linux_update_lag(struct rtnetlink_change *change)
>  uint32_t block_id;
>  int error = 0;
>  
> -if_indextoname(change->master_ifindex, master_name);
> +if (!if_indextoname(change->master_ifindex, master_name)) {
> +return;
> +}
>  master_netdev = netdev_from_name(master_name);
>  if (!master_netdev) {
>  return;
> 
Interesting that strlen() doesn't check for NULL. LGTM

Acked-by: Mark D. Gray 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 1/2] ovn-nb.xml: Fix typo

2021-05-20 Thread Mark Gray
Signed-off-by: Mark Gray 
---
 ovn-nb.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovn-nb.xml b/ovn-nb.xml
index ed271d8eb3f2..cb3aef1ba881 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -2750,7 +2750,7 @@
   
 
   
-It true, then new traffic that arrives over this route will have
+If true, then new traffic that arrives over this route will have
 its reply traffic bypass ECMP route selection and will be sent out
 this route instead. Note that this option overrides any rules set
 in the  table. This option
-- 
2.27.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn 2/2] ovn-northd: Fix IPv6 ECMP symmetric reply flows

2021-05-20 Thread Mark Gray
When adding ECMP routes with symmetric replies, ovn-northd
adds a priority 100 flow in the table "lr_in_ip_routing" which bypasses
the ECMP routes for replies by using the ct.rpl field. Lower priority
flows are also added in this table for each route to IPv4/6 network. These
flows have a priority that is calculated by build_route_match() as a function
of the length of the prefix of the route. For long prefixes the priority of
these flows exceeds that of the symmetric reply flow causing unexpected
behaviour. This is particularly seen with IPv6 due to the long length of IPv6
prefixes.

This patch changes the priority of the symmetric reply flow to 300
which will ensure it will have a greater priority than the route
flows even at the maximum prefix length of 128.

This patch also adds additional tests for this condition.

Reported-at: https://bugzilla.redhat.com/1959008
Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
Signed-off-by: Mark Gray 
---
 northd/ovn-northd.8.xml |   2 +-
 northd/ovn-northd.c |   2 +-
 tests/ovn.at| 127 +++
 tests/system-ovn.at | 162 
 4 files changed, 291 insertions(+), 2 deletions(-)

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index bca9cb4f4e79..bb77689de516 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -2621,7 +2621,7 @@ icmp6 {
 
 
   If ECMP routes with symmetric reply are configured in the
-  OVN_Northbound database for a gateway router, a priority-100
+  OVN_Northbound database for a gateway router, a priority-300
   flow is added for each router port on which symmetric replies are
   configured. The matching logic for these ports essentially reverses the
   configured logic of the ECMP route. So for instance, a route with a
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 7464b7fba153..08eac1208dcb 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -8426,7 +8426,7 @@ add_ecmp_symmetric_reply_flows(struct hmap *lflows,
   out_port->lrp_networks.ea_s,
   IN6_IS_ADDR_V4MAPPED(>prefix) ? "" : "xx",
   port_ip, out_port->json_key);
-ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_ROUTING, 100,
+ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_ROUTING, 300,
ds_cstr(), ds_cstr(),
_route->header_);
 
diff --git a/tests/ovn.at b/tests/ovn.at
index dcf3e0e09164..4220eb0f2c6f 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -22982,6 +22982,133 @@ OVN_CLEANUP([hv1], [hv2])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- Symmetric IPv6 ECMP reply flows])
+ovn_start
+
+net_add n1
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+sim_add hv2
+as hv2
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+
+# Logical network
+#
+#   ls1 \
+#\
+# DR -- join -- GW -- ext
+#/
+#   ls2 /
+#
+#  ls1 and ls2 are internal switches connected to distributed router
+#  DR. DR is then connected via a join switch to gateway router GW.
+#  GW is then connected to external switch ext. In real life, this
+#  would likely have a localnet port, but for the purposes of this test
+#  it is unnecessary.
+
+ovn-nbctl create Logical_Router name=DR
+gw_uuid=$(ovn-nbctl create Logical_Router name=GW)
+
+check ovn-nbctl ls-add ls1
+check ovn-nbctl ls-add ls2
+check ovn-nbctl ls-add join
+check ovn-nbctl ls-add ext
+
+# Connect ls1 to DR
+check ovn-nbctl lrp-add DR dr-ls1 00:00:01:01:02:03 1001::1/64
+check ovn-nbctl lsp-add ls1 ls1-dr -- set Logical_Switch_Port ls1-dr \
+type=router options:router-port=dr-ls1 addresses='"00:00:01:01:02:03"'
+
+# Connect ls2 to DR
+check ovn-nbctl lrp-add DR dr-ls2 00:00:01:01:02:04 1001::2/64
+check ovn-nbctl lsp-add ls2 ls2-dr -- set Logical_Switch_Port ls2-dr \
+type=router options:router-port=dr-ls2 addresses='"00:00:01:01:02:04"'
+
+# Connect join to DR
+check ovn-nbctl lrp-add DR dr-join 00:00:02:01:02:03 2001::1/64
+check ovn-nbctl lsp-add join join-dr -- set Logical_Switch_Port join-dr \
+type=router options:router-port=dr-join addresses='"00:00:02:01:02:03"'
+
+# Connect join to GW
+check ovn-nbctl lrp-add GW gw-join 00:00:02:01:02:04 2001::2/64
+check ovn-nbctl lsp-add join join-gw -- set Logical_Switch_Port join-gw \
+type=router options:router-port=gw-join addresses='"00:00:02:01:02:04"'
+
+# Connect ext to GW
+check ovn-nbctl lrp-add GW gw-ext 00:00:03:01:02:03 7001::1/64
+check ovn-nbctl lsp-add ext ext-gw -- set Logical_Switch_Port ext-gw \
+type=router options:router-port=gw-ext addresses='"00:00:03:01:02:03"'
+
+check ovn-nbctl lr-route-add GW 1001::/64 2001::1
+check ovn-nbctl --policy="src-ip" lr-route-add DR 1001::0/64 2001::2
+
+# Now add some ECMP routes to the GW router.
+check ovn-nbctl --ecmp-symmetric-reply --policy="src-ip" lr-route-add GW 
1001::/64 

[ovs-dev] [PATCH ovn 0/2] Fix ECMP symmetric replies for IPv6

2021-05-20 Thread Mark Gray



Mark Gray (2):
  ovn-nb.xml: Fix typo
  ovn-northd: Fix IPv6 ECMP symmetric reply flows

 northd/ovn-northd.8.xml |   2 +-
 northd/ovn-northd.c |   2 +-
 ovn-nb.xml  |   2 +-
 tests/ovn.at| 127 +++
 tests/system-ovn.at | 162 
 5 files changed, 292 insertions(+), 3 deletions(-)

-- 
2.27.0


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Moving of the primary #openvswitch channel to irc.libera.chat ?

2021-05-20 Thread Ben Pfaff
On Thu, May 20, 2021 at 09:46:01AM -0300, Flavio Leitner wrote:
> On Wed, May 19, 2021 at 01:22:58PM -0700, Ben Pfaff wrote:
> > On Wed, May 19, 2021 at 10:03:57PM +0200, Ilya Maximets wrote:
> > > Hi.
> > > 
> > > Taking into account some very unhealthy things that happened recently
> > > with FreeNode network and resigning of lots of its stuff [1], we
> > > probably need to discuss if Open vSwitch project wants to change the
> > > IRC server for a primary #openvswitch channel.  User's data is the
> > > main concern, IIUC, as it's unclear what the new management will
> > > do with the network.
> > > 
> > > The main alternative now seems to be the Libera.Chat [2] where most of
> > > the former FreeNode stuff.
> > > 
> > > Some projects already announced [3][4] movement to Libera.Chat.  Others
> > > are discussing the possibility [5].
> > > 
> > > So, I think, it make sense to discuss the future of #openvswitch
> > > channel too.  Any thoughts?
> > > 
> > > Will we have an OVN meeting on a different server tomorrow?
> > 
> > Thanks for sending such a detailed message (with footnotes!).
> > I think I support the move.  I have already registered the #openvswitch
> > channel on libera.chat (and copied the topic across).
> 
> Great! Thanks.
> 
> 
> > I'm going to log in on both servers tomorrow but I suggest that we
> > transition to libera.chat after tomorrow's meeting, since this is pretty
> > short notice and I think that most of us (including me) are only light
> > users of IRC.
> > 
> > Also, I suspect that the infrastructure for recording meetings has not
> > yet moved to libera.chat.  (I did notice that the 'openstack' user just
> > dropped out of #openvswitch.  I think that's the bot that did the
> > meeting recordings.  So maybe that infastructure is moving across right
> > as I write.)
> 
> When things are ported, please change freenode's #openvswitch topic
> to say that we are moving to the new network.

That makes sense.  I will do that.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-05-20 Thread Ben Pfaff
On Thu, May 20, 2021 at 05:06:26PM +0200, Dumitru Ceara wrote:
> On 4/7/21 6:49 PM, Ben Pfaff wrote:
> 
> [...]
> 
> >>
> >> Thanks!  I can download them now.  It's back on my to-do list.
> > 
> > I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> > a nasty performance problem in ovn-nbctl that became really apparent
> > when working with your databases:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> 
> I was wondering if you had a chance to look at this since.

I haven't kept going.  I consider my series that gives a 5x performance
improvement a kind of checkpoint along the way.  I assumed at first that
it would get reviewed quickly so I could move on to other things, but no
one has looked at it yet.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/2] add port-based ingress policing based packet-per-second rate-limiting

2021-05-20 Thread Marcelo Ricardo Leitner
Hi,

On Mon, May 17, 2021 at 01:18:53PM +0200, Simon Horman wrote:
...
> @@ -547,6 +549,12 @@ is_tap_netdev(const struct netdev *netdev)
>  return netdev_get_class(netdev) == _tap_class;
>  }
>  
> +enum {
> +OVS_TC_QOS_TYPE_BPS,
> +OVS_TC_QOS_TYPE_PPS,
> +OVS_TC_QOS_TYPE_MAX,
> +};
> +
>  static int
>  netdev_linux_netnsid_update__(struct netdev_linux *netdev)
>  {
> @@ -2593,24 +2601,50 @@ tc_matchall_fill_police(uint32_t kbits_rate, uint32_t 
> kbits_burst)
>  }
>
>  static void
> -nl_msg_put_act_police(struct ofpbuf *request, struct tc_police police)
> +nl_msg_put_act_police(struct ofpbuf *request, struct tc_police police,
> +  uint32_t kpkts_rate, uint32_t kpkts_burst)
>  {
> -size_t offset;
> +size_t offset, act_offset;
> +uint32_t i = 0, prio = 0;
>
> -nl_msg_put_string(request, TCA_ACT_KIND, "police");
> -offset = nl_msg_start_nested(request, TCA_ACT_OPTIONS);
> -nl_msg_put_unspec(request, TCA_POLICE_TBF, , sizeof police);
> -tc_put_rtab(request, TCA_POLICE_RATE, );
> -nl_msg_put_u32(request, TCA_POLICE_RESULT, TC_ACT_UNSPEC);
> -nl_msg_end_nested(request, offset);
> +for (i = 0; i < OVS_TC_QOS_TYPE_MAX; i++) {
> +if (i == OVS_TC_QOS_TYPE_BPS && !police.rate.rate) {
> +continue;
> +}
> +if (i == OVS_TC_QOS_TYPE_PPS && !kpkts_rate) {
> +continue;
> +}
> +act_offset = nl_msg_start_nested(request, ++prio);
> +nl_msg_put_string(request, TCA_ACT_KIND, "police");
> +offset = nl_msg_start_nested(request, TCA_ACT_OPTIONS);
> +if (i == OVS_TC_QOS_TYPE_BPS && police.rate.rate) {

Nit, the "&& ..." condition is not needed, due to the if()s above.

> +tc_put_rtab(request, TCA_POLICE_RATE, );
> +} else if (i == OVS_TC_QOS_TYPE_PPS && kpkts_rate) {
> +unsigned int pkt_burst_ticks, pps_rate;
> +/* for PPS, set rate as 0 to act as a single action */
> +police.rate.rate = 0;
> +police.burst = 0;
> +police.rate.cell_log = 0;

This is changing the parameter and implicitly relies on the fact that
OVS_TC_QOS_TYPE_PPS > OVS_TC_QOS_TYPE_BPS. It works, but may cause
problems in the future.

Maybe add a new variable that will be actually used by
nl_msg_put_unspec() below, and initialize it for each case?


> +pps_rate = kpkts_rate * 1000;
> +pkt_burst_ticks = tc_bytes_to_ticks(pps_rate,
> +MIN(UINT32_MAX / 1024, 
> kpkts_burst) * 1024);

I don't get why for pps_rate 'k' is 1000, and for bursts, it's 1024.
Anyhow, this is also there for bps and it makes sense to follow that.
But some mention to it in the docs below is welcomed.

> +nl_msg_put_u64(request, TCA_POLICE_PKTRATE64, 
> (uint64_t)pps_rate);
> +nl_msg_put_u64(request, TCA_POLICE_PKTBURST64, 
> (uint64_t)pkt_burst_ticks);
> +}
> +nl_msg_put_unspec(request, TCA_POLICE_TBF, , sizeof police);
> +nl_msg_put_u32(request, TCA_POLICE_RESULT, TC_ACT_PIPE);
> +nl_msg_end_nested(request, offset);
> +nl_msg_end_nested(request, act_offset);
> +}
>  }
>
>  static int
>  tc_add_matchall_policer(struct netdev *netdev, uint32_t kbits_rate,
> -uint32_t kbits_burst)
> +uint32_t kbits_burst, uint32_t kpkts_rate,
> +uint32_t kpkts_burst)
>  {
>  uint16_t eth_type = (OVS_FORCE uint16_t) htons(ETH_P_ALL);
> -size_t basic_offset, action_offset, inner_offset;
> +size_t basic_offset, action_offset;
>  uint16_t prio = TC_RESERVED_PRIORITY_POLICE;
>  int ifindex, err = 0;
>  struct tc_police pol_act;
> @@ -2634,9 +2668,7 @@ tc_add_matchall_policer(struct netdev *netdev, uint32_t 
> kbits_rate,
>  nl_msg_put_string(, TCA_KIND, "matchall");
>  basic_offset = nl_msg_start_nested(, TCA_OPTIONS);
>  action_offset = nl_msg_start_nested(, TCA_MATCHALL_ACT);
> -inner_offset = nl_msg_start_nested(, 1);
> -nl_msg_put_act_police(, pol_act);
> -nl_msg_end_nested(, inner_offset);
> +nl_msg_put_act_police(, pol_act, kpkts_rate, kpkts_burst);
>  nl_msg_end_nested(, action_offset);
>  nl_msg_end_nested(, basic_offset);
>
> @@ -2676,8 +2708,9 @@ tc_del_matchall_policer(struct netdev *netdev)
>  /* Attempts to set input rate limiting (policing) policy.  Returns 0 if
>   * successful, otherwise a positive errno value. */
>  static int
> -netdev_linux_set_policing(struct netdev *netdev_,
> -  uint32_t kbits_rate, uint32_t kbits_burst)
> +netdev_linux_set_policing(struct netdev *netdev_, uint32_t kbits_rate,
> +  uint32_t kbits_burst, uint32_t kpkts_rate,
> +  uint32_t kpkts_burst)
>  {
>  struct netdev_linux *netdev = netdev_linux_cast(netdev_);
>  const char *netdev_name = netdev_get_name(netdev_);
> @@ -2688,6 +2721,10 @@ 

Re: [ovs-dev] ovn-northd-ddlog scale issues

2021-05-20 Thread Dumitru Ceara
On 4/7/21 6:49 PM, Ben Pfaff wrote:

[...]

>>
>> Thanks!  I can download them now.  It's back on my to-do list.
> 
> I can reproduce the problem now.  I haven't fixed it yet, but I did fix
> a nasty performance problem in ovn-nbctl that became really apparent
> when working with your databases:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/381909.html
> 

Hi Ben,

I was wondering if you had a chance to look at this since.

Thanks,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2] ovs: Include monitor condition expected seqno fix.

2021-05-20 Thread Ilya Maximets
On 5/18/21 8:49 AM, Dumitru Ceara wrote:
> When setting monitor conditions ovsdb_cs_db_set_condition() returns the
> sequence number when it is expected that all updates that correspond to
> the new condition have been received.  This sequence number is used by
> ovn-controller to determine whether a monitor condition change is
> already in flight.  If that's the case ovn-controller waits until all
> in-flight condition changes have been processed before propagating the
> SB_Global.nb_cfg value to the chassis record.
> 
> This is part of the "ovn-nbctl --wait=hv sync" implementation which, as
> far as I know, is not used actively by any CMS but is, however, used by
> OVN self tests.
> 
> Bump the OVS submodule version to include b5bb044fbe4c ("ovsdb-cs:
> Consider all tables when computing expected cond seqno.") which fixes a
> bug in ovsdb_cs_db_set_condition().  Without it, the returned expected
> sequence number was wrong, leading to occasional test failures in the
> OVN test suite.
> 
> Signed-off-by: Dumitru Ceara 
> ---
> v2: Detailed commit log, as suggested by Mark Michelson.
> ---
>  ovs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ovs b/ovs
> index ac85cdb38..b5bb044fb 16
> --- a/ovs
> +++ b/ovs
> @@ -1 +1 @@
> -Subproject commit ac85cdb38c1f33e7952bc4c0347d6c7873fb56a1
> +Subproject commit b5bb044fbe4c1395dcde5cc7d5081ef0099bb8b3
> 

Acked-by: Ilya Maximets 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v1 0/8] RCU: Add blocking mode for debugging

2021-05-20 Thread Gaëtan Rivet
On Thu, May 6, 2021, at 01:19, Ben Pfaff wrote:
> On Thu, May 06, 2021 at 12:37:36AM +0200, Gaëtan Rivet wrote:
> > On Wed, May 5, 2021, at 21:36, Ben Pfaff wrote:
> > > On Wed, Apr 28, 2021 at 01:03:24AM +0200, Gaetan Rivet wrote:
> > > > This series adds a compilation option that changes the behavior of the 
> > > > RCU
> > > > module. Once enabled, RCU reclamation by user threads becomes blocking 
> > > > until
> > > > the RCU threads has executed the scheduled callbacks.
> > > 
> > > It's a great idea to improve the quality of the tests, and RCU can be a
> > > sticking point.
> > > 
> > > I don't quite understand the description.  I think that's for one
> > > particular reason: I don't understand yet what it causes to block.
> > > Would you mind giving an example?  I haven't yet read the patches (I
> > > might not; I don't plan to review this series in detail), so maybe
> > > there is an example in one of the patch.  If so, then please just point
> > > me to that one.
> > 
> > Hello Ben,
> > 
> > The feature unit-test in patch 4 gives an example:
> > https://patchwork.ozlabs.org/project/openvswitch/patch/6de208dfd9d2fdea5999140c88632d42bdfe428b.1619564350.git.gr...@u256.net/
> > 
> > In short:
> > 
> > char *x = xmalloc(1);
> > x[0] = 'a';
> > ovsrcu_postpone(free, x);
> > ovsrcu_quiesce();
> > x[0] = 'b'; /* potential UAF. */
> > 
> > If the RCU is made blocking, the user thread will wait at the end of 
> > 'ovsrcu_quiesce()'
> > for 'free(x)' to be executed. Only threads having scheduled callbacks would 
> > wait.
> > This resolves the above race condition. As it executes deterministically, 
> > if ASAN is
> > enabled as well it will detect the UAF.
> 
> This is a good example, thank you.  I think that it would be useful to
> put this or a similar example into the documentation.
> 
> > In development, it could help verify that RCU is properly used.
> > As it makes ASAN more effective, the RCU could be made blocking in 
> > conjunction in CI jobs.
> > 
> > > This message is a good introduction to the series.  Usually,
> > > introductory messages don't make it into the repository, so I hope that
> > > this material is also included in commit messages or in documentation or
> > > comments.
> > 
> > Most of the information here is explained in patch 8 commit log:
> > https://patchwork.ozlabs.org/project/openvswitch/patch/54575393370fa6076ba6ce7201f36bbfecbe9039.1619564350.git.gr...@u256.net/
> > 
> > It also gives the configure option and how to test it.
> > However, this feature is not otherwise documented.
> > I could add a description in lib/ovs-rcu.h?
> 
> I think that would be a good place for it.  You could add a new
> "Debugging" section at the end of the big comment, for example.
> 
> Thanks!
> 

Hello Ben, all,

Although you don't plan on reviewing this series, for people reading this 
thread:
A v2 has been sent: 
https://mail.openvswitch.org/pipermail/ovs-dev/2021-May/383188.html

The only change is the new 'Debugging' section in lib/ovs-rcu.h.

Kind regards,
-- 
Gaetan Rivet
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 1/8] configure: add --enable-asan option

2021-05-20 Thread Gaetan Rivet
Add a configure option to enable ASAN in a simple way.
Adding an AC variable to allow checking for support in the testsuite.

Signed-off-by: Gaetan Rivet 
---
 .ci/linux-build.sh |  4 ++--
 NEWS   |  1 +
 acinclude.m4   | 16 
 configure.ac   |  1 +
 tests/atlocal.in   |  1 +
 5 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 0210d6a77..19600a668 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -229,10 +229,10 @@ fi
 if [ "$ASAN" ]; then
 # This will override default option configured in tests/atlocal.in.
 export ASAN_OPTIONS='detect_leaks=1'
+EXTRA_OPTS="$EXTRA_OPTS --enable-asan"
 # -O2 generates few false-positive memory leak reports in test-ovsdb
 # application, so lowering optimizations to -O1 here.
-CLFAGS_ASAN="-O1 -fno-omit-frame-pointer -fno-common -fsanitize=address"
-CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} ${CLFAGS_ASAN}"
+CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} -O1"
 fi
 
 save_OPTS="${OPTS} $*"
diff --git a/NEWS b/NEWS
index 402ce5969..79e18b85b 100644
--- a/NEWS
+++ b/NEWS
@@ -12,6 +12,7 @@ Post-v2.15.0
- DPDK:
  * OVS validated with DPDK 20.11.1. It is recommended to use this version
until further releases.
+   - New configure option '--enable-asan' enables AddressSanitizer.
 
 
 v2.15.0 - 15 Feb 2021
diff --git a/acinclude.m4 b/acinclude.m4
index 15a54d636..615e7f962 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -58,6 +58,22 @@ AC_DEFUN([OVS_ENABLE_WERROR],
fi
AC_SUBST([SPARSE_WERROR])])
 
+dnl OVS_ENABLE_ASAN
+AC_DEFUN([OVS_ENABLE_ASAN],
+  [AC_ARG_ENABLE(
+[asan],
+[AC_HELP_STRING([--enable-asan],
+[Enable the Address Sanitizer])],
+[ASAN_ENABLED=yes], [ASAN_ENABLED=no])
+   AC_SUBST([ASAN_ENABLED])
+   AC_CONFIG_COMMANDS_PRE([
+ if test "$ASAN_ENABLED" = "yes"; then
+ OVS_CFLAGS="$OVS_CFLAGS -fno-omit-frame-pointer"
+ OVS_CFLAGS="$OVS_CFLAGS -fno-common -fsanitize=address"
+ fi
+   ])
+  ])
+
 dnl OVS_CHECK_LINUX
 dnl
 dnl Configure linux kernel source tree
diff --git a/configure.ac b/configure.ac
index c077034d4..eec5a9d1b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -182,6 +182,7 @@ OVS_CONDITIONAL_CC_OPTION([-Wno-unused-parameter], 
[HAVE_WNO_UNUSED_PARAMETER])
 OVS_CONDITIONAL_CC_OPTION([-mavx512f], [HAVE_AVX512F])
 OVS_CHECK_CC_OPTION([-mavx512f], [CFLAGS="$CFLAGS -DHAVE_AVX512F"])
 OVS_ENABLE_WERROR
+OVS_ENABLE_ASAN
 OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
diff --git a/tests/atlocal.in b/tests/atlocal.in
index cfca7e192..f61e752bf 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -220,6 +220,7 @@ export OVS_SYSLOG_METHOD
 OVS_CTL_TIMEOUT=30
 export OVS_CTL_TIMEOUT
 
+ASAN_ENABLED='@ASAN_ENABLED@'
 # Add some default flags to make the tests run better under Address
 # Sanitizer, if it was used for the build.
 #
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 8/8] ovs-rcu: Add blocking RCU mode

2021-05-20 Thread Gaetan Rivet
Add the configure option --enable-rcu-blocking, that modifies the RCU
library. When enabled, quiescing from other threads will block, waiting
on the RCU thread to execute the postponed jobs.

This mode forces the deferred memory reclamation to happen
deterministically, reducing the latency of the deferral and forcing memory
to be freed any time a thread goes through a quiescent state.

Some use-after-free that were hidden by deferred memory reclamation may
become observable as a result. Previously the RCU mechanism would make them
harder to detect.

UAF detection tools should then be used in conjunction with this
compilation flag, e.g. (assuming llvm installed):

  ./configure --enable-rcu-blocking --enable-asan && make

  # Verify the tool works: should trigger a UAF
  ./tests/ovstest test-rcu-uaf quiesce
  ./tests/ovstest test-rcu-uaf try-quiesce
  ./tests/ovstest test-rcu-uaf quiesce-start-end

  # The testsuite can be used as well
  make check TESTSUITEFLAGS='-k rcu'

Signed-off-by: Gaetan Rivet 
---
 .ci/linux-build.sh   |  4 ++
 .github/workflows/build-and-test.yml |  7 +++
 NEWS |  1 +
 acinclude.m4 | 15 +
 configure.ac |  1 +
 lib/ovs-rcu.c| 87 
 lib/ovs-rcu.h| 31 ++
 tests/atlocal.in |  1 +
 tests/library.at |  3 +
 9 files changed, 150 insertions(+)

diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 19600a668..c9e04c090 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -235,6 +235,10 @@ if [ "$ASAN" ]; then
 CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} -O1"
 fi
 
+if [ "$RCU_BLOCK" ]; then
+EXTRA_OPTS="$EXTRA_OPTS --enable-rcu-blocking"
+fi
+
 save_OPTS="${OPTS} $*"
 OPTS="${EXTRA_OPTS} ${save_OPTS}"
 
diff --git a/.github/workflows/build-and-test.yml 
b/.github/workflows/build-and-test.yml
index e2350c6d9..56f6f42fc 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -23,6 +23,7 @@ jobs:
   M32: ${{ matrix.m32 }}
   OPTS:${{ matrix.opts }}
   TESTSUITE:   ${{ matrix.testsuite }}
+  RCU_BLOCK:   ${{ matrix.rcu_blocking }}
 
 name: linux ${{ join(matrix.*, ' ') }}
 runs-on: ubuntu-18.04
@@ -109,6 +110,12 @@ jobs:
   - compiler: gcc
 deb_package:  deb
 
+  - compiler: clang
+testsuite:test
+kernel:   3.16
+asan: asan
+rcu_blocking: rcu-blocking
+
 steps:
 - name: checkout
   uses: actions/checkout@v2
diff --git a/NEWS b/NEWS
index 79e18b85b..daafbe150 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,7 @@ Post-v2.15.0
  * OVS validated with DPDK 20.11.1. It is recommended to use this version
until further releases.
- New configure option '--enable-asan' enables AddressSanitizer.
+   - New configure option '--enable-rcu-blocking' to debug RCU usage.
 
 
 v2.15.0 - 15 Feb 2021
diff --git a/acinclude.m4 b/acinclude.m4
index 615e7f962..b01264373 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1386,6 +1386,21 @@ AC_DEFUN([OVS_ENABLE_SPARSE],
  [], [enable_sparse=no])
AM_CONDITIONAL([ENABLE_SPARSE_BY_DEFAULT], [test $enable_sparse = yes])])
 
+dnl OVS_ENABLE_RCU_BLOCKING
+AC_DEFUN([OVS_ENABLE_RCU_BLOCKING],
+  [AC_ARG_ENABLE(
+[rcu-blocking],
+[AC_HELP_STRING([--enable-rcu-blocking],
+[Enable the blocking RCU mode])],
+[RCU_BLOCKING=yes], [RCU_BLOCKING=no])
+   AC_SUBST([RCU_BLOCKING])
+   AC_CONFIG_COMMANDS_PRE([
+ if test "$RCU_BLOCKING" = "yes"; then
+ OVS_CFLAGS="$OVS_CFLAGS -DOVS_RCU_BLOCKING=1"
+ fi
+   ])
+  ])
+
 dnl OVS_CTAGS_IDENTIFIERS
 dnl
 dnl ctags ignores symbols with extras identifiers. This is a list of
diff --git a/configure.ac b/configure.ac
index eec5a9d1b..de11ff777 100644
--- a/configure.ac
+++ b/configure.ac
@@ -184,6 +184,7 @@ OVS_CHECK_CC_OPTION([-mavx512f], [CFLAGS="$CFLAGS 
-DHAVE_AVX512F"])
 OVS_ENABLE_WERROR
 OVS_ENABLE_ASAN
 OVS_ENABLE_SPARSE
+OVS_ENABLE_RCU_BLOCKING
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
 OVS_CHECK_BINUTILS_AVX512
diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index 1866bd308..295a56778 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -71,6 +71,84 @@ static void ovsrcu_unregister__(struct ovsrcu_perthread *);
 static bool ovsrcu_call_postponed(void);
 static void *ovsrcu_postpone_thread(void *arg OVS_UNUSED);
 
+#ifdef OVS_RCU_BLOCKING
+
+static struct seq *postpone_wait;
+DEFINE_STATIC_PER_THREAD_DATA(bool, need_wait, false);
+DEFINE_STATIC_PER_THREAD_DATA(uint64_t, quiescent_seqno, 0);
+
+static void
+ovsrcu_postpone_end(void)
+{
+if (single_threaded()) {
+return;
+}
+seq_change(postpone_wait);
+}
+
+static bool
+ovsrcu_do_not_block(void)
+{
+/* Do not wait on the postpone thread if it has been cleared for exit. */
+return 

[ovs-dev] [PATCH v2 0/8] RCU: Add blocking mode for debugging

2021-05-20 Thread Gaetan Rivet
This series adds a compilation option that changes the behavior of the RCU
module. Once enabled, RCU reclamation by user threads becomes blocking until
the RCU threads has executed the scheduled callbacks.

Tools such as AddressSanitizer are useful to detect memory errors e.g. 
user-after-free.
Such tool can become ineffective if the RCU library is used to defer memory 
reclamation.
While this is the intended function of the RCU lib, nothing protects developers
from mistakes i.e. keeping references to memory scheduled for reclamation 
accross
quiescent periods.

Such error that should be detectable with ASAN, are made less likely to occur
due to RCU and thus harder to fix. However, if the RCU is modified so that user
threads are waiting on the RCU thread to execute the scheduled callbacks, they
should be forced to happen.

Unit tests have been written that should trigger a use-after-free from ASAN.
They are however thwarted by the RCU, until the blocking mode is enabled.
In that case, they will always abort on the expected error.

The full test-suite can be passed with the blocking RCU mode enabled.
An entry in the CI matrix is created for it. No error has been observed.

v2:

  * Rebased on master
  * Added documentation in lib/ovs-rcu.h following Ben's suggestion.

  CI: https://github.com/grivet/ovs/actions/runs/860557554

Gaetan Rivet (8):
  configure: add --enable-asan option
  tests: Add ovs-barrier unit test
  tests: Add RCU postpone test
  tests: Add ASAN use-after-free validation with RCU
  ovs-thread: Fix barrier use-after-free
  ovs-thread: Quiesce when joining pthreads
  ovs-rcu: Remove unused perthread mutex
  ovs-rcu: Add blocking RCU mode

 .ci/linux-build.sh   |   8 +-
 .github/workflows/build-and-test.yml |   7 +
 NEWS |   2 +
 acinclude.m4 |  31 
 configure.ac |   2 +
 lib/ovs-rcu.c|  90 -
 lib/ovs-rcu.h|  31 
 lib/ovs-thread.c |  77 ++--
 lib/ovs-thread.h |   6 +-
 tests/atlocal.in |   2 +
 tests/automake.mk|   2 +
 tests/library.at |  49 -
 tests/test-barrier.c | 264 +++
 tests/test-rcu-uaf.c |  98 ++
 tests/test-rcu.c |  61 +++
 15 files changed, 708 insertions(+), 22 deletions(-)
 create mode 100644 tests/test-barrier.c
 create mode 100644 tests/test-rcu-uaf.c

--
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 4/8] tests: Add ASAN use-after-free validation with RCU

2021-05-20 Thread Gaetan Rivet
When using the RCU mechanism and deferring memory reclamation, potential
use-after-free due to incorrect use of RCU can be hidden.

Add a test triggering a UAF event. When the test suite is built with
AddressSanitizer support, verify that the event triggers and the tool is
usable with RCU.

Signed-off-by: Gaetan Rivet 
---
 tests/automake.mk|  1 +
 tests/library.at | 33 +++
 tests/test-rcu-uaf.c | 98 
 3 files changed, 132 insertions(+)
 create mode 100644 tests/test-rcu-uaf.c

diff --git a/tests/automake.mk b/tests/automake.mk
index a32abd41c..4420a3f7f 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -472,6 +472,7 @@ tests_ovstest_SOURCES = \
tests/test-packets.c \
tests/test-random.c \
tests/test-rcu.c \
+   tests/test-rcu-uaf.c \
tests/test-reconnect.c \
tests/test-rstp.c \
tests/test-sflow.c \
diff --git a/tests/library.at b/tests/library.at
index 6e8a154e5..4a549f77e 100644
--- a/tests/library.at
+++ b/tests/library.at
@@ -261,6 +261,39 @@ AT_KEYWORDS([rcu])
 AT_CHECK([ovstest test-rcu], [0], [])
 AT_CLEANUP
 
+AT_SETUP([rcu quiesce use-after-free detection])
+AT_SKIP_IF([test "$IS_WIN32" = "yes"])
+AT_SKIP_IF([test "$ASAN_ENABLED" = "no"])
+# SIGABRT + 128
+exit_status=134
+AT_KEYWORDS([rcu asan])
+AT_CHECK([ovstest test-rcu-uaf quiesce], [$exit_status], [ignore], [ignore])
+# ASAN report is expected on success.
+rm asan.*
+AT_CLEANUP
+
+AT_SETUP([rcu try-quiesce use-after-free detection])
+AT_SKIP_IF([test "$IS_WIN32" = "yes"])
+AT_SKIP_IF([test "$ASAN_ENABLED" = "no"])
+# SIGABRT + 128
+exit_status=134
+AT_KEYWORDS([rcu asan])
+AT_CHECK([ovstest test-rcu-uaf try-quiesce], [$exit_status], [ignore], 
[ignore])
+# ASAN report is expected on success.
+rm asan.*
+AT_CLEANUP
+
+AT_SETUP([rcu quiesce-start-end use-after-free detection])
+AT_SKIP_IF([test "$IS_WIN32" = "yes"])
+AT_SKIP_IF([test "$ASAN_ENABLED" = "no"])
+AT_KEYWORDS([rcu asan])
+# SIGABRT + 128
+exit_status=134
+AT_CHECK([ovstest test-rcu-uaf quiesce-start-end], [$exit_status], [ignore], 
[ignore])
+# ASAN report is expected on success.
+rm asan.*
+AT_CLEANUP
+
 AT_SETUP([stopwatch module])
 AT_CHECK([ovstest test-stopwatch], [0], [..
 ], [ignore])
diff --git a/tests/test-rcu-uaf.c b/tests/test-rcu-uaf.c
new file mode 100644
index 0..f97738795
--- /dev/null
+++ b/tests/test-rcu-uaf.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright (c) 2021 NVIDIA Corporation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include 
+
+#include "ovs-thread.h"
+#include "ovs-rcu.h"
+#include "ovstest.h"
+#include "util.h"
+
+enum ovsrcu_uaf_type {
+OVSRCU_UAF_QUIESCE,
+OVSRCU_UAF_TRY_QUIESCE,
+OVSRCU_UAF_QUIESCE_START_END,
+};
+
+static void *
+rcu_uaf_main(void *aux)
+{
+enum ovsrcu_uaf_type *type = aux;
+char *xx = xmalloc(2);
+
+xx[0] = 'a';
+ovsrcu_postpone(free, xx);
+switch (*type) {
+case OVSRCU_UAF_QUIESCE:
+ovsrcu_quiesce();
+break;
+case OVSRCU_UAF_TRY_QUIESCE:
+while (ovsrcu_try_quiesce()) {
+;
+}
+break;
+case OVSRCU_UAF_QUIESCE_START_END:
+ovsrcu_quiesce_start();
+ovsrcu_quiesce_end();
+break;
+default:
+OVS_NOT_REACHED();
+}
+xx[1] = 'b';
+
+return NULL;
+}
+
+static void
+usage(char *test_name)
+{
+fprintf(stderr, "Usage: %s \n",
+test_name);
+}
+
+static void
+test_rcu_uaf(int argc, char *argv[])
+{
+char **args = argv + optind - 1;
+enum ovsrcu_uaf_type type;
+pthread_t quiescer;
+
+if (argc - optind != 1) {
+usage(args[0]);
+return;
+}
+
+set_program_name(argv[0]);
+
+if (!strcmp(args[1], "quiesce")) {
+type = OVSRCU_UAF_QUIESCE;
+} else if (!strcmp(args[1], "try-quiesce")) {
+type = OVSRCU_UAF_TRY_QUIESCE;
+} else if (!strcmp(args[1], "quiesce-start-end")) {
+type = OVSRCU_UAF_QUIESCE_START_END;
+} else {
+usage(args[0]);
+return;
+}
+
+/* Need to create a separate thread, to support try-quiesce. */
+quiescer = ovs_thread_create("rcu-uaf", rcu_uaf_main, );
+xpthread_join(quiescer, NULL);
+}
+
+OVSTEST_REGISTER("test-rcu-uaf", test_rcu_uaf);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 7/8] ovs-rcu: Remove unused perthread mutex

2021-05-20 Thread Gaetan Rivet
A mutex is allocated, initialized and destroyed, without being
used in the perthread structure.

Signed-off-by: Gaetan Rivet 
---
 lib/ovs-rcu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index cde1e925b..1866bd308 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -47,7 +47,6 @@ struct ovsrcu_cbset {
 struct ovsrcu_perthread {
 struct ovs_list list_node;  /* In global list. */
 
-struct ovs_mutex mutex;
 uint64_t seqno;
 struct ovsrcu_cbset *cbset;
 char name[16];  /* This thread's name. */
@@ -84,7 +83,6 @@ ovsrcu_perthread_get(void)
 const char *name = get_subprogram_name();
 
 perthread = xmalloc(sizeof *perthread);
-ovs_mutex_init(>mutex);
 perthread->seqno = seq_read(global_seqno);
 perthread->cbset = NULL;
 ovs_strlcpy(perthread->name, name[0] ? name : "main",
@@ -406,7 +404,6 @@ ovsrcu_unregister__(struct ovsrcu_perthread *perthread)
 ovs_list_remove(>list_node);
 ovs_mutex_unlock(_threads_mutex);
 
-ovs_mutex_destroy(>mutex);
 free(perthread);
 
 seq_change(global_seqno);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 6/8] ovs-thread: Quiesce when joining pthreads

2021-05-20 Thread Gaetan Rivet
Joining pthreads makes the caller quiescent. It should register as such,
as joined threads may wait on an RCU callback executing before quitting,
deadlocking the caller.

Signed-off-by: Gaetan Rivet 
---
 lib/ovs-thread.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index 805cba622..bf58923f8 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -180,8 +180,6 @@ XPTHREAD_FUNC1(pthread_cond_destroy, pthread_cond_t *);
 XPTHREAD_FUNC1(pthread_cond_signal, pthread_cond_t *);
 XPTHREAD_FUNC1(pthread_cond_broadcast, pthread_cond_t *);
 
-XPTHREAD_FUNC2(pthread_join, pthread_t, void **);
-
 typedef void destructor_func(void *);
 XPTHREAD_FUNC2(pthread_key_create, pthread_key_t *, destructor_func *);
 XPTHREAD_FUNC1(pthread_key_delete, pthread_key_t);
@@ -191,6 +189,20 @@ XPTHREAD_FUNC2(pthread_setspecific, pthread_key_t, const 
void *);
 XPTHREAD_FUNC3(pthread_sigmask, int, const sigset_t *, sigset_t *);
 #endif
 
+void
+xpthread_join(pthread_t thread, void **retval)
+{
+int error;
+
+ovsrcu_quiesce_start();
+error = pthread_join(thread, retval);
+ovsrcu_quiesce_end();
+
+if (OVS_UNLIKELY(error)) {
+ovs_abort(error, "%s failed", __func__);
+}
+}
+
 static void
 ovs_mutex_init__(const struct ovs_mutex *l_, int type)
 {
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 3/8] tests: Add RCU postpone test

2021-05-20 Thread Gaetan Rivet
Add a simple postponing test verifying RCU callbacks have executed and
RCU exits in order. Add as part of library unit-tests.

Signed-off-by: Gaetan Rivet 
---
 tests/library.at |  8 ++-
 tests/test-rcu.c | 61 
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/tests/library.at b/tests/library.at
index e572c22e3..6e8a154e5 100644
--- a/tests/library.at
+++ b/tests/library.at
@@ -251,10 +251,16 @@ AT_KEYWORDS([barrier])
 AT_CHECK([ovstest test-barrier], [0], [])
 AT_CLEANUP
 
-AT_SETUP([rcu])
+AT_SETUP([rcu quiescing])
+AT_KEYWORDS([rcu])
 AT_CHECK([ovstest test-rcu-quiesce], [0], [])
 AT_CLEANUP
 
+AT_SETUP([rcu postponing])
+AT_KEYWORDS([rcu])
+AT_CHECK([ovstest test-rcu], [0], [])
+AT_CLEANUP
+
 AT_SETUP([stopwatch module])
 AT_CHECK([ovstest test-stopwatch], [0], [..
 ], [ignore])
diff --git a/tests/test-rcu.c b/tests/test-rcu.c
index 965f3c49f..38e1cbb6f 100644
--- a/tests/test-rcu.c
+++ b/tests/test-rcu.c
@@ -49,3 +49,64 @@ test_rcu_quiesce(int argc OVS_UNUSED, char *argv[] 
OVS_UNUSED)
 }
 
 OVSTEST_REGISTER("test-rcu-quiesce", test_rcu_quiesce);
+
+struct rcu_user_aux {
+bool done;
+};
+
+static void
+rcu_user_deferred(struct rcu_user_aux *aux)
+{
+aux->done = true;
+}
+
+static void *
+rcu_user_main(void *aux_)
+{
+struct rcu_user_aux *aux = aux_;
+
+ovsrcu_quiesce();
+
+aux->done = false;
+ovsrcu_postpone(rcu_user_deferred, aux);
+
+ovsrcu_quiesce();
+
+return NULL;
+}
+
+#define N_THREAD 4
+
+static void
+test_rcu(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
+{
+struct rcu_user_aux main_aux = {0};
+struct rcu_user_aux aux[N_THREAD];
+pthread_t users[N_THREAD];
+size_t i;
+
+memset(aux, 0, sizeof aux);
+
+for (i = 0; i < ARRAY_SIZE(users); i++) {
+users[i] = ovs_thread_create("user", rcu_user_main, [i]);
+}
+
+for (i = 0; i < ARRAY_SIZE(users); i++) {
+xpthread_join(users[i], NULL);
+}
+
+/* Register a last callback and verify that it will be properly executed
+ * even if the RCU lib is exited without this thread quiescing.
+ */
+ovsrcu_postpone(rcu_user_deferred, _aux);
+
+ovsrcu_exit();
+
+ovs_assert(main_aux.done);
+
+for (i = 0; i < ARRAY_SIZE(users); i++) {
+ovs_assert(aux[i].done);
+}
+}
+
+OVSTEST_REGISTER("test-rcu", test_rcu);
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v2 2/8] tests: Add ovs-barrier unit test

2021-05-20 Thread Gaetan Rivet
No unit test exist currently for the ovs-barrier type.
It is however crucial as a building block and should be verified to work
as expected.

Create a simple test verifying the basic function of ovs-barrier.
Integrate the test as part of the test suite.

Signed-off-by: Gaetan Rivet 
---
 tests/automake.mk|   1 +
 tests/library.at |   5 +
 tests/test-barrier.c | 264 +++
 3 files changed, 270 insertions(+)
 create mode 100644 tests/test-barrier.c

diff --git a/tests/automake.mk b/tests/automake.mk
index 1a528aa39..a32abd41c 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -448,6 +448,7 @@ tests_ovstest_SOURCES = \
tests/ovstest.h \
tests/test-aes128.c \
tests/test-atomic.c \
+   tests/test-barrier.c \
tests/test-bundle.c \
tests/test-byte-order.c \
tests/test-classifier.c \
diff --git a/tests/library.at b/tests/library.at
index 1702b7556..e572c22e3 100644
--- a/tests/library.at
+++ b/tests/library.at
@@ -246,6 +246,11 @@ AT_SETUP([ofpbuf module])
 AT_CHECK([ovstest test-ofpbuf], [0], [])
 AT_CLEANUP
 
+AT_SETUP([barrier module])
+AT_KEYWORDS([barrier])
+AT_CHECK([ovstest test-barrier], [0], [])
+AT_CLEANUP
+
 AT_SETUP([rcu])
 AT_CHECK([ovstest test-rcu-quiesce], [0], [])
 AT_CLEANUP
diff --git a/tests/test-barrier.c b/tests/test-barrier.c
new file mode 100644
index 0..3bc5291cc
--- /dev/null
+++ b/tests/test-barrier.c
@@ -0,0 +1,264 @@
+/*
+ * Copyright (c) 2021 NVIDIA Corporation.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include 
+
+#include "ovs-thread.h"
+#include "ovs-rcu.h"
+#include "ovstest.h"
+#include "random.h"
+#include "util.h"
+
+#define DEFAULT_N_THREADS 4
+#define NB_STEPS 4
+
+static bool verbose;
+static struct ovs_barrier barrier;
+
+struct blocker_aux {
+unsigned int tid;
+bool leader;
+int step;
+};
+
+static void *
+basic_blocker_main(void *aux_)
+{
+struct blocker_aux *aux = aux_;
+size_t i;
+
+aux->step = 0;
+for (i = 0; i < NB_STEPS; i++) {
+ovs_barrier_block();
+aux->step++;
+ovs_barrier_block();
+}
+
+return NULL;
+}
+
+static void
+basic_block_check(struct blocker_aux *aux, size_t n, int expected)
+{
+size_t i;
+
+for (i = 0; i < n; i++) {
+if (verbose) {
+printf("aux[%" PRIuSIZE "]=%d == %d", i, aux[i].step, expected);
+if (aux[i].step != expected) {
+printf(" <--- X");
+}
+printf("\n");
+} else {
+ovs_assert(aux[i].step == expected);
+}
+}
+ovs_barrier_block();
+ovs_barrier_block();
+}
+
+/*
+ * Basic barrier test.
+ *
+ * N writers and 1 reader participate in the test.
+ * Each thread goes through M steps (=NB_STEPS).
+ * The main thread participates as the reader.
+ *
+ * A Step is divided in three parts:
+ *1. before
+ *  (barrier)
+ *2. during
+ *  (barrier)
+ *3. after
+ *
+ * Each writer updates a thread-local variable with the
+ * current step number within part 2 and waits.
+ *
+ * The reader checks all variables during part 3, expecting
+ * all variables to be equal. If any variable differs, it means
+ * its thread was not properly blocked by the barrier.
+ */
+static void
+test_barrier_basic(size_t n_threads)
+{
+struct blocker_aux *aux;
+pthread_t *threads;
+size_t i;
+
+ovs_barrier_init(, n_threads + 1);
+
+aux = xcalloc(n_threads, sizeof *aux);
+threads = xmalloc(n_threads * sizeof *threads);
+for (i = 0; i < n_threads; i++) {
+threads[i] = ovs_thread_create("ovs-barrier",
+   basic_blocker_main, [i]);
+}
+
+for (i = 0; i < NB_STEPS; i++) {
+basic_block_check(aux, n_threads, i);
+}
+ovs_barrier_destroy();
+
+for (i = 0; i < n_threads; i++) {
+xpthread_join(threads[i], NULL);
+}
+
+free(threads);
+free(aux);
+}
+
+static unsigned int *shared_mem;
+
+static void *
+lead_blocker_main(void *aux_)
+{
+struct blocker_aux *aux = aux_;
+size_t i;
+
+aux->step = 0;
+for (i = 0; i < NB_STEPS; i++) {
+if (aux->leader) {
+shared_mem = xmalloc(sizeof *shared_mem);
+if (verbose) {
+printf("*T1: allocated shmem\n");
+}
+}
+xnanosleep(random_range(100) * 1000);
+
+ovs_barrier_block();
+
+if (verbose) {
+   

[ovs-dev] [PATCH v2 5/8] ovs-thread: Fix barrier use-after-free

2021-05-20 Thread Gaetan Rivet
When a thread is blocked on a barrier, there is no guarantee
regarding the moment it will resume, only that it will at some point in
the future.

One thread can resume first then proceed to destroy the barrier while
another thread has not yet awoken. When it finally happens, the second
thread will attempt a seq_read() on the barrier seq, while the first
thread have already destroyed it, triggering a use-after-free.

Introduce an additional indirection layer within the barrier.
A internal barrier implementation holds all the necessary elements
for a thread to safely block and destroy. Whenever a barrier is
destroyed, the internal implementation is left available to still
blocking threads if necessary. A reference counter is used to track
threads still using the implementation.

Note that current uses of ovs-barrier are not affected: RCU and
revalidators will not destroy their barrier immediately after blocking
on it.

Fixes: d8043da7182a ("ovs-thread: Implement OVS specific barrier.")
Signed-off-by: Gaetan Rivet 
---
 lib/ovs-thread.c | 61 +++-
 lib/ovs-thread.h |  6 ++---
 2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index b686e4548..805cba622 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -299,21 +299,53 @@ ovs_spin_init(const struct ovs_spin *spin)
 }
 #endif
 
+struct ovs_barrier_impl {
+uint32_t size;/* Number of threads to wait. */
+atomic_count count;   /* Number of threads already hit the barrier. */
+struct seq *seq;
+struct ovs_refcount refcnt;
+};
+
+static void
+ovs_barrier_impl_ref(struct ovs_barrier_impl *impl)
+{
+ovs_refcount_ref(>refcnt);
+}
+
+static void
+ovs_barrier_impl_unref(struct ovs_barrier_impl *impl)
+{
+if (ovs_refcount_unref(>refcnt) == 1) {
+seq_destroy(impl->seq);
+free(impl);
+}
+}
+
 /* Initializes the 'barrier'.  'size' is the number of threads
  * expected to hit the barrier. */
 void
 ovs_barrier_init(struct ovs_barrier *barrier, uint32_t size)
 {
-barrier->size = size;
-atomic_count_init(>count, 0);
-barrier->seq = seq_create();
+struct ovs_barrier_impl *impl;
+
+impl = xmalloc(sizeof *impl);
+impl->size = size;
+atomic_count_init(>count, 0);
+impl->seq = seq_create();
+ovs_refcount_init(>refcnt);
+
+ovsrcu_set(>impl, impl);
 }
 
 /* Destroys the 'barrier'. */
 void
 ovs_barrier_destroy(struct ovs_barrier *barrier)
 {
-seq_destroy(barrier->seq);
+struct ovs_barrier_impl *impl;
+
+impl = ovsrcu_get(struct ovs_barrier_impl *, >impl);
+ovsrcu_set(>impl, NULL);
+ovs_barrier_impl_unref(impl);
 }
 
 /* Makes the calling thread block on the 'barrier' until all
@@ -325,23 +357,30 @@ ovs_barrier_destroy(struct ovs_barrier *barrier)
 void
 ovs_barrier_block(struct ovs_barrier *barrier)
 {
-uint64_t seq = seq_read(barrier->seq);
+struct ovs_barrier_impl *impl;
 uint32_t orig;
+uint64_t seq;
 
-orig = atomic_count_inc(>count);
-if (orig + 1 == barrier->size) {
-atomic_count_set(>count, 0);
+impl = ovsrcu_get(struct ovs_barrier_impl *, >impl);
+ovs_barrier_impl_ref(impl);
+
+seq = seq_read(impl->seq);
+orig = atomic_count_inc(>count);
+if (orig + 1 == impl->size) {
+atomic_count_set(>count, 0);
 /* seq_change() serves as a release barrier against the other threads,
  * so the zeroed count is visible to them as they continue. */
-seq_change(barrier->seq);
+seq_change(impl->seq);
 } else {
 /* To prevent thread from waking up by other event,
  * keeps waiting for the change of 'barrier->seq'. */
-while (seq == seq_read(barrier->seq)) {
-seq_wait(barrier->seq, seq);
+while (seq == seq_read(impl->seq)) {
+seq_wait(impl->seq, seq);
 poll_block();
 }
 }
+
+ovs_barrier_impl_unref(impl);
 }
 
 DEFINE_EXTERN_PER_THREAD_DATA(ovsthread_id, OVSTHREAD_ID_UNSET);
diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h
index 7ee98bd4e..3b444ccdc 100644
--- a/lib/ovs-thread.h
+++ b/lib/ovs-thread.h
@@ -21,16 +21,16 @@
 #include 
 #include 
 #include "ovs-atomic.h"
+#include "ovs-rcu.h"
 #include "openvswitch/thread.h"
 #include "util.h"
 
 struct seq;
 
 /* Poll-block()-able barrier similar to pthread_barrier_t. */
+struct ovs_barrier_impl;
 struct ovs_barrier {
-uint32_t size;/* Number of threads to wait. */
-atomic_count count;   /* Number of threads already hit the barrier. */
-struct seq *seq;
+OVSRCU_TYPE(struct ovs_barrier_impl *) impl;
 };
 
 /* Wrappers for pthread_mutexattr_*() that abort the process on any error. */
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Moving of the primary #openvswitch channel to irc.libera.chat ?

2021-05-20 Thread Flavio Leitner
On Wed, May 19, 2021 at 01:22:58PM -0700, Ben Pfaff wrote:
> On Wed, May 19, 2021 at 10:03:57PM +0200, Ilya Maximets wrote:
> > Hi.
> > 
> > Taking into account some very unhealthy things that happened recently
> > with FreeNode network and resigning of lots of its stuff [1], we
> > probably need to discuss if Open vSwitch project wants to change the
> > IRC server for a primary #openvswitch channel.  User's data is the
> > main concern, IIUC, as it's unclear what the new management will
> > do with the network.
> > 
> > The main alternative now seems to be the Libera.Chat [2] where most of
> > the former FreeNode stuff.
> > 
> > Some projects already announced [3][4] movement to Libera.Chat.  Others
> > are discussing the possibility [5].
> > 
> > So, I think, it make sense to discuss the future of #openvswitch
> > channel too.  Any thoughts?
> > 
> > Will we have an OVN meeting on a different server tomorrow?
> 
> Thanks for sending such a detailed message (with footnotes!).
> I think I support the move.  I have already registered the #openvswitch
> channel on libera.chat (and copied the topic across).

Great! Thanks.


> I'm going to log in on both servers tomorrow but I suggest that we
> transition to libera.chat after tomorrow's meeting, since this is pretty
> short notice and I think that most of us (including me) are only light
> users of IRC.
> 
> Also, I suspect that the infrastructure for recording meetings has not
> yet moved to libera.chat.  (I did notice that the 'openstack' user just
> dropped out of #openvswitch.  I think that's the bot that did the
> meeting recordings.  So maybe that infastructure is moving across right
> as I write.)

When things are ported, please change freenode's #openvswitch topic
to say that we are moving to the new network.

Usually the 'topic' is displayed when someone joins the channel, so
it could help users to find out about the change.

-- 
fbl
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] conntrack: document all-zero IP SNAT behavior and add a test case

2021-05-20 Thread Dumitru Ceara
On 5/17/21 5:22 PM, Dumitru Ceara wrote:
> On 4/26/21 2:19 PM, Eelco Chaudron wrote:
>> Currently, conntrack in the kernel has an undocumented feature referred
>> to as all-zero IP address NULL SNAT. Basically, when a source port
>> collision is detected during the commit, the source port will be
>> translated to an ephemeral port. If there is no collision, no SNAT is
>> performed.
>>
>> This patchset documents this behavior and adds a self-test to verify
>> it's not changing.
>>
>> Signed-off-by: Eelco Chaudron 
>> ---
>> v3: Renamed NULL SNAT to all-zero IP SNAT.
>> v2: Fixed NULL SNAT to only work in the -rpl state to be inline with
>> OpenShift-SDN's behavior.
> 
> Hi Eelco,
> 
> Would it be possible to add this capability to the list of kernel
> Datapath.capabilities ovsdb column? [0]
> 
> Given that the patch to add userspace datapath support for all-zero IP
> SNAT is not accepted yet [1], and even if it does it will likely not be
> backported to LTS because it's a feature, this would make it easier for
> OVN (for example ovn-controller) to determine at runtime if it should
> use all-zero IP SNAT or not.
> 
> [0]
> https://github.com/openvswitch/ovs/commit/27501802d09f782b8133031c1eae3394ae5ce147

I just realized that the Datapath record is not created automatically, I
wonder if that's a change to be done in OVS?  Or is it expected that the
controller/CMS create the datapath record?

> 
> [1] https://patchwork.ozlabs.org/project/openvswitch/list/?series=241223
> 
> Thanks,
> Dumitru
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] ofproto-dpif: APIs and CLI option to add/delete static fdb entry

2021-05-20 Thread Eelco Chaudron


On 14 May 2021, at 21:33, Vasu Dasari wrote:

> Currently there is an option to add/flush/show ARP/ND neighbor. This covers L3
> side.  For L2 side, there is only fdb show command. This patch gives an option
> to add/del an fdb entry via ovs-appctl. ovs-appctl command looks like this:
>
> To add:
> ovs-appctl fdb/add
> ovs-appctl fdb/add br0 p1 0 50:54:00:00:00:05
>
> To del:
> ovs-appctl fdb/del
> ovs-appctl fdb/del br0 p1 0 50:54:00:00:00:05
>
> Static entry should not age. To indicate that entry being programmed is a 
> static entry,
> 'expires' field in 'struct mac_entry' will be set to a 
> MAC_ENTRY_AGE_STATIC_ENTRY. A
> check for this value is made while deleting mac entry as part of regular 
> aging process.
> Another check as part of mac-update process, when a packet with same source 
> mac as this
> entry arrives on the configured port will not modify the expires field
>
> Added two new APIs to provide convenient interface to add and delete 
> static-macs.
> void xlate_add_static_mac_entry(const struct ofproto_dpif *, ofp_port_t 
> in_port,
>struct eth_addr dl_src, int vlan);
> void xlate_delete_static_mac_entry(const struct ofproto_dpif *,
>   struct eth_addr dl_src, int vlan);
>
> Signed-off-by: Vasu Dasari 
> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048894.html
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1597752

I did some testing and found the issues below, once fixed, I’ll do a full code 
review.

When you do an FDB flush, it also clears the static FDB entries. I think this 
is wrong, as all hardware vendors I know will retain the static FDB entries.

When you create a static entry for lets say Port A, when a packet with the same 
MAC comes from Port B the entry will be updated with Port B. This should not 
happen for static entries.

When you add a static MAC entry, the command just returns "OK". Other commands 
do not return anything on a successful addition. You should either follow the 
same behavior or be more verbose (Static FDB successfully added?) on your 
return.

Also, it might be nice to be more verbose when you replace an existing static 
or dynamic FDB entry, i.e. especially if the physical port is different (mac 
move case).

Cheers,


Eelco

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 2/2] ovs-numa: Dpdk options with non-contiguous nodes

2021-05-20 Thread Finn, Emma



-Original Message-
From: dev  On Behalf Of David Wilder
Sent: Wednesday 19 May 2021 01:10
To: ovs-dev@openvswitch.org
Cc: daniele.di.proie...@gmail.com; wil...@us.ibm.com; i.maxim...@ovn.org
Subject: [ovs-dev] [PATCH v4 2/2] ovs-numa: Dpdk options with non-contiguous 
nodes

If not supplied by the user --socket-mem and --socket-limit EAL options are set 
to a default value based system topology. The assumption that numa nodes must 
be numbered contiguously is removed by this change.

These options can be seen in the ovs-vswitchd.log.  For example:
a system containing only numa nodes 0 and 8 will generate the following:

EAL ARGS: ovs-vswitchd --socket-mem 1024,0,0,0,0,0,0,0,1024 \
 --socket-limit 1024,0,0,0,0,0,0,0,1024 -l 0

Signed-off-by: David Wilder 
Reviewed-by: David Christensen 
---
 lib/dpdk.c | 26 ++
 lib/ovs-numa.c |  1 -
 lib/ovs-numa.h |  2 ++
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/lib/dpdk.c b/lib/dpdk.c
index 3195403..918bc80 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -133,18 +133,28 @@ static char *
 construct_dpdk_socket_mem(void)
 {
 const char *def_value = "1024";
-int numa, numa_nodes = ovs_numa_get_n_numas();
+struct ovs_numa_dump *dump;
+int last_node = 0;
+
+/* Build a list of all numa nodes with at least one core */
+dump = ovs_numa_dump_n_cores_per_numa(1);
 struct ds dpdk_socket_mem = DS_EMPTY_INITIALIZER;
+ds_put_cstr(_socket_mem, def_value);
 
-if (numa_nodes == 0 || numa_nodes == OVS_NUMA_UNSPEC) {
-numa_nodes = 1;
-}
+const struct ovs_numa_info_numa *node;
 
-ds_put_cstr(_socket_mem, def_value);
-for (numa = 1; numa < numa_nodes; ++numa) {
-ds_put_format(_socket_mem, ",%s", def_value);
+FOR_EACH_NUMA_ON_DUMP(node, dump) {
+while (node->numa_id > last_node+1 &&
+   node->numa_id != OVS_NUMA_UNSPEC &&
+   node->numa_id <= MAX_NUMA_NODES){
+ds_put_format(_socket_mem, ",%s", "0");
+++last_node;
+}
+if (node->numa_id != 0) {
+ds_put_format(_socket_mem, ",%s", def_value);
+}
 }
-
+ovs_numa_dump_destroy(dump);
 return ds_cstr(_socket_mem);
 }
 
diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c index d2e7cc6..de1d037 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -58,7 +58,6 @@ VLOG_DEFINE_THIS_MODULE(ovs_numa);
  * TODO: Fix ovs-numa when cpu hotplug is used.
  */
 
-#define MAX_NUMA_NODES 128
 
 /* numa node. */
 struct numa_node {
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h index 8f2ea34..ecc251a 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -26,6 +26,8 @@
 #define OVS_CORE_UNSPEC INT_MAX
 #define OVS_NUMA_UNSPEC INT_MAX
 
+#define MAX_NUMA_NODES 128
+
 /* Dump of a list of 'struct ovs_numa_info'. */  struct ovs_numa_dump {
 struct hmap cores;
--
1.8.3.1

Tested-by: Emma Finn 

Thanks, 
Emma

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v4 1/2] ovs-numa: Support non-contiguous numa nodes and offline CPU cores

2021-05-20 Thread Finn, Emma



-Original Message-
From: dev  On Behalf Of David Wilder
Sent: Wednesday 19 May 2021 01:10
To: ovs-dev@openvswitch.org
Cc: daniele.di.proie...@gmail.com; wil...@us.ibm.com; i.maxim...@ovn.org
Subject: [ovs-dev] [PATCH v4 1/2] ovs-numa: Support non-contiguous numa nodes 
and offline CPU cores

This change removes the assumption that numa nodes and cores are numbered 
contiguously in linux.  This change is required to support some Power systems.

An additional check has been added to verify that cores are online, offline 
cores result in non-contiguously numbered cores.

I manually verified that all the exported functions defined in ovs-numa.h 
function correctly with these changes on Power and amd64 systems.

Tests for dpif-netdev and pmd that define numa systems using the --dummy-numa 
option have been modified to use both contiguous and non-contiguous numbered 
nodes.

Signed-off-by: David Wilder 
Reviewed-by: David Christensen 
---
 lib/ovs-numa.c   | 47 +++---
 tests/dpif-netdev.at | 64 +---
 tests/pmd.at | 20 
 3 files changed, 80 insertions(+), 51 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c index 6d0a685..d2e7cc6 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -42,16 +42,18 @@ VLOG_DEFINE_THIS_MODULE(ovs_numa);
  * This module stores the affinity information of numa nodes and cpu cores.
  * It also provides functions to bookkeep the pin of threads on cpu cores.
  *
- * It is assumed that the numa node ids and cpu core ids all start from 0 and
- * range continuously.  So, for example, if 'ovs_numa_get_n_cores()' returns N,
- * user can assume core ids from 0 to N-1 are all valid and there is a
- * 'struct cpu_core' for each id.
+ * It is assumed that the numa node ids and cpu core ids all start from 0.
+ * There is no guarantee that node and cpu ids are numbered 
+ consecutively
+ * (this is a change from earlier version of the code). So, for 
+ example,
+ * if two nodes exist with ids 0 and 8, 'ovs_numa_get_n_nodes()' will
+ * return 2, no assumption of node numbering should be made.
  *
  * NOTE, this module should only be used by the main thread.
  *
- * NOTE, the assumption above will fail when cpu hotplug is used.  In that
- * case ovs-numa will not function correctly.  For now, add a TODO entry
- * for addressing it in the future.
+ * NOTE, if cpu hotplug is used 'all_numa_nodes' and 'all_cpu_cores' 
+ must be
+ * invalidated when ever the system topology changes. Support for 
+ detecting
+ * topology changes has not been included. For now, add a TODO entry 
+ for
+ * addressing it in the future.
  *
  * TODO: Fix ovs-numa when cpu hotplug is used.
  */
@@ -130,8 +132,8 @@ insert_new_cpu_core(struct numa_node *n, unsigned core_id)
  * - "0,0,0,0": four cores on numa socket 0.
  * - "0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1": 16 cores on two numa sockets.
  * - "0,0,0,0,1,1,1,1": 8 cores on two numa sockets.
- *
- * The different numa ids must be consecutives or the function will abort. */
+ * - "0,0,0,0,8,8,8,8": 8 cores on two numa sockets, non-contiguous.
+ */
 static void
 discover_numa_and_core_dummy(void)
 {
@@ -169,10 +171,27 @@ discover_numa_and_core_dummy(void)
 
 free(conf);
 
-if (max_numa_id + 1 != hmap_count(_numa_nodes)) {
-ovs_fatal(0, "dummy numa contains non consecutive numa ids");
+}
+
+#ifdef __linux__
+/* Check if a cpu is detected and online */ static int 
+cpu_detected(unsigned int core_id) {
+char path[PATH_MAX];
+int len = snprintf(path,
+  sizeof(path), "/sys/devices/system/cpu/cpu%d/topology/core_id" ,
+  core_id);
+if (len <= 0 || (unsigned) len >= sizeof(path)) {
+return 0;
+}
+if (access(path, F_OK) != 0) {
+return 0;
 }
+
+return 1;
 }
+#endif /* __linux__ */
 
 /* Discovers all numa nodes and the corresponding cpu cores.
  * Constructs the 'struct numa_node' and 'struct cpu_core'. */ @@ -219,7 
+238,9 @@ discover_numa_and_core(void)
 unsigned core_id;
 
 core_id = strtoul(subdir->d_name + 3, NULL, 10);
-insert_new_cpu_core(n, core_id);
+if (cpu_detected(core_id)) {
+insert_new_cpu_core(n, core_id);
+}
 }
 }
 closedir(dir);
@@ -229,7 +250,7 @@ discover_numa_and_core(void)
 }
 
 free(path);
-if (!dir || !numa_supported) {
+if (!numa_supported) {
 break;
 }
 }
diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at index 57cae38..f8782e6 
100644
--- a/tests/dpif-netdev.at
+++ b/tests/dpif-netdev.at
@@ -85,7 +85,7 @@ AT_CLEANUP
 
 
 m4_define([DPIF_NETDEV_DUMMY_IFACE],
-  [AT_SETUP([dpif-netdev - $1 interface])
+  [AT_SETUP([dpif-netdev - $1 interface $2])
# Create br0 with interfaces p1 and p7
#and br1 with interfaces p2 and p8