Re: [ovs-dev] [PATCH v2 1/2] conntrack: restore the origin port for each round with new address

2021-09-06 Thread wenxu









From: Paolo Valerio 
Date: 2021-09-06 19:00:37
To:  we...@ucloud.cn,i.maxim...@ovn.org,dlu...@gmail.com,acon...@redhat.com
Cc:  d...@openvswitch.org
Subject: Re: [PATCH v2 1/2] conntrack: restore the origin port for each round
 with new address

>Hello,
>
>we...@ucloud.cn writes:
>
>> From: wenxu 
>>
>> It is better to choose the origin select port as current port
>> for each port search round with new address.
>>
>> Signed-off-by: wenxu 
>> ---
>
>This should happen normally.
>It doesn't happen in the case of source port manipulation when the
>default ephemeral range is used and the packet source port is below
>1024. In that case the first IP iteration uses the packet source port,
>whereas the others don't.
>
>if we want to change this behavior, there are some more ways we can
>consider, e.g.:


" In that case the first IP iteration uses the packet source port, whereas the 
others don't"


I think the behavior above is unrelated to curr_sport picking different
ranges?
So you mean that for source ports < 1024, each IP iteration should use a
different source port?




>
>- Using 1024 as initial curr_sport for source ports < 1024.
>- picking different default ranges
>  - e.g. the kernel uses the range [1, 511], if the source port is < 512,
>    [600, 1023], if it's >= 512 and < 1024, and [1024,65535] for the
>    remaining ports.
>
>What do you think if we do something like the third bullet and squash
>this change, if needed, in your next patch?
>
>In any case, there are a couple of small nits/questions, see inline.
>
>>  lib/conntrack.c | 11 +++
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 551c206..2d14205 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -2412,8 +2412,8 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
>>  uint32_t hash = nat_range_hash(conn, ct->hash_basis);
>>  bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
>>   conn->key.nw_proto == IPPROTO_UDP;
>> -uint16_t min_dport, max_dport, curr_dport;
>> -uint16_t min_sport, max_sport, curr_sport;
>> +uint16_t min_dport, max_dport, curr_dport, orig_dport;
>> +uint16_t min_sport, max_sport, curr_sport, orig_sport;
>
>can we keep the reverse xmas tree style here?
>
>>
>>  min_addr = conn->nat_info->min_addr;
>>  max_addr = conn->nat_info->max_addr;
>> @@ -2425,9 +2425,9 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
>>   * we can stop once we reach it. */
>>  guard_addr = curr_addr;
>>
>> -set_sport_range(conn->nat_info, &conn->key, hash, &curr_sport,
>> +set_sport_range(conn->nat_info, &conn->key, hash, &orig_sport,
>>  &min_sport, &max_sport);
>> -set_dport_range(conn->nat_info, &conn->key, hash, &curr_dport,
>> +set_dport_range(conn->nat_info, &conn->key, hash, &orig_dport,
>>  &min_dport, &max_dport);
>>
>>  another_round:
>> @@ -2443,6 +2443,9 @@ another_round:
>>  goto next_addr;
>>  }
>>
>> +curr_sport = orig_sport;
>> +curr_dport = orig_dport;
>> +
>
>can this happen with dport?
>
>>  FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
>>  nat_conn->rev_key.src.port = htons(curr_dport);
>>  FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
>> --
>> 1.8.3.1
>



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] ic: call ovn_db_run() only when all DBs are connected

2021-09-06 Thread Han Zhou
On Mon, Aug 30, 2021 at 12:57 PM Vladislav Odintsov 
wrote:
>
> Prior to this commit, the ovn_db_run() function in ovn-ic
> was called right after the daemon's lock was acquired in the SB DB.
>
> Inside ovn_db_run() there is a search for the availability zone
> in the NB DB. If the AZ was not found, ovn_db_run() returned, so there
> was an implicit requirement for a connected NB DB.
>
> But in the case where the NB DB was connected and the ICNB DB was not,
> we could run into a situation where, in ts_run(), logical switches
> from the NB DB were collected while transit switches from the ICNB DB
> were not, because the ICNB DB was not yet synced. This led to the
> deletion of transit logical switches from the NB DB.
>
> A normal restart of just one ovn-ic daemon in an AZ could damage
> the logical topology: it removed the transit LS and its LSPs.
> After ovn-ic reconnects to the ICNB, it re-creates the logical switch
> in the NB DB, but its LSPs are lost and need manual re-addition.
>
> This commit adds explicit requirement for all DBs to be
> connected:
>
> - NB DB
> - SB DB
> - ICNB DB
> - ICSB DB
>
> Only when this condition is met, we can safely run ovn-ic sync
> logic.
>
> Signed-off-by: Vladislav Odintsov 

Thanks for the fix! This is a critical bug, so I applied it to all branches.

Han

> ---
>  ic/ovn-ic.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 18066a305..fc608af82 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -1786,7 +1786,11 @@ main(int argc, char *argv[])
>  state.had_lock = false;
>  }
>
> -if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
> +if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl) &&
> +ovsdb_idl_has_ever_connected(ctx.ovnnb_idl) &&
> +ovsdb_idl_has_ever_connected(ctx.ovnsb_idl) &&
> +ovsdb_idl_has_ever_connected(ctx.ovninb_idl) &&
> +ovsdb_idl_has_ever_connected(ctx.ovnisb_idl)) {
>  ovn_db_run(&ctx);
>  }
>
> --
> 2.30.0
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14 0/7] Add offload support for sFlow

2021-09-06 Thread Chris Mi via dev

On 9/7/2021 1:59 PM, Eelco Chaudron wrote:


On 7 Sep 2021, at 3:34, Chris Mi wrote:


On 9/6/2021 5:47 PM, Eelco Chaudron wrote:

On 6 Sep 2021, at 11:14, Chris Mi wrote:


On 9/3/2021 8:54 PM, Eelco Chaudron wrote:

On 3 Sep 2021, at 14:02, Eelco Chaudron wrote:

  On 15 Jul 2021, at 8:01, Chris Mi wrote:

  This patch set adds offload support for sFlow.

  Psample is a genetlink channel for packet sampling. TC action
  act_sample
  uses psample to send sampled packets to userspace.

  When offloading sample action to TC, userspace creates a
  unique ID to
  map sFlow action and tunnel info and passes this ID to kernel
  instead
  of the sFlow info. psample will send this ID and sampled packet to
  userspace. Using the ID, userspace can recover the sFlow info
  and send
  sampled packet to the right sFlow monitoring host.

  Hi Chris,

  One thing missing from this patchset is a test case. I think we
  should add it, as I’m going over this manually every patch iteration.

  I would add the following:

   *

  Set the sample rate to 1, send 100 packets and make sure you
  receive all of them

o Also, verify the output of “ovs-appctl dpctl/dump-flows
  system@ovs-system type=tc” is correct, i.e., matches the
  kernel output
   *

  Set the sample rate to 10, send 100 packets and make sure you
  receive 10.

o Also, verify the output of “ovs-appctl dpctl/dump-flows
  system@ovs-system type=tc” is correct, i.e., matches the
  kernel output

  Cheers,

  Eelco

  PS: I also had a problem where only one packet got sent to the
  collector, and then no more packets would arrive. Of course, when
  I added some debug code, it never happened, and when removing the
  debug code, it also worked fine :( Did you ever experience
  something like this? I will play a bit more when reviewing
  specific code, maybe it will happen again.


One additional thing I’m seeing is the following:

$ ovs-appctl dpctl/dump-flows system@ovs-system type=tc
recirc_id(0),in_port(3),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(frag=no),
 packets:7144, bytes:600096, used:7.430s, 
actions:sample(sample=10.0%,actions(userspace(pid=3925927229,sFlow(vid=0,pcp=0,output=2147483650),actions))),2

Sometimes I get a rather large sFlow(output=) value in the sFlow output.
Converting the above to hex gives 0x80000002, which I see in fix_sflow_action()
being used to indicate multiple output ports? This seems wrong to me, as it
should be 3 (as it always is in the non-HW-offload case). This odd value is
also reported in the sFlow samples.

Unfortunately, I do not have a reproducer for this.
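[Editorial illustration, not part of the original mail: the arithmetic behind
the value above. Whether the high bit really flags "multiple output ports" is
my reading of the discussion, not something confirmed in this thread.]

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t output = 2147483650u;  /* sFlow(output=) value from the dump */
        printf("0x%08x\n", output);                        /* 0x80000002 */
        printf("high bit set: %d\n", (output & 0x80000000u) != 0);   /* 1 */
        printf("low bits: %u\n", output & 0x7fffffffu);              /* 2 */
        return 0;
    }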


Actually, at the very beginning of development, when I had a lot of debug
code, I also occasionally observed such an odd port number.
Such a rule disappears quickly, so this corner case is hard to test, and it
doesn't affect the functionality.
So the current code doesn't deal with this corner case.

I’m getting this almost every time I restart OVS and initiate a flow (using a 
simple ping). This does not go away by itself, it stays until the flow times 
out. So this should be investigated and fixed.

My test setup is rather simple, just two ports: one is a VM, one is an external
host, which I ping from the VM. All default config, so using the NORMAL rule.
This is my sFlow config:

   ovs-vsctl -- --id=@sflow create sflow agent=lo \
target="127.0.0.1" header=128 \
sampling=10 polling=4 \
-- set bridge ovs_pvp_br0 sflow=@sflow


I tried several times, I can't reproduce it:

# ovs-appctl dpctl/dump-flows system@ovs-system type=tc
recirc_id(0),in_port(2),eth(src=02:25:d0:09:01:02,dst=02:25:d0:09:01:03),eth_type(0x0800),ipv4(frag=no),
 packets:10, bytes:840, used:0.050s, 
actions:sample(sample=10.0%,actions(userspace(pid=3087130156,sFlow(vid=0,pcp=0,output=216),actions))),3
recirc_id(0),in_port(2),eth(src=02:25:d0:09:01:02,dst=02:25:d0:09:01:03),eth_type(0x0806),
 packets:0, bytes:0, used:10.080s, 
actions:sample(sample=10.0%,actions(userspace(pid=3087130156,sFlow(vid=0,pcp=0,output=216),actions))),3
recirc_id(0),in_port(3),eth(src=02:25:d0:09:01:03,dst=02:25:d0:09:01:02),eth_type(0x0800),ipv4(frag=no),
 packets:10, bytes:840, used:0.050s, 
actions:sample(sample=10.0%,actions(userspace(pid=3604571673,sFlow(vid=0,pcp=0,output=215),actions))),2
recirc_id(0),in_port(3),eth(src=02:25:d0:09:01:03,dst=02:25:d0:09:01:02),eth_type(0x0806),
 packets:0, bytes:0, used:10.080s, 
actions:sample(sample=10.0%,actions(userspace(pid=3604571673,sFlow(vid=0,pcp=0,output=215),actions))),2

Ok, I will try to debug it once I get to reviewing the last patch of the series.


Thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.o

Re: [ovs-dev] [PATCH v14 0/7] Add offload support for sFlow

2021-09-06 Thread Eelco Chaudron


On 7 Sep 2021, at 3:34, Chris Mi wrote:

> On 9/6/2021 5:47 PM, Eelco Chaudron wrote:
>>
>> On 6 Sep 2021, at 11:14, Chris Mi wrote:
>>
>>> On 9/3/2021 8:54 PM, Eelco Chaudron wrote:

 On 3 Sep 2021, at 14:02, Eelco Chaudron wrote:

  On 15 Jul 2021, at 8:01, Chris Mi wrote:

  This patch set adds offload support for sFlow.

  Psample is a genetlink channel for packet sampling. TC action
  act_sample
  uses psample to send sampled packets to userspace.

  When offloading sample action to TC, userspace creates a
  unique ID to
  map sFlow action and tunnel info and passes this ID to kernel
  instead
  of the sFlow info. psample will send this ID and sampled packet to
  userspace. Using the ID, userspace can recover the sFlow info
  and send
  sampled packet to the right sFlow monitoring host.

  Hi Chris,

  One thing missing from this patchset is a test case. I think we
  should add it, as I’m going over this manually every patch iteration.

  I would add the following:

   *

  Set the sample rate to 1, send 100 packets and make sure you
  receive all of them

o Also, verify the output of “ovs-appctl dpctl/dump-flows
  system@ovs-system type=tc” is correct, i.e., matches the
  kernel output
   *

  Set the sample rate to 10, send 100 packets and make sure you
  receive 10.

o Also, verify the output of “ovs-appctl dpctl/dump-flows
  system@ovs-system type=tc” is correct, i.e., matches the
  kernel output

  Cheers,

  Eelco

  PS: I also had a problem where only one packet got sent to the
  collector, and then no more packets would arrive. Of course, when
  I added some debug code, it never happened, and when removing the
  debug code, it also worked fine :( Did you ever experience
  something like this? I will play a bit more when reviewing
  specific code, maybe it will happen again.


 One additional thing I’m seeing is the following:

 $ ovs-appctl dpctl/dump-flows system@ovs-system type=tc
 recirc_id(0),in_port(3),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(frag=no),
  packets:7144, bytes:600096, used:7.430s, 
 actions:sample(sample=10.0%,actions(userspace(pid=3925927229,sFlow(vid=0,pcp=0,output=2147483650),actions))),2

 Sometimes I get a rather large sFlow(output=) value in the sFlow output.
 Converting the above to hex gives 0x80000002, which I see in
 fix_sflow_action() being used to indicate multiple output ports? This seems
 wrong to me, as it should be 3 (as it always is in the non-HW-offload case).
 This odd value is also reported in the sFlow samples.

 Unfortunately, I do not have a reproducer for this.

>>> Actually, at the very beginning of development, when I had a lot of debug
>>> code, I also occasionally observed such an odd port number.
>>> Such a rule disappears quickly, so this corner case is hard to test, and
>>> it doesn't affect the functionality.
>>> So the current code doesn't deal with this corner case.
>> I’m getting this almost every time I restart OVS and initiate a flow (using 
>> a simple ping). This does not go away by itself, it stays until the flow 
>> times out. So this should be investigated and fixed.
>>
>> My test setup is rather simple, just two ports: one is a VM, one is an
>> external host, which I ping from the VM. All default config, so using the
>> NORMAL rule. This is my sFlow config:
>>
>>   ovs-vsctl -- --id=@sflow create sflow agent=lo \
>>target="127.0.0.1" header=128 \
>>sampling=10 polling=4 \
>>-- set bridge ovs_pvp_br0 sflow=@sflow
>>
> I tried several times, I can't reproduce it:
>
> # ovs-appctl dpctl/dump-flows system@ovs-system type=tc
> recirc_id(0),in_port(2),eth(src=02:25:d0:09:01:02,dst=02:25:d0:09:01:03),eth_type(0x0800),ipv4(frag=no),
>  packets:10, bytes:840, used:0.050s, 
> actions:sample(sample=10.0%,actions(userspace(pid=3087130156,sFlow(vid=0,pcp=0,output=216),actions))),3
> recirc_id(0),in_port(2),eth(src=02:25:d0:09:01:02,dst=02:25:d0:09:01:03),eth_type(0x0806),
>  packets:0, bytes:0, used:10.080s, 
> actions:sample(sample=10.0%,actions(userspace(pid=3087130156,sFlow(vid=0,pcp=0,output=216),actions))),3
> recirc_id(0),in_port(3),eth(src=02:25:d0:09:01:03,dst=02:25:d0:09:01:02),eth_type(0x0800),ipv4(frag=no),
>  packets:10, bytes:840, used:0.050s, 
> actions:sample(sample=10.0%,actions(userspace(pid=3604571673,sFlow(vid=0,pcp=0,output=215),actions))),2
> recirc_id(0),in_port(3),eth(src=02:25:d0:09:01:03,dst=02:25:d0:09:01:02),eth_type(0x0806),
>  pack

Re: [ovs-dev] [PATCH ovn] ic: remove port_binding on ts deletion

2021-09-06 Thread Han Zhou
On Tue, Aug 24, 2021 at 11:45 AM Vladislav Odintsov 
wrote:
>
> When an IC port_binding exists and its transit switch is deleted,
> the orphan port_binding is left in the IC_SB_DB.
>
> This patch fixes this situation and adds a test for this case.
>
> Signed-off-by: Vladislav Odintsov 
>

Thanks for fixing the bug! Please see my comments below.

---
>  ic/ovn-ic.c | 35 +++--
>  tests/ovn-ic.at | 52 +
>  2 files changed, 85 insertions(+), 2 deletions(-)
>
> diff --git a/ic/ovn-ic.c b/ic/ovn-ic.c
> index 75e798cd1..e80cd34ca 100644
> --- a/ic/ovn-ic.c
> +++ b/ic/ovn-ic.c
> @@ -66,6 +66,7 @@ struct ic_context {
>  struct ovsdb_idl_index *nbrec_port_by_name;
>  struct ovsdb_idl_index *sbrec_chassis_by_name;
>  struct ovsdb_idl_index *sbrec_port_binding_by_name;
> +struct ovsdb_idl_index *icsbrec_port_binding_by_az;
>  struct ovsdb_idl_index *icsbrec_port_binding_by_ts;
>  struct ovsdb_idl_index *icsbrec_route_by_ts;
>  };
> @@ -174,7 +175,7 @@ allocate_ts_dp_key(struct hmap *dp_tnlids)
>  }
>
>  static void
> -ts_run(struct ic_context *ctx)
> +ts_run(struct ic_context *ctx, const struct icsbrec_availability_zone
*az)
>  {
>  const struct icnbrec_transit_switch *ts;
>
> @@ -239,8 +240,25 @@ ts_run(struct ic_context *ctx)
>   * SB, to avoid uncommitted ISB datapath tunnel key to be synced
back to
>   * AZ. */
>  if (ctx->ovnisb_txn) {
> +struct shash isb_pbs = SHASH_INITIALIZER(&isb_pbs);
> +const struct icsbrec_port_binding *isb_pb;
> +const struct icsbrec_port_binding *isb_pb_key =
> +icsbrec_port_binding_index_init_row(
> +ctx->icsbrec_port_binding_by_az);
> +
> +icsbrec_port_binding_index_set_availability_zone(isb_pb_key, az);

It seems this index is not used. It looks like it is supposed to be used by
the loop below.

However, I think it is better to fix this by performing the port_binding
cleanup in port_binding_run(), because that's where the port_binding table
is supposed to be updated, and then there is no need for the extra index.
The current logic in port_binding_run() doesn't consider the situation where
the TS is already deleted, so the big loop over the NB TS table won't delete
those port_bindings. To fix it, we could create a shash (all_local_pbs)
that collects all the port_bindings in IC_SB that belong to this AZ at the
beginning of that function, and in the loop, when iterating over each port of
the existing TSes, remove it from all_local_pbs:
ICSBREC_PORT_BINDING_FOR_EACH_EQUAL (isb_pb, isb_pb_key,

ctx->icsbrec_port_binding_by_ts) {
if (isb_pb->availability_zone == az) {
shash_add(&local_pbs, isb_pb->logical_port, isb_pb);
+  // TODO: remove isb_pb from the all_local_pbs.

Finally, at the end of the function, we can remove all the port_bindings
left in the all_local_pbs - the ones created by this AZ but without TS.
What do you think?
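[Not from the original thread -- a rough sketch of the approach suggested
above, assuming the usual OVN shash/IDL helpers; the exact integration into
port_binding_run() is only illustrative:]

    /* Sketch: collect this AZ's IC_SB port_bindings up front, drop the ones
     * still referenced while walking the existing TSes, and delete whatever
     * is left over at the end. */
    struct shash all_local_pbs = SHASH_INITIALIZER(&all_local_pbs);
    const struct icsbrec_port_binding *isb_pb;

    ICSBREC_PORT_BINDING_FOR_EACH (isb_pb, ctx->ovnisb_idl) {
        if (isb_pb->availability_zone == az) {
            shash_add(&all_local_pbs, isb_pb->logical_port, isb_pb);
        }
    }

    /* ... existing loop over the TSes; for each port of an existing TS:
     *     shash_find_and_delete(&all_local_pbs, isb_pb->logical_port);
     */

    /* Whatever is left was created by this AZ but its TS is gone. */
    struct shash_node *node;
    SHASH_FOR_EACH (node, &all_local_pbs) {
        icsbrec_port_binding_delete(node->data);
    }
    shash_destroy(&all_local_pbs);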

> +
> +ICSBREC_PORT_BINDING_FOR_EACH (isb_pb, ctx->ovnisb_idl) {
> +shash_add(&isb_pbs, isb_pb->transit_switch, isb_pb);
> +}
> +
>  /* Create ISB Datapath_Binding */
>  ICNBREC_TRANSIT_SWITCH_FOR_EACH (ts, ctx->ovninb_idl) {
> +while (shash_find_and_delete(&isb_pbs, ts->name)) {
> +/* There may be multiple Port_Bindings */
> +continue;
> +}
> +
>  isb_dp = shash_find_and_delete(&isb_dps, ts->name);
>  if (!isb_dp) {
>  /* Allocate tunnel key */
> @@ -260,6 +278,13 @@ ts_run(struct ic_context *ctx)
>  SHASH_FOR_EACH (node, &isb_dps) {
>  icsbrec_datapath_binding_delete(node->data);
>  }
> +
> +SHASH_FOR_EACH (node, &isb_pbs) {
> +icsbrec_port_binding_delete(node->data);
> +}
> +
> +icsbrec_port_binding_index_destroy_row(isb_pb_key);
> +shash_destroy(&isb_pbs);
>  }
>  ovn_destroy_tnlids(&dp_tnlids);
>  shash_destroy(&isb_dps);
> @@ -1493,7 +1518,7 @@ ovn_db_run(struct ic_context *ctx)
>  return;
>  }
>
> -ts_run(ctx);
> +ts_run(ctx, az);
>  gateway_run(ctx, az);
>  port_binding_run(ctx, az);
>  route_run(ctx, az);
> @@ -1684,6 +1709,11 @@ main(int argc, char *argv[])
>  struct ovsdb_idl_index *sbrec_chassis_by_name
>  = ovsdb_idl_index_create1(ovnsb_idl_loop.idl,
>&sbrec_chassis_col_name);
> +
> +struct ovsdb_idl_index *icsbrec_port_binding_by_az
> += ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
> +
 &icsbrec_port_binding_col_availability_zone);
> +
>  struct ovsdb_idl_index *icsbrec_port_binding_by_ts
>  = ovsdb_idl_index_create1(ovnisb_idl_loop.idl,
>
 &icsbrec_port_binding_col_transit_switch);
> @@ -1731,6 +1761,7 @@ main(int argc, char *argv[])
>  .nbrec_port_by_name = nbrec_port_by_name,
>  .sbrec_port_binding_b

Re: [ovs-dev] [PATCH v2 2/2] conntrack: limit port clash resolution attempts

2021-09-06 Thread wenxu









From: Paolo Valerio 
Date: 2021-09-06 19:08:44
To:  we...@ucloud.cn,i.maxim...@ovn.org,dlu...@gmail.com,acon...@redhat.com
Cc:  d...@openvswitch.org
Subject: Re: [PATCH v2 2/2] conntrack: limit port clash resolution attempts

>Hello,
>
>Thanks for working on this.
>
>I like the idea of limiting the number of attempts, the logic has
>been kept to avoid yet another logical change in a single shot, but it
>is definitely needed especially for the case where both dst port and
>src port are manipulated.


I think manipulating both the src port and the dst port is only a special
case. It is better to keep that special case out of here, like the kernel
datapath does. The conntrack code in the kernel only does src or dst port
NAT; if dst NAT fails, it then does src NAT.


In the function ovs_ct_nat:

    err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype);
    if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
        if (...) {
            ...
        } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
            err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL,
                                     NF_NAT_MANIP_SRC);
        }
    }



>
>we...@ucloud.cn writes:
>
>> From: wenxu 
>>
>> In case almost or all available ports are taken, clash resolution can
>> take a very long time, resulting in soft lockup.
>>
>> This can happen when many to-be-natted hosts connect to same
>> destination:port (e.g. a proxy) and all connections pass the same SNAT.
>>
>> Pick a random offset in the acceptable range, then try ever smaller
>> number of adjacent port numbers, until either the limit is reached or a
>> useable port was found.  This results in at most 248 attempts
>> (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
>> instead of 64000+.
>
>248 attempts for each IP in the range, right?
>A large range of IP addresses may still theoretically lead to many
>attempts, although random subranges and the hash should help to
>reduce the chances in that case.
>
>>
>> Signed-off-by: wenxu 
>> ---
>>  lib/conntrack.c | 35 +++
>>  1 file changed, 31 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 2d14205..bc7de17 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -2414,6 +2414,10 @@ nat_get_unique_tuple(struct conntrack *ct, const 
>> struct conn *conn,
>>   conn->key.nw_proto == IPPROTO_UDP;
>>  uint16_t min_dport, max_dport, curr_dport, orig_dport;
>>  uint16_t min_sport, max_sport, curr_sport, orig_sport;
>> +static const unsigned int max_attempts = 128;
>> +uint16_t range_src, range_dst, range_max;
>> +unsigned int attempts;
>> +unsigned int i;
>>  
>>  min_addr = conn->nat_info->min_addr;
>>  max_addr = conn->nat_info->max_addr;
>> @@ -2430,6 +2434,10 @@ nat_get_unique_tuple(struct conntrack *ct, const 
>> struct conn *conn,
>>  set_dport_range(conn->nat_info, &conn->key, hash, &orig_dport,
>>  &min_dport, &max_dport);
>>  
>> +range_src = max_sport - min_sport + 1;
>> +range_dst = max_dport - min_dport + 1;
>> +range_max = range_src > range_dst ? range_src : range_dst;
>> +
>>  another_round:
>>  store_addr_to_key(&curr_addr, &nat_conn->rev_key,
>>conn->nat_info->nat_action);
>> @@ -2446,17 +2454,36 @@ another_round:
>>  curr_sport = orig_sport;
>>  curr_dport = orig_dport;
>>  
>> +attempts = range_max;
>> +if (attempts > max_attempts) {
>> +attempts = max_attempts;
>> +}
>> +
>> +another_port_round:
>> +i = 0;
>>  FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
>>  nat_conn->rev_key.src.port = htons(curr_dport);
>>  FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
>> -nat_conn->rev_key.dst.port = htons(curr_sport);
>> -if (!conn_lookup(ct, &nat_conn->rev_key,
>> - time_msec(), NULL, NULL)) {
>> -return true;
>> +if (i++ < attempts) {
>> +nat_conn->rev_key.dst.port = htons(curr_sport);
>> +if (!conn_lookup(ct, &nat_conn->rev_key,
>> + time_msec(), NULL, NULL)) {
>> +return true;
>> +}
>>  }
>
>this way we reduce the number of total lookups, but we still iterate
>through the entire range. We should avoid that.
>
>>  }
>>  }
>>  
>> +if (attempts >= range_max || attempts < 16) {
>> +goto next_addr;
>> +}
>> +
>> +attempts /= 2;
>> +curr_dport = random_uint32() % range_dst;
>> +curr_sport = random_uint32() % range_src;
>> +
>
>did you mean:
>
>curr = min + (random_uint32() % range);
>
>?
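[Illustration only, not from the original mail: what the respin above might
look like with the minimum offset applied, assuming the inclusive ranges
computed earlier in the patch.]

    attempts /= 2;
    curr_dport = min_dport + (random_uint32() % range_dst);
    curr_sport = min_sport + (random_uint32() % range_src);
    goto another_port_round;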
>
>> +goto another_port_round;
>> +
>>  /* Check if next IP is in range and respin. Otherwise, notify
>>   * exhaustion to the caller. */
>>  next_addr:
>> -- 
>> 1.8.3.1
>




___

Re: [ovs-dev] [PATCH v2 1/2] conntrack: restore the origin port for each round with new address

2021-09-06 Thread wenxu









From: Paolo Valerio 
Date: 2021-09-06 19:00:37
To:  we...@ucloud.cn,i.maxim...@ovn.org,dlu...@gmail.com,acon...@redhat.com
Cc:  d...@openvswitch.org
Subject: Re: [PATCH v2 1/2] conntrack: restore the origin port for each round
 with new address

>Hello,
>
>we...@ucloud.cn writes:
>
>> From: wenxu 
>>
>> It is better to choose the origin select port as current port
>> for each port search round with new address.
>>
>> Signed-off-by: wenxu 
>> ---
>
>This should happen normally.
>It doesn't happen in the case of source port manipulation when the
>default ephemeral range is used and the packet source port is below
>1024. In that case the first IP iteration uses the packet source port,
>whereas the others don't.
>
>if we want to change this behavior, there are some more ways we can
>consider, e.g.:


This can be done to avoid keeping the old sport for the source ports < 1024
case.
>
>- Using 1024 as initial curr_sport for source ports < 1024.
>- picking different default ranges
>  - e.g. the kernel uses the range [1, 511], if the source port is < 512,
>[600, 1023], if it's >= 512 and < 1024, and [1024,65535] for the
>remaining ports.
>
>What do you think if we do something like the third bullet and squash
>this change, if needed, in your next patch?
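[A minimal illustration, not part of the original mail: the kernel-style
default source-port ranges described in the bullets above, written as a
hypothetical helper; the name and where it would hook into set_sport_range()
are assumptions.]

    static void
    default_sport_range(uint16_t sport, uint16_t *min_sport,
                        uint16_t *max_sport)
    {
        if (sport < 512) {
            *min_sport = 1;          /* [1, 511] */
            *max_sport = 511;
        } else if (sport < 1024) {
            *min_sport = 600;        /* [600, 1023] */
            *max_sport = 1023;
        } else {
            *min_sport = 1024;       /* [1024, 65535] */
            *max_sport = 65535;
        }
    }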
>
>In any case, there are a couple of small nits/questions, see inline.
>
>>  lib/conntrack.c | 11 +++
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 551c206..2d14205 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -2412,8 +2412,8 @@ nat_get_unique_tuple(struct conntrack *ct, const 
>> struct conn *conn,
>>  uint32_t hash = nat_range_hash(conn, ct->hash_basis);
>>  bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
>>   conn->key.nw_proto == IPPROTO_UDP;
>> -uint16_t min_dport, max_dport, curr_dport;
>> -uint16_t min_sport, max_sport, curr_sport;
>> +uint16_t min_dport, max_dport, curr_dport, orig_dport;
>> +uint16_t min_sport, max_sport, curr_sport, orig_sport;
>
>can we keep the reverse xmas tree style here?
>
>>  
>>  min_addr = conn->nat_info->min_addr;
>>  max_addr = conn->nat_info->max_addr;
>> @@ -2425,9 +2425,9 @@ nat_get_unique_tuple(struct conntrack *ct, const 
>> struct conn *conn,
>>   * we can stop once we reach it. */
>>  guard_addr = curr_addr;
>>  
>> -set_sport_range(conn->nat_info, &conn->key, hash, &curr_sport,
>> +set_sport_range(conn->nat_info, &conn->key, hash, &orig_sport,
>>  &min_sport, &max_sport);
>> -set_dport_range(conn->nat_info, &conn->key, hash, &curr_dport,
>> +set_dport_range(conn->nat_info, &conn->key, hash, &orig_dport,
>>  &min_dport, &max_dport);
>>  
>>  another_round:
>> @@ -2443,6 +2443,9 @@ another_round:
>>  goto next_addr;
>>  }
>>  
>> +curr_sport = orig_sport;
>> +curr_dport = orig_dport;
>> +
>
>can this happen with dport?
>
>>  FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
>>  nat_conn->rev_key.src.port = htons(curr_dport);
>>  FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
>> -- 
>> 1.8.3.1
>




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14 0/7] Add offload support for sFlow

2021-09-06 Thread Chris Mi via dev

On 9/6/2021 5:47 PM, Eelco Chaudron wrote:


On 6 Sep 2021, at 11:14, Chris Mi wrote:


On 9/3/2021 8:54 PM, Eelco Chaudron wrote:


On 3 Sep 2021, at 14:02, Eelco Chaudron wrote:

 On 15 Jul 2021, at 8:01, Chris Mi wrote:

 This patch set adds offload support for sFlow.

 Psample is a genetlink channel for packet sampling. TC action
 act_sample
 uses psample to send sampled packets to userspace.

 When offloading sample action to TC, userspace creates a
 unique ID to
 map sFlow action and tunnel info and passes this ID to kernel
 instead
 of the sFlow info. psample will send this ID and sampled packet to
 userspace. Using the ID, userspace can recover the sFlow info
 and send
 sampled packet to the right sFlow monitoring host.

 Hi Chris,

 One thing missing from this patchset is a test case. I think we
 should add it, as I’m going over this manually every patch iteration.

 I would add the following:

  *

 Set the sample rate to 1, send 100 packets and make sure you
 receive all of them

   o Also, verify the output of “ovs-appctl dpctl/dump-flows
 system@ovs-system type=tc” is correct, i.e., matches the
 kernel output
  *

 Set the sample rate to 10, send 100 packets and make sure you
 receive 10.

   o Also, verify the output of “ovs-appctl dpctl/dump-flows
 system@ovs-system type=tc” is correct, i.e., matches the
 kernel output

 Cheers,

 Eelco

 PS: I also had a problem where only one packet got sent to the
 collector, and then no more packets would arrive. Of course, when
 I added some debug code, it never happened, and when removing the
 debug code, it also worked fine :( Did you ever experience
 something like this? I will play a bit more when reviewing
 specific code, maybe it will happen again.


One additional thing I’m seeing is the following:

$ ovs-appctl dpctl/dump-flows system@ovs-system type=tc
recirc_id(0),in_port(3),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(frag=no),
 packets:7144, bytes:600096, used:7.430s, 
actions:sample(sample=10.0%,actions(userspace(pid=3925927229,sFlow(vid=0,pcp=0,output=2147483650),actions))),2

Sometimes I get a rather large sFlow(output=) value in the sFlow output.
Converting the above to hex gives 0x80000002, which I see in fix_sflow_action()
being used to indicate multiple output ports? This seems wrong to me, as it
should be 3 (as it always is in the non-HW-offload case). This odd value is
also reported in the sFlow samples.

Unfortunately, I do not have a reproducer for this.


Actually, at the very beginning of development, when I had a lot of debug
code, I also occasionally observed such an odd port number.
Such a rule disappears quickly, so this corner case is hard to test, and it
doesn't affect the functionality.
So the current code doesn't deal with this corner case.

I’m getting this almost every time I restart OVS and initiate a flow (using a 
simple ping). This does not go away by itself, it stays until the flow times 
out. So this should be investigated and fixed.

My test setup is rather simple, just two ports: one is a VM, one is an external
host, which I ping from the VM. All default config, so using the NORMAL rule.
This is my sFlow config:

  ovs-vsctl -- --id=@sflow create sflow agent=lo \
   target="127.0.0.1" header=128 \
   sampling=10 polling=4 \
   -- set bridge ovs_pvp_br0 sflow=@sflow


I tried several times, I can't reproduce it:

# ovs-appctl dpctl/dump-flows system@ovs-system type=tc
recirc_id(0),in_port(2),eth(src=02:25:d0:09:01:02,dst=02:25:d0:09:01:03),eth_type(0x0800),ipv4(frag=no), 
packets:10, bytes:840, used:0.050s, 
actions:sample(sample=10.0%,actions(userspace(pid=3087130156,sFlow(vid=0,pcp=0,output=216),actions))),3
recirc_id(0),in_port(2),eth(src=02:25:d0:09:01:02,dst=02:25:d0:09:01:03),eth_type(0x0806), 
packets:0, bytes:0, used:10.080s, 
actions:sample(sample=10.0%,actions(userspace(pid=3087130156,sFlow(vid=0,pcp=0,output=216),actions))),3
recirc_id(0),in_port(3),eth(src=02:25:d0:09:01:03,dst=02:25:d0:09:01:02),eth_type(0x0800),ipv4(frag=no), 
packets:10, bytes:840, used:0.050s, 
actions:sample(sample=10.0%,actions(userspace(pid=3604571673,sFlow(vid=0,pcp=0,output=215),actions))),2
recirc_id(0),in_port(3),eth(src=02:25:d0:09:01:03,dst=02:25:d0:09:01:02),eth_type(0x0806), 
packets:0, bytes:0, used:10.080s, 
actions:sample(sample=10.0%,actions(userspace(pid=3604571673,sFlow(vid=0,pcp=0,output=215),actions))),2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [syzbot] INFO: task hung in ovs_dp_masks_rebalance (2)

2021-09-06 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:7d2a07b76933 Linux 5.14
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=164cc59d30
kernel config:  https://syzkaller.appspot.com/x/.config?x=3931e3b26cba5acf
dashboard link: https://syzkaller.appspot.com/bug?extid=45a5ef16ab5bc505d42a
compiler:   gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for 
Debian) 2.35.1

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+45a5ef16ab5bc505d...@syzkaller.appspotmail.com

INFO: task kworker/0:2:8001 blocked for more than 143 seconds.
  Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:2 state:D stack:25088 pid: 8001 ppid: 2 flags:0x4000
Workqueue: events ovs_dp_masks_rebalance
Call Trace:
 context_switch kernel/sched/core.c:4695 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:6026
 schedule+0xd3/0x270 kernel/sched/core.c:6105
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6164
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 ovs_lock net/openvswitch/datapath.c:106 [inline]
 ovs_dp_masks_rebalance+0x20/0xf0 net/openvswitch/datapath.c:2386
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task kworker/1:0:13766 blocked for more than 143 seconds.
  Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/1:0 state:D stack:22448 pid:13766 ppid: 2 flags:0x4000
Workqueue: events ovs_dp_masks_rebalance
Call Trace:
 context_switch kernel/sched/core.c:4695 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:6026
 schedule+0xd3/0x270 kernel/sched/core.c:6105
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6164
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 ovs_lock net/openvswitch/datapath.c:106 [inline]
 ovs_dp_masks_rebalance+0x20/0xf0 net/openvswitch/datapath.c:2386
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task kworker/1:1:9214 blocked for more than 143 seconds.
  Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/1:1 state:D stack:26048 pid: 9214 ppid: 2 flags:0x4000
Workqueue: events ovs_dp_masks_rebalance
Call Trace:
 context_switch kernel/sched/core.c:4695 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:6026
 schedule+0xd3/0x270 kernel/sched/core.c:6105
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6164
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 ovs_lock net/openvswitch/datapath.c:106 [inline]
 ovs_dp_masks_rebalance+0x20/0xf0 net/openvswitch/datapath.c:2386
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task kworker/1:7:27170 blocked for more than 143 seconds.
  Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/1:7 state:D stack:25128 pid:27170 ppid: 2 flags:0x4000
Workqueue: events ovs_dp_masks_rebalance
Call Trace:
 context_switch kernel/sched/core.c:4695 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:6026
 schedule+0xd3/0x270 kernel/sched/core.c:6105
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6164
 __mutex_lock_common kernel/locking/mutex.c:1036 [inline]
 __mutex_lock+0x7b6/0x10a0 kernel/locking/mutex.c:1104
 ovs_lock net/openvswitch/datapath.c:106 [inline]
 ovs_dp_masks_rebalance+0x20/0xf0 net/openvswitch/datapath.c:2386
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task kworker/0:10:27216 blocked for more than 143 seconds.
  Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:10state:D stack:27856 pid:27216 ppid: 2 flags:0x4000
Workqueue: events ovs_dp_masks_rebalance
Call Trace:
 context_switch kernel/sched/core.c:4695 [inline]
 __schedule+0x93a/0x26f0 kernel/sched/core.c:6026
 schedule+0xd3/0x270 kernel/sched/core.c:6105
 schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6164
 __mutex_lock_common kernel/locking/mutex.c:10

[ovs-dev] [PATCH] ovsdb-idl: Send "set_db_change_aware" before "monitor_cond_since".

2021-09-06 Thread Chengang (Russell Lab)
Hi Han and Ilya,

This patch fixes the issue where the IDL client closes the connection after it
gets the response to the monitor_cond_since request, without waiting for the
response to the set_db_change_aware request. A warning message then ends up in
ovsdb-server.log.

This confuses users, because the issue occurs even with "ovs-vsctl show" or
other ovs-vsctl commands.

Are there any other ideas on this issue?
If this patch fixes it, can we backport it to 2.13.0, which has the same issue?

Best Regards!
Gang Chen

> -Original Message-
> From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Dumitru
> Ceara
> Sent: Friday, July 10, 2020 7:33 PM
> To: d...@openvswitch.org
> Cc: Han Zhou ; Ilya Maximets 
> Subject: [ovs-dev] [PATCH] ovsdb-idl: Send "set_db_change_aware" before
> "monitor_cond_since".
> 
> For short lived IDL clients (e.g., ovn-sbctl) if the client sends
> monitor_cond_since before set_db_change_aware, the client might close the
> DB connection immediately after it received the reply for monitor_cond_since
> and before the server has a chance to reply to set_db_change_aware.
> 
> E.g., from the logs of the ovsdb-server:
> 2020-07-10T09:29:52.649Z|04479|jsonrpc|DBG|unix#72: received request,
> method="monitor_cond_since", params=["OVN_Southbound",
> ["monid","OVN_Southbound"],{"SB_Global":[{"columns":["options"]}]},
> "----"], id=2
> 2020-07-10T09:29:52.649Z|04480|jsonrpc|DBG|unix#72: send reply,
> result=[false,"----",
> {"SB_Global":{"6ad26b48-a742-4fe1-8671-3975e2146ce6":{"initial":
> {"options":["map",[["mac_prefix","be:85:cb"],["svc_monitor_mac",
> "52:58:b5:19:8c:40"]]]], id=2
> 2020-07-10T09:29:52.649Z|04482|jsonrpc|DBG|unix#72: received request,
> method="set_db_change_aware", params=[true], id=3
> 
> <<< IDL client closes the connection here because it already got the response
> to the monitor_cond_since request.
> 
> 2020-07-10T09:29:59.023Z|04483|jsonrpc|DBG|unix#72: send reply, result={},
> id=3
> 2020-07-10T09:29:59.023Z|04484|stream_fd|DBG|send: Broken pipe
> 2020-07-10T09:29:59.023Z|04485|jsonrpc|WARN|unix#72: send error: Broken
> pipe
> 
> While this is not a critical issue, it can be easily mitigated by changing 
> the IDL
> client to always send "set_db_change_aware" before "monitor_cond_since".
> This way we ensure that a well behaving IDL client doesn't close the 
> connection
> too early, avoiding the error logs in ovsdb-server.
> 
> This patch moves the code to send monitor_cond_since(data) from function
> ovsdb_idl_check_server_db() to ovsdb_idl_process_response() as we can
> transition to IDL_S_DATA_MONITOR_COND_SINCE_REQUESTED only upon
> receiving a reply for monitor_cond(server).
> 
> CC: Ben Pfaff 
> CC: Han Zhou 
> CC: Ilya Maximets 
> Reported-by: Girish Moodalbail 
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-July/050343.html
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered
> databases.")
> Signed-off-by: Dumitru Ceara 
> ---
>  lib/ovsdb-idl.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index ef3b97b..c6427f5 100644
> --- a/lib/ovsdb-idl.c
> +++ b/lib/ovsdb-idl.c
> @@ -770,6 +770,10 @@ ovsdb_idl_process_response(struct ovsdb_idl *idl,
> struct jsonrpc_msg *msg)
> 
> OVSDB_IDL_MM_MONITOR_COND);
>  if (ovsdb_idl_check_server_db(idl)) {
>  ovsdb_idl_send_db_change_aware(idl);
> +ovsdb_idl_send_monitor_request(
> +idl, &idl->data,
> OVSDB_IDL_MM_MONITOR_COND_SINCE);
> +ovsdb_idl_transition(
> +idl,
> IDL_S_DATA_MONITOR_COND_SINCE_REQUESTED);
>  }
>  } else {
>  ovsdb_idl_send_schema_request(idl, &idl->data); @@ -2057,9
> +2061,6 @@ ovsdb_idl_check_server_db(struct ovsdb_idl *idl)
>  if (idl->state == IDL_S_SERVER_MONITOR_COND_REQUESTED) {
>  json_destroy(idl->data.schema);
>  idl->data.schema = json_from_string(database->schema);
> -ovsdb_idl_send_monitor_request(idl, &idl->data,
> -
> OVSDB_IDL_MM_MONITOR_COND_SINCE);
> -ovsdb_idl_transition(idl,
> IDL_S_DATA_MONITOR_COND_SINCE_REQUESTED);
>  }
>  return true;
>  }
> --
> 1.8.3.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [OVN Patch v3 2/2] Add support for configuring parallelization via unixctl

2021-09-06 Thread anton . ivanov
From: Anton Ivanov 

Signed-off-by: Anton Ivanov 
---
 lib/ovn-parallel-hmap.c | 209 ++--
 lib/ovn-parallel-hmap.h |  63 +++-
 northd/ovn-northd.c |  26 ++---
 tests/ovn-macros.at |  16 ++-
 4 files changed, 283 insertions(+), 31 deletions(-)

diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
index 30de457b5..8a055b2c6 100644
--- a/lib/ovn-parallel-hmap.c
+++ b/lib/ovn-parallel-hmap.c
@@ -33,6 +33,7 @@
 #include "ovs-thread.h"
 #include "ovs-numa.h"
 #include "random.h"
+#include "unixctl.h"
 
 VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
 
@@ -46,6 +47,7 @@ VLOG_DEFINE_THIS_MODULE(ovn_parallel_hmap);
  */
 static atomic_bool initial_pool_setup = ATOMIC_VAR_INIT(false);
 static bool can_parallelize = false;
+static bool should_parallelize = false;
 
 /* This is set only in the process of exit and the set is
  * accompanied by a fence. It does not need to be atomic or be
@@ -85,6 +87,19 @@ ovn_stop_parallel_processing(struct worker_pool *pool)
 return pool->workers_must_exit;
 }
 
+bool
+ovn_set_parallel_processing(bool enable)
+{
+should_parallelize = enable;
+return can_parallelize;
+}
+
+bool
+ovn_get_parallel_processing(void)
+{
+return can_parallelize && should_parallelize;
+}
+
 bool
 ovn_can_parallelize_hashes(bool force_parallel)
 {
@@ -110,6 +125,7 @@ destroy_pool(struct worker_pool *pool) {
 sem_close(pool->done);
 sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
 sem_unlink(sem_name);
+free(pool->name);
 free(pool);
 }
 
@@ -120,6 +136,10 @@ ovn_resize_pool(struct worker_pool *pool, int size)
 
 ovs_assert(pool != NULL);
 
+if (!pool->is_mutable) {
+return false;
+}
+
 if (!size) {
 size = pool_size;
 }
@@ -159,7 +179,8 @@ cleanup:
 
 
 struct worker_pool *
-ovn_add_worker_pool(void *(*start)(void *), int size)
+ovn_add_worker_pool(void *(*start)(void *), int size, char *name,
+bool is_mutable)
 {
 struct worker_pool *new_pool = NULL;
 bool test = false;
@@ -187,6 +208,8 @@ ovn_add_worker_pool(void *(*start)(void *), int size)
 new_pool = xmalloc(sizeof(struct worker_pool));
 new_pool->size = size;
 new_pool->start = start;
+new_pool->is_mutable = is_mutable;
+new_pool->name = xstrdup(name);
 sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
 new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
 if (new_pool->done == SEM_FAILED) {
@@ -219,6 +242,7 @@ cleanup:
 sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
 sem_unlink(sem_name);
 }
+free(new_pool->name);
 ovs_mutex_unlock(&init_mutex);
 return NULL;
 }
@@ -267,13 +291,9 @@ ovn_fast_hmap_size_for(struct hmap *hmap, int size)
 /* Run a thread pool which uses a callback function to process results
  */
 void
-ovn_run_pool_callback(struct worker_pool *pool,
-  void *fin_result, void *result_frags,
-  void (*helper_func)(struct worker_pool *pool,
-  void *fin_result,
-  void *result_frags, int index))
+ovn_start_pool(struct worker_pool *pool)
 {
-int index, completed;
+int index;
 
 /* Ensure that all worker threads see the same data as the
  * main thread.
@@ -284,8 +304,19 @@ ovn_run_pool_callback(struct worker_pool *pool,
 for (index = 0; index < pool->size; index++) {
 sem_post(pool->controls[index].fire);
 }
+}
+
 
-completed = 0;
+/* Run a thread pool which uses a callback function to process results
+ */
+void
+ovn_complete_pool_callback(struct worker_pool *pool,
+  void *fin_result, void *result_frags,
+  void (*helper_func)(struct worker_pool *pool,
+  void *fin_result,
+  void *result_frags, int index))
+{
+int index, completed = 0;
 
 do {
 bool test;
@@ -327,6 +358,18 @@ ovn_run_pool_callback(struct worker_pool *pool,
 }
 } while (completed < pool->size);
 }
+/* Run a thread pool which uses a callback function to process results
+ */
+void
+ovn_run_pool_callback(struct worker_pool *pool,
+  void *fin_result, void *result_frags,
+  void (*helper_func)(struct worker_pool *pool,
+  void *fin_result,
+  void *result_frags, int index))
+{
+start_pool(pool);
+complete_pool_callback(pool, fin_result, result_frags, helper_func);
+}
 
 /* Run a thread pool - basic, does not do results processing.
  */
@@ -373,6 +416,28 @@ ovn_fast_hmap_merge(struct hmap *dest, struct hmap *inc)
 inc->n = 0;
 }
 
+/* Run a thread pool which gathers results in an array
+ * of hashes. Merge results.
+ */
+void
+ovn_complete_pool_hash(struct worker_pool *pool,
+   

[ovs-dev] [OVN Patch v3 1/2] Make changes to the parallel processing API to allow pool sizing

2021-09-06 Thread anton . ivanov
From: Anton Ivanov 

1. Make pool size user-definable.
2. Expose pool destruction.
3. Make pools resizeable at runtime.

Signed-off-by: Anton Ivanov 
---
 lib/ovn-parallel-hmap.c | 202 ++--
 lib/ovn-parallel-hmap.h |  23 -
 northd/ovn-northd.c |  58 +---
 ovs |   2 +-
 4 files changed, 194 insertions(+), 91 deletions(-)

diff --git a/lib/ovn-parallel-hmap.c b/lib/ovn-parallel-hmap.c
index b8c7ac786..30de457b5 100644
--- a/lib/ovn-parallel-hmap.c
+++ b/lib/ovn-parallel-hmap.c
@@ -51,7 +51,6 @@ static bool can_parallelize = false;
  * accompanied by a fence. It does not need to be atomic or be
  * accessed under a lock.
  */
-static bool workers_must_exit = false;
 
 static struct ovs_list worker_pools = OVS_LIST_INITIALIZER(&worker_pools);
 
@@ -70,10 +69,20 @@ static void merge_hash_results(struct worker_pool *pool 
OVS_UNUSED,
void *fin_result, void *result_frags,
int index);
 
+
+static bool init_control(struct worker_control *control, int id,
+ struct worker_pool *pool);
+
+static void cleanup_control(struct worker_pool *pool, int id);
+
+static void free_controls(struct worker_pool *pool);
+
+static struct worker_control *alloc_controls(int size);
+
 bool
-ovn_stop_parallel_processing(void)
+ovn_stop_parallel_processing(struct worker_pool *pool)
 {
-return workers_must_exit;
+return pool->workers_must_exit;
 }
 
 bool
@@ -92,11 +101,67 @@ ovn_can_parallelize_hashes(bool force_parallel)
 return can_parallelize;
 }
 
+
+void
+destroy_pool(struct worker_pool *pool) {
+char sem_name[256];
+
+free_controls(pool);
+sem_close(pool->done);
+sprintf(sem_name, MAIN_SEM_NAME, sembase, pool);
+sem_unlink(sem_name);
+free(pool);
+}
+
+bool
+ovn_resize_pool(struct worker_pool *pool, int size)
+{
+int i;
+
+ovs_assert(pool != NULL);
+
+if (!size) {
+size = pool_size;
+}
+
+ovs_mutex_lock(&init_mutex);
+
+if (can_parallelize) {
+free_controls(pool);
+pool->size = size;
+
+/* Allocate new control structures. */
+
+pool->controls = alloc_controls(size);
+pool->workers_must_exit = false;
+
+for (i = 0; i < pool->size; i++) {
+if (! init_control(&pool->controls[i], i, pool)) {
+goto cleanup;
+}
+}
+}
+ovs_mutex_unlock(&init_mutex);
+return true;
+cleanup:
+
+/* Something went wrong when opening semaphores. In this case
+ * it is better to shut off parallel procesing altogether
+ */
+
+VLOG_INFO("Failed to initialize parallel processing, error %d", errno);
+can_parallelize = false;
+free_controls(pool);
+
+ovs_mutex_unlock(&init_mutex);
+return false;
+}
+
+
 struct worker_pool *
-ovn_add_worker_pool(void *(*start)(void *))
+ovn_add_worker_pool(void *(*start)(void *), int size)
 {
 struct worker_pool *new_pool = NULL;
-struct worker_control *new_control;
 bool test = false;
 int i;
 char sem_name[256];
@@ -113,38 +178,29 @@ ovn_add_worker_pool(void *(*start)(void *))
 ovs_mutex_unlock(&init_mutex);
 }
 
+if (!size) {
+size = pool_size;
+}
+
 ovs_mutex_lock(&init_mutex);
 if (can_parallelize) {
 new_pool = xmalloc(sizeof(struct worker_pool));
-new_pool->size = pool_size;
-new_pool->controls = NULL;
+new_pool->size = size;
+new_pool->start = start;
 sprintf(sem_name, MAIN_SEM_NAME, sembase, new_pool);
 new_pool->done = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
 if (new_pool->done == SEM_FAILED) {
 goto cleanup;
 }
 
-new_pool->controls =
-xmalloc(sizeof(struct worker_control) * new_pool->size);
+new_pool->controls = alloc_controls(size);
+new_pool->workers_must_exit = false;
 
 for (i = 0; i < new_pool->size; i++) {
-new_control = &new_pool->controls[i];
-new_control->id = i;
-new_control->done = new_pool->done;
-new_control->data = NULL;
-ovs_mutex_init(&new_control->mutex);
-new_control->finished = ATOMIC_VAR_INIT(false);
-sprintf(sem_name, WORKER_SEM_NAME, sembase, new_pool, i);
-new_control->fire = sem_open(sem_name, O_CREAT, S_IRWXU, 0);
-if (new_control->fire == SEM_FAILED) {
+if (!init_control(&new_pool->controls[i], i, new_pool)) {
 goto cleanup;
 }
 }
-
-for (i = 0; i < pool_size; i++) {
-new_pool->controls[i].worker =
-ovs_thread_create("worker pool helper", start, 
&new_pool->controls[i]);
-}
 ovs_list_push_back(&worker_pools, &new_pool->list_node);
 }
 ovs_mutex_unlock(&init_mutex);
@@ -157,16 +213,7 @@ cleanup:
 
 VLOG_INFO("Failed to initialize parallel 

Re: [ovs-dev] [PATCH v14 5/7] dpif-offload-netlink: Implement dpif-offload-provider API

2021-09-06 Thread Eelco Chaudron
On 15 Jul 2021, at 8:01, Chris Mi wrote:

> Implement dpif-offload API for netlink datapath.
>
> Signed-off-by: Chris Mi 
> Reviewed-by: Eli Britstein 
> ---
>  lib/automake.mk |   1 +
>  lib/dpif-netlink.c  |   2 +-
>  lib/dpif-offload-netlink.c  | 210 
>  lib/dpif-offload-provider.h |  12 +++
>  4 files changed, 224 insertions(+), 1 deletion(-)
>  create mode 100644 lib/dpif-offload-netlink.c
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index dc865b0ef..daa60c784 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -442,6 +442,7 @@ lib_libopenvswitch_la_SOURCES += \
>   lib/dpif-netlink.h \
>   lib/dpif-netlink-rtnl.c \
>   lib/dpif-netlink-rtnl.h \
> + lib/dpif-offload-netlink.c \
>   lib/if-notifier.c \
>   lib/netdev-linux.c \
>   lib/netdev-linux.h \
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 6a7defb95..676934ef1 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -4037,7 +4037,7 @@ const struct dpif_class dpif_netlink_class = {
>  NULL,   /* bond_add */
>  NULL,   /* bond_del */
>  NULL,   /* bond_stats_get */
> -NULL,   /* dpif_offlod_api */
> +&dpif_offload_netlink,
>  };
>
>  static int
> diff --git a/lib/dpif-offload-netlink.c b/lib/dpif-offload-netlink.c
> new file mode 100644
> index 0..f02a6b0eb
> --- /dev/null
> +++ b/lib/dpif-offload-netlink.c
> @@ -0,0 +1,210 @@
> +/*
> + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "dpif-offload-provider.h"
> +#include "netdev-offload.h"
> +#include "netlink-protocol.h"
> +#include "netlink-socket.h"
> +#include "openvswitch/vlog.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_offload_netlink);
> +
> +static struct nl_sock *psample_sock;
> +static int psample_family;
> +
> +/* Receive psample netlink message and save the attributes. */
> +struct offload_psample {
> +struct nlattr *packet;  /* packet data */
> +int dp_group_id;/* mapping id for sFlow offload */
> +int iifindex;   /* input ifindex */
> +int group_seq;  /* group sequence */

Same comment on capitals and dots.

> +};
> +
> +static void
> +dpif_offload_netlink_init(void)
> +{
> +unsigned int psample_mcgroup;
> +int err;
> +
> +if (!netdev_is_flow_api_enabled()) {

I think the parameter documentation mentions that you need to restart 
ovs-vswitchd when you enable hw-offload, so I think we should be good here.
Are we sure there are no corner cases where this initialization comes before 
the netdev_is_flow_api_enabled() setup? Maybe adding a debug log message might 
help, so we know we did not initialize for this specific reason?
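[A sketch of what such a debug message could look like; illustrative only,
not part of the patch under review:]

    if (!netdev_is_flow_api_enabled()) {
        VLOG_DBG("%s: hw-offload is disabled, not initializing psample",
                 __func__);
        return;
    }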

> +return;
> +}
> +
> +if (psample_sock) {

Same here, a debug log message might be good to add.

> +return;
> +}
> +
> +err = nl_lookup_genl_family(PSAMPLE_GENL_NAME,
> +&psample_family);
> +if (err) {
> +VLOG_INFO("%s: Generic Netlink family '%s' does not exist. "
> +  "Please make sure the kernel module psample is loaded",
> +  __func__, PSAMPLE_GENL_NAME);

Should we return the err? Maybe the current log message can be for the specific 
error code and add a more specific one for the others?

> +return;
> +}
> +
> +err = nl_lookup_genl_mcgroup(PSAMPLE_GENL_NAME,
> + PSAMPLE_NL_MCGRP_SAMPLE_NAME,
> + &psample_mcgroup);
> +if (err) {
> +VLOG_INFO("%s: Failed to join multicast group '%s' for Generic "
> +  "Netlink family '%s'", __func__,

Add error code, something like "Netlink family '%s' with error %"

> +  PSAMPLE_NL_MCGRP_SAMPLE_NAME,
> +  PSAMPLE_GENL_NAME);

> +return;
> +}
> +
> +err = nl_sock_create(NETLINK_GENERIC, &psample_sock);
> +if (err) {
> +VLOG_INFO("%s: Failed to create psample socket", __func__);

Add error code to log message

> +return;
> +}
> +
> +err = nl_sock_join_mcgroup(psample_sock, psample_mcgroup);
> +if (err) {
> +VLOG_INFO("%s: Failed to join psample mcg

Re: [ovs-dev] [PATCH v14 4/7] netdev-offload-tc: Introduce group ID management API

2021-09-06 Thread Eelco Chaudron
On 15 Jul 2021, at 8:01, Chris Mi wrote:

> When offloading sample action to TC, userspace creates a unique ID
> to map sFlow action and tunnel info and passes this ID to kernel instead
> of the sFlow info. psample will send this ID and sampled packet to
> userspace. Using the ID, userspace can recover the sFlow info and send
> sampled packet to the right sFlow monitoring host.
>
> Signed-off-by: Chris Mi 
> Reviewed-by: Eli Britstein 
> ---
>  lib/dpif-offload-provider.h |  18 +++
>  lib/netdev-offload-tc.c | 272 +++-
>  lib/netdev-offload.h|   1 +
>  lib/tc.h|   1 +
>  4 files changed, 290 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
> index 97108402a..b765eb9a2 100644
> --- a/lib/dpif-offload-provider.h
> +++ b/lib/dpif-offload-provider.h
> @@ -17,9 +17,27 @@
>  #ifndef DPIF_OFFLOAD_PROVIDER_H
>  #define DPIF_OFFLOAD_PROVIDER_H
>
> +#include "netlink-protocol.h"
> +#include "openvswitch/packets.h"
> +#include "openvswitch/types.h"
> +
>  struct dpif;
>  struct dpif_offload_sflow;
>
> +/* When offloading sample action, userspace creates a unique ID to map
> + * sFlow action and tunnel info and passes this ID to datapath instead
> + * of the sFlow info. Datapath will send this ID and sampled packet to
> + * userspace. Using the ID, userspace can recover the sFlow info and send
> + * sampled packet to the right sFlow monitoring host.
> + */
> +struct dpif_sflow_attr {
> +const struct nlattr *action; /* sFlow action */
> +void *userdata;  /* struct user_action_cookie */
> +size_t userdata_len; /* struct user_action_cookie length */
> +struct flow_tnl *tunnel; /* tunnel info */
> +ovs_u128 ufid;   /* flow ufid */
> +};
> +
>  struct dpif_offload_api {
>  void (*init)(void);
>  void (*uninit)(void);
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 9845e8d3f..2f16cf279 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -40,6 +40,7 @@
>  #include "unaligned.h"
>  #include "util.h"
>  #include "dpif-provider.h"
> +#include "cmap.h"
>
>  VLOG_DEFINE_THIS_MODULE(netdev_offload_tc);
>
> @@ -62,6 +63,262 @@ struct chain_node {
>  uint32_t chain;
>  };
>
> +/* This maps a psample group ID to struct dpif_sflow_attr for sFlow */
> +struct sgid_node {
> +struct ovs_list exp_node OVS_GUARDED;
> +struct cmap_node metadata_node;
> +struct cmap_node id_node;
> +struct ovs_refcount refcount;
> +uint32_t hash;
> +uint32_t id;
> +const struct dpif_sflow_attr sflow;
> +};
> +
> +static struct ovs_rwlock sgid_rwlock = OVS_RWLOCK_INITIALIZER;
> +
> +static long long int sgid_last_run OVS_GUARDED_BY(sgid_rwlock);
> +
> +static struct cmap sgid_map = CMAP_INITIALIZER;
> +static struct cmap sgid_metadata_map = CMAP_INITIALIZER;
> +
> +static struct ovs_list sgid_expiring OVS_GUARDED_BY(sgid_rwlock)
> += OVS_LIST_INITIALIZER(&sgid_expiring);
> +static struct ovs_list sgid_expired OVS_GUARDED_BY(sgid_rwlock)
> += OVS_LIST_INITIALIZER(&sgid_expired);
> +
> +static uint32_t next_sample_group_id OVS_GUARDED_BY(sgid_rwlock) = 1;
> +
> +#define SGID_RUN_INTERVAL   250 /* msec */
> +
> +static void
> +sgid_node_free(struct sgid_node *node)
> +{
> +free(node->sflow.tunnel);
> +free(CONST_CAST(void *, node->sflow.action));
> +free(node->sflow.userdata);
> +free(node);
> +}
> +
> +static void
> +sgid_cleanup(void)
> +{
> +long long int now = time_msec();
> +struct sgid_node *node;
> +
> +/* Do maintenance at most 4 times / sec. */
> +ovs_rwlock_rdlock(&sgid_rwlock);
> +if (now - sgid_last_run < SGID_RUN_INTERVAL) {
> +ovs_rwlock_unlock(&sgid_rwlock);
> +return;
> +}
> +ovs_rwlock_unlock(&sgid_rwlock);
> +
> +ovs_rwlock_wrlock(&sgid_rwlock);
> +sgid_last_run = now;
> +
> +LIST_FOR_EACH_POP (node, exp_node, &sgid_expired) {
> +cmap_remove(&sgid_map, &node->id_node, node->id);
> +ovsrcu_postpone(sgid_node_free, node);
> +}
> +
> +if (!ovs_list_is_empty(&sgid_expiring)) {
> +/* 'sgid_expired' is now empty, move nodes in
> + * 'sgid_expiring' to it. */
> +ovs_list_splice(&sgid_expired,
> +ovs_list_front(&sgid_expiring),
> +&sgid_expiring);
> +}

I still do not believe copying similar code without understanding the 
impact helps here, so I think this still needs to be resolved. I've included 
the previous comments and replies from the v9 patchset:

EC> Trying to understand why we have two lists? Why can't we directly clean up 
from the expiring list?

CM> The code takes the recirc id management as an example (ofproto-dpif-rid.c). 
I'm not 100% sure the two lists are really needed here. But the recirc id code 
has existed for a long time. I think it is mature, and I don't want to re-invent 
something.
CM> And when psam

Re: [ovs-dev] [PATCH v2 2/2] conntrack: limit port clash resolution attempts

2021-09-06 Thread Paolo Valerio
Hello,

Thanks for working on this.

I like the idea of limiting the number of attempts. The logic has been
kept to avoid yet another logical change in a single shot, but a limit is
definitely needed, especially for the case where both the dst port and
the src port are manipulated.

we...@ucloud.cn writes:

> From: wenxu 
>
> In case almost or all available ports are taken, clash resolution can
> take a very long time, resulting in soft lockup.
>
> This can happen when many to-be-natted hosts connect to same
> destination:port (e.g. a proxy) and all connections pass the same SNAT.
>
> Pick a random offset in the acceptable range, then try ever smaller
> number of adjacent port numbers, until either the limit is reached or a
> useable port was found.  This results in at most 248 attempts
> (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
> instead of 64000+.

248 attempts for each IP in the range, right?
A large range of IP addresses may still theoretically lead to many
attempts, although random subranges and the hash should help to
reduce the chances in that case.
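For example, with a /24 worth of SNAT addresses that is still up to
256 * 248 = ~63k lookups in the worst case.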

>
> Signed-off-by: wenxu 
> ---
>  lib/conntrack.c | 35 +++
>  1 file changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 2d14205..bc7de17 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2414,6 +2414,10 @@ nat_get_unique_tuple(struct conntrack *ct, const 
> struct conn *conn,
>   conn->key.nw_proto == IPPROTO_UDP;
>  uint16_t min_dport, max_dport, curr_dport, orig_dport;
>  uint16_t min_sport, max_sport, curr_sport, orig_sport;
> +static const unsigned int max_attempts = 128;
> +uint16_t range_src, range_dst, range_max;
> +unsigned int attempts;
> +unsigned int i;
>  
>  min_addr = conn->nat_info->min_addr;
>  max_addr = conn->nat_info->max_addr;
> @@ -2430,6 +2434,10 @@ nat_get_unique_tuple(struct conntrack *ct, const 
> struct conn *conn,
>  set_dport_range(conn->nat_info, &conn->key, hash, &orig_dport,
>  &min_dport, &max_dport);
>  
> +range_src = max_sport - min_sport + 1;
> +range_dst = max_dport - min_dport + 1;
> +range_max = range_src > range_dst ? range_src : range_dst;
> +
>  another_round:
>  store_addr_to_key(&curr_addr, &nat_conn->rev_key,
>conn->nat_info->nat_action);
> @@ -2446,17 +2454,36 @@ another_round:
>  curr_sport = orig_sport;
>  curr_dport = orig_dport;
>  
> +attempts = range_max;
> +if (attempts > max_attempts) {
> +attempts = max_attempts;
> +}
> +
> +another_port_round:
> +i = 0;
>  FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
>  nat_conn->rev_key.src.port = htons(curr_dport);
>  FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
> -nat_conn->rev_key.dst.port = htons(curr_sport);
> -if (!conn_lookup(ct, &nat_conn->rev_key,
> - time_msec(), NULL, NULL)) {
> -return true;
> +if (i++ < attempts) {
> +nat_conn->rev_key.dst.port = htons(curr_sport);
> +if (!conn_lookup(ct, &nat_conn->rev_key,
> + time_msec(), NULL, NULL)) {
> +return true;
> +}
>  }

this way we reduce the number of total lookups, but we still iterate
through the entire range. We should avoid that.
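One way to do that (rough sketch, untested): spend the attempt budget on the
loops themselves, so we bail out of both loops instead of only skipping the
lookups:

    i = 0;
    FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
        nat_conn->rev_key.src.port = htons(curr_dport);
        FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
            if (i++ >= attempts) {
                goto attempts_exhausted;
            }
            nat_conn->rev_key.dst.port = htons(curr_sport);
            if (!conn_lookup(ct, &nat_conn->rev_key,
                             time_msec(), NULL, NULL)) {
                return true;
            }
        }
    }
    attempts_exhausted:
        ;   /* Fall through to the halving/next-address logic below. */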

>  }
>  }
>  
> +if (attempts >= range_max || attempts < 16) {
> +goto next_addr;
> +}
> +
> +attempts /= 2;
> +curr_dport = random_uint32() % range_dst;
> +curr_sport = random_uint32() % range_src;
> +

did you mean:

curr = min + (random_uint32() % range);

?

> +goto another_port_round;
> +
>  /* Check if next IP is in range and respin. Otherwise, notify
>   * exhaustion to the caller. */
>  next_addr:
> -- 
> 1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 1/2] conntrack: restore the origin port for each round with new address

2021-09-06 Thread Paolo Valerio
Hello,

we...@ucloud.cn writes:

> From: wenxu 
>
> It is better to choose the origin select port as current port
> for each port search round with new address.
>
> Signed-off-by: wenxu 
> ---

This should happen normally.
It doesn't happen in the case of source port manipulation when the
default ephemeral range is used and the packet source port is below
1024. In that case the first IP iteration uses the packet source port,
whereas the others don't.

if we want to change this behavior, there are some more ways we can
consider, e.g.:

- Using 1024 as initial curr_sport for source ports < 1024.
- picking different default ranges
  - e.g. the kernel uses the range [1, 511], if the source port is < 512,
[600, 1023], if it's >= 512 and < 1024, and [1024,65535] for the
remaining ports.

What do you think if we do something like the third bullet and squash
this change, if needed, in your next patch?
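Roughly, the third option could look something like this (untested sketch,
assuming it goes where set_sport_range() picks the default ephemeral range
in lib/conntrack.c):

    uint16_t sport = ntohs(conn->key.src.port);

    if (sport < 512) {
        *min_sport = 1;
        *max_sport = 511;
    } else if (sport < 1024) {
        *min_sport = 600;
        *max_sport = 1023;
    } else {
        *min_sport = 1024;
        *max_sport = 65535;
    }
    /* The starting offset within the range can stay hash-based as today. */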

In any case, there are a couple of small nits/questions, see inline.

>  lib/conntrack.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 551c206..2d14205 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2412,8 +2412,8 @@ nat_get_unique_tuple(struct conntrack *ct, const struct 
> conn *conn,
>  uint32_t hash = nat_range_hash(conn, ct->hash_basis);
>  bool pat_proto = conn->key.nw_proto == IPPROTO_TCP ||
>   conn->key.nw_proto == IPPROTO_UDP;
> -uint16_t min_dport, max_dport, curr_dport;
> -uint16_t min_sport, max_sport, curr_sport;
> +uint16_t min_dport, max_dport, curr_dport, orig_dport;
> +uint16_t min_sport, max_sport, curr_sport, orig_sport;

can we keep the reverse xmas tree style here?

>  
>  min_addr = conn->nat_info->min_addr;
>  max_addr = conn->nat_info->max_addr;
> @@ -2425,9 +2425,9 @@ nat_get_unique_tuple(struct conntrack *ct, const struct 
> conn *conn,
>   * we can stop once we reach it. */
>  guard_addr = curr_addr;
>  
> -set_sport_range(conn->nat_info, &conn->key, hash, &curr_sport,
> +set_sport_range(conn->nat_info, &conn->key, hash, &orig_sport,
>  &min_sport, &max_sport);
> -set_dport_range(conn->nat_info, &conn->key, hash, &curr_dport,
> +set_dport_range(conn->nat_info, &conn->key, hash, &orig_dport,
>  &min_dport, &max_dport);
>  
>  another_round:
> @@ -2443,6 +2443,9 @@ another_round:
>  goto next_addr;
>  }
>  
> +curr_sport = orig_sport;
> +curr_dport = orig_dport;
> +

can this happen with dport?

>  FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
>  nat_conn->rev_key.src.port = htons(curr_dport);
>  FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
> -- 
> 1.8.3.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14 3/7] dpif-offload-provider: Introduce dpif-offload-provider layer

2021-09-06 Thread Eelco Chaudron
On 15 Jul 2021, at 8:01, Chris Mi wrote:

> Some offload actions require functionality that is not netdev
> based, but dpif. For example, sFlow action requires to create
> a psample netlink socket to receive the sampled packets from
> TC or kernel driver.
>
> Create dpif-offload-provider layer to support such actions.
>
> Signed-off-by: Chris Mi 
> Reviewed-by: Eli Britstein 
> ---
>  lib/automake.mk |  2 ++
>  lib/dpif-netdev.c   |  1 +
>  lib/dpif-netlink.c  |  2 ++
>  lib/dpif-offload-provider.h | 34 +
>  lib/dpif-offload.c  | 43 +
>  lib/dpif-provider.h |  8 ++-
>  lib/dpif.c  | 10 +
>  7 files changed, 99 insertions(+), 1 deletion(-)
>  create mode 100644 lib/dpif-offload-provider.h
>  create mode 100644 lib/dpif-offload.c
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3c9523c1a..dc865b0ef 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -123,6 +123,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private.h \
>   lib/dpif-netdev-perf.c \
>   lib/dpif-netdev-perf.h \
> + lib/dpif-offload.c \
> + lib/dpif-offload-provider.h \
>   lib/dpif-provider.h \
>   lib/dpif.c \
>   lib/dpif.h \
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 610949f36..35d73542b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -8269,6 +8269,7 @@ const struct dpif_class dpif_netdev_class = {
>  dpif_netdev_bond_add,
>  dpif_netdev_bond_del,
>  dpif_netdev_bond_stats_get,
> +NULL,   /* dpif_offlod_api */

offlod -> offload

>  };
>
>  static void
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index 39dc8300e..6a7defb95 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -34,6 +34,7 @@
>
>  #include "bitmap.h"
>  #include "dpif-netlink-rtnl.h"
> +#include "dpif-offload-provider.h"
>  #include "dpif-provider.h"
>  #include "fat-rwlock.h"
>  #include "flow.h"
> @@ -4036,6 +4037,7 @@ const struct dpif_class dpif_netlink_class = {
>  NULL,   /* bond_add */
>  NULL,   /* bond_del */
>  NULL,   /* bond_stats_get */
> +NULL,   /* dpif_offlod_api */

offlod -> offload

>  };
>
>  static int
> diff --git a/lib/dpif-offload-provider.h b/lib/dpif-offload-provider.h
> new file mode 100644
> index 0..97108402a
> --- /dev/null
> +++ b/lib/dpif-offload-provider.h
> @@ -0,0 +1,34 @@
> +/*
> + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef DPIF_OFFLOAD_PROVIDER_H
> +#define DPIF_OFFLOAD_PROVIDER_H
> +
> +struct dpif;
> +struct dpif_offload_sflow;
> +
> +struct dpif_offload_api {

Can we add help text for each function below, preferably as in the 
netdev/dpif_class definitions? I.e., describe the function, its input 
parameters, and return value?

> +void (*init)(void);

Do we want to think ahead and allow init to fail? Other init callbacks return 
int.

> +void (*uninit)(void);

I don't really like the uninit() function name. I like destroy(), as it sort of 
indicates we should no longer use this resource once called.
However, in other projects deinit() is used a lot, but to me it does not 
express the urgency to not use it after the call.
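Putting the comments above together, I was thinking of something like this
(just a sketch; the int return on init() and the destroy() name are my
suggestions, not what the patch currently has):

    struct dpif_offload_api {
        /* Performs provider-wide initialization.  Returns 0 on success or a
         * positive errno value on failure. */
        int (*init)(void);

        /* Releases all resources acquired by init().  The provider must not
         * be used after this call. */
        void (*destroy)(void);

        /* Arranges for poll_block() to wake up when sflow_recv() has a
         * sample to return. */
        void (*sflow_recv_wait)(void);

        /* Polls for an offloaded sFlow sample.  Fills in '*sflow' and
         * returns 0 on success, or a positive errno value (e.g. EAGAIN)
         * when no sample is available. */
        int (*sflow_recv)(struct dpif_offload_sflow *sflow);
    };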

> +void (*sflow_recv_wait)(void);
> +int (*sflow_recv)(struct dpif_offload_sflow *sflow);
> +};
> +
> +void dpif_offload_sflow_recv_wait(const struct dpif *dpif);
> +int dpif_offload_sflow_recv(const struct dpif *dpif,
> +struct dpif_offload_sflow *sflow);
> +
> +#endif /* DPIF_OFFLOAD_PROVIDER_H */
> diff --git a/lib/dpif-offload.c b/lib/dpif-offload.c
> new file mode 100644
> index 0..842e05798
> --- /dev/null
> +++ b/lib/dpif-offload.c
> @@ -0,0 +1,43 @@
> +/*
> + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> +

Re: [ovs-dev] [PATCH v14 0/7] Add offload support for sFlow

2021-09-06 Thread Eelco Chaudron


On 6 Sep 2021, at 11:14, Chris Mi wrote:

> On 9/3/2021 8:54 PM, Eelco Chaudron wrote:
>>
>>
>> On 3 Sep 2021, at 14:02, Eelco Chaudron wrote:
>>
>> On 15 Jul 2021, at 8:01, Chris Mi wrote:
>>
>> This patch set adds offload support for sFlow.
>>
>> Psample is a genetlink channel for packet sampling. TC action
>> act_sample
>> uses psample to send sampled packets to userspace.
>>
>> When offloading sample action to TC, userspace creates a
>> unique ID to
>> map sFlow action and tunnel info and passes this ID to kernel
>> instead
>> of the sFlow info. psample will send this ID and sampled packet to
>> userspace. Using the ID, userspace can recover the sFlow info
>> and send
>> sampled packet to the right sFlow monitoring host.
>>
>> Hi Chris,
>>
>> One thing missing from this patchset is a test case. I think we
>> should add it, as I’m going over this manually every patch iteration.
>>
>> I would add the following:
>>
>>  *
>>
>> Set the sample rate to 1, send 100 packets and make sure you
>> receive all of them
>>
>>   o Also, verify the output of “ovs-appctl dpctl/dump-flows
>> system@ovs-system type=tc” is correct, i.e., matches the
>> kernel output
>>  *
>>
>> Set the sample rate to 10, send 100 packets and make sure you
>> receive 10.
>>
>>   o Also, verify the output of “ovs-appctl dpctl/dump-flows
>> system@ovs-system type=tc” is correct, i.e., matches the
>> kernel output
>>
>> Cheers,
>>
>> Eelco
>>
>> PS: I also had a problem where only one packet got sent to the
>> collector, and then no more packets would arrive. Of course, when
>> I added some debug code, it never happened, and when removing the
>> debug code, it also worked fine :( Did you ever experience
>> something like this? I will play a bit more when reviewing
>> specific code, maybe it will happen again.
>>
>>
>> One additional thing I’m seeing is the following:
>>
>> $ ovs-appctl dpctl/dump-flows system@ovs-system type=tc
>> recirc_id(0),in_port(3),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(frag=no),
>>  packets:7144, bytes:600096, used:7.430s, 
>> actions:sample(sample=10.0%,actions(userspace(pid=3925927229,sFlow(vid=0,pcp=0,output=2147483650),actions))),2
>>
>> Sometimes I get a rather large sFlow(output=) value in the sFlow output. 
>> Converting the above to hex gives 0x80000002, which I see being used in 
>> fix_sflow_action() to indicate multiple output ports? This seems wrong to 
>> me, as it should be 3 (as it always is in the non-HW-offload case). This 
>> odd value is also reported in the sFlow samples.
>>
>> Unfortunately, I do not have a reproducer for this.
>>
> Actually, in the very beginning of development, when I had a lot of debug 
> code, I also observed such an odd port number sometimes.
> Such a rule will disappear soon, so it is hard to test this corner case. And 
> it doesn't affect the functionality.
> So the current code doesn't deal with this corner case.

I’m getting this almost every time I restart OVS and initiate a flow (using a 
simple ping). This does not go away by itself, it stays until the flow times 
out. So this should be investigated and fixed.

My test setup is rather simple: just two ports, one a VM and one an external 
host, which I ping from the VM. All default config, so using the NORMAL rule. 
This is my sFlow config:

 ovs-vsctl -- --id=@sflow create sflow agent=lo \
  target="127.0.0.1" header=128 \
  sampling=10 polling=4 \
  -- set bridge ovs_pvp_br0 sflow=@sflow
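On the collector side I just watch the samples arrive with sflowtool
(assuming a stock sflowtool build; 6343 is the default sFlow port):

 sflowtool -p 6343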

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14 0/7] Add offload support for sFlow

2021-09-06 Thread Eelco Chaudron


On 6 Sep 2021, at 10:45, Chris Mi wrote:

> Hi Eelco,
>
> On 9/3/2021 8:02 PM, Eelco Chaudron wrote:
>>
>> On 15 Jul 2021, at 8:01, Chris Mi wrote:
>>
>> This patch set adds offload support for sFlow.
>>
>> Psample is a genetlink channel for packet sampling. TC action
>> act_sample
>> uses psample to send sampled packets to userspace.
>>
>> When offloading sample action to TC, userspace creates a unique ID to
>> map sFlow action and tunnel info and passes this ID to kernel instead
>> of the sFlow info. psample will send this ID and sampled packet to
>> userspace. Using the ID, userspace can recover the sFlow info and
>> send
>> sampled packet to the right sFlow monitoring host.
>>
>> Hi Chris,
>>
>> One thing missing from this patchset is a test case. I think we should add 
>> it, as I’m going over this manually every patch iteration.
>>
> Sure. I'll check existing test cases and see how to add them.
>>
>> I would add the following:
>>
>>  *
>>
>> Set the sample rate to 1, send 100 packets and make sure you
>> receive all of them
>>
>>   o Also, verify the output of “ovs-appctl dpctl/dump-flows
>> system@ovs-system type=tc” is correct, i.e., matches the
>> kernel output
>>
> OK.
>>
>>  *
>>
>> Set the sample rate to 10, send 100 packets and make sure you
>> receive 10.
>>
> We have some internal ovs test cases to verify sFlow offload. From our 
> experience, we can't verify sFlow easily using so few packets. We send a lot 
> of traffic over a certain period and verify that the packet number falls 
> within a range.

Ok, I did not try high-speed tests, but for me doing “ping 1.1.1.100 -c 100 -i 
0.01” seems rather ok. I miss one sample, which I guess is due to some other 
traffic that might have occurred, affecting the stats. But as you suggest, 
adding a range will do.

> BTW, I'm not sure if you are aware that our driver change for sFlow 
> offload has been upstreamed.
> The main change is in file: 
> drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c .

No, I have not tracked the upstream net-specific changes. I might try it out 
later on in the test.

>>  *
>>   o Also, verify the output of “ovs-appctl dpctl/dump-flows
>> system@ovs-system type=tc” is correct, i.e., matches the
>> kernel output
>>
>> Cheers,
>>
>> Eelco
>>
>> PS: I also had a problem where only one packet got sent to the collector, 
>> and then no more packets would arrive. Of course, when I added some debug 
>> code, it never happened, and when removing the debug code, it also worked 
>> fine :( Did you ever experience something like this? I will play a bit more 
>> when reviewing specific code, maybe it will happen again.
>>
> Are you testing with other_config:hw-offload="true" and 
> other_config:tc-policy=skip_hw?

I’m using the default policy, none, but without a HW offload blade.

> Actually, our internal test cases have been running for a long time. We 
> haven't found such an issue yet :)
> Anyway, I'll pay attention to this issue to see if I can reproduce it.

Thanks, will try to replicate this.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14 0/7] Add offload support for sFlow

2021-09-06 Thread Chris Mi via dev

On 9/3/2021 8:54 PM, Eelco Chaudron wrote:



On 3 Sep 2021, at 14:02, Eelco Chaudron wrote:

On 15 Jul 2021, at 8:01, Chris Mi wrote:

This patch set adds offload support for sFlow.

Psample is a genetlink channel for packet sampling. TC action
act_sample
uses psample to send sampled packets to userspace.

When offloading sample action to TC, userspace creates a
unique ID to
map sFlow action and tunnel info and passes this ID to kernel
instead
of the sFlow info. psample will send this ID and sampled packet to
userspace. Using the ID, userspace can recover the sFlow info
and send
sampled packet to the right sFlow monitoring host.

Hi Chris,

One thing missing from this patchset is a test case. I think we
should add it, as I’m going over this manually every patch iteration.

I would add the following:

 *

Set the sample rate to 1, send 100 packets and make sure you
receive all of them

  o Also, verify the output of “ovs-appctl dpctl/dump-flows
system@ovs-system type=tc” is correct, i.e., matches the
kernel output
 *

Set the sample rate to 10, send 100 packets and make sure you
receive 10.

  o Also, verify the output of “ovs-appctl dpctl/dump-flows
system@ovs-system type=tc” is correct, i.e., matches the
kernel output

Cheers,

Eelco

PS: I also had a problem where only one packet got sent to the
collector, and then no more packets would arrive. Of course, when
I added some debug code, it never happened, and when removing the
debug code, it also worked fine :( Did you ever experience
something like this? I will play a bit more when reviewing
specific code, maybe it will happen again.


One additional thing I’m seeing is the following:

$ ovs-appctl dpctl/dump-flows system@ovs-system type=tc
recirc_id(0),in_port(3),eth(src=52:54:00:88:51:38,dst=04:f4:bc:28:57:01),eth_type(0x0800),ipv4(frag=no), 
packets:7144, bytes:600096, used:7.430s, 
actions:sample(sample=10.0%,actions(userspace(pid=3925927229,sFlow(vid=0,pcp=0,output=2147483650),actions))),2


Sometimes I get a rather large sFlow(output=) value in the sFlow 
output. Converting the above to hex gives 0x80000002, which I see being 
used in fix_sflow_action() to indicate multiple output ports? This 
seems wrong to me, as it should be 3 (as it always is in the 
non-HW-offload case). This odd value is also reported in the sFlow samples.


Unfortunately, I do not have a reproducer for this.

Actually, in the very beginning of development, when I had a lot of 
debug code, I also observed such an odd port number sometimes.
Such a rule will disappear soon, so it is hard to test this corner case. 
And it doesn't affect the functionality.

So the current code doesn't deal with this corner case.

Regards,
Chris


//Eelco




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v6 2/2] dpctl: dpif: allow viewing and configuring dp cache sizes

2021-09-06 Thread Eelco Chaudron
This patch adds a general way of viewing/configuring datapath
cache sizes. With an implementation for the netlink interface.

The ovs-dpctl/ovs-appctl show commands will display the
current cache sizes configured:

ovs-dpctl show
system@ovs-system:
  lookups: hit:25 missed:63 lost:0
  flows: 0
  masks: hit:282 total:0 hit/pkt:3.20
  cache: hit:4 hit rate:4.54%
  caches:
masks-cache: size: 256
  port 0: ovs-system (internal)
  port 1: br-int (internal)
  port 2: genev_sys_6081 (geneve: packet_type=ptap)
  port 3: br-ex (internal)
  port 4: eth2
  port 5: sw0p1 (internal)
  port 6: sw0p3 (internal)

A specific cache can be configured as follows:

ovs-appctl dpctl/cache-set-size DP CACHE SIZE
ovs-dpctl cache-set-size DP CACHE SIZE

For example to disable the cache do:

$ ovs-dpctl cache-set-size system@ovs-system masks-cache 0
Setting cache size successful, new size 0.

Signed-off-by: Eelco Chaudron 
Acked-by: Paolo Valerio 
Acked-by: Flavio Leitner 

v2: - Changed precision to 2 digits
- Handle missing kernel feature at netlink level
v3: - Rebase on the latest master branch
v4: - Fixed commit message
- Fix issue with resetting user_features
v5: - Include the actual resetting user_features fix
v6: - Rebase on the latest master branch
- Add ACK's
---
 datapath/linux/compat/include/linux/openvswitch.h |2 
 lib/dpctl.c   |  120 +
 lib/dpctl.man |   14 ++
 lib/dpif-netdev.c |4 +
 lib/dpif-netlink.c|  117 
 lib/dpif-provider.h   |   20 
 lib/dpif.c|   32 ++
 lib/dpif.h|7 +
 tests/system-traffic.at   |   36 ++
 utilities/ovs-dpctl.c |4 +
 10 files changed, 355 insertions(+), 1 deletion(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index a3f5fb919..936644192 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -107,7 +107,7 @@ enum ovs_datapath_attr {
OVS_DP_ATTR_MEGAFLOW_STATS, /* struct ovs_dp_megaflow_stats */
OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
OVS_DP_ATTR_PAD,
-   OVS_DP_ATTR_PAD2,
+   OVS_DP_ATTR_MASKS_CACHE_SIZE,
OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls */
__OVS_DP_ATTR_MAX
 };
diff --git a/lib/dpctl.c b/lib/dpctl.c
index acc677974..6cba8db51 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -591,6 +591,36 @@ compare_port_nos(const void *a_, const void *b_)
 return a < b ? -1 : a > b;
 }
 
+static void
+show_dpif_cache__(struct dpif *dpif, struct dpctl_params *dpctl_p)
+{
+uint32_t nr_caches;
+int error = dpif_cache_get_supported_levels(dpif, &nr_caches);
+
+if (error || nr_caches == 0) {
+return;
+}
+
+dpctl_print(dpctl_p, "  caches:\n");
+for (int i = 0; i < nr_caches; i++) {
+const char *name;
+uint32_t size;
+
+if (dpif_cache_get_name(dpif, i, &name) ||
+dpif_cache_get_size(dpif, i, &size)) {
+continue;
+}
+dpctl_print(dpctl_p, "%s: size: %u\n", name, size);
+}
+}
+
+static void
+show_dpif_cache(struct dpif *dpif, struct dpctl_params *dpctl_p)
+{
+dpctl_print(dpctl_p, "%s:\n", dpif_name(dpif));
+show_dpif_cache__(dpif, dpctl_p);
+}
+
 static void
 show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p)
 {
@@ -623,6 +653,8 @@ show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p)
 }
 }
 
+show_dpif_cache__(dpif, dpctl_p);
+
 odp_port_t *port_nos = NULL;
 size_t allocated_port_nos = 0, n_port_nos = 0;
 DPIF_PORT_FOR_EACH (&dpif_port, &dump, dpif) {
@@ -2409,6 +2441,92 @@ dpctl_ct_ipf_get_status(int argc, const char *argv[],
 return error;
 }
 
+static int
+dpctl_cache_get_size(int argc, const char *argv[],
+ struct dpctl_params *dpctl_p)
+{
+int error;
+
+if (argc > 1) {
+struct dpif *dpif;
+
+error = parsed_dpif_open(argv[1], false, &dpif);
+if (!error) {
+show_dpif_cache(dpif, dpctl_p);
+dpif_close(dpif);
+} else {
+dpctl_error(dpctl_p, error, "Opening datapath %s failed", argv[1]);
+}
+} else {
+error = dps_for_each(dpctl_p, show_dpif_cache);
+}
+
+return error;
+}
+
+static int
+dpctl_cache_set_size(int argc, const char *argv[],
+ struct dpctl_params *dpctl_p)
+{
+int i, error = EINVAL;
+uint32_t nr_caches, size;
+struct dpif *dpif;
+
+if (argc != 4) {
+dpctl_error(dpctl_p, error, "Invalid number of arguments");
+return error;
+}
+
+if (!ovs_scan(argv[3], "%"SCNu32, &size

[ovs-dev] [PATCH v6 1/2] dpctl: dpif: add kernel datapath cache hit output

2021-09-06 Thread Eelco Chaudron
This patch adds cache usage statistics to the output:

$ ovs-dpctl show
system@ovs-system:
  lookups: hit:24 missed:71 lost:0
  flows: 0
  masks: hit:334 total:0 hit/pkt:3.52
  cache: hit:4 hit rate:4.21%
  port 0: ovs-system (internal)
  port 1: genev_sys_6081 (geneve: packet_type=ptap)
  port 2: br-int (internal)
  port 3: br-ex (internal)
  port 4: eth2
  port 5: sw1p1 (internal)
  port 6: sw0p4 (internal)
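For reference, the hit rate above is cache hits over total lookups:
4 / (24 + 71) = ~4.21%.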

Signed-off-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
Acked-by: Paolo Valerio 

v2: - Changed cache naming to use a hyphen instead of spaces
- Some error message grammar changes
- Update dpctl man page
- Add self tests for the new set/get commands
v3: - Rebase on the latest master branch
v4: - Fixed commit message and tab/spaces in datapath include
v5: - No updates
v6: - Add ACKs
---
 datapath/linux/compat/include/linux/openvswitch.h |2 +-
 lib/dpctl.c   |9 +
 lib/dpif-netdev.c |1 +
 lib/dpif-netlink.c|9 +
 lib/dpif.h|2 ++
 5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/datapath/linux/compat/include/linux/openvswitch.h 
b/datapath/linux/compat/include/linux/openvswitch.h
index f0595eeba..a3f5fb919 100644
--- a/datapath/linux/compat/include/linux/openvswitch.h
+++ b/datapath/linux/compat/include/linux/openvswitch.h
@@ -126,8 +126,8 @@ struct ovs_dp_megaflow_stats {
__u64 n_mask_hit;/* Number of masks used for flow lookups. */
__u32 n_masks;   /* Number of masks for the datapath. */
__u32 pad0;  /* Pad for future expension. */
+   __u64 n_cache_hit;   /* Number of cache matches for flow lookups. */
__u64 pad1;  /* Pad for future expension. */
-   __u64 pad2;  /* Pad for future expension. */
 };
 
 struct ovs_vport_stats {
diff --git a/lib/dpctl.c b/lib/dpctl.c
index ef8ae7402..acc677974 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -611,6 +611,15 @@ show_dpif(struct dpif *dpif, struct dpctl_params *dpctl_p)
 dpctl_print(dpctl_p, "  masks: hit:%"PRIu64" total:%"PRIu32
  " hit/pkt:%.2f\n",
 stats.n_mask_hit, stats.n_masks, avg);
+
+if (stats.n_cache_hit != UINT64_MAX) {
+double avg_hits = n_pkts ?
+(double) stats.n_cache_hit / n_pkts * 100 : 0.0;
+
+dpctl_print(dpctl_p,
+"  cache: hit:%"PRIu64" hit rate:%.2f%%\n",
+stats.n_cache_hit, avg_hits);
+}
 }
 }
 
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b3e57bb95..0d0623013 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -1963,6 +1963,7 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct 
dpif_dp_stats *stats)
 }
 stats->n_masks = UINT32_MAX;
 stats->n_mask_hit = UINT64_MAX;
+stats->n_cache_hit = UINT64_MAX;
 
 return 0;
 }
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 34fc04237..70751e634 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -749,9 +749,18 @@ dpif_netlink_get_stats(const struct dpif *dpif_, struct 
dpif_dp_stats *stats)
 stats->n_masks = dp.megaflow_stats->n_masks;
 stats->n_mask_hit = get_32aligned_u64(
 &dp.megaflow_stats->n_mask_hit);
+stats->n_cache_hit = get_32aligned_u64(
+&dp.megaflow_stats->n_cache_hit);
+
+if (!stats->n_cache_hit) {
+/* Old kernels don't use this field and always
+ * report zero instead. Disable this stat. */
+stats->n_cache_hit = UINT64_MAX;
+}
 } else {
 stats->n_masks = UINT32_MAX;
 stats->n_mask_hit = UINT64_MAX;
+stats->n_cache_hit = UINT64_MAX;
 }
 ofpbuf_delete(buf);
 }
diff --git a/lib/dpif.h b/lib/dpif.h
index 7c322d20e..b32ae5fc7 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -429,6 +429,8 @@ struct dpif_dp_stats {
 uint64_t n_missed;  /* Number of flow table misses. */
 uint64_t n_lost;/* Number of misses not sent to userspace. */
 uint64_t n_flows;   /* Number of flows present. */
+uint64_t n_cache_hit;   /* Number of mega flow mask cache hits for
+   flow table matches. */
 uint64_t n_mask_hit;/* Number of mega flow masks visited for
flow table matches. */
 uint32_t n_masks;   /* Number of mega flow masks. */

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v6 0/2] dpctl: cache visibility/configuration enhancements

2021-09-06 Thread Eelco Chaudron
Removed the RFC keyword from this patchset, as the OVS kernel
patches have been approved.

 - 9bf24f594c6a net: openvswitch: make masks cache size configurable
 - 9d2f627b7ec9 net: openvswitch: add masks cache hit counter

Signed-off-by: Eelco Chaudron 

v2: Fixing review comments from Flavio
v3: Rebased on the latest master branch
v4: Fixing review comments from Flavio
Fix issue with resetting user_features
v5: Forgot to include the user_features fix :(
v6: Rebase on the latest master branch
Add ACKs

Eelco Chaudron (2):
  dpctl: dpif: add kernel datapath cache hit output
  dpctl: dpif: allow viewing and configuring dp cache sizes


 datapath/linux/compat/include/linux/openvswitch.h |4 -
 lib/dpctl.c   |  129 +
 lib/dpctl.man |   14 ++
 lib/dpif-netdev.c |5 +
 lib/dpif-netlink.c|  126 +
 lib/dpif-provider.h   |   20 +++
 lib/dpif.c|   32 +
 lib/dpif.h|9 +
 tests/system-traffic.at   |   36 ++
 utilities/ovs-dpctl.c |4 +
 10 files changed, 377 insertions(+), 2 deletions(-)

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v14 0/7] Add offload support for sFlow

2021-09-06 Thread Chris Mi via dev

Hi Eelco,

On 9/3/2021 8:02 PM, Eelco Chaudron wrote:


On 15 Jul 2021, at 8:01, Chris Mi wrote:

This patch set adds offload support for sFlow.

Psample is a genetlink channel for packet sampling. TC action
act_sample
uses psample to send sampled packets to userspace.

When offloading sample action to TC, userspace creates a unique ID to
map sFlow action and tunnel info and passes this ID to kernel instead
of the sFlow info. psample will send this ID and sampled packet to
userspace. Using the ID, userspace can recover the sFlow info and
send
sampled packet to the right sFlow monitoring host.

Hi Chris,

One thing missing from this patchset is a test case. I think we should 
add it, as I’m going over this manually every patch iteration.



Sure. I'll check existing test cases and see how to add them.


I would add the following:

 *

Set the sample rate to 1, send 100 packets and make sure you
receive all of them

  o Also, verify the output of “ovs-appctl dpctl/dump-flows
system@ovs-system type=tc” is correct, i.e., matches the
kernel output


OK.


 *

Set the sample rate to 10, send 100 packets and make sure you
receive 10.

We have some internal ovs test cases to verify sFlow offload. From our 
experience, we can't verify sFlow easily using so few packets. We send a lot 
of traffic over a certain period and verify that the packet number falls 
within a range.
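Roughly something like this (illustrative numbers only, and assuming
sflowtool as the collector):

 # sampling=10, send ~1000 packets, then accept a window around the
 # expected ~100 samples instead of an exact count.
 n=$(timeout 10 sflowtool -p 6343 -l | grep -c '^FLOW')
 test "$n" -ge 80 && test "$n" -le 120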

BTW, I'm not sure if you are aware that our driver change for sFlow 
offload has been upstreamed.
The main change is in file: 
drivers/net/ethernet/mellanox/mlx5/core/en/tc/sample.c .


 *
  o Also, verify the output of “ovs-appctl dpctl/dump-flows
system@ovs-system type=tc” is correct, i.e., matches the
kernel output

Cheers,

Eelco

PS: I also had a problem where only one packet got sent to the 
collector, and then no more packets would arrive. Of course, when I 
added some debug code, it never happened, and when removing the debug 
code, it also worked fine :( Did you ever experience something like 
this? I will play a bit more when reviewing specific code, maybe it 
will happen again.


Are you testing with other_config:hw-offload="true" and 
other_config:tc-policy=skip_hw?
Actually, our internal test cases have been running for a long time. We 
haven't found such an issue yet :)

Anyway, I'll pay attention to this issue to see if I can reproduce it.

Thanks,
Chris
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovn] bug: vtep port binding update doesn't trigger flows update

2021-09-06 Thread Odintsov Vladislav
Hi Numan,

I've tried your advice and this helped: the new flow was installed right after 
the vtep port_binding appeared.
At first glance, a full recompute when a new vtep port_binding appears can be 
an "okay" solution, 'cause vtep PBs don't get created/deleted very often.
Anyway, what information do you need to handle the I-P properly for these 
ports? I can test a possible solution.
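For reference, the change I tried was roughly this (sketch only; the exact
case label in the port-binding type switch in controller/physical.c may
differ):

    case LP_VTEP:
        /* Fall back to a full recompute for vtep bindings. */
        handled = false;
        break;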

Regards,
Vladislav Odintsov

On 4 Sep 2021, at 19:25, Numan Siddique <num...@ovn.org> wrote:

On Fri, Sep 3, 2021 at 7:26 PM Odintsov Vladislav <vlodint...@croc.ru> wrote:

Hi,

On the master branch I found a regression: OF flows do not get installed when a 
new port_binding with type=vtep appears in the sbdb.
I found there is a comment saying that ovn-controller doesn't know what changed:
https://github.com/ovn-org/ovn/blob/922c45f/controller/binding.c#L2431-L2436

Actually, this flow doesn't get installed:
https://github.com/ovn-org/ovn/blob/922c45f74a006d8c0dde400b130915b08b90a1e3/controller/physical.c#L1687-L1689

In this switch statement, can you please try setting  - handled =
false;  so that a full recompute is triggered and see if this solves
the problem ?

I'm not sure what is going on.  Either we can fall back to full
recompute for any port binding changes to vtep ports or handle the I-P
properly
for vtep port bindings.

Thanks
Numan



With the new I-P processing codebase I couldn't quickly understand where the 
problem might be.
So, ovn-appctl -t ovn-controller recompute solves the problem and the flow gets 
installed. Looks like this problem is really in the incremental processing.
Please help. Thanks.

Regards,
Vladislav Odintsov

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev