Re: [ovs-dev] Patch not being sent to ovs-dev ML

2021-08-16 Thread Krzysztof Klimonda
Hi,

Oops, thanks - it seems I've also goofed up and spammed ML with the same patch 
a few times. My bad :(

Thanks,
Krzysztof

On Mon, Aug 16, 2021, at 13:18, Ilya Maximets wrote:
> On 8/16/21 1:01 PM, Krzysztof Klimonda wrote:
> > Hi,
> > 
> > I'm trying to send a patch to ovs-dev ML but it's not showing up - I get no 
> > response with error, and I can see it in my mailbox (as git send-email adds 
> > me to the To: list) and Message-Id is "Message-Id: 
> > <20210816085206.69170-1-kklimo...@syntaxhighlighted.com>". Could someone 
> > help me figure it out?
> 
> Here is a patch with a stated message id:
>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-August/386754.html
> 
> Everything seems fine.
> 
> Best regards, Ilya Maximets.
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Patch not being sent to ovs-dev ML

2021-08-16 Thread Krzysztof Klimonda
Hi,

I'm trying to send a patch to ovs-dev ML but it's not showing up - I get no 
response with error, and I can see it in my mailbox (as git send-email adds me 
to the To: list) and Message-Id is "Message-Id: 
<20210816085206.69170-1-kklimo...@syntaxhighlighted.com>". Could someone help 
me figure it out?

Thanks
Krzysztof

-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH RFC ovn v2] northd: Fix routing loop in LRs with one-to-many SNAT

2021-08-16 Thread Krzysztof Klimonda
If there are snat entries on the router, and some logical_ip are set to
network instead of an IP address then given SNAT is masquerade. In such
case ct_snat action is used in lr_in_unsnat table to ensure that the
packet is matched against conntrack and destination IP is replaced with
one from matching conntrack entry.

This however breaks down for new connections sent to router's external IP
address. In such case, when packet is checked against conntrack table,
there is no match, and its destination IP remains unchanged. This causes a
loop in lr_in_ip_routing.

This commit installs a new logical flow in lr_in_ip_routing table for routers
that have SNAT entry with logical_ip set to network (that being masquerade).
This flow drops packets that, after going through conntrack via ct_snat action
in lr_in_unsnat table, are not matched to any existing flow and retain the
original destination IP. This prevents vswitchd from looping such packets until
their TTL reaches zero, as well as installing bogus flows in datapath that lead
to ovs module dropping such packets with "deferred action limit reached, drop
recirc action" message.

Signed-off-by: Krzysztof Klimonda 
---
There are two FIXME that should be handled:
- I tried dropping ct.new from the flow, and that breaks E/W SNAT/DNAT
  traffic (DNAT and SNAT on distributed router - E/W -- ovn-northd). See
  comment in ovn-northd.c/ovn_northd.dl
- I can't get my testcase pass with the master - when pinging router gateway
  IP first packet is seemlingly dropped, with the following warning logged
  in vswitchd:
  "Failed to acquire udpif_key corresponding to unexpected flow"

v1 -> v2
* Rebased patch on current master
* Reworked logic to check if nat->type is snat and masquerade
* Added loadbalancer-related check to system-userspace testcase
* Added ddlog implementation

 northd/ovn-northd.c  |  72 +++--
 northd/ovn_northd.dl |  53 +++
 tests/ovn.at |  53 +++
 tests/system-ovn.at  | 122 +++
 4 files changed, 295 insertions(+), 5 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 52fc255ae..517357884 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -12250,8 +12250,9 @@ build_lrouter_ingress_flow(struct hmap *lflows, struct 
ovn_datapath *od,
 
 static int
 lrouter_check_nat_entry(struct ovn_datapath *od, const struct nbrec_nat *nat,
-ovs_be32 *mask, bool *is_v6, int *cidr_bits,
-struct eth_addr *mac, bool *distributed)
+ovs_be32 *mask, bool *is_v6, bool *is_masq,
+int *cidr_bits, struct eth_addr *mac,
+bool *distributed)
 {
 struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT;
 ovs_be32 ip;
@@ -12285,14 +12286,21 @@ lrouter_check_nat_entry(struct ovn_datapath *od, 
const struct nbrec_nat *nat,
 *is_v6 = true;
 }
 
+*is_masq = false;
 /* Check the validity of nat->logical_ip. 'logical_ip' can
 * be a subnet when the type is "snat". */
 if (*is_v6) {
 error = ipv6_parse_masked(nat->logical_ip, &ipv6, &mask_v6);
 *cidr_bits = ipv6_count_cidr_bits(&mask_v6);
+if (*cidr_bits < 128) {
+*is_masq = true;
+}
 } else {
 error = ip_parse_masked(nat->logical_ip, &ip, mask);
 *cidr_bits = ip_count_cidr_bits(*mask);
+if (*cidr_bits < 32) {
+*is_masq = true;
+}
 }
 if (!strcmp(nat->type, "snat")) {
 if (error) {
@@ -12397,12 +12405,12 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od, struct hmap *lflows,
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat = nat = od->nbr->nat[i];
 struct eth_addr mac = eth_addr_broadcast;
-bool is_v6, distributed;
+bool is_v6, is_masq, distributed;
 ovs_be32 mask;
 int cidr_bits;
 
-if (lrouter_check_nat_entry(od, nat, &mask, &is_v6, &cidr_bits,
-&mac, &distributed) < 0) {
+if (lrouter_check_nat_entry(od, nat, &mask, &is_v6, &is_masq,
+&cidr_bits, &mac, &distributed) < 0) {
 continue;
 }
 
@@ -12413,6 +12421,60 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od, struct hmap *lflows,
 build_lrouter_in_dnat_flow(lflows, od, nat, match, actions, 
distributed,
mask, is_v6);
 
+/* When router have SNAT enabled, and logical_ip is a network (router
+ * is doing masquerade), then we need to make sure that packets
+ * unrelated to any established connection that still have router's
+ * external IP as a next hop after

[ovs-dev] [PATCH RFC ovn v2] northd: Fix routing loop in LRs with one-to-many SNAT

2021-08-12 Thread Krzysztof Klimonda
If there are snat entries on the router, and some logical_ip are set to
network instead of an IP address then given SNAT is masquerade. In such
case ct_snat action is used in lr_in_unsnat table to ensure that the
packet is matched against conntrack and destination IP is replaced with
one from matching conntrack entry.

This however breaks down for new connections sent to router's external IP
address. In such case, when packet is checked against conntrack table,
there is no match, and its destination IP remains unchanged. This causes a
loop in lr_in_ip_routing.

This commit installs a new logical flow in lr_in_ip_routing table for routers
that have SNAT entry with logical_ip set to network (that being masquerade).
This flow drops packets that, after going through conntrack via ct_snat action
in lr_in_unsnat table, are not matched to any existing flow and retain the
original destination IP. This prevents vswitchd from looping such packets until
their TTL reaches zero, as well as installing bogus flows in datapath that lead
to ovs module dropping such packets with "deferred action limit reached, drop
recirc action" message.

Signed-off-by: Krzysztof Klimonda 
---
There are two FIXME that should be handled:
- I tried dropping ct.new from the flow, and that breaks E/W SNAT/DNAT
  traffic (DNAT and SNAT on distributed router - E/W -- ovn-northd). See
  comment in ovn-northd.c/ovn_northd.dl
- I can't get my testcase pass with the master - when pinging router gateway
  IP first packet is seemlingly dropped, with the following warning logged
  in vswitchd:
  "Failed to acquire udpif_key corresponding to unexpected flow"

v1 -> v2
* Rebased patch on current master
* Reworked logic to check if nat->type is snat and masquerade
* Added loadbalancer-related check to system-userspace testcase
* Added ddlog implementation

 northd/ovn-northd.c  |  72 +++--
 northd/ovn_northd.dl |  53 +++
 tests/ovn.at |  53 +++
 tests/system-ovn.at  | 122 +++
 4 files changed, 295 insertions(+), 5 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 52fc255ae..517357884 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -12250,8 +12250,9 @@ build_lrouter_ingress_flow(struct hmap *lflows, struct 
ovn_datapath *od,
 
 static int
 lrouter_check_nat_entry(struct ovn_datapath *od, const struct nbrec_nat *nat,
-ovs_be32 *mask, bool *is_v6, int *cidr_bits,
-struct eth_addr *mac, bool *distributed)
+ovs_be32 *mask, bool *is_v6, bool *is_masq,
+int *cidr_bits, struct eth_addr *mac,
+bool *distributed)
 {
 struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT;
 ovs_be32 ip;
@@ -12285,14 +12286,21 @@ lrouter_check_nat_entry(struct ovn_datapath *od, 
const struct nbrec_nat *nat,
 *is_v6 = true;
 }
 
+*is_masq = false;
 /* Check the validity of nat->logical_ip. 'logical_ip' can
 * be a subnet when the type is "snat". */
 if (*is_v6) {
 error = ipv6_parse_masked(nat->logical_ip, &ipv6, &mask_v6);
 *cidr_bits = ipv6_count_cidr_bits(&mask_v6);
+if (*cidr_bits < 128) {
+*is_masq = true;
+}
 } else {
 error = ip_parse_masked(nat->logical_ip, &ip, mask);
 *cidr_bits = ip_count_cidr_bits(*mask);
+if (*cidr_bits < 32) {
+*is_masq = true;
+}
 }
 if (!strcmp(nat->type, "snat")) {
 if (error) {
@@ -12397,12 +12405,12 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od, struct hmap *lflows,
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat = nat = od->nbr->nat[i];
 struct eth_addr mac = eth_addr_broadcast;
-bool is_v6, distributed;
+bool is_v6, is_masq, distributed;
 ovs_be32 mask;
 int cidr_bits;
 
-if (lrouter_check_nat_entry(od, nat, &mask, &is_v6, &cidr_bits,
-&mac, &distributed) < 0) {
+if (lrouter_check_nat_entry(od, nat, &mask, &is_v6, &is_masq,
+&cidr_bits, &mac, &distributed) < 0) {
 continue;
 }
 
@@ -12413,6 +12421,60 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od, struct hmap *lflows,
 build_lrouter_in_dnat_flow(lflows, od, nat, match, actions, 
distributed,
mask, is_v6);
 
+/* When router have SNAT enabled, and logical_ip is a network (router
+ * is doing masquerade), then we need to make sure that packets
+ * unrelated to any established connection that still have router's
+ * external IP as a next hop after

[ovs-dev] [PATCH ovn v2] northd: Fix routing loop in LRs with one-to-many SNAT

2021-08-12 Thread Krzysztof Klimonda
If there are snat entries on the router, and some logical_ip are set to
network instead of an IP address then given SNAT is masquerade. In such
case ct_snat action is used in lr_in_unsnat table to ensure that the
packet is matched against conntrack and destination IP is replaced with
one from matching conntrack entry.

This however breaks down for new connections sent to router's external IP
address. In such case, when packet is checked against conntrack table,
there is no match, and its destination IP remains unchanged. This causes a
loop in lr_in_ip_routing.

This commit installs a new logical flow in lr_in_ip_routing table for routers
that have SNAT entry with logical_ip set to network (that being masquerade).
This flow drops packets that, after going through conntrack via ct_snat action
in lr_in_unsnat table, are not matched to any existing flow and retain the
original destination IP. This prevents vswitchd from looping such packets until
their TTL reaches zero, as well as installing bogus flows in datapath that lead
to ovs module dropping such packets with "deferred action limit reached, drop
recirc action" message.

Signed-off-by: Krzysztof Klimonda 
---
There are two FIXME that should be handled:
- I tried dropping ct.new from the flow, and that breaks E/W SNAT/DNAT
  traffic (DNAT and SNAT on distributed router - E/W -- ovn-northd). See
  comment in ovn-northd.c/ovn_northd.dl
- I can't get my testcase pass with the master - when pinging router gateway
  IP first packet is seemlingly dropped, with the following warning logged
  in vswitchd:
  "Failed to acquire udpif_key corresponding to unexpected flow"

v1 -> v2
* Rebased patch on current master
* Reworked logic to check if nat->type is snat and masquerade
* Added loadbalancer-related check to system-userspace testcase
* Added ddlog implementation

 northd/ovn-northd.c  |  72 +++--
 northd/ovn_northd.dl |  53 +++
 tests/ovn.at |  53 +++
 tests/system-ovn.at  | 122 +++
 4 files changed, 295 insertions(+), 5 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 52fc255ae..517357884 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -12250,8 +12250,9 @@ build_lrouter_ingress_flow(struct hmap *lflows, struct 
ovn_datapath *od,
 
 static int
 lrouter_check_nat_entry(struct ovn_datapath *od, const struct nbrec_nat *nat,
-ovs_be32 *mask, bool *is_v6, int *cidr_bits,
-struct eth_addr *mac, bool *distributed)
+ovs_be32 *mask, bool *is_v6, bool *is_masq,
+int *cidr_bits, struct eth_addr *mac,
+bool *distributed)
 {
 struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT;
 ovs_be32 ip;
@@ -12285,14 +12286,21 @@ lrouter_check_nat_entry(struct ovn_datapath *od, 
const struct nbrec_nat *nat,
 *is_v6 = true;
 }
 
+*is_masq = false;
 /* Check the validity of nat->logical_ip. 'logical_ip' can
 * be a subnet when the type is "snat". */
 if (*is_v6) {
 error = ipv6_parse_masked(nat->logical_ip, &ipv6, &mask_v6);
 *cidr_bits = ipv6_count_cidr_bits(&mask_v6);
+if (*cidr_bits < 128) {
+*is_masq = true;
+}
 } else {
 error = ip_parse_masked(nat->logical_ip, &ip, mask);
 *cidr_bits = ip_count_cidr_bits(*mask);
+if (*cidr_bits < 32) {
+*is_masq = true;
+}
 }
 if (!strcmp(nat->type, "snat")) {
 if (error) {
@@ -12397,12 +12405,12 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od, struct hmap *lflows,
 for (int i = 0; i < od->nbr->n_nat; i++) {
 const struct nbrec_nat *nat = nat = od->nbr->nat[i];
 struct eth_addr mac = eth_addr_broadcast;
-bool is_v6, distributed;
+bool is_v6, is_masq, distributed;
 ovs_be32 mask;
 int cidr_bits;
 
-if (lrouter_check_nat_entry(od, nat, &mask, &is_v6, &cidr_bits,
-&mac, &distributed) < 0) {
+if (lrouter_check_nat_entry(od, nat, &mask, &is_v6, &is_masq,
+&cidr_bits, &mac, &distributed) < 0) {
 continue;
 }
 
@@ -12413,6 +12421,60 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od, struct hmap *lflows,
 build_lrouter_in_dnat_flow(lflows, od, nat, match, actions, 
distributed,
mask, is_v6);
 
+/* When router have SNAT enabled, and logical_ip is a network (router
+ * is doing masquerade), then we need to make sure that packets
+ * unrelated to any established connection that still have router's
+ * external IP as a next hop after

Re: [ovs-dev] [PATCH ovn] northd: Fix routing loop in LRs with one-to-many SNAT

2021-08-11 Thread Krzysztof Klimonda
Hi Han, Numan,

@Han I'll post v2 that is rebased against master with fixes you've mentioned 
below. Check my comments.

@Numan I've extended system test and added loadbalancer plus a connection check 
- seems to be fine, however I'm having a hell of a time trying to figure out 
how to use ovn-trace with CT (and also with LB) to understand how packets are 
matched by flows.

As far as I understand loadbalancer action ct_lb will perform dnat, changing 
the destination IP and port to backend, and the flow I've installed in 
lr_in_ip_routing will no longer match it.

In version 2 patch, after pausing system test 149 I've tried a trace like that 
to see which flows are matched, but the output doesn't seem to actually show 
related flows, and seem to be showing the default matching (without ct and lb). 
The command I run is:

ovn-trace --db unix:./ovn-sb/ovn-sb.sock --ct new 'inport=="rp-public" && 
eth.src==00:10:10:01:02:13 && eth.dst==00:00:02:01:02:03 && ip4 && 
ip4.src==172.16.1.100 && ip4.dst==172.16.1.254 && ip.ttl==64 && tcp && 
tcp.dst==90' --lb-dst 192.168.1.2:90 

On Wed, Aug 4, 2021, at 20:17, Han Zhou wrote:
> 
> 
> On Mon, Mar 22, 2021 at 2:16 AM Krzysztof Klimonda 
>  wrote:
> >
> > If there are snat entries on the router, and some logical_ip are set to
> > network instead of an IP address then given SNAT is masquerade. In such
> > case ct_snat action is used in lr_in_unsnat table to ensure that the
> > packet is matched against conntrack and destination IP is replaced with
> > one from matching conntrack entry.
> >
> > This however breaks down for new connections sent to router's external IP
> > address. In such case, when packet is checked against conntrack table,
> > there is no match, and its destination IP remains unchanged. This causes a
> > loop in lr_in_ip_routing.
> >
> > This commit installs a new logical flow in lr_in_ip_routing table for
> > routers that have SNAT entry with logical_ip set to network (that being
> > masquerade). This flow drops packets that, after going through conntrack
> > via ct_snat action in lr_in_unsnat table, are not in established or
> > related state (!ct.est && !ct.rel) and which destination IP still matches
> > router's external IP. This prevents vswitchd from looping such packets
> > until their TTL reaches zero, as well as installing bogus flows in
> > datapath that lead to ovs module dropping such packages with "deferred
> > action limit reached, drop recirc action" message.
> >
> > Signed-off-by: Krzysztof Klimonda 
> 
> Thanks for the patch. As mentioned in the discussion ML could you post 
> more details on how the loop happens?
> Please see some more comments below.

As far as I understood it when writing the patch, there is a problem with how 
packets that are not part of conntrack flows are handled by ct_snat action:

When packet arrives and goes through lr_in_unsnat table, the ct_snat action 
does not match it to any existing conntrack flow, and destination IP remains 
unchaged. The packet then goes through lr_in_ip_routing and because destination 
IP is still router IP it is sent to egress only to end up in the same ingress 
pipeline, with the same logical flow in lr_in_unsnat  table, and with the same 
resulting ct_snat action - this results in a loop until ttl is decremented 
enough time.

My idea for the fix was to install a route in the router that would drop 
packets which, after being unSNATted, still have destination IP of the router.

I've attached two ovn-traces that try to illustrate the fact, although I can't 
really figure out how to properly use --ct with ovn-trace to simulate new and 
established connections.

I'm also having some weird system test failures with current master where in my 
tests ping fails to send first few packages against router IP with EBUSY 
(resource temporarily unavailable) being returned from recvfrom 

> 
> > ---
> >  northd/ovn-northd.c |  57 +++
> >  tests/ovn.at|  45 ++
> >  tests/system-ovn.at | 108 
> >  3 files changed, 210 insertions(+)
> >
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 4783e43d7..ea7db3d47 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -11304,6 +11304,7 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
> > *od,
> >  ovs_be32 ip, mask;
> >  struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT;
> >  bool is_v6 = false;
> > +bool is_masquerade = false;
> >  bool statele

Re: [ovs-dev] [RFC PATCH ovn 0/4] Use Distributed Gateway Port for ovn-controller scalability.

2021-08-04 Thread Krzysztof Klimonda
t; > > ovn-controller restart (recompute + reinstall OVS flows): 5s (>90%
> reduced)
> >
> > Hi Han,
> >
> > Thanks for these RFC patches.  The improvements are significant.
> > That's awesome.
> >
> > If I understand this RFC correctly, ovn-k8s will set the
> > gateway_chassis for each logical
> > router port of the cluster router (ovn_cluster_router) connecting to
> > the node logical switch right ?
> >
> > If so, instead of using the multiple gw port feature, why can't
> > ovn-k8s just set the chassis=
> > in the logical switch other_config option ?
> >
> > ovn-controllers can exclude the logical switches from the
> > local_datapaths if they don't belong to the local chassis.
> >
> > I'm not entirely sure if this would work.  Any thoughts ?  If the same
> > can be achieved using the chassis option
> > instead of multiple gw router ports, perhaps the former seems better
> > to me as it would be less work for ovn-k8s.
> > And there will be fewer resources in SB DB.   What do you think ?
> > Otherwise +1 from me for
> > this RFC series.
> >
> 
> Thanks Numan for the feedback!
> The reason why not introducing a new option in LS is:
> 1) The multiple DGP support is a valuable feature regardless of the use
> case of this RFC.
> 2) Don't flood-fill beyond DGP port is also valuable regardless of the
> ovn-k8s use case. As mentioned it would also help the OpenStack scalability
> when multi-tenant sharing same provider networks.

Does this optimization for OpenStack usecase works also when FIPs 
(dnat_and_snat) are in use? The commit message is a little unclear.

Best Regards,
Krzysztof


> 3) If 1) and 2) are both implemented, there is no need for an extra
> mechanism for "bind logical switches to chassis", because the outcome of 1)
> and 2) are sufficient. The changes in ovn-k8s would be the same, i.e. set
> the chassis somewhere, either to a LRP or a LS. I have sent a WIP PR to the
> ovn-k8s repo and it appears to be a very small change:
> https://github.com/ovn-org/ovn-kubernetes/pull/2388
> 
> In addition, a separate option on LS seems unnatural to me, because the end
> user must understand what they are doing by setting that option. In
> contrast, the DGP more flexibly and accurately tells what OVN should do.
> Maybe the name "Distributed Gateway Port" is somehow confusing, but the
> chassis-redirect port behind it is telling OVN that the user wants the
> traffic to be redirected to a chassis for the LRP. There can be different
> scenarios such as a single LS connecting to multiple DGPs and vice versa,
> all are valid setups that can be supported by this feature. While setting a
> chassis option for a LS is arbitrary and it is easy to create conflict
> setups, e.g. setting such an option for LS-join. Of course we can say the
> user is responsible for what they are setting, but I just don't see it
> necessary for now.
> 
> Does this make sense?
> 
> > Thanks
> > Numan
> >
> > >
> > > Han Zhou (4):
> > >   ovn-northd: Avoid ha_ref_chassis calculation when there is only one
> > > chassis in ha_chassis_group.
> > >   binding.c: Refactor binding_handle_port_binding_changes.
> > >   binding.c: Create a new function
> > > consider_patch_port_for_local_datapaths.
> > >   ovn-controller: Don't flood fill local datapaths beyond DGP boundary.
> > >
> > >  controller/binding.c   | 190 +
> > >  northd/ovn-northd.c|  39 +++--
> > >  ovn-architecture.7.xml |  26 ++
> > >  ovn-nb.xml |   6 ++
> > >  tests/ovn.at   |  67 +++
> > >  5 files changed, 268 insertions(+), 60 deletions(-)
> > >
> > > --
> > > 2.30.2
> > >
> > > ___
> > > dev mailing list
> > > d...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] is vxlan well supported by ovn ?

2021-06-25 Thread Krzysztof Klimonda
Hi,

Is this a limitation for a number of logical switches and logical ports that 
are part of networks that use vxlan (for example, by utilizing vtep interfaces) 
or for a total number of LSs and LSPs in a deployment?

Best Regards
Krzysztof

On Thu, Jun 24, 2021, at 18:32, Ihar Hrachyshka wrote:
> It is supported but with a number of limitations. Specifically, the
> number of switches, and ports per switch, is limited to 2^11 when
> VXLAN is used in a cluster. This is due to design limitations as
> described in e.g.:
> https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/369201.html
> 
> Ihar
> 
> On Thu, Jun 24, 2021 at 11:58 AM ovsluck  wrote:
> >
> > Hi, Pettit and all:
> >
> >vxlan is almost data-center's standard protocol ,can be  offloaded  
> > to hardware.
> >
> >  is vxlan  well supported by ovn ? and what plans for the future?
> >
> >
> >
> >
> > thanks.
> >
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v8 6/6] northd: Flood ARPs to routers for "unreachable" addresses.

2021-06-23 Thread Krzysztof Klimonda
> +# * lr2's interface that connects to ls-pub has IP address 
> 172.18.1.173/24
> +#
> +# lr1 has the following attributes:
> +#   * It has a DNAT rule that translates 172.18.2.11 to 192.168.100.5 
> (VM1)
> +#
> +# lr2 has the following attributes:
> +#   * It has a DNAT rule that translates 172.18.2.12 to 192.168.200.5 
> (VM2)
> +#
> +# In this test, we want to ensure that a ping from VM1 to IP address 
> 172.18.2.12 reaches VM2.
> +# When the NAT rules are set up, there should be MAC_Bindings created 
> that allow for traffic
> +# to exit lr1, go through ls-pub, and reach the NAT external IP 
> configured on lr2.
> +
> +check ovn-nbctl ls-add ls1
> +check ovn-nbctl lsp-add ls1 vm1 -- lsp-set-addresses vm1 
> "00:00:00:00:01:05 192.168.100.5"
> +
> +check ovn-nbctl ls-add ls2
> +check ovn-nbctl lsp-add ls2 vm2 -- lsp-set-addresses vm2 
> "00:00:00:00:02:05 192.168.200.5"
> +
> +check ovn-nbctl ls-add ls-pub
> +
> +check ovn-nbctl lr-add lr1
> +check ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:01:01 192.168.100.1/24
> +check ovn-nbctl lsp-add ls1 ls1-lr1  \
> +-- lsp-set-type ls1-lr1 router \
> +-- lsp-set-addresses ls1-lr1 router\
> +-- lsp-set-options ls1-lr1 router-port=lr1-ls1
> +
> +check ovn-nbctl lr-add lr2
> +check ovn-nbctl lrp-add lr2 lr2-ls2 00:00:00:00:02:01 192.168.200.1/24
> +check ovn-nbctl lsp-add ls2 ls2-lr2  \
> +-- lsp-set-type ls2-lr2 router \
> +-- lsp-set-addresses ls2-lr2 router\
> +-- lsp-set-options ls2-lr2 router-port=lr2-ls2
> +
> +check ovn-nbctl lrp-add lr1 lr1-ls-pub 00:00:00:00:03:01 
> 172.18.2.110/24
> +check ovn-nbctl lrp-set-gateway-chassis lr1-ls-pub hv1
> +check ovn-nbctl lsp-add ls-pub ls-pub-lr1  \
> +-- lsp-set-type ls-pub-lr1 router\
> +-- lsp-set-addresses ls-pub-lr1 router   \
> +-- lsp-set-options ls-pub-lr1 router-port=lr1-ls-pub
> +
> +check ovn-nbctl lrp-add lr2 lr2-ls-pub 00:00:00:00:03:02 
> 172.18.1.173/24
> +check ovn-nbctl lrp-set-gateway-chassis lr2-ls-pub hv1
> +check ovn-nbctl lsp-add ls-pub ls-pub-lr2  \
> +-- lsp-set-type ls-pub-lr2 router\
> +-- lsp-set-addresses ls-pub-lr2 router   \
> +-- lsp-set-options ls-pub-lr2 router-port=lr2-ls-pub
> +
> +# Putting --add-route on these NAT rules means there is no need to
> +# add any static routes.
> +check ovn-nbctl --add-route lr-nat-add lr1 dnat_and_snat 172.18.2.11 
> 192.168.100.5 vm1 00:00:00:00:03:01
> +check ovn-nbctl --add-route lr-nat-add lr2 dnat_and_snat 172.18.2.12 
> 192.168.200.5 vm2 00:00:00:00:03:02
> +
> +ADD_NAMESPACES(vm1)
> +ADD_VETH(vm1, vm1, br-int, "192.168.100.5/24", "00:00:00:00:01:05", \
> + "192.168.100.1")
> +
> +ADD_NAMESPACES(vm2)
> +ADD_VETH(vm2, vm2, br-int, "192.168.200.5/24", "00:00:00:00:02:05", \
> + "192.168.200.1")
> +
> +OVN_POPULATE_ARP
> +check ovn-nbctl --wait=hv sync
> +
> +AS_BOX([Testing a ping])
> +
> +NS_CHECK_EXEC([vm1], [ping -q -c 3 -i 0.3 -w 2 172.18.2.12 | 
> FORMAT_PING], \
> +[0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CLEANUP
> +])
> -- 
> 2.31.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd: Fix routing loop in LRs with one-to-many SNAT

2021-03-22 Thread Krzysztof Klimonda
If there are snat entries on the router, and some logical_ip are set to
network instead of an IP address then given SNAT is masquerade. In such
case ct_snat action is used in lr_in_unsnat table to ensure that the
packet is matched against conntrack and destination IP is replaced with
one from matching conntrack entry.

This however breaks down for new connections sent to router's external IP
address. In such case, when packet is checked against conntrack table,
there is no match, and its destination IP remains unchanged. This causes a
loop in lr_in_ip_routing.

This commit installs a new logical flow in lr_in_ip_routing table for
routers that have SNAT entry with logical_ip set to network (that being
masquerade). This flow drops packets that, after going through conntrack
via ct_snat action in lr_in_unsnat table, are not in established or
related state (!ct.est && !ct.rel) and which destination IP still matches
router's external IP. This prevents vswitchd from looping such packets
until their TTL reaches zero, as well as installing bogus flows in
datapath that lead to ovs module dropping such packages with "deferred
action limit reached, drop recirc action" message.

Signed-off-by: Krzysztof Klimonda 
---
 northd/ovn-northd.c |  57 +++
 tests/ovn.at|  45 ++
 tests/system-ovn.at | 108 
 3 files changed, 210 insertions(+)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 4783e43d7..ea7db3d47 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -11304,6 +11304,7 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od,
 ovs_be32 ip, mask;
 struct in6_addr ipv6, mask_v6, v6_exact = IN6ADDR_EXACT_INIT;
 bool is_v6 = false;
+bool is_masquerade = false;
 bool stateless = lrouter_nat_is_stateless(nat);
 struct nbrec_address_set *allowed_ext_ips =
   nat->allowed_ext_ips;
@@ -11343,9 +11344,15 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od,
 if (is_v6) {
 error = ipv6_parse_masked(nat->logical_ip, &ipv6, &mask_v6);
 cidr_bits = ipv6_count_cidr_bits(&mask_v6);
+if (cidr_bits < 128) {
+is_masquerade = true;
+}
 } else {
 error = ip_parse_masked(nat->logical_ip, &ip, &mask);
 cidr_bits = ip_count_cidr_bits(mask);
+if (cidr_bits < 32) {
+is_masquerade = 32;
+}
 }
 if (!strcmp(nat->type, "snat")) {
 if (error) {
@@ -11396,6 +11403,56 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath 
*od,
 build_lrouter_in_dnat_flow(lflows, od, nat, match, actions, 
distributed,
mask, is_v6);
 
+/* When router have SNAT enabled, and logical_ip is a network (router
+ * is doing masquerade), then we need to make sure that packets
+ * unrelated to any established connection that still have router's
+ * external IP as a next hop after going through lr_in_unsnat table
+ * are dropped properly. Otherwise such packets will loop around
+ * between tables until their ttl reaches zero - this additionally
+ * causes kernel module to drop such packages due to recirculation
+ * limit being exceeded.
+ *
+ * Install a logical flow in lr_in_ip_routing table that will
+ * match packet with router's external IP that have no related
+ * conntrack entries and drop them. Flow priority must be higher
+ * than any other flow in lr_in_ip_routing that matches router's
+ * external IP.
+ *
+ * The priority for destination routes is calculated as
+ * (prefix length * 2) + 1, and there is an additional flow
+ * for when BFD is in play with priority + 1. Set priority that
+ * is higher than any other potential routing flow for that
+ * network, that is (prefix length * 2) + offset, where offset
+ * is 1 (dst) + 1 (bfd) + 1. */
+if (is_masquerade) {
+uint16_t priority, prefix_length, offset;
+
+if (is_v6) {
+prefix_length = 128;
+} else {
+prefix_length = 32;
+}
+offset = 3;
+priority = (prefix_length * 2) + offset;
+
+ds_clear(match);
+
+if (is_v6) {
+ds_put_format(match,
+  "ct.new && ip6.dst == %s",
+  nat->external_ip);
+} else {
+ds_put_format(match,
+  "ct.new && ip4.dst == %s",
+  nat->external_ip);
+}
+

Re: [ovs-dev] [OVN][Scale] Conjunctive matches exponentially increase in Table 45

2021-03-15 Thread Krzysztof Klimonda
Hi,

Sorry for what is most likely an unconnected reply to a thread - I can't seem 
to figure out how to reply to a thread from before I was subscribed to ML.

We've been testing OVN scaling for our OpenStack cloud, and found what seems to 
be a OF flow explosion that is basically a mirror of the issue reported by 
Girish a week ago or so.

In OpenStack, neutron creates a "default" security group that has 4 rules (2 
for both IPv4 and IPv6):

- allow all egress traffic from the port
- allow all ingress traffic from other ports belonging to the same default group

What we have discovered in our testing, is that this second rule translates 
into the following ACL in OVN:

```
outport == @pg_304cc336_8db3_4efd_a558_408e648e6259 && ip4 && ip4.src == 
$pg_304cc336_8db3_4efd_a558_408e648e6259_ip4
```
where port_group `pg_304cc336_8db3_4efd_a558_408e648e6259_ip4` is defined in 
nbdb and contains all ports attached to the SG, and address_set 
pg_304cc336_8db3_4efd_a558_408e648e6259_ip4 is defined in sbdb and seems to 
have a list of addresses that are assigned to ports from that port_group[1].

As Girish has explained in his email, such ACLs are translated into a bunch of 
duplicated flows that only seem to differ in metadata:

```
# ovs-ofctl dump-flows br-int |egrep "(12474|12475)"
[...]
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, priority=2002,ip,reg0=0x100/0x100,reg15=0x3,metadata=0x20e 
actions=conjunction(12475,2/2)
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, 
priority=2002,ip,reg0=0x100/0x100,metadata=0x20e,nw_src=1.0.0.67 
actions=conjunction(12475,1/2)
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, 
priority=2002,ip,reg0=0x100/0x100,metadata=0x20e,nw_src=2.0.0.52 
actions=conjunction(12475,1/2)
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, 
priority=2002,ip,reg0=0x100/0x100,metadata=0x20d,nw_src=1.0.0.67 
actions=conjunction(12475,1/2)
 cookie=0x0, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, 
priority=2002,ip,reg0=0x100/0x100,metadata=0x20d,nw_src=2.0.0.52 
actions=conjunction(12475,1/2)
 cookie=0xb25108c3, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, priority=2002,conj_id=12475,ip,reg0=0x100/0x100,metadata=0x20e 
actions=resubmit(,46)
 cookie=0xb25108c3, duration=47132.116s, table=45, n_packets=0, n_bytes=0, 
idle_age=47132, priority=2002,conj_id=12475,ip,reg0=0x100/0x100,metadata=0x20d 
actions=resubmit(,46)
[...]
#
```
(See http://paste.openstack.org/show/803598/ for the full output of grep)

His idea of changing this conjunction into one that matches additionally on 
metadata seems to make sense in this particular instance, given that all ports 
from all datapaths need to evaluate same set of rules, and possibly it makes 
sense for all ACLs too?

Anyway, to understand how OF flows are generated by ovn-controller, I took a 
quick look at the source code, and it seems that right now all flows are 
forcefully matched to their datapath (by unconditional matching on metadata 
field).
Would it make sense to introduce a notion of "datapath unbound flow" when 
conjunction is already matching metadata?
Are there some other parts of OVN code that heavily depend on flows being 
installed per-dp?
How would that affect OVS performance when matching packets in userspace? In 
our testing we've ended up with over 1M flows installed in table 45, which 
seems to be dwarfing any potential performance loss from having flows that 
don't match on metadata field, but perhaps I'm wrong? Still, that's a lot of 
flows, and puts a hard scaling limit on some openstack deployments given it's a 
SG that is by default attached to all ports on all VMs.


[1] (although apparently not additional IP addresses allowed on port via 
allowed-address-pair - I think I've seen this issue before while testing magnum.


-- 
  Krzysztof Klimonda
  kklimo...@syntaxhighlighted.com
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev