Re: [ovs-dev] [PATCH v2] ovn-vtep: fix arping from vtep-gw physical port
> Hello Ramu > > Can you describe your configuration for this test failure when logical > switch > arp responders are skipped for logical switch "router type" ports ? > I know the existing OVN tests (both system and non-system) pass either way. > > Thanks Darrell > There is no test that failed. But, I found that when a VM boots, and gets dhcp-ip, it pings the default-gw IP address, and for this it arps the gateway-ip. That arp hits the arp-responder flows for the router-port. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v3] ovn-vtep: fix arping from vtep-gw physical port
Currently, arping from a vtep-gw physical-switch port to a VIF IP address does not work. When a physical-switch-port arps for an IP address of a VIF, that arp packet comes into the VIF hypervisor via a vxlan tunnel. That arp packet must not be responded-to by the arp responder table because, potentially, multiple hypervisors could independently respond. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/northd/ovn-northd.8.xml | 5 +++-- ovn/northd/ovn-northd.c | 5 - 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 1da633a..f27b776 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -421,8 +421,9 @@ -Priority-100 flows to skip ARP responder if inport is of type -localnet, and advances directly to the next table. +Priority-100 flows to skip the ARP responder if inport type is +localnet or vtep, and advances directly +to the next table. diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index cdc5525..9606c95 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2683,7 +2683,10 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, continue; } -if (!strcmp(op->nbsp->type, "localnet")) { +/* Skip arp responder if the logical switch inport + * is of type localnet or vtep. */ +if (!strcmp(op->nbsp->type, "localnet") +|| !strcmp(op->nbsp->type, "vtep")) { ds_clear(); ds_put_format(, "inport == %s", op->json_key); ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 100, -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] ovn-vtep: fix arping from vtep-gw physical port
-if (!strcmp(op->nbsp->type, "localnet")) { +/* Skip arp responder if the logical switch inport is not + * associated with a local VIF or a l2gateway port */ +if ((strcmp(op->nbsp->type, "")) && +(strcmp(op->nbsp->type, "l2gateway"))) { ds_clear(); ds_put_format(, "inport == %s", op->json_key); ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 100, The logical switch port of type "router" needs to have an arp-responder entry for VIFs to get the mac of the gateway port, but the above skips it and would introduce a regression. I would suggest a) that the code is simpler if we explicitly list the two known port-types for which we want to skip the arp-responder b) keep every other port-type as-is with respect to the arp-responder (and not make code changes for them in this commit). Please let me know what you think of this.. -if (!strcmp(op->nbsp->type, "localnet")) { +if (!strcmp(op->nbsp->type, "localnet") +|| !strcmp(op->nbsp->type, "vtep")) { ds_clear(); ds_put_format(, "inport == %s", op->json_key); ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 100, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2] ovn-vtep: fix arping from vtep-gw physical port
Currently, arping from a vtep-gw physical-switch port to a VIF IP address does not work. When a physical-switch-port arps for an IP address of a VIF, that arp packet comes into the VIF hypervisor via a vxlan tunnel. That arp packet must not be responded-to by the arp responder table because, potentially, multiple hypervisors could independently respond. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/northd/ovn-northd.8.xml | 4 ++-- ovn/northd/ovn-northd.c | 4 +++- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 1da633a..3b885f9 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -421,8 +421,8 @@ -Priority-100 flows to skip ARP responder if inport is of type -localnet, and advances directly to the next table. +Priority-100 flows to skip ARP responder if inport is not associated +with a local VIF, and advances directly to the next table. diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 2cbbb47..d752589 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2683,7 +2683,9 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, continue; } -if (!strcmp(op->nbsp->type, "localnet")) { +/* Skip installing arp responder if the logical switch inport + * is not associated with a local VIF. */ +if (strcmp(op->nbsp->type, "")) { ds_clear(); ds_put_format(, "inport == %s", op->json_key); ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 100, -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-vtep: fix arping from vtep-gw physical port
> > Does the following diff work on your system? > > > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > index 7797417..4f1cd89 100644 > --- a/ovn/northd/ovn-northd.8.xml > +++ b/ovn/northd/ovn-northd.8.xml > @@ -421,8 +421,8 @@ > > > > -Priority-100 flows to skip ARP responder if inport is of type > -localnet, and advances directly to the next table. > +Priority-100 flows to skip ARP responder if inport is not > associated > +with a local VIF, and advances directly to the next table. > > > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > index 8c7e80c..bcb9738 100644 > --- a/ovn/northd/ovn-northd.c > +++ b/ovn/northd/ovn-northd.c > @@ -2428,7 +2428,9 @@ build_lswitch_flows(struct hmap *datapaths, struct > hmap *ports, > continue; > } > > -if (!strcmp(op->nbsp->type, "localnet")) { > +/* Skip installing arp responder if the logical switch inport > + * is not associated with a local VIF. */ > +if (strcmp(op->nbsp->type, "")) { > ds_clear(); > ds_put_format(, "inport == %s", op->json_key); > ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 100, Yes, the above works, The above code skips the arp-responder for all non-vif ports (localnet, vtep, router, l2gateway). Is that ok ? My usecase requires to skip the arp-responder for vtep because, multiple hypervisors could respond to an arp from vtep. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn-vtep: fix arping from vtep-gw physical port
Currently, arping from a vetp-gw physical-switch port to a VIF IP address does not work. When a physical-switch-port arps for an IP address of a VIF, that arp packet comes into the VIF hypervisor via a vxlan tunnel. That arp packet must not be responded-to by the arp responder table. Its treatment must be similar to the arps coming in from the localnet port - which is to skip the arp responder table. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/northd/ovn-northd.8.xml | 3 ++- ovn/northd/ovn-northd.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 307e8be..d6dc860 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -422,7 +422,8 @@ Priority-100 flows to skip ARP responder if inport is of type -localnet, and advances directly to the next table. +localnet or vtep, and advances directly to +the next table. diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 5ccb516..ebbc18a 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2669,7 +2669,8 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, continue; } -if (!strcmp(op->nbsp->type, "localnet")) { +if (!strcmp(op->nbsp->type, "localnet") +|| !strcmp(op->nbsp->type, "vtep")) { ds_clear(); ds_put_format(, "inport == %s", op->json_key); ovn_lflow_add(lflows, op->od, S_SWITCH_IN_ARP_ND_RSP, 100, -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] procfs interface to openvswitch kernel module to help debugging
Hi, I dont see a /proc filesystem based interface to help debug the openvswitch kernel module. Some of the knobs that may be useful are: a) debug knobs for stats (for those stats are not already reported), b) control knobs like: stop/start forwarding on the fast-path etc. Such an interface would help debugging for some problems. Does such an interface already exist ? If it does not, does it make sense to add such an interface ? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn: fix ovn-northd leaks in build_acl
Hi Russell, I updated the patch with a Signed-off-by, but it landed here as new on patchworks, https://patchwork.ozlabs.org/patch/665079/ instead of updating the original here: https://patchwork.ozlabs.org/patch/661079/ On Thu, Sep 1, 2016 at 4:50 PM, Russell Bryant <russ...@ovn.org> wrote: > > On Fri, Aug 19, 2016 at 9:06 PM, Ramu Ramamurthy <ramu.ramamur...@gmail.com> > wrote: >> >> From: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> >> >> The following leaks are due to missing ds_destroy in a few >> places in build_acl. >> >> 5,850 bytes in 50 blocks are definitely lost in loss record 93 of 93 >>at 0x4C29BFD: malloc (in >> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>by 0x4C2BACB: realloc (in >> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>by 0x449507: xrealloc (util.c:123) >>by 0x42CC73: ds_reserve (dynamic-string.c:63) >>by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) >>by 0x42D176: ds_put_format (dynamic-string.c:142) >>by 0x40D380: build_acls (ovn-northd.c:2320) >>by 0x40D380: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) >>by 0x4072D9: build_lflows (ovn-northd.c:3845) >>by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) >>by 0x4072D9: main (ovn-northd.c:4375) >> >> 9,360 bytes in 72 blocks are definitely lost in loss record 93 of 93 >>at 0x4C29BFD: malloc (in >> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>by 0x4C2BACB: realloc (in >> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>by 0x449507: xrealloc (util.c:123) >>by 0x42CC73: ds_reserve (dynamic-string.c:63) >>by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) >>by 0x42D176: ds_put_format (dynamic-string.c:142) >>by 0x40D505: build_acls (ovn-northd.c:2346) >>by 0x40D505: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) >>by 0x4072D9: build_lflows (ovn-northd.c:3845) >>by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) >>by 0x4072D9: main (ovn-northd.c:4375) >> --- >> ovn/northd/ovn-northd.c | 2 ++ >> 1 file changed, 2 insertions(+) > > > I was about to apply this, but it's missing your Signed-off-by. Can you > please re-submit with Signed-off-by included? > > Thanks, > > -- > Russell Bryant ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn: fix ovn-northd leaks in build_acl
The following leaks are due to missing ds_destroy in a few places in build_acl. 5,850 bytes in 50 blocks are definitely lost in loss record 93 of 93 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x449507: xrealloc (util.c:123) by 0x42CC73: ds_reserve (dynamic-string.c:63) by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) by 0x42D176: ds_put_format (dynamic-string.c:142) by 0x40D380: build_acls (ovn-northd.c:2320) by 0x40D380: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) by 0x4072D9: build_lflows (ovn-northd.c:3845) by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) by 0x4072D9: main (ovn-northd.c:4375) 9,360 bytes in 72 blocks are definitely lost in loss record 93 of 93 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x449507: xrealloc (util.c:123) by 0x42CC73: ds_reserve (dynamic-string.c:63) by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) by 0x42D176: ds_put_format (dynamic-string.c:142) by 0x40D505: build_acls (ovn-northd.c:2346) by 0x40D505: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) by 0x4072D9: build_lflows (ovn-northd.c:3845) by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) by 0x4072D9: main (ovn-northd.c:4375) Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> Acked-by: Ryan Moats <rmo...@us.ibm.com> --- ovn/northd/ovn-northd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 0ad9190..1731332 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2342,6 +2342,7 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows) ovn_lflow_add( lflows, od, S_SWITCH_OUT_ACL, 34000, ds_cstr(), actions); +ds_destroy(); } } @@ -2368,6 +2369,7 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows) ovn_lflow_add( lflows, od, S_SWITCH_OUT_ACL, 34000, ds_cstr(), actions); +ds_destroy(); } } } -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2] ovn: log dhcp responses for debugging
Add a few messages at INFO to help debug the vif lifecycle. A logsearch on mac or ip helps debug what happened to the vif and when. This helps easily correlate logs across CMS and ovn. Logs appear like this: 2016-09-01T18:15:48Z|00014|binding|INFO|Claiming lport eee1a9af-7513-4540-9385-9e3972bfca05 for this chassis. 2016-09-01T18:15:48Z|00015|binding|INFO|Claiming fa:16:3e:01:c3:4a 10.0.0.7 fd93:b509:aa46:0:f816:3eff:fe01:c34a 2016-09-01T18:15:59Z|00016|pinctrl|INFO|DHCPOFFER fa:16:3e:01:c3:4a 10.0.0.7 2016-09-01T18:15:59Z|00017|pinctrl|INFO|DHCPACK fa:16:3e:01:c3:4a 10.0.0.7 2016-09-01T18:16:22Z|00018|binding|INFO|Releasing lport eee1a9af-7513-4540-9385-9e3972bfca05 from this chassis. 2016-09-01T18:16:22Z|00019|binding|INFO|Releasing fa:16:3e:01:c3:4a 10.0.0.7 fd93:b509:aa46:0:f816:3eff:fe01:c34a Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- v1->v2: add rate limit ovn/controller/binding.c | 6 ++ ovn/controller/pinctrl.c | 7 +++ 2 files changed, 13 insertions(+) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index 0353a7b..13c51da 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -143,6 +143,9 @@ consider_local_datapath(struct controller_ctx *ctx, } else { VLOG_INFO("Claiming lport %s for this chassis.", binding_rec->logical_port); +for (int i = 0; i < binding_rec->n_mac; i++) { +VLOG_INFO("Claiming %s", binding_rec->mac[i]); +} } sbrec_port_binding_set_chassis(binding_rec, chassis_rec); } @@ -179,6 +182,9 @@ consider_local_datapath(struct controller_ctx *ctx, if (ctx->ovnsb_idl_txn) { VLOG_INFO("Releasing lport %s from this chassis.", binding_rec->logical_port); +for (int i = 0; i < binding_rec->n_mac; i++) { +VLOG_INFO("Releasing %s", binding_rec->mac[i]); +} sbrec_port_binding_set_chassis(binding_rec, NULL); sset_find_and_delete(all_lports, binding_rec->logical_port); } diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index d65e213..323faa1 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -354,6 +354,13 @@ pinctrl_handle_put_dhcp_opts( pin->packet = dp_packet_data(_out); pin->packet_len = dp_packet_size(_out); +/* Log the response. */ +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(20, 40); +const struct eth_header *l2 = dp_packet_l2(_out); +VLOG_INFO_RL(, "DHCP%s "ETH_ADDR_FMT" "IP_FMT"", + msg_type == DHCP_MSG_OFFER ? "OFFER" : "ACK", + ETH_ADDR_ARGS(l2->eth_src), IP_ARGS(*offer_ip)); + success = 1; exit: if (!ofperr) { -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn: log dhcp responses at INFO for debugging
Since dhcp responses are important for debugging the vif lifecycle, we want to log them at info level which is the default log level. A logsearch (on macaddr) can quickly tell the dhcp status using such messages. There is no need to rate limit such logs because, we expect dhcp messages to be at low rate normally. Logs appear like this: 2016-09-01T00:08:16Z|00014|pinctrl|INFO|DHCPOFFER fa:16:3e:25:b0:78 10.0.0.5 2016-09-01T00:08:16Z|00015|pinctrl|INFO|DHCPACK fa:16:3e:25:b0:78 10.0.0.5 Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/pinctrl.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index d65e213..ad42fe9 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -354,6 +354,12 @@ pinctrl_handle_put_dhcp_opts( pin->packet = dp_packet_data(_out); pin->packet_len = dp_packet_size(_out); +/* Log the response. */ +const struct eth_header *l2 = dp_packet_l2(_out); +VLOG_INFO("DHCP%s "ETH_ADDR_FMT" "IP_FMT"", + msg_type == DHCP_MSG_OFFER ? "OFFER" : "ACK", + ETH_ADDR_ARGS(l2->eth_src), IP_ARGS(*offer_ip)); + success = 1; exit: if (!ofperr) { -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn: add lsp-deletion and bcast-flow removal tests for localnet
From: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> Add 2 tests for scenarios around lsp-deletion and flow removal which have escaped current unit tests. This test depends on the following patch: "ovn-controller: Back out incremental processing" and passes after applying it, but fails currently on master. 1) In the following sequence of events, createi vif1, create vif2, delete vif1 we find that the localnet patch port got deleted, whereas it should exist because there is a bound vif2. 2) The flow broadcasting to tunnels in table=32 must be deleted when a localnet port gets bound, but we find that the flow remains in table 32 causing broadcasts to both tunnels and localnet patch. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- tests/ovn.at | 71 1 file changed, 71 insertions(+) diff --git a/tests/ovn.at b/tests/ovn.at index a23b422..3fa267a 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -5053,3 +5053,74 @@ OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | grep REG13 | wc -l` OVN_CLEANUP([hv1]) AT_CLEANUP + + +AT_SETUP([ovn -- lsp deletion and broadcast-flow deletion on localnet]) +AT_KEYWORDS([ovn]) +ovn_start +ovn-nbctl ls-add lsw0 +net_add n1 +for i in 1 2; do +sim_add hv$i +as hv$i +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.$i +ovs-vsctl add-br br-eth0 +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0]) +done + +# Create a localnet port. +AT_CHECK([ovn-nbctl lsp-add lsw0 ln_port]) +AT_CHECK([ovn-nbctl lsp-set-addresses ln_port unknown]) +AT_CHECK([ovn-nbctl lsp-set-type ln_port localnet]) +AT_CHECK([ovn-nbctl lsp-set-options ln_port network_name=physnet1]) + + +# Create 3 vifs. +AT_CHECK([ovn-nbctl lsp-add lsw0 localvif1]) +AT_CHECK([ovn-nbctl lsp-set-addresses localvif1 "f0:00:00:00:00:01 192.168.1.1"]) +AT_CHECK([ovn-nbctl lsp-set-port-security localvif1 "f0:00:00:00:00:01"]) +AT_CHECK([ovn-nbctl lsp-add lsw0 localvif2]) +AT_CHECK([ovn-nbctl lsp-set-addresses localvif2 "f0:00:00:00:00:01 192.168.1.2"]) +AT_CHECK([ovn-nbctl lsp-set-port-security localvif2 "f0:00:00:00:00:02"]) +AT_CHECK([ovn-nbctl lsp-add lsw0 localvif3]) +AT_CHECK([ovn-nbctl lsp-set-addresses localvif3 "f0:00:00:00:00:03 192.168.1.3"]) +AT_CHECK([ovn-nbctl lsp-set-port-security localvif3 "f0:00:00:00:00:03"]) + +# Bind the localvif1 to hv1. +as hv1 +AT_CHECK([ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 external_ids:iface-id=localvif1]) + +# On hv1, check that there are no flows outputting bcast to tunnel +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip | grep output | wc -l` -eq 0]) + +# On hv2, check that there is 1 flow outputting bcast to tunnel to hv1. +as hv2 +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip | grep output | wc -l` -eq 1]) + +# Now bind vif2 on hv2. +AT_CHECK([ovs-vsctl add-port br-int localvif2 -- set Interface localvif2 external_ids:iface-id=localvif2]) + +# At this point, the broadcast flow on vif2 should be deleted. +# because, there is now a localnet vif bound (table=32 programming logic) +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=32 | ofctl_strip | grep output | wc -l` -eq 0]) + +# Verify that the local net patch port exists on hv2. +OVS_WAIT_UNTIL([test `ovs-vsctl show | grep "Port patch-br-int-to-ln_port" | wc -l` -eq 1]) + +# Now bind vif3 on hv2. +AT_CHECK([ovs-vsctl add-port br-int localvif3 -- set Interface localvif3 external_ids:iface-id=localvif3]) + +# Verify that the local net patch port still exists on hv2 +OVS_WAIT_UNTIL([test `ovs-vsctl show | grep "Port patch-br-int-to-ln_port" | wc -l` -eq 1]) + +# Delete localvif2 +AT_CHECK([ovn-nbctl lsp-del localvif2]) + +# Verify that the local net patch port still exists on hv2, +# because, localvif3 is still bound. +OVS_WAIT_UNTIL([test `ovs-vsctl show | grep "Port patch-br-int-to-ln_port" | wc -l` -eq 1]) + + +OVN_CLEANUP([hv1],[hv2]) +AT_CLEANUP -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn: fix northd leaks in build_acl
From: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> The following leaks are due to missing ds_destroy in a few places in build_acl. 5,850 bytes in 50 blocks are definitely lost in loss record 93 of 93 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x449507: xrealloc (util.c:123) by 0x42CC73: ds_reserve (dynamic-string.c:63) by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) by 0x42D176: ds_put_format (dynamic-string.c:142) by 0x40D380: build_acls (ovn-northd.c:2320) by 0x40D380: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) by 0x4072D9: build_lflows (ovn-northd.c:3845) by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) by 0x4072D9: main (ovn-northd.c:4375) 9,360 bytes in 72 blocks are definitely lost in loss record 93 of 93 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x449507: xrealloc (util.c:123) by 0x42CC73: ds_reserve (dynamic-string.c:63) by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) by 0x42D176: ds_put_format (dynamic-string.c:142) by 0x40D505: build_acls (ovn-northd.c:2346) by 0x40D505: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) by 0x4072D9: build_lflows (ovn-northd.c:3845) by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) by 0x4072D9: main (ovn-northd.c:4375) Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/northd/ovn-northd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 625937d..756d188 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2324,6 +2324,7 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows) ovn_lflow_add( lflows, od, S_SWITCH_OUT_ACL, 34000, ds_cstr(), actions); +ds_destroy(); } } @@ -2350,6 +2351,7 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows) ovn_lflow_add( lflows, od, S_SWITCH_OUT_ACL, 34000, ds_cstr(), actions); +ds_destroy(); } } } -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn: fix ovn-northd leaks in build_acl
From: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> The following leaks are due to missing ds_destroy in a few places in build_acl. 5,850 bytes in 50 blocks are definitely lost in loss record 93 of 93 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x449507: xrealloc (util.c:123) by 0x42CC73: ds_reserve (dynamic-string.c:63) by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) by 0x42D176: ds_put_format (dynamic-string.c:142) by 0x40D380: build_acls (ovn-northd.c:2320) by 0x40D380: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) by 0x4072D9: build_lflows (ovn-northd.c:3845) by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) by 0x4072D9: main (ovn-northd.c:4375) 9,360 bytes in 72 blocks are definitely lost in loss record 93 of 93 at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) by 0x449507: xrealloc (util.c:123) by 0x42CC73: ds_reserve (dynamic-string.c:63) by 0x42D08F: ds_put_format_valist (dynamic-string.c:161) by 0x42D176: ds_put_format (dynamic-string.c:142) by 0x40D505: build_acls (ovn-northd.c:2346) by 0x40D505: build_lswitch_flows.constprop.36 (ovn-northd.c:2472) by 0x4072D9: build_lflows (ovn-northd.c:3845) by 0x4072D9: ovnnb_db_run (ovn-northd.c:3971) by 0x4072D9: main (ovn-northd.c:4375) --- ovn/northd/ovn-northd.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 625937d..756d188 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2324,6 +2324,7 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows) ovn_lflow_add( lflows, od, S_SWITCH_OUT_ACL, 34000, ds_cstr(), actions); +ds_destroy(); } } @@ -2350,6 +2351,7 @@ build_acls(struct ovn_datapath *od, struct hmap *lflows) ovn_lflow_add( lflows, od, S_SWITCH_OUT_ACL, 34000, ds_cstr(), actions); +ds_destroy(); } } } -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] pinctrl: Fix memory leak for NAT IPs in send_garp_run().
I applied this patch, and while it fixes the original leak, it still produces the following leak: valgrind.8774-==8776== 2,464 bytes in 44 blocks are definitely lost in loss record 176 of 179 valgrind.8774-==8776==at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) valgrind.8774-==8776==by 0x48E3C7: xmalloc (util.c:112) valgrind.8774-==8776==by 0x411BAD: get_nat_addresses_and_keys (pinctrl.c:1221) valgrind.8774-==8776==by 0x411BAD: send_garp_run (pinctrl.c:1259) valgrind.8774-==8776==by 0x411BAD: pinctrl_run (pinctrl.c:787) valgrind.8774:==8776==by 0x407ABE: main (ovn-controller.c:451) With the following incremental change, no leak is found in that test. ( as the above points to leaking laddrs which was mallocd in get_nat_addresses_and_keys()) struct lport_addresses *laddrs = iter->data; destroy_lport_addresses(laddrs); shash_delete(_addresses, iter); + free(laddrs); } shash_destroy(_addresses); On Fri, Aug 19, 2016 at 9:03 AM, Ben Pfaff <b...@ovn.org> wrote: > send_garp_run() allocated and populated a shash of struct lport_addresses, > but it only freed some of the data. This fixes the problem. > > CC: Chandra S Vejendla <csvej...@us.ibm.com> > Reported-by: Ramu Ramamurthy <ramu.ramamur...@gmail.com> > Fixes: 8439c2ebd823 ("ovn: Support for GARP for NAT IPs via localnet") > Signed-off-by: Ben Pfaff <b...@ovn.org> > --- > v1->v2: Rebase. > > ovn/controller/pinctrl.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c > index 358602a..fc72bd5 100644 > --- a/ovn/controller/pinctrl.c > +++ b/ovn/controller/pinctrl.c > @@ -1059,7 +1059,6 @@ send_garp_update(const struct sbrec_port_binding > *binding_rec, > } > free(name); > } > -destroy_lport_addresses(laddrs); > return; > } > > @@ -1302,7 +1301,14 @@ send_garp_run(const struct ovsrec_bridge *br_int, > const char *chassis_id, > sset_destroy(_vifs); > sset_destroy(_l3gw_ports); > simap_destroy(_ofports); > -shash_destroy_free_data(_addresses); > + > +SHASH_FOR_EACH_SAFE (iter, next, _addresses) { > +struct lport_addresses *laddrs = iter->data; > +destroy_lport_addresses(laddrs); > +shash_delete(_addresses, iter); > +} > +shash_destroy(_addresses); > + > sset_destroy(_ip_keys); > } > > -- > 2.1.3 > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [bug] ovn: make check-valgrind issues
The following are among many reported by "make check-valgrind" on the ovn tests. The ones in pinctrl.c and binding.c seem real. 2246/valgrind.28645-==28647== 1,496 bytes in 22 blocks are definitely lost in loss record 175 of 179 2246/valgrind.28645-==28647==at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) 2246/valgrind.28645-==28647==by 0x4C2BACB: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) 2246/valgrind.28645-==28647==by 0x48E407: xrealloc (util.c:123) 2246/valgrind.28645-==28647==by 0x4201A4: add_ipv4_netaddr.isra.0 (ovn-util.c:28) 2246/valgrind.28645-==28647==by 0x420535: extract_lsp_addresses (ovn-util.c:103) 2246/valgrind.28645-==28647==by 0x4119EB: get_nat_addresses_and_keys (pinctrl.c:1223) 2246/valgrind.28645-==28647==by 0x4119EB: send_garp_run (pinctrl.c:1260) 2246/valgrind.28645-==28647==by 0x4119EB: pinctrl_run (pinctrl.c:787) 2246/valgrind.28645:==28647==by 0x407ABE: main (ovn-controller.c:451) 2258/valgrind.29417-==29419== 195 (96 direct, 99 indirect) bytes in 3 blocks are definitely lost in loss record 163 of 173 2258/valgrind.29417-==29419==at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) 2258/valgrind.29417-==29419==by 0x48E1E7: xmalloc (util.c:112) 2258/valgrind.29417-==29419==by 0x435044: resize (hmap.c:100) 2258/valgrind.29417-==29419==by 0x487F2D: hmap_insert_at (hmap.h:277) 2258/valgrind.29417-==29419==by 0x487F2D: sset_add__ (sset.c:53) 2258/valgrind.29417-==29419==by 0x487FE7: sset_clone (sset.c:82) 2258/valgrind.29417-==29419==by 0x40933B: binding_run (binding.c:289) 2258/valgrind.29417:==29419==by 0x4077B1: main (ovn-controller.c:438) 2258/valgrind.29417-==29419== 512 (384 direct, 128 indirect) bytes in 6 blocks are definitely lost in loss record 167 of 173 2258/valgrind.29417-==29419==at 0x4C2B974: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) 2258/valgrind.29417-==29419==by 0x48E1B4: xcalloc (util.c:95) 2258/valgrind.29417-==29419==by 0x4093D1: binding_run (binding.c:298) 2258/valgrind.29417:==29419==by 0x4077B1: main (ovn-controller.c:438) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn: fix valgrind leak
From: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> This commit fixes the following leak found by check-valgrind in the test: "send gratuitous arp for nat ips in localnet" sset gets allocated but not destroyed. valgrind.14154-==14157== 1,892 bytes in 44 blocks are definitely lost in loss record 176 of 180 valgrind.14154-==14157==at 0x4C29BFD: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) valgrind.14154-==14157==by 0x48E1E7: xmalloc (util.c:112) valgrind.14154-==14157==by 0x487EDC: sset_add__ (sset.c:51) valgrind.14154-==14157==by 0x411A46: get_nat_addresses_and_keys (pinctrl.c:1231) valgrind.14154-==14157==by 0x411A46: send_garp_run (pinctrl.c:1260) valgrind.14154-==14157==by 0x411A46: pinctrl_run (pinctrl.c:787) valgrind.14154:==14157==by 0x407ABE: main (ovn-controller.c:451) Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/pinctrl.c | 1 + 1 file changed, 1 insertion(+) diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index 8bbc42d..358602a 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -1303,6 +1303,7 @@ send_garp_run(const struct ovsrec_bridge *br_int, const char *chassis_id, sset_destroy(_l3gw_ports); simap_destroy(_ofports); shash_destroy_free_data(_addresses); +sset_destroy(_ip_keys); } static void -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v7] ovn-northd: Add logical flows to support native DHCPv4
On Sat, Jul 23, 2016 at 5:49 AM, Numan Siddique <nusid...@redhat.com> wrote: > OVN implements a native DHCPv4 support which caters to the common > use case of providing an IP address to a booting instance by > providing stateless replies to DHCPv4 requests based on statically > configured address mappings. To do this it allows a short list of > DHCPv4 options to be configured and applied at each compute host > running ovn-controller. > > A new table 'DHCP_Options' is added in OVN NB DB to store the DHCP > options. Logical ports refer to this table to configure the DHCPv4 > options. > > For each logical port configured with DHCPv4 Options following flows > are added > - A logical flow which copies the DHCPv4 options to the DHCPv4 >request packets using the 'put_dhcp_opts' action and advances the >packet to the next stage. > > - A logical flow which implements the DHCP reponder by sending >the DHCPv4 reply back to the inport once the 'put_dhcp_opts' action >is applied. > > Signed-off-by: Numan Siddique <nusid...@redhat.com> > Co-authored-by: Ben Pfaff <b...@ovn.org> > Signed-off-by: Ben Pfaff <b...@ovn.org> > --- Tested-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> against openstack cms. Acked-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v7] ovn-northd: Add logical flows to support native DHCPv4
Tested to work end-to-end with openstack. On Sat, Jul 23, 2016 at 5:49 AM, Numan Siddiquewrote: > OVN implements a native DHCPv4 support which caters to the common > use case of providing an IP address to a booting instance by > providing stateless replies to DHCPv4 requests based on statically > configured address mappings. To do this it allows a short list of > DHCPv4 options to be configured and applied at each compute host > running ovn-controller. > > A new table 'DHCP_Options' is added in OVN NB DB to store the DHCP > options. Logical ports refer to this table to configure the DHCPv4 > options. > > For each logical port configured with DHCPv4 Options following flows > are added > - A logical flow which copies the DHCPv4 options to the DHCPv4 >request packets using the 'put_dhcp_opts' action and advances the >packet to the next stage. > > - A logical flow which implements the DHCP reponder by sending >the DHCPv4 reply back to the inport once the 'put_dhcp_opts' action >is applied. > > Signed-off-by: Numan Siddique > Co-authored-by: Ben Pfaff > Signed-off-by: Ben Pfaff > --- > ovn/northd/ovn-northd.8.xml | 91 +- > ovn/northd/ovn-northd.c | 256 +- > ovn/ovn-nb.ovsschema | 20 ++- > ovn/ovn-nb.xml| 270 > ovn/utilities/ovn-nbctl.8.xml | 30 + > ovn/utilities/ovn-nbctl.c | 197 + > tests/ovn.at | 281 > ++ > 7 files changed, 1135 insertions(+), 10 deletions(-) > > > v6 -> v7 > * Rebased and resolved the merge conflicts. > > v5 -> v6 > * Rebased and fixed the compilation errors because of renaming of struct > ovn_port's nbs > to nbsp > > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > index ced2839..b95caef 100644 > --- a/ovn/northd/ovn-northd.8.xml > +++ b/ovn/northd/ovn-northd.8.xml > @@ -457,7 +457,90 @@ output; > > > > -Ingress Table 10: Destination Lookup > +Ingress Table 10: DHCP option processing > + > + > + This table adds the DHCPv4 options to a DHCPv4 packet from the > + logical ports configured with IPv4 address(es) and DHCPv4 options. > + > + > + > + > + > + A priority-100 logical flow is added for these logical ports > + which matches the IPv4 packet with udp.src = 68 and > + udp.dst = 67 and applies the action > + put_dhcp_opts and advances the packet to the next > table. > + > + > + > +reg0[3] = put_dhcp_opts(offer_ip = O, options...); > +next; > + > + > + > + For DHCPDISCOVER and DHCPREQUEST, this transforms the packet into a > + DHCP reply, adds the DHCP offer IP O and options to the > + packet, and stores 1 into reg0[3]. For other kinds of packets, it > + just stores 0 into reg0[3]. Either way, it continues to the next > + table. > + > + > + > + > + > +A priority-0 flow that matches all packets to advances to table 11. > + > + > + > +Ingress Table 11: DHCP responses > + > + > + This table implements DHCP responder for the DHCP replies generated by > + the previous table. > + > + > + > + > + > + A priority 100 logical flow is added for the logical ports > configured > + with DHCPv4 options which matches IPv4 packets with udp.src > == 68 > + udp.dst == 67 reg0[3] == 1 and > + responds back to the inport after applying these > + actions. If reg0[3] is set to 1, it means that the > + action put_dhcp_opts was successful. > + > + > + > +eth.dst = eth.src; > +eth.src = E; > +ip4.dst = O; > +ip4.src = S; > +udp.src = 67; > +udp.dst = 68; > +outport = P; > +inport = ""; /* Allow sending out inport. */ > +output; > + > + > + > + where E is the server MAC address and S is > the > + server IPv4 address defined in the DHCPv4 options and O > is > + the IPv4 address defined in the logical port's addresses column. > + > + > + > + (This terminates ingress packet processing; the packet does not go > + to the next ingress table.) > + > + > + > + > +A priority-0 flow that matches all packets to advances to table 12. > + > + > + > +Ingress Table 12: Destination Lookup > > >This table implements switching behavior. It contains these logical > @@ -531,6 +614,12 @@ output; >there are no rules added for load balancing new connections. > > > + > + Also a priority 34000 logical flow is added for each logical port which > + has DHCPv4 options defined to allow the DHCPv4 reply packet from the >
Re: [ovs-dev] [PATCH v5] ovn-northd: Add logical flows to support native DHCPv4
Tested-By: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> with corresponding openstack patch. Is there a mechanism to rate-limit the rate of broadcast dhcp-requests ? This is useful when there are untrusted vifs requesting dhcp. I sent dhcp-requests from a namespace at a high rate (100pkts/s, 200pks/s etc), ovn-controller is able to handle 500 dhcp-requests/s easily with replies. However, cpu of the ovn-controller climbs with with the dhcp-request rate. On Thu, Jul 14, 2016 at 10:07 AM, Numan Siddique <nusid...@redhat.com> wrote: > OVN implements a native DHCPv4 support which caters to the common > use case of providing an IP address to a booting instance by > providing stateless replies to DHCPv4 requests based on statically > configured address mappings. To do this it allows a short list of > DHCPv4 options to be configured and applied at each compute host > running ovn-controller. > > A new table 'DHCP_Options' is added in OVN NB DB to store the DHCP > options. Logical ports refer to this table to configure the DHCPv4 > options. > > For each logical port configured with DHCPv4 Options following flows > are added > - A logical flow which copies the DHCPv4 options to the DHCPv4 >request packets using the 'put_dhcp_opts' action and advances the >packet to the next stage. > > - A logical flow which implements the DHCP reponder by sending >the DHCPv4 reply back to the inport once the 'put_dhcp_opts' action >is applied. > > Signed-off-by: Numan Siddique <nusid...@redhat.com> > Co-authored-by: Ben Pfaff <b...@ovn.org> > Signed-off-by: Ben Pfaff <b...@ovn.org> > --- > ovn/northd/ovn-northd.8.xml | 91 +- > ovn/northd/ovn-northd.c | 256 +- > ovn/ovn-nb.ovsschema | 20 ++- > ovn/ovn-nb.xml| 270 > ovn/utilities/ovn-nbctl.8.xml | 30 + > ovn/utilities/ovn-nbctl.c | 197 + > tests/ovn.at | 281 > ++ > 7 files changed, 1135 insertions(+), 10 deletions(-) > > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > index 178a1a9..447e042 100644 > --- a/ovn/northd/ovn-northd.8.xml > +++ b/ovn/northd/ovn-northd.8.xml > @@ -423,7 +423,90 @@ output; > > > > -Ingress Table 10: Destination Lookup > +Ingress Table 10: DHCP option processing > + > + > + This table adds the DHCPv4 options to a DHCPv4 packet from the > + logical ports configured with IPv4 address(es) and DHCPv4 options. > + > + > + > + > + > + A priority-100 logical flow is added for these logical ports > + which matches the IPv4 packet with udp.src = 68 and > + udp.dst = 67 and applies the action > + put_dhcp_opts and advances the packet to the next > table. > + > + > + > +reg0[3] = put_dhcp_opts(offer_ip = O, options...); > +next; > + > + > + > + For DHCPDISCOVER and DHCPREQUEST, this transforms the packet into a > + DHCP reply, adds the DHCP offer IP O and options to the > + packet, and stores 1 into reg0[3]. For other kinds of packets, it > + just stores 0 into reg0[3]. Either way, it continues to the next > + table. > + > + > + > + > + > +A priority-0 flow that matches all packets to advances to table 11. > + > + > + > +Ingress Table 11: DHCP responses > + > + > + This table implements DHCP responder for the DHCP replies generated by > + the previous table. > + > + > + > + > + > + A priority 100 logical flow is added for the logical ports > configured > + with DHCPv4 options which matches IPv4 packets with udp.src > == 68 > + udp.dst == 67 reg0[3] == 1 and > + responds back to the inport after applying these > + actions. If reg0[3] is set to 1, it means that the > + action put_dhcp_opts was successful. > + > + > + > +eth.dst = eth.src; > +eth.src = E; > +ip4.dst = O; > +ip4.src = S; > +udp.src = 67; > +udp.dst = 68; > +outport = P; > +inport = ""; /* Allow sending out inport. */ > +output; > + > + > + > + where E is the server MAC address and S is > the > + server IPv4 address defined in the DHCPv4 options and O > is > + the IPv4 address defined in the logical port's addresses column. > + > + > + > +
Re: [ovs-dev] [PATCH v4] ovn-northd: Add logical flows to support native DHCPv4
Tested-By: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> against the corresponding (rebased) openstack patch: https://review.openstack.org/#/c/243174/ Hence, Acked-By: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> On Sun, Jul 3, 2016 at 8:54 PM, Numan Siddique <nusid...@redhat.com> wrote: > OVN implements a native DHCPv4 support which caters to the common > use case of providing an IP address to a booting instance by > providing stateless replies to DHCPv4 requests based on statically > configured address mappings. To do this it allows a short list of > DHCPv4 options to be configured and applied at each compute host > running ovn-controller. > > A new table 'DHCP_Options' is added in OVN NB DB to store the DHCP > options. Logical ports refer to this table to configure the DHCPv4 > options. > > For each logical port configured with DHCPv4 Options following flows > are added > - A logical flow which copies the DHCPv4 options to the DHCPv4 >request packets using the 'put_dhcp_opts' action and advances the >packet to the next stage. > > - A logical flow which implements the DHCP reponder by sending >the DHCPv4 reply back to the inport once the 'put_dhcp_opts' action >is applied. > > Signed-off-by: Numan Siddique <nusid...@redhat.com> > Co-authored-by: Ben Pfaff <b...@ovn.org> > Signed-off-by: Ben Pfaff <b...@ovn.org> > --- ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v10 2/2] ovn-northd: Add logical flows to support native DHCP
I tested and verified this patchset using the corresponding WIP openstack dhcp patch. native-dhcp works end-to-end as advertized - I tested with the default dhcp-options that are used when VMs boot. On Mon, Jun 6, 2016 at 10:49 PM, Numan Siddiquewrote: > OVN implements a native DHCP support which caters to the common > use case of providing an IP address to a booting instance by > providing stateless replies to DHCP requests based on statically > configured address mappings. To do this it allows a short list of > DHCP options to be configured and applied at each compute host > running ovn-controller. > > A new table 'Subnet' is added in OVN NB DB to store the DHCP options. > > For each logical port following flows are added if the CMS has defined > DHCP options in the 'Subnet' column > > - A logical flow which copies the DHCP options to the DHCP >request packets using the 'put_dhcp_opts' action and advances the >packet to the next stage. > > - A logical flow which implements the DHCP reponder by sending >the DHCP reply back to the inport once the 'put_dhcp_opts' action >is applied. > > Signed-Off-by: Numan Siddique ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v8 2/2] ovn-northd: Add logical flows to support native DHCP
Numan, I tested this patchset along with the corresponding WIP ml2 native-dhcp openstack patch. It looks good to me, On Tue, May 24, 2016 at 11:04 AM, Numan Siddiquewrote: > OVN implements a native DHCP support which caters to the common > use case of providing an IP address to a booting instance by > providing stateless replies to DHCP requests based on statically > configured address mappings. To do this it allows a short list of > DHCP options to be configured and applied at each compute host > running ovn-controller. > > A new table 'Subnet' is added in OVN NB DB to store the DHCP options. > > For each logical port following flows are added if the CMS has defined > DHCP options in the 'Subnet' column > > - A logical flow which copies the DHCP options to the DHCP >request packets using the 'put_dhcp_opts' action and advances the >packet to the next stage. > > - A logical flow which implements the DHCP reponder by sending >the DHCP reply back to the inport once the 'put_dhcp_opts' action >is applied. > > Signed-Off-by: Numan Siddique > --- > ovn/northd/ovn-northd.8.xml | 89 +++- > ovn/northd/ovn-northd.c | 265 ++- > ovn/ovn-nb.ovsschema | 19 ++- > ovn/ovn-nb.xml| 314 > +- > ovn/utilities/ovn-nbctl.8.xml | 29 > ovn/utilities/ovn-nbctl.c | 196 ++ > tests/ovn.at | 250 + > tests/test-ovn-dhcp.c | 135 ++ > 8 files changed, 1287 insertions(+), 10 deletions(-) > > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > index 970c352..9a4a46a 100644 > --- a/ovn/northd/ovn-northd.8.xml > +++ b/ovn/northd/ovn-northd.8.xml > @@ -326,7 +326,88 @@ output; > > > > -Ingress Table 6: Destination Lookup > +Ingress Table 6: DHCP pause > + > + > + This table adds the DHCP options to a DHCP packet from the > + logical ports configured with IPv4 address(es) and DHCP options. > + > + > + > + > + > + A priority-100 logical flow is added for these logical ports > + which matches the packet with udp.src = 68 and > + udp.dst = 67 and applies the action > + put_dhcp_opts and advances the packet to the table 7. > + > + > + > +put_dhcp_opts(R, offer_ip = O, DHCP_OPTIONS); > +next; > + > + > + > + The action put_dhcp_opts adds the DHCP options to > + the packet and sets the yiaddr to the offer ip > + defined in O only if the packet is a > + DHCPDISCOVER/DHCPREQUEST and resumes the pipeline by setting > + 0x1 to the OVS register R. > + If the packet is non-DHCP packet or invalid DHCP packet > + it resumes the pipeline without any modifications. > + > + > + > + > + > +A priority-0 flow that matches all packets to advances to table 7. > + > + > + > +Ingress Table 7: DHCP resume > + > + > + This table implements DHCP responder for the resumed DHCP packets > + from the previous table. > + > + > + > + > + > + A priority 100 logical flow is added for the logical ports > configured > + with DHCP options which matches the packet with > + udp.src = 68, udp.dst = 67 and > + reg0 = 0x1 and responds back to the > + inport after applying these actions. > + If reg0 is set to 1, it means that the action > + put_dhcp_opts was successful. > + > + > + > +eth.dst = eth.src; > +eth.src = E; > +ip4.dst = O; > +ip4.src = S; > +udp.src = 67; > +udp.dst = 68; > +outport = P; > +inport = ""; /* Allow sending out inport. */ > +output; > + > + > + > + Where E is the server mac address and S is > the > + server IPv4 address defined in the DHCP options and O is > + the IPv4 address defined in the logical port's addresses column. > + > + > + > + > +A priority-0 flow that matches all packets to advances to table 8. > + > + > + > +Ingress Table 8: Destination Lookup > > >This table implements switching behavior. It contains these logical > @@ -370,6 +451,12 @@ output; >This is similar to ingress table 4 except for to-lport > ACLs. > > > + > + Also a priority 34000 logical flow is added for each subnet of the > logical > + switch which has DHCP options defined to allow the DHCP reply packet > + from the Ingress Table 7: DHCP resume. > + > + > Egress Table 2: Egress Port Security - IP > > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > index 44e9430..ff16622 100644 > --- a/ovn/northd/ovn-northd.c > +++ b/ovn/northd/ovn-northd.c > @@ -26,6 +26,7 @@ >
Re: [ovs-dev] [PATCH v7 1/2] ovn-controller: Add 'put_dhcp_opts' action in ovn-controller
Tested-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> I tested this patchset to work end-to-end with openstack (with the corresponding WIP openstack patch). > +/* Check that the DHCP Message Type (opt 53) is present or not with > + * valid values - DHCP_MSG_DISCOVER or DHCP_MSG_REQUEST as the first > + * DHCP option. > + */ > +if (!(in_dhcp_opt[0] == DHCP_OPT_MSG_TYPE && in_dhcp_opt[1] == 1 && ( > +in_dhcp_opt[2] == DHCP_MSG_DISCOVER || > +in_dhcp_opt[2] == DHCP_MSG_REQUEST))) { > +/* Invalid dhcp packet. Resume the packet without further > + * processing. */ > +goto exit; > +} > + Would it make sense to log messages are not handling (nak, decline, release, inform) to help debug (rare) scenarios when VMs dont get dhcp IPs as expected. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v5 2/2] ovn: Add logical flows to support native DHCP
> The reason for not adding the flow in IN_ACL is because the CMS can add > flows to allow or drop DHCP traffic on a logical port if it wants to. In > the case of OpenStack networking-ovn, it is adding the below flows for each > logical port. > > table=4( ls_in_acl), priority= 2002, match=(ct.new && (inport == > "2636f285-6d1a-4ad9-89db-c3323349c554" && ip4)), action=(ct_commit; next;) > table=4( ls_in_acl), priority= 2002, match=(ct.new && (inport == > "2636f285-6d1a-4ad9-89db-c3323349c554" && ip6)), action=(ct_commit; next;) > table=4( ls_in_acl), priority= 2001, match=(inport == > "2636f285-6d1a-4ad9-89db-c3323349c554" && ip), action=(drop;) > > > Actually if we want we can remove the 34000 OUT_ACL flow from ovn-northd and > let CMS add it. I initially thought its good to take care of it on > ovn-northd. But now I am not sure whats the best approach. Please let me > know your comments. > Thanks, It may be fine to program the out-acl flows implicitly (by northd) as a consequence of the CMS setting enable_dhcp on the port (ie the user has accepted DHCP traffic on the port by setting enable_dhcp). ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] pinctrl: Fix "sparse" warning.
Darrell, Ben, Thanks for the fix, The warning was introduced by my patch "send GARP on localnet" In the future, I will run my changes through sparse (and possibly clang) to detect such problems prior to sharing a patch. ovn/controller/pinctrl.c:609:39: warning: incorrect type in assignment (different base types) ovn/controller/pinctrl.c:609:39:expected restricted ofp_port_t [usertype] port ovn/controller/pinctrl.c:609:39:got int [signed] ofport On Tue, May 17, 2016 at 8:12 AM, Darrell Ballwrote: > On Tue, May 17, 2016 at 7:44 AM, Ben Pfaff wrote: > >> The ofport member should be an ofp_port_t, since it represents an OpenFlow >> port number. >> >> Fixes: 9baaabfff3c7 ("ovn: Fix localnet ports deletion and recreation >> sometimes after restart.") >> > > Is this rather related to "ovn: send garp on localnet." ? > > > >> Signed-off-by: Ben Pfaff >> --- >> ovn/controller/pinctrl.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c >> index 1611bcd..bc57c40 100644 >> --- a/ovn/controller/pinctrl.c >> +++ b/ovn/controller/pinctrl.c >> @@ -514,7 +514,7 @@ struct garp_data { >> ovs_be32 ipv4; /* Ipv4 address of port. */ >> long long int announce_time; /* Next announcement in ms. */ >> int backoff; /* Backoff for the next announcement. */ >> -int ofport; /* ofport used to output this GARP. */ >> +ofp_port_t ofport; /* ofport used to output this GARP. */ >> }; >> >> /* Contains GARPs to be sent. */ >> @@ -548,7 +548,8 @@ send_garp_update(const struct sbrec_port_binding >> *binding_rec, >> if (!ld || !ld->localnet_port) { >> return; >> } >> -int ofport = simap_get(localnet_ofports, >> ld->localnet_port->logical_port); >> +ofp_port_t ofport = u16_to_ofp(simap_get(localnet_ofports, >> + >> ld->localnet_port->logical_port)); >> >> /* Update GARP if it exists. */ >> struct garp_data *garp = shash_find_data(_garp_data, >> -- >> 2.1.3 >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev >> > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v5 2/2] ovn: Add logical flows to support native DHCP
Tested-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> I tested v5 of this patchset to work end-to-end with openstack (using your openstack changes which are also under review). The options tested include dns-server and classless-static-route. A question I have is why you program these out-acl flows at prio 34000, whereas you did not need to program the corresponding in-acl flow ? table=1( ls_out_acl), priority=34000, match=(eth.src == fa:16:3e:94:07:40 && ip4.src == 10.0.2.1 && udp && udp.src == 67 && udp.dst == 68), action=(ct_commit; next;) On Mon, May 16, 2016 at 2:23 AM, Numan Siddique <nusid...@redhat.com> wrote: > OVN implements a native DHCP support which caters to the common > use case of providing an IP address to a booting instance by > providing stateless replies to DHCP requests based on statically > configured address mappings. To do this it allows a short list of > DHCP options to be configured and applied at each compute host > running ovn-controller. > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC] ovn: localnet ports deleted, recreated sometimes after restart
On graceful-restart of ovn-controller, the chassis row is inserted in the Chassis table. During this transaction, there is a window of time where an idl row-read may not return the newly created row - even though the row should exist, but the transaction is in an incomplete state. As a result, get_chassis() in binding_run() returns a null chassis record binding_run exits early, and does not create local_datapaths, and patch_run deletes localnet patch ports. In a later run, the localnet patch ports are recreated. This is reproducable consistently but not on every restart. The fix is to handle the case that the chassis record may be null in binding_run, and yet create local_datapaths. RFC because, I am not sure if the fix is correct - ie, can there be local_datapaths with no chassis record ? And if this fix is not correct, then what is the correct fix ? Would it be to not run binding_run() and patch_run() when ovnsb_idl_txn is null ? Restart logs follow with commentary: 2016-04-28T18:35:42.448Z|1|vlog|INFO|opened log file /home/ovs/ovs/tests/testsuite.dir/2035/hv/ovn-controller.log 2016-04-28T18:35:42.449Z|2|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connecting... 2016-04-28T18:35:42.449Z|3|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/db.sock: connected 2016-04-28T18:35:42.452Z|4|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connecting... 2016-04-28T18:35:42.452Z|5|reconnect|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/ovn-sb/ovn-sb.sock: connected 2016-04-28T18:35:42.454Z|6|ovsdb_idl|INFO|ovsdb_idl_txn_insert: Chassis row inserted into transaction above 2016-04-28T18:35:42.454Z|7|binding|INFO|Claiming lport localvif2 for this chassis. 2016-04-28T18:35:42.454Z|8|binding|INFO|Claiming lport localvif3 for this chassis. 2016-04-28T18:35:42.454Z|9|binding|INFO|Claiming lport localcif4 for this chassis. 2016-04-28T18:35:42.454Z|00010|binding|INFO|Claiming lport localcif5 for this chassis. 2016-04-28T18:35:42.454Z|00011|binding|INFO|Claiming lport localcif1 for this chassis. 2016-04-28T18:35:42.454Z|00012|binding|INFO|Claiming lport localvif1 for this chassis. 2016-04-28T18:35:42.454Z|00013|binding|INFO|Claiming lport localvif201 for this chassis. 2016-04-28T18:35:42.454Z|00014|binding|INFO|Claiming lport localcif3 for this chassis. 2016-04-28T18:35:42.454Z|00015|binding|INFO|Claiming lport localcif2 for this chassis. Binding run found the chassis record and has claimed the vifs 2016-04-28T18:35:42.455Z|00016|ofctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch 2016-04-28T18:35:42.455Z|00017|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting... 2016-04-28T18:35:42.455Z|00018|pinctrl|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting to switch 2016-04-28T18:35:42.456Z|00019|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connecting... 2016-04-28T18:35:42.457Z|00020|ovsdb_idl|INFO|ovsdb_idl_row_clear_new: At this point read of Chassis table returns no rows, and the transaction status is still incomplete. 2016-04-28T18:35:42.457Z|00021|binding|INFO|no chassis rec! Binding run exits early because chassis_rec was null 2016-04-28T18:35:42.459Z|00022|patch|INFO|removing port patch-br-int-to-localnet201 2016-04-28T18:35:42.459Z|00023|patch|INFO|removing port patch-br-int-to-localnet1 2016-04-28T18:35:42.459Z|00024|patch|INFO|removing port patch-localnet1-to-br-int 2016-04-28T18:35:42.459Z|00025|patch|INFO|removing port patch-localnet201-to-br-int Localnet ports are removed above, because local_datapaths dont exist 2016-04-28T18:35:42.459Z|00026|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected 2016-04-28T18:35:42.460Z|00027|rconn|INFO|unix:/home/ovs/ovs/tests/testsuite.dir/2035/hv/br-int.mgmt: connected 2016-04-28T18:35:42.460Z|00028|ovsdb_idl|INFO|ovsdb_idl_row_create: Now, the transaction is complete Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/binding.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index 32fcb85..9f50cd5 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -155,9 +155,6 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, const struct sbrec_port_binding *binding_rec; chassis_rec = get_chassis(ctx->ovnsb_idl, chassis_id); -if (!chassis_rec) { -return; -} struct shash lports = SHASH_INITIALIZER(); if (br_int) { @@ -190,10 +187,10 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, if (iface_rec && ctx->ovs_idl_txn) {
Re: [ovs-dev] [PATCH v1 2/2] ovn: Send garp on localnet
> > Is there an equivalent of GARP for IPv6? > The "Unsolicited Neighbor Advertisement" seems to be the equivalent of GARP for IPv6. http://tools.ietf.org/html/rfc2461#section-7.2.6 The problem motivating this fix was seen with IPv4 though. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2 2/2] ovn: send garp on localnet
In some usecases such as VM migration or when VMs reuse IP addresses, VMs become unreachable externally because external switches/routers on localnet have stale port-mac or arp caches. The problem resolves after some time when the caches ageout which could be minutes for port-mac bindings or hours for arp caches. To fix this, send some gratuitous ARPs when a logical port on a localnet datapath gets added. Such gratuitous arps help on a best-effort basis to update the mac-port bindings and arp-caches of external switches and routers on the localnet. Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1545897 Reported-by: Kyle Mestery <mest...@mestery.com> Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/ovn-controller.c | 2 +- ovn/controller/pinctrl.c| 244 +++- ovn/controller/pinctrl.h| 5 +- tests/ovn.at| 63 +++ 4 files changed, 311 insertions(+), 3 deletions(-) diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index f68f842..ae7d44b 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -373,7 +373,7 @@ main(int argc, char *argv[]) enum mf_field_id mff_ovn_geneve = ofctrl_run(br_int); -pinctrl_run(, , br_int); +pinctrl_run(, , br_int, chassis_id, _datapaths); struct hmap flow_table = HMAP_INITIALIZER(_table); lflow_run(, , , _datapaths, diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index cbac50e..aec2aa3 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -22,6 +22,8 @@ #include "dp-packet.h" #include "flow.h" #include "lport.h" +#include "ovn-controller.h" +#include "lib/sset.h" #include "openvswitch/ofp-actions.h" #include "openvswitch/ofp-msgs.h" #include "openvswitch/ofp-print.h" @@ -30,6 +32,7 @@ #include "ovn-controller.h" #include "ovn/lib/actions.h" #include "ovn/lib/logical-fields.h" +#include "ovn/lib/ovn-util.h" #include "poll-loop.h" #include "rconn.h" #include "socket-util.h" @@ -54,6 +57,14 @@ static void run_put_arps(struct controller_ctx *, static void wait_put_arps(struct controller_ctx *); static void flush_put_arps(void); +static void init_send_garps(void); +static void destroy_send_garps(void); +static void send_garp_wait(void); +static void send_garp_run(const struct ovsrec_bridge *, + const char *chassis_id, + const struct lport_index *lports, + struct hmap *local_datapaths); + COVERAGE_DEFINE(pinctrl_drop_put_arp); void @@ -62,6 +73,7 @@ pinctrl_init(void) swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP13_VERSION); conn_seq_no = 0; init_put_arps(); +init_send_garps(); } static ovs_be32 @@ -266,7 +278,9 @@ pinctrl_recv(const struct ofp_header *oh, enum ofptype type) void pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, -const struct ovsrec_bridge *br_int) +const struct ovsrec_bridge *br_int, +const char *chassis_id, +struct hmap *local_datapaths) { if (br_int) { char *target; @@ -307,6 +321,7 @@ pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, } run_put_arps(ctx, lports); +send_garp_run(br_int, chassis_id, lports, local_datapaths); } void @@ -315,6 +330,7 @@ pinctrl_wait(struct controller_ctx *ctx) wait_put_arps(ctx); rconn_run_wait(swconn); rconn_recv_wait(swconn); +send_garp_wait(); } void @@ -322,6 +338,7 @@ pinctrl_destroy(void) { rconn_destroy(swconn); destroy_put_arps(); +destroy_send_garps(); } /* Implementation of the "put_arp" OVN action. This action sends a packet to @@ -484,3 +501,228 @@ flush_put_arps(void) free(pa); } } + +/* + * Send gratuitous arp for vif on localnet. + * + * When a new vif on localnet is added, gratuitous arps are sent announcing + * the port's mac,ip mapping. On localnet, such announcements are needed for + * switches and routers on the broadcast segment to update their port-mac + * and arp tables. + */ +struct garp_data { +struct eth_addr ea; /* Ethernet address of port. */ +ovs_be32 ipv4; /* Ipv4 address of port. */ +long long int announce_time; /* Next announcement in ms. */ +int backoff; /* Backoff for the next announcement. */ +int ofport; /* Ofport used to output this garp. */ +}; + +/* Contains garps to be sent. */ +static struct shash send_garp_data; + +/* Next garp announcement in ms. */ +static long long int send_garp_time; + +static void +init_send_garps(void) +{ +shash_i
[ovs-dev] [PATCH v2 1/2] ovn: Move extract_lport_addresses
Move the function extract_lport_addresses to a file in ovn/lib since that function can be used by ovn-controller also to parse addresses stored in the mac column of the port_binding table. Currently that function is used only in ovn_northd. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/lib/automake.mk | 2 + ovn/lib/ovn-util.c | 104 ovn/lib/ovn-util.h | 44 ovn/northd/ovn-northd.c | 104 +--- 4 files changed, 151 insertions(+), 103 deletions(-) create mode 100644 ovn/lib/ovn-util.c create mode 100644 ovn/lib/ovn-util.h diff --git a/ovn/lib/automake.mk b/ovn/lib/automake.mk index 9e09352..2b178da 100644 --- a/ovn/lib/automake.mk +++ b/ovn/lib/automake.mk @@ -10,6 +10,8 @@ ovn_lib_libovn_la_SOURCES = \ ovn/lib/expr.h \ ovn/lib/lex.c \ ovn/lib/lex.h \ + ovn/lib/ovn-util.c \ + ovn/lib/ovn-util.h \ ovn/lib/logical-fields.h nodist_ovn_lib_libovn_la_SOURCES = \ ovn/lib/ovn-nb-idl.c \ diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c new file mode 100644 index 000..abdc247 --- /dev/null +++ b/ovn/lib/ovn-util.c @@ -0,0 +1,104 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include "ovn-util.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(ovn_util); + +/* + * Extracts the mac, ipv4 and ipv6 addresses from the input param 'address' + * which should be of the format 'MAC [IP1 IP2 ..]" where IPn should be + * a valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and + * 'ipv6_addrs' fields of input param 'laddrs'. + * The caller has to free the 'ipv4_addrs' and 'ipv6_addrs' fields. + * If input param 'store_ipv6' is true only then extracted ipv6 addresses + * are stored in 'ipv6_addrs' fields. + * Return true if at least 'MAC' is found in 'address', false otherwise. + * Eg 1. + * If 'address' = '00:00:00:00:00:01 10.0.0.4 fe80::ea2a:eaff:fe28:3390/64 + * 30.0.0.3/23' and 'store_ipv6' = true + * then returns true with laddrs->n_ipv4_addrs = 2, naddrs->n_ipv6_addrs = 1. + * + * Eg. 2 + * If 'address' = '00:00:00:00:00:01 10.0.0.4 fe80::ea2a:eaff:fe28:3390/64 + * 30.0.0.3/23' and 'store_ipv6' = false + * then returns true with laddrs->n_ipv4_addrs = 2, naddrs->n_ipv6_addrs = 0. + * + * Eg 3. If 'address' = '00:00:00:00:00:01 10.0.0.4 addr 30.0.0.4', then + * returns true with laddrs->n_ipv4_addrs = 1 and laddrs->n_ipv6_addrs = 0. + */ +bool +extract_lport_addresses(char *address, struct lport_addresses *laddrs, +bool store_ipv6) +{ +char *buf = address; +int buf_index = 0; +char *buf_end = buf + strlen(address); +if (!ovs_scan_len(buf, _index, ETH_ADDR_SCAN_FMT, + ETH_ADDR_SCAN_ARGS(laddrs->ea))) { +return false; +} + +ovs_be32 ip4; +struct in6_addr ip6; +unsigned int plen; +char *error; + +laddrs->n_ipv4_addrs = 0; +laddrs->n_ipv6_addrs = 0; +laddrs->ipv4_addrs = NULL; +laddrs->ipv6_addrs = NULL; + +/* Loop through the buffer and extract the IPv4/IPv6 addresses + * and store in the 'laddrs'. Break the loop if invalid data is found. + */ +buf += buf_index; +while (buf < buf_end) { +buf_index = 0; +error = ip_parse_cidr_len(buf, _index, , ); +if (!error) { +laddrs->n_ipv4_addrs++; +laddrs->ipv4_addrs = xrealloc( +laddrs->ipv4_addrs, +sizeof (struct ipv4_netaddr) * laddrs->n_ipv4_addrs); +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].addr = ip4; +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].plen = plen; +buf += buf_index; +continue; +} +free(error); +error = ipv6_parse_cidr_len(buf, _index, , ); +if (!error && store_ipv6) { +laddrs->n_ipv6_addrs++; +laddrs->ipv6_addrs = xrealloc( +laddrs->ipv6_addrs, +sizeof(struct ipv6_netaddr) * laddrs->n_ipv6_addrs); +memcpy(>ipv6_addrs[laddrs->n_ipv6_addrs - 1].addr, , + sizeof(struct in6_addr)); +laddrs->ipv6_addrs[laddrs->n_ipv6_addrs - 1].plen = plen; +
[ovs-dev] [PATCH v2 0/2] ovn: send garp on localnet
The first patch refactors the function extract_lport_addresses so that it can be used in ovn-northd and in ovn-controller. The second patch depends on the first patch, and implements the functionality to send gratuitous arps on localnet. Changes v1->v2: Changes per code-review: * send gratuitous arp request instead of a reply * use poll_timer_wait_until for timing * Various code changes to conform to coding style ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 RFC] ovn: Support native dhcp using 'continuations'
Tested-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> Numan, I tested this patch to work on devstack+ovn without the openstack-plugin, with manual configuration. Notes: 1) In ovn/utilities/ovn-nbctl.c, usage() Can you add a help string to ovn-nbctl for the new command lswitch-get-dhcp-options, and lswitch-set-dhcp-options 2) When server_mac was not defined, the lflow was created like this, maybe the error checks can be tighter in ovn-northd.c table=3( ls_in_dhcp), priority= 50, match=(inport == "7fe086b5-9ab0-4c14-bf62-0291a62a4b14" && eth.src == fa:16:3e:cf:3d:bc && ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && udp.src == 68 && udp.dst == 67), action=(dhcp_offer(offerip = 10.0.1.3, netmask = 255.255.255.0, router = 10.0.1.1, mtu = 1400, server_id = 10.0.1.2, dns_server = {8.8.8.8,7.7.7.7}, lease_time = 43200); eth.dst = eth.src; eth.src = (null); ip4.dst = 10.0.1.3; ip4.src = 10.0.1.2; udp.src = 67; udp.dst = 68; outport = inport; inport = ""; /* Allow sending out inport. */ output;) The ovn-controller failed to parse the above flow at eth.src 2016-04-20T23:19:49Z|00070|lflow|WARN|error parsing actions "dhcp_offer(offerip = 10.0.1.2, netmask = 255.255.255.0, router = 10.0.1.1, mtu = 1400, server_id = 10.0.1.2, dns_server = {8.8.8.8,7.7.7.7}, lease_time = 43200); eth.dst = eth.src; eth.src = (null); ip4.dst = 10.0.1.2; ip4.src = 10.0.1.2; udp.src = 67; udp.dst = 68; outport = inport; inport = ""; /* Allow sending out inport. */ output;": Syntax error at `(' expecting constant. On Mon, Apr 18, 2016 at 10:29 AM, Numan Siddique <nusid...@redhat.com> wrote: > To support native dhcp in ovn > - A new column 'dhcp-options' is added in 'Logical_Switch' north db. > - A logical flow is added for each logical port to handle dhcp packets >if the CMS has defined dhcp options in this column. > > Eg. > action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1, > server_id = 10.0.0.2, mtu = 1300, lease_time = 3600, > netmask = 255.255.255.0); eth.dst = eth.src; eth.src = 00:00:00:00:00:03; > ip4.dst = 10.0.0.2; ip4.src = 10.0.0.2; udp.src = 67; udp.dst = 68; > outport = inport; inport = ""; /* Allow sending out inport. */ output;) > > - ovn-controller will convert this logical flow to a packet-in OF flow with >'pause' flag set. The dhcp options defined in 'dhcp_offer' action >are stored in the 'userdata' > > - When the ovn-controller receives the packet-in, it would frame a new >dhcp packet with the dhcp options set in the 'userdata' and resume >the packet. > > TODO: Test cases and updating the necessary documentation. > > Signed-off-by: Numan Siddique <nusid...@redhat.com> ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access
On Mon, Apr 18, 2016 at 2:55 PM, Aaron Rosenwrote: > I like this idea as well. > > The one question I have with is is how we should determine which ip address > to select for the 'distributed' port? Aaron, Thanks for your review, and feedback. We can use the dhcp-port (and its IP address) for the distributed-port. The current native-dhcp proposal (http://openvswitch.org/pipermail/dev/2016-April/069787.html) assumes that a dhcp-server ip-address "server_id" is defined on the subnet. action=(dhcp_offer(offerip = 10.0.0.2, router = 10.0.0.1, server_id = 10.0.0.2, mtu = 1300, lease_time = 3600, For openstack, This means that a DHCP port has been created on that subnet in neutron. In the absence of a dhcp-agent, the neutron-ovn-plugin would have to auto-create the dhcp-port in neutron upon creation of a subnet, and then use that port's IP address as the "server_id" when it programs the "dhcp-options" column of the Logical_Switch table. The pros of the distributed-port approach is that a) HA is not needed, b) it runs the existing neutron-metadata-proxy/neutron-metadata-agent as-is, c) In the future, we could remove the neutron-metadata-agent also, by (nova-compute) configuring the instance-id and tenant-id as external-ids of the VM's ovs interface - hence not need to run any neutron-agents at all. The drawbacks include creation of namespaces and metadata-proxy processes on each hypervisor. > If we want to avoid creating the network name spaces and > running the http proxy on each hypervisor is if we took a similar approach > that openstack uses for handling dhcp/metadata requests. > When a subnet is created we could have the neutron-ovn-plugin notify a > metadata agent which would create a port on the given subnet for the logical > network. Then, to get instances to route its metadata traffic to this > logical port we could have ovn distribute an additional host-route via dhcp > (using option 121). Similar to what you are proposing. > > > I.e: So for example if someone created a network/subnet. > > In the ovn plugin we can signal the metadata agent to create a port on that > network. Then, for every port that is created on this network we would > distribute a hostroute of 169.254.169.254/32 via ; Then, > we'd have the metadata agent just run there which would answer these meta > data requests and route them back. > > One down side to this solution is that this metadata agent would need to be > made HA in some way. In your current solution if the metadata agent crashes > or something the failure is isolated to the hypervisor. That said, this type > of HA seems like it can be implemented in at least an active passive > solution easily enough. > > Thoughts? > Your proposal is an alternative solution - which involves changes only to the neutron components (and no changes in ovn ?). Would there be only one modified neutron-metadata-agent in an active-passive configuration serving all the VMs ? If there are multiple agents, would you need agent-scheduling to assign networks to agents ? Could you share more details of the approach ? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access
> > neutron-ovn-metadata would be something we maintain in our OpenStack plugin > repo (networking-ovn), right? > Russell, Thanks for your review and feedback, neutron-ovn-metadata would be maintained in networking-ovn repo. > I like this proposal. It suggests adding only the minimal amount of support > needed from OVN itself to enable us to get our OpenStack-specific job done. > This is much better than anything I had thought of. > > Thank you! > > -- > Russell Bryant ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access
> This is an interesting proposal. The alternative, which strikes me as a > terrible idea, would be to somehow implement a TCP/IP stack and HTTP > server inside OVS. > > Does there need to be a single metadata server per hypervisor, or one > per logical switch per hypervisor? > Ben, Thanks for your review, and feedback. The metadata-proxy-server is one per logical-switch that has at least 1 VM on the hypervisor. In the worst case, there may be one metadata-proxy-server per VM on the hypervisor if each VM belonged to a different logical switch. The metadata-proxy-server forwards the http request from the VM after setting a few http headers - one of the http-headers is the VMs IP address which is obtained from the http request, and the other http header is the network-id which comes from the configuration of the process. As such, the metadata-proxy-server is expected to be a lightweight process. > I think that a lot of the commit message (include some bits that I > snipped) should go into documentation. > > I feel like this needs to be properly evaluated by someone who > understands OpenStack and Neutron better than me. Maybe Russell? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1 RFC] ovn: Support native dhcp using 'continuations'
> +/* dhcp options */ > +shash_init(_opt_symtab); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "offerip", 0, > + DHCP_OPT_TYPE_IP4); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "netmask", 1, > + DHCP_OPT_TYPE_IP4); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "router", 3, > + DHCP_OPT_TYPE_IP4); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "dns_server", 6, > + DHCP_OPT_TYPE_IP4); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "domain_name", 15, > + DHCP_OPT_TYPE_STR); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "ip_forward_enable", 19, > + DHCP_OPT_TYPE_BOOL); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "mtu", 26, > + DHCP_OPT_TYPE_UINT16); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "lease_time", 51, > + DHCP_OPT_TYPE_UINT32); > +dhcp_opt_expr_symtab_add_field(_opt_symtab, "server_id", 54, > + DHCP_OPT_TYPE_IP4); > } Numan, DHCP static-route option (249) is used in the DHCP-offer to enable VM Metadata access. Could you consider adding support for that option - Or I can help to add that option. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1 1/2] ovn: Move extract_lport_addresses
On Mon, Apr 11, 2016 at 6:12 PM, Ben Pfaff <b...@ovn.org> wrote: > On Sun, Apr 03, 2016 at 09:49:03PM -0400, Ramu Ramamurthy wrote: >> Move the function extract_lport_addresses to a file >> in ovn/lib since that function can be used by ovn-controller also >> to parse addresses stored in the mac column of the >> port_binding table. Currently that function is used only >> in ovn_northd. >> >> Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> > > Please add the standard license notice at the top of each of the new > files. > > Function-level comments go on function definitions, not their header > file prototypes. > > The comment > +/* contains ovn utility functions */ > isn't very useful and I'd recommend deleting it. > > Thanks, > > Ben. Ben, Thanks for the review, I will update in the next version with changes noted above. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1 2/2] ovn: Send garp on localnet
On Mon, Apr 11, 2016 at 6:29 PM, Ben Pfaff <b...@ovn.org> wrote: > On Sun, Apr 03, 2016 at 09:49:04PM -0400, Ramu Ramamurthy wrote: >> In some usecases such as VM migration or when VMs reuse IP addresses, >> VMs become unreachable externally because >> external switches/routers on localnet >> have stale port-mac or arp caches. The problem resolves >> after some time when the caches ageout which could be >> minutes for port-mac bindings or hours for arp caches. >> >> To fix this, send some gratuitous ARPs when a logical >> port on a localnet datapath gets added. Such gratuitous arps >> help on a best-effort basis to update the mac-port bindings and arp-caches >> of external switches and routers on the localnet. >> >> Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1545897 >> Reported-by: Kyle Mestery <mest...@mestery.com> >> Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> > Ben, Thanks for your review, > Darrell (CCed) brought up some issues with this idea. I don't know > whether his questions have been satisfactorily answered. > In the next version, I will update as per Darrell's comments. > Is there an equivalent of GARP for IPv6? > I dont know enough IPv6 to answer this, The problem we saw which motivated this fix was in a usecase with IPv4. > The GARP code manages and acts based on timers, but I don't see anything > that calls, e.g., poll_timer_wait_until(), to ensure that ovn-controller > wakes up when it's time to send a GARP. This call would ordinarily be > made by adding another call to pinctrl_wait(). > The code assumes that pinctr_run() would be run often enough to not need to fire timers. But this is not a correct assumption, as there could be pauses between successive runs of pinctl_run(), causing successive garps to be sent at arbitrary intervals. An approach based on timers makes the fix deterministic. I will look into this further in the next version. > Please be careful about coding style. The first thing that jumps out at > me here is comment style. CodingStyle.md says: > > Comments should be written as full sentences that start with a > capital letter and end with a period. Put two spaces between > sentences. > > (Comments that aren't on lines of their own don't need to be sentences > but still should be capitalized and end with a period.) > I will update the next version with careful attention to the coding style for this and the following notes on sizeof, and expressions, > The following is unsafe if there is no such datapath, because > CONTAINER_OF does not treat a null pointer specially: > > struct local_datapath *ld = > CONTAINER_OF(hmap_first_with_hash(local_datapaths, > pb->datapath->tunnel_key), > struct local_datapath, hmap_node); > For this I will use the get_local_datapath() helper. > Please write sizeof *garp here instead of sizeof(struct garp_data): > > /* the mac address must be valid */ > struct garp_data *garp = xmalloc(sizeof(struct garp_data)); > > as requested in CodingStyle: > > The "sizeof" operator is unique among C operators in that it accepts > two very different kinds of operands: an expression or a type. In > general, prefer to specify an expression, e.g. "int *x = > xmalloc(sizeof *x);". When the operand of sizeof is an expression, > there is no need to parenthesize that operand, and please don't. > > Please put spaces around + and * as recommended by CodingStyle, e.g. here: > +garp->announce_time = time_msec()+1000; > and here: > +garp->announce_time = current_time + garp->backoff*1000; > > Thanks, > > Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1 2/2] ovn: Send garp on localnet
On Tue, Apr 12, 2016 at 9:37 AM, Darrell Ball <dlu...@gmail.com> wrote: > Ramu > > High level comment earlier that still applies: > > 1) garp reply is still being used rather than garp request >garp replies are more likely to be dropped or explicitly filtered so garp > requests are "better" > > Do you need to make changes to build_lswitch_flows() to allow the garp > requests out the localnet > ports ? Darrell, Thanks, In the next version, I will update the code to send garp requests. garp requests are responded-to by the ARP responder flows. This interferes with sending garps using the packet out destination OFPP_TABLE which injects the packet into the ingress pipeline. So a change in needed the arp-responder flows to not respond to garps. Alternatively, I can change the packet out destination to directly output garps onto the localnet ofport. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC] ovn: distributed logical port for VM metadata access
A new logical-port-type called "distributed" is introduced with this change. A distributed logical port is not bound to any chassis. An instance of the distributed port is created as an ovs internal port at all chasses where the datapath has at least 1 vif. VIFs on a chassis can communicate with the locally created distributed port on the same chassis. A distributed port cannot be reached remotely via tunnels or via localnet from other logical ports on the logical switch. Unicast traffic originated from external localnet VMs cannot reach the distributed port. The distributed logical port allows the implementation of services in a logical switch that are available at the same IP,mac address on each chassis to serve local VMs on that logical switch bound to that chassis. The motivating usecase is the Openstack dhcp-based metadata access which is described in detail below. Please suggest any feedback on a) the usecase, b) the approach suggested here to implement it, c) any alternatives. If it makes sense I can pursue this work further. Usecase - DHCP-based metadata access VMs (using cloudinit) use metadata access at the url http://169.254.169.254. (https://cloudinit.readthedocs.org/en/latest/topics/datasources.html#ec2) In Openstack DHCP-based metadata access, the DHCP server advertizes a static route to the link-local address 169.254.169.254 pointing to the DHCP port's IP address using DHCP option 249. A neutron-metadata-proxy server listens on the DHCP port which is also aliased with the link local address 169.254.169.254, and serves metadata requests from VMs. The metadata-proxy server extracts the VMs IP address from the http request, and sets the "X-Forwarded-For" and "X-Neutron-Network-Id" http headers, and forwards the request to the neutron-metadata-agent which in turn forwards the request to nova. With proposed native-dhcp implementation in OVN, there is no need to run the neutron-DHCP-agent in Openstack. Since DHCP-based metadata was implemented by the dhcp-agent, A new approach is needed to implement the metadata-service. The following is a proposed approach. 1) The DHCP logical port's type is configured to be "distributed" in the NorthBound DB. Options on the logical port are configured to indicate the neutron network id. 2) OVN controller creates internal OVS port on each chassis where VIFs on that datapath are present. External IDs on that interface denote the neutron-network id, and dhcp-port ip address, and the link-local ip address to which a static route was advertized. 3) An ovn-metadata entity monitors changes to the OVSDB interface table, and upon seeing a new interface does the following using information from the external-ids on that interface: a) creates namespace and sets the internal port in the namespace, b) spawns the neutron-metadata-proxy server listening on that port Example 1) Create a distributed port (This would be done by neutron-ovn-plugin, upon creation of the DHCP port) ovn-nbctl lport-add 369611f4-f6a5-4a97-a9a2-75e241dc0803 vif-1 ovn-nbctl lport-set-addresses vif-1 "fe:16:3e:ab:cd:01 10.0.0.100" ovn-nbctl lport-set-port-security vif-1 "fe:16:3e:ab:cd:01" ovn-nbctl lport-set-type vif-1 distributed ovn-nbctl lport-set-options vif-1 name="vif1" network="c827c93e-76f3-4fc7-8e98-1df0b97cf927" mac="fe:16:3e:ab:cd:01" ipv4="10.0.0.100/24" application="neutron-metadata" 2) Boot a VM, after VM boots, set the static route inside the VM (this would be done by native-DHCP server automatically by injecting a static route in the DHCP offer) sudo ip route add 169.254.169.254 via 10.0.0.100 3) On the hypervisor (This would be done by a new neutron-ovn-metadata entity running on each hypervisor upon monitoring ovs interface table and detecting a new vif1 interface with options indicating that the application is metadata) sudo ip netns add vif-1 sudo ip link set vif-1 netns vif-1 sudo ip netns exec vif-1 ip link set dev vif1 address fe:16:3e:ab:cd:01 sudo ip netns exec vif-1 ip addr add 10.0.0.100/24 dev vif1 sudo ip netns exec vif-1 ip addr add 169.254.169.254 dev vif1 sudo ip netns exec vif-1 ip link set dev vif1 up sudo ip netns exec vif-1 /usr/bin/python /usr/bin/neutron-ns-metadata-proxy --pid_file=/opt/stack/data/neutron/external/pids/vif-1.pid --metadata_proxy_socket=/opt/stack/data/neutron/metadata_proxy --network_id=c827c93e-76f3-4fc7-8e98-1df0b97cf927 --state_path=/opt/stack/data/neutron --metadata_port=80 --metadata_proxy_user=1001 --metadata_proxy_group=1001 --debug --verbose Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/binding.c | 73 +-- ovn/controller/physical.c | 10 +-- 2 files changed, 79 insertions(+), 4 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index d3ca9c9..abf9556 100644 --- a/ovn/controller/binding.
Re: [ovs-dev] [PATCH v2] ovn: Add software l2 gateway.
On Mon, Apr 4, 2016 at 5:58 AM, Russell Bryantwrote: > This patch implements one approach to using ovn-controller to implement > a software l2 gateway between logical and physical networks. > > A new logical port type called "gateway" is introduced here. It is very > close to how localnet ports work, with the following exception: > > - A localnet port makes OVN use the physical network as the > transport between hypervisors instead of tunnels. A gateway port still > uses tunnels between all hypervisors, and packets only go to/from the > specified physical network as needed via the chassis the gateway port > is bound to. > > - A gateway port also gets bound to a chassis while a localnet port does > not. This binding is not done by ovn-controller. It is left as an > administrative function. In the case of OpenStack, the Neutron plugin > will do this. > > Signed-off-by: Russell Bryant > --- > Can there be more than 1 gateway to the same l2 network (say for HA) ? If gateway port G1 on chassis1 and gateway port G2 on chassis2, Then, when a VM a chassis3 sends a broadcast, it will tunnel to chassis1, and chassis2, and then get sent out of G1 and G2 to the same L2 network, ie, Can that broadcast packet get sent in duplicate to the physical network ? If so, would you have to run xSTP on the gateway ports ? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1 RFC] ovn: Support native dhcp using 'continuations'
> @@ -89,10 +89,11 @@ enum ovn_stage { > PIPELINE_STAGE(SWITCH, IN, PORT_SEC_L2,0, "ls_in_port_sec_l2") \ > PIPELINE_STAGE(SWITCH, IN, PORT_SEC_IP,1, "ls_in_port_sec_ip") \ > PIPELINE_STAGE(SWITCH, IN, PORT_SEC_ND,2, "ls_in_port_sec_nd") \ > -PIPELINE_STAGE(SWITCH, IN, PRE_ACL,3, "ls_in_pre_acl") \ > -PIPELINE_STAGE(SWITCH, IN, ACL,4, "ls_in_acl") \ > -PIPELINE_STAGE(SWITCH, IN, ARP_RSP,5, "ls_in_arp_rsp") \ > -PIPELINE_STAGE(SWITCH, IN, L2_LKUP,6, "ls_in_l2_lkup") \ > +PIPELINE_STAGE(SWITCH, IN, DHCP, 3, "ls_in_dhcp") \ > +PIPELINE_STAGE(SWITCH, IN, PRE_ACL,4, "ls_in_pre_acl") \ > +PIPELINE_STAGE(SWITCH, IN, ACL,5, "ls_in_acl") \ > +PIPELINE_STAGE(SWITCH, IN, ARP_RSP,6, "ls_in_arp_rsp") \ > +PIPELINE_STAGE(SWITCH, IN, L2_LKUP,7, "ls_in_l2_lkup") \ >\ Would it make sense to put DHCP after ACL instead of before - so, some control is provided on the handling of DHCP packets via acl rules. For instance, OpenStack programs these DHCP ACL rules currently allowing client->server communication - and the usecase would be if i want to drop all client->server DHCP traffic for a rogue VM. table=2( ls_in_acl), priority= 2002, match=(inport == "55c0912f-f7aa-4318-82f1-6118032839e3" && ip4 && (ip4.dst == 255.255.255.255 || ip4.dst == 10.10.0.0/16) && udp && udp.src == 68 && udp.dst == 67), action=(ct_commit; next;) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v1 2/2] ovn: Send garp on localnet
In some usecases such as VM migration or when VMs reuse IP addresses, VMs become unreachable externally because external switches/routers on localnet have stale port-mac or arp caches. The problem resolves after some time when the caches ageout which could be minutes for port-mac bindings or hours for arp caches. To fix this, send some gratuitous ARPs when a logical port on a localnet datapath gets added. Such gratuitous arps help on a best-effort basis to update the mac-port bindings and arp-caches of external switches and routers on the localnet. Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1545897 Reported-by: Kyle Mestery <mest...@mestery.com> Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/ovn-controller.c | 2 +- ovn/controller/pinctrl.c| 217 +++- ovn/controller/pinctrl.h| 5 +- tests/ovn.at| 61 +++ 4 files changed, 282 insertions(+), 3 deletions(-) diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 6027011..3f0aea4 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -335,7 +335,7 @@ main(int argc, char *argv[]) enum mf_field_id mff_ovn_geneve = ofctrl_run(br_int); -pinctrl_run(, , br_int); +pinctrl_run(, , br_int, chassis_id, _datapaths); struct hmap flow_table = HMAP_INITIALIZER(_table); lflow_run(, , , _datapaths, diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index 3fcab99..57be002 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -24,6 +24,7 @@ #include "ofp-actions.h" #include "ovn/lib/actions.h" #include "ovn/lib/logical-fields.h" +#include "ovn/lib/ovn-util.h" #include "ofp-msgs.h" #include "ofp-print.h" #include "ofp-util.h" @@ -54,6 +55,13 @@ static void run_put_arps(struct controller_ctx *, static void wait_put_arps(struct controller_ctx *); static void flush_put_arps(void); +static void init_send_garps(void); +static void destroy_send_garps(void); +static void send_garp_run(const struct lport_index *lports, + const struct ovsrec_bridge *br_int, + const char *chassis_id, + struct hmap *local_datapaths); + COVERAGE_DEFINE(pinctrl_drop_put_arp); void @@ -62,6 +70,7 @@ pinctrl_init(void) swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP13_VERSION); conn_seq_no = 0; init_put_arps(); +init_send_garps(); } static ovs_be32 @@ -266,7 +275,9 @@ pinctrl_recv(const struct ofp_header *oh, enum ofptype type) void pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, -const struct ovsrec_bridge *br_int) +const struct ovsrec_bridge *br_int, +const char *chassis_id, +struct hmap *local_datapaths) { if (br_int) { char *target; @@ -307,6 +318,9 @@ pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, } run_put_arps(ctx, lports); +if (rconn_is_connected(swconn)) { +send_garp_run(lports, br_int, chassis_id, local_datapaths); +} } void @@ -322,6 +336,7 @@ pinctrl_destroy(void) { rconn_destroy(swconn); destroy_put_arps(); +destroy_send_garps(); } /* Implementation of the "put_arp" OVN action. This action sends a packet to @@ -484,3 +499,203 @@ flush_put_arps(void) free(pa); } } + +/* + * Send gratuitous arp for vifs on localnets + * + * When a new vif on localnet is added, gratuitous arps + * are sent announcing the port's mac,ip mapping. + * On localnet, such announcements are needed for + * switches and routers on the broadcast segment to + * update their port-mac and arp tables. + */ +struct garp_data { +int ofport; /* ofport used for this garp */ +struct eth_addr ea; /* ethernet address of port */ +ovs_be32 ipv4; /* ipv4 address of port */ +long long int announce_time; /* next announcement in ms */ +int backoff; /* backoff for the next announcement */ +}; + +/* Contains garps to be sent */ +static struct shash send_garp_data; + +static void +init_send_garps(void) +{ +shash_init(_garp_data); +} + +static void +destroy_send_garps(void) +{ +shash_destroy(_garp_data); +} + +/* Add a new vif to announce garps */ +static void +send_garp_add(const struct sbrec_port_binding *binding_rec, + int ofport) +{ +int i; +for (i = 0; i < binding_rec->n_mac; i++) { +struct lport_addresses laddrs; +if (extract_lport_addresses(binding_rec->mac[i], , +false)) { +/* the mac address must be valid */ +struct garp_data *garp = xmalloc(sizeof(struct garp_data)); +garp->ofport
[ovs-dev] [PATCH v1 1/2] ovn: Move extract_lport_addresses
Move the function extract_lport_addresses to a file in ovn/lib since that function can be used by ovn-controller also to parse addresses stored in the mac column of the port_binding table. Currently that function is used only in ovn_northd. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/lib/automake.mk | 4 +- ovn/lib/ovn-util.c | 71 + ovn/lib/ovn-util.h | 53 ovn/northd/ovn-northd.c | 104 +--- 4 files changed, 128 insertions(+), 104 deletions(-) create mode 100644 ovn/lib/ovn-util.c create mode 100644 ovn/lib/ovn-util.h diff --git a/ovn/lib/automake.mk b/ovn/lib/automake.mk index 9e09352..32e7b87 100644 --- a/ovn/lib/automake.mk +++ b/ovn/lib/automake.mk @@ -10,7 +10,9 @@ ovn_lib_libovn_la_SOURCES = \ ovn/lib/expr.h \ ovn/lib/lex.c \ ovn/lib/lex.h \ - ovn/lib/logical-fields.h + ovn/lib/logical-fields.h \ + ovn/lib/ovn-util.c \ + ovn/lib/ovn-util.h nodist_ovn_lib_libovn_la_SOURCES = \ ovn/lib/ovn-nb-idl.c \ ovn/lib/ovn-nb-idl.h \ diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c new file mode 100644 index 000..e6297ed --- /dev/null +++ b/ovn/lib/ovn-util.c @@ -0,0 +1,71 @@ + +#include +#include "ovn-util.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(ovn_util); + +/* contains ovn utility functions */ + +bool +extract_lport_addresses(char *address, struct lport_addresses *laddrs, +bool store_ipv6) +{ +char *buf = address; +int buf_index = 0; +char *buf_end = buf + strlen(address); +if (!ovs_scan_len(buf, _index, ETH_ADDR_SCAN_FMT, + ETH_ADDR_SCAN_ARGS(laddrs->ea))) { +return false; +} + +ovs_be32 ip4; +struct in6_addr ip6; +unsigned int plen; +char *error; + +laddrs->n_ipv4_addrs = 0; +laddrs->n_ipv6_addrs = 0; +laddrs->ipv4_addrs = NULL; +laddrs->ipv6_addrs = NULL; + +/* Loop through the buffer and extract the IPv4/IPv6 addresses + * and store in the 'laddrs'. Break the loop if invalid data is found. + */ +buf += buf_index; +while (buf < buf_end) { +buf_index = 0; +error = ip_parse_cidr_len(buf, _index, , ); +if (!error) { +laddrs->n_ipv4_addrs++; +laddrs->ipv4_addrs = xrealloc( +laddrs->ipv4_addrs, +sizeof (struct ipv4_netaddr) * laddrs->n_ipv4_addrs); +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].addr = ip4; +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].plen = plen; +buf += buf_index; +continue; +} +free(error); +error = ipv6_parse_cidr_len(buf, _index, , ); +if (!error && store_ipv6) { +laddrs->n_ipv6_addrs++; +laddrs->ipv6_addrs = xrealloc( +laddrs->ipv6_addrs, +sizeof(struct ipv6_netaddr) * laddrs->n_ipv6_addrs); +memcpy(>ipv6_addrs[laddrs->n_ipv6_addrs - 1].addr, , + sizeof(struct in6_addr)); +laddrs->ipv6_addrs[laddrs->n_ipv6_addrs - 1].plen = plen; +} + +if (error) { +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); +VLOG_INFO_RL(, "invalid syntax '%s' in address", address); +free(error); +break; +} +buf += buf_index; +} + +return true; +} diff --git a/ovn/lib/ovn-util.h b/ovn/lib/ovn-util.h new file mode 100644 index 000..7852568 --- /dev/null +++ b/ovn/lib/ovn-util.h @@ -0,0 +1,53 @@ + + +#ifndef OVN_UTIL_H +#define OVN_UTIL_H 1 + +#include "lib/packets.h" + +struct ipv4_netaddr { +ovs_be32 addr; +unsigned int plen; +}; + +struct ipv6_netaddr { +struct in6_addr addr; +unsigned int plen; +}; + +struct lport_addresses { +struct eth_addr ea; +size_t n_ipv4_addrs; +struct ipv4_netaddr *ipv4_addrs; +size_t n_ipv6_addrs; +struct ipv6_netaddr *ipv6_addrs; +}; + +/* + * Extracts the mac, ipv4 and ipv6 addresses from the input param 'address' + * which should be of the format 'MAC [IP1 IP2 ..]" where IPn should be + * a valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and + * 'ipv6_addrs' fields of input param 'laddrs'. + * The caller has to free the 'ipv4_addrs' and 'ipv6_addrs' fields. + * If input param 'store_ipv6' is true only then extracted ipv6 addresses + * are stored in 'ipv6_addrs' fields. + * Return true if at least 'MAC' is found in 'address', false otherwise. + * Eg 1. + * If 'address' = '00:00:00:00:00:01 10.0.0.4 fe80::ea2a:eaff:fe28:3390/64 + * 30.0.0.3/23' and 'store_ipv6' = true + * then returns true with laddrs->n_ipv4_addrs = 2, naddrs->n_ipv6_addrs = 1. + * + * Eg. 2 +
Re: [ovs-dev] [PATCH RFC 0/3] - send garp on localnet
> > On a separate but related topic, northd programs localnet to not respond to > regular > arp requests coming from the network via a localnet port. > > How can IP traffic be initiated from a VM outside OVN control or from the > physical network to a VM inside OVN controlled HV, connected via a localnet > port ? > Traffic initiated from VM1 outside OVN to VM2 inside OVN would flow as follows: VM1 would first arp for VM2's IP. This arp is a broadcast, and would enter br-int via the localnet patch on ALL hypervisors connected to that localnet. br-int "floods" that arp to all locally attached VIFs on that localnet datapath. VM2 happens to be one of those VIFs. VM2 receives that arp, and replies back. As a result of the above interaction, external switches would learn (port->source mac) bindings and the external router would learn the arp mappings (IP->mac) The problem is that port->source mac binding is cached for 5 minutes at switches, and the arp binding is cached for 4 hours by some routers !! During that period, consider the following 2 scenarios: 1) VM2 moved between hypervisors due to migration and stayed silent. VM2 cannot be accessed externally due to stale port->mac mappings for several minutes. Eventually, the mac-learning entry ages out and the switch treats VM2's mac as unknown and floods it as unknown unicast. OR 2) VM2 got deleted, and VM3 is booted within OVN, and VM3 has the same IP as VM2 BUT a different mac, AND VM3 stayed silent. VM3 cannot be accessed externally due to stale arp mapping at a router for several hours !. Eventually, the router would ageout its arp entry, and then, the router would ARP for VM3's IP, and traffic to VM3 is successfully established. So, the problem addressed by this code change ONLY handles the corner cases of stale mac-learning entries at switches or stale arp entries at routers. > Are you depending on flooding the network with gARPs periodically (to handle > new external hosts) for all VMs on all HVs inside OVN, attached to localnet > ? No, the gARP is sent only a few times when a logical port on localnet comes up to handle the special corner case (of stale mac-learning or arp caches) described above. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC 2/3] send garp on localnet
On Thu, Mar 31, 2016 at 12:33 PM, Numan Siddiquewrote: >> >> +/* Add a new vif to announce garps */ >> +static void >> +send_garp_add(const struct sbrec_port_binding *binding_rec, >> + int ofport) >> +{ >> +int i; >> +for (i = 0; i < binding_rec->n_mac; i++) { >> +struct lport_addresses laddrs; >> +if (extract_lport_addresses(binding_rec->mac[i], , >> +false)) { >> +/* the mac address must be valid */ >> +struct garp_data *garp = xmalloc(sizeof(struct garp_data)); >> +garp->ofport = ofport; >> +garp->ea = laddrs.ea; >> +garp->ipv4 = 0; >> +if (laddrs.n_ipv4_addrs) { >> +garp->ipv4 = laddrs.ipv4_addrs[0].addr; >> > > You have to free laddrs.ipv4_addrs. > > Also I think you can merge patchset 2 and 3 as patch 3 tests patch 2 > Thanks !, I missed the free of laddrs.ipv4_addr, In the next version, I will combine the test with the code in one commit. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC 0/3] - send garp on localnet
> > Why did you choose to use reply packets ? > > request packets are less likely to be rejected by external networks > and I think thats discussed in the RFC > > I wanted to use the GARP request, but found a problem. OVN currently programs flows to respond to gratuitous arp requests from VMs, Such flows also respond to ARP probes from VMs. This behavior needs to change so that OVN must not respond to GARP requests or ARP probes. It must flood them. This change is a work in progress. In fact, some DHCP clients send ARP probes after the DHCP ACK to verify no address collisions, and upon seeing ARP probe response will decline dhcp offered IP. As soon as the above problem is fixed, I can update the code to generate GARP requests. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC 3/3] garp generate test
add a test to validate garp generation on localnet Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- tests/ovn.at | 61 1 file changed, 61 insertions(+) diff --git a/tests/ovn.at b/tests/ovn.at index f2ceba3..715d1d8 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -1986,3 +1986,64 @@ OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) OVS_APP_EXIT_AND_WAIT([ovsdb-server]) AT_CLEANUP + +AT_SETUP([ovn -- gratuitous arp generate]) +AT_KEYWORDS([ovn]) + +ovn_start +ovn-nbctl lswitch-add lsw0 +net_add n1 +sim_add hv +as hv +ovs-vsctl \ +-- add-br br-phys \ +-- add-br br-eth0 + +ovn_attach n1 br-phys 192.168.0.1 + +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0]) +AT_CHECK([ovs-vsctl add-port br-eth0 snoopvif -- set Interface snoopvif options:tx_pcap=hv/snoopvif-tx.pcap options:rxq_pcap=hv/snoopvif-rx.pcap]) + +# create a vif +AT_CHECK([ovn-nbctl lport-add lsw0 localvif1]) +AT_CHECK([ovn-nbctl lport-set-addresses localvif1 "f0:00:00:00:00:01 192.168.1.2"]) +AT_CHECK([ovn-nbctl lport-set-port-security localvif1 "f0:00:00:00:00:01"]) + +# create a localnet port +AT_CHECK([ovn-nbctl lport-add lsw0 ln_port]) +AT_CHECK([ovn-nbctl lport-set-addresses ln_port unknown]) +AT_CHECK([ovn-nbctl lport-set-type ln_port localnet]) +AT_CHECK([ovn-nbctl lport-set-options ln_port network_name=physnet1]) + +AT_CHECK([ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 external_ids:iface-id=localvif1]) + +#wait for packet to be received +OVS_WAIT_UNTIL([test `wc -c < "hv/snoopvif-tx.pcap"` -ge 50]) +trim_zeros() { +sed 's/\(00\)\{1,\}$//' +} +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv/snoopvif-tx.pcap | trim_zeros > packets +expected="f00108060001080006040002f001c0a80102c0a80102" +echo $expected > expout +AT_CHECK([sort packets], [0], [expout]) +cat packets + +as hv +OVS_APP_EXIT_AND_WAIT([ovn-controller]) +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as main +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +AT_CLEANUP -- 2.3.9 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC 2/3] send garp on localnet
Generate gratuitous ARP replies when a logical port on a localnet gets added. Such gratuitous arps help to update the mac-port bindings and arp-caches of external switches and routers on the localnet. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/ovn-controller.c | 2 +- ovn/controller/pinctrl.c| 216 +++- ovn/controller/pinctrl.h| 5 +- 3 files changed, 220 insertions(+), 3 deletions(-) diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 6027011..3f0aea4 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -335,7 +335,7 @@ main(int argc, char *argv[]) enum mf_field_id mff_ovn_geneve = ofctrl_run(br_int); -pinctrl_run(, , br_int); +pinctrl_run(, , br_int, chassis_id, _datapaths); struct hmap flow_table = HMAP_INITIALIZER(_table); lflow_run(, , , _datapaths, diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index 3fcab99..ceafacc 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -24,6 +24,7 @@ #include "ofp-actions.h" #include "ovn/lib/actions.h" #include "ovn/lib/logical-fields.h" +#include "ovn/lib/ovn-util.h" #include "ofp-msgs.h" #include "ofp-print.h" #include "ofp-util.h" @@ -54,6 +55,13 @@ static void run_put_arps(struct controller_ctx *, static void wait_put_arps(struct controller_ctx *); static void flush_put_arps(void); +static void init_send_garps(void); +static void destroy_send_garps(void); +static void send_garp_run(const struct lport_index *lports, + const struct ovsrec_bridge *br_int, + const char *chassis_id, + struct hmap *local_datapaths); + COVERAGE_DEFINE(pinctrl_drop_put_arp); void @@ -62,6 +70,7 @@ pinctrl_init(void) swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP13_VERSION); conn_seq_no = 0; init_put_arps(); +init_send_garps(); } static ovs_be32 @@ -266,7 +275,9 @@ pinctrl_recv(const struct ofp_header *oh, enum ofptype type) void pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, -const struct ovsrec_bridge *br_int) +const struct ovsrec_bridge *br_int, +const char *this_chassis_id, +struct hmap *local_datapaths) { if (br_int) { char *target; @@ -307,6 +318,9 @@ pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, } run_put_arps(ctx, lports); +if (rconn_is_connected(swconn)) { +send_garp_run(lports, br_int, this_chassis_id, local_datapaths); +} } void @@ -322,6 +336,7 @@ pinctrl_destroy(void) { rconn_destroy(swconn); destroy_put_arps(); +destroy_send_garps(); } /* Implementation of the "put_arp" OVN action. This action sends a packet to @@ -484,3 +499,202 @@ flush_put_arps(void) free(pa); } } + +/* + * Send gratuitous arp for vifs on localnets + * + * When a new vif on localnet is added, gratuitous arps + * are sent announcing the port's mac,ip mapping. + * On localnet, such announcements are needed for + * switches and routers on the broadcast segment to + * update their port-mac and arp tables. + */ +struct garp_data { +int ofport; /* ofport used for this garp */ +struct eth_addr ea; /* ethernet address of port */ +ovs_be32 ipv4; /* ipv4 address of port */ +long long int announce_time; /* next announcement in ms */ +int backoff; /* backoff for the next announcement */ +}; + +/* Contains garps to be sent */ +static struct shash send_garp_data; + +static void +init_send_garps(void) +{ +shash_init(_garp_data); +} + +static void +destroy_send_garps(void) +{ +shash_destroy(_garp_data); +} + +/* Add a new vif to announce garps */ +static void +send_garp_add(const struct sbrec_port_binding *binding_rec, + int ofport) +{ +int i; +for (i = 0; i < binding_rec->n_mac; i++) { +struct lport_addresses laddrs; +if (extract_lport_addresses(binding_rec->mac[i], , +false)) { +/* the mac address must be valid */ +struct garp_data *garp = xmalloc(sizeof(struct garp_data)); +garp->ofport = ofport; +garp->ea = laddrs.ea; +garp->ipv4 = 0; +if (laddrs.n_ipv4_addrs) { +garp->ipv4 = laddrs.ipv4_addrs[0].addr; +} +garp->backoff = 1; +garp->announce_time = time_msec()+1000; +shash_add(_garp_data, binding_rec->logical_port, garp); +break; +} +} +} + +/* Remove a port from garp announce */ +static void +send_garp_delete(const char *lp) +{ +struct garp_data *garp =
[ovs-dev] [PATCH RFC 1/3] refactor extract_lport_addresses
Move the function extract_lport_addresses under ovn/lib since that function can be used by ovn-controller also to parse addresses stored in the mac column of the port_binding table. Currently that funtion is used only in ovn_northd. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/lib/automake.mk | 4 +- ovn/lib/ovn-util.c | 71 + ovn/lib/ovn-util.h | 53 ovn/northd/ovn-northd.c | 104 +--- 4 files changed, 128 insertions(+), 104 deletions(-) create mode 100644 ovn/lib/ovn-util.c create mode 100644 ovn/lib/ovn-util.h diff --git a/ovn/lib/automake.mk b/ovn/lib/automake.mk index 9e09352..32e7b87 100644 --- a/ovn/lib/automake.mk +++ b/ovn/lib/automake.mk @@ -10,7 +10,9 @@ ovn_lib_libovn_la_SOURCES = \ ovn/lib/expr.h \ ovn/lib/lex.c \ ovn/lib/lex.h \ - ovn/lib/logical-fields.h + ovn/lib/logical-fields.h \ + ovn/lib/ovn-util.c \ + ovn/lib/ovn-util.h nodist_ovn_lib_libovn_la_SOURCES = \ ovn/lib/ovn-nb-idl.c \ ovn/lib/ovn-nb-idl.h \ diff --git a/ovn/lib/ovn-util.c b/ovn/lib/ovn-util.c new file mode 100644 index 000..1c25efb --- /dev/null +++ b/ovn/lib/ovn-util.c @@ -0,0 +1,71 @@ + +#include +#include "lib/packets.h" +#include "ovn-util.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(ovn_util); + + +bool +extract_lport_addresses(char *address, struct lport_addresses *laddrs, +bool store_ipv6) +{ +char *buf = address; +int buf_index = 0; +char *buf_end = buf + strlen(address); +if (!ovs_scan_len(buf, _index, ETH_ADDR_SCAN_FMT, + ETH_ADDR_SCAN_ARGS(laddrs->ea))) { +return false; +} + +ovs_be32 ip4; +struct in6_addr ip6; +unsigned int plen; +char *error; + +laddrs->n_ipv4_addrs = 0; +laddrs->n_ipv6_addrs = 0; +laddrs->ipv4_addrs = NULL; +laddrs->ipv6_addrs = NULL; + +/* Loop through the buffer and extract the IPv4/IPv6 addresses + * and store in the 'laddrs'. Break the loop if invalid data is found. + */ +buf += buf_index; +while (buf < buf_end) { +buf_index = 0; +error = ip_parse_cidr_len(buf, _index, , ); +if (!error) { +laddrs->n_ipv4_addrs++; +laddrs->ipv4_addrs = xrealloc( +laddrs->ipv4_addrs, +sizeof (struct ipv4_netaddr) * laddrs->n_ipv4_addrs); +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].addr = ip4; +laddrs->ipv4_addrs[laddrs->n_ipv4_addrs - 1].plen = plen; +buf += buf_index; +continue; +} +free(error); +error = ipv6_parse_cidr_len(buf, _index, , ); +if (!error && store_ipv6) { +laddrs->n_ipv6_addrs++; +laddrs->ipv6_addrs = xrealloc( +laddrs->ipv6_addrs, +sizeof(struct ipv6_netaddr) * laddrs->n_ipv6_addrs); +memcpy(>ipv6_addrs[laddrs->n_ipv6_addrs - 1].addr, , + sizeof(struct in6_addr)); +laddrs->ipv6_addrs[laddrs->n_ipv6_addrs - 1].plen = plen; +} + +if (error) { +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); +VLOG_INFO_RL(, "invalid syntax '%s' in address", address); +free(error); +break; +} +buf += buf_index; +} + +return true; +} diff --git a/ovn/lib/ovn-util.h b/ovn/lib/ovn-util.h new file mode 100644 index 000..863434a --- /dev/null +++ b/ovn/lib/ovn-util.h @@ -0,0 +1,53 @@ + + +#ifndef OVN_UTIL_H +#define OVN_UTIL_H 1 + + + +struct ipv4_netaddr { +ovs_be32 addr; +unsigned int plen; +}; + +struct ipv6_netaddr { +struct in6_addr addr; +unsigned int plen; +}; + +struct lport_addresses { +struct eth_addr ea; +size_t n_ipv4_addrs; +struct ipv4_netaddr *ipv4_addrs; +size_t n_ipv6_addrs; +struct ipv6_netaddr *ipv6_addrs; +}; + +/* + * Extracts the mac, ipv4 and ipv6 addresses from the input param 'address' + * which should be of the format 'MAC [IP1 IP2 ..]" where IPn should be + * a valid IPv4 or IPv6 address and stores them in the 'ipv4_addrs' and + * 'ipv6_addrs' fields of input param 'laddrs'. + * The caller has to free the 'ipv4_addrs' and 'ipv6_addrs' fields. + * If input param 'store_ipv6' is true only then extracted ipv6 addresses + * are stored in 'ipv6_addrs' fields. + * Return true if at least 'MAC' is found in 'address', false otherwise. + * Eg 1. + * If 'address' = '00:00:00:00:00:01 10.0.0.4 fe80::ea2a:eaff:fe28:3390/64 + * 30.0.0.3/23' and 'store_ipv6' = true + * then returns true with laddrs->n_ipv4_addrs = 2, naddrs->n_ipv6_addrs = 1. + * + * Eg. 2 + * If 'address' = '00:00:00:00:00:01 10.0.0.4 fe80::e
[ovs-dev] [PATCH RFC 0/3] - send garp on localnet
Problem: When a VM comes up with interface on localnet, external switches on the broadcast segment have to update their port->mac mappings and routers have to update their arp caches. This normally happens when the VM initiates traffic. on the localnet interface. But in some scenarios, when the VM remains silent, the VM is unreachable externally. This can happen when the VM is migrated or when the VM is reusing an earlier used IP but with a different mac. The fix is to for OVN to initiate gratuitous ARP which updates external routers/switches on localnet. This problem has been reported at: https://bugs.launchpad.net/networking-ovn/+bug/1545897 What to generate: When ipv4 and mac are defined on the port, we can generate a gratuitous ARP request or a reply. ARP requests seem to be preferred over ARP replies as suggested here: https://tools.ietf.org/html/rfc5227#section-3 In this change, a broadcast garp reply is generated. How many to generate: 3 to 5 arps spaced over 1-2 minutes seems to be reasonable to allow for losses. For comparison qemu generates 5 rarps when a VM port comes up. In this change, the garp packet is input to the port, and goes through the pipelines. An alternative is to output directly onto the localnet patch port. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-controller: add restart test
On Wed, Mar 23, 2016 at 4:23 PM, Ben Pfaff <b...@ovn.org> wrote: > On Tue, Mar 22, 2016 at 04:37:16PM -0700, Ramu Ramamurthy wrote: >> > This adds a test for this invariant for a specific test case, which is >> > useful. However, I believe that this should be a universal invariant >> > for ovn-controller, so that at any time if ovn-controller is restarted >> > gracefully after it has converged, it should preserve these flows. Is >> > that true? If so, then it would be more useful to add such a check to >> > *every* test case for OVN. >> > >> > What do you think? >> > >> > Thanks, >> > >> > Ben. >> >> I am finding that, after a graceful exit of ovn-controller >> using "ovs-appctl -t ovn-controller exit", and then a restart of the >> ovn-controller, >> the flows before restart and after restart dont match on table 0 in 1 out of >> 2 runs (or 1 out of 3 runs). >> >> This is because, localnet ports in the test >> are being deleted and recreated in such failing runs - and therefore, their >> ofports are mismatching in the flow-table (before and after restart). >> >> Localnet ports are being deleted and recreated in the failing runs >> because, ovn-controller is cleaning up the chassis row during graceful >> exit, and upon startup there still seems >> to be a race between patch_run and binding_run when the chassis does >> not exist, I am still >> chasing this down. > > Should that behavior of ovn-controller on graceful restart be considered > a bug, though? If so, then we should probably fix it and add a test (or > modify multiple tests, as I suggested above) to make sure that it does > not recur. > Indeed, its a bug in code, and I will update with patch(es) as needed to fix such restart bugs, along with test modifications. >> Thats why the test as currently coded is killing the ovn-controller >> ungracefully (kill -9) and restarting >> it. As such the restart test and expected invariance of flows seems useful. > > I guess that kill -9 and restart can be a useful test too. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-controller: add restart test
> This adds a test for this invariant for a specific test case, which is > useful. However, I believe that this should be a universal invariant > for ovn-controller, so that at any time if ovn-controller is restarted > gracefully after it has converged, it should preserve these flows. Is > that true? If so, then it would be more useful to add such a check to > *every* test case for OVN. > > What do you think? > > Thanks, > > Ben. I am finding that, after a graceful exit of ovn-controller using "ovs-appctl -t ovn-controller exit", and then a restart of the ovn-controller, the flows before restart and after restart dont match on table 0 in 1 out of 2 runs (or 1 out of 3 runs). This is because, localnet ports in the test are being deleted and recreated in such failing runs - and therefore, their ofports are mismatching in the flow-table (before and after restart). Localnet ports are being deleted and recreated in the failing runs because, ovn-controller is cleaning up the chassis row during graceful exit, and upon startup there still seems to be a race between patch_run and binding_run when the chassis does not exist, I am still chasing this down. Thats why the test as currently coded is killing the ovn-controller ungracefully (kill -9) and restarting it. As such the restart test and expected invariance of flows seems useful. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v6 3/3] ovn: Add doc for zone-id external-id on iface record
Update documentation for the new external-id in the interface record of ovsdb. The key consists if the string "ovn-zone-id" concatenated with the logical port id, and the value is the zone-id. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/ovn-controller.8.xml | 15 +++ 1 file changed, 15 insertions(+) diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-controller.8.xml index b261af9..8f60ba0 100644 --- a/ovn/controller/ovn-controller.8.xml +++ b/ovn/controller/ovn-controller.8.xml @@ -194,6 +194,21 @@ logical patch port that it implements. + + +external-ids:ovn-zone-id-logicalportid in the Interface table + + + + + This key is set by the ovn-controller to identify the + conntrack zone-id used for a OVN logical port. The key consists of + the string ovn-zone-id- concatenated with the logical + port-id. Its value contains the zone-id used for the logical port. + The interface for the parent logical port contains the zone-id keys for + child logical ports. + + Runtime Management Commands -- 2.3.9 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v6 2/3] ovn: Add a restart test
Add a test to validate that a restart of the ovn-controller will program zoneids consistently on ports. Flows before restart are compared against flows after restart to detect problems with ofports or zone-ids. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- tests/ovn-controller.at | 131 1 file changed, 131 insertions(+) diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index 971ea1e..7d17cd8 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -128,3 +128,134 @@ as ovn-sb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) AT_CLEANUP + +AT_SETUP([ovn-controller - zone-id assignment]) +AT_KEYWORDS([ovn]) +# +# The purpose of this test is to validate the zone-id +# assigned to ports and programmed in the flow table. +# It verifies that the zone-id is assigned consistently +# after ovn-controller restart +# + +# create 2 localnet ports +# create 3 vifs +# create 5 cifs + +ovn_init_db ovn-sb +net_add n1 +sim_add hv +as hv +ovs-vsctl \ +-- add-br br-phys \ +-- add-br br-eth0 \ +-- add-br br-eth1 \ +-- add-br br-eth2 + +ovn_attach n1 br-phys 192.168.0.1 +ovs-appctl -t ovn-controller vlog/set any:any:dbg + +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1,physnet3:br-eth2]) + +# Create a localnet port +ovn-sbctl \ +-- --id=@dp101 create Datapath_Binding tunnel_key=101 \ +-- create Port_Binding datapath=@dp101 logical_port=localnet1 tunnel_key=1 \ +type=localnet options:network_name=physnet1 \ +-- create Port_Binding datapath=@dp101 logical_port=localvif1 tunnel_key=2 +ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 external_ids:iface-id=localvif1 + +# Create a vif port +ovn-sbctl \ +-- --id=@dp102 create Datapath_Binding tunnel_key=102 \ +-- create Port_Binding datapath=@dp102 logical_port=localvif2 tunnel_key=3 +ovs-vsctl add-port br-int localvif2 -- set Interface localvif2 external_ids:iface-id=localvif2 + + +# Create a vif port and some cifs on it +ovn-sbctl \ +-- --id=@dp103 create Datapath_Binding tunnel_key=103 \ +-- create Port_Binding datapath=@dp103 logical_port=localvif3 tunnel_key=4 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif1 tunnel_key=5 parent_port=localvif3 tag=1 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif2 tunnel_key=6 parent_port=localvif3 tag=2 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif3 tunnel_key=7 parent_port=localvif3 tag=3 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif4 tunnel_key=8 parent_port=localvif3 tag=4 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif5 tunnel_key=9 parent_port=localvif3 tag=5 +ovs-vsctl add-port br-int localvif3 -- set Interface localvif3 external_ids:iface-id=localvif3 + +# Create another localnet port +ovn-sbctl \ +-- --id=@dp201 create Datapath_Binding tunnel_key=201 \ +-- create Port_Binding datapath=@dp201 logical_port=localnet201 tunnel_key=1 \ +type=localnet options:network_name=physnet2 \ +-- create Port_Binding datapath=@dp201 logical_port=localvif201 tunnel_key=2 +ovs-vsctl add-port br-int localvif201 -- set Interface localvif201 external_ids:iface-id=localvif201 + +# wait until all expected flows are programmed in table 0 +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13]) + +# dump the zoneids +ovs-vsctl get interface patch-br-int-to-localnet1 external-ids +ovs-vsctl get interface patch-br-int-to-localnet201 external-ids +ovs-vsctl get interface localvif1 external-ids +ovs-vsctl get interface localvif2 external-ids +ovs-vsctl get interface localvif3 external-ids + +# verify zone-id tags are set on the interface +OVS_WAIT_UNTIL([ovs-vsctl get interface patch-br-int-to-localnet1 external-ids:ovn-zone-id-localnet1]) +OVS_WAIT_UNTIL([ovs-vsctl get interface patch-br-int-to-localnet201 external-ids:ovn-zone-id-localnet201]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif1 external-ids:ovn-zone-id-localvif1]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif2 external-ids:ovn-zone-id-localvif2]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif3 external-ids:ovn-zone-id-localvif3]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif3 external-ids:ovn-zone-id-localcif1]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif3 external-ids:ovn-zone-id-localcif2]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif3 external-ids:ovn-zone-id-localcif3]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif3 external-ids:ovn-zone-id-localcif4]) +OVS_WAIT_UNTIL([ovs-vsctl get interface localvif3 external-ids:ovn-zone-id-localcif5]) + +# save the table0 flows for a later comparison after restart +ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort > expout +cat expout + +#kill the ovn controller ungracefully, so nothing gets c
[ovs-dev] [PATCH v6 1/3] ovn: Assign zone-id consistently
Currently, ovn-controller does not record the zoneid assigned to logical port. Tests indicate that zoneids can be different on the same logical port after ovn-controller restart. The fix sets the zone-id as an external-id of the interface record, and recovers the zone-id from that record. Reported-by: Russell Bryant <russ...@ovn.org> Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1538696 Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/binding.c | 157 ++- 1 file changed, 141 insertions(+), 16 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index d3ca9c9..77d0a2d 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -19,6 +19,7 @@ #include "lib/bitmap.h" #include "lib/hmap.h" #include "lib/sset.h" +#include "lib/type-props.h" #include "lib/util.h" #include "lib/vswitch-idl.h" #include "openvswitch/vlog.h" @@ -50,7 +51,8 @@ binding_register_ovs_idl(struct ovsdb_idl *ovs_idl) } static void -get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) +get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports, +struct shash *localnet_ports) { int i; @@ -63,10 +65,19 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) continue; } +const char *localnet = smap_get(_rec->external_ids, +"ovn-localnet-port"); + for (j = 0; j < port_rec->n_interfaces; j++) { const struct ovsrec_interface *iface_rec; iface_rec = port_rec->interfaces[j]; + +if (!strcmp(iface_rec->type, "patch") && localnet) { +shash_add(localnet_ports, localnet, iface_rec); +break; +} + iface_id = smap_get(_rec->external_ids, "iface-id"); if (!iface_id) { continue; @@ -76,30 +87,127 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) } } +static void get_zones_on_ifaces(struct shash *lports, +struct simap *iface_zones, +struct shash *parent_ports) +{ +const char OVN_ZONE_ID[] = "ovn-zone-id-"; +const struct ovsrec_interface *iface_rec; +struct shash_node *node; +struct shash already_added = SHASH_INITIALIZER(_added); + +/* get zone-ids set in external-ids of iface records, and + * also get parent ports for logical ports + * for efficient iteration on child ports, maintain already_added + */ +SHASH_FOR_EACH(node, lports) { +iface_rec = node->data; +if (!iface_rec || shash_find(_added, iface_rec->name)) { +continue; +} +shash_add(_added, iface_rec->name, iface_rec); +struct smap_node *zone; +int zone_id; +SMAP_FOR_EACH(zone, _rec->external_ids) { +zone_id = smap_get_int(_rec->external_ids, zone->key, 0); +if (!strncmp(zone->key, OVN_ZONE_ID, strlen(OVN_ZONE_ID)) && +zone_id) { +simap_put(iface_zones, zone->key+strlen(OVN_ZONE_ID), zone_id); +shash_add(parent_ports, zone->key+strlen(OVN_ZONE_ID), iface_rec); +} +} +} +shash_destroy(_added); +} + +static void +update_zone_on_iface_rec(const struct ovsrec_interface *iface_rec, + const char *iface_id, int zone_id, struct controller_ctx *ctx) +{ + +if (!iface_rec || !ctx->ovs_idl_txn) { +return; +} +struct smap new; +char *zone_key = xasprintf("ovn-zone-id-%s",iface_id); +char zone[INT_STRLEN(int)+1]; +snprintf(zone, sizeof zone, "%d", zone_id); +smap_clone(, _rec->external_ids); +smap_replace(, zone_key, zone); +ovsrec_interface_verify_external_ids(iface_rec); +ovsrec_interface_set_external_ids(iface_rec, ); +smap_destroy(); +free(zone_key); +} + +static void +delete_zone_on_iface_rec(const struct ovsrec_interface *iface_rec, + const char *key, struct controller_ctx *ctx) +{ +if (!iface_rec || !ctx->ovs_idl_txn) { +return; +} +struct smap new; +char *zone_key = xasprintf("ovn-zone-id-%s", key); +smap_clone(, _rec->external_ids); +smap_remove(, zone_key); +ovsrec_interface_verify_external_ids(iface_rec); +ovsrec_interface_set_external_ids(iface_rec, ); +smap_destroy(); +free(zone_key); +} + static void -update_ct_zones(struct sset *lports, struct simap *ct_zones, -unsigned long *ct_zone_bitmap) +update_ct_zones(struct shash *lports, struct simap *ct_zones, +u
[ovs-dev] [PATCH v6 0/3] ovn: Assign zoneid consistently
Changes v5 to v6 * updates per code-review * handle child-ports and localnet ports * add a test * update docs ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] ovn: broadcast packets dropped with "drop recirc action"
On 2016-03-15 11:16, Andy Zhou wrote: On Mon, Mar 14, 2016 at 12:11 PM, Ramu Ramamurthy < srama...@linux.vnet.ibm.com> wrote: When a logical switch (localnet) has lots of ports on a hypervisor, I find that broadcast packets from one of the ports is only forwarded to a subset of the other ports, and the kernel module shows the message - "kernel: openvswitch: ovs-system: deferred action limit reached, drop recirc action" There appears to be a discussion of a related problem and a possible patch here: http://openvswitch.org/pipermail//discuss/2015-October/019168.html http://openvswitch.org/pipermail//discuss/2015-October/019198.html Thanks for the bug report. Those patches are now dropped since I have not heard back from the bug reporter. I have just posted a potential fix as RFC at: http://openvswitch.org/pipermail/dev/2016-March/067794.html Would you please test it against your set up? Thanks. I tried the patch in the link above on devstack-with-ovn. The openvswitch kernel module is rebuilt with the above patch, I find that there is a system hang, when I send the broadcast packet. The same test earlier showed "drop recirc action" errors. The following change works however, iff --git a/datapath/actions.c b/datapath/actions.c index 20413c9..39dedcc 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -72,7 +72,7 @@ struct ovs_frag_data { static DEFINE_PER_CPU(struct ovs_frag_data, ovs_frag_data_storage); #endif -#define DEFERRED_ACTION_FIFO_SIZE 10 +#define DEFERRED_ACTION_FIFO_SIZE 70 But, it appears that we cannot arbitrarily increase DEFERRED_ACTION_FIFO_SIZE because when I set it to 100, I am not able to load the openvswitch module - the following errors are seen. [ 1094.163171] WARNING: at mm/percpu.c:896 pcpu_alloc+0x64a/0x670() [ 1094.163174] illegal size (46408) or align (8) for percpu allocation It appears that there is a hard limit on DEFERRED_ACTION_FIFO_SIZE determined by: PCPU_MIN_UNIT_SIZE. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] ovn: broadcast packets dropped with "drop recirc action"
On 2016-03-16 16:35, Andy Zhou wrote: On Wed, Mar 16, 2016 at 4:00 PM, Ramu Ramamurthy < srama...@linux.vnet.ibm.com> wrote: On 2016-03-15 11:16, Andy Zhou wrote: On Mon, Mar 14, 2016 at 12:11 PM, Ramu Ramamurthy < srama...@linux.vnet.ibm.com> wrote: When a logical switch (localnet) has lots of ports on a hypervisor, I find that broadcast packets from one of the ports is only forwarded to a subset of the other ports, and the kernel module shows the message - "kernel: openvswitch: ovs-system: deferred action limit reached, drop recirc action" There appears to be a discussion of a related problem and a possible patch here: http://openvswitch.org/pipermail//discuss/2015-October/019168.html http://openvswitch.org/pipermail//discuss/2015-October/019198.html Thanks for the bug report. Those patches are now dropped since I have not heard back from the bug reporter. I have just posted a potential fix as RFC at: http://openvswitch.org/pipermail/dev/2016-March/067794.html Would you please test it against your set up? Thanks. I tried the patch in the link above on devstack-with-ovn. The openvswitch kernel module is rebuilt with the above patch, I find that there is a system hang, when I send the broadcast packet. The same test earlier showed "drop recirc action" errors. Ahh, I see. Would you please try the following incremental change? Thanks. diff --git a/datapath/actions.c b/datapath/actions.c index 3b85eaf..5e81a38 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -116,7 +116,7 @@ static struct deferred_action *action_fifo_get(struct action static struct deferred_action *action_fifo_put(struct action_fifo *fifo) { - if (fifo->head >= fifo->size - 1) { + if (fifo->head >= fifo->size / sizeof(*fifo->fifo) - 1) { /* out of fifo buffer space, try to allocate a new fifo * buffer. */ struct deferred_action *new_fifo; thanks Andy, That worked !, I was able to broadcast (udp-dhcp-discovers) to 100 ports and saw no "drop recirc action" errors. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] ovn: broadcast packets dropped with "drop recirc action"
On 2016-03-15 11:16, Andy Zhou wrote: On Mon, Mar 14, 2016 at 12:11 PM, Ramu Ramamurthy < srama...@linux.vnet.ibm.com> wrote: When a logical switch (localnet) has lots of ports on a hypervisor, I find that broadcast packets from one of the ports is only forwarded to a subset of the other ports, and the kernel module shows the message - "kernel: openvswitch: ovs-system: deferred action limit reached, drop recirc action" There appears to be a discussion of a related problem and a possible patch here: http://openvswitch.org/pipermail//discuss/2015-October/019168.html http://openvswitch.org/pipermail//discuss/2015-October/019198.html Thanks for the bug report. Those patches are now dropped since I have not heard back from the bug reporter. I have just posted a potential fix as RFC at: http://openvswitch.org/pipermail/dev/2016-March/067794.html Would you please test it against your set up? Thanks. Andy, Thanks for the fix, I will update after testing it, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] ovn: broadcast packets dropped with "drop recirc action"
When a logical switch (localnet) has lots of ports on a hypervisor, I find that broadcast packets from one of the ports is only forwarded to a subset of the other ports, and the kernel module shows the message - "kernel: openvswitch: ovs-system: deferred action limit reached, drop recirc action" There appears to be a discussion of a related problem and a possible patch here: http://openvswitch.org/pipermail//discuss/2015-October/019168.html http://openvswitch.org/pipermail//discuss/2015-October/019198.html there is another related discussion here but it seems to have been fixed, http://openvswitch.org/pipermail/dev/2015-November/061840.html but I am still seeing the problem I appreciate any feedback on possible fix for this problem. (I will try the fixes in above link). Details (for reproduction): 1. Create 50 logical ports on a localnet logical switch and bind them for i in {1..50};do ovn-nbctl lport-add 064d37d8-b3dd-45e6-81a3-15fc8d0a99a3 vif-${i}; done for i in {1..50}; do sudo ip link add vif-${i} type veth peer name vifpeer-${i}; done for i in {1..50}; do sudo ip link set dev vif-${i} up; done for i in {1..50}; do sudo ip link set dev vifpeer-${i} up; done sudo ip addr add 10.10.10.1/16 dev vifpeer-1 2. Send a broadcast packet on one of the ports sudo dhclient -v vifpeer-1 <<< this triggers a broadcast DHCP DISCOVER message 3. I find that only about 10 of the 50 ports get the dhcp DISCOVER broadcast message. The problem appears is in this flow in table 33. It seems that only the first 10 resubmits to table 34 in the following flow have really taken effect, cookie=0x0, duration=60631.984s, table=33, n_packets=118, n_bytes=11074, idle_age=2205, hard_age=5826, priority=100,reg7=0x,metadata=0x2 actions=load:0x32->NXM_NX_REG5[],load:0x31->NXM_NX_REG7[],resubmit(,34),load:0x11->NXM_NX_REG5[],load:0x10->NXM_NX_REG7[],resubmit(,34),load:0x31->NXM_NX_REG5[],load:0x30->NXM_NX_REG7[],resubmit(,34),load:0x26->NXM_NX_REG5[],load:0x25->NXM_NX_REG7[],resubmit(,34),load:0x3->NXM_NX_REG5[],load:0x2->NXM_NX_REG7[],resubmit(,34),load:0xa->NXM_NX_REG5[],load:0x9->NXM_NX_REG7[],resubmit(,34),load:0xd->NXM_NX_REG5[],load:0xc->NXM_NX_REG7[],resubmit(,34),load:0x35->NXM_NX_REG5[],load:0x34->NXM_NX_REG7[],resubmit(,34),load:0x18->NXM_NX_REG5[],load:0x17->NXM_NX_REG7[],resubmit(,34),load:0x14->NXM_NX_REG5[],load:0x13->NXM_NX_REG7[],resubmit(,34),load:0x4->NXM_NX_REG5[],load:0x4->NXM_NX_REG7[],resubmit(,34),load:0x10->NXM_NX_REG5[],load:0xf->NXM_NX_REG7[],resubmit(,34),load:0x2b->NXM_NX_REG5[],load:0x2a->NXM_NX_REG7[],resubmit(,34),load:0x2d->NXM_NX_REG5[],load:0x2c->NXM_NX_REG7[],resubmit(,34),load:0x27->NXM_NX_REG5[],load:0x26->NXM_NX_REG7[],resubmit(,34),load:0xb->NXM_NX_REG5[],load:0x a->NXM_N X_REG7[],resubmit(,34),load:0x17->NXM_NX_REG5[],load:0x16->NXM_NX_REG7[],resubmit(,34),load:0x1f->NXM_NX_REG5[],load:0x1e->NXM_NX_REG7[],resubmit(,34),load:0x1b->NXM_NX_REG5[],load:0x1a->NXM_NX_REG7[],resubmit(,34),load:0x28->NXM_NX_REG5[],load:0x27->NXM_NX_REG7[],resubmit(,34),load:0x33->NXM_NX_REG5[],load:0x32->NXM_NX_REG7[],resubmit(,34),load:0x30->NXM_NX_REG5[],load:0x2f->NXM_NX_REG7[],resubmit(,34),load:0x5->NXM_NX_REG5[],load:0x5->NXM_NX_REG7[],resubmit(,34),load:0x34->NXM_NX_REG5[],load:0x33->NXM_NX_REG7[],resubmit(,34),load:0x15->NXM_NX_REG5[],load:0x14->NXM_NX_REG7[],resubmit(,34),load:0x8->NXM_NX_REG5[],load:0x7->NXM_NX_REG7[],resubmit(,34),load:0xc->NXM_NX_REG5[],load:0xb->NXM_NX_REG7[],resubmit(,34),load:0xe->NXM_NX_REG5[],load:0xd->NXM_NX_REG7[],resubmit(,34),load:0x1c->NXM_NX_REG5[],load:0x1b->NXM_NX_REG7[],resubmit(,34),load:0x1e->NXM_NX_REG5[],load:0x1d->NXM_NX_REG7[],resubmit(,34),load:0x12->NXM_NX_REG5[],load:0x11->NXM_NX_REG7[],resubmit(,34),load:0x9->NXM_N X_REG5[] ,load:0x8->NXM_NX_REG7[],resubmit(,34),load:0x19->NXM_NX_REG5[],load:0x18->NXM_NX_REG7[],resubmit(,34),load:0x22->NXM_NX_REG5[],load:0x21->NXM_NX_REG7[],resubmit(,34),load:0x2a->NXM_NX_REG5[],load:0x29->NXM_NX_REG7[],resubmit(,34),load:0x13->NXM_NX_REG5[],load:0x12->NXM_NX_REG7[],resubmit(,34),load:0x1a->NXM_NX_REG5[],load:0x19->NXM_NX_REG7[],resubmit(,34),load:0x1d->NXM_NX_REG5[],load:0x1c->NXM_NX_REG7[],resubmit(,34),load:0x29->NXM_NX_REG5[],load:0x28->NXM_NX_REG7[],resubmit(,34),load:0x36->NXM_NX_REG5[],load:0x35->NXM_NX_REG7[],resubmit(,34),load:0x25->NXM_NX_REG5[],load:0x24->NXM_NX_REG7[],resubmit(,34),load:0x24->NXM_NX_REG5[],load:0x23->NXM_NX_REG7[],resubmit(,34),load:0x16->NXM_NX_REG5[],load:0x15->NXM_NX_REG7[],resubmit(,34),load:0x2c->NXM_NX_REG5[],load:0x2b->NXM_NX_REG7[],resubmit(,34),load:0x2e->NXM_NX_REG5[],load:0x2d->NXM_NX_REG7[],resubmit(,34),load:0xf->NXM_NX_REG5[],load:0xe->NXM_NX_REG7[],resubmit(,34),load:0x2->NXM_NX_REG5[],load:0x1->NXM_NX_REG7[],resubmit( ,34),loa
Re: [ovs-dev] [PATCH] ovn-controller: add restart test
> What I would probably do is submit them together: the zone-ids fix + this > test case that helps test it. If you take out your "xyz" change, will this > test fail currently? > > If i take out the change which masks the zone-ids, this test fails on master because, zone-ids dont match before and after restart. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-controller: add restart test
> > Does this depend on your zone-ids fix? Do you have another rev of that > coming? > Russell, This test does not depend on the zone-id fix, because it replaces zone-ids on flows with a dummy string "xyz". It passes on the master as is. I will send the next revision of the zone-id fix soon, and as part of that fix, this test will also be updated to compare zone-ids on flows. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn-controller: add restart test
Add a test to validate that a restart of the ovn-controller will program flows in table 0 consistently. Flows before restart are compared against flows after restart to detect problems with ofports or zone-ids. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- tests/ovn-controller.at | 127 1 file changed, 127 insertions(+) diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at index 2ff56af..68b78ff 100644 --- a/tests/ovn-controller.at +++ b/tests/ovn-controller.at @@ -117,3 +117,130 @@ check_patches # Gracefully terminate ovn-controller ovs-appctl -t ovn-controller exit AT_CLEANUP + +AT_SETUP([ovn-controller - restart]) +AT_KEYWORDS([ovn]) + +# The purpose of this test is to validate flows programmed +# (in table 0) after ovn-controller restart. +# This test confirms that: +# a) ofports remain the same, so, ports were not deleted/readded +# b) zone-ids remain the same on ports +# +# For now, zone-ids in flows are masked out because, they are not assigned +# consistently. This test is added to prepare for that code-change. +# + +ovn_init_db ovn-sb +net_add n1 +sim_add hv +as hv +ovs-vsctl \ +-- add-br br-phys \ +-- add-br br-eth0 \ +-- add-br br-eth1 \ +-- add-br br-eth2 + +ovn_attach n1 br-phys 192.168.0.1 + +AT_CHECK([ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1,physnet3:br-eth2]) + +# +# - Create 2 localnet ports +# - Create 4 VIFs +# - Create 5 CIFs +# + +# +# Create a localnet port, along with a vif +# +ovn-sbctl \ +-- --id=@dp101 create Datapath_Binding tunnel_key=101 \ +-- create Port_Binding datapath=@dp101 logical_port=localnet1 tunnel_key=1 \ +type=localnet options:network_name=physnet1 \ +-- create Port_Binding datapath=@dp101 logical_port=localvif1 tunnel_key=2 +ovs-vsctl add-port br-int localvif1 -- set Interface localvif1 external_ids:iface-id=localvif1 +# +# Create another localnet port, along with a vif +# +ovn-sbctl \ +-- --id=@dp201 create Datapath_Binding tunnel_key=201 \ +-- create Port_Binding datapath=@dp201 logical_port=localnet201 tunnel_key=1 \ +type=localnet options:network_name=physnet2 \ +-- create Port_Binding datapath=@dp201 logical_port=localvif201 tunnel_key=2 + +ovs-vsctl add-port br-int localvif201 -- set Interface localvif201 external_ids:iface-id=localvif201 + +# +# Create a vif port +# +ovn-sbctl \ +-- --id=@dp102 create Datapath_Binding tunnel_key=102 \ +-- create Port_Binding datapath=@dp102 logical_port=localvif2 tunnel_key=3 +ovs-vsctl add-port br-int localvif2 -- set Interface localvif2 external_ids:iface-id=localvif2 + +# +# Create a vif port and add 5 cifs on it +# +ovn-sbctl \ +-- --id=@dp103 create Datapath_Binding tunnel_key=103 \ +-- create Port_Binding datapath=@dp103 logical_port=localvif3 tunnel_key=4 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif1 tunnel_key=5 parent_port=localvif3 tag=1 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif2 tunnel_key=6 parent_port=localvif3 tag=2 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif3 tunnel_key=7 parent_port=localvif3 tag=3 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif4 tunnel_key=8 parent_port=localvif3 tag=4 \ +-- create Port_Binding datapath=@dp103 logical_port=localcif5 tunnel_key=9 parent_port=localvif3 tag=5 +ovs-vsctl add-port br-int localvif3 -- set Interface localvif3 external_ids:iface-id=localvif3 + +ovs-vsctl show + +# wait for flows in table 0 to get programmed +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13]) + +# save the table0 flows for a later comparison after restart +ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > expout +cat expout + + +#kill the ovn controller ungracefully +kill -9 `cat "$OVS_RUNDIR"/ovn-controller.pid` + +echo "killed the ovn-controller" + +#clean the flows +ovs-ofctl del-flows br-int + +#restart the ovn controller +start_daemon ovn-controller + +#wait for the table 0 flows to get reprogrammed +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | wc -l` -ge 13]) + +# save flows for a comparison +ovs-ofctl dump-flows br-int table=0 | ofctl_strip | tail -n+2 | sort | sed 's/load.\+NXM_NX_REG5/load:xyz->NXM_NX_REG5/g' > afterkill +cat afterkill + +#compare the table 0 flows before and after restart +AT_CHECK([ovs-ofctl diff-flows afterkill expout]) + + +ovs-appctl -t ovn-controller exit +echo "shutdown the ovn-controller gracefully" + +#clean the flows +ovs-ofctl del-flows br-int + +start_daemon ovn-controller +echo "restarted the ovn-controller" + +#wait for the table 0 flows to get reprogrammed +OVS_WAIT_UNTIL([test `ovs-ofctl dump-flows br-int
[ovs-dev] [PATCH] ovn-controller: race between binding-run and patch-run for localnet ports
when ctx->ovnsb_idl_txn is null, binding_run exits early and does not add any local_datapaths, but patch_run doesnt check this, and ends up deleting localnet ports, because there are no local datapaths for them, They get readded in a subsequent run causing unnecessary deletion and readdition. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/patch.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c index 753ce3e..cfae613 100644 --- a/ovn/controller/patch.c +++ b/ovn/controller/patch.c @@ -280,7 +280,7 @@ void patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, struct hmap *local_datapaths) { -if (!ctx->ovs_idl_txn) { +if (!ctx->ovs_idl_txn || !ctx->ovnsb_idl_txn) { return; } -- 2.3.9 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [ovs-dev, v5] ovn-controller: Assign zone-id consistently
Thanks for your review! > > From this part of the patch, it looks like external-ids:zone-id is > accepted even if there are duplicates, or if the value is not valid. I > think that it should reject such cases: > > With that in mind, update_local_zone_ids() should also update > external-ids:zone-id if it needs to change, instead of leaving it the > same. > > Here, please use INT_STRLEN(int) + 1 as the size of zone[]. > +char zone[12]; > +zone_id = simap_get(ct_zones, iface_id); > +snprintf(zone, sizeof zone, "%d", zone_id); > > > In the next version I will incorporate these changes, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [RFC] OVN: Support Static Route
Reported-At: https://bugs.launchpad.net/networking-ovn/+bug/1539347 Reported-By: na...@cn.ibm.com Co-Authored-By: na...@cn.ibm.com,rua...@cn.ibm.com --- openstack-neutron (CMS) allows the configuration of "static-routes" on a router as shown below. neutron router-update --route destination=192.168.3.0/24,nexthop=192.168.1.100 router1 The static route gets implemented (in openstack) by adding a route in the router-namespace using a command such as: ip route add to destination-cidr via nexthop. The next-hop is a port known to the CMS, and so, the CMS knows the IP->mac mapping of the nexthop port. The usecases addressed by static route includes a VPN router. A VPN router is a VM that has interfaces on a private network, and on a provider network. The static route allows other VMs on the private network to reach the VPN-network via the VPN router. Example: Private networks are 192.168.1.0/24 and 192.168.2.0/24 VPN-router has an interface (192.168.1.100) on private network (192.168.1.0/24) VPN connects to 192.168.3.0/24 over the provider network the following static route is needed to allow VMs private networks to reach the VPN network destination=192.168.3.0/24, nexthop=192.168.1.100 An approach to implement static routes in OVN follows: 1) A new column called static route is added to the Logical Router table * the static route column may in turn refer to a new Logical_Router_Static_Route table whose columns include destination, nexthop OR for simplicity, * the static route column may contain the static routes as a string with formatting to denote separation - for example, "destination=192.168.3.0/24,nexthop=192.168.1.100;destination=192.168.2.0/24,nexthop=10.0.1.4" 2) Based on the presence of the "static route" field in the logical router, OVN northd programs a logical flow corresponding to the each static route as follow: actions : "ip.ttl--; reg0 = nexthop; next;" external_ids : {stage-name=lr_in_ip_routing} match : "ip4.dst == destination cidr" pipeline : ingress The lr_in_arp stage would have already have an arp-resolution-entry for the nexthop since the nexthop (IP,Mac) is already added as a logical port. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v5] ovn-controller: Assign zone-id consistently
> > > Unfortunately, I thought of another issue that complicates this whole > approach. A single interface does not necessarily map to a single > logical port and zone ID. We support sub-ports, initially aimed at > modelling containers in VMs. That means we need to track N different > zone IDs on a single interface record. For more info, see "Life Cycle > of a Container Interface Inside a VM" in ovn-archtecture(7). > > For that use case, we could easily have many hundreds of sub-ports using > a single interface. Maybe we should have external-id keys of the form > "ovn-zone-id-" where is the logical port the zone id is > for? We'd need some additional code for cleaning up zone IDs for > sub-ports that have been deleted. > Thanks, In the case of a cif, can we have the key as: ovn-zone-id- where the tag is the vlan which distinguishes the cif on the vif ? But, this will result in external id containings lots of keys - would that be acceptable ? An alternative approach just for comparison is: during ovn-controller startup to look at the installed flows in table 0 of the flow table, and recover the zone-id -> (ofport, tag) map and from there deduce the logical-port -> zone-id map ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v5] ovn-controller: Assign zone-id consistently
Currently, ovn-controller does not record the lport->zoneid map, and so, after ovn-controller restart, zone-ids may get set inconsistently on lports, resulting in possible hits to already established connections. Set zone-id as an external-id of the interface record, and recover the zone-id from that record Reported-by: Russell Bryant <russ...@ovn.org> Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1538696 Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- Changes v3 to v4: update as per code-review * simplify code to update zone-id only when needed * update documentation for external id * update authors Changes v4 to v5: fix formatting AUTHORS | 1 + ovn/controller/binding.c| 39 +++-- ovn/controller/ovn-controller.8.xml | 13 + 3 files changed, 51 insertions(+), 2 deletions(-) diff --git a/AUTHORS b/AUTHORS index 936394d..597899d 100644 --- a/AUTHORS +++ b/AUTHORS @@ -161,6 +161,7 @@ pritesh pritesh.koth...@cisco.com Pravin B Shelar pshe...@nicira.com Raju Subramanianrsubraman...@nicira.com Rami Rosen ramir...@gmail.com +Ramu Ramamurthy ramu.ramamur...@us.ibm.com Randall Sharo andall.sh...@navy.mil Ravi Kerur ravi.ke...@telekom.com Reid Price r...@nicira.com diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index cb12cea..543686c 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -50,7 +50,39 @@ binding_register_ovs_idl(struct ovsdb_idl *ovs_idl) } static void -get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) +update_local_zone_id(const struct ovsrec_interface *iface_rec, const char *iface_id, + struct simap *ct_zones, unsigned long *ct_zone_bitmap, + struct controller_ctx *ctx) +{ +int zone_id; +const char *OVN_ZONE_ID = "ovn-zone-id"; + +zone_id = smap_get_int(_rec->external_ids, OVN_ZONE_ID, 0); +if (zone_id && !simap_contains(ct_zones, iface_id)) { +/* get zone-id from the local interface record */ +bitmap_set1(ct_zone_bitmap, zone_id); +simap_put(ct_zones, iface_id, zone_id); +} +if (!zone_id && simap_contains(ct_zones, iface_id) && +ctx->ovs_idl_txn) { +/* zone-id has been assigned to lport, but not + * recorded in the local interface record yet */ +struct smap new; +char zone[12]; +zone_id = simap_get(ct_zones, iface_id); +snprintf(zone, sizeof zone, "%d", zone_id); +smap_clone(, _rec->external_ids); +smap_replace(, OVN_ZONE_ID, zone); +ovsrec_interface_verify_external_ids(iface_rec); +ovsrec_interface_set_external_ids(iface_rec, ); +smap_destroy(); +} +} + +static void +get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports, +struct simap *ct_zones, unsigned long *ct_zone_bitmap, +struct controller_ctx *ctx) { int i; @@ -72,10 +104,13 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) continue; } shash_add(lports, iface_id, iface_rec); +update_local_zone_id(iface_rec, iface_id, ct_zones, + ct_zone_bitmap, ctx); } } } + static void update_ct_zones(struct sset *lports, struct simap *ct_zones, unsigned long *ct_zone_bitmap) @@ -165,7 +200,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, struct shash lports = SHASH_INITIALIZER(); if (br_int) { -get_local_iface_ids(br_int, ); +get_local_iface_ids(br_int, , ct_zones, ct_zone_bitmap, ctx); } else { /* We have no integration bridge, therefore no local logical ports. * We'll remove our chassis from all port binding records below. */ diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-controller.8.xml index b261af9..383de61 100644 --- a/ovn/controller/ovn-controller.8.xml +++ b/ovn/controller/ovn-controller.8.xml @@ -194,6 +194,19 @@ logical patch port that it implements. + + +external-ids:ovn-zone-id in the +Interface table + + + + + This key is set by the ovn-controller to identify the + conntrack-zone id used for the OVN logical port. Its value specifies + the zone-id. + + Runtime Management Commands -- 2.3.9 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4] ovn-controller: Assign zone-id consistently
Currently, ovn-controller does not record the lport->zoneid map, and so, after ovn-controller restart, zone-ids may get set inconsistently on lports, resulting in possible hits to already established connections. Set zone-id as an external-id of the interface record, and recover the zone-id from that record Reported-by: Russell Bryant <russ...@ovn.org> Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1538696 Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- Changes v3 to v4: update as per code-review * simplify code to update zone-id only when needed * update documentation for external id * update authors AUTHORS | 1 + ovn/controller/binding.c| 39 +++-- ovn/controller/ovn-controller.8.xml | 13 + 3 files changed, 51 insertions(+), 2 deletions(-) diff --git a/AUTHORS b/AUTHORS index 936394d..597899d 100644 --- a/AUTHORS +++ b/AUTHORS @@ -161,6 +161,7 @@ pritesh pritesh.koth...@cisco.com Pravin B Shelar pshe...@nicira.com Raju Subramanianrsubraman...@nicira.com Rami Rosen ramir...@gmail.com +Ramu Ramamurthy ramu.ramamur...@us.ibm.com Randall Sharo andall.sh...@navy.mil Ravi Kerur ravi.ke...@telekom.com Reid Price r...@nicira.com diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index cb12cea..a2ce2ba 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -50,7 +50,39 @@ binding_register_ovs_idl(struct ovsdb_idl *ovs_idl) } static void -get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) +update_local_zone_id(const struct ovsrec_interface *iface_rec, const char *iface_id, + struct simap *ct_zones, unsigned long *ct_zone_bitmap, + struct controller_ctx *ctx) +{ +int zone_id; +const char *OVN_ZONE_ID = "ovn-zone-id"; + +zone_id = smap_get_int(_rec->external_ids, OVN_ZONE_ID, 0); +if (zone_id && !simap_contains(ct_zones, iface_id)) { +/* get zone-id from the local interface record */ +bitmap_set1(ct_zone_bitmap, zone_id); +simap_put(ct_zones, iface_id, zone_id); +} +if (!zone_id && simap_contains(ct_zones, iface_id) && +ctx->ovs_idl_txn) { +/* zone-id has been assigned to lport, but not + * recorded in the local interface record yet */ +struct smap new; +char zone[12]; +zone_id = simap_get(ct_zones, iface_id); +snprintf(zone, sizeof zone, "%d", zone_id); +smap_clone(, _rec->external_ids); +smap_replace(, OVN_ZONE_ID, zone); +ovsrec_interface_verify_external_ids(iface_rec); +ovsrec_interface_set_external_ids(iface_rec, ); +smap_destroy(); +} +} + +static void +get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports, +struct simap *ct_zones, unsigned long *ct_zone_bitmap, + struct controller_ctx *ctx) { int i; @@ -72,10 +104,13 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) continue; } shash_add(lports, iface_id, iface_rec); +update_local_zone_id(iface_rec, iface_id, ct_zones, + ct_zone_bitmap, ctx); } } } + static void update_ct_zones(struct sset *lports, struct simap *ct_zones, unsigned long *ct_zone_bitmap) @@ -165,7 +200,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, struct shash lports = SHASH_INITIALIZER(); if (br_int) { -get_local_iface_ids(br_int, ); +get_local_iface_ids(br_int, , ct_zones, ct_zone_bitmap, ctx); } else { /* We have no integration bridge, therefore no local logical ports. * We'll remove our chassis from all port binding records below. */ diff --git a/ovn/controller/ovn-controller.8.xml b/ovn/controller/ovn-controller.8.xml index b261af9..383de61 100644 --- a/ovn/controller/ovn-controller.8.xml +++ b/ovn/controller/ovn-controller.8.xml @@ -194,6 +194,19 @@ logical patch port that it implements. + + +external-ids:ovn-zone-id in the +Interface table + + + + + This key is set by the ovn-controller to identify the + conntrack-zone id used for the OVN logical port. Its value specifies + the zone-id. + + Runtime Management Commands -- 2.3.9 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v3] ovn-controller: Assign zone-id consistently
Currently, ovn-controller does not record the lport->zoneid map, and so, after ovn-controller restart, zone-ids may get set inconsistently on lports, resulting in possible hits to already established connections. Set zone-id as an external-id of the interface record, and recover the zone-id from that record Reported-by: Russell Bryant <russ...@ovn.org> Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1538696 Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- Changes v1 to v2 * add check for null br-int Changes v2 to v3 * updates per code-review comments ovn/controller/binding.c | 64 +--- 1 file changed, 60 insertions(+), 4 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index cb12cea..4aecffe 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -50,7 +50,8 @@ binding_register_ovs_idl(struct ovsdb_idl *ovs_idl) } static void -get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) +get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports, +struct simap *ct_zones, unsigned long *ct_zone_bitmap) { int i; @@ -65,6 +66,8 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) for (j = 0; j < port_rec->n_interfaces; j++) { const struct ovsrec_interface *iface_rec; +const char *zone; +int zone_id; iface_rec = port_rec->interfaces[j]; iface_id = smap_get(_rec->external_ids, "iface-id"); @@ -72,13 +75,65 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) continue; } shash_add(lports, iface_id, iface_rec); +zone = smap_get(_rec->external_ids, "ovn-zone-id"); +if (zone && ovs_scan(zone, "%d", _id)) { +bitmap_set1(ct_zone_bitmap, zone_id); +simap_put(ct_zones, iface_id, zone_id); +} +} +} +} + +static void +update_local_zone_ids(const struct ovsrec_bridge *br_int, struct simap *ct_zones, + struct controller_ctx *ctx) +{ +int i; + +if (!ctx->ovs_idl_txn || !br_int) { +return; +} + +for (i = 0; i < br_int->n_ports; i++) { +const struct ovsrec_port *port_rec = br_int->ports[i]; +int j; + +if (!strcmp(port_rec->name, br_int->name)) { +continue; +} + +for (j = 0; j < port_rec->n_interfaces; j++) { +const struct ovsrec_interface *iface_rec; +const char *iface_id; +struct smap new; +int zone_id; +char zone[12]; + +iface_rec = port_rec->interfaces[j]; +iface_id = smap_get(_rec->external_ids, "iface-id"); + +if (!iface_id || +smap_get(_rec->external_ids, "ovn-zone-id") || +!simap_contains(ct_zones, iface_id)) { +continue; +} + +zone_id = simap_get(ct_zones, iface_id); +snprintf(zone, sizeof zone, "%d", zone_id); +smap_clone(, _rec->external_ids); +smap_replace(, "ovn-zone-id", zone); +ovsrec_interface_verify_external_ids(iface_rec); +ovsrec_interface_set_external_ids(iface_rec, ); +smap_destroy(); } } } static void update_ct_zones(struct sset *lports, struct simap *ct_zones, -unsigned long *ct_zone_bitmap) +unsigned long *ct_zone_bitmap, +const struct ovsrec_bridge *br_int, +struct controller_ctx *ctx) { struct simap_node *ct_zone, *ct_zone_next; const char *iface_id; @@ -119,6 +174,7 @@ update_ct_zones(struct sset *lports, struct simap *ct_zones, * xxx zone, but we need a generic interface to the conntrack * xxx table. */ } +update_local_zone_ids(br_int, ct_zones, ctx); } static void @@ -165,7 +221,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, struct shash lports = SHASH_INITIALIZER(); if (br_int) { -get_local_iface_ids(br_int, ); +get_local_iface_ids(br_int, , ct_zones, ct_zone_bitmap); } else { /* We have no integration bridge, therefore no local logical ports. * We'll remove our chassis from all port binding records below. */ @@ -224,7 +280,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, VLOG_DBG("No port binding record for lport %s", node->name); } -update_ct_zones(_lports, ct_zones, ct_zone_bitmap); +update_ct_zones(_lports, ct_zones, ct_zone_bitmap, br_int, ctx); shash_destroy(); sset_destroy(_lports); -- 2.3.9 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn-controller: Assign zone-ids consistently
Currently, conntrack zone-id is assigned to lport by ovn-controller, but the ovn-controller does not remember what was earlier assigned to the same lport (possibly in an earlier run across restart). So, after ovn-controller restart, the zone-ids may get set inconsistently on lports, resulting in possible hits to already established connections. Fix is to remember the zone-id as an external-id of the interface record in the local ovs-db, and recover zone-ids assigned earlier to lports from that record. This patch fixes: https://bugs.launchpad.net/networking-ovn/+bug/1538696 Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/binding.c | 70 +++- 1 file changed, 63 insertions(+), 7 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index ce9cccf..28adcef 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -46,10 +46,60 @@ binding_register_ovs_idl(struct ovsdb_idl *ovs_idl) ovsdb_idl_add_column(ovs_idl, _interface_col_external_ids); } + static void -get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports) +update_local_zone_ids(const struct ovsrec_bridge *br_int, struct simap *ct_zones, + struct controller_ctx *ctx) { int i; +struct smap new; +int zone_id; +char *zone; + +if (!ctx->ovs_idl_txn) { +return; +} + +for (i = 0; i < br_int->n_ports; i++) { +const struct ovsrec_port *port_rec = br_int->ports[i]; +const char *iface_id; +int j; + +if (!strcmp(port_rec->name, br_int->name)) { +continue; +} + +for (j = 0; j < port_rec->n_interfaces; j++) { +const struct ovsrec_interface *iface_rec; + +iface_rec = port_rec->interfaces[j]; +iface_id = smap_get(_rec->external_ids, "iface-id"); + +if (!iface_id || smap_get(_rec->external_ids, "zone-id") || +!simap_contains(ct_zones, iface_id)) { +continue; +} + +zone_id = simap_get(ct_zones, iface_id); +zone = xasprintf("%d", zone_id); +smap_clone(, _rec->external_ids); +smap_replace(, "zone-id", zone); +ovsrec_interface_verify_external_ids(iface_rec); +ovsrec_interface_set_external_ids(iface_rec, ); +free(zone); +smap_destroy(); +} +} +} + + +static void +get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports, +struct simap *ct_zones, unsigned long *ct_zone_bitmap) +{ +int i; +const char *zone; +int zone_id; for (i = 0; i < br_int->n_ports; i++) { const struct ovsrec_port *port_rec = br_int->ports[i]; @@ -69,13 +119,21 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports) continue; } sset_add(lports, iface_id); +zone = smap_get(_rec->external_ids, "zone-id"); +if (zone && ovs_scan(zone, "%d", _id)) { +bitmap_set1(ct_zone_bitmap, zone_id); +simap_put(ct_zones, iface_id, zone_id); +} } } } + static void update_ct_zones(struct sset *lports, struct simap *ct_zones, -unsigned long *ct_zone_bitmap) +unsigned long *ct_zone_bitmap, +const struct ovsrec_bridge *br_int, +struct controller_ctx *ctx) { struct simap_node *ct_zone, *ct_zone_next; const char *iface_id; @@ -112,10 +170,8 @@ update_ct_zones(struct sset *lports, struct simap *ct_zones, bitmap_set1(ct_zone_bitmap, zone); simap_put(ct_zones, iface_id, zone); -/* xxx We should erase any old entries for this - * xxx zone, but we need a generic interface to the conntrack - * xxx table. */ } +update_local_zone_ids(br_int, ct_zones, ctx); } static void @@ -154,7 +210,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, sset_init(); sset_init(_lports); if (br_int) { -get_local_iface_ids(br_int, ); +get_local_iface_ids(br_int, , ct_zones, ct_zone_bitmap); } else { /* We have no integration bridge, therefore no local logical ports. * We'll remove our chassis from all port binding records below. */ @@ -203,7 +259,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, VLOG_DBG("No port binding record for lport %s", name); } -update_ct_zones(_lports, ct_zones, ct_zone_bitmap); +update_ct_zones(_lports, ct_zones, ct_zone_bitmap, br_int, ctx); sset_destroy(); sset_destroy(_lports); -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-controller: Assign zone-ids consistently
Resend (apologies) as the mailer mangled spaces on the patch. Signed-off-by: Ramu Ramamurthy <ramu.ramamur...@us.ibm.com> --- ovn/controller/binding.c | 70 +++- 1 file changed, 63 insertions(+), 7 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index ce9cccf..28adcef 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -46,10 +46,60 @@ binding_register_ovs_idl(struct ovsdb_idl *ovs_idl) ovsdb_idl_add_column(ovs_idl, _interface_col_external_ids); } + static void -get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports) +update_local_zone_ids(const struct ovsrec_bridge *br_int, struct simap *ct_zones, + struct controller_ctx *ctx) { int i; +struct smap new; +int zone_id; +char *zone; + +if (!ctx->ovs_idl_txn) { +return; +} + +for (i = 0; i < br_int->n_ports; i++) { +const struct ovsrec_port *port_rec = br_int->ports[i]; +const char *iface_id; +int j; + +if (!strcmp(port_rec->name, br_int->name)) { +continue; +} + +for (j = 0; j < port_rec->n_interfaces; j++) { +const struct ovsrec_interface *iface_rec; + +iface_rec = port_rec->interfaces[j]; +iface_id = smap_get(_rec->external_ids, "iface-id"); + +if (!iface_id || smap_get(_rec->external_ids, "zone-id") || +!simap_contains(ct_zones, iface_id)) { +continue; +} + +zone_id = simap_get(ct_zones, iface_id); +zone = xasprintf("%d", zone_id); +smap_clone(, _rec->external_ids); +smap_replace(, "zone-id", zone); +ovsrec_interface_verify_external_ids(iface_rec); +ovsrec_interface_set_external_ids(iface_rec, ); +free(zone); +smap_destroy(); +} +} +} + + +static void +get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports, +struct simap *ct_zones, unsigned long *ct_zone_bitmap) +{ +int i; +const char *zone; +int zone_id; for (i = 0; i < br_int->n_ports; i++) { const struct ovsrec_port *port_rec = br_int->ports[i]; @@ -69,13 +119,21 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct sset *lports) continue; } sset_add(lports, iface_id); +zone = smap_get(_rec->external_ids, "zone-id"); +if (zone && ovs_scan(zone, "%d", _id)) { +bitmap_set1(ct_zone_bitmap, zone_id); +simap_put(ct_zones, iface_id, zone_id); +} } } } + static void update_ct_zones(struct sset *lports, struct simap *ct_zones, -unsigned long *ct_zone_bitmap) +unsigned long *ct_zone_bitmap, +const struct ovsrec_bridge *br_int, +struct controller_ctx *ctx) { struct simap_node *ct_zone, *ct_zone_next; const char *iface_id; @@ -112,10 +170,8 @@ update_ct_zones(struct sset *lports, struct simap *ct_zones, bitmap_set1(ct_zone_bitmap, zone); simap_put(ct_zones, iface_id, zone); -/* xxx We should erase any old entries for this - * xxx zone, but we need a generic interface to the conntrack - * xxx table. */ } +update_local_zone_ids(br_int, ct_zones, ctx); } static void @@ -154,7 +210,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, sset_init(); sset_init(_lports); if (br_int) { -get_local_iface_ids(br_int, ); +get_local_iface_ids(br_int, , ct_zones, ct_zone_bitmap); } else { /* We have no integration bridge, therefore no local logical ports. * We'll remove our chassis from all port binding records below. */ @@ -203,7 +259,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, VLOG_DBG("No port binding record for lport %s", name); } -update_ct_zones(_lports, ct_zones, ct_zone_bitmap); +update_ct_zones(_lports, ct_zones, ct_zone_bitmap, br_int, ctx); sset_destroy(); sset_destroy(_lports); -- 1.8.3.1 On 2016-02-08 10:48, Ramu Ramamurthy wrote: Currently, conntrack zone-id is assigned to lport by ovn-controller, but the ovn-controller does not remember what was earlier assigned to the same lport (possibly in an earlier run across restart). So, after ovn-controller restart, the zone-ids may get set inconsistently on lports, resulting in possible hits to already established connections. Fix is to remember the zone-id as an external-id of the interface record in the local ovs-db, and recover zone-ids assigned earlier to lports from that record.
Re: [ovs-dev] [PATCH] ovn-controller: Assign zone-ids consistently
On 2016-02-08 11:12, Russell Bryant wrote: On 02/08/2016 02:00 PM, Ramu Ramamurthy wrote: Resend (apologies) as the mailer mangled spaces on the patch. Can you try posting with the git-send-email utility? I am posting again from a better behaved mailer ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] ovs segments frames when transmitting to veth
Problem: - We are running centos 7.2 (ovs-kernel module from distribution) with OVS 2.3.1. Large tcp frames (> 30Kbytes gro'd) are coming into a vxlan-tunnel port and switched by an ovs-bridge onto a veth port vxlan-tunnel--(gro'd tcp frames sized > 30K)-->OVS-bridge--(frames sized 1448)-->veth port As the ovs transmits onto the veth port, the frames are getting segmented into 1448 byte sized packets. All the offload settings are at default on the veth port (gso on, gro on, tso on) We expect that the OVS can transmit the frame without segmentation onto the veth port. But, due to segmentation, the performance suffers. Root Cause --- The following trace with debugs shows the root cause that the gso_type is not correctly set on the skb as the packet comes in, and as a result, the dev_xmit_skb is segmenting the frame as ovs transmits it. ksoftirqd/25-231 [025] ..s. 319.964120: vxlan_rcv: gso_size=1448,gso_type=1001 The above is debug in vport-vxlan.c:vxlan_rcv: trace_printk("gso_size=%d,gso_type=%x\n",skb_shinfo(skb)->gso_size,skb_shinfo(skb)->gso_type); Here, gso_type seem to have an invalid value. ksoftirqd/25-231 [025] ..s. 319.964120: ovs_vport_receive <-vxlan_rcv ksoftirqd/25-231 [025] ..s. 319.964121: ovs_flow_key_extract <-ovs_vport_receive ksoftirqd/25-231 [025] ..s. 319.964121: key_extract <-ovs_flow_key_extract ksoftirqd/25-231 [025] ..s. 319.964121: check_header <-key_extract ksoftirqd/25-231 [025] ..s. 319.964121: tcphdr_ok <-key_extract ksoftirqd/25-231 [025] ..s. 319.964121: ovs_dp_process_packet <-ovs_vport_receive ksoftirqd/25-231 [025] ..s. 319.964121: ovs_flow_tbl_lookup_stats <-ovs_dp_process_packet ksoftirqd/25-231 [025] ..s. 319.964121: masked_flow_lookup <-ovs_flow_tbl_lookup_stats ksoftirqd/25-231 [025] ..s. 319.964121: ovs_flow_mask_key <-masked_flow_lookup ksoftirqd/25-231 [025] ..s. 319.964122: find_bucket.isra.2 <-masked_flow_lookup ksoftirqd/25-231 [025] ..s. 319.964122: masked_flow_lookup <-ovs_flow_tbl_lookup_stats ksoftirqd/25-231 [025] ..s. 319.964122: ovs_flow_mask_key <-masked_flow_lookup ksoftirqd/25-231 [025] ..s. 319.964122: find_bucket.isra.2 <-masked_flow_lookup ksoftirqd/25-231 [025] ..s. 319.964122: ovs_flow_stats_update <-ovs_dp_process_packet ksoftirqd/25-231 [025] ..s. 319.964122: _raw_spin_loca <-ovs_flow_stats_update ksoftirqd/25-231 [025] ..s. 319.964122: _raw_spin_unlock <-ovs_flow_stats_update ksoftirqd/25-231 [025] ..s. 319.964122: ovs_execute_actions <-ovs_dp_process_packet ksoftirqd/25-231 [025] ..s. 319.964123: do_execute_actions <-ovs_execute_actions ksoftirqd/25-231 [025] ..s. 319.964123: do_output <-do_execute_actions ksoftirqd/25-231 [025] ..s. 319.964123: ovs_lookup_vport <-do_output ksoftirqd/25-231 [025] ..s. 319.964123: ovs_vport_send <-do_output ksoftirqd/25-231 [025] ..s. 319.964123: netdev_send <-ovs_vport_send ksoftirqd/25-231 [025] ..s. 319.964123: netdev_send: devmtu=8950 ksoftirqd/25-231 [025] ..s. 319.964123: netdev_send: devname=ovs-veth0 ksoftirqd/25-231 [025] ..s. 319.964124: netdev_send: gso_segs=45 ksoftirqd/25-231 [025] ..s. 319.964124: netif_skb_features <-netdev_send ksoftirqd/25-231 [025] ..s. 319.964124: skb_network_protocol <-netif_skb_features ksoftirqd/25-231 [025] ..s. 319.964124: netdev_send: ip_summed=3 ksoftirqd/25-231 [025] ..s. 319.964124: netdev_send: gso_segs=45,len=65226,feat=6103db59e9 ksoftirqd/25-231 [025] ..s. 319.964124: netdev_send: net_gso_ok=0,has_frag_list=1,frag_feat=1 ksoftirqd/25-231 [025] ..s. 319.964125: netdev_send: gso_size=1448,gso_type=4097,ok=0 The above are debugs in vport-netdev.c:netdev_send: trace_printk("devmtu=%d\n",mtu); trace_printk("devname=%s\n",netdev_vport->dev->name); trace_printk("gso_segs=%d\n",skb_shinfo(skb)->gso_segs); features = netif_skb_features(skb); feature = skb_shinfo(skb)->gso_type << NETIF_F_GSO_SHIFT; net_gso_ok = ((features & feature) == feature); gso_ok = skb_gso_ok(skb, features); has_frag_list = skb_has_frag_list(skb); frag_feat = features & NETIF_F_FRAGLIST ? 1 : 0; trace_printk("ip_summed=%d\n",skb->ip_summed); trace_printk("gso_segs=%d,len=%d,feat=%llx\n",skb_shinfo(skb)->gso_segs,len,features); trace_printk("net_gso_ok=%d,has_frag_list=%d,frag_feat=%d\n",net_gso_ok,has_frag_list,frag_feat); trace_printk("gso_size=%d,gso_type=%d,ok=%d\n",skb_shinfo(skb)->gso_size,skb_shinfo(skb)->gso_type,gso_ok); ksoftirqd/25-231 [025] ..s. 319.964125: dev_queue_xmit <-netdev_send ksoftirqd/25-231 [025] ..s. 319.964125: local_bh_disable <-dev_queue_xmit ksoftirqd/25-231 [025] ..s.
Re: [ovs-dev] ovs segments frames when transmitting to veth
On 2016-01-13 16:22, Jesse Gross wrote: On Wed, Jan 13, 2016 at 3:51 PM, Ramu Ramamurthy <srama...@linux.vnet.ibm.com> wrote: Problem: - We are running centos 7.2 (ovs-kernel module from distribution) with OVS 2.3.1. Large tcp frames (> 30Kbytes gro'd) are coming into a vxlan-tunnel port and switched by an ovs-bridge onto a veth port vxlan-tunnel--(gro'd tcp frames sized > 30K)-->OVS-bridge--(frames sized 1448)-->veth port As the ovs transmits onto the veth port, the frames are getting segmented into 1448 byte sized packets. All the offload settings are at default on the veth port (gso on, gro on, tso on) We expect that the OVS can transmit the frame without segmentation onto the veth port. But, due to segmentation, the performance suffers. I think the problem is that tunnel decapsulation (in all cases, not just VXLAN, and not just with OVS) doesn't appear to strip off the GSO type that indicates that it is a tunnel frame. As a result, the veth (which didn't support tunnel GSO until 3.13) sees this as a tunnel frame rather than just a TCP frame and forces software segmentation. The issue isn't with OVS code, so I would recommend that you submit a patch to upstream Linux. Thanks, I will look into the fix for this upstream. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC] - vxlan udp csum and openstack ovs performance
On 2015-08-27 18:26, Jesse Gross wrote: On Thu, Aug 27, 2015 at 5:46 PM, Ramu Ramamurthy srama...@linux.vnet.ibm.com wrote: We found that enabling UDP checksums for vxlan triggers GRO on the receiver, and boosts the vxlan performance to 8+ Gbps. Hence, we want to enable vxlan csum for OVS vxlan tunnels. This is already supported in OVS 2.4, and kernel versions 4+ - But we want this fix to appear in 3.x upstream kernels, and the corresponding userspace change in OVS 2.3. Sorry, this isn't going to happen. Neither the upstream kernel nor OVS backport features to already released versions. OVS 2.4 has been released with this feature and is packaged with a kernel module of backports. Please just use that - considering that you'd have to make changes all the way up the stack, I don't think it's any more disruptive. We cannot change the base distribution - we are using RHEL 7 (3.x based). So, you are suggesting to use OVS 2.4 userspace components, and build the openvswitch kernel-module shipped with OVS 2.4 along with its dependencies (vxlan, etc) also shipped with OVS 2.4 against the RHEL 7.1 (3.x kernel) - and not use any of the upstream components for openvswitch, vxlan etc ? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC] - vxlan udp csum and openstack ovs performance
posting again, since the earlier emailer mangles email addresses. Overview: We are running openstack kilo with neutron OVS using intel 10G adapters (br kx4 dual-port, 82599es). neutron OVS is using vxlan tunnels. We are using RHEL 7.1 (which uses the 3.x kernel). Openvswitch kernel module is from upstream (3.x) and ovs is 2.3.0 based. Problem: We find that the aggregate throughput does not exceed 3Gbps on the 10Gb link when running various VM test scenarios (40VMs--40VMs). The root cause is the inability to perform GRO at the physical device for the vxlan protocol. See discussion on this problem and its follow ups: http://marc.info/?t=14352771932r=1w=2 Fix: We found that enabling UDP checksums for vxlan triggers GRO on the receiver, and boosts the vxlan performance to 8+ Gbps. Hence, we want to enable vxlan csum for OVS vxlan tunnels. This is already supported in OVS 2.4, and kernel versions 4+ - But we want this fix to appear in 3.x upstream kernels, and the corresponding userspace change in OVS 2.3. So, the following set of patches achieves 8+Gbps throughput on 10G adapters (vs the 2Gbps without these) on the above Openstack topology. 1) Backport this patch (to ovs 2.3) which allows the csum configuration on vxlan ports http://openvswitch.org/pipermail/dev/2015-March/052823.html 2) Apply this new patch to send the CSUM attribute to the kernel module --- /home/ovs/tmp/openvswitch-2.3.0/lib/dpif-linux.c 2014-08-14 16:34:33.0 -0400 +++ lib/dpif-linux.c 2015-08-27 18:56:27.282610654 -0400 @@ -666,10 +666,15 @@ } tnl_cfg = netdev_get_tunnel_config(netdev); -if (tnl_cfg tnl_cfg-dst_port != 0) { +if (tnl_cfg (tnl_cfg-dst_port != 0 || tnl_cfg-csum)) { ofpbuf_use_stack(options, options_stub, sizeof options_stub); -nl_msg_put_u16(options, OVS_TUNNEL_ATTR_DST_PORT, - ntohs(tnl_cfg-dst_port)); +if (tnl_cfg-dst_port != 0) { +nl_msg_put_u16(options, OVS_TUNNEL_ATTR_DST_PORT, + ntohs(tnl_cfg-dst_port)); +} +if (tnl_cfg-csum) { +nl_msg_put_flag(options, OVS_TUNNEL_KEY_ATTR_CSUM); +} request.options = ofpbuf_data(options); request.options_len = ofpbuf_size(options); } 3) Apply this patch to the upstream 3.x kernels to set the UDP csum flag when creating the vxlan socket --- ../linux-3.10.0-229.el7/net/openvswitch/vport-vxlan.c 2015-01-29 18:15:53.0 -0500 +++ vport-vxlan.c 2015-08-27 18:51:35.619688332 -0400 @@ -100,6 +100,7 @@ struct nlattr *a; u16 dst_port; int err; + u32 vxlan_sock_flags = 0; if (!options) { err = -EINVAL; @@ -114,6 +115,11 @@ goto error; } + a = nla_find_nested(options, OVS_TUNNEL_KEY_ATTR_CSUM); + if (a) { + vxlan_sock_flags |= VXLAN_F_UDP_CSUM; + } + vport = ovs_vport_alloc(sizeof(struct vxlan_port), ovs_vxlan_vport_ops, parms); if (IS_ERR(vport)) @@ -122,7 +128,7 @@ vxlan_port = vxlan_vport(vport); strncpy(vxlan_port-name, parms-name, IFNAMSIZ); - vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, 0); + vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true, vxlan_sock_flags); if (IS_ERR(vs)) { ovs_vport_free(vport); return (void *)vs; 4) Patch Openstack neutron-ovs-agent to optionally set the csum flag on vxlan tunnels (this will be taken to the Openstack Neutron community) Conclusion: Since above patches provide a huge performance boost to OVS-based Openstack neutron using vxlan tunnels on widely deployed 10G adapters, we want the community feedback on how best to apply the above patches. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev