On Thu, Mar 10, 2016 at 05:55:46PM -0800, Justin Pettit wrote: > > > On Feb 19, 2016, at 4:40 PM, Ben Pfaff <b...@ovn.org> wrote: > > > > > > --- a/ovn/TODO > > +++ b/ovn/TODO > > @@ -4,18 +4,11 @@ > > > > ** New OVN logical actions > > > > > > +*** rate_limit > > > > TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to > > one per second for a given target. We might need to do this too. > > > > .... > > +*** Ratelimiting. > > > > +From casual observation, Linux appears to generate at most one ARP per > > +second per destination. > > It seems like this is saying similar things in two spots.
OK, I removed the item from the logical action section and added a sentence later: This might be supported by adding a new OVN logical action for rate-limiting. > > *** Renewal and expiration. > > > > Something needs to make sure that bindings remain valid and expire > > those that become stale. > > As we discussed in-person, I wonder if we can introduce the concept of > time into the database. Either the database itself could have > time-based limits for columns or we could provide the ability to > define queries based on relative. For example, ovn-northd could > regularly send a delete query that matches any row with an age older > than five seconds. I have some doubts about this approach, but I don't have a design that I'm happy with either. I am not confident at all that centralizing ARP resolution into the database is the best approach. For now, I added a sentence here: "One way to do this might be to add some support for time to the database server itself." > > +/* Adds a flow to table */ > > +static void > > +add_neighbor_flows(struct controller_ctx *ctx, > > + const struct lport_index *lports, struct hmap > > *flow_table) > > I think the comment describing this function didn't get finished. Oops. Revised: /* Adds an OpenFlow flow to 'flow_table' for each MAC binding in the OVN * southbound database, using 'lports' to resolve logical port names to * numbers. */ > > diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c > > index 70088f6..2261475 100644 > > --- a/ovn/controller/pinctrl.c > > +++ b/ovn/controller/pinctrl.c > > > > @@ -137,6 +161,10 @@ pinctrl_handle_arp(const struct flow *ip_flow, struct > > ofpbuf *userdata) > > goto exit; > > } > > > > + struct ds s = DS_EMPTY_INITIALIZER; > > + ofpacts_format(ofpacts.data, ofpacts.size, &s); > > + ds_destroy(&s); > > Is this doing anything useful? No. Deleted. > > @@ -183,21 +212,24 @@ process_packet_in(const struct ofp_header *msg) > > struct flow headers; > > flow_extract(&packet, &headers); > > > > - const struct flow *md = &pin.flow_metadata.flow; > > switch (ntohl(ah->opcode)) { > > ... > > default: > > - VLOG_WARN_RL(&rl, "unrecognized packet-in command %#"PRIx32, > > - md->regs[0]); > > + VLOG_WARN_RL(&rl, "unrecognized packet-in opcode %"PRIu32, > > ah->opcode); > > Does "ah->opcode" need an ntohl() around it? Yes. Added. (Too bad sparse doesn't report these.) > > +static void > > +pinctrl_handle_put_arp(const struct lport_index *lports, > > + const struct flow *md, const struct flow *headers) > > +{ > > + if (n_put_arps >= ARRAY_SIZE(put_arps)) { > > + return; > > + } > > I don't think we've defined any coverage counters for OVN yet, but I > wonder if we should start doing that. These sorts of silent errors > might be good candidates. OK, I added one here, "pinctrl_drop_put_arp". > > ... > > + struct put_arp *pa = &put_arps[n_put_arps++]; > > + pa->timestamp = time_msec(); > > + pa->logical_port = xstrdup(pb->logical_port); > > + pa->ip = htonl(md->regs[0]); > > + pa->mac = headers->dl_src; > > This code just uses and array to store the requests, but I worry that > if there are too many ARP replies that this queue could overrun. What > about using a hash table instead? In the case of multiple identical > put_arps, that approach would also lessen the impact of the fairly > expensive loop in run_put_arps() that looks for duplicates. OK, I changed this to a hash table. It was not difficult, but it was a large change. I think it's OK, but I'd still appreciate it if you'd take another look at the commit I applied just to be sure. > > +static void > > +put_load(uint64_t value, enum mf_field_id dst, int ofs, int n_bits, > > + struct ofpbuf *ofpacts) > > +{ > > + struct ofpact_set_field *sf = ofpact_put_SET_FIELD(ofpacts); > > + sf->field = mf_from_id(dst); > > + sf->flow_has_vlan = false; > > + > > + ovs_be64 n_value = htonll(value); > > + bitwise_copy(&n_value, 8, 0, &sf->value, sf->field->n_bytes, ofs, > > n_bits); > > + bitwise_one(&sf->mask, sf->field->n_bytes, ofs, n_bits); > > +} > > We discussed this off-line, but there are three put_load() functions > in the "ovn" directory--two of which have identical implementations. > Similarly there's an emit_resubmit() in "ovn/lib/actions.c" and a > put_resubmit() in "ovn/controller/physical.c" with identical > implementations but slightly different arguments. I wonder if it > would make sense to start putting these into a library. It's about time. > > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > > index 1b2912e..4685143 100644 > > --- a/ovn/northd/ovn-northd.8.xml > > +++ b/ovn/northd/ovn-northd.8.xml > > > > > > > > @@ -602,64 +617,73 @@ icmp4 { > > <ul> > > <li> > > <p> > > - Known MAC bindings. For each IP address <var>A</var> whose host > > is > > - known to have Ethernet address <var>HE</var> and reside on router > > - port <var>P</var> with Ethernet address <var>PE</var>, a > > priority-200 > > - flow with match <code>reg0 == <var>A</var></code> has the > > following > > - actions: > > + Static MAC bindings. MAC bindings can be known statically based > > on > > + data in the <code>OVN_Northbound</code> database. For router > > ports > > + connected to logical switches, MAC bindings can be known > > statically > > + from the <code>addresses</code> column in the > > + <code>Logical_Port</code> table. For router ports connected to > > other > > + logical routers, MAC bindings can be known statically from the > > I wonder if it would be clearer to say "For routed ports" instead of > "For router ports" in these two sentence, since they're different from > "logical router ports" but sound similar. I don't understand the distinction you're making. These sentences talk about how a logical router can be aware of static MAC-IP bindings on its logical router ports. What do you mean? > > + Dynamic MAC bindings. This flows resolves MAC-to-IP bindings > > that > > + have become known dynamically through ARP. (The next table will > > + issue an ARP request for cases where the binding is not yet > > known.) > > + </p> > > It might be nice to point out that the packet that is being resolved > is effectively dropped. Yes, but the next table, for sending ARP requests, is the right place to do that. > > + Unknown MAC address. A priority-100 flow with match > > <code>eth.dst == > > + 00:00:00:00:00:00</code> has the following actions: > > </p> > > > > <pre> > > +rate_limit(outport, ip4.dst); > > It feels weird to have an action listed here that we don't support, > but it's nice to have a reference where it's needed. Maybe leave it > here, but then add a note at the bottom that it still needs to be > implemented? This was just overlooked. I deleted it for now. I don't think rate-limiting is in fact the biggest issue here. > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > > index e6271cf..d511b4d 100644 > > --- a/ovn/northd/ovn-northd.c > > +++ b/ovn/northd/ovn-northd.c > > > > @@ -389,17 +391,18 @@ join_datapaths(struct northd_context *ctx, struct > > hmap *datapaths, > > ... > > + /* Set the gateway port to NULL. If there is a gateway, it will > > get > > + * filled in as we go through the ports later. */ > > + od->gateway_port = NULL; > > I think this requires that build_datapaths() be called before > build_ports(). The code does that, but do you think it's worth > mentioning that in comments describing the functions? OK, I added function-level comments for build_datapaths() and build_ports(). > Thanks for working on this--it's pretty cool! > > Acked-by: Justin Pettit <jpet...@ovn.org> Thanks. I applied this to master, with the following incremental folded in. --8<--------------------------cut here-------------------------->8-- diff --git a/ovn/TODO b/ovn/TODO index 07d0680..0a6225d 100644 --- a/ovn/TODO +++ b/ovn/TODO @@ -4,11 +4,6 @@ ** New OVN logical actions -*** rate_limit - -TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to -one per second for a given target. We might need to do this too. - *** icmp4 { action... } Generates an ICMPv4 packet based on the current IPv4 packet and @@ -61,6 +56,9 @@ using ARP. From casual observation, Linux appears to generate at most one ARP per second per destination. +This might be supported by adding a new OVN logical action for +rate-limiting. + *** Tracking queries It's probably best to only record in the database responses to queries @@ -73,6 +71,9 @@ into the database. Something needs to make sure that bindings remain valid and expire those that become stale. +One way to do this might be to add some support for time to the +database server itself. + *** Table size limiting. The table of MAC bindings must not be allowed to grow unreasonably diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index b5d00f7..48bb9c8 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -363,7 +363,9 @@ put_load(const uint8_t *data, size_t len, bitwise_one(&sf->mask, sf->field->n_bytes, ofs, n_bits); } -/* Adds a flow to table */ +/* Adds an OpenFlow flow to 'flow_table' for each MAC binding in the OVN + * southbound database, using 'lports' to resolve logical port names to + * numbers. */ static void add_neighbor_flows(struct controller_ctx *ctx, const struct lport_index *lports, struct hmap *flow_table) diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c index 5406ba0..3fcab99 100644 --- a/ovn/controller/pinctrl.c +++ b/ovn/controller/pinctrl.c @@ -17,6 +17,7 @@ #include "pinctrl.h" +#include "coverage.h" #include "dirs.h" #include "dp-packet.h" #include "lport.h" @@ -44,18 +45,23 @@ static struct rconn *swconn; * rconn_get_connection_seqno(rconn), 'swconn' has reconnected. */ static unsigned int conn_seq_no; -static void pinctrl_handle_put_arp(const struct lport_index *, - const struct flow *md, +static void pinctrl_handle_put_arp(const struct flow *md, const struct flow *headers); -static void flush_put_arps(void); -static void run_put_arps(struct controller_ctx *); +static void init_put_arps(void); +static void destroy_put_arps(void); +static void run_put_arps(struct controller_ctx *, + const struct lport_index *lports); static void wait_put_arps(struct controller_ctx *); +static void flush_put_arps(void); + +COVERAGE_DEFINE(pinctrl_drop_put_arp); void pinctrl_init(void) { swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP13_VERSION); conn_seq_no = 0; + init_put_arps(); } static ovs_be32 @@ -169,10 +175,6 @@ pinctrl_handle_arp(const struct flow *ip_flow, const struct match *md, goto exit; } - struct ds s = DS_EMPTY_INITIALIZER; - ofpacts_format(ofpacts.data, ofpacts.size, &s); - ds_destroy(&s); - struct ofputil_packet_out po = { .packet = dp_packet_data(&packet), .packet_len = dp_packet_size(&packet), @@ -190,8 +192,7 @@ exit: } static void -process_packet_in(const struct lport_index *lports, - const struct ofp_header *msg) +process_packet_in(const struct ofp_header *msg) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); @@ -226,18 +227,18 @@ process_packet_in(const struct lport_index *lports, break; case ACTION_OPCODE_PUT_ARP: - pinctrl_handle_put_arp(lports, &pin.flow_metadata.flow, &headers); + pinctrl_handle_put_arp(&pin.flow_metadata.flow, &headers); break; default: - VLOG_WARN_RL(&rl, "unrecognized packet-in opcode %"PRIu32, ah->opcode); + VLOG_WARN_RL(&rl, "unrecognized packet-in opcode %"PRIu32, + ntohl(ah->opcode)); break; } } static void -pinctrl_recv(const struct lport_index *lports, - const struct ofp_header *oh, enum ofptype type) +pinctrl_recv(const struct ofp_header *oh, enum ofptype type) { if (type == OFPTYPE_ECHO_REQUEST) { queue_msg(make_echo_reply(oh)); @@ -250,7 +251,7 @@ pinctrl_recv(const struct lport_index *lports, config.miss_send_len = UINT16_MAX; set_switch_config(swconn, &config); } else if (type == OFPTYPE_PACKET_IN) { - process_packet_in(lports, oh); + process_packet_in(oh); } else if (type != OFPTYPE_ECHO_REPLY && type != OFPTYPE_BARRIER_REPLY) { if (VLOG_IS_DBG_ENABLED()) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(30, 300); @@ -300,12 +301,12 @@ pinctrl_run(struct controller_ctx *ctx, const struct lport_index *lports, enum ofptype type; ofptype_decode(&type, oh); - pinctrl_recv(lports, oh, type); + pinctrl_recv(oh, type); ofpbuf_delete(msg); } } - run_put_arps(ctx); + run_put_arps(ctx, lports); } void @@ -320,7 +321,7 @@ void pinctrl_destroy(void) { rconn_destroy(swconn); - flush_put_arps(); + destroy_put_arps(); } /* Implementation of the "put_arp" OVN action. This action sends a packet to @@ -333,103 +334,153 @@ pinctrl_destroy(void) * we buffer up a few put_arps (but we don't keep them longer than 1 second) * and apply them whenever a database transaction is available. */ -/* Buffered "put_arp" operations. */ +/* Buffered "put_arp" operation. */ struct put_arp { + struct hmap_node hmap_node; /* In 'put_arps'. */ + long long int timestamp; /* In milliseconds. */ - char *logical_port; + + /* Key. */ + uint32_t dp_key; + uint32_t port_key; ovs_be32 ip; + + /* Value. */ struct eth_addr mac; }; -static struct put_arp put_arps[1024]; -static size_t n_put_arps; + +/* Contains "struct put_arp"s. */ +static struct hmap put_arps; static void -pinctrl_handle_put_arp(const struct lport_index *lports, - const struct flow *md, const struct flow *headers) +init_put_arps(void) { - if (n_put_arps >= ARRAY_SIZE(put_arps)) { - return; + hmap_init(&put_arps); +} + +static void +destroy_put_arps(void) +{ + flush_put_arps(); + hmap_destroy(&put_arps); +} + +static struct put_arp * +pinctrl_find_put_arp(uint32_t dp_key, uint32_t port_key, ovs_be32 ip, + uint32_t hash) +{ + struct put_arp *pa; + HMAP_FOR_EACH_WITH_HASH (pa, hmap_node, hash, &put_arps) { + if (pa->dp_key == dp_key + && pa->port_key == port_key + && pa->ip == ip) { + return pa; + } } + return NULL; +} - /* Convert logical datapath and logical port key into lport. */ +static void +pinctrl_handle_put_arp(const struct flow *md, const struct flow *headers) +{ uint32_t dp_key = ntohll(md->metadata); uint32_t port_key = md->regs[MFF_LOG_INPORT - MFF_REG0]; - const struct sbrec_port_binding *pb - = lport_lookup_by_key(lports, dp_key, port_key); - if (!pb) { - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + ovs_be32 ip = htonl(md->regs[0]); + uint32_t hash = hash_3words(dp_key, port_key, (OVS_FORCE uint32_t) ip); + struct put_arp *pa = pinctrl_find_put_arp(dp_key, port_key, ip, hash); + if (!pa) { + if (hmap_count(&put_arps) >= 1000) { + COVERAGE_INC(pinctrl_drop_put_arp); + return; + } - VLOG_WARN_RL(&rl, "unknown logical port with datapath %"PRIu32" and " - "port %"PRIu32, dp_key, port_key); - return; + pa = xmalloc(sizeof *pa); + hmap_insert(&put_arps, &pa->hmap_node, hash); + pa->dp_key = dp_key; + pa->port_key = port_key; + pa->ip = ip; } - - struct put_arp *pa = &put_arps[n_put_arps++]; pa->timestamp = time_msec(); - pa->logical_port = xstrdup(pb->logical_port); - pa->ip = htonl(md->regs[0]); pa->mac = headers->dl_src; } static void -flush_put_arps(void) +run_put_arp(struct controller_ctx *ctx, const struct lport_index *lports, + const struct put_arp *pa) { - for (struct put_arp *pa = put_arps; pa < &put_arps[n_put_arps]; pa++) { - free(pa->logical_port); + if (time_msec() > pa->timestamp + 1000) { + return; } - n_put_arps = 0; -} -static void -run_put_arps(struct controller_ctx *ctx) -{ - if (!ctx->ovnsb_idl_txn) { + /* Convert logical datapath and logical port key into lport. */ + const struct sbrec_port_binding *pb + = lport_lookup_by_key(lports, pa->dp_key, pa->port_key); + if (!pb) { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); + + VLOG_WARN_RL(&rl, "unknown logical port with datapath %"PRIu32" " + "and port %"PRIu32, pa->dp_key, pa->port_key); return; } - for (const struct put_arp *pa = put_arps; pa < &put_arps[n_put_arps]; - pa++) { - if (time_msec() > pa->timestamp + 1000) { - continue; - } + /* Convert arguments to string form for database. */ + char ip_string[INET_ADDRSTRLEN + 1]; + snprintf(ip_string, sizeof ip_string, IP_FMT, IP_ARGS(pa->ip)); + + char mac_string[ETH_ADDR_STRLEN + 1]; + snprintf(mac_string, sizeof mac_string, + ETH_ADDR_FMT, ETH_ADDR_ARGS(pa->mac)); - /* Convert arguments to string form for database. */ - char ip_string[INET_ADDRSTRLEN + 1]; - snprintf(ip_string, sizeof ip_string, IP_FMT, IP_ARGS(pa->ip)); - char mac_string[ETH_ADDR_STRLEN + 1]; - snprintf(mac_string, sizeof mac_string, - ETH_ADDR_FMT, ETH_ADDR_ARGS(pa->mac)); - - /* Check for and update an existing IP-MAC binding for this logical - * port. - * - * XXX This is not very efficient. */ - const struct sbrec_mac_binding *b; - SBREC_MAC_BINDING_FOR_EACH (b, ctx->ovnsb_idl) { - if (!strcmp(b->logical_port, pa->logical_port) - && !strcmp(b->ip, ip_string)) { - if (strcmp(b->mac, mac_string)) { - sbrec_mac_binding_set_mac(b, mac_string); - } - goto next; + /* Check for and update an existing IP-MAC binding for this logical + * port. + * + * XXX This is not very efficient. */ + const struct sbrec_mac_binding *b; + SBREC_MAC_BINDING_FOR_EACH (b, ctx->ovnsb_idl) { + if (!strcmp(b->logical_port, pb->logical_port) + && !strcmp(b->ip, ip_string)) { + if (strcmp(b->mac, mac_string)) { + sbrec_mac_binding_set_mac(b, mac_string); } + return; } + } + + /* Add new IP-MAC binding for this logical port. */ + b = sbrec_mac_binding_insert(ctx->ovnsb_idl_txn); + sbrec_mac_binding_set_logical_port(b, pb->logical_port); + sbrec_mac_binding_set_ip(b, ip_string); + sbrec_mac_binding_set_mac(b, mac_string); +} - /* Add new IP-MAC binding for this logical port. */ - b = sbrec_mac_binding_insert(ctx->ovnsb_idl_txn); - sbrec_mac_binding_set_logical_port(b, pa->logical_port); - sbrec_mac_binding_set_ip(b, ip_string); - sbrec_mac_binding_set_mac(b, mac_string); - next:; +static void +run_put_arps(struct controller_ctx *ctx, const struct lport_index *lports) +{ + if (!ctx->ovnsb_idl_txn) { + return; } + const struct put_arp *pa; + HMAP_FOR_EACH (pa, hmap_node, &put_arps) { + run_put_arp(ctx, lports, pa); + } flush_put_arps(); } static void wait_put_arps(struct controller_ctx *ctx) { - if (ctx->ovnsb_idl_txn && n_put_arps) { + if (ctx->ovnsb_idl_txn && !hmap_is_empty(&put_arps)) { poll_immediate_wake(); } } + +static void +flush_put_arps(void) +{ + struct put_arp *pa, *next; + HMAP_FOR_EACH_SAFE (pa, next, hmap_node, &put_arps) { + hmap_remove(&put_arps, &pa->hmap_node); + free(pa); + } +} diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 0e95828..79fe153 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -685,7 +685,6 @@ icmp4 { </p> <pre> -rate_limit(outport, ip4.dst); arp { eth.dst = ff:ff:ff:ff:ff:ff; arp.spa = reg1; @@ -698,6 +697,10 @@ arp { (Ingress table 2 initialized <code>reg1</code> with the IP address owned by <code>outport</code>.) </p> + + <p> + The IP packet that triggers the ARP request is dropped. + </p> </li> <li> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 3beb39a..d707965 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -414,6 +414,11 @@ ovn_datapath_allocate_key(struct hmap *dp_tnlids) return allocate_tnlid(dp_tnlids, "datapath", (1u << 24) - 1, &hint); } +/* Updates the southbound Datapath_Binding table so that it contains the + * logical switches and routers specified by the northbound database. + * + * Initializes 'datapaths' to contain a "struct ovn_datapath" for every logical + * switch and router. */ static void build_datapaths(struct northd_context *ctx, struct hmap *datapaths) { @@ -701,6 +706,12 @@ ovn_port_update_sbrec(const struct ovn_port *op) } } +/* Updates the southbound Port_Binding table so that it contains the logical + * ports specified by the northbound database. + * + * Initializes 'ports' to contain a "struct ovn_port" for every logical port, + * using the "struct ovn_datapath"s in 'datapaths' to look up logical + * datapaths. */ static void build_ports(struct northd_context *ctx, struct hmap *datapaths, struct hmap *ports) _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev