Manohar Krishnappa Chidambaraswamy <manohar.krishnappa.chidambarasw...@ericsson.com> writes:
> Jan,
>
> Have addressed your comments/suggestions. Please take a look and let me
> know if you have any more comments.
>
> Signed-off-by: Manohar K C
> <manohar.krishnappa.chidambarasw...@ericsson.com>
> CC: Jan Scheurich <jan.scheur...@ericsson.com>
> ---

Hi Jan, and Manohar,

Have you considered making this token bucket per-port instead of
per-pmd?  As I read it, a greedy port can exhaust all the tokens from a
particular PMD, possibly leading to unfair performance for that PMD
thread.  Am I just being overly paranoid?

How are you stress testing this?  Is there some framework you're using?
I'm interested in possibly implementing a similar rate-limiter from the
kernel side (for parity), but want to see if it's really a problem with
the netlink datapath first.

-Aaron

> v3 : v2 rebased to master and adapted to dpif-netdev-perf counters.
>      https://patchwork.ozlabs.org/patch/909676/
>
> v2 : v1 rebased to master.
>      https://patchwork.ozlabs.org/patch/860687/
>
> v1 : Initial patch for upcall rate-limiter based on token-bucket.
>      https://patchwork.ozlabs.org/patch/836737/
>
>  Documentation/howto/dpdk.rst |  53 ++++++++++++++++++
>  lib/dpif-netdev-perf.h       |   1 +
>  lib/dpif-netdev.c            | 130 ++++++++++++++++++++++++++++++++++++++++---
>  lib/dpif.h                   |   1 +
>  vswitchd/vswitch.xml         |  42 ++++++++++++++
>  5 files changed, 220 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 380181d..b717804 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -358,6 +358,59 @@ devices to bridge ``br0``. Once complete, follow the below steps:
>
>      $ cat /proc/interrupts | grep virtio
>
> +Upcall rate limiting
> +--------------------
> +In OVS, when an incoming packet does not match any flow in the flow
> +table maintained in the fast-path (either in userspace or kernel), the
> +packet is processed in the slow-path via a mechanism known as an upcall.
> +In OVS-DPDK, both fast-path and slow-path execute in the context of a
> +common poll-mode-driver (PMD) thread, without any partitioning of CPU
> +cycles between the two.
> +
> +When there is a burst of new flows coming into the data-path, packets
> +are punted to the slow-path in the order they are received and the PMD
> +is busy for the duration of the upcall. Slow-path processing of a packet
> +consumes 100-200 times the cycles of fast-path handling. As a result,
> +the forwarding performance of a PMD degrades significantly during an
> +upcall burst. If the PMD was already highly loaded, it becomes temporarily
> +overloaded and its rx queues start filling up. If the upcall burst is long
> +enough, packets will be dropped when rx queues are full. This happens even
> +if the new flows are unexpected and the slow-path decides to drop the
> +packets.
> +
> +It is likely that most of the packets dropped due to rx queue overflow
> +belong to established flows that should have been processed by the
> +fast-path. Hence, the current OVS-DPDK architecture favors the handling
> +of new flows over the forwarding of established flows. This is generally
> +a sub-optimal approach.
> +
> +Without a limit on the rate of upcalls, OVS-DPDK is vulnerable to DoS
> +attacks. But even sporadic bursts of e.g. unexpected multicast packets
> +have been shown to cause such packet drops.
> +
> +ovs-vsctl can be used to enable and configure the upcall rate limit
> +parameters. There are two configurable parameters, ``upcall-rate`` and
> +``upcall-burst``, which take effect when the configuration parameter
> +``upcall-rl`` is set to true.
> +
> +Upcall rate-limiting is done independently at the PMD level. The
> +configured values for ``upcall-rate`` and ``upcall-burst`` are applied
> +independently in each PMD (and non-PMD) thread that executes upcalls.
> +
> +The upcall rate should be set using ``upcall-rate`` in upcalls-per-sec.
> +If not configured, a default value of 500 upcalls-per-sec is used.
> +For example::
> +
> +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
> +
> +The upcall burst size should be set using ``upcall-burst`` in upcalls.
> +If not configured, a default value of 500 upcalls is used. For example::
> +
> +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
> +
> +The upcall rate-limit feature should be globally enabled using
> +``upcall-rl``. For example::
> +
> +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
> +
>  Further Reading
>  ---------------
>
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 5993c25..e4b945a 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -55,6 +55,7 @@ enum pmd_stat_type {
>                              * number of packet passes through the datapath
>                              * pipeline and should not be overlapping with each
>                              * other. */
> +    PMD_STAT_RL_DROP,       /* Packets dropped due to upcall rate-limit. */
>      PMD_STAT_MASKED_LOOKUP, /* Number of subtable lookups for flow table
>                                 hits. Each MASKED_HIT hit will have >= 1
>                                 MASKED_LOOKUP(s). */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index be31fd0..dba8629 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -101,6 +101,11 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
>
>  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
>
> +/* These default values may be tuned based on upcall performance */
> +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
> +#define UPCALL_RATE_DEFAULT 500        /* upcalls-per-sec */
> +#define UPCALL_BURST_DEFAULT 500       /* maximum burst of upcalls */
> +
>  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
>                                       | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
>                                       | CS_SRC_NAT | CS_DST_NAT)
> @@ -316,6 +321,12 @@ struct dp_netdev {
>      uint64_t last_tnl_conf_seq;
>
>      struct conntrack conntrack;
> +
> +    /* Upcall rate-limiter parameters */
> +    atomic_bool upcall_ratelimit;
> +    atomic_uint32_t upcall_rate;
> +    atomic_uint32_t upcall_burst;
> +    bool upcall_params_changed;
>  };
>
>  static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
> @@ -615,6 +626,11 @@ struct dp_netdev_pmd_thread {
>      /* Keep track of detailed PMD performance statistics. */
>      struct pmd_perf_stats perf_stats;
>
> +    /* Policer to rate limit upcalls */
> +    struct token_bucket upcall_tb; /* Cache of rate and burst parameters. */
> +    bool upcall_ratelimit;         /* Cache of global enable/disable
> +                                      parameter. */
> +
>      /* Set to true if the pmd thread needs to be reloaded. */
>      bool need_reload;
>  };
> @@ -856,12 +872,13 @@ pmd_info_show_stats(struct ds *reply,
>                    "\tavg. subtable lookups per megaflow hit: %.02f\n"
>                    "\tmiss with success upcall: %"PRIu64"\n"
>                    "\tmiss with failed upcall: %"PRIu64"\n"
> +                  "\tdrops due to upcall ratelimiter: %"PRIu64"\n"
>                    "\tavg. packets per output batch: %.02f\n",
>                    total_packets, stats[PMD_STAT_RECIRC],
>                    passes_per_pkt, stats[PMD_STAT_EXACT_HIT],
>                    stats[PMD_STAT_MASKED_HIT], lookups_per_hit,
>                    stats[PMD_STAT_MISS], stats[PMD_STAT_LOST],
> -                  packets_per_batch);
> +                  stats[PMD_STAT_RL_DROP], packets_per_batch);
>
>      if (total_cycles == 0) {
>          return;
> @@ -1479,6 +1496,7 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
>          stats->n_hit += pmd_stats[PMD_STAT_MASKED_HIT];
>          stats->n_missed += pmd_stats[PMD_STAT_MISS];
>          stats->n_lost += pmd_stats[PMD_STAT_LOST];
> +        stats->n_rl_dropped += pmd_stats[PMD_STAT_RL_DROP];
>      }
>      stats->n_masks = UINT32_MAX;
>      stats->n_mask_hit = UINT64_MAX;
> @@ -1489,11 +1507,24 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
>  static void
>  dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
>  {
> +    uint32_t rate, burst;
> +
>      if (pmd->core_id == NON_PMD_CORE_ID) {
>          ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
>          ovs_mutex_lock(&pmd->port_mutex);
>          pmd_load_cached_ports(pmd);
>          ovs_mutex_unlock(&pmd->port_mutex);
> +
> +        /* Reconfig the upcall policer if params have changed */
> +        atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
> +        atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
> +        if ((rate != pmd->upcall_tb.rate) ||
> +            (burst != pmd->upcall_tb.burst)) {
> +            token_bucket_init(&pmd->upcall_tb, rate, burst);
> +        }
> +        atomic_read_relaxed(&pmd->dp->upcall_ratelimit,
> +                            &pmd->upcall_ratelimit);
> +
>          ovs_mutex_unlock(&pmd->dp->non_pmd_mutex);
>          return;
>      }
> @@ -2987,6 +3018,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>                           DEFAULT_EM_FLOW_INSERT_INV_PROB);
>      uint32_t insert_min, cur_min;
>      uint32_t tx_flush_interval, cur_tx_flush_interval;
> +    uint32_t rate, cur_rate;
> +    uint32_t burst, cur_burst;
> +    bool ratelimit, cur_ratelimit;
>
>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
>                                       DEFAULT_TX_FLUSH_INTERVAL);
> @@ -3021,6 +3055,44 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>          }
>      }
>
> +    /* Handle upcall policer params */
> +    ratelimit = smap_get_bool(other_config, "upcall-rl",
> +                              UPCALL_RATELIMIT_DEFAULT);
> +    rate = smap_get_int(other_config, "upcall-rate",
> +                        UPCALL_RATE_DEFAULT);
> +    burst = smap_get_int(other_config, "upcall-burst",
> +                         UPCALL_BURST_DEFAULT);
> +
> +    atomic_read_relaxed(&dp->upcall_ratelimit, &cur_ratelimit);
> +    atomic_read_relaxed(&dp->upcall_rate, &cur_rate);
> +    atomic_read_relaxed(&dp->upcall_burst, &cur_burst);
> +
> +    if ((rate != cur_rate) || (burst != cur_burst)) {
> +        VLOG_INFO("Upcall ratelimiter params changed : Old - rate=%d burst=%d "
> +                  ": New - rate=%d burst=%d\n", cur_rate, cur_burst,
> +                  rate, burst);
> +
> +        atomic_store_relaxed(&dp->upcall_rate, rate);
> +        atomic_store_relaxed(&dp->upcall_burst, burst);
> +
> +        dp->upcall_params_changed = true;
> +    }
> +
> +    if (ratelimit != cur_ratelimit) {
> +        VLOG_INFO("Upcall ratelimiter changed to %s\n",
> +                  (ratelimit ? "Enabled" : "Disabled"));
> +        VLOG_INFO("Upcall ratelimiter params : rate=%d burst=%d\n",
> +                  rate, burst);
> +
> +        atomic_store_relaxed(&dp->upcall_ratelimit, ratelimit);
> +
> +        dp->upcall_params_changed = true;
> +    }
> +
> +    if (dp->upcall_params_changed) {
> +        dp_netdev_request_reconfigure(dp);
> +    }
> +
>      return 0;
>  }
>
> @@ -3653,6 +3725,7 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
>      struct hmapx_node *node;
>      bool changed = false;
>      bool need_to_adjust_static_tx_qids = false;
> +    bool need_to_reinit_upcall_ratelimiter = false;
>
>      /* The pmd threads should be started only if there's a pmd port in the
>       * datapath. If the user didn't provide any "pmd-cpu-mask", we start
> @@ -3674,6 +3747,13 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
>          need_to_adjust_static_tx_qids = true;
>      }
>
> +    /* Check if upcall ratelimiter parameters have changed */
> +    if (dp->upcall_params_changed) {
> +        need_to_reinit_upcall_ratelimiter = true;
> +
> +        dp->upcall_params_changed = false;
> +    }
> +
>      /* Check for unwanted pmd threads */
>      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>          if (pmd->core_id == NON_PMD_CORE_ID) {
> @@ -3682,7 +3762,8 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
>          if (!ovs_numa_dump_contains_core(pmd_cores, pmd->numa_id,
>                                           pmd->core_id)) {
>              hmapx_add(&to_delete, pmd);
> -        } else if (need_to_adjust_static_tx_qids) {
> +        } else if (need_to_adjust_static_tx_qids ||
> +                   need_to_reinit_upcall_ratelimiter) {
>              pmd->need_reload = true;
>          }
>      }
> @@ -4106,6 +4187,7 @@ pmd_thread_main(void *f_)
>      int poll_cnt;
>      int i;
>      int process_packets = 0;
> +    uint32_t rate, burst;
>
>      poll_list = NULL;
>
> @@ -4135,6 +4217,15 @@ reload:
>          lc = UINT_MAX;
>      }
>
> +    /* Reconfig upcall policer token bucket with configured params. */
> +    atomic_read_relaxed(&pmd->dp->upcall_rate, &rate);
> +    atomic_read_relaxed(&pmd->dp->upcall_burst, &burst);
> +    if ((rate != pmd->upcall_tb.rate) ||
> +        (burst != pmd->upcall_tb.burst)) {
> +        token_bucket_init(&pmd->upcall_tb, rate, burst);
> +    }
> +    atomic_read_relaxed(&pmd->dp->upcall_ratelimit, &pmd->upcall_ratelimit);
> +
>      pmd->intrvl_tsc_prev = 0;
>      atomic_store_relaxed(&pmd->intrvl_cycles, 0);
>      cycles_counter_update(s);
> @@ -4596,6 +4687,8 @@ static void
>  dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>                          unsigned core_id, int numa_id)
>  {
> +    uint32_t rate, burst;
> +
>      pmd->dp = dp;
>      pmd->core_id = core_id;
>      pmd->numa_id = numa_id;
> @@ -4627,6 +4720,13 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
>          emc_cache_init(&pmd->flow_cache);
>          pmd_alloc_static_tx_qid(pmd);
>      }
> +
> +    /* Initialize upcall policer token bucket with configured params */
> +    atomic_read_relaxed(&dp->upcall_rate, &rate);
> +    atomic_read_relaxed(&dp->upcall_burst, &burst);
> +    token_bucket_init(&pmd->upcall_tb, rate, burst);
> +    atomic_read_relaxed(&dp->upcall_ratelimit, &pmd->upcall_ratelimit);
> +
>      pmd_perf_stats_init(&pmd->perf_stats);
>      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
>                  hash_int(core_id, 0));
> @@ -5145,6 +5245,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
>      struct dp_netdev *dp = pmd->dp;
>      int upcall_ok_cnt = 0, upcall_fail_cnt = 0;
> +    int upcall_rl_drop_cnt = 0;
>      int lookup_cnt = 0, add_lookup_cnt;
>      bool any_miss;
>
> @@ -5185,13 +5286,26 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>              continue;
>          }
>
> -        int error = handle_packet_upcall(pmd, packet, &keys[i],
> -                                         &actions, &put_actions);
> +        /*
> +         * Grab a token from the upcall policer to enter slowpath. If token
> +         * is not available, drop and account the packet. This is to
> +         * rate-limit packets getting into slowpath.
> +         */
> +        if (pmd->upcall_ratelimit &&
> +            !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
> +            dp_packet_delete(packet);
>
> -        if (OVS_UNLIKELY(error)) {
> -            upcall_fail_cnt++;
> +            upcall_rl_drop_cnt++;
>          } else {
> -            upcall_ok_cnt++;
> +
> +            int error = handle_packet_upcall(pmd, packet, &keys[i],
> +                                             &actions, &put_actions);
> +
> +            if (OVS_UNLIKELY(error)) {
> +                upcall_fail_cnt++;
> +            } else {
> +                upcall_ok_cnt++;
> +            }
>          }
>      }
>
> @@ -5228,6 +5342,8 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>                              upcall_ok_cnt);
>      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_LOST,
>                              upcall_fail_cnt);
> +    pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RL_DROP,
> +                            upcall_rl_drop_cnt);
>  }
>
>  /* Packets enter the datapath from a port (or from recirculation) here.
> diff --git a/lib/dpif.h b/lib/dpif.h
> index 94f89ec..708e798 100644
> --- a/lib/dpif.h
> +++ b/lib/dpif.h
> @@ -438,6 +438,7 @@ struct dpif_dp_stats {
>      uint64_t n_hit;        /* Number of flow table matches. */
>      uint64_t n_missed;     /* Number of flow table misses. */
>      uint64_t n_lost;       /* Number of misses not sent to userspace. */
> +    uint64_t n_rl_dropped; /* Number of drops due to upcall rate-limit. */
>      uint64_t n_flows;      /* Number of flows present. */
>      uint64_t n_mask_hit;   /* Number of mega flow masks visited for
>                                flow table matches. */
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 94a64dd..858ba4e 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -428,6 +428,48 @@
>          </p>
>        </column>
>
> +      <column name="other_config" key="upcall-rl"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to enable upcall rate-limiting.
> +          The upcall parameters like rate and burst will be ignored if this
> +          is not set.
> +        </p>
> +        <p>
> +          The default value is <code>false</code> and upcall rate-limiting
> +          will be disabled.
> +        </p>
> +      </column>
> +
> +      <column name="other_config" key="upcall-rate"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> +        <p>
> +          When upcall rate limiting is enabled, this specifies the sustained
> +          rate of upcalls in upcalls per second that are admitted per
> +          packet-polling thread (PMD or non-PMD) in the netdev datapath.
> +          The default value is 500 upcalls per second.
> +
> +          Setting both upcall-rate and upcall-burst to <code>0</code>
> +          disables upcalls in the netdev datapath. This can be used for
> +          debugging purposes.
> +        </p>
> +      </column>
> +
> +      <column name="other_config" key="upcall-burst"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> +        <p>
> +          When upcall rate limiting is enabled, this specifies the maximum
> +          burst of upcalls that are admitted per packet-polling thread
> +          (PMD or non-PMD) in the netdev datapath, independent of their
> +          packet arrival rate. The default value is a burst of 500 upcalls.
> +
> +          Setting both upcall-rate and upcall-burst to <code>0</code>
> +          disables upcalls in the netdev datapath. This can be used for
> +          debugging purposes.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="vlan-limit"
>                type='{"type": "integer", "minInteger": 0}'>
>          <p>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev