On Thu, 11 Feb 2016 12:32:56 +0000 "Stokes, Ian" <ian.sto...@intel.com> wrote:
> Thanks For the review Flavio, much appreciated, comments inline. I'll re-spin > a new version also. Thank you for the patch! comments inline. > > > -----Original Message----- > > From: Flavio Leitner [mailto:f...@sysclose.org] > > Sent: Wednesday, February 10, 2016 7:55 PM > > To: Stokes, Ian > > Cc: dev@openvswitch.org > > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality. > > > > On Mon, 1 Feb 2016 20:47:25 +0000 > > Ian Stokes <ian.sto...@intel.com> wrote: > > > > > This patch provides the modifications required in netdev-dpdk.c and > > > vswitch.xml to allow for a DPDK user space QoS algorithm. > > > > > > This patch adds a QoS configuration structure for netdev-dpdk and > > > expected QoS operations 'dpdk_qos_ops'. Various helper functions are > > > also supplied. > > > > > > Also included are the modifications required for vswitch.xml to allow > > > a new QoS implementation for netdev-dpdk devices. This includes a new > > > QoS type `egress-policer` as well as its expected QoS table entries. > > > > > > The QoS functionality implemented for DPDK devices is `egress- > > policer`. > > > This can be used to drop egress packets at a configurable rate. > > > > > > The INSTALL.DPDK.md guide has also been modified to provide an example > > > configuration of `egress-policer` QoS. > > > > > > Signed-off-by: Ian Stokes <ian.sto...@intel.com> > > > --- > > > INSTALL.DPDK.md | 20 +++ > > > lib/netdev-dpdk.c | 439 > > ++++++++++++++++++++++++++++++++++++++++++++++++-- > > > vswitchd/vswitch.xml | 52 ++++++ > > > 3 files changed, 498 insertions(+), 13 deletions(-) > > > > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index e8ef4b5..dac70cd > > > 100644 > > > --- a/INSTALL.DPDK.md > > > +++ b/INSTALL.DPDK.md > > > @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd: > > > ./ovs-ofctl add-flow br0 in_port=2,action=output:1 > > > ``` > > > > > > +8. QoS usage example > > > + > > > + Assuming you have a vhost-user port transmitting traffic > > consisting of > > > + packets of size 64 bytes, the following command would limit the > > egress > > > + transmission rate of the port to ~1,000,000 packets per second: > > > + > > > + `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create > > qos > > > + type=egress-policer other-config:cir=46000000 > > > + other-config:cbs=2048` > > > + > > > + To examine the QoS configuration of the port: > > > + > > > + `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` > > > + > > > + To clear the QoS configuration from the port and ovsdb use the > > following: > > > + > > > + `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0 > > > + qos` > > > + > > > + For more details regarding egress-policer parameters please refer > > to the > > > + vswitch.xml. > > > + > > > Performance Tuning: > > > ------------------- > > > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > > > 09ccc2c..716da79 100644 > > > --- a/lib/netdev-dpdk.c > > > +++ b/lib/netdev-dpdk.c > > > @@ -44,6 +44,7 @@ > > > #include "ovs-rcu.h" > > > #include "packets.h" > > > #include "shash.h" > > > +#include "smap.h" > > > #include "sset.h" > > > #include "unaligned.h" > > > #include "timeval.h" > > > @@ -52,12 +53,14 @@ > > > > > > #include "rte_config.h" > > > #include "rte_mbuf.h" > > > +#include "rte_meter.h" > > > #include "rte_virtio_net.h" > > > > > > VLOG_DEFINE_THIS_MODULE(dpdk); > > > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); > > > > > > #define DPDK_PORT_WATCHDOG_INTERVAL 5 > > > +#define DPDK_MAX_QOS_NAME_SIZE 10 > Noted, will expand to 20 for the moment unless you think it should be higher? I'd rather take that out completely. > > > + > > > +/* Action that can be set for a packet for rte_meter */ enum > > > +egress_policer_action { > > > + GREEN = e_RTE_METER_GREEN, > > > + DROP = 1, > > > +}; > > > > I don't think it's a good idea to mix external and internal declarations > > because if the external one changes, it will break here. Since GREEN and > > DROP have meanings only inside of OVS, we should define our own values. > > See my comment where this is used. > > > Just to clarify, is this in relation to the use of e_RTE_METER_GREEN? > Is it prefereable for GREEN to be equal to our own declaration? We can use e_RTE_METER_GREEN and we should do it to be consistent with the rte function. My concern is having two declarations: one coming from outside and another that you defined to a fixed number. So, what happens if e_RTE_METER_GREEN gets shifted and now its value is 1? But with the simplification I proposed for egress_policer_pkt_handle__() that enum isn't needed and we can take it out completely. > > > static inline void > > > netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats, > > > struct dp_packet **packets, @@ > > > -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int > > qid, > > > struct virtio_net *virtio_dev = > > netdev_dpdk_get_virtio(vhost_dev); > > > struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts; > > > unsigned int total_pkts = cnt; > > > + unsigned int qos_pkts = cnt; > > > uint64_t start = 0; > > > > > > if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@ -1106,6 > > > +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid, > > > rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock); > > > } > > > > > > + /* Check has QoS has been configured for the netdev */ > > > + cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt); > > > + qos_pkts -= cnt; > > > > I don't see how this is working because egress_policer_alg_process() > > will loop in the array one by one freeing or not the packet. > > If it does nothing, then the array is good but otherwise it creates a > > problem because the array still references freed entries. So in loop > > below we are sending a mix of good and freed buffers up to 'cnt'. > > > So we should never have freed packets included in cnt as it will > cause a segfault. Consider when an array of size cnt is passed to the > netdev_dpdk_qos_process(). The timestamp used to update the token > bucket is taken before the for loop is entered. This way the token > bucket is only updated once for each batch of packets of size cnt. > The loop cycles through the array, for any packet that is green the > token bucket decremented. If a packet is dropped then the bucket has > been exhausted. The token bucket will not be updated again within the > run of the loop. Any packets remaining in the array will be dropped > (i.e. freed) as the bucket is exhausted. The value of cnt will only > be the number of pkts that were not dropped (i.e. not freed). This > value will then be passed to the send function ensuring no freed > packets are present in the array that is to be processed. If the > token bucket updated between each individual packet being processed > then you could have the situation you describe, by processing them as > a batch with a single update beforehand you avoid it. I haven't looked into the rte implementation yet so I can't tell if the decision to drop or not the packet is based solely on the timestamp which doesn't change during the batch processing. Assuming that is the case I agree with you that the cnt would have only the first valid entries to be sent out. My concern is when we are processing a batch with two packets. One big packet and another small one. While processing the big packet, there is no enough tokens and the packet is marked as RED/DROPPED. Then the loop gets the second small packet, but now some enough tokens were added/still available in the bucket which passes the small packet. > > > + > > > +static inline int > > > +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, > > > + struct rte_mbuf *pkt, uint64_t time) > > > +{ > > > + uint8_t output_color; > > > + uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct > > ether_hdr); > > > + enum egress_policer_action action = GREEN; > > > + > > > + /* color input is not used for blind modes */ > > > + output_color = (uint8_t) > > > rte_meter_srtcm_color_blind_check(meter, > > time, > > > + > > pkt_len); > > > + > > > + /* Check output color, 0 corresponds to GREEN i.e. packet > > > can be > > > + * transmitted. If greater than 0 then action is set to > > > DROP. */ > > > + if (output_color) { > > > + action = DROP; > > > + } > > > > > > This could be simpler: > > > > static inline int > > egress_policer_pkt_handle__(struct rte_meter_srtcm *meter, > > struct rte_mbuf *pkt, uint64_t time) > > { > > uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct > > ether_hdr); > > > > /* If GREEN, accept it otherwise drop it */ > > if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) != > > e_RTE_METER_GREEN) { > > return 1; > > } > > > > return 0; > > } > > > > > Agreed this is simpler/better. In relation to the external > declaration comment above, Is it preferable not to use > e_RTE_METER_GREEN here and instead have a value defined in OVS? We should use it to be consistent with the rte implementation. Thanks, -- fbl _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev