> On Wed, 17 Feb 2016 14:40:26 +0000
> "Stokes, Ian" <ian.sto...@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Stokes,
> > > Ian
> > > Sent: Thursday, February 11, 2016 2:42 PM
> > > To: Flavio Leitner
> > > Cc: dev@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS
> functionality.
> > >
> > > > On Thu, 11 Feb 2016 12:32:56 +0000 "Stokes, Ian"
> > > > <ian.sto...@intel.com> wrote:
> > > >
> > > > > Thanks for the review Flavio, much appreciated, comments inline.
> > > > > I'll re-spin a new version also.
> > > >
> > > > Thank you for the patch!
> > > > Comments inline.
> > > >
> > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Flavio Leitner [mailto:f...@sysclose.org]
> > > > > > Sent: Wednesday, February 10, 2016 7:55 PM
> > > > > > To: Stokes, Ian
> > > > > > Cc: dev@openvswitch.org
> > > > > > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS
> > > > functionality.
> > > > > >
> > > > > > On Mon,  1 Feb 2016 20:47:25 +0000 Ian Stokes
> > > > > > <ian.sto...@intel.com> wrote:
> > > > > >
> > > > > > > This patch provides the modifications required in
> > > > > > > netdev-dpdk.c and vswitch.xml to allow for a DPDK user space
> QoS algorithm.
> > > > > > >
> > > > > > > This patch adds a QoS configuration structure for
> > > > > > > netdev-dpdk and expected QoS operations 'dpdk_qos_ops'.
> > > > > > > Various helper functions are also supplied.
> > > > > > >
> > > > > > > Also included are the modifications required for vswitch.xml
> > > > > > > to allow a new QoS implementation for netdev-dpdk devices.
> > > > > > > This includes a new QoS type `egress-policer` as well as its
> > > > > > > expected
> > > > QoS table entries.
> > > > > > >
> > > > > > > The QoS functionality implemented for DPDK devices is
> > > > > > > `egress-
> > > > > > policer`.
> > > > > > > This can be used to drop egress packets at a configurable
> rate.
> > > > > > >
> > > > > > > The INSTALL.DPDK.md guide has also been modified to provide
> > > > > > > an example configuration of `egress-policer` QoS.
> > > > > > >
> > > > > > > Signed-off-by: Ian Stokes <ian.sto...@intel.com>
> > > > > > > ---
> > > > > > >  INSTALL.DPDK.md      |   20 +++
> > > > > > >  lib/netdev-dpdk.c    |  439
> > > > > > ++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > > > >  vswitchd/vswitch.xml |   52 ++++++
> > > > > > >  3 files changed, 498 insertions(+), 13 deletions(-)
> > > > > > >
> > > > > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index
> > > > > > > e8ef4b5..dac70cd
> > > > > > > 100644
> > > > > > > --- a/INSTALL.DPDK.md
> > > > > > > +++ b/INSTALL.DPDK.md
> > > > > > > @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd:
> > > > > > >     ./ovs-ofctl add-flow br0 in_port=2,action=output:1
> > > > > > >     ```
> > > > > > >
> > > > > > > +8. QoS usage example
> > > > > > > +
> > > > > > > +   Assuming you have a vhost-user port transmitting traffic
> > > > > > consisting of
> > > > > > > +   packets of size 64 bytes, the following command would
> > > > > > > + limit the
> > > > > > egress
> > > > > > > +   transmission rate of the port to ~1,000,000 packets per
> > > > second:
> > > > > > > +
> > > > > > > +   `ovs-vsctl set port vhost-user0 qos=@newqos --
> > > > > > > + --id=@newqos create
> > > > > > qos
> > > > > > > +   type=egress-policer other-config:cir=46000000
> > > > > > > + other-config:cbs=2048`
> > > > > > > +
> > > > > > > +   To examine the QoS configuration of the port:
> > > > > > > +
> > > > > > > +   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> > > > > > > +
> > > > > > > +   To clear the QoS configuration from the port and ovsdb
> > > > > > > + use the
> > > > > > following:
> > > > > > > +
> > > > > > > +   `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port
> > > > > > > + vhost-user0 qos`
> > > > > > > +
> > > > > > > +   For more details regarding egress-policer parameters
> > > > > > > + please refer
> > > > > > to the
> > > > > > > +   vswitch.xml.
> > > > > > > +
> > > > > > >  Performance Tuning:
> > > > > > >  -------------------
> > > > > > >
> > > > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > > > > > > 09ccc2c..716da79 100644
> > > > > > > --- a/lib/netdev-dpdk.c
> > > > > > > +++ b/lib/netdev-dpdk.c
> > > > > > > @@ -44,6 +44,7 @@
> > > > > > >  #include "ovs-rcu.h"
> > > > > > >  #include "packets.h"
> > > > > > >  #include "shash.h"
> > > > > > > +#include "smap.h"
> > > > > > >  #include "sset.h"
> > > > > > >  #include "unaligned.h"
> > > > > > >  #include "timeval.h"
> > > > > > > @@ -52,12 +53,14 @@
> > > > > > >
> > > > > > >  #include "rte_config.h"
> > > > > > >  #include "rte_mbuf.h"
> > > > > > > +#include "rte_meter.h"
> > > > > > >  #include "rte_virtio_net.h"
> > > > > > >
> > > > > > >  VLOG_DEFINE_THIS_MODULE(dpdk);  static struct
> > > > > > > vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > > > > > >
> > > > > > >  #define DPDK_PORT_WATCHDOG_INTERVAL 5
> > > > > > > +#define DPDK_MAX_QOS_NAME_SIZE 10
> > > > > Noted, will expand to 20 for the moment unless you think it
> > > > > should be
> > > > higher?
> > > >
> > > > I'd rather take that out completely.
> > > >
> > >
> > > Done, Daniele had a similar comment and it's not needed.
> > >
> > > > > > > +
> > > > > > > +/* Action that can be set for a packet for rte_meter */
> > > > > > > +enum egress_policer_action {
> > > > > > > +        GREEN = e_RTE_METER_GREEN,
> > > > > > > +        DROP = 1,
> > > > > > > +};
> > > > > >
> > > > > > I don't think it's a good idea to mix external and internal
> > > > > > declarations because if the external one changes, it will
> > > > > > break here. Since GREEN and DROP have meanings only inside of
> > > > > > OVS, we
> > > > should define our own values.
> > > > > > See my comment where this is used.
> > > > > >
> > > > > Just to clarify, is this in relation to the use of
> > > > > e_RTE_METER_GREEN?
> > > > > Is it preferable for GREEN to be equal to our own declaration?
> > > >
> > > > We can use e_RTE_METER_GREEN and we should do it to be consistent
> > > > with the rte function.  My concern is having two declarations: one
> > > > coming from outside and another that you defined to a fixed
> > > > number.  So, what happens if e_RTE_METER_GREEN gets shifted and
> now its value is 1?
> > > >
> > > > But with the simplification I proposed for
> > > > egress_policer_pkt_handle__() that enum isn't needed and we can
> > > > take
> > > it out completely.
> > > >
> > >
> > > Agreed, I've tested your suggestion and it works fine + is cleaner.
> > >
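The simplification I tested is roughly the following. This is only a sketch,
assuming the handler just wraps the DPDK srTCM colour-blind check and reports
whether the packet is green; the exact signature and any L2-overhead
accounting on the packet length belong in the re-spun patch.

    #include <stdbool.h>
    #include <stdint.h>

    #include <rte_meter.h>
    #include <rte_mbuf.h>

    /* Sketch: return true if the packet is GREEN and may be sent.
     * Comparing directly against e_RTE_METER_GREEN means no private
     * GREEN/DROP enum is needed. */
    static inline bool
    egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
                                struct rte_mbuf *pkt, uint64_t time)
    {
        uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt);

        return rte_meter_srtcm_color_blind_check(meter, time, pkt_len)
               == e_RTE_METER_GREEN;
    }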
> > > >
> > > > > > >  static inline void
> > > > > > >  netdev_dpdk_vhost_update_tx_counters(struct netdev_stats
> > > *stats,
> > > > > > >                                       struct dp_packet
> > > > > > > **packets, @@
> > > > > > > -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev
> > > > > > > *netdev, int
> > > > > > qid,
> > > > > > >      struct virtio_net *virtio_dev =
> > > > > > netdev_dpdk_get_virtio(vhost_dev);
> > > > > > >      struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
> > > > > > >      unsigned int total_pkts = cnt;
> > > > > > > +    unsigned int qos_pkts = cnt;
> > > > > > >      uint64_t start = 0;
> > > > > > >
> > > > > > >      if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@
> > > > > > > -1106,6
> > > > > > > +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
> > > > > > > +int qid,
> > > > > > >          rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
> > > > > > >      }
> > > > > > >
> > > > > > > +    /* Check has QoS has been configured for the netdev */
> > > > > > > +    cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts,
> cnt);
> > > > > > > +    qos_pkts -= cnt;
> > > > > >
> > > > > > I don't see how this is working, because
> > > > > > egress_policer_alg_process() will loop over the array one by one,
> > > > > > freeing the packet or not.
> > > > > > If it does nothing, then the array is good, but otherwise it
> > > > > > creates a problem because the array still references freed
> > > > > > entries.  So in the loop below we are sending a mix of good and
> > > > > > freed buffers up to 'cnt'.
> > > > > >
> > > > > So we should never have freed packets included in cnt, as that
> > > > > would cause a segfault. Consider when an array of size cnt is
> > > > > passed to netdev_dpdk_qos_process(). The timestamp used to update
> > > > > the token bucket is taken before the for loop is entered, so the
> > > > > token bucket is only updated once for each batch of packets of
> > > > > size cnt.
> > > > > The loop cycles through the array; for any packet that is green
> > > > > the token bucket is decremented. If a packet is dropped then the
> > > > > bucket has been exhausted, and since the token bucket will not be
> > > > > updated again within the run of the loop, any packets remaining in
> > > > > the array will be dropped (i.e. freed) as the bucket is exhausted.
> > > > > The value of cnt will only be the number of pkts that were not
> > > > > dropped (i.e. not freed). This value is then passed to the send
> > > > > function, ensuring no freed packets are present in the array that
> > > > > is to be processed.
> > > > > If the token bucket were updated between each individual packet
> > > > > being processed then you could have the situation you describe; by
> > > > > processing them as a batch with a single update beforehand you
> > > > > avoid it.
> > > >
> > > > I haven't looked into the rte implementation yet, so I can't tell
> > > > whether the decision to drop the packet or not is based solely on
> > > > the timestamp, which doesn't change during the batch processing.
> > > >
> > > > Assuming that is the case, I agree with you that cnt would hold only
> > > > the first valid entries to be sent out.
> > > >
> > > > My concern is when we are processing a batch with two packets: one
> > > > big packet and another small one.  While processing the big packet,
> > > > there are not enough tokens and the packet is marked as RED/DROPPED.
> > > > Then the loop gets to the second small packet, but enough tokens
> > > > were added or are still available in the bucket, which passes the
> > > > small packet.
> > > >
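The non-sequential case follows directly from how the srTCM check spends
tokens. Below is a rough paraphrase, from memory, of the committed-token part
of rte_meter_srtcm_color_blind_check(); the helper name is made up for
illustration, and the real code also refills the buckets from the elapsed
time and checks the excess bucket.

    #include <stdint.h>

    #include <rte_meter.h>

    /* Illustrative paraphrase of the colour decision for one packet. */
    static enum rte_meter_color
    srtcm_check_sketch(struct rte_meter_srtcm *meter, uint32_t pkt_len)
    {
        if (meter->tc >= pkt_len) {     /* enough committed tokens left */
            meter->tc -= pkt_len;
            return e_RTE_METER_GREEN;
        }
        return e_RTE_METER_RED;         /* RED spends no tokens */
    }

So with, say, 100 bytes' worth of tokens left, a 1500-byte packet comes back
RED while a 64-byte packet behind it still fits and comes back GREEN, which
is exactly the non-sequential drop described above.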
> > >
> > > Apologies, I was looking at this the wrong way. I understand your
> > > point now.
> > > Let me think about this, and I'll post a rough solution here before
> > > submitting.
> >
> > So there are 4 cases we can have with policing in terms of how packets
> > have been processed.
> >
> > (i) The token bucket is not exhausted and no packets are dropped.
> > (ii) The token bucket is exhausted while processing and packets are
> > dropped sequentially for the remainder of the burst.
> > (iii) The token bucket is exhausted from the previous call, so all
> > packets are dropped.
> > (iv) Packets are not dropped sequentially, which could occur with a
> > large packet being dropped before a smaller packet is processed.
> >
> > Currently we handle cases (i), (ii) & (iii). However, case (iv) is not
> > handled and will cause a segfault.
> >
> > egress_policer_run() returns a cnt of the number of packets remaining
> > after the policer has run.
> > The assumption currently is that this cnt can be directly mapped to the
> > entries in the rte_mbuf array, i.e. 0 to cnt-1 will contain pkts and
> > will not contain NULL entries or packets that have been freed.
> >
> > To deal with case (iv) we will have to ensure that the entries in the
> > rte_mbuf array from 0 to cnt-1 contain packets only. This will require
> > the entries in the rte_mbuf array to be re-arranged if 0 to cnt-1
> > contains freed packets.
> >
> > How to do this as optimally as possible becomes the next question.
> >
> > A potential/rough solution would be as follows
> >
> > We introduce 2 new index values in egress_policer_run().
> > Index value 1, 'f_dropped', is the position in the rte_mbuf array of
> > the first packet dropped.
> > Index value 2, 'l_sent', is the position of the last packet to be sent
> > in the array.
> >
> > Something like the following could be done to check and handle
> > non-sequential drops.
> >
> > /* Check if drops are not sequential */
> > if (f_dropped < l_sent) {
> >
> >     /* Must move any packets that are present after a drop
> >      * so that all packets are sequential in the rte_mbuf array
> >      * pkts */
> >     for (x = f_dropped; x <= l_sent; x++) {
> >         for (y = x+1; y <= l_sent; y++ ) {
> >             if (pkts[y] != NULL) {
> >                 pkts[x] = pkts[y];
> >                 pkts[y] = NULL;
> >                 break;
> >             }
> >             continue;
> >         }
> >     }
> > }
> > /* return cnt i.e. number of packets present in the array */
> > return cnt;
> >
> >
> > Would something like what was outlined above be OK? Do people have a
> > more optimal solution?
> 
> That seems to not work when you have multiple drops in the middle or
> maybe I missed something.
> What about something like this:
> 
>     int cnt = 0;
>     int i = 0;
>     for (i = 0; i < pkt_cnt; i++) {
>         pkt = pkts[i];
>         /* Handle current packet */
>         if (egress_policer_pkt_handle__(&policer->egress_meter, pkt,
>                                         current_time) == GREEN) {
>             if (cnt != i) {
>                 pkts[cnt] = pkt;
>             }
>             cnt++;
>         } else {
>             rte_pktmbuf_free(pkt);
>         }
>     }
> 
>     return cnt;
> 
> Thanks,
> --
> fbl

Thanks Flavio, this approach seems simpler and the work is done up front
rather than checking the array after the packets have been processed.
The original solution proposed above would work for multiple packet drops,
but this is a moot point now as your solution avoids the problem altogether,
which is much preferable.
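Putting the pieces together, the loop would slot into egress_policer_run()
along the lines below. This is a sketch only: the egress_policer struct is
cut down for illustration, the timestamp is assumed to come from rte_rdtsc(),
and the handler is the one sketched earlier in the thread, repeated so the
snippet stands alone.

    #include <stdbool.h>
    #include <stdint.h>

    #include <rte_cycles.h>
    #include <rte_meter.h>
    #include <rte_mbuf.h>

    /* Cut-down policer state for illustration; the real patch keeps more
     * than the meter in here. */
    struct egress_policer {
        struct rte_meter_srtcm egress_meter;
    };

    /* True if the packet is GREEN and may be sent. */
    static inline bool
    egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
                                struct rte_mbuf *pkt, uint64_t time)
    {
        return rte_meter_srtcm_color_blind_check(meter, time,
                                                 rte_pktmbuf_pkt_len(pkt))
               == e_RTE_METER_GREEN;
    }

    /* Police a batch in place: GREEN packets are compacted to the front
     * of 'pkts', everything else is freed, and the number of packets left
     * to send is returned, so the caller never sees a freed mbuf. */
    static int
    egress_policer_run(struct egress_policer *policer,
                       struct rte_mbuf **pkts, int pkt_cnt)
    {
        uint64_t current_time = rte_rdtsc();
        int cnt = 0;
        int i;

        for (i = 0; i < pkt_cnt; i++) {
            struct rte_mbuf *pkt = pkts[i];

            if (egress_policer_pkt_handle__(&policer->egress_meter, pkt,
                                            current_time)) {
                if (cnt != i) {
                    pkts[cnt] = pkt;    /* close the gap left by a drop */
                }
                cnt++;
            } else {
                rte_pktmbuf_free(pkt);
            }
        }

        return cnt;
    }

As in the hunk quoted further up, __netdev_dpdk_vhost_send() would reach this
via netdev_dpdk_qos_process__() after taking the tx_q lock and would use the
returned cnt for the actual send.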

Thanks
Ian
