On 19 Oct 2018, at 19:49, Sriharsha Basavapatna wrote:

Hi Eelco,

Thanks for your review comments. Please see my response below. The
patch-set got merged earlier today.  We will address your comments in
subsequent bug-fix patches.

Thanks! Please take care of the potential memory corruption with x2nrealloc().
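
For context on that point: as I recall, x2nrealloc() doubles the element
count '*n' and returns the new pointer, so the caller must track capacity
separately from the element count and must store the returned pointer. A
self-contained sketch of the correct pattern (x2nrealloc re-implemented
here with plain realloc() for illustration; OVS's version in lib/util.c
uses xrealloc(), which aborts on allocation failure):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for OVS's x2nrealloc(): double '*n' (starting at 1)
 * and reallocate to hold '*n' elements of size 's'. */
static void *
x2nrealloc(void *p, size_t *n, size_t s)
{
    *n = *n == 0 ? 1 : 2 * *n;
    return realloc(p, *n * s);
}

/* Correct growth pattern: capacity tracked separately from the element
 * count, and the returned pointer always stored back. */
struct int_array {
    int *data;
    size_t count;      /* elements in use */
    size_t capacity;   /* elements allocated */
};

static void
int_array_push(struct int_array *a, int value)
{
    if (a->count >= a->capacity) {
        a->data = x2nrealloc(a->data, &a->capacity, sizeof *a->data);
    }
    a->data[a->count++] = value;
}
```

Passing the in-use count instead of the capacity (or dropping the return
value) is the kind of misuse that corrupts memory here.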


On Fri, Oct 19, 2018 at 1:27 AM Eelco Chaudron <echau...@redhat.com> wrote:



On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:

With the current OVS offload design, when an offload-device fails to add a
flow rule and returns an error, OVS adds the rule to the kernel datapath.
The flow then gets processed by the kernel datapath for the entire life of
that flow. This is fine when the error is returned by the device due to
lack of support for certain keys or actions.

But when an error is returned due to temporary conditions, such as lack of
resources to add a flow rule, the flow continues to be processed by the
kernel even when resources become available later. That is, those flows
never get offloaded again. The problem becomes more pronounced when a flow
that was offloaded early on has a smaller packet rate than a later flow
that could not be offloaded due to lack of resources. This leads to
inefficient use of HW resources and wastage of host CPU cycles.

This patch-set addresses this issue by providing a way to detect temporary
offload resource constraints (the Out-Of-Resource, or OOR, condition) and
to selectively and dynamically offload flows with a higher
packets-per-second (pps) rate. This dynamic rebalancing is done
periodically on netdevs that are in the OOR state, until resources become
available to offload all pending flows.

The patch-set involves the following changes at a high level:

1. Detection of the Out-Of-Resources (OOR) condition on an offload-capable
   netdev.
2. Gathering flow offload selection criteria for all flows on an OOR
   netdev; i.e., the packets-per-second (pps) rate of both offloaded and
   non-offloaded (pending) flows.
3. Dynamically replacing offloaded flows that have a lower pps-rate with
   non-offloaded flows that have a higher pps-rate, on an OOR netdev. A new
   Open vSwitch configuration option, "offload-rebalance", enables this
   policy.
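
To make the three steps above concrete, here is a rough sketch of the
selection logic in C. All names here (flow_entry, rebalance, etc.) are
illustrative only, not code from the patch-set; the real implementation
works on the dpif/netdev structures. (If I recall correctly, the knob is
set with 'ovs-vsctl set Open_vSwitch . other_config:offload-rebalance=true'.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of pps-based rebalancing selection; names and
 * structure are illustrative, not the actual OVS implementation. */
struct flow_entry {
    int id;
    double pps;        /* measured packets-per-second rate of this flow */
    bool offloaded;    /* currently in hardware? */
};

/* Sort flows by pps rate, highest first. */
static int
cmp_pps_desc(const void *a_, const void *b_)
{
    const struct flow_entry *a = a_;
    const struct flow_entry *b = b_;
    return (a->pps < b->pps) - (a->pps > b->pps);
}

/* Mark the 'capacity' highest-pps flows as the ones that should be in
 * hardware; everything else stays in the kernel datapath. Returns the
 * number of flows whose placement changed. */
static int
rebalance(struct flow_entry *flows, size_t n, size_t capacity)
{
    int changed = 0;
    qsort(flows, n, sizeof *flows, cmp_pps_desc);
    for (size_t i = 0; i < n; i++) {
        bool want_offload = i < capacity;
        if (flows[i].offloaded != want_offload) {
            flows[i].offloaded = want_offload;  /* install or evict */
            changed++;
        }
    }
    return changed;
}
```

The actual patches do this per OOR netdev and only on a periodic timer,
but the core idea is the same: rank by pps and let the top flows occupy
the limited hardware slots.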

Cost/benefits data points:

1. Rough cost of the new rebalancing, in terms of CPU time:

   Ran a test that replaced 256 low pps-rate flows (pings) with 256 high
   pps-rate flows (iperf), on a system with 4 CPUs (Intel Xeon E5 @
   2.40GHz; 2 cores with HW threads enabled, rest disabled). The data
   showed that CPU utilization increased by about 20%. This increase
   occurs during the specific second in which rebalancing is done; from
   the next second onward, CPU utilization decreases significantly due to
   the offloading of higher pps-rate flows. So effectively there is a bump
   in CPU utilization at the time of rebalancing that is more than
   compensated for by the reduced CPU utilization once the right flows get
   offloaded.

2. Rough benefits to the user in terms of offload performance:

   The benefit to the user is reduced CPU utilization on the host, since
   higher pps-rate flows get offloaded, replacing lower pps-rate flows.
   Replacing a single offloaded flood-ping flow with an iperf flow
   (multiple connections) shows that the CPU usage that was originally
   100% of a single CPU (rebalancing disabled) goes down to 35%
   (rebalancing enabled). That is, CPU utilization decreased by 65% after
   rebalancing.

3. Circumstances under which the benefits would show up:

   The rebalancing benefits show up once offload resources are exhausted
   and new flows with a higher pps-rate are initiated; those flows would
   otherwise be handled by the kernel datapath, costing host CPU cycles.

   This can be observed using the 'ovs-appctl dpctl/dump-flows' command.
   Prior to rebalancing, any high pps-rate flows that couldn't be
   offloaded due to the resource crunch would show up in the output of
   'dump-flows type=ovs', and after rebalancing such flows would appear in
   the output of 'dump-flows type=offloaded'.


Before I review the individual patches, hope to do this tomorrow, I have
some general concerns/comments.

Once committed, will this feature be marked doubly experimental? I just
want to make sure that if it goes in as part of HW offload, the
experimental tag for this feature is not removed merely because HW offload
itself becomes mainstream.

I'm not sure if it will be marked experimental. But if it does get marked,
then depending on how this feature evolves by the time HW offload becomes
mainstream, the tag could be removed? Ben, Simon, any suggestions on this?


Currently, in the OOR state both phases (insert and rebalance) are run.
Rather than having offload-rebalance be just true or false, maybe we could
have disable, retry, and retry-rebalance. I think only retrying might be
beneficial as well, without changing existing flows.

If some of the flows terminate and HW resources become available again,
then yes, just the first phase by itself would be beneficial. I'm fine
with changing the option values as you suggest. But for now, until we have
some more experience with this feature, it might be better to have simple
on/off (boolean) values. Also, the second phase (rebalance) won't even run
if all pending flows get offloaded during the first phase.

Once we release the next OVS version it's hard to change the option for
backward-compatibility reasons. So if we want to do it, we should do it
sooner rather than later.

In addition, I do see the use case for offloading flows once room becomes
available, but people might not want to do the rebalance (to avoid
temporary resource spikes, or the ping-pong of flows between HW and SW).



My main objection against offloading flows at a later stage is packet
re-ordering. As soon as you move from the kernel datapath to hardware
offload, packets might be sent out of order. This is mainly a problem for
TCP streams, which do not have this problem if the stream is offloaded
directly at the start.

Sure, will take a look if/how reordering affects a flow at the time of
rebalancing. A few thoughts on this:
1) This could be seen only for a flow that's being moved from pending
to offloaded state; since we first try to offload it and if that
succeeds we remove it from the kernel datapath. That is, a flow that's
being moved from offloaded to pending state shouldn't have this
problem.

Agreed, but based on stats a flow can ping-pong from hardware offload to
software, especially if only a small number of offload entries are
available.

2) This window is quite small; that is, reordering shouldn't persist for
long (i.e., the TCP performance impact should be limited).

True, but we had some customers saying this is not acceptable, especially
as this is happening at the first hop in the network.

3) If this is not acceptable, we could investigate other ways to deal with this.

Maybe more control on which flows can be offloaded at a later stage? Or some fancy way of queuing in hardware until the sw path is flushed :)


To make this even worse, the current implementation has no protection
against flows fighting to be offloaded, i.e. ping-ponging from HW to SW,
causing a lot of out-of-order packets.

We will try to simulate this condition; if it turns out to be an issue, we
could take some steps along these lines:
1) Reduce the frequency of rebalancing; currently it is done at a 3-second
interval.
2) Require that the difference in pps-rate exceed some threshold for a
flow to be offloaded.
3) Use packet count (in addition to pps-rate), or a timestamp of the last
offload, to avoid frequent offload/un-offload.
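
As a sketch of what items 2) and 3) could look like, here is a hedged
illustration in C. The names (rb_flow, should_swap) and the threshold
values are hypothetical, not taken from the patches:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative hysteresis check along the lines suggested above; all
 * names and thresholds are hypothetical, not the actual OVS code. */
struct rb_flow {
    double pps;                 /* measured packets-per-second rate */
    long long last_change_ms;   /* when this flow last moved HW<->SW */
};

#define MIN_PPS_DELTA 1000.0  /* pending flow must beat the offloaded one
                               * by at least this much */
#define COOLDOWN_MS   10000   /* don't move a flow again within this window */

/* Decide whether 'pending' should replace 'offloaded' in hardware. */
static bool
should_swap(const struct rb_flow *pending, const struct rb_flow *offloaded,
            long long now_ms)
{
    if (pending->pps - offloaded->pps < MIN_PPS_DELTA) {
        return false;   /* difference too small to justify the churn */
    }
    if (now_ms - offloaded->last_change_ms < COOLDOWN_MS
        || now_ms - pending->last_change_ms < COOLDOWN_MS) {
        return false;   /* recently moved; avoid ping-pong */
    }
    return true;
}
```

A swap then happens only when the pps gap is large and neither flow was
moved recently, which directly limits the ping-pong frequency.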


Based on the above, I was wondering whether any tests were done to measure
the out-of-order packets/jitter on rebalancing?

The last implementation item I have is that packet throughput through the
kernel datapath is rather low, <200 Kpps. This low throughput might make
it hard to determine which non-offloaded flow is best suited for HW
offload. The kernel might drop packets for a flow with a much higher
potential than the packets that do make it through the kernel. I do not
have a solution for this, but I guess it's worth keeping in mind when this
gets enabled.

Thanks, this is a good point; we will observe rebalancing with respect to
this limitation.


As a general question, were other solutions considered to cope with
inadequate resources? Maybe a solution that gives the operator more
control over what gets offloaded, like giving all configured flows a
relative priority, either at individual flow creation or via some general
overlay, e.g. all TCP flows.

We did think about prioritizing the flows; but then it'd need some
kind of user-input to assign priorities to the flows. We picked the
current approach since it'd be transparent to the user.

It’s transparent but there is no way to influence it. Maybe someone has real-life use cases to share.

My thoughts were around the case where you have a lot of VM-to-storage
traffic which you want to offload, while your HTTP traffic is mostly
short-lived, so you want to give it only the remaining resources (or none
at all).

Thanks,
-Harsha


Cheers,

Eelco
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
