On Mon, May 18, 2020 at 3:39 AM Numan Siddique <num...@ovn.org> wrote:

>
>
> On Mon, May 18, 2020 at 3:34 PM Maciej Jozefczyk <mjoze...@redhat.com>
> wrote:
>
>> Hello,
>>
>> Continuing this topic.
>> With the introduction of selection_fields in the OVN Load_Balancer row, the
>> responsibility of calculating the hash moved to ovs-vswitchd.
>> That means we may pay a performance penalty for it.
>>
>> I verified what the numbers are.
>>
>> I took the environment created by Octavia tempest test:
>>
>> octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_source_ip_port_tcp_traffic
>>
>> Created LB:
>>
>>
>> _uuid               : 178d54c4-bbc9-4c9e-bf4e-35154048fc2b
>> external_ids        : {enabled=True,
>> listener_704d1e43-57e8-47d1-832f-a361e8424d3a="60092:pool_6b658ef9-c3bb-4e22-85e0-4deba7d85875",
>> lr_ref=neutron-d6bd62d1-d642-4228-afe4-2d9971c5e96c,
>> ls_refs="{\"neutron-6b454c29-9c74-4eaa-9bc0-b6159e1806b2\": 1, \"neutron-ef8dc6bf-32d7-4a79-b9da-e3eca9a7b7ad\": 1, \"neutron-a5e1472b-7fb5-4754-89d5-0a946c403e5e\": 1}",
>> "neutron:vip"="10.1.1.229", "neutron:vip_fip"="172.24.4.140",
>> "neutron:vip_port_id"="53dace30-5879-4d8d-b870-e53010ea5e89",
>> pool_6b658ef9-c3bb-4e22-85e0-4deba7d85875="member_1076d217-cf8c-4a79-9d5a-3c6ab3c5264f_10.2.1.58:80_b7852f43-771f-4150-b840-a1b308eb3890,member_17cb4552-3fa6-4e3c-a4f7-9984cd30147b_10.2.2.8:80_bb6ec37e-9145-423f-8907-624badfb677b"}
>> health_check        : []
>> ip_port_mappings    : {}
>> name                : "9475d106-21b6-4672-ba4a-7b982e7879da"
>> protocol            : tcp
>> selection_fields    : []
>> vips                : {"10.1.1.229:60092"="10.2.1.58:80,10.2.2.8:80", 
>> "172.24.4.140:60092"="10.2.1.58:80,10.2.2.8:80"}
>>
>> Testing environment:
>>
>> * 2 VMs running a simple HTTP server as backends
>> * the test uses a FloatingIP attached to the LB VIP
>> * the test runs `ab -n 100000 -c 1000`, i.e. 100000 requests in total with
>> 1000 in parallel.
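>>
>> (For reference, the ovs-vswitchd CPU usage mentioned in the summary below can
>> be sampled with something like the following; just a sketch, assuming the
>> sysstat package is installed:)
>>
>> # Sample ovs-vswitchd CPU usage once per second while ab is running.
>> pidstat -u -p $(pidof ovs-vswitchd) 1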
>>
>>
>> 1. Test without selection_fields set:
>>
>> -------------------------------------------------------------------------------------------------------------------------
>> stack@mjozefcz-devstack-ovn-lb-master-new-localconf:~$ ab -n 100000 -c 1000 
>> http://172.24.4.140:60092/
>> Server Software:
>> Server Hostname:        172.24.4.140
>> Server Port:            60092
>>
>> Document Path:          /
>> Document Length:        1 bytes
>>
>> Concurrency Level:      1000
>> Time taken for tests:   118.283 seconds
>> Complete requests:      100000
>> Failed requests:        0
>> Total transferred:      14300000 bytes
>> HTML transferred:       100000 bytes
>> Requests per second:    845.43 [#/sec] (mean)
>> Time per request:       1182.826 [ms] (mean)
>> Time per request:       1.183 [ms] (mean, across all concurrent requests)
>> Transfer rate:          118.06 [Kbytes/sec] received
>>
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0  676 2164.1      1   64969
>> Processing:     2  477 193.9    438    3823
>> Waiting:        2  474 191.8    435    3823
>> Total:          2 1152 2190.9    511   65785
>>
>> Percentage of the requests served within a certain time (ms)
>>   50%    511
>>   66%    759
>>   75%   1391
>>   80%   1474
>>   90%   1944
>>   95%   3567
>>   98%   7576
>>   99%   7910
>>  100%  65785 (longest request)
>> stack@mjozefcz-devstack-ovn-lb-master-new-localconf:~$
>> -------------------------------------------------------------------------------------------------------------------------
>>
>> 2. Test with selection_fields set:
>> -------------------------------------------------------------------------------------------------------------------------
>> stack@mjozefcz-devstack-ovn-lb-master-new-localconf:~$ ab -n 100000 -c 1000 
>> http://172.24.4.140:60092/
>>
>>
>> Server Software:
>> Server Hostname:        172.24.4.140
>> Server Port:            60092
>>
>> Document Path:          /
>> Document Length:        1 bytes
>>
>> Concurrency Level:      1000
>> Time taken for tests:   121.321 seconds
>> Complete requests:      100000
>> Failed requests:        0
>> Total transferred:      14300000 bytes
>> HTML transferred:       100000 bytes
>> Requests per second:    824.26 [#/sec] (mean)
>> Time per request:       1213.208 [ms] (mean)
>> Time per request:       1.213 [ms] (mean, across all concurrent requests)
>> Transfer rate:          115.11 [Kbytes/sec] received
>>
>> Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0  686 2487.7      1   63750
>> Processing:     2  494 339.7    436   20023
>> Waiting:        2  491 338.7    434   20023
>> Total:          3 1180 2523.9    511   64590
>>
>> Percentage of the requests served within a certain time (ms)
>>   50%    511
>>   66%    732
>>   75%   1373
>>   80%   1463
>>   90%   1917
>>   95%   3593
>>   98%   7585
>>   99%   7965
>>  100%  64590 (longest request)
>> stack@mjozefcz-devstack-ovn-lb-master-new-localconf:~$
>> -------------------------------------------------------------------------------------------------------------------------
>>
>>
>> Comparison (without vs with selection fields):
>> * Time taken to perform the test: 118.283 seconds VS 121.321 seconds
>> * Time per request:       1.183 [ms] VS 1.213 [ms]
>> * Connection Times (ms)
>>               min  mean[+/-sd] median   max
>> Connect:        0  676 2164.1      1   64969
>> Total:          2 1152 2190.9    511   65785
>> -- VS --
>> Connect:        0  686 2487.7      1   63750
>> Total:          3 1180 2523.9    511   64590
>>
>>
>> Summary:
>> * There is no big penalty; from what I can calculate it is about 2.5-3%.
>> * The ovs-vswitchd process with selection_fields set took around 10-20% of CPU.
>> * The ovs-vswitchd process without selection_fields took around 1% of CPU.
>> * During the test the members reached 100% CPU consumption.
>>
>> Do you have any good ideas for scenarios to test where the perf hit could be
>> more noticeable? I tried another scenario - a load balancer on a different
>> host than the members - but I got numbers similar to these.
>>
>>
If the members reached 100% CPU, I assume the bottleneck is on the members
rather than on the client-side OVS, which may partly explain why the
difference is not more obvious. It may be worth trying with more members so
that the server-side load is spread out and the client-side bottleneck can be
measured more accurately.

Another point here is that although you tested n = 100000, ab may not use a
different source port for every one of those 100000 requests, which results in
fewer distinct 5-tuples. That means a large share of the packets (maybe half
of them) still do not go to userspace with the "hash" method, since once the
full 5-tuple matches, the packet hits the megaflow cache in the kernel
datapath. Even so, this is obviously less efficient than "dp_hash", because
with "dp_hash" all packets except the first ones should hit a single megaflow
in the kernel cache. The 10-20% vs. 1% CPU usage of ovs-vswitchd may already
reflect this.
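
A rough way to confirm this during the ab run (just a sketch; it assumes the
kernel datapath and simply greps for the VIP address) would be:

# Datapath megaflows and their packet counters.  With "hash" each distinct
# 5-tuple installs its own megaflow (one upcall per connection), while with
# "dp_hash" most traffic should pile up on very few megaflows.
ovs-appctl dpctl/dump-flows | grep 172.24.4.140

# Upcall/handler statistics of ovs-vswitchd, to compare upcall rates.
ovs-appctl upcall/show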

>
> Have you tested with TCP load balancers ? Maybe it's worth testing that
> too.
>
> I would also suggest having a test which sends some huge data once the
> connection is established.
>

I think sending huge data over a small number of connections would give the
same result for dp_hash and hash, because in both cases most of the packets
go through the kernel datapath.
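
(If someone wants to try it anyway, a sketch would be to run `iperf3 -s -p 80`
on one of the member VMs in place of the HTTP server, and then push bulk data
through the FIP/VIP from the client; the port and duration below are only
examples:)

# One long-lived connection sending bulk data through the LB VIP for 60 seconds.
iperf3 -c 172.24.4.140 -p 60092 -t 60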


> Thanks
> Numan
>
>
>>
>>
>> Maciej
>>
>>
>> On Mon, May 11, 2020 at 4:19 PM Maciej Jozefczyk <mjoze...@redhat.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Thanks for working on this.
>>>
>>> The change for setting the selection fields for Load_Balancer has been
>>> merged upstream [1].
>>> This change helped a lot: the OVN Octavia provider driver (master) is now
>>> able to specify those fields [2] and, as a result, can now be properly
>>> tested with tempest tests [3].
>>>
>>> The OVN Octavia bug described here [4], which is solved by the selection
>>> fields, is still valid for stable releases, so a backport would help.
>>> The original behaviour wasn't changed: if the selection fields in the
>>> Load_Balancer row are not set, the 5-tuple hash is still used as before.
>>>
>>> Can I ask you to cherry-pick that change to the latest stable
>>> branch-20.03?
>>>
>>>
>>> [1]
>>> https://github.com/ovn-org/ovn/commit/5af304e7478adcf5ac50ed41e96a55bebebff3e8
>>> [2] https://review.opendev.org/#/c/726787
>>> [3] https://review.opendev.org/#/c/714004/
>>> [4] https://bugs.launchpad.net/neutron/+bug/1871239
>>>
>>> On Mon, May 4, 2020 at 3:40 AM Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Apr 30, 2020 at 4:12 AM Tonghao Zhang <xiangxia.m....@gmail.com>
>>>> wrote:
>>>> >
>>>> > On Thu, Apr 30, 2020 at 2:58 PM Han Zhou <zhou...@gmail.com> wrote:
>>>> > >
>>>> > > Thanks Maciej for testing.
>>>> > >
>>>> > > On Tue, Apr 28, 2020 at 5:36 AM Maciej Jozefczyk <
>>>> mjoze...@redhat.com> wrote:
>>>> > > > Here are my findings.
>>>> > > >
>>>> > > > 1) Test LB and sleep 1 second between calls:
>>>> > > > ./get-data.py --lb-dest 172.24.4.58:60092 --sleep-time 1
>>>> > > >
>>>> > > > result: http://paste.openstack.org/show/792818/
>>>> > > > Different backends are selected and different buckets are being
>>>> hit in group_id=3. Sometimes the bucket1 is hit, sometimes bucket0.
>>>> > > > Output from groups dumps during the test:
>>>> http://paste.openstack.org/show/792820/
>>>> > > >
>>>> > > >
>>>> > > > 2) Test LB and sleep 60 second between calls:
>>>> > > > ./get-data.py --lb-dest 172.24.4.58:60092 --sleep-time 60
>>>> > > >
>>>> > > > Result: http://paste.openstack.org/show/792822/
>>>> > > > Output from group stats: http://paste.openstack.org/show/792823/
>>>> > > > Always one bucket is hit (bucket0) and requests are pointed to
>>>> the same backend.
>>>> > > >
>>>> > >
>>>> > > This test result proved our earlier analysis: the different hash values
>>>> generated by the kernel fastpath and the userspace slowpath were the culprit.
>>>> > >
>>>> > > >
>>>> > > >
>>>> > > > On Fri, Apr 24, 2020 at 6:09 PM Ilya Maximets <i.maxim...@ovn.org>
>>>> wrote:
>>>> > > >>
>>>> > > >> On 4/24/20 3:19 AM, Han Zhou wrote:
>>>> > > >> > Based on the discussion in OVN meeting today I did some more
>>>> testing, and here are my findings.
>>>> > > >> >
>>>> > > >> > - With ICMP (ping) between the same source and destination, it is
>>>> always the same bucket that is selected by dp_hash.
>>>> > > >> > - With "nc" specifying the same TCP 5-tuple, the packets can end
>>>> up in different buckets. This is similar to what Numan and Maciej
>>>> observed.
>>>> > > >> >
>>>> > > >> > However, I was using the OVN ECMP feature to test instead of
>>>> LB. Since the ECMP feature doesn't use conntrack, here are some more findings.
>>>> The bucket selection changes only between 2 buckets, and the change happens
>>>> when the packet handling switches between the userspace and kernel datapaths.
>>>> Let's say the first packet of a flow (megaflow) goes to userspace and hits
>>>> bucket1; if I then immediately send more packets they will all hit bucket2,
>>>> but if I wait until the flow disappears from the megaflow cache and then send
>>>> the next packet, it will hit bucket1 again. This behavior is consistent.
>>>> > > >> >
>>>> > > >> > So I think the different buckets were selected because of the
>>>> different implementations of dp_hash in the userspace and kernel datapaths
>>>> (thanks Ilya for this hint in today's meeting).
>>>> > > >> > Numan/Maciej, in your tests did you see more than 2 buckets
>>>> hit for same 5-tuples? If the above theory is right, you should see at most
>>>> 2 buckets hit. For LB, since it uses CT and only the first packet uses the
>>>> group, all packets of the same flow would always be forwarded to same LB
>>>> backend. I guess if you wait long enough between the tests, you should see
>>>> all tests hitting same backend. It would be great if you could confirm 
>>>> this.
>>>> > > >> >
>>>> > > >> > For ECMP, this behavior will cause occasional out-of-order
>>>> packets even for a single flow (for a burst of packets after some idle time),
>>>> because CT is not used (and we can't use it because, when peered with
>>>> physical ECMP router groups, we can't ensure the return traffic from
>>>> physical routers hits the same LR).
>>>> > > >> >
>>>> > > >> > For LB it causes the unexpected behavior that is reported in
>>>> this thread.
>>>> > > >> >
>>>> > > >> > For the fix, I think we should figure out how to make sure
>>>> dp_hash always uses the same hash algorithm in both the userspace and kernel
>>>> implementations, if possible.
>>>> > > >> > I am ok with the patch from Numan that adds the capability of
>>>> configuring the desired hash method instead of always using the default.
>>>> However, using "hash" may be a huge performance sacrifice since the packets
>>>> are always handled in the slowpath, especially for ECMP. Even though LB uses
>>>> CT, for the short-lived flow scenario this is still a big performance penalty
>>>> (for long-lived LB flows it may be ok since the majority of packets are
>>>> still in the fastpath).
>>>> > > >> >
>>>> > > >> > I am not familiar with the dp_hash implementation. I will do
>>>> some more study, but any idea on how to ensure the consistency of dp_hash
>>>> is highly appreciated!
>>>> > > >>
>>>> > > >> I had the impression that the packet hash is passed from the datapath
>>>> > > >> to userspace and back, but it turned out that it's a really recent
>>>> > > >> change. It seems the following changes are required:
>>>> > > >>
>>>> > > >> 1. Linux kernel: bd1903b7c459 ("net: openvswitch: add hash info
>>>> to upcall")
>>>> > > >>    This is available starting from upstream kernel v5.5.
>>>> > > >>
>>>> > > >> 2. OVS: 0442bfb11d6c ("ofproto-dpif-upcall: Echo HASH attribute
>>>> back to datapath.")
>>>> > > >>    This is available on branch-2.13.
>>>> > > >>
>>>> > > >> With the above two patches, the first and subsequent packets should
>>>> > > >> have the same dp_hash calculated by the kernel datapath.
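>>>> > > >>
>>>> > > >> A quick way to check whether a given tree already carries these two
>>>> > > >> commits (just a sketch; run it from the respective source trees):
>>>> > > >>
>>>> > > >> # In the OVS tree:
>>>> > > >> git log --oneline | grep -i 'Echo HASH attribute back to datapath'
>>>> > > >> # In a Linux kernel tree:
>>>> > > >> git log --oneline -- net/openvswitch | grep -i 'add hash info to upcall'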
>>>> > > >>
>>>> > >
>>>> > > Thanks Ilya for this information! Do you know why the kernel fix
>>>> (1) was not backported to the OVS repo? Was it missed or on purpose? cc Tonghao
>>>> Zhang.
>>>> > > I ran a test using ovn-fake-multinode with OVS master, and the problem
>>>> disappeared. However, I am using an old kernel module and I don't think
>>>> patch (1) is there.
>>>> > Most of my patches were applied in the upstream Linux kernel, and I don't
>>>> > know who backports them, but I am fine with backporting it after the holiday.
>>>> > Thanks.
>>>>
>>>> Thanks Tonghao! I backported your kernel patch and related fix here,
>>>> please take a look:
>>>> https://patchwork.ozlabs.org/project/openvswitch/patch/1588554154-30608-1-git-send-email-hz...@ovn.org/
>>>>
>>>> I tested it on the same environment where I could reproduce the issue
>>>> with OVN ECMP. Here is the new behavior:
>>>>
>>>> - With a single connection (e.g. SSH), all packets are hitting the same
>>>> bucket now, no matter whether the datapath megaflow exists or not. This solves
>>>> the potential out-of-order problem for ECMP.
>>>>
>>>> - With "nc" to test same 5-tuple but different connections, still
>>>> different buckets were selected. I think it is probably because the hash
>>>> used now by dp_hash is generated by the socket, which is random, as
>>>> mentioned by the commit message of the kernel patch. It is not a problem
>>>> for ECMP, but it would not solve the problem initially brought up by this
>>>> thread, for the requirement of the special LB use cases.
>>>>
>>>> To solve the LB problem with dp_hash (as a further improvement), can we
>>>> support calculating the hash in the datapath instead of taking it from the
>>>> skb, while using the same mechanism as this patch to ensure the same hash
>>>> value is used for the upcall handling?
>>>>
>>>> P.S. Another problem for the LB use case is that there could be many
>>>> backends (more than 64). The current dp_hash implementation has a
>>>> limitation that if more than 64 hash values are required, it falls
>>>> back to the original "hash" algorithm, which satisfies the same-5-tuple,
>>>> same-bucket need but could have a performance problem. The limitation mainly
>>>> comes from the requirement of supporting different bucket weights,
>>>> using the Webster method. However, OVN LB doesn't even use the weights
>>>> (they are all equal).
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>> > > >>
>>>> > > >> >
>>>> > > >> > Thanks,
>>>> > > >> > Han
>>>> > > >> >
>>>> > > >> > On Tue, Apr 21, 2020 at 1:05 AM Daniel Alvarez Sanchez <
>>>> dalva...@redhat.com> wrote:
>>>> > > >> >>
>>>> > > >> >> Thanks Numan for the investigation and the great explanation!
>>>> > > >> >>
>>>> > > >> >>> On Tue, Apr 21, 2020 at 9:38 AM Numan Siddique <
>>>> num...@ovn.org> wrote:
>>>> > > >> >>>
>>>> > > >> >>> > On Fri, Apr 17, 2020 at 12:56 PM Han Zhou <zhou...@gmail.com>
>>>> wrote:
>>>> > > >> >>> >
>>>> > > >> >>> >
>>>> > > >> >>> >
>>>> > > >> >>> > > On Tue, Apr 7, 2020 at 7:03 AM Maciej Jozefczyk <
>>>> mjoze...@redhat.com> wrote:
>>>> > > >> >>> > >
>>>> > > >> >>> > > Hello!
>>>> > > >> >>> > >
>>>> > > >> >>> > > I would like to ask you to clarify how the OVN Load
>>>> balancing algorithm works.
>>>> > > >> >>> > >
>>>> > > >> >>> > > Based on the action [1]:
>>>> > > >> >>> > > 1) If the connection is alive, the same 'backend' will be
>>>> chosen,
>>>> > > >> >>> > >
>>>> > > >> >>> > > 2) If it is a new connection, the backend will be chosen
>>>> based on selection_method=dp_hash [2].
>>>> > > >> >>> > > Based on the changelog, dp_hash uses a '5-tuple hash' [3].
>>>> > > >> >>> > > The hash is calculated based on values: source and
>>>> destination IP,  source port, protocol and arbitrary value - 42. [4]
>>>> > > >> >>> > > Based on that information we could name it
>>>> SOURCE_IP_PORT.
>>>> > > >> >>> > >
>>>> > > >> >>> > > Unfortunately we recently got a bug report in the OVN
>>>> Octavia provider driver project that load balancing in OVN
>>>> > > >> >>> > > works differently [5]. The report shows that even when the
>>>> test uses the same source IP and port, but a new TCP connection,
>>>> > > >> >>> > > traffic is distributed randomly, while based on [2] it
>>>> shouldn't be.
>>>> > > >> >>> > >
>>>> > > >> >>> > > Is it a bug?  Is something else taken into account while
>>>> creating the hash? Can it be fixed in OVS/OVN?
>>>> > > >> >>> > >
>>>> > > >> >>> > >
>>>> > > >> >>> > >
>>>> > > >> >>> > > Thanks,
>>>> > > >> >>> > > Maciej
>>>> > > >> >>> > >
>>>> > > >> >>> > >
>>>> > > >> >>> > > [1]
>>>> https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1017
>>>> > > >> >>> > > [2]
>>>> https://github.com/ovn-org/ovn/blob/branch-20.03/lib/actions.c#L1059
>>>> > > >> >>> > > [3]
>>>> https://github.com/openvswitch/ovs/blob/d58b59c17c70137aebdde37d3c01c26a26b28519/NEWS#L364-L371
>>>> > > >> >>> > > [4]
>>>> https://github.com/openvswitch/ovs/blob/74286173f4d7f51f78e9db09b07a6d4d65263252/lib/flow.c#L2217
>>>> > > >> >>> > > [5] https://bugs.launchpad.net/neutron/+bug/1871239
>>>> > > >> >>> > >
>>>> > > >> >>> > > --
>>>> > > >> >>> > > Best regards,
>>>> > > >> >>> > > Maciej Józefczyk
>>>> > > >> >>> >
>>>> > > >> >>> > Hi Maciej,
>>>> > > >> >>> >
>>>> > > >> >>> > Thanks for reporting. It is definitely strange that the same
>>>> 5-tuple flow resulted in hitting different backends. I didn't observe such
>>>> behavior before (maybe I should try again myself to confirm). Can you make
>>>> sure the group buckets didn't change during the testing? You can do so by:
>>>> > > >> >>> > # ovs-ofctl dump-groups br-int
>>>> > > >> >>> > and also check the group stats to see if multiple buckets
>>>> had their counters increased during the test
>>>> > > >> >>> > # ovs-ofctl dump-group-stats br-int [group]
>>>> > > >> >>> >
>>>> > > >> >>> > As for the 5-tuple hash function you are looking at,
>>>> flow_hash_5tuple(), it uses all five tuples. It adds both ports (src
>>>> and dst) at once:
>>>> > > >> >>> >         /* Add both ports at once. */
>>>> > > >> >>> >         hash = hash_add(hash,
>>>> > > >> >>> >                         ((const uint32_t *)flow)[offsetof(struct flow, tp_src)
>>>> > > >> >>> >                                                  / sizeof(uint32_t)]);
>>>> > > >> >>> >
>>>> > > >> >>> > The tp_src is the starting offset, and the size is 32 bits,
>>>> covering both src and dst ports, each 16 bits. (Although I am not sure whether
>>>> the dp_hash method uses this function or not; I need to check more code.)
>>>> > > >> >>> >
>>>> > > >> >>> > BTW, I am not sure why Neutron gives it the name
>>>> SOURCE_IP_PORT. Should it be called just 5-TUPLE, since the protocol,
>>>> destination IP and port are also considered in the hash?
>>>> > > >> >>> >
>>>> > > >> >>>
>>>> > > >> >>>
>>>> > > >> >>> Hi Maciej and Han,
>>>> > > >> >>>
>>>> > > >> >>> I did some testing and I can confirm what you're saying. OVN is not
>>>> > > >> >>> choosing the same backend even with the src IP and src port fixed.
>>>> > > >> >>>
>>>> > > >> >>> I think there is an issue with how OVN is programming the group
>>>> > > >> >>> flows.  OVN is setting the selection_method to dp_hash,
>>>> > > >> >>> but when ovs-vswitchd receives the GROUP_MOD OpenFlow message, I
>>>> > > >> >>> noticed that the selection_method is not set.
>>>> > > >> >>> From the code I see that selection_method will be encoded only if
>>>> > > >> >>> ovn-controller uses OpenFlow version 1.5 [1].
>>>> > > >> >>>
>>>> > > >> >>> Since selection_method is NULL, vswitchd uses the dp_hash
>>>> method [2].
>>>> > > >> >>> dp_hash means it uses the hash calculated by
>>>> > > >> >>> the datapath. In the case of kernel datapath, from what I
>>>> understand
>>>> > > >> >>> it uses skb_get_hash().
>>>> > > >> >>>
>>>> > > >> >>> I modified the vswitchd code to use the selection_method "hash" if
>>>> > > >> >>> selection_method is not set. In this case the load balancer
>>>> > > >> >>> works as expected: for a fixed src IP, src port, dst IP and dst port,
>>>> > > >> >>> the group action always selects the same bucket. [3]
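>>>> > > >> >>>
>>>> > > >> >>> (As a quick sanity check, dumping the group with an OpenFlow 1.5
>>>> > > >> >>> decoder shows whether any selection method was actually encoded;
>>>> > > >> >>> the group id and bridge name below are only examples:)
>>>> > > >> >>>
>>>> > > >> >>> # If ovn-controller encoded it, the select group will show
>>>> > > >> >>> # selection_method=... and the configured hash fields here.
>>>> > > >> >>> ovs-ofctl -O OpenFlow15 dump-groups br-int 3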
>>>> > > >> >>>
>>>> > > >> >>> I think we need to fix a few issues in OVN:
>>>> > > >> >>>   - Use OpenFlow 1.5 so that OVN can set selection_method.
>>>> > > >> >>>   - Use the "hash" method if dp_hash is not choosing the same
>>>> bucket for
>>>> > > >> >>> the same 5-tuple.
>>>> > > >> >>>   - Maybe provide an option for the CMS to choose an
>>>> algorithm, i.e.
>>>> > > >> >>> dp_hash or hash.
>>>> > > >> >>>
>>>> > > >> >> I'd rather not expose this to the CMS, as it depends on the
>>>> datapath implementation as per [0], but maybe it makes sense to eventually
>>>> abstract it to the CMS in a more LB-ish way (common algorithm names used in
>>>> load balancing) in case the LB feature is at some point enhanced
>>>> to support more algorithms.
>>>> > > >> >>
>>>> > > >> >> I believe that for OVN LB users, using OF 1.5 to force the
>>>> use of 'hash' would be the best solution now.
>>>> > > >> >>
>>>> > > >> >> My 2 cents as I'm not an LB expert.
>>>> > > >> >>
>>>> > > >> >> I also recall that we tested this in the past and it seemed to
>>>> be working. I have been checking further in the doc [0] and found this
>>>> paragraph:
>>>> > > >> >>
>>>> > > >> >> "If no selection method is specified, Open vSwitch up to
>>>> release 2.9 applies the hash method with default fields. From 2.10 onwards
>>>> Open vSwitch defaults to the dp_hash method with symmetric L3/L4 hash
>>>> algorithm, unless the weighted group buckets cannot be mapped to a
>>>> maximum of 64 dp_hash values with sufficient accuracy. In those rare cases
>>>> Open vSwitch 2.10 and later fall back to the hash method with the default
>>>> set of hash fields."
>>>> > > >> >>
>>>> > > >> >> The explanation seems to be that when we tested the feature
>>>> we relied on OVS 2.9 and hence the confusion.
>>>> > > >> >>
>>>> > > >> >> Thanks a lot again!
>>>> > > >> >> Daniel
>>>> > > >> >>
>>>> > > >> >> [0]
>>>> http://www.openvswitch.org/support/dist-docs/ovs-ofctl.8.html
>>>> > > >> >>
>>>> > > >> >>>
>>>> > > >> >>> I'll look into it on how to support this.
>>>> > > >> >>>
>>>> > > >> >>> [1] -
>>>> https://github.com/openvswitch/ovs/blob/master/lib/ofp-group.c#L2120
>>>> > > >> >>>
>>>> https://github.com/openvswitch/ovs/blob/master/lib/ofp-group.c#L2082
>>>> > > >> >>>
>>>> > > >> >>> [2] -
>>>> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif.c#L5108
>>>> > > >> >>> [3] -
>>>> https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4553
>>>> > > >> >>>
>>>> > > >> >>>
>>>> > > >> >>> Thanks
>>>> > > >> >>> Numan
>>>> > > >> >>>
>>>> > > >> >>>
>>>> > > >> >>> > Thanks,
>>>> > > >> >>> > Han
>>>> > > >>
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > --
>>>> > > > Best regards,
>>>> > > > Maciej Józefczyk
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Best regards, Tonghao
>>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Maciej Józefczyk
>>>
>>
>>
>> --
>> Best regards,
>> Maciej Józefczyk
>>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
