On 7/16/21 1:51 AM, Ben Pfaff wrote: > On Tue, Jul 13, 2021 at 05:32:06PM +0200, Ilya Maximets wrote: >> There are 3 constraints for moving hashes from one member to another: >> >> 1. The load difference exceeds ~ 3% of the load of one member. >> 2. The difference in load between members exceeds 100,000 bytes. >> 3. Moving the hash reduces the load difference by more than 10%. >> >> In the current implementation, if one of the members transitions to >> the DOWN state, all hashes assigned to it will be moved to the other >> members. After that, if the member goes UP, it will wait for >> rebalancing to get hashes. But in case we have more than 10 equally >> loaded hashes, it will never meet constraint # 3, because each hash >> will handle less than 10% of the load. The situation gets worse when >> the number of flows grows and it is almost impossible to transfer any >> hash when all 256 hash records are used, which is very likely when we >> have few hundred/thousand flows. >> >> As a result, if one of the members goes down and back up while traffic >> flows, it will never be used to transmit packets again. This will not >> be fixed even if we completely stop the traffic and start it again, >> because the first two constraints will block rebalancing in the >> earlier stages, while we have low traffic volume. >> >> Moving a single hash if the destination does not have any hashes, >> as it was before commit c460a6a7bc75 ("ofproto/bond: simplifying the >> rebalancing logic"), will not help, because a single hash is not >> enough to make the difference in load less than 10% of the total load, >> and this member will handle only that one hash forever. >> >> To fix this, let's try to move multiple hashes at the same time to >> meet constraint # 3. >> >> The implementation includes sorting the "records" to be able to >> collect records with a cumulative load close enough to the ideal value. >> >> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> > > I reread this and it still looks good to me. > > I spotted one typo in a comment: > > diff --git a/ofproto/bond.c b/ofproto/bond.c > index b9bfa45493b8..c3e2083575b0 100644 > --- a/ofproto/bond.c > +++ b/ofproto/bond.c > @@ -1216,7 +1216,7 @@ choose_entries_to_migrate(const struct bond_member > *from, uint64_t to_tx_bytes, > } > > if (!cnt) { > - /* There is no entry which load less than or equal to 'ideal_delta'. > + /* There is no entry with load less than or equal to 'ideal_delta'. > * Lets try closest one. The closest is the last in sorted list. */ > struct bond_entry *closest; > > > Acked-by: Ben Pfaff <b...@ovn.org> >
Thanks! I fixed that and a repeated 'to to' in the other comment and applied to master. I also backported to 2.15 as it applies cleanly there. Not sure about backporting to 2.13 though, because it's kind of an algorithmic change. I can do the backport later if someone thinks that it's needed. Best regards, Ilya Maximets. _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev