On 3/22/21 1:10 PM, gmingchen(陈供明) wrote: > > > On 2021/3/19, 9:02 PM, "Dumitru Ceara" <dce...@redhat.com> wrote: > > On 3/10/21 2:29 PM, gmingchen(陈供明) wrote: > > From: Gongming Chen <gmingc...@tencent.com> > > > > This patch merges ipv4 addresses with wildcard masks, and replaces this > > ipv4 addresses with the combined ip/mask. This will greatly reduce the > > entries in the ovs security group flow table, especially when the host > > size is large. > > > > Analysis in the simplest scenario, a network 1.1.1.0/24 network,create > 253 > > ports(1.1.1.2-1.1.1.254). > > Only focus on the number of ip addresses, the original 253 addresses > will > > be combined into 13 addresses. > > 1.1.1.2/31 > > 1.1.1.4/30 > > 1.1.1.8/29 > > 1.1.1.16/28 > > 1.1.1.32/27 > > 1.1.1.64/26 > > 1.1.1.128/26 > > 1.1.1.192/27 > > 1.1.1.224/28 > > 1.1.1.240/29 > > 1.1.1.248/30 > > 1.1.1.252/31 > > 1.1.1.254 > > > > Some scenes are similar to the following: > > 1.1.1.2, 1.1.1.6 > > After the combine: > > 1.1.1.2/255.255.255.251 > > You can use ovn-match-ip utility to match ip. > > such as: > > ovs-ofctl dump-flows br-int | ovn-match-ip 1.1.1.6 > > 1.1.1.2/255.255.255.251 will show. > > > > Simple description of the algorithm. > > There are two situations > > 1. Combine once > > such as: > > 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1 > > Combined into: 1.1.1.0/31, 1.0.1.0/31 > > 2. Combine multiple times > > 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1 > > Combined into: 1.0.1.0/255.254.255.254 > > > > Considering the actual scene and simplicity, the first case is used to > > combine once. > > > > ...00... > > ...01... > > ...10... > > ...11... > > "..." means the same value omitted. > > Obviously, the above value can be expressed as ...00.../11100111. This > > continuous interval that can be represented by one or several wildcard > > masks is called a segment. > > Only if all 2<<n values from the minimum value 00...(n) to 11...(n) > > exist, can they be combined into 00...(n)/00...( n) > > > > First sort all the values by size. Iterate through each value. > > 1. Find a new segment, where two values differ only by 1 bit, such as > > ...0... and ...1... > > diff = ip_next ^ ip > > if (diff & (diff-1)) == 0 > > new_segment = true > > The first non-zero place in the high direction of ip is the end of the > > segment(segment_end). > > For example...100... and...101..., the segment_end is ...111... > > > > 2. Count the number of consecutive and less than continuous_size in the > > segment. > > diff = ip_next - ip > > if (diff & (diff-1)) == 0 && ip_next <= segment_end > > continuous_size++ > > > > 3. Combine different ip intervals in the segment according to > > continuous_size. > > In continuous_size, from the highest bit of 1 to the lowest bit of 1, in > > the order of segment start, each bit that is 1 is a different ip > interval > > that can be combined with a wildcard mask. > > For example, 000, 001, 010: > > continuous_size: 3 (binary 11), segment_start: 000 > > mask: ~(1 << 1 - 1) = 110; ~(1 << 0 - 1) = 111; > > Combined to: 000/110, 010/111 > > > > 4. The ip that cannot be recorded in a segment will not be combined. > > > > Signed-off-by: Gongming Chen <gmingc...@tencent.com> > > --- > > Hi Gongming, > > Sorry for the delayed review. > > I have a few general remarks/concerns and some specific comments inline. > First, the general remarks. > > I'm wondering if it would make more sense for this wildcard combination > of IPs to be done outside OVN, in the CMS, for the following reasons: > > - it's IPv4 specific. > - the CMS usually has more information about when it is matching on IP > sets and can probably optimize the match it uses in the ACL. > - the code in expr_const_sets_add() used to be a relatively direct > translation from sets of constants to sets of port names/IPs; this > changes with the optimization your proposing. > - the new utility, ovn-match-ip, would be useful but I'm worried that > users will often forget to use it. > > I'm curious about your and the maintainers' opinion on these points. > > Hi Dumitru, > First, thanks for your review. > > On the one hand, I agree with your point of view. The cms side is more > clear about the details of the ip, and the ip can be better optimized. > However, on the other hand, I think that ovn, as a network provider, should > not assume that upper layers such as cms have already made these > optimizations. > In fact, some cms do not have such a function, such as openstack, k8s, > and they hope that this specific and complex work will be realized by > the network provider. > In terms of better serving the upper layer or upper layer use, this > optimization function of ovn simplifies the upper layer work, which > will just make users more satisfied. > This is some of my views from the perspective of cms.
I see, I'll let Numan, Mark, and others share their opinions on this too. > > Another general remark is to increment the patch version when submitting > a new revision. From what I can see this specific patch is at v7. When > sending a v8 you could add "-v8" when generating the patch, e.g.: > > git format-patch --subject-prefix="PATCH ovn" -v8 -M origin/master > > Would generate a patch with subject: > > "Subject: [PATCH ovn v8] ...." > > This usually helps when reviewing patches that go through multiple > iterations. > > Thanks. > > More specific comments inline. > > > debian/ovn-common.install | 1 + > > lib/expr.c | 219 ++++++++++++++++++++++++++++++++++++++ > > rhel/ovn-fedora.spec.in | 1 + > > tests/ovn.at | 136 +++++++++++------------ > > tests/test-ovn.c | 9 ++ > > utilities/automake.mk | 5 +- > > utilities/ovn-match-ip.in | 78 ++++++++++++++ > > 7 files changed, 382 insertions(+), 67 deletions(-) > > create mode 100755 utilities/ovn-match-ip.in > > > > diff --git a/debian/ovn-common.install b/debian/ovn-common.install > > index 8e5915724..0095df918 100644 > > --- a/debian/ovn-common.install > > +++ b/debian/ovn-common.install > > @@ -5,6 +5,7 @@ usr/bin/ovn-ic-nbctl > > usr/bin/ovn-ic-sbctl > > usr/bin/ovn-trace > > usr/bin/ovn-detrace > > +usr/bin/ovn-match-ip > > usr/share/ovn/scripts/ovn-ctl > > usr/share/ovn/scripts/ovndb-servers.ocf > > usr/share/ovn/scripts/ovn-lib > > diff --git a/lib/expr.c b/lib/expr.c > > index f061a8fbe..0a7203817 100644 > > --- a/lib/expr.c > > +++ b/lib/expr.c > > @@ -1060,6 +1060,200 @@ expr_constant_set_destroy(struct > expr_constant_set *cs) > > } > > } > > > > +struct ip_in > > +{ > > + uint32_t *ip; > > + size_t size; > > +}; > > I don't think we really need this, we could just pass the IPv4 address > array to combine_ipv4_in_mask(), along with the number of items in the > array. > > Yes, in combine_ipv4_in_mask(), size of ipv4 array is not necessary. > But I think that the structure composed of *ip and size is more helpful for > expr_const_sets_add to initialize an array memory of n_values size instead > of accurately malloc the entry size of LEX_F_IPV4. > Because expr_const_sets_add does not know the number of LEX_F_IPV4 > when initializing the ipv4 array. > What do you think? In expr_const_sets_add() you currently allocate 'n_values' entries in ip_in_data.ip and then update ip_in_data.size to the exact number of non-masked IPv4 tokens. I don't see a need for 'size' to be stored in 'ip_in_data', it can just be local to expr_const_sets_add() and passed as argument to combine_ipv4_in_mask(). > > > + > > +struct ip_out_entry > > +{ > > + uint32_t ip; > > + uint32_t mask; > > + bool masked; > > +}; > > + > > +struct ip_out > > +{ > > + struct ip_out_entry *entries; > > + size_t used; > > + size_t size; > > +}; > > AFAIU, we can never have more "out" entries than "in" entries. This > means we could be conservative and always allocate as many output > entries as ip_in entries and pass the preallocated struct ip_out_entry > array to combine_ipv4_in_mask(). > > This would also remove the need to resize the "out" array. > > Yes, good idea. > > > + > > +static int > > +compare_mask_ip(const void *a, const void *b) > > The implementation doesn't really use the fact that 'a' and 'b' are IPs. > They're just compared as integers. We might want to use a different > name and move this function to a common module as it could potentially > be used in multiple places in the future. > > How about adding compare_uint32s(const void *a_, const void *b_) to > lib/util.c? Sounds good to me. I guess you mean lib/ovn-util.c though. > > > +{ > > + uint32_t a_ = *(uint32_t *)a; > > + uint32_t b_ = *(uint32_t *)b; > > + > > + return a_ < b_ ? -1 : a_ > b_; > > +} > > + > > +/* Function to check ip return data and xrealloc. */ > > +static void > > +check_realloc_ip_out(struct ip_out *ip_out_data){ > > + if (ip_out_data->used >= ip_out_data->size) { > > + ip_out_data->entries = x2nrealloc(ip_out_data->entries, > > + &ip_out_data->size, > > + sizeof *ip_out_data->entries); > > + } > > +} > > As mentioned in the comment earlier, we can probably avoid using this > function. > > Yes. > > > + > > +/* Simple description of the algorithm. > > + * There are two situations > > + * 1. Combine once > > + * such as: > > + * 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1 > > + * Combined into: 1.1.1.0/31, 1.0.1.0/31 > > + * 2. Combine multiple times > > + * 1.1.1.0 1.1.1.1 1.0.1.0 1.0.1.1 > > + * Combined into: 1.0.1.0/255.254.255.254 > > + * > > + * Considering the actual scene and simplicity, the first case is used > to > > + * combine once. > > + * > > + * ...00... > > + * ...01... > > + * ...10... > > + * ...11... > > + * "..." means the same value omitted. > > + * Obviously, the above value can be expressed as ...00.../11100111. > This > > + * continuous interval that can be represented by one or several > wildcard > > + * masks is called a segment. > > + * Only if all 2<<n values from the minimum value 00...(n) to 11...(n) > > + * exist, can they be combined into 00...(n)/00...( n) > > + * > > + * First sort all the values by size. Iterate through each value. > > + * 1. Find a new segment, where two values differ only by 1 bit, such > as > > + * ...0... and ...1... > > + * diff = ip_next ^ ip > > + * if (diff & (diff-1)) == 0 > > + * new_segment = true > > + * The first non-zero place in the high direction of ip is the end of > the > > + * segment(segment_end). > > + * For example...100... and...101..., the segment_end is ...111... > > + * > > + * 2. Count the number of consecutive and less than continuous_size in > the > > + * segment. > > + * diff = ip_next - ip > > + * if (diff & (diff-1)) == 0 && ip_next <= segment_end > > + * continuous_size++ > > + * > > + * 3. Combine different ip intervals in the segment according to > > + * continuous_size. > > + * In continuous_size, from the highest bit of 1 to the lowest bit of > 1, in > > + * the order of segment start, each bit that is 1 is a different ip > interval > > + * that can be combined with a wildcard mask. > > + * For example, 000, 001, 010: > > + * continuous_size: 3 (binary 11), segment_start: 000 > > + * mask: ~(1 << 1 - 1) = 110; ~(1 << 0 - 1) = 111; > > + * Combined to: 000/110, 010/111 > > + * > > + * 4. The ip that cannot be recorded in a segment will not be > combined. */ > > Thanks for adding the detailed description here. However, it's a bit > confusing because the names used in the description don't match the > variable names used in the implementation below. Maybe it's better to > not use names in the description? > > Ok, It would be better to replace it with a general description word. Thanks. > > > +static void > > +combine_ipv4_in_mask(struct ip_in *ip_in_data, struct ip_out > *ip_out_data){ > > Please move the curly brace on a new line. > > Ok, thanks. > > > + bool recording = false; > > + uint32_t i, diff, connect, start, end, continuous_size, mask_base; > > Some of these can be moved in the specific blocks where they're used. > > Ok. > > > + > > + start = 0; > > + continuous_size = 0; > > + mask_base = 0; > > + end = 0; > > + > > + qsort(ip_in_data->ip, ip_in_data->size, sizeof(uint32_t), > > Please use "sizeof *ip_in_data->ip". > > Ok, thanks. > > > + compare_mask_ip); > > + memset(ip_out_data, 0, sizeof(struct ip_out)); > > + > > + for (i = 0; i < ip_in_data->size; i++) { > > + if (i + 1 >= ip_in_data->size) { > > + if (!recording) { > > + goto end_when_not_recording; > > + } else { > > + goto end_when_recording; > > + } > > IMO it makes the code very hard to read if we have gotos jumping inside > if/else branches or inside loops. This is also usually a sign that > we're trying to avoid code duplication. A potentially simpler and more > readable way to do that is to add functions for the specific operations > we don't want to duplicate. > > Ok, I will use function instead of goto. Cool. > > Thanks, > Gongming Thanks, Dumitru _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev