On 11/8/23 05:39, Han Zhou wrote:
> On Tue, Nov 7, 2023 at 6:12 PM Han Zhou <hz...@ovn.org> wrote:
>>
>>
>>
>> On Tue, Nov 7, 2023 at 8:06 AM Dumitru Ceara <dce...@redhat.com> wrote:
>>>
>>> On 11/6/23 08:19, Han Zhou wrote:
>>>> On Sun, Nov 5, 2023 at 10:59 PM Ales Musil <amu...@redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Nov 4, 2023 at 5:45 AM Han Zhou <hz...@ovn.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 3, 2023 at 1:08 AM Ales Musil <amu...@redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 24, 2023 at 9:36 PM Han Zhou <hz...@ovn.org> wrote:
>>>>>>>>
>>>>>>>> Enhance MAC_Binding aging to allow CIDR-based threshold
>>>> configurations.
>>>>>>>> This enables distinct threshold settings for different IP ranges,
>>>>>>>> applying the longest prefix matching for overlapping ranges.
>>>>>>>>
>>>>>>>> A common use case involves setting a default threshold for all
> IPs,
>>>> while
>>>>>>>> disabling aging for a specific range and potentially excluding a
>>>> subset
>>>>>>>> within that range.
>>>>>>>>
>>>>>>>> Signed-off-by: Han Zhou <hz...@ovn.org>
>>>>>>>> ---
>>>>>>>
>>>>>>>
>>>>>>> Hi Han,
>>>>>>>
>>>>>>> thank you for the patch, I have a couple of comments down below.
>>>>>>
>>>>>> Thanks for your review!
>>>>>>
>>>
>>>
>>> Hi Han,
>>>
>>> Thanks for the patch!  Not a very in-depth review but I left a bunch of
>>> minor comments and shared my opinion on the configuration format.
>>>
>>
>> Thanks for your review!
>>
>>> Regards,
>>> Dumitru
>>>
>>>>>>>
>>>>>>>>
>>>>>>>>  northd/aging.c  | 297
>>>> +++++++++++++++++++++++++++++++++++++++++++++---
>>>>>>>>  northd/aging.h  |   3 +
>>>>>>>>  northd/northd.c |  11 +-
>>>>>>>>  ovn-nb.xml      |  63 +++++++++-
>>>>>>>>  tests/ovn.at    |  60 ++++++++++
>>>>>>>>  5 files changed, 413 insertions(+), 21 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/northd/aging.c b/northd/aging.c
>>>>>>>> index f626c72c8ca3..e5868211a63b 100644
>>>>>>>> --- a/northd/aging.c
>>>>>>>> +++ b/northd/aging.c
>>>>>>>> @@ -47,12 +47,253 @@ aging_waker_schedule_next_wake(struct
>>>> aging_waker *waker, int64_t next_wake_ms)
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +struct threshold_entry {
>>>>>>>> +    union {
>>>>>>>> +        ovs_be32 ipv4;
>>>>>>>> +        struct in6_addr ipv6;
>>>>>>>> +    } prefix;
>>>>>>>> +    bool is_v4;
>>>>>>>
>>>>>>>
>>>>>>> We can avoid the is_v4 whatsover by storing the address as mapped
> v4
>>>> in 'in6_addr', I saw the concern and we can still have separate
> arrays for
>>>> v4 and v6. The parsing IMO becomes much simpler if we use the mapped
> v4 for
>>>> both see down below.
>>>>>>>
>>>>>>
>>>>>> I did consider using in6_addr, but it would make the
>>>> find_threshold_for_ip() slightly more tedious because for every v4
> entry,
>>>> even if stored in a separate array, we will still need to call
>>>> in6_addr_get_mapped_ipv4() to convert it. While using the union we can
>>>> directly access what we need (ipv4 or ipv6).
>>>>>
>>>>>
>>>>> We wouldn't have to actually call in6_addr_get_mapped_ipv4() for the
> v4
>>>> variant, that's why I suggested that approach. We can adjust the mask
> and
>>>> then use the same logic as for ipv6 (entry->plen += 96 should do the
> trick).
>>>>>
>>>>
>>>> Understood, and my reply below was about that:
>>>>>> The search may be simplified a little in find_threshold_for_ip()
> with
>>>> adjusted prefix for v4, only if we combine the two arrays to a single
>>>> array. Otherwise we will need to have a function to abstract and
> reuse the
>>>> ipv6 matching logic, while the ipv4 matching is really just 3 lines
> of code
>>>> :)
>>>>
>>>>>>
>>>>>> The is_v4 member was needed when using a single array, but not
> necessary
>>>> any more when storing v4 and v6 entries separately, but I kept it
> just for
>>>> convenience, so that I don't need to use another arg in
>>>> parse_threshold_entry() to return the AF type. Using in6_addr can
> avoid the
>>>> extra arg, too, but similarly, we would have to always call the macro
>>>> IN6_IS_ADDR_V4MAPPED, which doesn't seem cleaner than the is_v4 field
> to
>>>> me. Using ip46_parse_cidr does save several lines of code, but that
> code
>>>> snippet itself is quite small. So with all these being considered, I
> think
>>>> either approach doesn't really make too much difference. I am totally
> fine
>>>> to change it if you still think a single in6_addr is better than the
> union.
>>>>>>
>>>>>>>>
>>>>>>>> +    unsigned int plen;
>>>>>>>> +    unsigned int threshold;
>>>>>>>
>>>>>>>
>>>>>>> I personally prefer sized integers.
>>>>>>>
>>>>>> I think it may be better to match the type of related helper
> functions.
>>>>>> For plen, it is ip[v6]_parse_cidr().
>>>>>> For threshold, it is str_to_uint(), or even smap_get_uint() returns
>>>> unsigned int.
>>>>>>
>>>>>>>>
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +/* Contains CIDR-based aging threshold configuration parsed from
>>>>>>>> + * "Logical_Router:options:mac_binding_age_threshold".
>>>>>>>> + *
>>>>>>>> + * This struct is also used for non-CIDR-based threshold, e.g.
> the
>>>> ones from
>>>>>>>> + * "NB_Global:other_config:fdb_age_threshold" for the common
>>>> aging_context
>>>>>>>> + * interface.
>>>>>>>> + *
>>>>>>>> + * - The arrays `v4_entries` and `v6_entries` are populated with
>>>> parsed entries
>>>>>>>> + *   for IPv4 and IPv6 CIDRs, respectively, along with their
>>>> associated
>>>>>>>> + *   thresholds.  Entries within these arrays are sorted by
> prefix
>>>> length,
>>>>>>>> + *   starting with the longest.
>>>>>>>> + *
>>>>>>>> + * - If a threshold is provided without an accompanying prefix,
> it's
>>>> captured
>>>>>>>> + *   in `default_threshold`.  In cases with multiple unprefixed
>>>> thresholds,
>>>>>>>> + *   `default_threshold` will only store the last one.  */
>>>>>>>> +struct threshold_config {
>>>>>>>> +    struct threshold_entry *v4_entries;
>>>>>>>> +    int n_v4_entries;
>>>
>>> Nit: size_t
>>>
>> Ack
>>
>>>>>>>> +    struct threshold_entry *v6_entries;
>>>>>>>> +    int n_v6_entries;
>>>
>>> Nit: size_t
>>>
>> Ack
>>
>>>>>>>> +    unsigned int default_threshold;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static int
>>>>>>>> +compare_entries_by_prefix_length(const void *a, const void *b)
>>>>>>>> +{
>>>>>>>> +    const struct threshold_entry *entry_a = a;
>>>>>>>> +    const struct threshold_entry *entry_b = b;
>>>>>>>> +
>>>>>>>> +    return entry_b->plen - entry_a->plen;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Parse an ENTRY in the threshold option, with the format:
>>>>>>>> + * [CIDR:]THRESHOLD
>>>>>>>> + *
>>>>>>>> + * Returns true if successful, false if failed. */
>>>>>>>> +static bool
>>>>>>>> +parse_threshold_entry(const char *str, struct threshold_entry
> *entry)
>>>>>>>> +{
>>>>>>>> +    char *colon_ptr;
>>>>>>>> +    unsigned int value;
>>>>>>>> +    const char *threshold_str;
>>>>>>>> +
>>>>>>>> +    colon_ptr = strrchr(str, ':');
>>>>>>>
>>>>>>>
>>>>>>> I find this a bit confusing and probably error prone, could we use
>>>> different delimiters for the threshold? I guess we could change it to
>>>> 'cidr;threshold,cidr;threshold' WDYT? This would allow us to use
>>>> 'strtok_r()' here and we could spare the multiple copies and replace
> with a
>>>> single one.
>>>>>>>
>>>>>> I thought about that, too, but decided to use ":" instead of ";"
> because
>>>> "A:B" looks more natural to represent "A has a value B", and ";" is
> more
>>>> like a delimiter to separate two different elements, just like ",".
> So I
>>>> feel that ":" might be more friendly to end users. I'd love to hear
>>>> opinions from more people.
>>>>>> If ";" or something else turned out to be a better option for the
> end
>>>> user, we can use strtok_r() here but I think it shouldn't be a
> concern if
>>>> we can't use it.
>>>
>>> IMO ":" is better.  But that's just personal preference.
>>>
>> OK
>>
>>>>>>
>>>>>>>>
>>>>>>>> +    if (!colon_ptr) {
>>>>>>>> +        threshold_str = str;
>>>>>>>> +        entry->plen = 0;
>>>>>>>> +    } else {
>>>>>>>> +        threshold_str = colon_ptr + 1;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (!str_to_uint(threshold_str, 10, &value)) {
>>>>>>>> +        return false;
>>>>>>>> +    }
>>>>>>>> +    entry->threshold = value;
>>>>>>>> +
>>>>>>>> +    if (!colon_ptr) {
>>>>>>>> +        return true;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    /* ":" was found, so parse the string before ":" as a cidr.
> */
>>>>>>>> +    char ip_cidr[128];
>>>>>>>> +    ovs_strzcpy(ip_cidr, str, MIN(colon_ptr - str + 1, sizeof
>>>> ip_cidr));
>>>>>>>> +    char *error = ip_parse_cidr(ip_cidr, &entry->prefix.ipv4,
>>>> &entry->plen);
>>>>>>>
>>>>>>>
>>>>>>> As mentioned earlier by storing it in 'in6_addr' we could use
>>>> 'ip46_parse_cidr()' which would simplify the logic. We can also
> adjust the
>>>> prefix for v4 so the search is the same for both.
>>>>>>
>>>>>> The search may be simplified a little in find_threshold_for_ip()
> with
>>>> adjusted prefix for v4, only if we combine the two arrays to a single
>>>> array. Otherwise we will need to have a function to abstract and
> reuse the
>>>> ipv6 matching logic, while the ipv4 matching is really just 3 lines
> of code
>>>> :)
>>>>>> But if you prefer to use a single array for both v4 and v6, I can
> try
>>>> that in v2.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> +    if (!error) {
>>>>>>>> +        entry->is_v4 = true;
>>>>>>>> +        return true;
>>>>>>>> +    }
>>>>>>>> +    free(error);
>>>>>>>> +    error = ipv6_parse_cidr(ip_cidr, &entry->prefix.ipv6,
>>>> &entry->plen);
>>>
>>> If I'm not wrong this is ambiguous for cases when the cidr length is not
>>> present.
>>>
>>> Maybe we should add a test for an (explicit and implicit) /128 config.
>>>
>> Both ip_parse_cidr and ipv6_parse_cidr accept different formats including
> addresses without mask/length. There are actually lots of combinations but

I had missed the fact that you always parse the integer threshold value
first and then parse the remaining prefix as an IPv6 address.  That
should be fine.

> I think it should belong to unit tests of those functions instead of
> testing them everywhere when they are used.
>> I did include random examples in the test case including an IPv4 address
> without mask and I can add one IPv6 without mask, too, but still, it is not
> supposed to be a thorough test for those CIDR parsing functions.

Makes sense, no need to duplicate efforts in this case, you're right.

>>
>>>>>>>> +    if (!error) {
>>>>>>>> +        entry->is_v4 = false;
>>>>>>>> +        return true;
>>>>>>>> +    }
>>>>>>>> +    free(error);
>>>>>>>> +    return false;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +threshold_config_destroy(struct threshold_config *config)
>>>>>>>> +{
>>>>>>>> +    free(config->v4_entries);
>>>>>>>> +    free(config->v6_entries);
>>>>>>>> +    config->v4_entries = config->v6_entries = NULL;
>>>>>>>> +    config->n_v4_entries = config->n_v6_entries = 0;
>>>>>>>> +    config->default_threshold = 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Parse the threshold option string, which has the format:
>>>>>>>> + * ENTRY[;ENTRY[...]]
>>>>>>>> + *
>>>>>>>> + * For the exact format of ENTRY, refer to the function
>>>>>>>> + * `parse_threshold_entry`.
>>>>>>>> + *
>>>>>>>> + * The parsed data is populated to the struct threshold_config.
>>>>>>>> + * See the comments of struct threshold_config for details.
>>>>>>>> + *
>>>>>>>> + * Return Values:
>>>>>>>> + * - Returns `false` if the input does not match the expected
> format.
>>>>>>>> + *   Consequently, no entries will be populated.
>>>>>>>> + * - Returns `true` upon successful parsing. The caller is
>>>> responsible for
>>>>>>>> + *   releasing the allocated memory by calling
>>>> threshold_config_destroy. */
>>>>>>>> +static bool
>>>>>>>> +parse_aging_threshold(const char *opt,
>>>>>>>> +                      struct threshold_config *config)
>>>>>>>> +{
>>>>>>>> +    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 1);
>>>>>>>> +    struct threshold_entry e;
>>>>>>>> +    const char *start = opt, *end;
>>>>>>>> +    memset(config, 0, sizeof *config);
>>>>>>>> +
>>>>>>>> +    while (start && *start) {
>>>>>>>> +        end = strchr(start, ';');
>>>>>>>
>>>>>>>
>>>>>>> We should use 'strtok_r()' this again allows us to avoid the
> constant
>>>> copies replaced by single one.
>>>>>>
>>>>>> Make sense. I will change this in v2.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> +        if (end) {
>>>>>>>> +            /* Big enough for any <ipv6>/<ipv6 mask>:<threshold>
> */
>>>>>>>> +            char buf[128];
>>>>>>>> +            ovs_strzcpy(buf, start, MIN(end - start + 1, sizeof
>>>> buf));
>>>>>>>> +            if (!parse_threshold_entry(buf, &e)) {
>>>>>>>> +                VLOG_WARN_RL(&rl, "Parsing aging threshold '%s'
>>>> failed.", buf);
>>>>>>>> +                goto err;
>>>>>>>> +            }
>>>>>>>> +            start = end + 1;
>>>>>>>> +        } else {
>>>>>>>> +            if (!parse_threshold_entry(start, &e)) {
>>>>>>>> +                VLOG_WARN_RL(&rl, "Parsing aging threshold '%s'
>>>> failed.",
>>>>>>>> +                             start);
>>>>>>>> +                goto err;
>>>>>>>> +            }
>>>>>>>> +            start += strlen(start);
>>>>>>>> +        }
>>>>>>>> +        if (!e.plen) {
>>>>>>>> +            config->default_threshold = e.threshold;
>>>>>>>> +        } else if (e.is_v4) {
>>>>>>>> +            config->n_v4_entries++;
>>>>>>>> +            config->v4_entries = xrealloc(config->v4_entries,
>>>>>>>> +                                          config->n_v4_entries *
>>>> sizeof e);
>>>>>>>> +            config->v4_entries[config->n_v4_entries - 1] = e;
>>>
>>> Nit: Maybe use x2nrealloc()?
>>
>> I opted not to use x2nrealloc since it's merely a short configuration
> string (without much thinking). Using x2nrealloc could give the impression
> to the code reader that a large number of entries are anticipated. But, I
> am really fine either way.

If we'd have a single array for v4 and v6 using x2nrealloc would just
mean a single extra line of code.  But I agree, users will probabl not
have a lot of CIDRs listed here so it's probably fine.

>>
>> Best regards,
>> Han
>>
>>>
>>>>>>>> +        } else {
>>>>>>>> +            config->n_v6_entries++;
>>>>>>>> +            config->v6_entries = xrealloc(config->v6_entries,
>>>>>>>> +                                          config->n_v6_entries *
>>>> sizeof e);
>>>>>>>> +            config->v6_entries[config->n_v6_entries - 1] = e;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (config->n_v4_entries > 0) {
>>>>>>>> +        qsort(config->v4_entries, config->n_v4_entries, sizeof e,
>>>>>>>> +              compare_entries_by_prefix_length);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (config->n_v6_entries > 0) {
>>>>>>>> +        qsort(config->v6_entries, config->n_v6_entries, sizeof e,
>>>>>>>> +              compare_entries_by_prefix_length);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return true;
>>>>>>>> +
>>>>>>>> +err:
>>>>>>>> +    threshold_config_destroy(config);
>>>>>>>> +    return false;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static unsigned int
>>>>>>>> +find_threshold_for_ip(const char *ip_str,
>>>>>>>> +                      const struct threshold_config *config)
>>>>>>>> +{
>>>>>>>> +    if (!ip_str) {
>>>>>>>> +        return config->default_threshold;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    ovs_be32 ipv4;
>>>>>>>> +    struct in6_addr ipv6;
>>>>>>>> +    if (ip_parse(ip_str, &ipv4)) {
>>>>>>>> +        for (int i = 0; i < config->n_v4_entries; i++) {
>>>>>>>> +            ovs_be32 masked_ip = ipv4 &
>>>>>>>> +                be32_prefix_mask(config->v4_entries[i].plen);
>>>>>>>> +            if (masked_ip == config->v4_entries[i].prefix.ipv4) {
>>>>>>>> +                return config->v4_entries[i].threshold;
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +    } else if (ipv6_parse(ip_str, &ipv6)) {
>>>>>>>> +        for (int i = 0; i < config->n_v6_entries; i++) {
>>>>>>>> +            struct in6_addr v6_mask =
>>>>>>>> +                ipv6_create_mask(config->v6_entries[i].plen);
>>>>>>>> +            struct in6_addr masked_ip = ipv6_addr_bitand(&ipv6,
>>>> &v6_mask);
>>>>>>>> +            if (ipv6_addr_equals(&masked_ip,
>>>>>>>> +
>>>> &config->v6_entries[i].prefix.ipv6)) {
>>>>>>>> +                return config->v6_entries[i].threshold;
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +    return config->default_threshold;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Parse the threshold option string (see the comment of the
> function
>>>>>>>> + * parse_aging_threshold), and returns the smallest threshold. */
>>>>>>>> +unsigned int
>>>>>>>> +min_mac_binding_age_threshold(const char *opt)
>>>>>>>> +{
>>>>>>>> +    struct threshold_config config;
>>>>>>>> +    if (!parse_aging_threshold(opt, &config)) {
>>>>>>>> +        return 0;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    unsigned int threshold = UINT_MAX;
>>>>>>>> +    unsigned int t;
>>>>>>>> +
>>>>>>>> +    for (int i = 0; i < config.n_v4_entries; i++) {
>>>>>>>> +        t = config.v4_entries[i].threshold;
>>>>>>>> +        if (t && t < threshold) {
>>>>>>>> +            threshold = t;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    for (int i = 0; i < config.n_v6_entries; i++) {
>>>>>>>> +        t = config.v6_entries[i].threshold;
>>>>>>>> +        if (t && t < threshold) {
>>>>>>>> +            threshold = t;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    t = config.default_threshold;
>>>>>>>> +    if (t && t < threshold) {
>>>>>>>> +        threshold = t;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    threshold_config_destroy(&config);
>>>>>>>> +
>>>>>>>> +    return threshold == UINT_MAX ? 0 : threshold;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  struct aging_context {
>>>>>>>>      int64_t next_wake_ms;
>>>>>>>>      int64_t time_wall_now;
>>>>>>>>      uint32_t removal_limit;
>>>>>>>>      uint32_t n_removed;
>>>>>>>> -    uint64_t threshold;
>>>>>>>> +    struct threshold_config *threshold;
>>>>>>>>  };
>>>>>>>>
>>>>>>>>  static struct aging_context
>>>>>>>> @@ -63,13 +304,14 @@ aging_context_init(uint32_t removal_limit)
>>>>>>>>             .time_wall_now = time_wall_msec(),
>>>>>>>>             .removal_limit = removal_limit,
>>>>>>>>             .n_removed = 0,
>>>>>>>> -           .threshold = 0,
>>>>>>>> +           .threshold = NULL,
>>>>>>>>      };
>>>>>>>>      return ctx;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  static void
>>>>>>>> -aging_context_set_threshold(struct aging_context *ctx, uint64_t
>>>> threshold)
>>>>>>>> +aging_context_set_threshold(struct aging_context *ctx,
>>>>>>>> +                            struct threshold_config *threshold)
>>>>>>>>  {
>>>>>>>>      ctx->threshold = threshold;
>>>>>>>>  }
>>>>>>>> @@ -81,19 +323,27 @@ aging_context_is_at_limit(struct
> aging_context
>>>> *ctx)
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  static bool
>>>>>>>> -aging_context_handle_timestamp(struct aging_context *ctx, int64_t
>>>> timestamp)
>>>>>>>> +aging_context_handle_timestamp(struct aging_context *ctx, int64_t
>>>> timestamp,
>>>>>>>> +                               const char *ip)
>>>>>>>>  {
>>>>>>>>      int64_t elapsed = ctx->time_wall_now - timestamp;
>>>>>>>>      if (elapsed < 0) {
>>>>>>>>          return false;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -    if (elapsed >= ctx->threshold) {
>>>>>>>> +    ovs_assert(ctx->threshold);
>>>>>>>> +    uint64_t threshold = 1000 * find_threshold_for_ip(ip,
>>>> ctx->threshold);
>>>>>>>> +
>>>>>>>> +    if (!threshold) {
>>>>>>>> +        return false;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (elapsed >= threshold) {
>>>>>>>>          ctx->n_removed++;
>>>>>>>>          return true;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -    ctx->next_wake_ms = MIN(ctx->next_wake_ms, (ctx->threshold -
>>>> elapsed));
>>>>>>>> +    ctx->next_wake_ms = MIN(ctx->next_wake_ms, (threshold -
>>>> elapsed));
>>>>>>>>      return false;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> @@ -121,13 +371,18 @@ mac_binding_aging_run_for_datapath(const
> struct
>>>> sbrec_datapath_binding *dp,
>>>>>>>>          return;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +    if (!ctx->threshold->n_v4_entries &&
>>>> !ctx->threshold->n_v6_entries
>>>>>>>> +        && !ctx->threshold->default_threshold) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>      struct sbrec_mac_binding *mb_index_row =
>>>>>>>>          sbrec_mac_binding_index_init_row(mb_by_datapath);
>>>>>>>>      sbrec_mac_binding_index_set_datapath(mb_index_row, dp);
>>>>>>>>
>>>>>>>>      const struct sbrec_mac_binding *mb;
>>>>>>>>      SBREC_MAC_BINDING_FOR_EACH_EQUAL (mb, mb_index_row,
>>>> mb_by_datapath) {
>>>>>>>> -        if (aging_context_handle_timestamp(ctx, mb->timestamp)) {
>>>>>>>> +        if (aging_context_handle_timestamp(ctx, mb->timestamp,
>>>> mb->ip)) {
>>>>>>>>              sbrec_mac_binding_delete(mb);
>>>>>>>>              if (aging_context_is_at_limit(ctx)) {
>>>>>>>>                  break;
>>>>>>>> @@ -166,13 +421,20 @@ en_mac_binding_aging_run(struct engine_node
>>>> *node, void *data OVS_UNUSED)
>>>>>>>>              continue;
>>>>>>>>          }
>>>>>>>>
>>>>>>>> -        uint64_t threshold = smap_get_uint(&od->nbr->options,
>>>>>>>> -
>>>> "mac_binding_age_threshold", 0);
>>>>>>>> -        aging_context_set_threshold(&ctx, threshold * 1000);
>>>>>>>> +        struct threshold_config threshold_config;
>>>>>>>> +        if (!parse_aging_threshold(smap_get(&od->nbr->options,
>>>>>>>> +
>>>>  "mac_binding_age_threshold"),
>>>>>>>> +                                   &threshold_config)) {
>>>>>>>> +            return;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        aging_context_set_threshold(&ctx, &threshold_config);
>>>>>>>>
>>>>>>>>          mac_binding_aging_run_for_datapath(od->sb,
>>>>>>>>
>>>> sbrec_mac_binding_by_datapath,
>>>>>>>>                                             &ctx);
>>>>>>>> +        threshold_config_destroy(&threshold_config);
>>>>>>>> +
>>>>>>>>          if (aging_context_is_at_limit(&ctx)) {
>>>>>>>>              /* Schedule the next run after specified delay. */
>>>>>>>>              ctx.next_wake_ms = AGING_BULK_REMOVAL_DELAY_MSEC;
>>>>>>>> @@ -255,7 +517,7 @@ fdb_run_for_datapath(const struct
>>>> sbrec_datapath_binding *dp,
>>>>>>>>
>>>>>>>>      const struct sbrec_fdb *fdb;
>>>>>>>>      SBREC_FDB_FOR_EACH_EQUAL (fdb, fdb_index_row, fdb_by_dp_key)
> {
>>>>>>>> -        if (aging_context_handle_timestamp(ctx, fdb->timestamp))
> {
>>>>>>>> +        if (aging_context_handle_timestamp(ctx, fdb->timestamp,
>>>> NULL)) {
>>>>>>>>              sbrec_fdb_delete(fdb);
>>>>>>>>              if (aging_context_is_at_limit(ctx)) {
>>>>>>>>                  break;
>>>>>>>> @@ -293,11 +555,16 @@ en_fdb_aging_run(struct engine_node *node,
> void
>>>> *data OVS_UNUSED)
>>>>>>>>              continue;
>>>>>>>>          }
>>>>>>>>
>>>>>>>> -        uint64_t threshold =
>>>>>>>> -                smap_get_uint(&od->nbs->other_config,
>>>> "fdb_age_threshold", 0);
>>>>>>>> -        aging_context_set_threshold(&ctx, threshold * 1000);
>>>>>>>> -
>>>>>>>> +        struct threshold_config threshold_config;
>>>>>>>> +        if
> (!parse_aging_threshold(smap_get(&od->nbs->other_config,
>>>>>>>> +                                            "fdb_age_threshold"),
>>>>>>>> +                                   &threshold_config)) {
>>>>>>>
>>>>>>>
>>>>>>> This would mean that fdb aging would accept the same arguments, but
>>>> that doesn't seem right. We should set only the default value instead
> of
>>>> the whole parsing.
>>>>>>>
>>>>>> Calling parse_aging_threshold here is just a convenient way to
>>>> initialize the other fields of the structure while setting the default
>>>> value. I agree it may be just more clear to explicitly initialize the
>>>> threshold_config and set the default value by calling
> smap_get_uint(). I
>>>> will do that in v2.
>>>>>>
>>>>>> I will wait for the consensus on the CIDR option's format and the
>>>> separate v.s. combined array before sending v2.
>>>>>
>>>>>
>>>>> I definitely don't want to block this patch on something that is
> more or
>>>> less cosmetic change so let's see what others think.
>>>>>
>>>> The option format may be more important since it is user facing. With
> an
>>>> example, we need to choose between:
>>>>
>>>> A: 192.168.0.0/16:300;192.168.10.0/24:0;fe80::/10:600;1200
>>>>
>>>> or
>>>>
>>>> B: 192.168.0.0/16;300,192.168.10.0/24;0,fe80::/10;600,1200
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Han
>>>>>>
>>>>>>>>
>>>>>>>> +            return;
>>>>>>>> +        }
>>>>>>>> +        aging_context_set_threshold(&ctx, &threshold_config);
>>>>>>>>          fdb_run_for_datapath(od->sb, sbrec_fdb_by_dp_key, &ctx);
>>>>>>>> +        threshold_config_destroy(&threshold_config);
>>>>>>>> +
>>>>>>>>          if (aging_context_is_at_limit(&ctx)) {
>>>>>>>>              /* Schedule the next run after specified delay. */
>>>>>>>>              ctx.next_wake_ms = AGING_BULK_REMOVAL_DELAY_MSEC;
>>>>>>>> diff --git a/northd/aging.h b/northd/aging.h
>>>>>>>> index 650990e244aa..2cfe2fd44457 100644
>>>>>>>> --- a/northd/aging.h
>>>>>>>> +++ b/northd/aging.h
>>>>>>>> @@ -30,6 +30,9 @@ void *en_mac_binding_aging_waker_init(struct
>>>> engine_node *node,
>>>>>>>>                                        struct engine_arg *arg);
>>>>>>>>  void en_mac_binding_aging_waker_cleanup(void *data);
>>>>>>>>
>>>>>>>> +/* The MAC binding aging helper function. */
>>>>>>>> +unsigned int min_mac_binding_age_threshold(const char *opt);
>>>>>>>> +
>>>>>>>>  /* The FDB aging node functions. */
>>>>>>>>  void en_fdb_aging_run(struct engine_node *node, void *data);
>>>>>>>>  void *en_fdb_aging_init(struct engine_node *node, struct
> engine_arg
>>>> *arg);
>>>>>>>> diff --git a/northd/northd.c b/northd/northd.c
>>>>>>>> index f8b046d83e85..4bc898982cf2 100644
>>>>>>>> --- a/northd/northd.c
>>>>>>>> +++ b/northd/northd.c
>>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>>  #include <stdlib.h>
>>>>>>>>  #include <stdio.h>
>>>>>>>>
>>>>>>>> +#include "aging.h"
>>>>>>>>  #include "debug.h"
>>>>>>>>  #include "bitmap.h"
>>>>>>>>  #include "coverage.h"
>>>>>>>> @@ -1166,8 +1167,14 @@ ovn_datapath_update_external_ids(struct
>>>> ovn_datapath *od)
>>>>>>>>              smap_add(&ids, "always_learn_from_arp_request",
> "false");
>>>>>>>>          }
>>>>>>>>
>>>>>>>> -        uint32_t age_threshold = smap_get_uint(&od->nbr->options,
>>>>>>>> -
>>>> "mac_binding_age_threshold", 0);
>>>>>>>> +        /* For timestamp refreshing, the smallest threshold of
> the
>>>> option is
>>>>>>>> +         * set to SB to make sure all entries are refreshed in
> time.
>>>>>>>> +         * XXX: This approach simplifies processing in
>>>> ovn-controller, but it
>>>>>>>> +         * may be enhanced, if necessary, to parse the complete
>>>> CIDR-based
>>>>>>>> +         * threshold configurations to SB to reduce unnecessary
>>>> refreshes. */
>>>>>>>> +        uint32_t age_threshold = min_mac_binding_age_threshold(
>>>>>>>> +
> smap_get(&od->nbr->options,
>>>>>>>> +
>>>> "mac_binding_age_threshold"));
>>>>>>>>          if (age_threshold) {
>>>>>>>>              smap_add_format(&ids, "mac_binding_age_threshold",
>>>>>>>>                              "%u", age_threshold);
>>>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>>>>>>> index 1de0c30416ce..80733273484e 100644
>>>>>>>> --- a/ovn-nb.xml
>>>>>>>> +++ b/ovn-nb.xml
>>>>>>>> @@ -2686,10 +2686,65 @@ or
>>>>>>>>        </column>
>>>>>>>>
>>>>>>>>        <column name="options" key="mac_binding_age_threshold"
>>>>>>>> -              type='{"type": "integer", "minInteger": 0,
>>>> "maxInteger": 4294967295}'>
>>>>>>>> -        MAC binding aging <code>threshold</code> value in
> seconds.
>>>> MAC binding
>>>>>>>> -        exceeding this timeout will be automatically removed. The
>>>> value
>>>>>>>> -        defaults to 0, which means disabled.
>>>>>>>> +              type='{"type": "string"}'>
>>>>>>>> +        <p>
>>>>>>>> +            Specifies the MAC binding aging thresholds based on
>>>> CIDRs, with the
>>>>>>>> +            format:
>>>> <var>entry</var>[<code>;</code><var>entry</var>[...]],
>>>>>>>> +            where each <var>entry</var> has the format:
>>>>>>>> +            [<var>cidr</var><code>:</code>]<var>threshold</var>
>>>>>>>> +        </p>
>>>>>>>> +
>>>>>>>> +        <ul>
>>>>>>>> +          <li>
>>>>>>>> +            <var>cidr</var>: Can be either an IPv4 or IPv6 CIDR.
>>>>>>>> +          </li>
>>>>>>>> +          <li>
>>>>>>>> +            <var>threshold</var>: Threshold value in seconds. MAC
>>>> bindings with
>>>>>>>> +            IP addresses matching the specified CIDR that exceed
>>>> this timeout
>>>>>>>> +            will be automatically removed.
>>>>>>>> +          </li>
>>>>>>>> +        </ul>
>>>>>>>> +
>>>>>>>> +        <p>
>>>>>>>> +            If an <var>entry</var> is provided without an CIDR
> (just
>>>> the
>>>>>>>> +            threshold value), it specifies the default threshold
> for
>>>> MAC
>>>>>>>> +            bindings that don't match any of the given CIDRs. If
>>>> there are
>>>>>>>> +            multiple default threshold entries in the option, the
>>>> behavior is
>>>>>>>> +            undefined.
>>>>>>>> +        </p>
>>>>>>>> +
>>>>>>>> +        <p>
>>>>>>>> +            If there are multiple CIDRs matching a MAC binding
> IP,
>>>> the one with
>>>>>>>> +            the longest prefix length takes effect. If there are
>>>> multiple
>>>>>>>> +            entries with the same CIDR in the option, the
> behavior is
>>>>>>>> +            undefined.
>>>>>>>> +        </p>
>>>>>>>> +
>>>>>>>> +        <p>
>>>>>>>> +            If no matching CIDR is found for a MAC binding IP,
> and
>>>> no default
>>>>>>>> +            threshold is specified, the behavior defaults to the
>>>> original: the
>>>>>>>> +            binding will not be removed based on age.
>>>>>>>> +        </p>
>>>>>>>> +
>>>>>>>> +        <p>
>>>>>>>> +            The value can also default to an empty string, which
>>>> means that the
>>>>>>>> +            aging threshold is disabled. Any string not in the
> above
>>>> format is
>>>>>>>> +            regarded as invalid and the aging is disabled.
>>>>>>>> +        </p>
>>>>>>>> +
>>>>>>>> +        <p>
>>>>>>>> +            Example:
>>>>>>>> +            <code>
>>>> 192.168.0.0/16:300;192.168.10.0/24:0;fe80::/10:600;1200</code>
>>>>>>>> +        </p>
>>>>>>>> +
>>>>>>>> +        <p>
>>>>>>>> +            This sets a threshold of 300 seconds for MAC bindings
>>>> with IP
>>>>>>>> +            addresses in the 192.168.0.0/16 range, excluding the
>>>> 192.168.1.0/24
>>>>>>>> +            range (for which the aging is disabled), a threshold
> of
>>>> 600 seconds
>>>>>>>> +            for MAC bindings with IP addresses in the fe80::/10
> IPv6
>>>> range, and
>>>>>>>> +            a default threshold of 1200 seconds for all other MAC
>>>> bindings.
>>>>>>>> +        </p>
>>>>>>>> +
>>>>>>>>        </column>
>>>>>>>>      </group>
>>>>>>>>
>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>>>>> index 637d92bedbc7..07b5c1ec57fc 100644
>>>>>>>> --- a/tests/ovn.at
>>>>>>>> +++ b/tests/ovn.at
>>>>>>>> @@ -34676,6 +34676,66 @@ OVS_WAIT_UNTIL([
>>>>>>>>
>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=79 --no-stats
> |
>>>> strip_cookie], [0], [])
>>>>>>>>
>>>>>>>> +# Test CIDR-based threshold configuration
>>>>>>>> +check ovn-nbctl set logical_router gw
>>>> options:mac_binding_age_threshold="
>>>> 192.168.10.0/255.255.255.0:2;192.168.10.64/26:0;192.168.10.20:0"
>>>>>>>> +check ovn-nbctl --wait=sb sync
>>>>>>>> +uuid=$(fetch_column datapath _uuid external_ids:name=gw)
>>>>>>>> +AT_CHECK([ovn-sbctl get datapath $uuid
>>>> external_ids:mac_binding_age_threshold], [0], [dnl
>>>>>>>> +"2"
>>>>>>>> +])
>>>>>>>> +
>>>>>>>> +# Send GARP to populate MAC binding table records
>>>>>>>> +send_garp hv1 ext1 10 # belong to 192.168.10.0/24
>>>>>>>> +send_garp hv2 ext2 20 # belong to 192.168.10.20/32
>>>>>>>> +send_garp hv2 ext2 65 # belong to 192.168.10.64/26
>>>>>>>> +
>>>>>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
>>>> "192.168.10.10"])
>>>>>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
>>>> "192.168.10.20"])
>>>>>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
>>>> "192.168.10.65"])
>>>>>>>> +
>>>>>>>> +OVS_WAIT_UNTIL([
>>>>>>>> +    test "0" = "$(ovn-sbctl list mac_binding | grep -c
>>>> '192.168.10.10')"
>>>>>>>> +])
>>>>>>>> +# The other two should remain because the corresponding prefixes
>>>> have threshold 0
>>>>>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.20"])
>>>>>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.65"])
>>>>>>>> +check ovn-sbctl --all destroy mac_binding
>>>>>>>> +
>>>>>>>> +# Set the aging threshold mixed with IPv6 prefixes and default
>>>> threshold
>>>>>>>> +check ovn-nbctl set logical_router gw
>>>> options:mac_binding_age_threshold="1;
> 192.168.10.64/26:0;ff00:1234::/32:888"
>>>>>>>> +check ovn-nbctl --wait=sb sync
>>>>>>>> +uuid=$(fetch_column datapath _uuid external_ids:name=gw)
>>>>>>>> +AT_CHECK([ovn-sbctl get datapath $uuid
>>>> external_ids:mac_binding_age_threshold], [0], [dnl
>>>>>>>> +"1"
>>>>>>>> +])
>>>>>>>> +
>>>>>>>> +# Send GARP to populate MAC binding table records
>>>>>>>> +send_garp hv1 ext1 10 # belong to default
>>>>>>>> +send_garp hv2 ext2 65 # belong to 192.168.10.64/26
>>>>>>>> +
>>>>>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
>>>> "192.168.10.10"])
>>>>>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
>>>> "192.168.10.65"])
>>>>>>>> +
>>>>>>>> +OVS_WAIT_UNTIL([
>>>>>>>> +    test "0" = "$(ovn-sbctl list mac_binding | grep -c
>>>> '192.168.10.10')"
>>>>>>>> +])
>>>>>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.65"])
>>>>>>>> +check ovn-sbctl --all destroy mac_binding
>>>>>>>> +
>>>>>>>> +# Set the aging threshold with invalid format
>>>>>>>> +check ovn-nbctl set logical_router gw
>>>> options:mac_binding_age_threshold="1;abc/26:0"
>>>>>>>> +check ovn-nbctl --wait=sb sync
>>>>>>>> +uuid=$(fetch_column datapath _uuid external_ids:name=gw)
>>>>>>>> +AT_CHECK([ovn-sbctl get datapath $uuid
>>>> external_ids:mac_binding_age_threshold], [1], [ignore], [ignore])
>>>>>>>> +
>>>>>>>> +# Send GARP to populate MAC binding table records
>>>>>>>> +send_garp hv1 ext1 10
>>>>>>>> +
>>>>>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
>>>> "192.168.10.10"])
>>>>>>>> +# The record is not deleted
>>>>>>>> +sleep 5
>>>>>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.10"])
>>>>>>>> +
>>>>>>>>  OVN_CLEANUP([hv1], [hv2])
>>>>>>>>  AT_CLEANUP
>>>>>>>>  ])
>>>>>>>> --
>>>>>>>> 2.38.1
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> d...@openvswitch.org
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ales
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Ales Musil
>>>>>>>
>>>>>>> Senior Software Engineer - OVN Core
>>>>>>>
>>>>>>> Red Hat EMEA
>>>>>>>
>>>>>>> amu...@redhat.com
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Ales
>>>>> --
>>>>>
>>>>> Ales Musil
>>>>>
>>>>> Senior Software Engineer - OVN Core
>>>>>
>>>>> Red Hat EMEA
>>>>>
>>>>> amu...@redhat.com
>>>> _______________________________________________
>>>> dev mailing list
>>>> d...@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
> 
> I just sent v2. PTAL:
> 
> https://patchwork.ozlabs.org/project/ovn/patch/20231108043618.4056443-1-hz...@ovn.org/
> 

I'll try to have a look soon, thanks!


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to