On Tue, Nov 7, 2023 at 8:06 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 11/6/23 08:19, Han Zhou wrote:
> > On Sun, Nov 5, 2023 at 10:59 PM Ales Musil <amu...@redhat.com> wrote:
> >>
> >>
> >>
> >> On Sat, Nov 4, 2023 at 5:45 AM Han Zhou <hz...@ovn.org> wrote:
> >>>
> >>>
> >>>
> >>> On Fri, Nov 3, 2023 at 1:08 AM Ales Musil <amu...@redhat.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Oct 24, 2023 at 9:36 PM Han Zhou <hz...@ovn.org> wrote:
> >>>>>
> >>>>> Enhance MAC_Binding aging to allow CIDR-based threshold
> > configurations.
> >>>>> This enables distinct threshold settings for different IP ranges,
> >>>>> applying the longest prefix matching for overlapping ranges.
> >>>>>
> >>>>> A common use case involves setting a default threshold for all IPs,
> > while
> >>>>> disabling aging for a specific range and potentially excluding a
> > subset
> >>>>> within that range.
> >>>>>
> >>>>> Signed-off-by: Han Zhou <hz...@ovn.org>
> >>>>> ---
> >>>>
> >>>>
> >>>> Hi Han,
> >>>>
> >>>> thank you for the patch, I have a couple of comments down below.
> >>>
> >>> Thanks for your review!
> >>>
>
>
> Hi Han,
>
> Thanks for the patch!  Not a very in-depth review but I left a bunch of
> minor comments and shared my opinion on the configuration format.
>

Thanks for your review!

> Regards,
> Dumitru
>
> >>>>
> >>>>>
> >>>>>  northd/aging.c  | 297
> > +++++++++++++++++++++++++++++++++++++++++++++---
> >>>>>  northd/aging.h  |   3 +
> >>>>>  northd/northd.c |  11 +-
> >>>>>  ovn-nb.xml      |  63 +++++++++-
> >>>>>  tests/ovn.at    |  60 ++++++++++
> >>>>>  5 files changed, 413 insertions(+), 21 deletions(-)
> >>>>>
> >>>>> diff --git a/northd/aging.c b/northd/aging.c
> >>>>> index f626c72c8ca3..e5868211a63b 100644
> >>>>> --- a/northd/aging.c
> >>>>> +++ b/northd/aging.c
> >>>>> @@ -47,12 +47,253 @@ aging_waker_schedule_next_wake(struct
> > aging_waker *waker, int64_t next_wake_ms)
> >>>>>      }
> >>>>>  }
> >>>>>
> >>>>> +struct threshold_entry {
> >>>>> +    union {
> >>>>> +        ovs_be32 ipv4;
> >>>>> +        struct in6_addr ipv6;
> >>>>> +    } prefix;
> >>>>> +    bool is_v4;
> >>>>
> >>>>
> >>>> We can avoid the is_v4 whatsover by storing the address as mapped v4
> > in 'in6_addr', I saw the concern and we can still have separate arrays
for
> > v4 and v6. The parsing IMO becomes much simpler if we use the mapped v4
for
> > both see down below.
> >>>>
> >>>
> >>> I did consider using in6_addr, but it would make the
> > find_threshold_for_ip() slightly more tedious because for every v4
entry,
> > even if stored in a separate array, we will still need to call
> > in6_addr_get_mapped_ipv4() to convert it. While using the union we can
> > directly access what we need (ipv4 or ipv6).
> >>
> >>
> >> We wouldn't have to actually call in6_addr_get_mapped_ipv4() for the v4
> > variant, that's why I suggested that approach. We can adjust the mask
and
> > then use the same logic as for ipv6 (entry->plen += 96 should do the
trick).
> >>
> >
> > Understood, and my reply below was about that:
> >>> The search may be simplified a little in find_threshold_for_ip() with
> > adjusted prefix for v4, only if we combine the two arrays to a single
> > array. Otherwise we will need to have a function to abstract and reuse
the
> > ipv6 matching logic, while the ipv4 matching is really just 3 lines of
code
> > :)
> >
> >>>
> >>> The is_v4 member was needed when using a single array, but not
necessary
> > any more when storing v4 and v6 entries separately, but I kept it just
for
> > convenience, so that I don't need to use another arg in
> > parse_threshold_entry() to return the AF type. Using in6_addr can avoid
the
> > extra arg, too, but similarly, we would have to always call the macro
> > IN6_IS_ADDR_V4MAPPED, which doesn't seem cleaner than the is_v4 field to
> > me. Using ip46_parse_cidr does save several lines of code, but that code
> > snippet itself is quite small. So with all these being considered, I
think
> > either approach doesn't really make too much difference. I am totally
fine
> > to change it if you still think a single in6_addr is better than the
union.
> >>>
> >>>>>
> >>>>> +    unsigned int plen;
> >>>>> +    unsigned int threshold;
> >>>>
> >>>>
> >>>> I personally prefer sized integers.
> >>>>
> >>> I think it may be better to match the type of related helper
functions.
> >>> For plen, it is ip[v6]_parse_cidr().
> >>> For threshold, it is str_to_uint(), or even smap_get_uint() returns
> > unsigned int.
> >>>
> >>>>>
> >>>>> +};
> >>>>> +
> >>>>> +/* Contains CIDR-based aging threshold configuration parsed from
> >>>>> + * "Logical_Router:options:mac_binding_age_threshold".
> >>>>> + *
> >>>>> + * This struct is also used for non-CIDR-based threshold, e.g. the
> > ones from
> >>>>> + * "NB_Global:other_config:fdb_age_threshold" for the common
> > aging_context
> >>>>> + * interface.
> >>>>> + *
> >>>>> + * - The arrays `v4_entries` and `v6_entries` are populated with
> > parsed entries
> >>>>> + *   for IPv4 and IPv6 CIDRs, respectively, along with their
> > associated
> >>>>> + *   thresholds.  Entries within these arrays are sorted by prefix
> > length,
> >>>>> + *   starting with the longest.
> >>>>> + *
> >>>>> + * - If a threshold is provided without an accompanying prefix,
it's
> > captured
> >>>>> + *   in `default_threshold`.  In cases with multiple unprefixed
> > thresholds,
> >>>>> + *   `default_threshold` will only store the last one.  */
> >>>>> +struct threshold_config {
> >>>>> +    struct threshold_entry *v4_entries;
> >>>>> +    int n_v4_entries;
>
> Nit: size_t
>
Ack

> >>>>> +    struct threshold_entry *v6_entries;
> >>>>> +    int n_v6_entries;
>
> Nit: size_t
>
Ack

> >>>>> +    unsigned int default_threshold;
> >>>>> +};
> >>>>> +
> >>>>> +static int
> >>>>> +compare_entries_by_prefix_length(const void *a, const void *b)
> >>>>> +{
> >>>>> +    const struct threshold_entry *entry_a = a;
> >>>>> +    const struct threshold_entry *entry_b = b;
> >>>>> +
> >>>>> +    return entry_b->plen - entry_a->plen;
> >>>>> +}
> >>>>> +
> >>>>> +/* Parse an ENTRY in the threshold option, with the format:
> >>>>> + * [CIDR:]THRESHOLD
> >>>>> + *
> >>>>> + * Returns true if successful, false if failed. */
> >>>>> +static bool
> >>>>> +parse_threshold_entry(const char *str, struct threshold_entry
*entry)
> >>>>> +{
> >>>>> +    char *colon_ptr;
> >>>>> +    unsigned int value;
> >>>>> +    const char *threshold_str;
> >>>>> +
> >>>>> +    colon_ptr = strrchr(str, ':');
> >>>>
> >>>>
> >>>> I find this a bit confusing and probably error prone, could we use
> > different delimiters for the threshold? I guess we could change it to
> > 'cidr;threshold,cidr;threshold' WDYT? This would allow us to use
> > 'strtok_r()' here and we could spare the multiple copies and replace
with a
> > single one.
> >>>>
> >>> I thought about that, too, but decided to use ":" instead of ";"
because
> > "A:B" looks more natural to represent "A has a value B", and ";" is more
> > like a delimiter to separate two different elements, just like ",". So I
> > feel that ":" might be more friendly to end users. I'd love to hear
> > opinions from more people.
> >>> If ";" or something else turned out to be a better option for the end
> > user, we can use strtok_r() here but I think it shouldn't be a concern
if
> > we can't use it.
>
> IMO ":" is better.  But that's just personal preference.
>
OK

> >>>
> >>>>>
> >>>>> +    if (!colon_ptr) {
> >>>>> +        threshold_str = str;
> >>>>> +        entry->plen = 0;
> >>>>> +    } else {
> >>>>> +        threshold_str = colon_ptr + 1;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (!str_to_uint(threshold_str, 10, &value)) {
> >>>>> +        return false;
> >>>>> +    }
> >>>>> +    entry->threshold = value;
> >>>>> +
> >>>>> +    if (!colon_ptr) {
> >>>>> +        return true;
> >>>>> +    }
> >>>>> +
> >>>>> +    /* ":" was found, so parse the string before ":" as a cidr. */
> >>>>> +    char ip_cidr[128];
> >>>>> +    ovs_strzcpy(ip_cidr, str, MIN(colon_ptr - str + 1, sizeof
> > ip_cidr));
> >>>>> +    char *error = ip_parse_cidr(ip_cidr, &entry->prefix.ipv4,
> > &entry->plen);
> >>>>
> >>>>
> >>>> As mentioned earlier by storing it in 'in6_addr' we could use
> > 'ip46_parse_cidr()' which would simplify the logic. We can also adjust
the
> > prefix for v4 so the search is the same for both.
> >>>
> >>> The search may be simplified a little in find_threshold_for_ip() with
> > adjusted prefix for v4, only if we combine the two arrays to a single
> > array. Otherwise we will need to have a function to abstract and reuse
the
> > ipv6 matching logic, while the ipv4 matching is really just 3 lines of
code
> > :)
> >>> But if you prefer to use a single array for both v4 and v6, I can try
> > that in v2.
> >>>
> >>>>
> >>>>>
> >>>>> +    if (!error) {
> >>>>> +        entry->is_v4 = true;
> >>>>> +        return true;
> >>>>> +    }
> >>>>> +    free(error);
> >>>>> +    error = ipv6_parse_cidr(ip_cidr, &entry->prefix.ipv6,
> > &entry->plen);
>
> If I'm not wrong this is ambiguous for cases when the cidr length is not
> present.
>
> Maybe we should add a test for an (explicit and implicit) /128 config.
>
Both ip_parse_cidr and ipv6_parse_cidr accept different formats including
addresses without mask/length. There are actually lots of combinations but
I think it should belong to unit tests of those functions instead of
testing them everywhere when they are used.
I did include random examples in the test case including an IPv4 address
without mask and I can add one IPv6 without mask, too, but still, it is not
supposed to be a thorough test for those CIDR parsing functions.

> >>>>> +    if (!error) {
> >>>>> +        entry->is_v4 = false;
> >>>>> +        return true;
> >>>>> +    }
> >>>>> +    free(error);
> >>>>> +    return false;
> >>>>> +}
> >>>>> +
> >>>>> +static void
> >>>>> +threshold_config_destroy(struct threshold_config *config)
> >>>>> +{
> >>>>> +    free(config->v4_entries);
> >>>>> +    free(config->v6_entries);
> >>>>> +    config->v4_entries = config->v6_entries = NULL;
> >>>>> +    config->n_v4_entries = config->n_v6_entries = 0;
> >>>>> +    config->default_threshold = 0;
> >>>>> +}
> >>>>> +
> >>>>> +/* Parse the threshold option string, which has the format:
> >>>>> + * ENTRY[;ENTRY[...]]
> >>>>> + *
> >>>>> + * For the exact format of ENTRY, refer to the function
> >>>>> + * `parse_threshold_entry`.
> >>>>> + *
> >>>>> + * The parsed data is populated to the struct threshold_config.
> >>>>> + * See the comments of struct threshold_config for details.
> >>>>> + *
> >>>>> + * Return Values:
> >>>>> + * - Returns `false` if the input does not match the expected
format.
> >>>>> + *   Consequently, no entries will be populated.
> >>>>> + * - Returns `true` upon successful parsing. The caller is
> > responsible for
> >>>>> + *   releasing the allocated memory by calling
> > threshold_config_destroy. */
> >>>>> +static bool
> >>>>> +parse_aging_threshold(const char *opt,
> >>>>> +                      struct threshold_config *config)
> >>>>> +{
> >>>>> +    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> >>>>> +    struct threshold_entry e;
> >>>>> +    const char *start = opt, *end;
> >>>>> +    memset(config, 0, sizeof *config);
> >>>>> +
> >>>>> +    while (start && *start) {
> >>>>> +        end = strchr(start, ';');
> >>>>
> >>>>
> >>>> We should use 'strtok_r()' this again allows us to avoid the constant
> > copies replaced by single one.
> >>>
> >>> Make sense. I will change this in v2.
> >>>
> >>>>
> >>>>>
> >>>>> +        if (end) {
> >>>>> +            /* Big enough for any <ipv6>/<ipv6 mask>:<threshold> */
> >>>>> +            char buf[128];
> >>>>> +            ovs_strzcpy(buf, start, MIN(end - start + 1, sizeof
> > buf));
> >>>>> +            if (!parse_threshold_entry(buf, &e)) {
> >>>>> +                VLOG_WARN_RL(&rl, "Parsing aging threshold '%s'
> > failed.", buf);
> >>>>> +                goto err;
> >>>>> +            }
> >>>>> +            start = end + 1;
> >>>>> +        } else {
> >>>>> +            if (!parse_threshold_entry(start, &e)) {
> >>>>> +                VLOG_WARN_RL(&rl, "Parsing aging threshold '%s'
> > failed.",
> >>>>> +                             start);
> >>>>> +                goto err;
> >>>>> +            }
> >>>>> +            start += strlen(start);
> >>>>> +        }
> >>>>> +        if (!e.plen) {
> >>>>> +            config->default_threshold = e.threshold;
> >>>>> +        } else if (e.is_v4) {
> >>>>> +            config->n_v4_entries++;
> >>>>> +            config->v4_entries = xrealloc(config->v4_entries,
> >>>>> +                                          config->n_v4_entries *
> > sizeof e);
> >>>>> +            config->v4_entries[config->n_v4_entries - 1] = e;
>
> Nit: Maybe use x2nrealloc()?

I opted not to use x2nrealloc since it's merely a short configuration
string (without much thinking). Using x2nrealloc could give the impression
to the code reader that a large number of entries are anticipated. But, I
am really fine either way.

Best regards,
Han

>
> >>>>> +        } else {
> >>>>> +            config->n_v6_entries++;
> >>>>> +            config->v6_entries = xrealloc(config->v6_entries,
> >>>>> +                                          config->n_v6_entries *
> > sizeof e);
> >>>>> +            config->v6_entries[config->n_v6_entries - 1] = e;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    if (config->n_v4_entries > 0) {
> >>>>> +        qsort(config->v4_entries, config->n_v4_entries, sizeof e,
> >>>>> +              compare_entries_by_prefix_length);
> >>>>> +    }
> >>>>> +
> >>>>> +    if (config->n_v6_entries > 0) {
> >>>>> +        qsort(config->v6_entries, config->n_v6_entries, sizeof e,
> >>>>> +              compare_entries_by_prefix_length);
> >>>>> +    }
> >>>>> +
> >>>>> +    return true;
> >>>>> +
> >>>>> +err:
> >>>>> +    threshold_config_destroy(config);
> >>>>> +    return false;
> >>>>> +}
> >>>>> +
> >>>>> +static unsigned int
> >>>>> +find_threshold_for_ip(const char *ip_str,
> >>>>> +                      const struct threshold_config *config)
> >>>>> +{
> >>>>> +    if (!ip_str) {
> >>>>> +        return config->default_threshold;
> >>>>> +    }
> >>>>> +
> >>>>> +    ovs_be32 ipv4;
> >>>>> +    struct in6_addr ipv6;
> >>>>> +    if (ip_parse(ip_str, &ipv4)) {
> >>>>> +        for (int i = 0; i < config->n_v4_entries; i++) {
> >>>>> +            ovs_be32 masked_ip = ipv4 &
> >>>>> +                be32_prefix_mask(config->v4_entries[i].plen);
> >>>>> +            if (masked_ip == config->v4_entries[i].prefix.ipv4) {
> >>>>> +                return config->v4_entries[i].threshold;
> >>>>> +            }
> >>>>> +        }
> >>>>> +    } else if (ipv6_parse(ip_str, &ipv6)) {
> >>>>> +        for (int i = 0; i < config->n_v6_entries; i++) {
> >>>>> +            struct in6_addr v6_mask =
> >>>>> +                ipv6_create_mask(config->v6_entries[i].plen);
> >>>>> +            struct in6_addr masked_ip = ipv6_addr_bitand(&ipv6,
> > &v6_mask);
> >>>>> +            if (ipv6_addr_equals(&masked_ip,
> >>>>> +
> > &config->v6_entries[i].prefix.ipv6)) {
> >>>>> +                return config->v6_entries[i].threshold;
> >>>>> +            }
> >>>>> +        }
> >>>>> +    }
> >>>>> +    return config->default_threshold;
> >>>>> +}
> >>>>> +
> >>>>> +/* Parse the threshold option string (see the comment of the
function
> >>>>> + * parse_aging_threshold), and returns the smallest threshold. */
> >>>>> +unsigned int
> >>>>> +min_mac_binding_age_threshold(const char *opt)
> >>>>> +{
> >>>>> +    struct threshold_config config;
> >>>>> +    if (!parse_aging_threshold(opt, &config)) {
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    unsigned int threshold = UINT_MAX;
> >>>>> +    unsigned int t;
> >>>>> +
> >>>>> +    for (int i = 0; i < config.n_v4_entries; i++) {
> >>>>> +        t = config.v4_entries[i].threshold;
> >>>>> +        if (t && t < threshold) {
> >>>>> +            threshold = t;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    for (int i = 0; i < config.n_v6_entries; i++) {
> >>>>> +        t = config.v6_entries[i].threshold;
> >>>>> +        if (t && t < threshold) {
> >>>>> +            threshold = t;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    t = config.default_threshold;
> >>>>> +    if (t && t < threshold) {
> >>>>> +        threshold = t;
> >>>>> +    }
> >>>>> +
> >>>>> +    threshold_config_destroy(&config);
> >>>>> +
> >>>>> +    return threshold == UINT_MAX ? 0 : threshold;
> >>>>> +}
> >>>>> +
> >>>>>  struct aging_context {
> >>>>>      int64_t next_wake_ms;
> >>>>>      int64_t time_wall_now;
> >>>>>      uint32_t removal_limit;
> >>>>>      uint32_t n_removed;
> >>>>> -    uint64_t threshold;
> >>>>> +    struct threshold_config *threshold;
> >>>>>  };
> >>>>>
> >>>>>  static struct aging_context
> >>>>> @@ -63,13 +304,14 @@ aging_context_init(uint32_t removal_limit)
> >>>>>             .time_wall_now = time_wall_msec(),
> >>>>>             .removal_limit = removal_limit,
> >>>>>             .n_removed = 0,
> >>>>> -           .threshold = 0,
> >>>>> +           .threshold = NULL,
> >>>>>      };
> >>>>>      return ctx;
> >>>>>  }
> >>>>>
> >>>>>  static void
> >>>>> -aging_context_set_threshold(struct aging_context *ctx, uint64_t
> > threshold)
> >>>>> +aging_context_set_threshold(struct aging_context *ctx,
> >>>>> +                            struct threshold_config *threshold)
> >>>>>  {
> >>>>>      ctx->threshold = threshold;
> >>>>>  }
> >>>>> @@ -81,19 +323,27 @@ aging_context_is_at_limit(struct aging_context
> > *ctx)
> >>>>>  }
> >>>>>
> >>>>>  static bool
> >>>>> -aging_context_handle_timestamp(struct aging_context *ctx, int64_t
> > timestamp)
> >>>>> +aging_context_handle_timestamp(struct aging_context *ctx, int64_t
> > timestamp,
> >>>>> +                               const char *ip)
> >>>>>  {
> >>>>>      int64_t elapsed = ctx->time_wall_now - timestamp;
> >>>>>      if (elapsed < 0) {
> >>>>>          return false;
> >>>>>      }
> >>>>>
> >>>>> -    if (elapsed >= ctx->threshold) {
> >>>>> +    ovs_assert(ctx->threshold);
> >>>>> +    uint64_t threshold = 1000 * find_threshold_for_ip(ip,
> > ctx->threshold);
> >>>>> +
> >>>>> +    if (!threshold) {
> >>>>> +        return false;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (elapsed >= threshold) {
> >>>>>          ctx->n_removed++;
> >>>>>          return true;
> >>>>>      }
> >>>>>
> >>>>> -    ctx->next_wake_ms = MIN(ctx->next_wake_ms, (ctx->threshold -
> > elapsed));
> >>>>> +    ctx->next_wake_ms = MIN(ctx->next_wake_ms, (threshold -
> > elapsed));
> >>>>>      return false;
> >>>>>  }
> >>>>>
> >>>>> @@ -121,13 +371,18 @@ mac_binding_aging_run_for_datapath(const
struct
> > sbrec_datapath_binding *dp,
> >>>>>          return;
> >>>>>      }
> >>>>>
> >>>>> +    if (!ctx->threshold->n_v4_entries &&
> > !ctx->threshold->n_v6_entries
> >>>>> +        && !ctx->threshold->default_threshold) {
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>>      struct sbrec_mac_binding *mb_index_row =
> >>>>>          sbrec_mac_binding_index_init_row(mb_by_datapath);
> >>>>>      sbrec_mac_binding_index_set_datapath(mb_index_row, dp);
> >>>>>
> >>>>>      const struct sbrec_mac_binding *mb;
> >>>>>      SBREC_MAC_BINDING_FOR_EACH_EQUAL (mb, mb_index_row,
> > mb_by_datapath) {
> >>>>> -        if (aging_context_handle_timestamp(ctx, mb->timestamp)) {
> >>>>> +        if (aging_context_handle_timestamp(ctx, mb->timestamp,
> > mb->ip)) {
> >>>>>              sbrec_mac_binding_delete(mb);
> >>>>>              if (aging_context_is_at_limit(ctx)) {
> >>>>>                  break;
> >>>>> @@ -166,13 +421,20 @@ en_mac_binding_aging_run(struct engine_node
> > *node, void *data OVS_UNUSED)
> >>>>>              continue;
> >>>>>          }
> >>>>>
> >>>>> -        uint64_t threshold = smap_get_uint(&od->nbr->options,
> >>>>> -
> > "mac_binding_age_threshold", 0);
> >>>>> -        aging_context_set_threshold(&ctx, threshold * 1000);
> >>>>> +        struct threshold_config threshold_config;
> >>>>> +        if (!parse_aging_threshold(smap_get(&od->nbr->options,
> >>>>> +
> >  "mac_binding_age_threshold"),
> >>>>> +                                   &threshold_config)) {
> >>>>> +            return;
> >>>>> +        }
> >>>>> +
> >>>>> +        aging_context_set_threshold(&ctx, &threshold_config);
> >>>>>
> >>>>>          mac_binding_aging_run_for_datapath(od->sb,
> >>>>>
> > sbrec_mac_binding_by_datapath,
> >>>>>                                             &ctx);
> >>>>> +        threshold_config_destroy(&threshold_config);
> >>>>> +
> >>>>>          if (aging_context_is_at_limit(&ctx)) {
> >>>>>              /* Schedule the next run after specified delay. */
> >>>>>              ctx.next_wake_ms = AGING_BULK_REMOVAL_DELAY_MSEC;
> >>>>> @@ -255,7 +517,7 @@ fdb_run_for_datapath(const struct
> > sbrec_datapath_binding *dp,
> >>>>>
> >>>>>      const struct sbrec_fdb *fdb;
> >>>>>      SBREC_FDB_FOR_EACH_EQUAL (fdb, fdb_index_row, fdb_by_dp_key) {
> >>>>> -        if (aging_context_handle_timestamp(ctx, fdb->timestamp)) {
> >>>>> +        if (aging_context_handle_timestamp(ctx, fdb->timestamp,
> > NULL)) {
> >>>>>              sbrec_fdb_delete(fdb);
> >>>>>              if (aging_context_is_at_limit(ctx)) {
> >>>>>                  break;
> >>>>> @@ -293,11 +555,16 @@ en_fdb_aging_run(struct engine_node *node,
void
> > *data OVS_UNUSED)
> >>>>>              continue;
> >>>>>          }
> >>>>>
> >>>>> -        uint64_t threshold =
> >>>>> -                smap_get_uint(&od->nbs->other_config,
> > "fdb_age_threshold", 0);
> >>>>> -        aging_context_set_threshold(&ctx, threshold * 1000);
> >>>>> -
> >>>>> +        struct threshold_config threshold_config;
> >>>>> +        if (!parse_aging_threshold(smap_get(&od->nbs->other_config,
> >>>>> +                                            "fdb_age_threshold"),
> >>>>> +                                   &threshold_config)) {
> >>>>
> >>>>
> >>>> This would mean that fdb aging would accept the same arguments, but
> > that doesn't seem right. We should set only the default value instead of
> > the whole parsing.
> >>>>
> >>> Calling parse_aging_threshold here is just a convenient way to
> > initialize the other fields of the structure while setting the default
> > value. I agree it may be just more clear to explicitly initialize the
> > threshold_config and set the default value by calling smap_get_uint(). I
> > will do that in v2.
> >>>
> >>> I will wait for the consensus on the CIDR option's format and the
> > separate v.s. combined array before sending v2.
> >>
> >>
> >> I definitely don't want to block this patch on something that is more
or
> > less cosmetic change so let's see what others think.
> >>
> > The option format may be more important since it is user facing. With an
> > example, we need to choose between:
> >
> > A: 192.168.0.0/16:300;192.168.10.0/24:0;fe80::/10:600;1200
> >
> > or
> >
> > B: 192.168.0.0/16;300,192.168.10.0/24;0,fe80::/10;600,1200
> >
> > Thanks,
> > Han
> >
> >>>
> >>>
> >>> Thanks,
> >>> Han
> >>>
> >>>>>
> >>>>> +            return;
> >>>>> +        }
> >>>>> +        aging_context_set_threshold(&ctx, &threshold_config);
> >>>>>          fdb_run_for_datapath(od->sb, sbrec_fdb_by_dp_key, &ctx);
> >>>>> +        threshold_config_destroy(&threshold_config);
> >>>>> +
> >>>>>          if (aging_context_is_at_limit(&ctx)) {
> >>>>>              /* Schedule the next run after specified delay. */
> >>>>>              ctx.next_wake_ms = AGING_BULK_REMOVAL_DELAY_MSEC;
> >>>>> diff --git a/northd/aging.h b/northd/aging.h
> >>>>> index 650990e244aa..2cfe2fd44457 100644
> >>>>> --- a/northd/aging.h
> >>>>> +++ b/northd/aging.h
> >>>>> @@ -30,6 +30,9 @@ void *en_mac_binding_aging_waker_init(struct
> > engine_node *node,
> >>>>>                                        struct engine_arg *arg);
> >>>>>  void en_mac_binding_aging_waker_cleanup(void *data);
> >>>>>
> >>>>> +/* The MAC binding aging helper function. */
> >>>>> +unsigned int min_mac_binding_age_threshold(const char *opt);
> >>>>> +
> >>>>>  /* The FDB aging node functions. */
> >>>>>  void en_fdb_aging_run(struct engine_node *node, void *data);
> >>>>>  void *en_fdb_aging_init(struct engine_node *node, struct engine_arg
> > *arg);
> >>>>> diff --git a/northd/northd.c b/northd/northd.c
> >>>>> index f8b046d83e85..4bc898982cf2 100644
> >>>>> --- a/northd/northd.c
> >>>>> +++ b/northd/northd.c
> >>>>> @@ -17,6 +17,7 @@
> >>>>>  #include <stdlib.h>
> >>>>>  #include <stdio.h>
> >>>>>
> >>>>> +#include "aging.h"
> >>>>>  #include "debug.h"
> >>>>>  #include "bitmap.h"
> >>>>>  #include "coverage.h"
> >>>>> @@ -1166,8 +1167,14 @@ ovn_datapath_update_external_ids(struct
> > ovn_datapath *od)
> >>>>>              smap_add(&ids, "always_learn_from_arp_request",
"false");
> >>>>>          }
> >>>>>
> >>>>> -        uint32_t age_threshold = smap_get_uint(&od->nbr->options,
> >>>>> -
> > "mac_binding_age_threshold", 0);
> >>>>> +        /* For timestamp refreshing, the smallest threshold of the
> > option is
> >>>>> +         * set to SB to make sure all entries are refreshed in
time.
> >>>>> +         * XXX: This approach simplifies processing in
> > ovn-controller, but it
> >>>>> +         * may be enhanced, if necessary, to parse the complete
> > CIDR-based
> >>>>> +         * threshold configurations to SB to reduce unnecessary
> > refreshes. */
> >>>>> +        uint32_t age_threshold = min_mac_binding_age_threshold(
> >>>>> +                                       smap_get(&od->nbr->options,
> >>>>> +
> > "mac_binding_age_threshold"));
> >>>>>          if (age_threshold) {
> >>>>>              smap_add_format(&ids, "mac_binding_age_threshold",
> >>>>>                              "%u", age_threshold);
> >>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
> >>>>> index 1de0c30416ce..80733273484e 100644
> >>>>> --- a/ovn-nb.xml
> >>>>> +++ b/ovn-nb.xml
> >>>>> @@ -2686,10 +2686,65 @@ or
> >>>>>        </column>
> >>>>>
> >>>>>        <column name="options" key="mac_binding_age_threshold"
> >>>>> -              type='{"type": "integer", "minInteger": 0,
> > "maxInteger": 4294967295}'>
> >>>>> -        MAC binding aging <code>threshold</code> value in seconds.
> > MAC binding
> >>>>> -        exceeding this timeout will be automatically removed. The
> > value
> >>>>> -        defaults to 0, which means disabled.
> >>>>> +              type='{"type": "string"}'>
> >>>>> +        <p>
> >>>>> +            Specifies the MAC binding aging thresholds based on
> > CIDRs, with the
> >>>>> +            format:
> > <var>entry</var>[<code>;</code><var>entry</var>[...]],
> >>>>> +            where each <var>entry</var> has the format:
> >>>>> +            [<var>cidr</var><code>:</code>]<var>threshold</var>
> >>>>> +        </p>
> >>>>> +
> >>>>> +        <ul>
> >>>>> +          <li>
> >>>>> +            <var>cidr</var>: Can be either an IPv4 or IPv6 CIDR.
> >>>>> +          </li>
> >>>>> +          <li>
> >>>>> +            <var>threshold</var>: Threshold value in seconds. MAC
> > bindings with
> >>>>> +            IP addresses matching the specified CIDR that exceed
> > this timeout
> >>>>> +            will be automatically removed.
> >>>>> +          </li>
> >>>>> +        </ul>
> >>>>> +
> >>>>> +        <p>
> >>>>> +            If an <var>entry</var> is provided without an CIDR
(just
> > the
> >>>>> +            threshold value), it specifies the default threshold
for
> > MAC
> >>>>> +            bindings that don't match any of the given CIDRs. If
> > there are
> >>>>> +            multiple default threshold entries in the option, the
> > behavior is
> >>>>> +            undefined.
> >>>>> +        </p>
> >>>>> +
> >>>>> +        <p>
> >>>>> +            If there are multiple CIDRs matching a MAC binding IP,
> > the one with
> >>>>> +            the longest prefix length takes effect. If there are
> > multiple
> >>>>> +            entries with the same CIDR in the option, the behavior
is
> >>>>> +            undefined.
> >>>>> +        </p>
> >>>>> +
> >>>>> +        <p>
> >>>>> +            If no matching CIDR is found for a MAC binding IP, and
> > no default
> >>>>> +            threshold is specified, the behavior defaults to the
> > original: the
> >>>>> +            binding will not be removed based on age.
> >>>>> +        </p>
> >>>>> +
> >>>>> +        <p>
> >>>>> +            The value can also default to an empty string, which
> > means that the
> >>>>> +            aging threshold is disabled. Any string not in the
above
> > format is
> >>>>> +            regarded as invalid and the aging is disabled.
> >>>>> +        </p>
> >>>>> +
> >>>>> +        <p>
> >>>>> +            Example:
> >>>>> +            <code>
> > 192.168.0.0/16:300;192.168.10.0/24:0;fe80::/10:600;1200</code>
> >>>>> +        </p>
> >>>>> +
> >>>>> +        <p>
> >>>>> +            This sets a threshold of 300 seconds for MAC bindings
> > with IP
> >>>>> +            addresses in the 192.168.0.0/16 range, excluding the
> > 192.168.1.0/24
> >>>>> +            range (for which the aging is disabled), a threshold of
> > 600 seconds
> >>>>> +            for MAC bindings with IP addresses in the fe80::/10
IPv6
> > range, and
> >>>>> +            a default threshold of 1200 seconds for all other MAC
> > bindings.
> >>>>> +        </p>
> >>>>> +
> >>>>>        </column>
> >>>>>      </group>
> >>>>>
> >>>>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>>>> index 637d92bedbc7..07b5c1ec57fc 100644
> >>>>> --- a/tests/ovn.at
> >>>>> +++ b/tests/ovn.at
> >>>>> @@ -34676,6 +34676,66 @@ OVS_WAIT_UNTIL([
> >>>>>
> >>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=79 --no-stats |
> > strip_cookie], [0], [])
> >>>>>
> >>>>> +# Test CIDR-based threshold configuration
> >>>>> +check ovn-nbctl set logical_router gw
> > options:mac_binding_age_threshold="
> > 192.168.10.0/255.255.255.0:2;192.168.10.64/26:0;192.168.10.20:0"
> >>>>> +check ovn-nbctl --wait=sb sync
> >>>>> +uuid=$(fetch_column datapath _uuid external_ids:name=gw)
> >>>>> +AT_CHECK([ovn-sbctl get datapath $uuid
> > external_ids:mac_binding_age_threshold], [0], [dnl
> >>>>> +"2"
> >>>>> +])
> >>>>> +
> >>>>> +# Send GARP to populate MAC binding table records
> >>>>> +send_garp hv1 ext1 10 # belong to 192.168.10.0/24
> >>>>> +send_garp hv2 ext2 20 # belong to 192.168.10.20/32
> >>>>> +send_garp hv2 ext2 65 # belong to 192.168.10.64/26
> >>>>> +
> >>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
> > "192.168.10.10"])
> >>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
> > "192.168.10.20"])
> >>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
> > "192.168.10.65"])
> >>>>> +
> >>>>> +OVS_WAIT_UNTIL([
> >>>>> +    test "0" = "$(ovn-sbctl list mac_binding | grep -c
> > '192.168.10.10')"
> >>>>> +])
> >>>>> +# The other two should remain because the corresponding prefixes
> > have threshold 0
> >>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.20"])
> >>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.65"])
> >>>>> +check ovn-sbctl --all destroy mac_binding
> >>>>> +
> >>>>> +# Set the aging threshold mixed with IPv6 prefixes and default
> > threshold
> >>>>> +check ovn-nbctl set logical_router gw
> > options:mac_binding_age_threshold="1;
192.168.10.64/26:0;ff00:1234::/32:888"
> >>>>> +check ovn-nbctl --wait=sb sync
> >>>>> +uuid=$(fetch_column datapath _uuid external_ids:name=gw)
> >>>>> +AT_CHECK([ovn-sbctl get datapath $uuid
> > external_ids:mac_binding_age_threshold], [0], [dnl
> >>>>> +"1"
> >>>>> +])
> >>>>> +
> >>>>> +# Send GARP to populate MAC binding table records
> >>>>> +send_garp hv1 ext1 10 # belong to default
> >>>>> +send_garp hv2 ext2 65 # belong to 192.168.10.64/26
> >>>>> +
> >>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
> > "192.168.10.10"])
> >>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
> > "192.168.10.65"])
> >>>>> +
> >>>>> +OVS_WAIT_UNTIL([
> >>>>> +    test "0" = "$(ovn-sbctl list mac_binding | grep -c
> > '192.168.10.10')"
> >>>>> +])
> >>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.65"])
> >>>>> +check ovn-sbctl --all destroy mac_binding
> >>>>> +
> >>>>> +# Set the aging threshold with invalid format
> >>>>> +check ovn-nbctl set logical_router gw
> > options:mac_binding_age_threshold="1;abc/26:0"
> >>>>> +check ovn-nbctl --wait=sb sync
> >>>>> +uuid=$(fetch_column datapath _uuid external_ids:name=gw)
> >>>>> +AT_CHECK([ovn-sbctl get datapath $uuid
> > external_ids:mac_binding_age_threshold], [1], [ignore], [ignore])
> >>>>> +
> >>>>> +# Send GARP to populate MAC binding table records
> >>>>> +send_garp hv1 ext1 10
> >>>>> +
> >>>>> +OVS_WAIT_UNTIL([ovn-sbctl list mac_binding | grep -q
> > "192.168.10.10"])
> >>>>> +# The record is not deleted
> >>>>> +sleep 5
> >>>>> +AT_CHECK([ovn-sbctl list mac_binding | grep -q "192.168.10.10"])
> >>>>> +
> >>>>>  OVN_CLEANUP([hv1], [hv2])
> >>>>>  AT_CLEANUP
> >>>>>  ])
> >>>>> --
> >>>>> 2.38.1
> >>>>>
> >>>>> _______________________________________________
> >>>>> dev mailing list
> >>>>> d...@openvswitch.org
> >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>>>
> >>>>
> >>>> Thanks,
> >>>> Ales
> >>>>
> >>>> --
> >>>>
> >>>> Ales Musil
> >>>>
> >>>> Senior Software Engineer - OVN Core
> >>>>
> >>>> Red Hat EMEA
> >>>>
> >>>> amu...@redhat.com
> >>
> >>
> >> Thanks,
> >> Ales
> >> --
> >>
> >> Ales Musil
> >>
> >> Senior Software Engineer - OVN Core
> >>
> >> Red Hat EMEA
> >>
> >> amu...@redhat.com
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to