On Thu, Feb 4, 2021 at 5:15 PM William Tu <u9012...@gmail.com> wrote:
>
> Hi Toshiaki,
>
> Thanks for the patch. I have some questions inline.
>
> On Thu, Jul 30, 2020 at 7:55 PM Toshiaki Makita
> <toshiaki.maki...@gmail.com> wrote:
> >
> > This provider offloads classifier to software XDP.
> >
> > It works only when a custom XDP object is loaded by afxdp netdev.
> > The BPF program needs to implement classifier with array-of-maps for
> > subtable hashmaps and arraymap for subtable masks. The flow api
> > provider detects classifier support in the custom XDP program when
> > loading it.
> >
> > The requirements for the BPF program is as below:
> >
> > - A struct named "meta_info" in ".ovs_meta" section
> >   inform vswitchd of supported keys, actions, and the max number of
> >   actions.
> >
> > - A map named "subtbl_masks"
> >   An array which implements a list. The entry contains struct
> >   xdp_subtables_mask provided by lib/netdev-offload-xdp.h followed by
> >   miniflow buf of any length for the mask.
> >
> > - A map named "subtbl_masks_hd"
> >   A one entry int array which contains the first entry index of the list
> >   implemented by subtbl_masks.
> >
> > - A map named "flow_table"
> >   An array-of-maps. Each entry points to subtable hash-map instantiated
> >   with the following "subtbl_template".
> >   The entry of each index of subtbl_masks has miniflow mask of subtable
> >   in the corresponding index of flow_table.
> >
> > - A map named "subtbl_template"
> >   A template for subtable, used for entries of "flow_table".
> >   The key must be miniflow buf (without leading map).
> >
> > - A map named "output_map"
> >   A devmap. Each entry represents odp port.
> >
> > - A map named "xsks_map"
> >   A xskmap. Used for upcall.
> >
> > For more details, refer to the reference BPF program in next commit.
> >
> > In the future it may be possible to offload classifier to SmartNIC
> > through XDP, but currently it's not because map-in-map is not supported
> > by XDP hw-offload.
> >
> > Signed-off-by: Toshiaki Makita <toshiaki.maki...@gmail.com>
> > ---
> >  acinclude.m4                  |   25 +
> >  configure.ac                  |    1 +
> >  lib/automake.mk               |    8 +
> >  lib/bpf-util.c                |   38 ++
> >  lib/bpf-util.h                |   22 +
> >  lib/netdev-afxdp.c            |  218 +++++-
> >  lib/netdev-afxdp.h            |    3 +
> >  lib/netdev-linux-private.h    |    2 +
> >  lib/netdev-offload-provider.h |    8 +-
> >  lib/netdev-offload-xdp.c      | 1213 +++++++++++++++++++++++++++++++++
> >  lib/netdev-offload-xdp.h      |   49 ++
> >  lib/netdev-offload.c          |    3 +
> >  12 files changed, 1588 insertions(+), 2 deletions(-)
> >  create mode 100644 lib/bpf-util.c
> >  create mode 100644 lib/bpf-util.h
> >  create mode 100644 lib/netdev-offload-xdp.c
> >  create mode 100644 lib/netdev-offload-xdp.h
> >
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 4bac9dbdd..31ff0c013 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -329,6 +329,31 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> >    AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
> >  ])
> >
> > +dnl OVS_CHECK_LINUX_XDP_OFFLOAD
> > +dnl
> > +dnl Check both llvm and libbpf support
> > +AC_DEFUN([OVS_CHECK_LINUX_XDP_OFFLOAD], [
> > +  AC_ARG_ENABLE([xdp_offload],
> > +                [AC_HELP_STRING([--enable-xdp-offload],
> > +                                [Compile XDP offload])],
> > +                [], [enable_xdp_offload=no])
> > +  AC_MSG_CHECKING([whether XDP offload is enabled])
> > +  if test "$enable_xdp_offload" != yes; then
> > +    AC_MSG_RESULT([no])
> > +    XDP_OFFLOAD_ENABLE=false
> > +  else
> > +    if test "$enable_afxdp" == no; then
> > +      AC_MSG_ERROR([XDP offload depends on afxdp. Please add 
> > --enable-afxdp.])
> > +    fi
> > +    AC_MSG_RESULT([yes])
> > +    XDP_OFFLOAD_ENABLE=true
> > +
> > +    AC_DEFINE([HAVE_XDP_OFFLOAD], [1],
> > +              [Define to 1 if XDP offload compilation is available and 
> > enabled.])
> > +  fi
> > +  AM_CONDITIONAL([HAVE_XDP_OFFLOAD], test "$XDP_OFFLOAD_ENABLE" = true)
> > +])
> > +
> >  dnl OVS_CHECK_DPDK
> >  dnl
> >  dnl Configure DPDK source tree
> > diff --git a/configure.ac b/configure.ac
> > index 8d37af9db..530112c49 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -99,6 +99,7 @@ OVS_CHECK_DOT
> >  OVS_CHECK_IF_DL
> >  OVS_CHECK_STRTOK_R
> >  OVS_CHECK_LINUX_AF_XDP
> > +OVS_CHECK_LINUX_XDP_OFFLOAD
> >  AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
> >  AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
> >    [], [], [[#include <sys/stat.h>]])
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 920c958e3..9486b32e7 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -451,6 +451,14 @@ lib_libopenvswitch_la_SOURCES += \
> >         lib/netdev-afxdp.h
> >  endif
> >
> > +if HAVE_XDP_OFFLOAD
> > +lib_libopenvswitch_la_SOURCES += \
> > +       lib/bpf-util.c \
> > +       lib/bpf-util.h \
> > +       lib/netdev-offload-xdp.c \
> > +       lib/netdev-offload-xdp.h
> > +endif
> > +
> >  if DPDK_NETDEV
> >  lib_libopenvswitch_la_SOURCES += \
> >         lib/dpdk.c \
> > diff --git a/lib/bpf-util.c b/lib/bpf-util.c
> > new file mode 100644
> > index 000000000..324cfbe1d
> > --- /dev/null
> > +++ b/lib/bpf-util.c
> > @@ -0,0 +1,38 @@
> > +/*
> > + * Copyright (c) 2020 NTT Corp.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +
> > +#include "bpf-util.h"
> > +
> > +#include <bpf/libbpf.h>
> > +
> > +#include "ovs-thread.h"
> > +
> > +DEFINE_STATIC_PER_THREAD_DATA(struct { char s[128]; },
> > +                              libbpf_strerror_buffer,
> > +                              { "" });
> > +
> > +const char *
> > +ovs_libbpf_strerror(int err)
> > +{
> > +    enum { BUFSIZE = sizeof libbpf_strerror_buffer_get()->s };
> > +    char *buf = libbpf_strerror_buffer_get()->s;
> > +
> > +    libbpf_strerror(err, buf, BUFSIZE);
> > +
> > +    return buf;
> > +}
> > diff --git a/lib/bpf-util.h b/lib/bpf-util.h
> > new file mode 100644
> > index 000000000..6346935b3
> > --- /dev/null
> > +++ b/lib/bpf-util.h
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Copyright (c) 2020 NTT Corp.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef BPF_UTIL_H
> > +#define BPF_UTIL_H 1
> > +
> > +const char *ovs_libbpf_strerror(int err);
> > +
> > +#endif /* bpf-util.h */
> > diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> > index e1374ccbb..00204f299 100644
> > --- a/lib/netdev-afxdp.c
> > +++ b/lib/netdev-afxdp.c
> > @@ -22,6 +22,7 @@
> >  #include "netdev-afxdp-pool.h"
> >
> >  #include <bpf/bpf.h>
> > +#include <bpf/btf.h>
> >  #include <errno.h>
> >  #include <inttypes.h>
> >  #include <linux/rtnetlink.h>
> > @@ -37,10 +38,13 @@
> >  #include <sys/types.h>
> >  #include <unistd.h>
> >
> > +#include "bpf-util.h"
> >  #include "coverage.h"
> >  #include "dp-packet.h"
> >  #include "dpif-netdev.h"
> >  #include "fatal-signal.h"
> > +#include "netdev-offload-provider.h"
> > +#include "netdev-offload-xdp.h"
> >  #include "openvswitch/compiler.h"
> >  #include "openvswitch/dynamic-string.h"
> >  #include "openvswitch/list.h"
> > @@ -261,10 +265,204 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
> >      ovs_mutex_unlock(&unused_pools_mutex);
> >  }
> >
> > +bool
> > +has_xdp_flowtable(struct netdev *netdev)
> > +{
> > +    struct netdev_linux *dev = netdev_linux_cast(netdev);
> > +
> > +    return dev->has_xdp_flowtable;
> > +}
> > +
> > +struct bpf_object *
> > +get_xdp_object(struct netdev *netdev)
> > +{
> > +    struct netdev_linux *dev = netdev_linux_cast(netdev);
> > +
> > +    return dev->xdp_obj;
> > +}
> > +
> > +#ifdef HAVE_XDP_OFFLOAD
> > +static int
> > +xdp_preload(struct netdev *netdev, struct bpf_object *obj)
> > +{
> > +    static struct ovsthread_once output_map_once = 
> > OVSTHREAD_ONCE_INITIALIZER;
> > +    struct btf *btf;
> > +    struct bpf_map *flow_table, *subtbl_template, *subtbl_masks_hd, 
> > *xsks_map,
> > +                   *output_map;
> > +    const struct bpf_map_def *flow_table_def, *subtbl_def;
> > +    int flow_table_fd, subtbl_meta_fd, xsks_map_fd;
> > +    static int output_map_fd = -1;
> > +    int n_rxq, err;
> > +
> > +    btf = bpf_object__btf(obj);
> > +    if (!btf) {
> > +        VLOG_DBG("BPF object for netdev \"%s\" does not contain BTF",
> > +                 netdev_get_name(netdev));
> > +        return EOPNOTSUPP;
> > +    }
> > +    if (btf__find_by_name_kind(btf, ".ovs_meta", BTF_KIND_DATASEC) < 0) {
> > +        VLOG_DBG("\".ovs_meta\" datasec not found in BTF");
> > +        return EOPNOTSUPP;
> > +    }
> > +
> > +    flow_table = bpf_object__find_map_by_name(obj, "flow_table");
> > +    if (!flow_table) {
> > +        VLOG_DBG("%s: \"flow_table\" map does not exist",
> > +                 netdev_get_name(netdev));
> > +        return EOPNOTSUPP;
> > +    }
> > +
> > +    subtbl_template = bpf_object__find_map_by_name(obj, "subtbl_template");
> > +    if (!subtbl_template) {
> > +        VLOG_DBG("%s: \"subtbl_template\" map does not exist",
> > +                 netdev_get_name(netdev));
> > +        return EOPNOTSUPP;
> > +    }
> > +
> > +    output_map = bpf_object__find_map_by_name(obj, "output_map");
> > +    if (!output_map) {
> > +        VLOG_DBG("%s: \"output_map\" map does not exist",
> > +                 netdev_get_name(netdev));
> > +        return EOPNOTSUPP;
> > +    }
> > +
> > +    if (ovsthread_once_start(&output_map_once)) {
> > +        output_map_fd = bpf_create_map_name(BPF_MAP_TYPE_DEVMAP, 
> > "output_map",
> > +                                            sizeof(int), sizeof(int),
> > +                                            XDP_MAX_PORTS, 0);
> > +        if (output_map_fd < 0) {
> > +            err = errno;
> > +            VLOG_WARN("%s: Map creation for output_map failed: %s",
> > +                      netdev_get_name(netdev), ovs_strerror(errno));
> > +            ovsthread_once_done(&output_map_once);
> > +            return err;
> > +        }
> > +        ovsthread_once_done(&output_map_once);
> > +    }
> > +
> > +    flow_table_def = bpf_map__def(flow_table);
> > +    if (flow_table_def->type != BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> > +        VLOG_WARN("%s: \"flow_table\" map type is not map-in-map array.",
> > +                  netdev_get_name(netdev));
> > +        return EINVAL;
> > +    }
>
> IIUC, for example flow_table, here we tried to read its definition from the
> object file (without loading it), and create a map, get its fd.
>
> > +
> > +    subtbl_def = bpf_map__def(subtbl_template);
> > +    if (subtbl_def->type != BPF_MAP_TYPE_HASH) {
> > +        VLOG_WARN("%s: \"subtbl_templates\" map type is not hash-table",
> > +                  netdev_get_name(netdev));
> > +        return EINVAL;
> > +    }
> > +
> > +    subtbl_meta_fd = bpf_create_map(BPF_MAP_TYPE_HASH, 
> > subtbl_def->key_size,
> > +                                    subtbl_def->value_size,
> > +                                    subtbl_def->max_entries, 0);
> > +    if (subtbl_meta_fd < 0) {
> > +        err = errno;
> > +        VLOG_WARN("%s: Map creation for flow_table meta table failed: %s",
> > +                  netdev_get_name(netdev), ovs_strerror(errno));
> > +        return err;
> > +    }
> > +    flow_table_fd = bpf_create_map_in_map(BPF_MAP_TYPE_ARRAY_OF_MAPS,
> > +                                          "flow_table", sizeof(uint32_t),
> > +                                          subtbl_meta_fd,
> > +                                          flow_table_def->max_entries,
> > +                                          0);
> > +    if (flow_table_fd < 0) {
> > +        err = errno;
> > +        VLOG_WARN("%s: Map creation for flow_table failed: %s",
> > +                  netdev_get_name(netdev), ovs_strerror(errno));
> > +        close(subtbl_meta_fd);
> > +        return err;
> > +    }
> > +    close(subtbl_meta_fd);
> > +
> > +    err = bpf_map__reuse_fd(flow_table, flow_table_fd);
> > +    if (err) {
> > +        VLOG_WARN("%s: Failed to reuse flow_table fd: %s",
> > +                  netdev_get_name(netdev), ovs_libbpf_strerror(err));
> > +        close(flow_table_fd);
> > +        return EINVAL;
> > +    }
> > +    close(flow_table_fd);
>
> Question:
> What's the purpose of bpf_map__reuse_fd and why do we close the flow_table_fd.
> I though later on when we call "bpf_object__load(obj)", all the maps
> will be loaded?
>
> > +
> > +    subtbl_masks_hd = bpf_object__find_map_by_name(obj, "subtbl_masks_hd");
> > +    if (!subtbl_masks_hd) {
> > +        /* Head index can be a global variable. In that case libbpf will
> > +         * initialize it. */
> > +        VLOG_DBG("%s: \"subtbl_masks_hd\" map does not exist",
> > +                 netdev_get_name(netdev));
> > +    } else {
> > +        int head_fd, head = XDP_SUBTABLES_TAIL, zero = 0;
> > +
> > +        head_fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, 
> > "subtbl_masks_hd",
> > +                                      sizeof(uint32_t), sizeof(int), 1, 0);
> > +        if (head_fd < 0) {
> > +            err = errno;
> > +            VLOG_ERR("%s: Map creation of \"subtbl_masks_hd\" failed: %s",
> > +                     netdev_get_name(netdev), ovs_strerror(errno));
> > +            return err;
> > +        }
> > +
> > +        if (bpf_map_update_elem(head_fd, &zero, &head, 0)) {
> > +            err = errno;
> > +            VLOG_ERR("Cannot update subtbl_masks_hd: %s",
> > +                     ovs_strerror(errno));
> > +            return err;
> > +        }
> > +
> > +        err = bpf_map__reuse_fd(subtbl_masks_hd, head_fd);
> > +        if (err) {
> > +            VLOG_ERR("%s: Failed to reuse subtbl_masks_hd fd: %s",
> > +                     netdev_get_name(netdev), ovs_libbpf_strerror(err));
> > +            close(head_fd);
> > +            return EINVAL;
> > +        }
> > +        close(head_fd);
> > +    }
> > +
> > +    xsks_map = bpf_object__find_map_by_name(obj, "xsks_map");
> > +    if (!xsks_map) {
> > +        VLOG_ERR("%s: BUG: \"xsks_map\" map does not exist",
> > +                 netdev_get_name(netdev));
> > +        return EOPNOTSUPP;
> > +    }
> > +
> > +    n_rxq = netdev_n_rxq(netdev);
> > +    xsks_map_fd = bpf_create_map_name(BPF_MAP_TYPE_XSKMAP, "xsks_map",
> > +                                      sizeof(int), sizeof(int), n_rxq, 0);
> > +    if (xsks_map_fd < 0) {
> > +        err = errno;
> > +        VLOG_WARN("%s: Map creation for xsks_map failed: %s",
> > +                  netdev_get_name(netdev), ovs_strerror(errno));
> > +        return err;
> > +    }
> > +
> > +    err = bpf_map__reuse_fd(xsks_map, xsks_map_fd);
> > +    if (err) {
> > +        VLOG_WARN("%s: Failed to reuse xsks_map fd: %s",
> > +                  netdev_get_name(netdev), ovs_libbpf_strerror(err));
> > +        close(xsks_map_fd);
> > +        return EINVAL;
> > +    }
> > +    close(xsks_map_fd);
> > +
> > +    err = bpf_map__reuse_fd(output_map, output_map_fd);
> > +    if (err) {
> > +        VLOG_WARN("%s: Failed to reuse output_map fd: %s",
> > +                  netdev_get_name(netdev), ovs_libbpf_strerror(err));
> > +        return EINVAL;
> > +    }
>
> Do you need to close(output_map_fd)?
>
> > +
> > +    return 0;
> > +}
> > +#endif
> > +
> >  static int
> >  xsk_load_prog(struct netdev *netdev, const char *path,
> >                struct bpf_object **pobj, int *prog_fd)
> >  {
> > +    struct netdev_linux *dev OVS_UNUSED = netdev_linux_cast(netdev);
> >      struct bpf_object_open_attr attr = {
> >          .prog_type = BPF_PROG_TYPE_XDP,
> >          .file = path,
> > @@ -298,6 +496,14 @@ xsk_load_prog(struct netdev *netdev, const char *path,
> >          goto err;
> >      }
> >
> > +#ifdef HAVE_XDP_OFFLOAD
> > +    if (!xdp_preload(netdev, obj)) {
>
> I forgot what's the purpose of doing xdp_preload, before we call
> bpf_object__load below.
> Does it just to check whether the flowtable_afxdp.o has every map we want?
> Or it does initialize something else.
>
> > +        VLOG_INFO("%s: Detected flowtable support in XDP program",
> > +                  netdev_get_name(netdev));
> > +        dev->has_xdp_flowtable = true;
> > +    }
> > +#endif
> > +
> >      if (bpf_object__load(obj)) {
> >          VLOG_ERR("%s: Can't load XDP program at '%s'",
> >                   netdev_get_name(netdev), path);
> > @@ -1297,7 +1503,17 @@ libbpf_print(enum libbpf_print_level level,
> >
> >  int netdev_afxdp_init(void)
> >  {
> > -    libbpf_set_print(libbpf_print);
> > +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> > +
> > +    if (ovsthread_once_start(&once)) {
> > +        libbpf_set_print(libbpf_print);
> > +#ifdef HAVE_XDP_OFFLOAD
> > +        if (netdev_register_flow_api_provider(&netdev_offload_xdp)) {
> > +            VLOG_WARN("Failed to register XDP flow api provider");
> > +        }
> > +#endif
> > +        ovsthread_once_done(&once);
> > +    }
> >      return 0;
> >  }
> >
> > diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> > index e91cd102d..324152e8f 100644
> > --- a/lib/netdev-afxdp.h
> > +++ b/lib/netdev-afxdp.h
> > @@ -44,6 +44,7 @@ struct netdev_stats;
> >  struct smap;
> >  struct xdp_umem;
> >  struct xsk_socket_info;
> > +struct bpf_object;
> >
> >  int netdev_afxdp_rxq_construct(struct netdev_rxq *rxq_);
> >  void netdev_afxdp_rxq_destruct(struct netdev_rxq *rxq_);
> > @@ -70,6 +71,8 @@ int netdev_afxdp_get_custom_stats(const struct netdev 
> > *netdev,
> >  void free_afxdp_buf(struct dp_packet *p);
> >  int netdev_afxdp_reconfigure(struct netdev *netdev);
> >  void signal_remove_xdp(struct netdev *netdev);
> > +bool has_xdp_flowtable(struct netdev *netdev);
> > +struct bpf_object *get_xdp_object(struct netdev *netdev);
> >
> >  #else /* !HAVE_AF_XDP */
> >
> Thanks
> William

btw, looking at lib/netdev-offload-xdp.c
the netdev_xdp_flow_get is not implemented.
Does that mean we will not detect flow that alread exists and always
replace with the new flow?

also, when I tried to dump-flow, is there a way to know which flow is
processed in XDP,
and which flows are not offloaded?
looking at man ovs-vswitchd(8), DATAPATH FLOW TABLE DEBUGGING COMMANDS,
dpctl/dump-flows, should we add "xdp - displays flows handled in the xdp" there?

Thanks
William
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to