Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
On 15/07/2021 05:45, Pravin Shelar wrote:
> On Wed, Jun 30, 2021 at 2:53 AM Mark Gray wrote:
>>
>> The Open vSwitch kernel module uses the upcall mechanism to send
>> packets from kernel space to user space when it misses in the kernel
>> space flow table.
>>
>> [...]
>>
> Patch looks good to me. I have the following minor comments.
Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
On Wed, Jun 30, 2021 at 2:53 AM Mark Gray wrote:
>
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table.
>
> [...]
>
Patch looks good to me. I have the following minor comments.

> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index ef15d9eb4774..f79679746c62 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>                         break;
>
>                 case OVS_USERSPACE_ATTR_PID:
> -                       upcall.portid = nla_get_u32(a);
> +                       if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
> +                               upcall.portid =
> +                                       ovs_dp_get_upcall_portid(dp, smp_processor_id());
> +                       else
> +                               upcall.portid = nla_get_u32(a);
>                         break;
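For context on how dp->user_features gets that bit set from the other side,
below is a minimal, hypothetical userspace sketch using libnl-3. It is not
code from the linked user space series (which uses OVS's own netlink helpers
and also registers the per-CPU socket PIDs); the function name
enable_per_cpu_dispatch() and the libnl usage are illustrative assumptions.

#include <netlink/genl/genl.h>
#include <netlink/genl/ctrl.h>
#include <linux/openvswitch.h>

/* Hypothetical sketch: request per-cpu upcall dispatch by setting the
 * new feature bit on an existing datapath.  "sk" must already be
 * connected to generic netlink; error handling trimmed for brevity. */
static int enable_per_cpu_dispatch(struct nl_sock *sk, int dp_ifindex)
{
        int family = genl_ctrl_resolve(sk, OVS_DATAPATH_FAMILY);
        struct nl_msg *msg = nlmsg_alloc();
        struct ovs_header *hdr;
        int err;

        hdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family,
                          sizeof(*hdr), 0, OVS_DP_CMD_SET,
                          OVS_DATAPATH_VERSION);
        hdr->dp_ifindex = dp_ifindex;

        /* OVS_DP_F_DISPATCH_UPCALL_PER_CPU is the feature bit added by
         * this patch; a kernel without it will not honour the request,
         * which is what makes the mode "feature-negotiated". */
        nla_put_u32(msg, OVS_DP_ATTR_USER_FEATURES,
                    OVS_DP_F_DISPATCH_UPCALL_PER_CPU);

        err = nl_send_auto(sk, msg);
        nlmsg_free(msg);
        return err < 0 ? err : 0;
}

If the request is not honoured, userspace can keep the existing per-vport
dispatch, so old and new kernels remain usable with the same daemon.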
Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
Hi Joe,

Maybe you can take a look...

Thanks,
fbl

On Thu, Jul 08, 2021 at 11:40:12AM -0300, Flavio Leitner wrote:
>
> Hi Pravin,
>
> Any thoughts on this patch? We are closing OVS 2.16, so it would
> be nice to know if it looks okay or needs changes, especially
> changes related to the userspace interface.
>
> Thanks,
> fbl
>
> On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> > The Open vSwitch kernel module uses the upcall mechanism to send
> > packets from kernel space to user space when it misses in the kernel
> > space flow table.
> >
> > [...]
Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
Hi Pravin,

Any thoughts on this patch? We are closing OVS 2.16, so it would
be nice to know if it looks okay or needs changes, especially
changes related to the userspace interface.

Thanks,
fbl

On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table.
>
> [...]
Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table.
>
> [...]
>
> Bugzilla: https://bugzilla.redhat.com/1844576
> Signed-off-by: Mark Gray
> ---

It looks good and works for me.

Acked-by: Flavio Leitner

Thanks Mark!
fbl
Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
Hi Mark,

Thanks for addressing the comments. The patch looks good to me and
I plan to test it tomorrow.

fbl

On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> The Open vSwitch kernel module uses the upcall mechanism to send
> packets from kernel space to user space when it misses in the kernel
> space flow table.
>
> [...]
Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
Hi Mark,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url: https://github.com/0day-ci/linux/commits/Mark-Gray/openvswitch-Introduce-per-cpu-upcall-dispatch/20210630-175435
base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git b6df00789e2831fff7a2c65aa7164b2a4dcbe599
config: sh-randconfig-s032-20210630 (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/24234cf5ef3fdbb1cad04189ba6459292f84180e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mark-Gray/openvswitch-Introduce-per-cpu-upcall-dispatch/20210630-175435
        git checkout 24234cf5ef3fdbb1cad04189ba6459292f84180e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=sh

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

sparse warnings: (new ones prefixed by >>)
>> net/openvswitch/datapath.c:169:17: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected void const * @@     got struct dp_nlsk_pids [noderef] __rcu *upcall_portids @@
   net/openvswitch/datapath.c:169:17: sparse:     expected void const *
   net/openvswitch/datapath.c:169:17: sparse:     got struct dp_nlsk_pids [noderef] __rcu *upcall_portids

vim +169 net/openvswitch/datapath.c

   160
   161  static void destroy_dp_rcu(struct rcu_head *rcu)
   162  {
   163          struct datapath *dp = container_of(rcu, struct datapath, rcu);
   164
   165          ovs_flow_tbl_destroy(&dp->table);
   166          free_percpu(dp->stats_percpu);
   167          kfree(dp->ports);
   168          ovs_meters_exit(dp);
 > 169          kfree(dp->upcall_portids);
   170          kfree(dp);
   171  }
   172

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
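The warning is about the __rcu annotation only, not a functional bug:
destroy_dp_rcu() runs as an RCU callback, after the grace period, so no
reader can still hold dp->upcall_portids. One way to keep sparse happy
(a sketch of a possible fix, not necessarily the one ultimately applied)
is to strip the annotation explicitly before freeing:

static void destroy_dp_rcu(struct rcu_head *rcu)
{
        struct datapath *dp = container_of(rcu, struct datapath, rcu);

        ovs_flow_tbl_destroy(&dp->table);
        free_percpu(dp->stats_percpu);
        kfree(dp->ports);
        ovs_meters_exit(dp);
        /* rcu_dereference_raw() is safe here: we are inside the RCU
         * callback, so there can be no concurrent readers. */
        kfree(rcu_dereference_raw(dp->upcall_portids));
        kfree(dp);
}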
[ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
The Open vSwitch kernel module uses the upcall mechanism to send
packets from kernel space to user space when it misses in the kernel
space flow table. The upcall sends packets via a Netlink socket.
Currently, a Netlink socket is created for every vport. In this way,
there is a 1:1 mapping between a vport and a Netlink socket.
When a packet is received by a vport, if it needs to be sent to
user space, it is sent via the corresponding Netlink socket.

This mechanism, with various iterations of the corresponding user
space code, has seen some limitations and issues:

* On systems with a large number of vports, there is a correspondingly
  large number of Netlink sockets which can limit scaling.
  (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
* Packet reordering on upcalls.
  (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
* A thundering herd issue.
  (https://bugzilla.redhat.com/show_bug.cgi?id=183)

This patch introduces an alternative, feature-negotiated, upcall
mode using a per-cpu dispatch rather than a per-vport dispatch.

In this mode, the Netlink socket to be used for the upcall is
selected based on the CPU of the thread that is executing the upcall.
In this way, it resolves the issues above as:

a) The number of Netlink sockets scales with the number of CPUs
   rather than the number of vports.
b) Ordering per-flow is maintained as packets are distributed to
   CPUs based on mechanisms such as RSS and flows are distributed
   to a single user space thread.
c) Packets from a flow can only wake up one user space thread.

The corresponding user space code can be found at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html

Bugzilla: https://bugzilla.redhat.com/1844576
Signed-off-by: Mark Gray
---

Notes:
    v1 - Reworked based on Flavio's comments:
           * Fixed handling of userspace action case
           * Renamed 'struct dp_portids'
           * Fixed handling of return from kmalloc()
           * Removed check for dispatch type from ovs_dp_get_upcall_portid()
       - Reworked based on Dan's comments:
           * Fixed handling of return from kmalloc()
       - Reworked based on Pravin's comments:
           * Fixed handling of userspace action case
       - Added kfree() in destroy_dp_rcu() to cleanup netlink port ids

 include/uapi/linux/openvswitch.h |  8
 net/openvswitch/actions.c        |  6 ++-
 net/openvswitch/datapath.c       | 70 +++-
 net/openvswitch/datapath.h       | 20 +
 4 files changed, 101 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 8d16744edc31..6571b57b2268 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
  * set on the datapath port (for OVS_ACTION_ATTR_MISS). Only valid on
  * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
  * not be sent.
+ * @OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
+ * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
  * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
  * datapath. Always present in notifications.
  * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for the
@@ -87,6 +89,9 @@ enum ovs_datapath_attr {
 	OVS_DP_ATTR_USER_FEATURES,	/* OVS_DP_F_*  */
 	OVS_DP_ATTR_PAD,
 	OVS_DP_ATTR_MASKS_CACHE_SIZE,
+	OVS_DP_ATTR_PER_CPU_PIDS,	/* Netlink PIDS to receive upcalls in
+					 * per-cpu dispatch mode
+					 */
 	__OVS_DP_ATTR_MAX
 };
 
@@ -127,6 +132,9 @@ struct ovs_vport_stats {
 /* Allow tc offload recirc sharing */
 #define OVS_DP_F_TC_RECIRC_SHARING	(1 << 2)
 
+/* Allow per-cpu dispatch of upcalls */
+#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU	(1 << 3)
+
 /* Fixed logical ports. */
 #define OVSP_LOCAL	((__u32)0)
 
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index ef15d9eb4774..f79679746c62 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 			break;
 
 		case OVS_USERSPACE_ATTR_PID:
-			upcall.portid = nla_get_u32(a);
+			if (dp->user_features & OVS_DP_F_DISPATCH_UPCALL_PER_CPU)
+				upcall.portid =
+					ovs_dp_get_upcall_portid(dp, smp_processor_id());
+			else
+				upcall.portid = nla_get_u32(a);
 			break;
 
 		case OVS_USERSPACE_ATTR_EGRESS_TUN_PORT: {
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index bc164b35e67d..8d54fa323543 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -166,6 +166,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu)
 	free_percpu(dp->stats_percpu);
 	kfree(dp->ports);
 	ovs_meters_exit(dp);
+	kfree(dp->upcall_portids);
 	kfree(dp);
 }
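The archive preview ends here, before the rest of the datapath.c and
datapath.h changes. For orientation, the following is a minimal sketch of
what the per-CPU lookup helper could look like, pieced together only from
identifiers that appear elsewhere in this thread (ovs_dp_get_upcall_portid(),
struct dp_nlsk_pids, dp->upcall_portids). The struct layout and the
wrap-around fallback are assumptions for illustration, not the patch's
actual code.

/* Sketch only, not taken from the patch.  Assumes the datapath carries
 * "struct dp_nlsk_pids { u32 n_pids; u32 pids[]; } __rcu *upcall_portids",
 * filled in by userspace when the per-cpu dispatch feature is negotiated. */
u32 ovs_dp_get_upcall_portid(const struct datapath *dp, u32 cpu_id)
{
	struct dp_nlsk_pids *pids = rcu_dereference(dp->upcall_portids);

	if (!pids || !pids->n_pids)
		return 0;	/* Zero tells the caller not to send the upcall. */

	/* Wrap around if userspace registered fewer PIDs than there are
	 * CPUs, so each CPU still maps to exactly one socket; that keeps
	 * the per-flow ordering property described in point b above. */
	return pids->pids[cpu_id % pids->n_pids];
}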