Hello again, While we have confirmed this patch fixes a real issue in production and it passes the regular functional tests, I was apparently a bit quick out the door with this patch. It appears to break the system tests due to some unexpected WRN level logging: ../../tests/system-traffic.at:23: check_logs --- /dev/null 2023-05-10 17:48:04.088000000 +0200 +++ /tmp/autopkgtest.7DpozZ/build.MGV/src/_debian/tests/system-kmod-testsuite.dir/at-groups/1/stdout 2023-05-10 18:02:04.040410777 +0200 @@ -0,0 +1 @@ +2023-05-10T16:02:01.887Z|00034|netdev_linux|WARN|query br0 qdisc failed (Resource temporarily unavailable)
I will check and send a v2 asap, apologies for the inconvenience. -- Frode Nordahl On Wed, May 10, 2023 at 5:12 PM Frode Nordahl <frode.nord...@canonical.com> wrote: > > The tc module combines the use of the `tc_transact` helper > function for communication with the in-kernel tc infrastructure > with assertions on the reply data by `ofpbuf_at_assert` on the > received data prior to further processing. > > `tc_transact` in turn calls `nl_transact`, which via > `nl_transact_multiple__` ultimately calls and handles return > value from `recvmsg`. On error a check for EAGAIN is performed > and a consequence of this condition is effectively to provide a > non-error (0) result and an empty reply buffer. > > Before this change, the `tc_transact` and, consumers of it, were > unaware of this condition. The use of assertions on the reply > buffer can as such lead to a fatal crash of OVS. > > To be fair, the behavior of `nl_transact` when handling an EAGAIN > return is currently not documented, so this change also adds that > documentation. > > For the record, the symptom of the crash is this in the log: > EMER|../include/openvswitch/ofpbuf.h:194: assertion offset + size <= b->size > failed in ofpbuf_at_assert() > > And an excerpt of the backtrace looks like this: > 0x0000561dac1396d1 in ofpbuf_at_assert (b=0x7fb650005d20, b=0x7fb650005d20, > offset=16, size=20) at ../include/openvswitch/ofpbuf.h:194 > tc_replace_flower (id=<optimized out>, flower=<optimized out>) at > ../lib/tc.c:3223 > 0x0000561dac128155 in netdev_tc_flow_put (netdev=0x561dacf91840, > match=<optimized out>, actions=<optimized out>, actions_len=<optimized out>, > ufid=<optimized out>, info=<optimized out>, stats=<optimized out>) at > ../lib/netdev-offload-tc.c:2096 > 0x0000561dac117541 in netdev_flow_put (stats=<optimized out>, > info=0x7fb65b7ba780, ufid=<optimized out>, act_len=<optimized out>, > actions=<optimized out>, > match=0x7fb65b7ba980, netdev=0x561dacf91840) at ../lib/netdev-offload.c:257 > parse_flow_put (put=0x7fb65b7bcc50, dpif=0x561dad0ad550) at > ../lib/dpif-netlink.c:2297 > try_send_to_netdev (op=0x7fb65b7bcc48, dpif=0x561dad0ad550) at > ../lib/dpif-netlink.c:2384 > > Reported-At: https://launchpad.net/bugs/2018500 > Fixes f98e418fbdb6 (tc: Add tc flower functions) > Fixes 407556ac6c90 (netlink-socket: Avoid forcing a reply for final message > in a transaction.) > Signed-off-by: Frode Nordahl <frode.nord...@canonical.com> > --- > lib/dpif-netlink.c | 4 +++- > lib/netlink-socket.c | 5 +++++ > lib/tc.c | 8 ++++++++ > 3 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c > index 60bd39643..75537b2f8 100644 > --- a/lib/dpif-netlink.c > +++ b/lib/dpif-netlink.c > @@ -2340,7 +2340,9 @@ parse_flow_put(struct dpif_netlink *dpif, struct > dpif_flow_put *put) > } > netdev_set_hw_info(oor_netdev, HW_INFO_TYPE_OOR, true); > } > - level = (err == ENOSPC || err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; > + level = (err == EAGAIN || > + err == ENOSPC || > + err == EOPNOTSUPP) ? VLL_DBG : VLL_ERR; > VLOG_RL(&rl, level, "failed to offload flow: %s: %s", > ovs_strerror(err), > (oor_netdev ? oor_netdev->name : dev->name)); > diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c > index 80da20d9f..e40caafe3 100644 > --- a/lib/netlink-socket.c > +++ b/lib/netlink-socket.c > @@ -1798,6 +1798,11 @@ nl_pool_release(struct nl_sock *sock) > * > * 2. Resending the request causes it to be re-executed, so the request > * needs to be idempotent. > + * > + * 3. In the event that the kernel is to busy to handle the request to > + * receive the response (i.e. EAGAIN), this function will still > return > + * 0. The caller can check for this condition by checking for a zero > + * size of the 'replyp' ofpbuf buffer. > */ > int > nl_transact(int protocol, const struct ofpbuf *request, > diff --git a/lib/tc.c b/lib/tc.c > index 5c32c6f97..8e229f93c 100644 > --- a/lib/tc.c > +++ b/lib/tc.c > @@ -237,11 +237,19 @@ static void request_from_tcf_id(struct tcf_id *id, > uint16_t eth_type, > } > } > > +/* Please acquaint yourself with the documentation of the `nl_transact` > + * function in the netlink-sock module before making use of this function. > + */ > int > tc_transact(struct ofpbuf *request, struct ofpbuf **replyp) > { > int error = nl_transact(NETLINK_ROUTE, request, replyp); > ofpbuf_uninit(request); > + > + if (!error && replyp && !(*replyp)->size) { > + return EAGAIN; > + } > + > return error; > } > > -- > 2.39.2 > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev