On Fri, May 26, 2023 at 10:31 PM Ilya Maximets <i.maxim...@ovn.org> wrote: > > On 5/26/23 22:07, Ilya Maximets wrote: > > CC: Marcelo. > > > > On 5/26/23 22:02, Ilya Maximets wrote: > >> On 5/26/23 21:19, Eelco Chaudron wrote: > >>> > >>> > >>> Send from my phone > >>> > >>>> Op 26 mei 2023 om 20:52 heeft Ilya Maximets <i.maxim...@ovn.org> het > >>>> volgende geschreven: > >>>> > >>>> On 5/26/23 20:43, Ilya Maximets wrote: > >>>>> On 5/23/23 12:39, Frode Nordahl wrote: > >>>>>> The tc module combines the use of the `tc_transact` helper > >>>>>> function for communication with the in-kernel tc infrastructure > >>>>>> with assertions on the reply data by `ofpbuf_at_assert` on the > >>>>>> received data prior to further processing. > >>>>>> > >>>>>> `tc_transact` in turn calls `nl_transact`, which via > >>>>>> `nl_transact_multiple__` ultimately calls and handles return > >>>>>> value from `recvmsg`. On error a check for EAGAIN is performed > >>>>>> and a consequence of this condition is effectively to provide a > >>>>>> non-error (0) result and an empty reply buffer. > >>>>> > >>>>> Hi, Frode, others.
Hello, Ilya, thank you for weighing in on the issue and thread, much appreciated. > >>>>> I took a closer look at the patch and the code in the netlink-socket. > >>>>> IIUC, the EAGAIN here is not a result of operation that we're > >>>>> requesting, > >>>>> it's just a EAGAIN on a non-blocking (MSG_DONTWAIT) netlink socket while > >>>>> trying to read. The reply to that transaction will arrive eventually > >>>>> and > >>>>> we will interpret it later as a reply to a different netlink > >>>>> transaction. > >>>>> > >>>>> The issue appears to be introduced in the following commit: > >>>>> 407556ac6c90 ("netlink-socket: Avoid forcing a reply for final message > >>>>> in a transaction.") > >>>>> > >>>>> And it was a performance optimization introduced as part of the set: > >>>>> https://mail.openvswitch.org/pipermail/ovs-dev/2012-April/260122.html > >>>>> > >>>>> The problem is that we can't tell apart socket EAGAIN if there is > >>>>> nothing > >>>>> to wait (requests didn't need a reply) and EAGAIN if kernel just didn't > >>>>> manage to reply yet. I did find that change suspicious, and it is listed as one of the commits fixed in my patch, although I'm only addressing it by documenting its behavior as I do not have full visibility of the blast radius for changing it. > >>>> It's still strance though that reply is delayed. Typically netlink > >>>> replies are formed right in the request handler. Did you manage to > >>>> figure out what is causing this in tc specifically? It wasn't an issue > >>>> for an OVS and other common families for many years. > >>>> > >>> > >>> Replying from my phone so I did not look at any code. However when I > >>> looked at the code during the review I noticed it will ignore earlier > >>> (none processed messages) replies based in the receive loop on the > >>> transaction ID. So I do not think it should be an issue of messages > >>> getting out of sync. > >> > >> That makes sense, OK. Our challenge here is that this crash is happening in a production environment, and we are currently unable to reproduce it in a lab environment. The consequence being that production has disabled the feature. So the proposed patch is in my mind a step on the way to figure out what is actually going on by allowing the system to be up and capture more logs without interruption of service. > >>> > >>> I assumed the delay happens when we request something to a hardware > >>> offload drive which is replying async due to it being busy. > >> > >> I'm not sure it is actually possible for netlink replies to be delivered > >> asynchronously. Looking at the kernel code, by the time the original > >> request is processed, the reply should already be sent. > >> > >> A simpler explanation, would be that it just doesn't reply for some reason > >> to a request that needs a reply. e.g. some incorrect error handling in the > >> kernel that leaves request unanswered. > >> > >> One potential issue I see in the tc_new_tfilter() implementation is that > >> if tfilter_notify() will fail for any reason, the function will still > >> return 0, because tc_new_tfilter() doesn't check the result. And no reply > >> will be generated, because the return value is zero (success) and the > >> rtnetlink_send() was never called due to an error in the tfilter_notify(). > >> > >> I'm not sure if that's what happening here, but it is one option how we > >> end up with a successful netlink transaction with NLM_F_ECHO set, but no > >> reply on the socket. > >> > >> A potential fix would be (not tested): > >> > >> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c > >> index 2621550bfddc..9c334dbf9c5f 100644 > >> --- a/net/sched/cls_api.c > >> +++ b/net/sched/cls_api.c > >> @@ -2311,7 +2311,7 @@ static int tc_new_tfilter(struct sk_buff *skb, > >> struct nlmsghdr *n, > >> err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh, > >> flags, extack); > >> if (err == 0) { > >> - tfilter_notify(net, skb, n, tp, block, q, parent, fh, > >> + err = tfilter_notify(net, skb, n, tp, block, q, parent, fh, > >> RTM_NEWTFILTER, false, rtnl_held, extack); > >> tfilter_put(tp, fh); > >> /* q pointer is NULL for shared blocks */ > >> --- > > ^^ This is definitely not a correct solution though, because we already > changed the filter, but we just failed to dump it back. Some other > solution is needed. Most likely, it needs to be re-tried until it succeeds. > Or the change() will need to be reverted and a failure reported to the > userspace. Good catch, the cited code does indeed look problematic. > >> > >> Solution on the OVS side might be to fail transactions that requested a > >> reply, but didn't get one. Async replies sound like something that should > >> not be allowed and likely is not allowed. Even if this turns out to be a bug on the kernel side, we ought to protect ourself from the condition. Just to understand your proposal, you are thinking partial revert of 407556ac6c90 and add a check for no/empty reply in the nl_transact? -- Frode Nordahl > >> Best regards, Ilya Maximets. > >> > >>> > >>> Maybe Frode can get some perf traces to confirm this. > >>> > >>> //Eelco > >>>>> > >>>>> The real solution would be to revert commit 407556ac6c90 or be a bit > >>>>> smarter and wait for reply only on requests that specify a reply buffer. > >>>>> > >>>>> Or am I missing something? > >>>>> > >>>>> Best regards, Ilya Maximets. > >>>> > >> > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev