Re: [PATCH 3/3] virarptable: End parsing earlier in case of NLMSG_DONE

Laine Stump Fri, 16 Aug 2024 22:08:13 -0700

On 8/16/24 8:45 AM, Martin Kletzander wrote:

Check for the last multipart message right as the first thing.  The
presumption probably was that the last message might still contain a
payload we want to parse.  However that cannot be true since that would
have to be a type RTM_NEWNEIGH.

This was not caught because older
kernels were note sending NLMSG_DONE and probably relied on the fact
that the parsing just stops after all the messages are walked through,
which the NLMSG_OK macro successfully did.


Resolves: https://issues.redhat.com/browse/RHEL-52449


Also:

Resolves: https://bugzilla.redhat.com/2302245
Fixes: a176d67cdfaf5b8237a7e3a80d8be0e6bdf2d8fd

(Hmm, and BTW I had suggested that patch 2/3 should have that Fixes:line, but I guess it can't be *totally* fixed until this patch, sinceprior to that we're checking the values of a struct ndmsg beforechecking the message type (i.e. it might not be an RTM_NEWNEIGH))

Signed-off-by: Martin Kletzander <mklet...@redhat.com>
---
So technically this still has some issues, maybe.

I'm pretty sure there are other issues with our netlink code, but theeffect of those issues is unknown, and unrelated to the current bugwhich your code fixes. So, IMO, "this" doesn't have some issues :-)

I could not find if our usage of libnl3 makes it easier for us so that we do not
have to check for NLMSG_{ERROR,OVERRUN,NOOP} or whether these checks should be
here as well

Well, virArpTableGet() calls virNetlinkGetNeighbor(), which allowsresponses where the *first* message in the response is NLMSG_NEWNEIGH orNLMSG_ERROR, and that calls virNetlinkTalk() which fails on anyNLMSG_ERROR that isn't 0 (an NLMSG_ERRO with error==0 is just an ackpacket). virNetlinkTalk() calls virNetLinkCommand(), which just callsnl_recv(), and as far as I can tell from the libnl API docs, there's nospecial handling there for any type of message.

If yes, then we should add them.  And we have (some of) these
checks elsewhere in the code, so "maybe".

Have you ever tried to assimilate the libnl documentation?(https://www.infradead.org/~tgr/libnl/doc/api/) I'll refer you to mycomment further down marked with "[*]".


Another thing is that we could avoid such errors by using nl_socket_set_cb(),
calling nl_recvmsgs_default() and then parsing only the valid messages in a
callback.

This somehow reminds me of code that we had in netcf that I think neededto get a list of interfaces, and we used the nl_cache_* functions forthis, which in the end (I thought) made the code even more complicatedthan if we'd just used lower level functions. There is some very usefulstuff in libnl, but I think some times it complicates in the name of"simplifying".


I guess I'm trying to say "Maybe, but don't get your hopes up".

 On top of that we could have an abstraction on top this to utilise in
all the netlink dumps we do, ditching our current abstraction which was a bit
hard for me to go through, to be honest.

Everything about netlink from the very beginning has been "hard to gothrough", especially when mixed in with libnl. I think originally therewere a couple different usages of netlink/libnl in libvirt authored by acouple different people, and then another usage or two came along witheach new user trying to refactor existing code to work for them too, butwithout ever any underlying plan. Or something like that. The result isthat (IMO) trying to understand some of our netlink code might do youmore harm than good :-/. Not that I'm cynical or anything.

(P.S. Ditching the current abstraction(s) and making something new thatworks better for everyone would be wonderful to see. As long as someoneelse did it. The dreadful side of that is that I may soon need to addyet another usage of netlink to the code, and that could mean thateither I will a) go through the pain and time consuming frustration ofdoing said ditching/rewriting so that my new code can be nicer, or b)live with the embarrasment of adding yet another hacky house of netlinkcards to the code :-/)


And of course there might be other places in our codebase that expect the same
behaviour as this code did and we should fix 'em all.

One other problem I noticed during my short dip into our netlink codelast week is that in all cases except getting VFINFO (for SRIOVdevices), we only read a single response packet after we make a request,when any response could have multiple packets.

 After all the debugging
for this piece I did not even check for those, maybe if this gets in I'll have a
long think about it.

[*]Good luck with that. Be sure the beer cooler next to your desk iswell stocked.

  src/util/virarptable.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/util/virarptable.c b/src/util/virarptable.c
index 8e805fb35332..604019c62a37 100644
--- a/src/util/virarptable.c
+++ b/src/util/virarptable.c
@@ -84,6 +84,9 @@ virArpTableGet(void)
          int len = nh->nlmsg_len;
          void *addr;

+ if (nh->nlmsg_type == NLMSG_DONE)

+            return table;

I would prefer "break;" here rather than "return table;" just in casesomeone in the future adds something that requires some sort of cleanupcode at the end of the function. This is fine too though.


Reviewed-by: Laine Stump <la...@redhat.com>

+
          if (len < NLMSG_SPACE(sizeof(*r))) {
              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                             _("wrong nlmsg len"));
@@ -98,9 +101,6 @@ virArpTableGet(void)
              (!(r->ndm_state == NUD_STALE || r->ndm_state == NUD_REACHABLE)))
              continue;

- if (nh->nlmsg_type == NLMSG_DONE)

-            return table;
-
          VIR_WARNINGS_NO_CAST_ALIGN
          parse_rtattr(tb, NDA_MAX, NDA_RTA(r), NLMSG_PAYLOAD(nh, sizeof(*r)));
          VIR_WARNINGS_RESET

Re: [PATCH 3/3] virarptable: End parsing earlier in case of NLMSG_DONE

Reply via email to