On Fri, 8 Oct 2010, Attilio Rao wrote:

GENERAL FRAMEWORK ARCHITECTURE

Netdump is composed, right now, by an userland "server" and a kernel "client". The former is run on the target machine (where the dump will phisically happen) and it is responsible for receiving  the packets containing coredumps frame and for correctly writing them on-disk. The latter is part of the kernel installed on the source machine (where the dump is initiated) and is responsible for building correctly UDP packets containing the coredump frames, pushing through the network interface and routing them appropriately.

Hi Attilio:

Network dumps would be a great addition to the FreeBSD debugging suite! A few casual comments and questions, I need to spend some more time pondering the implications of the current netdump design later in the week.

(1) Did you consider using tftp as the network dump protocol, rather than a custom protocol? It's also a simple UDP-based, ACKed file transfer protocol, with the advantage that it's widely supported by exiting tftp daemons, packet sniffers, security devices, etc. This wouldn't require using a stock tftpd, although that would certainly be a benefit as well. The post-processing of crashdumps that seems to happen in the netdump server could, presumably, happen "offline" as easily...?

(2) Have you thought about adding a checksum to the dump format, since packets are going over the wire on UDP, etc? Even an MD5 sum wouldn't be too hard to arrange: all the code is in the kernel already, requires relatively little state storage, and is designed for streamed data.

(3) As the bounds checking/etc behavior in dumping grows more complex, it seems a shame to replicate it in architecture-specific code. Could we pull the mediaoffset/mediasize checking into common code? Also, a reserved size/offset of "0" meaning "no limit" sounds slightly worrying; it might be slightly more conservative to add a flags field to dumperinfo and have a flag indicating "size is no object" for netdump, rather than overloading the existing fields.

Some specific patch comments:

+ * XXX: This should be split into machdep and non-machdep parts

What MD parts are in the file?

The sysctl parts of the patch have a number of issues:

- Please use sysctl_handle_string rather than manual calls and string bounds
  checking with SYSCTL_IN/SYSCTL_OUT.  In general, type-specific sysctl
  handlers are named sysctl_handle_<type>, which I recommend here as well.

- Please use stack-local, fixed-length string buffers as the
  source/destination of copyin/copyout on ifnet names, rather than the fields
  in the structure itself.  We support interface renaming, so the field can
  change, and sysctl has the potential to block for extended periods
  mid-copyin/copyout.

- Because we support ifnet renaming, "none" is probably not a good reserved
  name.  Could you use the empty string or something similar, or just a flag
  to indicate that netdump is disabled?

- "ifp" rather than "ifn" and "nic" is the preferred variable name for ifnet
  pointers throughout the kernel.

- ifnets can, and are, removeable at runtime.  We have a refcount, but that's
  not really what you want.  I suggest simply looking up the ifnet by name
  when it's needed during a dump or for sysctl output, rather than maintaining
  a long-term pointer, which can become stale.  Especially with the
  auto-detect code.

- Since sysctl_nic is only used once, I'd prefer it were inlined into a
  use-specific handler function, avoiding the arg1/arg2 complications that
  make this code a bit harder to read.  sysctl_ip is useful because it's used
  more than once, though.  However, it also very much wants you to call
  sysctl_handle_string!

+sysctl_force_crash(SYSCTL_HANDLER_ARGS)

Does this belong in the netdump code? We already have some of these options in debug.kdb.*, but perhaps others should be added there as well.

+               /*
+                * get and fill a header mbuf, then chain data as an extended
+                * mbuf.
+                */
+               MGETHDR(m, M_DONTWAIT, MT_DATA);

The idea of calling into the mbuf allocator in this context is just freaky, and may have some truly awful side effects. I suppose this is the cost of trying to combine code paths in the network device driver rather than have an independent path in the netdump case, but it's quite unfortunate and will significantly reduce the robustness of netdumps in the face of, for example, mbuf starvation.

+       if (ntohs(ah->ar_hrd) != ARPHRD_ETHER &&
+           ntohs(ah->ar_hrd) != ARPHRD_IEEE802 &&
+           ntohs(ah->ar_hrd) != ARPHRD_ARCNET &&
+           ntohs(ah->ar_hrd) != ARPHRD_IEEE1394) {
+               NETDDEBUG("nd_handle_arp: unknown hardware address fmt "
+                   "0x%2D)\n", (unsigned char *)&ah->ar_hrd, "");
+               return;
+       }

Are you sure you don't want to just check for ETHER here?

+       /* XXX: Probably should check if we're the recipient MAC address */
+       /* Done ethernet processing. Strip off the ethernet header */

Yes, quite probably.  What if you panic in promiscuous mode?

+       /*
+        * The first write (at offset 0) is the kernel dump header.  Flag it
+        * for the server to treat specially.  XXX: This doesn't strip out the
+        * footer KDH, although it shouldn't hurt anything.
+        */

The footer allows us to confirm that the tail end of a dump matches the beginning; while probably not strictly necessary in this scenario, it's not a bad idea given the lack of a checksum.

+       /* Default the nic to the first available interface */
+       if ((ifn = TAILQ_FIRST(&ifnet)) != NULL) do {
+               if ((ifn->if_flags & IFF_UP) == 0)
+                       continue;

More locking needed.

I'm not sure I like the idea of defaulting to net dumping out the first interface -- in the future, we'll want to compile netdump into the kernel in GENERIC so that it's always available. But we won't want to default to doing the above for a number of reasons. This means the defaults need to be sensible and quite conservative (i.e., disabled and non-autoconfigured).

+       /* Default the client to the first IP on nd_nic */
+       if ((ifa = TAILQ_FIRST(&nd_nic->if_addrhead)) != NULL) do {
+               if (ifa->ifa_addr->sa_family != AF_INET) {
+                       continue;
+               }
+               nd_client = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr;
+       } while ((ifa = TAILQ_NEXT(ifa, ifa_link)) != NULL &&
+                       nd_client.s_addr == INADDR_ANY);

More locking needed here too. I'm guess this code is forward-ported from an older FreeBSD version without locking here, so you will want to check for other changes in the replicated code paths, such as input / arp / etc handling.

And is this really a good idea? Addresses change, and I think you aren't registering a callback to pick up the change in the event you're using an auto-configured address. In some ways, in the same way you might want to look up the ifnet to use at dump-time, you might simultaneously want to auto-configure an address then, if auto-configuration is selected (i.e., netdump is enabled and no IP is configured).

-       void    *if_pspare[7];
+       struct  netdump_methods *if_ndumpfuncs; /* netdump virtual methods */
+       void    *if_pspare[6];
        int     if_ispare[4];

In HEAD, at least, you should add your field and not use the spare. The spare should only be used on a merge to a KBI-stable branch (such as 8.x).

I need to ponder your changes to ifnet and to the drivers some more; I recognize the benefits of maximal code alignment, but I worry a lot about these code paths having side effects (not least, calls into memory allocators, which in turn can enter VM, etc). I wonder if, given that, we need to teach the mbuf allocator to be much more conservative if netdumping is active...

There are style bugs throughout.

Robert
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Reply via email to