OL> obj_sock_users() is only required for objects that are to be OL> tested for leaks in full container checkpoint. I don't think it's OL> needed, because the relation sock <-> file is 1-to-1. (If it OL> were, then you would also need to collect sockets).
Okay, that also answers the question posed by John in the other thread. OL> (Actually, I will remove checkpoint_bad and restore_bad and modify OL> checkpoint_obj() and restore_obj() to fail if the respective OL> method is NULL). Okay, should I expect that to show up in v17-dev soon? OL> Nit: I already got confused a few times because of the similar OL> names. Perhaps change CKPT_HDR_SOCKET_BUFFERS to OL> CKPT_HDR_SOCKET_QUEUE (and adjust the header name accordingly). Agreed. I remember writing that and remarking to myself how ridiculous of a name it was, but never went back to change it. OL> Unless/until we decide otherwise, let's keep all ckpt_hdr_xxx OL> definitions in a single place -- checkpoint_hdr.h That's how I initially had it, but Serge asked for it to be moved into the network headers. Serge? OL> Unless you intend to use the struct name 'ckpt_socket' elsewhere, OL> can we get rid of it (leaving only 'struct {') ? Yep. OL> Can you use fill_name() from checkpoint/file.c ? Yeah, looks like it. OL> This direct call to ->getname skips security checks through OL> getsockname(). You may want to refactor sys_getsockname() similar OL> to sys_bind(). Okay. >> + if ((h->sock.userlocks & SOCK_SNDBUF_LOCK) && >> + ((h->sock.sndbuf < SOCK_MIN_SNDBUF) || >> + (h->sock.sndbuf > sysctl_wmem_max))) >> + return -EINVAL; OL> At least for afunix, if the user did not explicitly set the OL> sndbuf, then userlocks won't have SOCK_SNDBUF_LOCK set. OL> Also, if userlocks in the image doesn't have SOCK_SNDBUF_LOCK, OL> then the sndbuf value isn't checked ? I think I was thinking that I only needed to verify the buffer value if the user claimed to have set it (as if it would be ignored otherwise), but that doesn't seem right. So, I think the proper thing to do here is always check it (i.e., remove the first check of the lock). OL> What about verifying sock.flags itself ? OL> In doing that, some options may assume/require some state -- OL> - SOCK_USE_WRITE_QUEUE only used in ipv4/ipv6/sctp OL> - SOCK_DESTROY only used in some protocols OL> Perhaps sanitize sock.flags per protocol ? Hmm, okay. OL> Many of these direct copy into the socket and sock effectively OL> bypass security checks that take place in {get,set}sockopt(), OL> either explicitly (e.g. sk_sndtimeo) or implicitly (e.g. OL> SOCK_LINGER in sock.flags, reflecting SO_LINGER option). OL> This applies both to checkpoint (potentially bypassing permission OL> of the checkpointer to view this data) and restart (bypassing OL> permissions of user to set these values). OL> The alternative is to use socksetopt/sockgetopt for those values OL> that should be subject to security checks. Yeah, I suppose so. I've resisted that thus far because it will make the sync operation so much harder to read, but I suppose it's unavoidable. OL> There should also be per-protocol consistency checks. E.g. afunix OL> cannot be in socket.state == SS_{DIS,}CONNECTING I suppose so, but I don't see anything in af_unix.c that seems to care :) OL> Better yet: -ENOSYS ? Okay. >> + ret = kernel_sendmsg(sock->sk_socket, &msg, &kvec, 1, len); OL> Is it possible to avoid the extra copy using splice() instead ? It's possible, but it will require some refactoring to get access to all the right pointers. I'd propose we push that optimization out until after we've got these patches integrated into the c/r tree. OL> SOCK_DEAD in sk->flags may also pose a problem. (Do we at all OL> need to checkpoint and/or restore SOCK_DEAD ?!) Is there any reasonable way we could arrive at a SOCK_DEAD socket via a valid descriptor? OL> Hmm... this test is quite hidden here - maybe a fat comment with a OL> reference to why it's here and what it is doing ? Sure. >> + else { >> + ckpt_debug("Buffer total %u exceeds limit %u\n", >> + h->total_bytes, *bufsize); >> + ret = -EINVAL; >> + goto out; >> + } >> + } >> + >> + for (i = 0; i < h->skb_count; i++) { >> + ret = sock_read_buffer(ctx, sock); >> + ckpt_debug("read buffer %i: %i\n", i, ret); >> + if (ret < 0) >> + break; >> + >> + total += ret; >> + if (total > h->total_bytes) { OL> What if 'total' overflows (for CAP_NET_ADMIN) ? perhaps: OL> if (total > h->total_bytes || total < ret) { Actually, I've changed this locally to require fewer variables and special cases. Let me know when I re-post if you don't think it's a better way to handle this. OL> Does the following bypass security checks for sys_connect() ? I don't think so. We're basically replicating sys_socketpair() here, which does not do a security check, presumably because all you're doing is hooking two sockets together that both belong to you. That's not to say that we're as safe as that limited operation, but I don't think it's totally clear. Perhaps someone more confident will comment. >> + /* Prime the socket's buffer limit with the maximum */ >> + peer->sk_userlocks |= SOCK_SNDBUF_LOCK; >> + peer->sk_sndbuf = sysctl_wmem_max; OL> I suppose this meant to be temporary ? Right, it will be overridden when we synchronize the socket properties. I'll strengthen the comment. OL> It is indeed a good idea to make it generic for all sockets, but I OL> suspect the way sock_read_buffers() works will only be suitable OL> for afunix, and perhaps for in-container inet4/6 (by OL> "in-container" I mean that both sides of the connection are in the OL> checkpoint image). Yeah, I meant to put a comment in the commit log about this. As I mentioned before, I was hoping to put most of the effort into the UNIX patch and move forward with that as we iterate on the INET patch. Thus, this was part of the buffer restore re-swizzle that clearly won't work for INET. In the meantime I've been working on the INET patch and splitting it up while retaining the necessary behavioral changes made here. In my next posting these should look better. OL> Hmm... does the code below work well with dgram sockets who OL> received data from the peer, and then the peer died (before OL> checkpoint) ? I'm not sure that I've replicated that exact scenario in my tests. However, when we're restoring socket A with outstanding incoming buffers, we restore them via B. If B is not in a reasonable state, I put it back into connected state to appease the sendmsg function and restore it when complete. See the "unshutdown" comment. >> +static struct file *sock_alloc_attach_fd(struct socket *socket) OL> Nit: I think this name is misleading - sock_attach_fd() itself is OL> already and should have been sock_attach_file() IMHO - so perhaps OL> s/fd/file here ? The name was chosen to reflect the fact that we're allocating a socket and then calling sock_attach_fd() on it. If you think it's less misleading to be inconsistent with the other call, I'll change it, but I wouldn't really agree... :) Thanks! -- Dan Smith IBM Linux Technology Center email: da...@us.ibm.com _______________________________________________ Containers mailing list contain...@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers _______________________________________________ Devel mailing list Devel@openvz.org https://openvz.org/mailman/listinfo/devel