Daniel Phillips wrote:
> Hi guys,
>
> Well I have been reading net code seriously for two days, so I am still
> basically a complete network klutz. But we have a nasty network-related
> vm deadlock that needs fixing and there seems to be little choice but to
> wade in and try to sort things out.
>
> Here is the basic problem:
>
>    http://lwn.net/Articles/129703/
>
> There was discussion of this at the Kernel Summit:
>
>    http://lwn.net/Articles/144273/
>
> I won't discuss this further except to note that people are shooting way
> wide of the mark by talking about throttling user processes. It is the
> block IO paths that need throttling, nothing more and nothing less. When
> a block IO path extends over the network, then the network protocol
> involved needs throttling. More specifically, the memory usage of the
> network part of the path needs to be bounded, network memory needs to be
> drawn from a reserve corresponding to the known bound, and we need to
> ensure that the number of requests in flight is bounded in order to know
> how big the memory reserve needs to be.
>
> A couple of details interact to make this hard:
>
>   1) There may be other traffic on a network interface than just the
>      block IO protocol. We need to ensure the block IO traffic gets
>      through, perhaps at the expense of dropping other traffic at
>      critical times.
>
>   2) Memory is allocated for packet buffers in the net interface
>      drivers, before we have decoded any protocol headers, and thus,
>      before we even know if a particular packet is involved in network
>      block IO.
>
> At OLS, I heard of an interesting proposal to attack this problem,
> apparently put forth at the networking summit shortly before. The idea
> is to support multiple MAC addresses per interface, using ARP proxy
> techniques. The MAC address in each ethernet packet can then be used to
> denote a particular kind of traffic on an interface, i.e., block IO
> traffic.
> This is only part of a solution of course; we would still have to do
> some form of throttling even if we did not have to worry about unrelated
> traffic. This technique does seem workable to me, but I would prefer a
> more local solution if one is to be found. I think I have found one, but
> I need a reality check on my reasoning, which is the purpose of this
> post.
>
> Here is the plan:
>
>   * All protocols used on an interface that supports block IO must be
>     vm-aware.
>
> If we wish, we can leave it up to the administrator to ensure that only
> vm-aware protocols are used on an interface that supports block IO, or
> we can do some automatic checking.
>
>   * Any socket to be used for block IO will be marked as a "vmhelper".
>
> The number of protocols that need to have this special knowledge is
> quite small, e.g.: tcp, udp, sctp, icmp, arp, maybe a few others. We are
> talking about a line or two of code in each to add the necessary
> awareness.
>
>   * Inside the network driver, when memory is low we will allocate space
>     for every incoming packet from a memory reserve, regardless of
>     whether it is related to block IO or not.
>
>   * Under low memory, we call the protocol layer synchronously instead
>     of queuing the packet through softnet.
>
> We do not necessarily have to bypass softnet, since there is a mechanism
> for throttling packets at this point. However, there is a big problem
> with throttling here: we haven't classified the packet yet, so the
> throttling might discard some block IO packets, which is exactly what we
> don't want to do under memory pressure.
>
>   * The protocol receive handler does the socket lookup, then if memory
>     is low, discards any packet not belonging to a vmhelper socket.
>
> Roughly speaking, the driver allocates each skb via:
>
>     skb = memory_pressure ?
>         dev_alloc_skb_reserve() : dev_alloc_skb();
>
> Then the driver hands off the packet to netif_rx, which does:
>
>     if (from_reserve(skb)) {
>         netif_receive_skb(skb);
>         return;
>     }
>
> And in the protocol handler we have:
>
>     if (memory_pressure && !is_vmhelper(sock) && from_reserve(skb))
>         goto drop_the_packet;
>
> That is pretty much it. Now, being a net newbie, it is not entirely
> clear to me that we can call netif_receive_skb directly when packets are
> also being queued through the softnet interface. May I have some
> guidance on this point, please?
>
> If that works, I am prepared to justify and prove the rest.
It can cause reordering, but I think it should be possible. But there can
also be other memory allocations, like dst_alloc, secpath_alloc,
ipq_alloc, ip_conntrack_alloc, ..., which you need to take care of.
Socket delivery is last on the input path and a lot can happen to a
packet in between.

IIRC Linus called the whole swapping over iSCSI idea broken because
apparently even userspace needs to allocate memory in some situations to
make things work.