Daniel Phillips wrote:
> Hi guys,
> 
> Well I have been reading net code seriously for two days, so I am still 
> basically a complete network klutz.  But we have a nasty network-related vm 
> deadlock that needs fixing and there seems to be little choice but to wade in 
> and try to sort things out.
> 
> Here is the basic problem:
> 
>    http://lwn.net/Articles/129703/
> 
> There was discussion of this at the Kernel Summit:
> 
>    http://lwn.net/Articles/144273/
> 
> I won't discuss this further except to note that people are shooting way wide 
> of the mark by talking about throttling user processes.  It is the block IO 
> paths that need throttling, nothing more and nothing less.  When a block IO 
> path extends over the network, then the network protocol involved needs 
> throttling.  More specifically, the memory usage of the network part of the 
> path needs to be bounded, network memory needs to be drawn from a reserve 
> corresponding to the known bound, and we need to ensure that the number of 
> requests in flight is bounded in order to know how big the memory reserve 
> needs to be.
> 
> A couple of details interact to make this hard:
> 
>   1) There may be other traffic on a network interface than just the block IO
>      protocol.  We need to ensure the block IO traffic gets through, perhaps
>      at the expense of dropping other traffic at critical times.
> 
>   2) Memory is allocated for packet buffers in the net interface drivers,
>      before we have decoded any protocol headers, and thus, before we even
>      know if a particular packet is involved in network block IO.
> 
> At OLS, I heard of an interesting proposal to attack this problem, apparently 
> put forth at the networking summit shortly before.  The idea is to support 
> multiple MAC addresses per interface, using ARP proxy techniques.  The MAC 
> address in each ethernet packet can then be used to denote a particular kind 
> of traffic on an interface, i.e., block IO traffic.  This is only part of a 
> solution of course; we would still have to do some form of throttling even 
> if we did not have to worry about unrelated traffic.  This technique does seem 
> workable to me, but I would prefer a more local solution if one is to be 
> found.  I think I have found one, but I need a reality check on my reasoning, 
> which is the purpose of this post.
> 
> Here is the plan:
> 
>   * All protocols used on an interface that supports block IO must be
>     vm-aware.
> 
> If we wish, we can leave it up to the administrator to ensure that only 
> vm-aware protocols are used on an interface that supports block IO, or we can 
> do some automatic checking.
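> 
> The automatic check could be as simple as a per-protocol flag that each 
> vm-aware protocol sets on its packet handler, verified when block IO is 
> enabled on an interface.  A minimal sketch, assuming a hypothetical 
> vm_aware flag and a hypothetical iteration helper:
> 
>         /* Hypothetical: verify that every protocol handler bound to
>          * this device has declared itself vm-aware. */
>         int dev_check_vm_aware(struct net_device *dev)
>         {
>                 struct packet_type *pt;
> 
>                 for_each_packet_handler(dev, pt)        /* hypothetical */
>                         if (!pt->vm_aware)              /* hypothetical flag */
>                                 return 0;
>                 return 1;
>         }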
> 
>   * Any socket to be used for block IO will be marked as a "vmhelper".
> 
> The number of protocols that need to have this special knowledge is quite 
> small, e.g.: tcp, udp, sctp, icmp, arp, maybe a few others.  We are talking 
> about a line or two of code in each to add the necessary awareness.
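> 
> To make that concrete, the mark could be a single bit in struct sock.  A 
> minimal sketch, assuming a hypothetical sk_vmhelper field and helpers:
> 
>         /* Hypothetical: mark a socket as carrying block IO traffic. */
>         static inline void sock_set_vmhelper(struct sock *sk)
>         {
>                 sk->sk_vmhelper = 1;            /* hypothetical field */
>         }
> 
>         static inline int is_vmhelper(struct sock *sk)
>         {
>                 return sk->sk_vmhelper;
>         }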
> 
>   * Inside the network driver, when memory is low we will allocate space
>     for every incoming packet from a memory reserve, regardless of whether
>     it is related to block IO or not.
> 
>   * Under low memory, we call the protocol layer synchronously instead of
>     queuing the packet through softnet.
> 
> We do not necessarily have to bypass softnet, since there is a mechanism for 
> throttling packets at this point.  However, there is a big problem with 
> throttling here: we haven't classified the packet yet, so the throttling 
> might discard some block IO packets, which is exactly what we don't want to 
> do under memory pressure.
> 
>   * The protocol receive handler does the socket lookup, then if memory is
>     low, discards any packet not belonging to a vmhelper socket.
> 
> Roughly speaking, the driver allocates each skb via:
> 
>         skb = memory_pressure ? dev_alloc_skb_reserve() : dev_alloc_skb();
> 
> Then the driver hands off the packet to netif_rx, which does:
> 
>         if (from_reserve(skb)) {
>                 netif_receive_skb(skb);
>                 return;
>         }
> 
> And in the protocol handler we have:
> 
>         if (memory_pressure && !is_vmhelper(sock) && from_reserve(skb))
>                 goto drop_the_packet;
> 
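> For completeness, here is one way the reserve allocation and tagging 
> might be wired up.  This is a sketch only; the emergency pool, the skb 
> flag and both helpers are hypothetical:
> 
>         /* Hypothetical: draw the skb from a preallocated emergency
>          * pool and tag it so that later stages can recognize it. */
>         struct sk_buff *dev_alloc_skb_reserve(unsigned int len)
>         {
>                 struct sk_buff *skb = take_skb_from_pool(len);  /* hypothetical */
> 
>                 if (skb)
>                         skb->emergency = 1;     /* hypothetical flag */
>                 return skb;
>         }
> 
>         static inline int from_reserve(struct sk_buff *skb)
>         {
>                 return skb->emergency;
>         }
> 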
> That is pretty much it.  Now, being a net newbie, it is not entirely clear to 
> me that we can call netif_receive_skb directly when packets are also being 
> queued through the softnet interface.  May I have some guidance on this 
> point, please?
> 
> If that works, I am prepared to justify and prove the rest.

It can cause reordering, but I think it should be possible. There are
also other memory allocations on the input path, like dst_alloc,
secpath_alloc, ipq_alloc, ip_conntrack_alloc, ..., which you need to
take care of. Socket delivery is the last step on the input path, and a
lot can happen to a packet in between. IIRC Linus called the whole
swapping-over-iSCSI idea broken because apparently even userspace needs
to allocate memory in some situations to make things work.
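
To illustrate the scale of the problem: each of those allocation sites
would need a reserve fallback of its own, along the lines of the sketch
below (the wrapper and the fallback helper are hypothetical):

        /* Sketch of the pattern every input-path allocation would need:
         * fall back to the reserve only for packets that were themselves
         * drawn from the reserve. */
        void *input_path_alloc(size_t size, gfp_t gfp, struct sk_buff *skb)
        {
                void *p = kmalloc(size, gfp | __GFP_NOWARN);

                if (!p && from_reserve(skb))
                        p = take_from_emergency_pool(size);     /* hypothetical */
                return p;
        }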