On Thu, 2015-04-23 at 08:48 -0500, Josh Hunt wrote: > On 04/21/2015 07:09 PM, Eric Dumazet wrote: > > > > Note that this patch adds a deadlock possibility in some stress > > situations. > > > > If a process owning some tcp socket dies, and tcp_mem[2] is already hit, > > all sk_stream_alloc_skb() can return NULL and we loop in tcp_send_fin(), > > making no progress because we can not free any tcp memory. > > Ugh. Thanks for fixing this Eric!
No problem ;) For the record, I've tested this followup patch that I'll formally submit when net-next reopens : If there is one already sent skb in write queue (we look at the tail of course), and : - We are under TCP memory pressure, - or allocation of a fresh skb using GFP_KERNEL failed, -> we add the FIN flag that will eventually be sent later after a timeout. net/ipv4/tcp_output.c | 41 ++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 2ade67b7cdb0..fe6558eb64f3 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2827,33 +2827,34 @@ static void sk_forced_wmem_schedule(struct sock *sk, int size) sk_memory_allocated_add(sk, amt, &status); } -/* Send a fin. The caller locks the socket for us. This cannot be - * allowed to fail queueing a FIN frame under any circumstances. +/* Send a FIN. The caller locks the socket for us. + * We should try to send a FIN packet really hard, but eventually give up. */ void tcp_send_fin(struct sock *sk) { + struct sk_buff *skb, *tskb = tcp_write_queue_tail(sk); struct tcp_sock *tp = tcp_sk(sk); - struct sk_buff *skb = tcp_write_queue_tail(sk); - int mss_now; - /* Optimization, tack on the FIN if we have a queue of - * unsent frames. But be careful about outgoing SACKS - * and IP options. + /* Optimization, tack on the FIN if we have one skb in write queue and + * this skb was not yet sent, or we are under memory pressure. + * Note: in the latter case, FIN packet will be sent after a timeout, + * as TCP stack thinks it has been transmitted once. */ - mss_now = tcp_current_mss(sk); - - if (tcp_send_head(sk)) { - TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_FIN; - TCP_SKB_CB(skb)->end_seq++; + if (tskb && (tcp_send_head(sk) || sk_under_memory_pressure(sk))) { +coalesce: + TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; + TCP_SKB_CB(tskb)->end_seq++; tp->write_seq++; + if (!tcp_send_head(sk)) { + tp->snd_nxt++; + return; + } } else { - /* Socket is locked, keep trying until memory is available. */ - for (;;) { - skb = alloc_skb_fclone(MAX_TCP_HEADER, - sk->sk_allocation); - if (skb) - break; - yield(); + skb = alloc_skb_fclone(MAX_TCP_HEADER, sk->sk_allocation); + if (unlikely(!skb)) { + if (tskb) + goto coalesce; + return; } skb_reserve(skb, MAX_TCP_HEADER); sk_forced_wmem_schedule(sk, skb->truesize); @@ -2862,7 +2863,7 @@ void tcp_send_fin(struct sock *sk) TCPHDR_ACK | TCPHDR_FIN); tcp_queue_skb(sk, skb); } - __tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_OFF); + __tcp_push_pending_frames(sk, tcp_current_mss(sk), TCP_NAGLE_OFF); } /* We get here when a process closes a file descriptor (either due to -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html