Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
 
  Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
  it didn't drop packets received from host as far as I can tell.
  virtio is more like a pipe than a real NIC in this respect.
 
 Prior/after to this patch, you were not posting buffers, so if packets
 were received on a physical NIC, you were dropping the packets anyway.

 It makes no difference at all, adding a cushion might make you feel
 better, but its really not worth it.
 
 Under memory stress, it makes better sense to drop a super big GRO
 packet (The one needing frag_list extension ...)
 
 It gives a better signal to the sender to reduce its pressure, and gives
 opportunity to free more of your memory.
 

OK, but in that case one wonders whether we should do more to free memory?

E.g. imagine that we dropped a packet of a specific TCP flow
because we couldn't allocate a new packet.

What happens now is that the old packet is freed as well.

So quite likely the next packet in queue will get processed
since it will reuse the memory we have just freed.

The next packet and the next after it etc all will have to go through
the net stack until they get at the socket and are dropped then
because we missed a segment.  Even worse, GRO gets disabled so the load
on receiver goes up instead of down.

Sounds like a problem doesn't it?

GRO actually detects it's the same flow and can see packet is
out of sequence. Why doesn't it drop the packet then?
Alternatively, we could (for example using the pre-allocated skb
like I suggested) notify GRO that it should start dropping packets
of this flow.

What do you think?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Tue, Nov 19, 2013 at 01:38:16PM -0800, Michael Dalton wrote:
 Great catch Jason. I agree this now raises the larger issue of how to
 handle a memory alloc failure in the middle of receive. As Eric mentioned,
 we can drop the packet and free the remaining (num_buf) frags.
 
 Michael, perhaps I'm missing something, but why would you prefer
 pre-allocating buffers in this case? If the guest kernel is OOM'ing,
 dropping packets should provide backpressure.
 
 Also, we could just as easily fail the initial skb alloc in page_to_skb,
 and I think that case also needs to be handled now in the same fashion as
 a memory allocation failure in receive_mergeable.
 
 Best,
 
 Mike

Yes I missed this last night. Thanks a lot Eric and Michael for pointing
this out.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
 On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
  On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
  
   Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
   it didn't drop packets received from host as far as I can tell.
   virtio is more like a pipe than a real NIC in this respect.
  
  Prior/after to this patch, you were not posting buffers, so if packets
  were received on a physical NIC, you were dropping the packets anyway.
 
  It makes no difference at all, adding a cushion might make you feel
  better, but its really not worth it.
  
  Under memory stress, it makes better sense to drop a super big GRO
  packet (The one needing frag_list extension ...)
  
  It gives a better signal to the sender to reduce its pressure, and gives
  opportunity to free more of your memory.
  
 
 OK, but in that case one wonders whether we should do more to free memory?
 
 E.g. imagine that we dropped a packet of a specific TCP flow
 because we couldn't allocate a new packet.
 
 What happens now is that the old packet is freed as well.
 
 So quite likely the next packet in queue will get processed
 since it will reuse the memory we have just freed.
 
 The next packet and the next after it etc all will have to go through
 the net stack until they get at the socket and are dropped then
 because we missed a segment.  Even worse, GRO gets disabled so the load
 on receiver goes up instead of down.
 
 Sounds like a problem doesn't it?

I see no problem at all. GRO is a hint for high rates (and obviously
when there is enough memory)

 
 GRO actually detects it's the same flow and can see packet is
 out of sequence. Why doesn't it drop the packet then?
 Alternatively, we could (for example using the pre-allocated skb
 like I suggested) notify GRO that it should start dropping packets
 of this flow.
 
 What do you think?
 

I think we disagree a lot on memory management on networking stacks.

We did a lot of work in TCP stack and Qdisc layers to lower memory
pressure (and bufferbloat), an you seem to try hard to introduce yet
another layer of buffer bloat in virtio_net.

So add whatever you want to proudly state to your management :

Look how smart we are : we drop no packets in our layer



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 07:16:33AM -0800, Eric Dumazet wrote:
 On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
  On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
   On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
   
Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.
   
   Prior/after to this patch, you were not posting buffers, so if packets
   were received on a physical NIC, you were dropping the packets anyway.
  
   It makes no difference at all, adding a cushion might make you feel
   better, but its really not worth it.
   
   Under memory stress, it makes better sense to drop a super big GRO
   packet (The one needing frag_list extension ...)
   
   It gives a better signal to the sender to reduce its pressure, and gives
   opportunity to free more of your memory.
   
  
  OK, but in that case one wonders whether we should do more to free memory?
  
  E.g. imagine that we dropped a packet of a specific TCP flow
  because we couldn't allocate a new packet.
  
  What happens now is that the old packet is freed as well.
  
  So quite likely the next packet in queue will get processed
  since it will reuse the memory we have just freed.
  
  The next packet and the next after it etc all will have to go through
  the net stack until they get at the socket and are dropped then
  because we missed a segment.  Even worse, GRO gets disabled so the load
  on receiver goes up instead of down.
  
  Sounds like a problem doesn't it?
 
 I see no problem at all. GRO is a hint for high rates (and obviously
 when there is enough memory)
 
  
  GRO actually detects it's the same flow and can see packet is
  out of sequence. Why doesn't it drop the packet then?
  Alternatively, we could (for example using the pre-allocated skb
  like I suggested) notify GRO that it should start dropping packets
  of this flow.
  
  What do you think?
  
 
 I think we disagree a lot on memory management on networking stacks.
 
 We did a lot of work in TCP stack and Qdisc layers to lower memory
 pressure (and bufferbloat), an you seem to try hard to introduce yet
 another layer of buffer bloat in virtio_net.
 
 So add whatever you want to proudly state to your management :
 
 Look how smart we are : we drop no packets in our layer
 

Hmm some kind of disconnect here.
I got you rmanagement about bufferbloat.

What I am saying is that maybe we should drop packets more
aggressively: when we drop one packet of a flow, why not
drop everything that's queued and is for the same flow?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:

 Hmm some kind of disconnect here.
 I got you rmanagement about bufferbloat.
 
 What I am saying is that maybe we should drop packets more
 aggressively: when we drop one packet of a flow, why not
 drop everything that's queued and is for the same flow?

I really hope your TCP flows use SACK ;)

Please read the rfc 2018 introduction for details.

If a packet is dropped, it doesn't mean following packets _have_ to be
dropped.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 08:14:21AM -0800, Eric Dumazet wrote:
 On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:
 
  Hmm some kind of disconnect here.
  I got you rmanagement about bufferbloat.
  
  What I am saying is that maybe we should drop packets more
  aggressively: when we drop one packet of a flow, why not
  drop everything that's queued and is for the same flow?
 
 I really hope your TCP flows use SACK ;)
 
 Please read the rfc 2018 introduction for details.
 
 If a packet is dropped, it doesn't mean following packets _have_ to be
 dropped.


Got it, thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Jason Wang
We need to drop the refcnt of page when we fail to allocate an skb for frag
list, otherwise it will be leaked. The bug was introduced by commit
2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx
buffers to page frag allocators).

Cc: Michael Dalton mwdal...@google.com
Cc: Eric Dumazet eduma...@google.com
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
The patch was needed for 3.12 stable.
---
 drivers/net/virtio_net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 69ad42b..3798517 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -322,9 +322,11 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
 head_skb-dev-name);
len = MERGE_BUFFER_LEN;
}
+   page = virt_to_head_page(buf);
if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
if (unlikely(!nskb)) {
+   put_page(page);
head_skb-dev-stats.rx_dropped++;
return -ENOMEM;
}
@@ -341,7 +343,6 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
head_skb-len += len;
head_skb-truesize += MERGE_BUFFER_LEN;
}
-   page = virt_to_head_page(buf);
offset = buf - (char *)page_address(page);
if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
put_page(page);
-- 
1.8.3.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
 We need to drop the refcnt of page when we fail to allocate an skb for frag
 list, otherwise it will be leaked. The bug was introduced by commit
 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx
 buffers to page frag allocators).
 
 Cc: Michael Dalton mwdal...@google.com
 Cc: Eric Dumazet eduma...@google.com
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 The patch was needed for 3.12 stable.

Good catch, but if we return from receive_mergeable() in the 'middle'
of the frags we would need for the current skb, who will
call the virtqueue_get_buf() to flush the remaining frags ?

Don't we also need to call virtqueue_get_buf() like 

while (--num_buf) {
buf = virtqueue_get_buf(rq-vq, len);
if (!buf)
break;
put_page(virt_to_head_page(buf));
}

?




___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Michael S. Tsirkin
On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
  We need to drop the refcnt of page when we fail to allocate an skb for frag
  list, otherwise it will be leaked. The bug was introduced by commit
  2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx
  buffers to page frag allocators).
  
  Cc: Michael Dalton mwdal...@google.com
  Cc: Eric Dumazet eduma...@google.com
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  The patch was needed for 3.12 stable.
 
 Good catch, but if we return from receive_mergeable() in the 'middle'
 of the frags we would need for the current skb, who will
 call the virtqueue_get_buf() to flush the remaining frags ?
 
 Don't we also need to call virtqueue_get_buf() like 
 
 while (--num_buf) {
 buf = virtqueue_get_buf(rq-vq, len);
 if (!buf)
 break;
 put_page(virt_to_head_page(buf));
 }
 
 ?
 
 
 


virtqueue_get_buf only gives you back a buffer that has been DMA-ed
to by hardware. ATM there's no way to get back a buffer once you
gave it to hardware without doing a NIC reset.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Michael S. Tsirkin
On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
  We need to drop the refcnt of page when we fail to allocate an skb for frag
  list, otherwise it will be leaked. The bug was introduced by commit
  2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx
  buffers to page frag allocators).
  
  Cc: Michael Dalton mwdal...@google.com
  Cc: Eric Dumazet eduma...@google.com
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  The patch was needed for 3.12 stable.
 
 Good catch, but if we return from receive_mergeable() in the 'middle'
 of the frags we would need for the current skb, who will
 call the virtqueue_get_buf() to flush the remaining frags ?
 
 Don't we also need to call virtqueue_get_buf() like 
 
 while (--num_buf) {
 buf = virtqueue_get_buf(rq-vq, len);
 if (!buf)
 break;
 put_page(virt_to_head_page(buf));
 }
 
 ?
 
 


Let me explain what worries me in your suggestion:

struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
if (unlikely(!nskb)) {
head_skb-dev-stats.rx_dropped++;
return -ENOMEM;
}

is this the failure case we are talking about?

I think this is a symprom of a larger problem
introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
namely that we now need to allocate memory in the
middle of processing a packet.


I think discarding a completely valid and well-formed
packet from the receive queue because we are unable
to allocate new memory with GFP_ATOMIC
for future packets is not a good idea.

It certainly violates the principle of least surprize:
when one sees host pass packet to guest, one expects
the packet to get into the networking stack, not get
dropped by the driver internally.
Guest stack can do with the packet what it sees fit.

We actually wake up a thread if we can't fill up the queue,
that will fill it up in GFP_KERNEL context.

So I think we should find a way to pre-allocate if necessary and avoid
error paths where allocating new memory is a required to avoid drops.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
 On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
  On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
   We need to drop the refcnt of page when we fail to allocate an skb for 
   frag
   list, otherwise it will be leaked. The bug was introduced by commit
   2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable 
   rx
   buffers to page frag allocators).
   
   Cc: Michael Dalton mwdal...@google.com
   Cc: Eric Dumazet eduma...@google.com
   Cc: Rusty Russell ru...@rustcorp.com.au
   Cc: Michael S. Tsirkin m...@redhat.com
   Signed-off-by: Jason Wang jasow...@redhat.com
   ---
   The patch was needed for 3.12 stable.
  
  Good catch, but if we return from receive_mergeable() in the 'middle'
  of the frags we would need for the current skb, who will
  call the virtqueue_get_buf() to flush the remaining frags ?
  
  Don't we also need to call virtqueue_get_buf() like 
  
  while (--num_buf) {
  buf = virtqueue_get_buf(rq-vq, len);
  if (!buf)
  break;
  put_page(virt_to_head_page(buf));
  }
  
  ?
  
  
 
 
 Let me explain what worries me in your suggestion:
 
 struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
 if (unlikely(!nskb)) {
 head_skb-dev-stats.rx_dropped++;
 return -ENOMEM;
 }
 
 is this the failure case we are talking about?

I thought Jason patch was about this, no ?

 
 I think this is a symprom of a larger problem
 introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
 namely that we now need to allocate memory in the
 middle of processing a packet.
 
 
 I think discarding a completely valid and well-formed
 packet from the receive queue because we are unable
 to allocate new memory with GFP_ATOMIC
 for future packets is not a good idea.

How is it different with NIC processing in RX path ?

 
 It certainly violates the principle of least surprize:
 when one sees host pass packet to guest, one expects
 the packet to get into the networking stack, not get
 dropped by the driver internally.
 Guest stack can do with the packet what it sees fit.
 
 We actually wake up a thread if we can't fill up the queue,
 that will fill it up in GFP_KERNEL context.
 
 So I think we should find a way to pre-allocate if necessary and avoid
 error paths where allocating new memory is a required to avoid drops.
 

Really, under ATOMIC context, there is no way you can avoid dropping
packets if you cannot allocate memory. If you cannot allocate sk_buff
(256 bytes !!), you wont be able to allocate the 1500+ bytes to hold the
payload of next packets anyway. 

Same problem on a real NIC.

Under memory pressure we _do_ packet drops.
Nobody really complained.

Sure, you can add yet another cache of pre-allocated skbs and pay the
price of managing yet another cache layer, but still need to trop
packets under stress.

Pre-allocating skb on real NIC has a performance cost, because we clear
sk_buff way ahead of time. By the time skb is finally received, cpu has
to bring back into its cache memory cache lines.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Michael Dalton
Great catch Jason. I agree this now raises the larger issue of how to
handle a memory alloc failure in the middle of receive. As Eric mentioned,
we can drop the packet and free the remaining (num_buf) frags.

Michael, perhaps I'm missing something, but why would you prefer
pre-allocating buffers in this case? If the guest kernel is OOM'ing,
dropping packets should provide backpressure.

Also, we could just as easily fail the initial skb alloc in page_to_skb,
and I think that case also needs to be handled now in the same fashion as
a memory allocation failure in receive_mergeable.

Best,

Mike
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Michael S. Tsirkin
On Tue, Nov 19, 2013 at 01:36:36PM -0800, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 22:49 +0200, Michael S. Tsirkin wrote:
  On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
   On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
We need to drop the refcnt of page when we fail to allocate an skb for 
frag
list, otherwise it will be leaked. The bug was introduced by commit
2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate 
mergeable rx
buffers to page frag allocators).

Cc: Michael Dalton mwdal...@google.com
Cc: Eric Dumazet eduma...@google.com
Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
The patch was needed for 3.12 stable.
   
   Good catch, but if we return from receive_mergeable() in the 'middle'
   of the frags we would need for the current skb, who will
   call the virtqueue_get_buf() to flush the remaining frags ?
   
   Don't we also need to call virtqueue_get_buf() like 
   
   while (--num_buf) {
   buf = virtqueue_get_buf(rq-vq, len);
   if (!buf)
   break;
   put_page(virt_to_head_page(buf));
   }
   
   ?
   
   
  
  
  Let me explain what worries me in your suggestion:
  
  struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
  if (unlikely(!nskb)) {
  head_skb-dev-stats.rx_dropped++;
  return -ENOMEM;
  }
  
  is this the failure case we are talking about?
 
 I thought Jason patch was about this, no ?
 
  
  I think this is a symprom of a larger problem
  introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
  namely that we now need to allocate memory in the
  middle of processing a packet.
  
  
  I think discarding a completely valid and well-formed
  packet from the receive queue because we are unable
  to allocate new memory with GFP_ATOMIC
  for future packets is not a good idea.
 
 How is it different with NIC processing in RX path ?


Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.

  
  It certainly violates the principle of least surprize:
  when one sees host pass packet to guest, one expects
  the packet to get into the networking stack, not get
  dropped by the driver internally.
  Guest stack can do with the packet what it sees fit.
  
  We actually wake up a thread if we can't fill up the queue,
  that will fill it up in GFP_KERNEL context.
  
  So I think we should find a way to pre-allocate if necessary and avoid
  error paths where allocating new memory is a required to avoid drops.
  
 
 Really, under ATOMIC context, there is no way you can avoid dropping
 packets if you cannot allocate memory. If you cannot allocate sk_buff
 (256 bytes !!), you wont be able to allocate the 1500+ bytes to hold the
 payload of next packets anyway. 

that's why we do:

if (!try_fill_recv(rq, GFP_ATOMIC))
schedule_delayed_work(vi-refill, 0);


the queues are large enough for a single failure not to be
an immediate problem.


 Same problem on a real NIC.
 
 Under memory pressure we _do_ packet drops.
 Nobody really complained.

 Sure, you can add yet another cache of pre-allocated skbs and pay the
 price of managing yet another cache layer, but still need to trop
 packets under stress.

We don't need a cache even. Just enough to avoid dropping packets
if allocation failed in the middle so we don't dequeue a buffer and then
drop it.

Once we use this reserved skb, we stop processing the queue until
refill gives it back.

 Pre-allocating skb on real NIC has a performance cost, because we clear
 sk_buff way ahead of time. By the time skb is finally received, cpu has
 to bring back into its cache memory cache lines.
 

Alternatively we can pre-allocate the memory but avoid clearing it maybe?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Eric Dumazet
On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:

 Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
 it didn't drop packets received from host as far as I can tell.
 virtio is more like a pipe than a real NIC in this respect.

Prior/after to this patch, you were not posting buffers, so if packets
were received on a physical NIC, you were dropping the packets anyway.

It makes no difference at all, adding a cushion might make you feel
better, but its really not worth it.

Under memory stress, it makes better sense to drop a super big GRO
packet (The one needing frag_list extension ...)

It gives a better signal to the sender to reduce its pressure, and gives
opportunity to free more of your memory.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Michael Dalton
Hi,

After further reflection I think we're looking at two related issues:
(a) a memory leak that Jason has identified that occurs when a memory
allocation fails in receive_mergeable. Jasons commit solves this issue.
(b) virtio-net does not dequeue all buffers for a packet in the
case that an error occurs on receive and mergeable receive buffers is
enabled.

For (a), this bug is new and due to changes in 2613af0ed18a, and the
net impact is memory leak on the physical page. However, I believe (b)
has always been possible in some form because if page_to_skb() returns
NULL (e.g., due to SKB allocation failure), receive_mergeable is never
called. AFAICT this is also the behavior prior to 2613af0ed18a.

The net impact of (b) would be that virtio-net would interpret a packet
buffer that is in the middle of a mergeable packet as the start of a
new packet, which is definitely also a bug (and the buffer contents
could contain bytes that resembled a valid virtio-net header).

A solution for (b) will require handling both the page_to_skb memory
allocation failures and the memory allocation failures in
receive_mergeable introduced by 2613af0ed18a.

Best,

Mike
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Jason Wang
On 11/19/2013 10:03 PM, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
  We need to drop the refcnt of page when we fail to allocate an skb for frag
  list, otherwise it will be leaked. The bug was introduced by commit
  2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx
  buffers to page frag allocators).
  
  Cc: Michael Dalton mwdal...@google.com
  Cc: Eric Dumazet eduma...@google.com
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  The patch was needed for 3.12 stable.
 Good catch, but if we return from receive_mergeable() in the 'middle'
 of the frags we would need for the current skb, who will
 call the virtqueue_get_buf() to flush the remaining frags ?

 Don't we also need to call virtqueue_get_buf() like 

 while (--num_buf) {
 buf = virtqueue_get_buf(rq-vq, len);
 if (!buf)
 break;
 put_page(virt_to_head_page(buf));
 }

 ?

Yes we need this, will send V2.

Thanks
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Jason Wang
On 11/20/2013 04:49 AM, Michael S. Tsirkin wrote:
 On Tue, Nov 19, 2013 at 06:03:48AM -0800, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 16:05 +0800, Jason Wang wrote:
 We need to drop the refcnt of page when we fail to allocate an skb for frag
 list, otherwise it will be leaked. The bug was introduced by commit
 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx
 buffers to page frag allocators).

 Cc: Michael Dalton mwdal...@google.com
 Cc: Eric Dumazet eduma...@google.com
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 The patch was needed for 3.12 stable.
 Good catch, but if we return from receive_mergeable() in the 'middle'
 of the frags we would need for the current skb, who will
 call the virtqueue_get_buf() to flush the remaining frags ?

 Don't we also need to call virtqueue_get_buf() like 

 while (--num_buf) {
 buf = virtqueue_get_buf(rq-vq, len);
 if (!buf)
 break;
 put_page(virt_to_head_page(buf));
 }

 ?



 Let me explain what worries me in your suggestion:

 struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
 if (unlikely(!nskb)) {
 head_skb-dev-stats.rx_dropped++;
 return -ENOMEM;
 }

 is this the failure case we are talking about?

 I think this is a symprom of a larger problem
 introduced by 2613af0ed18a11d5c566a81f9a6510b73180660a,
 namely that we now need to allocate memory in the
 middle of processing a packet.


 I think discarding a completely valid and well-formed
 packet from the receive queue because we are unable
 to allocate new memory with GFP_ATOMIC
 for future packets is not a good idea.

 It certainly violates the principle of least surprize:
 when one sees host pass packet to guest, one expects
 the packet to get into the networking stack, not get
 dropped by the driver internally.
 Guest stack can do with the packet what it sees fit.

 We actually wake up a thread if we can't fill up the queue,
 that will fill it up in GFP_KERNEL context.

 So I think we should find a way to pre-allocate if necessary and avoid
 error paths where allocating new memory is a required to avoid drops.


The problem happens only on memory pressure, this pre-allocation may add
more stress on this.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-19 Thread Jason Wang
On 11/20/2013 09:34 AM, Michael Dalton wrote:
 Hi,

 After further reflection I think we're looking at two related issues:
 (a) a memory leak that Jason has identified that occurs when a memory
 allocation fails in receive_mergeable. Jasons commit solves this issue.
 (b) virtio-net does not dequeue all buffers for a packet in the
 case that an error occurs on receive and mergeable receive buffers is
 enabled.

 For (a), this bug is new and due to changes in 2613af0ed18a, and the
 net impact is memory leak on the physical page. However, I believe (b)
 has always been possible in some form because if page_to_skb() returns
 NULL (e.g., due to SKB allocation failure), receive_mergeable is never
 called. AFAICT this is also the behavior prior to 2613af0ed18a.

 The net impact of (b) would be that virtio-net would interpret a packet
 buffer that is in the middle of a mergeable packet as the start of a
 new packet, which is definitely also a bug (and the buffer contents
 could contain bytes that resembled a valid virtio-net header).

 A solution for (b) will require handling both the page_to_skb memory
 allocation failures and the memory allocation failures in
 receive_mergeable introduced by 2613af0ed18a.

Ture, so we first need a patch to solve page_to_skb() failure which
could be used for stable tree prior to 2613af0ed18a. Then another patch
to solve the issue introduced by 2613af0ed18a which could be only used
for 3.12 stable. Will draft patches for them.

Thanks

 Best,

 Mike

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization