Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
 On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
 
  Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
  it didn't drop packets received from host as far as I can tell.
  virtio is more like a pipe than a real NIC in this respect.
 
 Prior/after to this patch, you were not posting buffers, so if packets
 were received on a physical NIC, you were dropping the packets anyway.

 It makes no difference at all, adding a cushion might make you feel
 better, but its really not worth it.
 
 Under memory stress, it makes better sense to drop a super big GRO
 packet (The one needing frag_list extension ...)
 
 It gives a better signal to the sender to reduce its pressure, and gives
 opportunity to free more of your memory.
 

OK, but in that case one wonders whether we should do more to free memory?

E.g. imagine that we dropped a packet of a specific TCP flow
because we couldn't allocate a new packet.

What happens now is that the old packet is freed as well.

So quite likely the next packet in queue will get processed
since it will reuse the memory we have just freed.

The next packet and the next after it etc all will have to go through
the net stack until they get at the socket and are dropped then
because we missed a segment.  Even worse, GRO gets disabled so the load
on receiver goes up instead of down.

Sounds like a problem doesn't it?

GRO actually detects it's the same flow and can see packet is
out of sequence. Why doesn't it drop the packet then?
Alternatively, we could (for example using the pre-allocated skb
like I suggested) notify GRO that it should start dropping packets
of this flow.

What do you think?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Tue, Nov 19, 2013 at 01:38:16PM -0800, Michael Dalton wrote:
 Great catch Jason. I agree this now raises the larger issue of how to
 handle a memory alloc failure in the middle of receive. As Eric mentioned,
 we can drop the packet and free the remaining (num_buf) frags.
 
 Michael, perhaps I'm missing something, but why would you prefer
 pre-allocating buffers in this case? If the guest kernel is OOM'ing,
 dropping packets should provide backpressure.
 
 Also, we could just as easily fail the initial skb alloc in page_to_skb,
 and I think that case also needs to be handled now in the same fashion as
 a memory allocation failure in receive_mergeable.
 
 Best,
 
 Mike

Yes I missed this last night. Thanks a lot Eric and Michael for pointing
this out.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Jason Wang
We need decrease the rq-num after we can get a buf through
virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
refill routine won't be triggered under heavy memory stress since the driver may
still think there's enough room.

This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
(virtio_net: migrate mergeable rx buffers to page frag allocators).

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Michael Dalton mwdal...@google.com
Cc: Eric Dumazet eduma...@google.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
The patch was needed for 3.12 stable.
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 24fd502..de1d6ca 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
head_skb-dev-stats.rx_length_errors++;
return -EINVAL;
}
+   --rq-num;
if (unlikely(len  MERGE_BUFFER_LEN)) {
pr_debug(%s: rx error: merge buffer too long\n,
 head_skb-dev-name);
@@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
skb_add_rx_frag(curr_skb, num_skb_frags, page,
offset, len, MERGE_BUFFER_LEN);
}
-   --rq-num;
}
return 0;
 }
-- 
1.8.3.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH net 3/3] virtio-net: fix resources leaking when fail to allocate frag skb

2013-11-20 Thread Jason Wang
When we fail to allocate a frag skb, we need put the refcnt of the page and drop
the rest of the buffers. Otherwise the page was leaked and driver won't get a
correct head buffer after this failure.

This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
(virtio_net: migrate mergeable rx buffers to page frag allocators).

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Michael Dalton mwdal...@google.com
Cc: Eric Dumazet eduma...@google.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
The patch was needed for 3.12 stable.
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index de1d6ca..f6f1e20 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -339,9 +339,12 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
 head_skb-dev-name);
len = MERGE_BUFFER_LEN;
}
+   page = virt_to_head_page(buf);
if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
if (unlikely(!nskb)) {
+   put_page(page);
+   drop_mergeable_buffer(rq, num_buf);
head_skb-dev-stats.rx_dropped++;
return -ENOMEM;
}
@@ -358,7 +361,6 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
head_skb-len += len;
head_skb-truesize += MERGE_BUFFER_LEN;
}
-   page = virt_to_head_page(buf);
offset = buf - (char *)page_address(page);
if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
put_page(page);
-- 
1.8.3.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
 When mergeable buffer were used, we only put the first page buf leave the rest
 of buffers in the virt queue. This will cause the driver could not get the
 correct head buffer any more. Fix this by dropping the rest of buffers for 
 this
 packet.
 
 The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
 (virtio_net: Defer skb allocation in receive path).
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Michael Dalton mwdal...@google.com
 Cc: Eric Dumazet eduma...@google.com
 Cc: Shirley Ma x...@us.ibm.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 This patch was needed for stable
 ---
  drivers/net/virtio_net.c | 18 +-
  1 file changed, 17 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 7bab4de..24fd502 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq)
   netif_wake_subqueue(vi-dev, vq2txq(vq));
  }
  
 +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf)
 +{
 + char *buf;
 + int len;
 +
 + while (--num_buf  (buf = virtqueue_get_buf(rq-vq, len)) != NULL) {
 + --rq-num;
 + put_page(virt_to_head_page(buf));
 + }
 +}
 +

This is the same code we have in receive_mergeable anyway.
So let's reuse that.


  /* Called from bottom half context */
  static struct sk_buff *page_to_skb(struct receive_queue *rq,
  struct page *page, unsigned int offset,
 @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct receive_queue 
 *rq,
  
   /* copy small packet so we can reuse these pages for small data */
   skb = netdev_alloc_skb_ip_align(vi-dev, GOOD_COPY_LEN);
 - if (unlikely(!skb))
 + if (unlikely(!skb)) {
 + if (vi-mergeable_rx_bufs) {
 + hdr = (struct skb_vnet_hdr *)p;
 + drop_mergeable_buffer(rq, hdr-mhdr.num_buffers);
 + }
   return NULL;
 + }
  
   hdr = skb_vnet_hdr(skb);
  
 -- 
 1.8.3.2
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
 We need decrease the rq-num after we can get a buf through
 virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
 refill routine won't be triggered under heavy memory stress since the driver 
 may
 still think there's enough room.
 
 This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
 (virtio_net: migrate mergeable rx buffers to page frag allocators).
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Michael Dalton mwdal...@google.com
 Cc: Eric Dumazet eduma...@google.com
 Signed-off-by: Jason Wang jasow...@redhat.com

So let's wrap virtqueue_get_buf to make sure we get it right?

 ---
 The patch was needed for 3.12 stable.
 ---
  drivers/net/virtio_net.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 24fd502..de1d6ca 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq, 
 struct sk_buff *head_skb)
   head_skb-dev-stats.rx_length_errors++;
   return -EINVAL;
   }
 + --rq-num;
   if (unlikely(len  MERGE_BUFFER_LEN)) {
   pr_debug(%s: rx error: merge buffer too long\n,
head_skb-dev-name);
 @@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq, 
 struct sk_buff *head_skb)
   skb_add_rx_frag(curr_skb, num_skb_frags, page,
   offset, len, MERGE_BUFFER_LEN);
   }
 - --rq-num;
   }
   return 0;
  }
 -- 
 1.8.3.2
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb

2013-11-20 Thread Jason Wang


- 原始邮件 -
 On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
  When mergeable buffer were used, we only put the first page buf leave the
  rest
  of buffers in the virt queue. This will cause the driver could not get the
  correct head buffer any more. Fix this by dropping the rest of buffers for
  this
  packet.
  
  The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
  (virtio_net: Defer skb allocation in receive path).
  
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Michael Dalton mwdal...@google.com
  Cc: Eric Dumazet eduma...@google.com
  Cc: Shirley Ma x...@us.ibm.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  This patch was needed for stable
  ---
   drivers/net/virtio_net.c | 18 +-
   1 file changed, 17 insertions(+), 1 deletion(-)
  
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index 7bab4de..24fd502 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq)
  netif_wake_subqueue(vi-dev, vq2txq(vq));
   }
   
  +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf)
  +{
  +   char *buf;
  +   int len;
  +
  +   while (--num_buf  (buf = virtqueue_get_buf(rq-vq, len)) != NULL) {
  +   --rq-num;
  +   put_page(virt_to_head_page(buf));
  +   }
  +}
  +
 
 This is the same code we have in receive_mergeable anyway.
 So let's reuse that.
 
 

receive_mergeable() was called after page_to_skb() was called and 
there's lots of conditions check there. I'm not sure how could we 
reuse them.
   /* Called from bottom half context */
   static struct sk_buff *page_to_skb(struct receive_queue *rq,
 struct page *page, unsigned int offset,
  @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct
  receive_queue *rq,
   
  /* copy small packet so we can reuse these pages for small data */
  skb = netdev_alloc_skb_ip_align(vi-dev, GOOD_COPY_LEN);
  -   if (unlikely(!skb))
  +   if (unlikely(!skb)) {
  +   if (vi-mergeable_rx_bufs) {
  +   hdr = (struct skb_vnet_hdr *)p;
  +   drop_mergeable_buffer(rq, hdr-mhdr.num_buffers);
  +   }
  return NULL;
  +   }
   
  hdr = skb_vnet_hdr(skb);
   
  --
  1.8.3.2
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Jason Wang


- 原始邮件 -
 On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
  We need decrease the rq-num after we can get a buf through
  virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
  refill routine won't be triggered under heavy memory stress since the
  driver may
  still think there's enough room.
  
  This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
  (virtio_net: migrate mergeable rx buffers to page frag allocators).
  
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Michael Dalton mwdal...@google.com
  Cc: Eric Dumazet eduma...@google.com
  Signed-off-by: Jason Wang jasow...@redhat.com
 
 So let's wrap virtqueue_get_buf to make sure we get it right?
 

Ok. good idea.
  ---
  The patch was needed for 3.12 stable.
  ---
   drivers/net/virtio_net.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  index 24fd502..de1d6ca 100644
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq,
  struct sk_buff *head_skb)
  head_skb-dev-stats.rx_length_errors++;
  return -EINVAL;
  }
  +   --rq-num;
  if (unlikely(len  MERGE_BUFFER_LEN)) {
  pr_debug(%s: rx error: merge buffer too long\n,
   head_skb-dev-name);
  @@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq,
  struct sk_buff *head_skb)
  skb_add_rx_frag(curr_skb, num_skb_frags, page,
  offset, len, MERGE_BUFFER_LEN);
  }
  -   --rq-num;
  }
  return 0;
   }
  --
  1.8.3.2
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
 When mergeable buffer were used, we only put the first page buf leave the rest
 of buffers in the virt queue. This will cause the driver could not get the
 correct head buffer any more. Fix this by dropping the rest of buffers for 
 this
 packet.
 
 The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
 (virtio_net: Defer skb allocation in receive path).
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Michael Dalton mwdal...@google.com
 Cc: Eric Dumazet eduma...@google.com
 Cc: Shirley Ma x...@us.ibm.com
 Signed-off-by: Jason Wang jasow...@redhat.com

Just to clarify my previous comment: it was not about the
idea of adding drop_mergeable_buffer - rather, I think that
adding knowledge about mergeable buffers into page_to_skb creates an
ugly internal API.

Let's move the call to page_to_skb within receive_mergeable instead:
it's also nice that int offset = buf - page_address(page) logic
is not spread around like it was.

Also, it's not nice that we ignore length errors when we drop
packets because of OOM.

So I came up with the following - it seems to work but I didn't
stress test yet.

commit ebffb3fe4335ffe07124e4518e76d6e05844fa18
Author: Michael S. Tsirkin m...@redhat.com
Date:   Wed Nov 20 14:41:29 2013 +0200

virtio_net: fix error handling for mergeable buffers

Eric Dumazet noticed that if we encounter an error
when processing a mergeable buffer, we don't
dequeue all of the buffers from this packet,
the result is almost sure to be loss of networking.

Jason Wang noticed that we also leak a page and that we don't decrement
the rq buf count, so we won't repost buffers (a resource leak).

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael Dalton mwdal...@google.com
Reported-by: Eric Dumazet eduma...@google.com
Reported-by: Jason Wang jasow...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 01f4eb5..42f6a1e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -296,41 +296,53 @@ static struct sk_buff *page_to_skb(struct receive_queue 
*rq,
return skb;
 }
 
-static int receive_mergeable(struct receive_queue *rq, struct sk_buff 
*head_skb)
+static struct sk_buff *receive_mergeable(struct net_device *dev,
+struct receive_queue *rq,
+void *buf,
+unsigned int len)
 {
-   struct skb_vnet_hdr *hdr = skb_vnet_hdr(head_skb);
+   struct skb_vnet_hdr *hdr = buf;
+   int num_buf = hdr-mhdr.num_buffers;
+   struct page *page = virt_to_head_page(buf);
+   int offset = buf - page_address(page);
+   struct sk_buff *head_skb = page_to_skb(rq, page, offset, len,
+  MAX_PACKET_LEN);
struct sk_buff *curr_skb = head_skb;
-   char *buf;
-   struct page *page;
-   int num_buf, len, offset;
 
-   num_buf = hdr-mhdr.num_buffers;
-   while (--num_buf) {
-   int num_skb_frags = skb_shinfo(curr_skb)-nr_frags;
+   if (unlikely(!curr_skb))
+   goto err_skb;
+
+   while (--num_buf) {
+   int num_skb_frags;
+
buf = virtqueue_get_buf(rq-vq, len);
if (unlikely(!buf)) {
-   pr_debug(%s: rx error: %d buffers missing\n,
-head_skb-dev-name, hdr-mhdr.num_buffers);
-   head_skb-dev-stats.rx_length_errors++;
-   return -EINVAL;
+   pr_debug(%s: rx error: %d buffers out of %d missing\n,
+dev-name, num_buf, hdr-mhdr.num_buffers);
+   dev-stats.rx_length_errors++;
+   goto err_buf;
}
if (unlikely(len  MAX_PACKET_LEN)) {
pr_debug(%s: rx error: merge buffer too long\n,
-head_skb-dev-name);
+dev-name);
len = MAX_PACKET_LEN;
}
+
+   page = virt_to_head_page(buf);
+   --rq-num;
+
+   num_skb_frags = skb_shinfo(curr_skb)-nr_frags;
if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
-   if (unlikely(!nskb)) {
-   head_skb-dev-stats.rx_dropped++;
-   return -ENOMEM;
-   }
+
+   if (unlikely(!nskb))
+   goto err_skb;
if (curr_skb == head_skb)
skb_shinfo(curr_skb)-frag_list = nskb;
else
  

Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 07:08:02AM -0500, Jason Wang wrote:
 
 
 - 原始邮件 -
  On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
   When mergeable buffer were used, we only put the first page buf leave the
   rest
   of buffers in the virt queue. This will cause the driver could not get the
   correct head buffer any more. Fix this by dropping the rest of buffers for
   this
   packet.
   
   The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
   (virtio_net: Defer skb allocation in receive path).
   
   Cc: Rusty Russell ru...@rustcorp.com.au
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: Michael Dalton mwdal...@google.com
   Cc: Eric Dumazet eduma...@google.com
   Cc: Shirley Ma x...@us.ibm.com
   Signed-off-by: Jason Wang jasow...@redhat.com
   ---
   This patch was needed for stable
   ---
drivers/net/virtio_net.c | 18 +-
1 file changed, 17 insertions(+), 1 deletion(-)
   
   diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
   index 7bab4de..24fd502 100644
   --- a/drivers/net/virtio_net.c
   +++ b/drivers/net/virtio_net.c
   @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq)
 netif_wake_subqueue(vi-dev, vq2txq(vq));
}

   +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf)
   +{
   + char *buf;
   + int len;
   +
   + while (--num_buf  (buf = virtqueue_get_buf(rq-vq, len)) != NULL) {
   + --rq-num;
   + put_page(virt_to_head_page(buf));
   + }
   +}
   +
  
  This is the same code we have in receive_mergeable anyway.
  So let's reuse that.
  
  
 
 receive_mergeable() was called after page_to_skb() was called and 
 there's lots of conditions check there. I'm not sure how could we 
 reuse them.

I posted a patch showing how :)

/* Called from bottom half context */
static struct sk_buff *page_to_skb(struct receive_queue *rq,
struct page *page, unsigned int offset,
   @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct
   receive_queue *rq,

 /* copy small packet so we can reuse these pages for small data */
 skb = netdev_alloc_skb_ip_align(vi-dev, GOOD_COPY_LEN);
   - if (unlikely(!skb))
   + if (unlikely(!skb)) {
   + if (vi-mergeable_rx_bufs) {
   + hdr = (struct skb_vnet_hdr *)p;
   + drop_mergeable_buffer(rq, hdr-mhdr.num_buffers);
   + }
 return NULL;
   + }

 hdr = skb_vnet_hdr(skb);

   --
   1.8.3.2
  --
  To unsubscribe from this list: send the line unsubscribe linux-kernel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  Please read the FAQ at  http://www.tux.org/lkml/
  
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 07:08:50AM -0500, Jason Wang wrote:
 
 
 - 原始邮件 -
  On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
   We need decrease the rq-num after we can get a buf through
   virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
   refill routine won't be triggered under heavy memory stress since the
   driver may
   still think there's enough room.
   
   This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
   (virtio_net: migrate mergeable rx buffers to page frag allocators).
   
   Cc: Rusty Russell ru...@rustcorp.com.au
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: Michael Dalton mwdal...@google.com
   Cc: Eric Dumazet eduma...@google.com
   Signed-off-by: Jason Wang jasow...@redhat.com
  
  So let's wrap virtqueue_get_buf to make sure we get it right?
  
 
 Ok. good idea.

So I did this (below) but the compiler is not smart enough to
avoid two branches on data path.
So I'm not sure anymore: with my patch it's pretty clear how
the code works.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

but I don't think we need to apply this.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11d9cc9..1cc2e43 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -296,6 +296,14 @@ static struct sk_buff *page_to_skb(struct receive_queue 
*rq,
return skb;
 }
 
+static void *rq_get_buf(struct receive_queue *rq, unsigned int *len)
+{
+   void *buf = virtqueue_get_buf(rq-vq, len);
+   if (buf)
+   rq-num--;
+   return buf;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 struct receive_queue *rq,
 void *buf,
@@ -315,7 +323,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
while (--num_buf) {
int num_skb_frags;
 
-   buf = virtqueue_get_buf(rq-vq, len);
+   buf = rq_get_buf(rq, len);
if (unlikely(!buf)) {
pr_debug(%s: rx error: %d buffers out of %d missing\n,
 dev-name, num_buf, hdr-mhdr.num_buffers);
@@ -329,7 +337,6 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
}
 
page = virt_to_head_page(buf);
-   --rq-num;
 
num_skb_frags = skb_shinfo(curr_skb)-nr_frags;
if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
@@ -370,7 +377,7 @@ err_buf:
dev-stats.rx_dropped++;
dev_kfree_skb(head_skb);
while (--num_buf) {
-   buf = virtqueue_get_buf(rq-vq, len);
+   buf = rq_get_buf(rq, len);
if (unlikely(!buf)) {
pr_debug(%s: rx error: %d buffers missing\n,
 dev-name, num_buf);
@@ -379,7 +386,6 @@ err_buf:
}
page = virt_to_head_page(buf);
put_page(page);
-   --rq-num;
}
return NULL;
 }
@@ -675,9 +681,8 @@ static int virtnet_poll(struct napi_struct *napi, int 
budget)
 
 again:
while (received  budget 
-  (buf = virtqueue_get_buf(rq-vq, len)) != NULL) {
+  (buf = rq_get_buf(rq, len)) != NULL) {
receive_buf(rq, buf, len);
-   --rq-num;
received++;
}
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Jason Wang


- 原始邮件 -
 On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
  When mergeable buffer were used, we only put the first page buf leave the
  rest
  of buffers in the virt queue. This will cause the driver could not get the
  correct head buffer any more. Fix this by dropping the rest of buffers for
  this
  packet.
  
  The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
  (virtio_net: Defer skb allocation in receive path).
  
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: Michael Dalton mwdal...@google.com
  Cc: Eric Dumazet eduma...@google.com
  Cc: Shirley Ma x...@us.ibm.com
  Signed-off-by: Jason Wang jasow...@redhat.com
 
 Just to clarify my previous comment: it was not about the
 idea of adding drop_mergeable_buffer - rather, I think that
 adding knowledge about mergeable buffers into page_to_skb creates an
 ugly internal API.
 
 Let's move the call to page_to_skb within receive_mergeable instead:
 it's also nice that int offset = buf - page_address(page) logic
 is not spread around like it was.
 
 Also, it's not nice that we ignore length errors when we drop
 packets because of OOM.
 
 So I came up with the following - it seems to work but I didn't
 stress test yet.

I've no objection on this. But I've rather like my small and direct patch 
to be applied to -net first. It has lower risk and was much more easier to 
be backported to stable trees. Then we can do the re-factor like this in 
net-next. 
 
 commit ebffb3fe4335ffe07124e4518e76d6e05844fa18
 Author: Michael S. Tsirkin m...@redhat.com
 Date:   Wed Nov 20 14:41:29 2013 +0200
 
 virtio_net: fix error handling for mergeable buffers
 
 Eric Dumazet noticed that if we encounter an error
 when processing a mergeable buffer, we don't
 dequeue all of the buffers from this packet,
 the result is almost sure to be loss of networking.
 
 Jason Wang noticed that we also leak a page and that we don't decrement
 the rq buf count, so we won't repost buffers (a resource leak).
 
 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael Dalton mwdal...@google.com
 Reported-by: Eric Dumazet eduma...@google.com
 Reported-by: Jason Wang jasow...@redhat.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 01f4eb5..42f6a1e 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -296,41 +296,53 @@ static struct sk_buff *page_to_skb(struct receive_queue
 *rq,
   return skb;
  }
  
 -static int receive_mergeable(struct receive_queue *rq, struct sk_buff
 *head_skb)
 +static struct sk_buff *receive_mergeable(struct net_device *dev,
 +  struct receive_queue *rq,
 +  void *buf,
 +  unsigned int len)
  {
 - struct skb_vnet_hdr *hdr = skb_vnet_hdr(head_skb);
 + struct skb_vnet_hdr *hdr = buf;
 + int num_buf = hdr-mhdr.num_buffers;
 + struct page *page = virt_to_head_page(buf);
 + int offset = buf - page_address(page);
 + struct sk_buff *head_skb = page_to_skb(rq, page, offset, len,
 +MAX_PACKET_LEN);
   struct sk_buff *curr_skb = head_skb;
 - char *buf;
 - struct page *page;
 - int num_buf, len, offset;
  
 - num_buf = hdr-mhdr.num_buffers;
 - while (--num_buf) {
 - int num_skb_frags = skb_shinfo(curr_skb)-nr_frags;
 + if (unlikely(!curr_skb))
 + goto err_skb;
 +
 + while (--num_buf) {
 + int num_skb_frags;
 +
   buf = virtqueue_get_buf(rq-vq, len);
   if (unlikely(!buf)) {
 - pr_debug(%s: rx error: %d buffers missing\n,
 -  head_skb-dev-name, hdr-mhdr.num_buffers);
 - head_skb-dev-stats.rx_length_errors++;
 - return -EINVAL;
 + pr_debug(%s: rx error: %d buffers out of %d missing\n,
 +  dev-name, num_buf, hdr-mhdr.num_buffers);
 + dev-stats.rx_length_errors++;
 + goto err_buf;
   }
   if (unlikely(len  MAX_PACKET_LEN)) {
   pr_debug(%s: rx error: merge buffer too long\n,
 -  head_skb-dev-name);
 +  dev-name);
   len = MAX_PACKET_LEN;
   }
 +
 + page = virt_to_head_page(buf);
 + --rq-num;
 +
 + num_skb_frags = skb_shinfo(curr_skb)-nr_frags;
   if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
   struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
 - if (unlikely(!nskb)) {
 - head_skb-dev-stats.rx_dropped++;
 - return -ENOMEM;
 -

Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Jason Wang


- 原始邮件 -
 On Wed, Nov 20, 2013 at 07:08:50AM -0500, Jason Wang wrote:
  
  
  - 原始邮件 -
   On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
We need decrease the rq-num after we can get a buf through
virtqueue_get_buf() even if we could not allocate frag skb. Otherwise,
the
refill routine won't be triggered under heavy memory stress since the
driver may
still think there's enough room.

This bug was introduced by commit
2613af0ed18a11d5c566a81f9a6510b73180660a
(virtio_net: migrate mergeable rx buffers to page frag allocators).

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Michael Dalton mwdal...@google.com
Cc: Eric Dumazet eduma...@google.com
Signed-off-by: Jason Wang jasow...@redhat.com
   
   So let's wrap virtqueue_get_buf to make sure we get it right?
   
  
  Ok. good idea.
 
 So I did this (below) but the compiler is not smart enough to
 avoid two branches on data path.
 So I'm not sure anymore: with my patch it's pretty clear how
 the code works.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 but I don't think we need to apply this.
 

True, another point is we'd better to handle both increasing and decreasing in 
the 
same way. We do not increase it in a helper.
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 11d9cc9..1cc2e43 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -296,6 +296,14 @@ static struct sk_buff *page_to_skb(struct receive_queue
 *rq,
   return skb;
  }
  
 +static void *rq_get_buf(struct receive_queue *rq, unsigned int *len)
 +{
 + void *buf = virtqueue_get_buf(rq-vq, len);
 + if (buf)
 + rq-num--;
 + return buf;
 +}
 +
  static struct sk_buff *receive_mergeable(struct net_device *dev,
struct receive_queue *rq,
void *buf,
 @@ -315,7 +323,7 @@ static struct sk_buff *receive_mergeable(struct
 net_device *dev,
   while (--num_buf) {
   int num_skb_frags;
  
 - buf = virtqueue_get_buf(rq-vq, len);
 + buf = rq_get_buf(rq, len);
   if (unlikely(!buf)) {
   pr_debug(%s: rx error: %d buffers out of %d missing\n,
dev-name, num_buf, hdr-mhdr.num_buffers);
 @@ -329,7 +337,6 @@ static struct sk_buff *receive_mergeable(struct
 net_device *dev,
   }
  
   page = virt_to_head_page(buf);
 - --rq-num;
  
   num_skb_frags = skb_shinfo(curr_skb)-nr_frags;
   if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
 @@ -370,7 +377,7 @@ err_buf:
   dev-stats.rx_dropped++;
   dev_kfree_skb(head_skb);
   while (--num_buf) {
 - buf = virtqueue_get_buf(rq-vq, len);
 + buf = rq_get_buf(rq, len);
   if (unlikely(!buf)) {
   pr_debug(%s: rx error: %d buffers missing\n,
dev-name, num_buf);
 @@ -379,7 +386,6 @@ err_buf:
   }
   page = virt_to_head_page(buf);
   put_page(page);
 - --rq-num;
   }
   return NULL;
  }
 @@ -675,9 +681,8 @@ static int virtnet_poll(struct napi_struct *napi, int
 budget)
  
  again:
   while (received  budget 
 -(buf = virtqueue_get_buf(rq-vq, len)) != NULL) {
 +(buf = rq_get_buf(rq, len)) != NULL) {
   receive_buf(rq, buf, len);
 - --rq-num;
   received++;
   }
  
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 08:54:02AM -0500, Jason Wang wrote:
 
 
 - 原始邮件 -
  On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
   When mergeable buffer were used, we only put the first page buf leave the
   rest
   of buffers in the virt queue. This will cause the driver could not get the
   correct head buffer any more. Fix this by dropping the rest of buffers for
   this
   packet.
   
   The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
   (virtio_net: Defer skb allocation in receive path).
   
   Cc: Rusty Russell ru...@rustcorp.com.au
   Cc: Michael S. Tsirkin m...@redhat.com
   Cc: Michael Dalton mwdal...@google.com
   Cc: Eric Dumazet eduma...@google.com
   Cc: Shirley Ma x...@us.ibm.com
   Signed-off-by: Jason Wang jasow...@redhat.com
  
  Just to clarify my previous comment: it was not about the
  idea of adding drop_mergeable_buffer - rather, I think that
  adding knowledge about mergeable buffers into page_to_skb creates an
  ugly internal API.
  
  Let's move the call to page_to_skb within receive_mergeable instead:
  it's also nice that int offset = buf - page_address(page) logic
  is not spread around like it was.
  
  Also, it's not nice that we ignore length errors when we drop
  packets because of OOM.
  
  So I came up with the following - it seems to work but I didn't
  stress test yet.
 
 I've no objection on this. But I've rather like my small and direct patch 
 to be applied to -net first. It has lower risk and was much more easier to 
 be backported to stable trees. Then we can do the re-factor like this in 
 net-next. 

It makes the interfaces too messy. We are not in code freeze in -net -
only feature freeze, so no reason to make code like spagetty,
and it's only 25 lines changed as compared to 40.
It's not a huge refactoring.

It's just as easy to backport my patch too.
You just drop the goto in the new code path we added.

Let me show you (untested) - your patch is not smaller.

Signed-off-by: Michael S. Tsirkin m...@redhat.com



commit 9b442fe970d5c71311d4314edef26ee2eb16e7fb
Author: Michael S. Tsirkin m...@redhat.com
Date:   Wed Nov 20 12:44:14 2013 +0200

virtio_net: fix resource leak on alloc failure

virtio net got confused, started dropping packets.

Signed-off-by: Michael S. Tsirkin m...@redhat.com

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9fbdfcd..df4b9d0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -297,13 +297,22 @@ static struct sk_buff *page_to_skb(struct receive_queue 
*rq,
return skb;
 }
 
-static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
+static struct sk_buff *receive_mergeable(struct net_device *dev,
+struct receive_queue *rq,
+struct page *page,
+unsigned int len)
 {
-   struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
-   struct page *page;
-   int num_buf, i, len;
+   struct skb_vnet_hdr *hdr = page_address(buf);
+   int num_buf = hdr-mhdr.num_buffers;
+   struct sk_buff *skb = page_to_skb(rq, page, len);
+   int i;
+
+   skb = page_to_skb(rq, page, len);
+
+   if (unlikely(!skb))
+   goto err_skb;
+
 
-   num_buf = hdr-mhdr.num_buffers;
while (--num_buf) {
i = skb_shinfo(skb)-nr_frags;
if (i = MAX_SKB_FRAGS) {
@@ -313,10 +322,10 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *skb)
}
page = virtqueue_get_buf(rq-vq, len);
if (!page) {
-   pr_debug(%s: rx error: %d buffers missing\n,
-skb-dev-name, hdr-mhdr.num_buffers);
-   skb-dev-stats.rx_length_errors++;
-   return -EINVAL;
+   pr_debug(%s: rx error: %d buffers %d missing\n,
+dev-name, hdr-mhdr.num_buffers, num_buf);
+   dev-stats.rx_length_errors++;
+   goto err_buf;
}
 
if (len  PAGE_SIZE)
@@ -326,7 +335,25 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *skb)
 
--rq-num;
}
-   return 0;
+   return skb;
+err_skb:
+   put_page(page);
+err_buf:
+   dev-stats.rx_dropped++;
+   dev_kfree_skb(head_skb);
+   while (--num_buf) {
+   buf = virtqueue_get_buf(rq-vq, len);
+   if (unlikely(!buf)) {
+   pr_debug(%s: rx error: %d buffers missing\n,
+dev-name, num_buf);
+   dev-stats.rx_length_errors++;
+   break;
+   }
+   page = buf;
+   give_pages(rq, page);
+   --rq-num;
+   }
+   return NULL;
 }
 
 static void receive_buf(struct 

2014 World Conference on Information Systems and Technologies; Submission: Nov. 29

2013-11-20 Thread WorldCIST
Apologies if you are receiving this mail more than once...


**
 WorldCIST'14
The 2014 World Conference on Information Systems and Technologies
April 15 - 18, Madeira Island, Portugal
   http://www.aisti.eu/worldcist14/
**

The 2014 World Conference on Information Systems and Technologies 
(WorldCIST'14: http://www.aisti.eu/worldcist14) is a global forum for 
researchers and practitioners to present and discuss the most recent 
innovations, trends, results, experiences and concerns in the several 
perspectives of Information Systems and Technologies.

We are pleased to invite you to submit your papers to WorldCISTI'14. All 
submissions will be reviewed on the basis of relevance, originality, importance 
and clarity.

 
THEMES

Submitted papers should be related with one or more of the main themes proposed 
for the Conference:

A) Information and Knowledge Management (IKM);

B) Organizational Models and Information Systems (OMIS);

C) Intelligent and Decision Support Systems (IDSS);

D) Software Systems, Architectures, Applications and Tools (SSAAT);

E) Computer Networks, Mobility and Pervasive Systems (CNMPS);

F) Human-Computer Interaction (HCI);

G) Health Informatics (HIS);

H) Information Technologies in Education (ITE).


TYPES OF SUBMISSIONS AND DECISIONS

Four types of papers can be submitted:

Full paper: Finished or consolidated RD works, to be included in one of the 
Conference themes. These papers are assigned a 10-page limit.

Short paper: Ongoing works with relevant preliminary results, open to 
discussion. These papers are assigned a 7-page limit.

Poster paper: Initial work with relevant ideas, open to discussion. These 
papers are assigned to a 4-page limit.

Company paper: Companies' papers that show practical experience, R  D, tools, 
etc., focused on some topics of the conference. These papers are assigned to a 
4-page limit.

Submitted papers must comply with the format of Advances in Intelligent Systems 
and Computing Series (see Instructions for Authors at Springer Website or 
download a DOC example) be written in English, must not have been published 
before, not be under review for any other conference or publication and not 
include any information leading to the authors’ identification. Therefore, the 
authors’ names, affiliations and bibliographic references should not be 
included in the version for evaluation by the Program Committee. This 
information should only be included in the camera-ready version, saved in Word 
or Latex format and also in PDF format. These files must be accompanied by the 
Consent to Publication form filled out, in a ZIP file, and uploaded at the 
conference management system.

All papers will be subjected to a “double-blind review” by at least two members 
of the Program Committee.

Based on Program Committee evaluation, a paper can be rejected or accepted by 
the Conference Chairs. In the later case, it can be accepted as the type 
originally submitted or as another type. Thus, full papers can be accepted as 
short papers or poster papers only. Similarly, short papers can be accepted as 
poster papers only. In these cases, the authors will be allowed to maintain the 
original number of pages in the camera-ready version.

The authors of accepted poster papers must also build and print a poster to be 
exhibited during the Conference. This poster must follow an A1 or A2 vertical 
format. The Conference includes Work Sessions where these posters are presented 
and orally discussed, with a 5 minute limit per poster.

The authors of accepted full papers will have 15 minutes to present their work 
in a Conference Work Session; approximately 5 minutes of discussion will follow 
each presentation. The authors of accepted short papers and company papers will 
have 11 minutes to present their work in a Conference Work Session; 
approximately 4 minutes of discussion will follow each presentation.


PUBLICATION AND INDEXING

To ensure that a full paper, short paper, poster paper or company paper is 
published in the Proceedings, at least one of the authors must be fully 
registered by the 24th of January 2014, and the paper must comply with the 
suggested layout and page-limit. Additionally, all recommended changes must be 
addressed by the authors before they submit the camera-ready version.

No more than one paper per registration will be published in the Conference 
Proceedings. An extra fee must be paid for publication of additional papers, 
with a maximum of one additional paper per registration.

Full and short papers will be published in Proceedings by Springer, in Advances 
in Intelligent Systems and Computing Series. Poster and company papers will be 
published in Proceedings by AISTI.

Published full and short papers will be submitted 

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
 On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
  On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
  
   Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
   it didn't drop packets received from host as far as I can tell.
   virtio is more like a pipe than a real NIC in this respect.
  
  Prior/after to this patch, you were not posting buffers, so if packets
  were received on a physical NIC, you were dropping the packets anyway.
 
  It makes no difference at all, adding a cushion might make you feel
  better, but its really not worth it.
  
  Under memory stress, it makes better sense to drop a super big GRO
  packet (The one needing frag_list extension ...)
  
  It gives a better signal to the sender to reduce its pressure, and gives
  opportunity to free more of your memory.
  
 
 OK, but in that case one wonders whether we should do more to free memory?
 
 E.g. imagine that we dropped a packet of a specific TCP flow
 because we couldn't allocate a new packet.
 
 What happens now is that the old packet is freed as well.
 
 So quite likely the next packet in queue will get processed
 since it will reuse the memory we have just freed.
 
 The next packet and the next after it etc all will have to go through
 the net stack until they get at the socket and are dropped then
 because we missed a segment.  Even worse, GRO gets disabled so the load
 on receiver goes up instead of down.
 
 Sounds like a problem doesn't it?

I see no problem at all. GRO is a hint for high rates (and obviously
when there is enough memory)

 
 GRO actually detects it's the same flow and can see packet is
 out of sequence. Why doesn't it drop the packet then?
 Alternatively, we could (for example using the pre-allocated skb
 like I suggested) notify GRO that it should start dropping packets
 of this flow.
 
 What do you think?
 

I think we disagree a lot on memory management on networking stacks.

We did a lot of work in TCP stack and Qdisc layers to lower memory
pressure (and bufferbloat), an you seem to try hard to introduce yet
another layer of buffer bloat in virtio_net.

So add whatever you want to proudly state to your management :

Look how smart we are : we drop no packets in our layer



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-20 Thread Heinz Graalfs
Hi,

when an active virtio block device is hot-unplugged from a KVM guest, running
affected guest user applications are not aware of any errors that occur due
to the lost device. This patch-set adds code to avoid further request queueing
when a lost block device is detected, resulting in appropriate error info.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

It's just the block device drivers that care to provide a notify
callback.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and adding a corresponding new virtio_driver notify() handler to
'inform' the block layer, solve this task.

Patch 1 adds an optional notify() callback to virtio_driver.

Patch 2 adds a new notify() callback for the virtio_blk driver. When called
for a lost device settings are made to prevent future request queueing.

Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass
on the lost device info to virtio's backend driver virtio_blk.

Heinz Graalfs (3):
  virtio: add notify() callback to virtio_driver
  virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
  virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

 drivers/block/virtio_blk.c| 14 ++
 drivers/s390/kvm/virtio_ccw.c | 14 --
 drivers/virtio/virtio.c   |  8 
 include/linux/virtio.h| 10 ++
 4 files changed, 44 insertions(+), 2 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-20 Thread Heinz Graalfs
Add an optional notify() callback to virtio_driver. A backend
driver can provide this callback to perform actions for a lost
device.

notify() event values are inherited from virtio_ccw's notify()
callback. We might want to support even more of them lateron.

notify() return values are defined in include/linux/notifier.h.

Signed-off-by: Heinz Graalfs graa...@linux.vnet.ibm.com
---
 drivers/virtio/virtio.c |  8 
 include/linux/virtio.h  | 10 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..a09abb4 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -186,6 +186,14 @@ void unregister_virtio_driver(struct virtio_driver *driver)
 }
 EXPORT_SYMBOL_GPL(unregister_virtio_driver);
 
+int notify_virtio_device(struct virtio_device *vdev, int event)
+{
+   struct virtio_driver *drv = drv_to_virtio(vdev-dev.driver);
+
+   return drv-notify ? drv-notify(vdev, event) : NOTIFY_DONE;
+}
+EXPORT_SYMBOL_GPL(notify_virtio_device);
+
 int register_virtio_device(struct virtio_device *dev)
 {
int err;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..da18e9a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 
 /**
+ * notify event values
+ * @VDEV_GONE: device gone
+ */
+enum {
+   VDEV_GONE   = 1,
+};
+int notify_virtio_device(struct virtio_device *dev, int event);
+
+/**
  * virtio_driver - operations for a virtio I/O driver
  * @driver: underlying device driver (populate name and owner).
  * @id_table: the ids serviced by this driver.
@@ -129,6 +138,7 @@ struct virtio_driver {
void (*scan)(struct virtio_device *dev);
void (*remove)(struct virtio_device *dev);
void (*config_changed)(struct virtio_device *dev);
+   int (*notify)(struct virtio_device *dev, int event);
 #ifdef CONFIG_PM
int (*freeze)(struct virtio_device *dev);
int (*restore)(struct virtio_device *dev);
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-20 Thread Heinz Graalfs
Add virtblk_notify() as virtio_driver's notify() callback.

When a transport driver is notified that a device disappeared it
should invoke this callback to prevent further request queueing.

Subsequent block layer calls of virtio_blk's request function will
fail, resulting in appropriate I/O errors.

Signed-off-by: Heinz Graalfs graa...@linux.vnet.ibm.com
---
 drivers/block/virtio_blk.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..7fc1d62 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev)
ida_simple_remove(vd_index_ida, index);
 }
 
+static int virtblk_notify(struct virtio_device *vdev, int event)
+{
+   struct virtio_blk *vblk = vdev-priv;
+
+   if (event == VDEV_GONE) {
+   queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk-disk-queue);
+   queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk-disk-queue);
+   queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES,
+   vblk-disk-queue);
+   }
+   return NOTIFY_DONE;
+}
+
 #ifdef CONFIG_PM
 static int virtblk_freeze(struct virtio_device *vdev)
 {
@@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = {
.probe  = virtblk_probe,
.remove = virtblk_remove,
.config_changed = virtblk_config_changed,
+   .notify = virtblk_notify,
 #ifdef CONFIG_PM
.freeze = virtblk_freeze,
.restore= virtblk_restore,
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH RFC 3/3] virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

2013-11-20 Thread Heinz Graalfs
virtio_ccw's notify() callback for the common IO layer invokes
virtio_driver's notify() callback to pass-on information to a
backend driver if an online device disappeared.

Signed-off-by: Heinz Graalfs graa...@linux.vnet.ibm.com
---
 drivers/s390/kvm/virtio_ccw.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..420010d 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -1064,8 +1064,18 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(cdev-dev);
+
+   switch (event) {
+   case CIO_GONE:
+   rc = notify_virtio_device(vcdev-vdev, VDEV_GONE);
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Ingo Molnar

* Frank Ch. Eigler f...@redhat.com wrote:

 masami.hiramatsu.pt wrote:
 
  [...]  This series also includes a change which prohibits probing 
  on the address in .entry.text because the code is used for very 
  low-level sensitive interrupt/syscall entries. Probing such code 
  may cause unexpected result (actually most of that area is already 
  in the kprobe blacklist).  So I've decide to prohibit probing all 
  of them. [...]
 
 Does this new blacklist cover enough that the kernel now survives a 
 broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?

That's generally the purpose of the annotations - if it doesn't then 
that's a bug.

Thanks,

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 07:16:33AM -0800, Eric Dumazet wrote:
 On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
  On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
   On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
   
Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
it didn't drop packets received from host as far as I can tell.
virtio is more like a pipe than a real NIC in this respect.
   
   Prior/after to this patch, you were not posting buffers, so if packets
   were received on a physical NIC, you were dropping the packets anyway.
  
   It makes no difference at all, adding a cushion might make you feel
   better, but its really not worth it.
   
   Under memory stress, it makes better sense to drop a super big GRO
   packet (The one needing frag_list extension ...)
   
   It gives a better signal to the sender to reduce its pressure, and gives
   opportunity to free more of your memory.
   
  
  OK, but in that case one wonders whether we should do more to free memory?
  
  E.g. imagine that we dropped a packet of a specific TCP flow
  because we couldn't allocate a new packet.
  
  What happens now is that the old packet is freed as well.
  
  So quite likely the next packet in queue will get processed
  since it will reuse the memory we have just freed.
  
  The next packet and the next after it etc all will have to go through
  the net stack until they get at the socket and are dropped then
  because we missed a segment.  Even worse, GRO gets disabled so the load
  on receiver goes up instead of down.
  
  Sounds like a problem doesn't it?
 
 I see no problem at all. GRO is a hint for high rates (and obviously
 when there is enough memory)
 
  
  GRO actually detects it's the same flow and can see packet is
  out of sequence. Why doesn't it drop the packet then?
  Alternatively, we could (for example using the pre-allocated skb
  like I suggested) notify GRO that it should start dropping packets
  of this flow.
  
  What do you think?
  
 
 I think we disagree a lot on memory management on networking stacks.
 
 We did a lot of work in TCP stack and Qdisc layers to lower memory
 pressure (and bufferbloat), an you seem to try hard to introduce yet
 another layer of buffer bloat in virtio_net.
 
 So add whatever you want to proudly state to your management :
 
 Look how smart we are : we drop no packets in our layer
 

Hmm some kind of disconnect here.
I got you rmanagement about bufferbloat.

What I am saying is that maybe we should drop packets more
aggressively: when we drop one packet of a flow, why not
drop everything that's queued and is for the same flow?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet
On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:

 Hmm some kind of disconnect here.
 I got you rmanagement about bufferbloat.
 
 What I am saying is that maybe we should drop packets more
 aggressively: when we drop one packet of a flow, why not
 drop everything that's queued and is for the same flow?

I really hope your TCP flows use SACK ;)

Please read the rfc 2018 introduction for details.

If a packet is dropped, it doesn't mean following packets _have_ to be
dropped.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 08:14:21AM -0800, Eric Dumazet wrote:
 On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:
 
  Hmm some kind of disconnect here.
  I got you rmanagement about bufferbloat.
  
  What I am saying is that maybe we should drop packets more
  aggressively: when we drop one packet of a flow, why not
  drop everything that's queued and is for the same flow?
 
 I really hope your TCP flows use SACK ;)
 
 Please read the rfc 2018 introduction for details.
 
 If a packet is dropped, it doesn't mean following packets _have_ to be
 dropped.


Got it, thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Steven Rostedt
On Wed, 20 Nov 2013 12:36:00 -0500
Frank Ch. Eigler f...@redhat.com wrote:

 Hi -
 
   Does this new blacklist cover enough that the kernel now survives a 
   broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?
  
  That's generally the purpose of the annotations - if it doesn't then 
  that's a bug.
 
 AFAIK, no kernel since kprobes was introduced has ever stood up to
 that test.  perf probe lacks the wildcarding powers of systemtap, so
 one needs to resort to something like:
 
 # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
perf probe $symbol
 done

I'm curious to why one would do that. IIUC, perf now has function
tracing support.

-- Steve

 
 then wait for a few hours for that to finish. Then, or while the loop
 is still running, run
 
 # perf record -e 'probe:*' -aR sleep 1
 
 to take a kernel down.
 
 
 - FChE

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Frank Ch. Eigler

masami.hiramatsu.pt wrote:

 [...]  This series also includes a change which prohibits probing on
 the address in .entry.text because the code is used for very
 low-level sensitive interrupt/syscall entries. Probing such code may
 cause unexpected result (actually most of that area is already in
 the kprobe blacklist).  So I've decide to prohibit probing all of
 them. [...]

Does this new blacklist cover enough that the kernel now survives a
broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?


- FChE
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Frank Ch. Eigler
Hi -

  Does this new blacklist cover enough that the kernel now survives a 
  broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?
 
 That's generally the purpose of the annotations - if it doesn't then 
 that's a bug.

AFAIK, no kernel since kprobes was introduced has ever stood up to
that test.  perf probe lacks the wildcarding powers of systemtap, so
one needs to resort to something like:

# cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
   perf probe $symbol
done

then wait for a few hours for that to finish. Then, or while the loop
is still running, run

# perf record -e 'probe:*' -aR sleep 1

to take a kernel down.


- FChE
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Josh Stone
On 11/20/2013 09:56 AM, Steven Rostedt wrote:
 On Wed, 20 Nov 2013 12:36:00 -0500
 Frank Ch. Eigler f...@redhat.com wrote:
 
 Hi -

 Does this new blacklist cover enough that the kernel now survives a 
 broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?

 That's generally the purpose of the annotations - if it doesn't then 
 that's a bug.

 AFAIK, no kernel since kprobes was introduced has ever stood up to
 that test.  perf probe lacks the wildcarding powers of systemtap, so
 one needs to resort to something like:

 # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
perf probe $symbol
 done
 
 I'm curious to why one would do that. IIUC, perf now has function
 tracing support.

Then consider something like probing all inline call sites, which will
be sprinkled in the middle where ftrace doesn't apply.

The point is not whether there's an alternative - kprobes really ought
to be wholly safe regardless.  Slow, if you did such broad probing,
sure, but still safe.

And a real use-case probably wouldn't probe *all* functions/inlines, but
it illustrates that there are at least a few in the full set that don't
behave well.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Masami Hiramatsu
(2013/11/21 2:36), Frank Ch. Eigler wrote:
 Hi -
 
 Does this new blacklist cover enough that the kernel now survives a 
 broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?

 That's generally the purpose of the annotations - if it doesn't then 
 that's a bug.
 
 AFAIK, no kernel since kprobes was introduced has ever stood up to
 that test.  perf probe lacks the wildcarding powers of systemtap, so
 one needs to resort to something like:
 
 # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
perf probe $symbol
 done
 
 then wait for a few hours for that to finish. Then, or while the loop
 is still running, run
 
 # perf record -e 'probe:*' -aR sleep 1
 
 to take a kernel down.

Um, indeed, current blacklist is not perfect. As I reported in this
series, I've found 2 more patterns. I guess there still have some
others.

But anyway, I don't think it is good to fix all such bugs
in this series.
This is just the first step to do it. :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Jason Wang
On 11/20/2013 10:20 PM, Michael S. Tsirkin wrote:
 On Wed, Nov 20, 2013 at 08:54:02AM -0500, Jason Wang wrote:

 - 原始邮件 -
 On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
 When mergeable buffer were used, we only put the first page buf leave the
 rest
 of buffers in the virt queue. This will cause the driver could not get the
 correct head buffer any more. Fix this by dropping the rest of buffers for
 this
 packet.

 The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
 (virtio_net: Defer skb allocation in receive path).

 Cc: Rusty Russell ru...@rustcorp.com.au
 Cc: Michael S. Tsirkin m...@redhat.com
 Cc: Michael Dalton mwdal...@google.com
 Cc: Eric Dumazet eduma...@google.com
 Cc: Shirley Ma x...@us.ibm.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 Just to clarify my previous comment: it was not about the
 idea of adding drop_mergeable_buffer - rather, I think that
 adding knowledge about mergeable buffers into page_to_skb creates an
 ugly internal API.

 Let's move the call to page_to_skb within receive_mergeable instead:
 it's also nice that int offset = buf - page_address(page) logic
 is not spread around like it was.

 Also, it's not nice that we ignore length errors when we drop
 packets because of OOM.

 So I came up with the following - it seems to work but I didn't
 stress test yet.
 I've no objection on this. But I've rather like my small and direct patch 
 to be applied to -net first. It has lower risk and was much more easier to 
 be backported to stable trees. Then we can do the re-factor like this in 
 net-next. 
 It makes the interfaces too messy. 

Do you mean receive_mergeable() should call page_to_skb()? It has been
there several years. And even with your changes, for big and small
packets, page_to_skb() were still called directly receive_buf() and the
error handling were done there.
 We are not in code freeze in -net -
 only feature freeze, so no reason to make code like spagetty,
 and it's only 25 lines changed as compared to 40.
 It's not a huge refactoring.

The problem is your patch mixes several bugs fixing with re-factoring,
only one of the bug fixing was really needed for stable. At least we
should make incremental patches on this.

 It's just as easy to backport my patch too.
 You just drop the goto in the new code path we added.

It may not be so easy for the people who are not familiar with
virtio-net and the s/skb-dev/dev/g changes looks irrelevant here.
 Let me show you (untested) - your patch is not smaller.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com



 commit 9b442fe970d5c71311d4314edef26ee2eb16e7fb
 Author: Michael S. Tsirkin m...@redhat.com
 Date:   Wed Nov 20 12:44:14 2013 +0200

 virtio_net: fix resource leak on alloc failure
 
 virtio net got confused, started dropping packets.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 index 9fbdfcd..df4b9d0 100644
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -297,13 +297,22 @@ static struct sk_buff *page_to_skb(struct receive_queue 
 *rq,
   return skb;
  }
  
 -static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
 +static struct sk_buff *receive_mergeable(struct net_device *dev,
 +  struct receive_queue *rq,
 +  struct page *page,
 +  unsigned int len)
  {
 - struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
 - struct page *page;
 - int num_buf, i, len;
 + struct skb_vnet_hdr *hdr = page_address(buf);
 + int num_buf = hdr-mhdr.num_buffers;
 + struct sk_buff *skb = page_to_skb(rq, page, len);
 + int i;
 +
 + skb = page_to_skb(rq, page, len);
 +
 + if (unlikely(!skb))
 + goto err_skb;
 +
  
 - num_buf = hdr-mhdr.num_buffers;
   while (--num_buf) {
   i = skb_shinfo(skb)-nr_frags;
   if (i = MAX_SKB_FRAGS) {
 @@ -313,10 +322,10 @@ static int receive_mergeable(struct receive_queue *rq, 
 struct sk_buff *skb)
   }
   page = virtqueue_get_buf(rq-vq, len);
   if (!page) {
 - pr_debug(%s: rx error: %d buffers missing\n,
 -  skb-dev-name, hdr-mhdr.num_buffers);
 - skb-dev-stats.rx_length_errors++;
 - return -EINVAL;
 + pr_debug(%s: rx error: %d buffers %d missing\n,
 +  dev-name, hdr-mhdr.num_buffers, num_buf);
 + dev-stats.rx_length_errors++;
 + goto err_buf;
   }
  
   if (len  PAGE_SIZE)
 @@ -326,7 +335,25 @@ static int receive_mergeable(struct receive_queue *rq, 
 struct sk_buff *skb)
  
   --rq-num;
   }
 - return 0;
 + return skb;
 +err_skb:
 + put_page(page);
 +err_buf:
 + 

Re: [PATCH RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-20 Thread Rusty Russell
Heinz Graalfs graa...@linux.vnet.ibm.com writes:
 Add an optional notify() callback to virtio_driver. A backend
 driver can provide this callback to perform actions for a lost
 device.

 notify() event values are inherited from virtio_ccw's notify()
 callback. We might want to support even more of them lateron.

 notify() return values are defined in include/linux/notifier.h.

 Signed-off-by: Heinz Graalfs graa...@linux.vnet.ibm.com

These patches seem sensible.  I've applied them in my pending queue,
but it'd be nice to have some feedback on the virtio_blk.c patch.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 04:22:02PM +0100, Heinz Graalfs wrote:
 Add virtblk_notify() as virtio_driver's notify() callback.
 
 When a transport driver is notified that a device disappeared it
 should invoke this callback to prevent further request queueing.
 
 Subsequent block layer calls of virtio_blk's request function will
 fail, resulting in appropriate I/O errors.
 
 Signed-off-by: Heinz Graalfs graa...@linux.vnet.ibm.com
 ---
  drivers/block/virtio_blk.c | 14 ++
  1 file changed, 14 insertions(+)
 
 diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
 index 2d43be4..7fc1d62 100644
 --- a/drivers/block/virtio_blk.c
 +++ b/drivers/block/virtio_blk.c
 @@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev)
   ida_simple_remove(vd_index_ida, index);
  }
  
 +static int virtblk_notify(struct virtio_device *vdev, int event)
 +{
 + struct virtio_blk *vblk = vdev-priv;
 +
 + if (event == VDEV_GONE) {
 + queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk-disk-queue);
 + queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk-disk-queue);
 + queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES,
 + vblk-disk-queue);
 + }
 + return NOTIFY_DONE;
 +}
 +

But what serializes with the block layer?
And is unlocked really safe here?
Don't we need to take the queue lock?


  #ifdef CONFIG_PM
  static int virtblk_freeze(struct virtio_device *vdev)
  {
 @@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = {
   .probe  = virtblk_probe,
   .remove = virtblk_remove,
   .config_changed = virtblk_config_changed,
 + .notify = virtblk_notify,
  #ifdef CONFIG_PM
   .freeze = virtblk_freeze,
   .restore= virtblk_restore,
 -- 
 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 04:22:02PM +0100, Heinz Graalfs wrote:
 Add virtblk_notify() as virtio_driver's notify() callback.
 
 When a transport driver is notified that a device disappeared it
 should invoke this callback to prevent further request queueing.
 
 Subsequent block layer calls of virtio_blk's request function will
 fail, resulting in appropriate I/O errors.
 
 Signed-off-by: Heinz Graalfs graa...@linux.vnet.ibm.com
 ---
  drivers/block/virtio_blk.c | 14 ++
  1 file changed, 14 insertions(+)
 
 diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
 index 2d43be4..7fc1d62 100644
 --- a/drivers/block/virtio_blk.c
 +++ b/drivers/block/virtio_blk.c
 @@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev)
   ida_simple_remove(vd_index_ida, index);
  }
  
 +static int virtblk_notify(struct virtio_device *vdev, int event)
 +{
 + struct virtio_blk *vblk = vdev-priv;
 +
 + if (event == VDEV_GONE) {
 + queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk-disk-queue);
 + queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk-disk-queue);
 + queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES,
 + vblk-disk-queue);
 + }
 + return NOTIFY_DONE;

Also pls include linux/notifier.h for this value.

 +}
 +
  #ifdef CONFIG_PM
  static int virtblk_freeze(struct virtio_device *vdev)
  {
 @@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = {
   .probe  = virtblk_probe,
   .remove = virtblk_remove,
   .config_changed = virtblk_config_changed,
 + .notify = virtblk_notify,
  #ifdef CONFIG_PM
   .freeze = virtblk_freeze,
   .restore= virtblk_restore,
 -- 
 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 04:22:01PM +0100, Heinz Graalfs wrote:
 Add an optional notify() callback to virtio_driver. A backend
 driver can provide this callback to perform actions for a lost
 device.
 
 notify() event values are inherited from virtio_ccw's notify()
 callback. We might want to support even more of them lateron.
 
 notify() return values are defined in include/linux/notifier.h.
 
 Signed-off-by: Heinz Graalfs graa...@linux.vnet.ibm.com
 ---
  drivers/virtio/virtio.c |  8 
  include/linux/virtio.h  | 10 ++
  2 files changed, 18 insertions(+)
 
 diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
 index ee59b74..a09abb4 100644
 --- a/drivers/virtio/virtio.c
 +++ b/drivers/virtio/virtio.c
 @@ -186,6 +186,14 @@ void unregister_virtio_driver(struct virtio_driver 
 *driver)
  }
  EXPORT_SYMBOL_GPL(unregister_virtio_driver);
  
 +int notify_virtio_device(struct virtio_device *vdev, int event)
 +{
 + struct virtio_driver *drv = drv_to_virtio(vdev-dev.driver);
 +
 + return drv-notify ? drv-notify(vdev, event) : NOTIFY_DONE;

Also pls include linux/notifier.h for this value.

 +}
 +EXPORT_SYMBOL_GPL(notify_virtio_device);
 +
  int register_virtio_device(struct virtio_device *dev)
  {
   int err;
 diff --git a/include/linux/virtio.h b/include/linux/virtio.h
 index f15f6e7..da18e9a 100644
 --- a/include/linux/virtio.h
 +++ b/include/linux/virtio.h
 @@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev);
  void unregister_virtio_device(struct virtio_device *dev);
  
  /**
 + * notify event values
 + * @VDEV_GONE: device gone
 + */
 +enum {
 + VDEV_GONE   = 1,

Seems a bit short, can lead to namespace pollution.
Let's rename to VIRTIO_DEVICE_GONE ?

 +};
 +int notify_virtio_device(struct virtio_device *dev, int event);
 +
 +/**
   * virtio_driver - operations for a virtio I/O driver
   * @driver: underlying device driver (populate name and owner).
   * @id_table: the ids serviced by this driver.
 @@ -129,6 +138,7 @@ struct virtio_driver {
   void (*scan)(struct virtio_device *dev);
   void (*remove)(struct virtio_device *dev);
   void (*config_changed)(struct virtio_device *dev);
 + int (*notify)(struct virtio_device *dev, int event);
  #ifdef CONFIG_PM
   int (*freeze)(struct virtio_device *dev);
   int (*restore)(struct virtio_device *dev);
 -- 
 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-20 Thread Michael S. Tsirkin
On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote:
 Hi,
 
 when an active virtio block device is hot-unplugged from a KVM guest, running
 affected guest user applications are not aware of any errors that occur due
 to the lost device. This patch-set adds code to avoid further request queueing
 when a lost block device is detected, resulting in appropriate error info.
 
 On System z there exists no handshake mechanism between host and guest
 when a device is hot-unplugged. The device is removed and no further I/O
 is possible.
 
 When an online channel device disappears on System z the kernel's CIO layer
 informs the driver (virtio_ccw) about the lost device.
 
 It's just the block device drivers that care to provide a notify
 callback.
 
 Here are some more error details:
 
 For a particular block device virtio's request function virtblk_request()
 is called by the block layer to queue requests to be handled by the host.
 In case of a lost device requests can still be queued, but an appropriate
 subsequent host kick usually fails. This leads to situations where no error
 feedback is shown.
 
 In order to prevent request queueing for lost devices appropriate settings
 in the block layer should be made. Exploiting System z's CIO notify handler
 callback, and adding a corresponding new virtio_driver notify() handler to
 'inform' the block layer, solve this task.
 
 Patch 1 adds an optional notify() callback to virtio_driver.
 
 Patch 2 adds a new notify() callback for the virtio_blk driver. When called
 for a lost device settings are made to prevent future request queueing.
 
 Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to 
 pass
 on the lost device info to virtio's backend driver virtio_blk.

Question: I guess remove callback is invoked eventually?
Could you please clarify why isn't this sufficient?



 Heinz Graalfs (3):
   virtio: add notify() callback to virtio_driver
   virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
   virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification
 
  drivers/block/virtio_blk.c| 14 ++
  drivers/s390/kvm/virtio_ccw.c | 14 --
  drivers/virtio/virtio.c   |  8 
  include/linux/virtio.h| 10 ++
  4 files changed, 44 insertions(+), 2 deletions(-)
 
 -- 
 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Ingo Molnar

* Masami Hiramatsu masami.hiramatsu...@hitachi.com wrote:

 (2013/11/21 2:36), Frank Ch. Eigler wrote:

[ ... ]
  one needs to resort to something like:
  
  # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
 perf probe $symbol
  done
  
  then wait for a few hours for that to finish. Then, or while the loop
  is still running, run
  
  # perf record -e 'probe:*' -aR sleep 1
  
  to take a kernel down.
 
 Um, indeed, current blacklist is not perfect. [...]

Then it needs to be fixed ASAP!

 [...] As I reported in this series, I've found 2 more patterns. I 
 guess there still have some others.
 
 But anyway, I don't think it is good to fix all such bugs in this 
 series.

Fixing these bugs is far more important than any cleanup work.

We can apply the fixes together with your cleanup series to make it 
all simpler, but the bug fixing absolutely needs to happen right now.

Thanks,

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization