date:20131120

Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Ingo Molnar


* Masami Hiramatsu  wrote:

> (2013/11/21 2:36), Frank Ch. Eigler wrote:

[ ... ]
> > one needs to resort to something like:
> > 
> > # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
> >perf probe $symbol
> > done
> > 
> > then wait for a few hours for that to finish. Then, or while the loop
> > is still running, run
> > 
> > # perf record -e 'probe:*' -aR sleep 1
> > 
> > to take a kernel down.
> 
> Um, indeed, current blacklist is not perfect. [...]

Then it needs to be fixed ASAP!

> [...] As I reported in this series, I've found 2 more patterns. I 
> guess there still have some others.
> 
> But anyway, I don't think it is good to fix all such bugs in this 
> series.

Fixing these bugs is far more important than any cleanup work.

We can apply the fixes together with your cleanup series to make it 
all simpler, but the bug fixing absolutely needs to happen right now.

Thanks,

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote:
> Hi,
> 
> when an active virtio block device is hot-unplugged from a KVM guest, running
> affected guest user applications are not aware of any errors that occur due
> to the lost device. This patch-set adds code to avoid further request queueing
> when a lost block device is detected, resulting in appropriate error info.
> 
> On System z there exists no handshake mechanism between host and guest
> when a device is hot-unplugged. The device is removed and no further I/O
> is possible.
> 
> When an online channel device disappears on System z the kernel's CIO layer
> informs the driver (virtio_ccw) about the lost device.
> 
> It's just the block device drivers that care to provide a notify
> callback.
> 
> Here are some more error details:
> 
> For a particular block device virtio's request function virtblk_request()
> is called by the block layer to queue requests to be handled by the host.
> In case of a lost device requests can still be queued, but an appropriate
> subsequent host kick usually fails. This leads to situations where no error
> feedback is shown.
> 
> In order to prevent request queueing for lost devices appropriate settings
> in the block layer should be made. Exploiting System z's CIO notify handler
> callback, and adding a corresponding new virtio_driver notify() handler to
> 'inform' the block layer, solve this task.
> 
> Patch 1 adds an optional notify() callback to virtio_driver.
> 
> Patch 2 adds a new notify() callback for the virtio_blk driver. When called
> for a lost device settings are made to prevent future request queueing.
> 
> Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to 
> pass
> on the lost device info to virtio's backend driver virtio_blk.

Question: I guess remove callback is invoked eventually?
Could you please clarify why isn't this sufficient?



> Heinz Graalfs (3):
>   virtio: add notify() callback to virtio_driver
>   virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
>   virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification
> 
>  drivers/block/virtio_blk.c| 14 ++
>  drivers/s390/kvm/virtio_ccw.c | 14 --
>  drivers/virtio/virtio.c   |  8 
>  include/linux/virtio.h| 10 ++
>  4 files changed, 44 insertions(+), 2 deletions(-)
> 
> -- 
> 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 04:22:01PM +0100, Heinz Graalfs wrote:
> Add an optional notify() callback to virtio_driver. A backend
> driver can provide this callback to perform actions for a lost
> device.
> 
> notify() event values are inherited from virtio_ccw's notify()
> callback. We might want to support even more of them lateron.
> 
> notify() return values are defined in include/linux/notifier.h.
> 
> Signed-off-by: Heinz Graalfs 
> ---
>  drivers/virtio/virtio.c |  8 
>  include/linux/virtio.h  | 10 ++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index ee59b74..a09abb4 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -186,6 +186,14 @@ void unregister_virtio_driver(struct virtio_driver 
> *driver)
>  }
>  EXPORT_SYMBOL_GPL(unregister_virtio_driver);
>  
> +int notify_virtio_device(struct virtio_device *vdev, int event)
> +{
> + struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
> +
> + return drv->notify ? drv->notify(vdev, event) : NOTIFY_DONE;

Also pls include linux/notifier.h for this value.

> +}
> +EXPORT_SYMBOL_GPL(notify_virtio_device);
> +
>  int register_virtio_device(struct virtio_device *dev)
>  {
>   int err;
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index f15f6e7..da18e9a 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev);
>  void unregister_virtio_device(struct virtio_device *dev);
>  
>  /**
> + * notify event values
> + * @VDEV_GONE: device gone
> + */
> +enum {
> + VDEV_GONE   = 1,

Seems a bit short, can lead to namespace pollution.
Let's rename to VIRTIO_DEVICE_GONE ?

> +};
> +int notify_virtio_device(struct virtio_device *dev, int event);
> +
> +/**
>   * virtio_driver - operations for a virtio I/O driver
>   * @driver: underlying device driver (populate name and owner).
>   * @id_table: the ids serviced by this driver.
> @@ -129,6 +138,7 @@ struct virtio_driver {
>   void (*scan)(struct virtio_device *dev);
>   void (*remove)(struct virtio_device *dev);
>   void (*config_changed)(struct virtio_device *dev);
> + int (*notify)(struct virtio_device *dev, int event);
>  #ifdef CONFIG_PM
>   int (*freeze)(struct virtio_device *dev);
>   int (*restore)(struct virtio_device *dev);
> -- 
> 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 04:22:02PM +0100, Heinz Graalfs wrote:
> Add virtblk_notify() as virtio_driver's notify() callback.
> 
> When a transport driver is notified that a device disappeared it
> should invoke this callback to prevent further request queueing.
> 
> Subsequent block layer calls of virtio_blk's request function will
> fail, resulting in appropriate I/O errors.
> 
> Signed-off-by: Heinz Graalfs 
> ---
>  drivers/block/virtio_blk.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 2d43be4..7fc1d62 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev)
>   ida_simple_remove(&vd_index_ida, index);
>  }
>  
> +static int virtblk_notify(struct virtio_device *vdev, int event)
> +{
> + struct virtio_blk *vblk = vdev->priv;
> +
> + if (event == VDEV_GONE) {
> + queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk->disk->queue);
> + queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
> + queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES,
> + vblk->disk->queue);
> + }
> + return NOTIFY_DONE;

Also pls include linux/notifier.h for this value.

> +}
> +
>  #ifdef CONFIG_PM
>  static int virtblk_freeze(struct virtio_device *vdev)
>  {
> @@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = {
>   .probe  = virtblk_probe,
>   .remove = virtblk_remove,
>   .config_changed = virtblk_config_changed,
> + .notify = virtblk_notify,
>  #ifdef CONFIG_PM
>   .freeze = virtblk_freeze,
>   .restore= virtblk_restore,
> -- 
> 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 04:22:02PM +0100, Heinz Graalfs wrote:
> Add virtblk_notify() as virtio_driver's notify() callback.
> 
> When a transport driver is notified that a device disappeared it
> should invoke this callback to prevent further request queueing.
> 
> Subsequent block layer calls of virtio_blk's request function will
> fail, resulting in appropriate I/O errors.
> 
> Signed-off-by: Heinz Graalfs 
> ---
>  drivers/block/virtio_blk.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 2d43be4..7fc1d62 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev)
>   ida_simple_remove(&vd_index_ida, index);
>  }
>  
> +static int virtblk_notify(struct virtio_device *vdev, int event)
> +{
> + struct virtio_blk *vblk = vdev->priv;
> +
> + if (event == VDEV_GONE) {
> + queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk->disk->queue);
> + queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
> + queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES,
> + vblk->disk->queue);
> + }
> + return NOTIFY_DONE;
> +}
> +

But what serializes with the block layer?
And is unlocked really safe here?
Don't we need to take the queue lock?


>  #ifdef CONFIG_PM
>  static int virtblk_freeze(struct virtio_device *vdev)
>  {
> @@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = {
>   .probe  = virtblk_probe,
>   .remove = virtblk_remove,
>   .config_changed = virtblk_config_changed,
> + .notify = virtblk_notify,
>  #ifdef CONFIG_PM
>   .freeze = virtblk_freeze,
>   .restore= virtblk_restore,
> -- 
> 1.8.3.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-20 Thread Rusty Russell

Heinz Graalfs  writes:
> Add an optional notify() callback to virtio_driver. A backend
> driver can provide this callback to perform actions for a lost
> device.
>
> notify() event values are inherited from virtio_ccw's notify()
> callback. We might want to support even more of them lateron.
>
> notify() return values are defined in include/linux/notifier.h.
>
> Signed-off-by: Heinz Graalfs 

These patches seem sensible.  I've applied them in my pending queue,
but it'd be nice to have some feedback on the virtio_blk.c patch.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Jason Wang

On 11/20/2013 10:20 PM, Michael S. Tsirkin wrote:
> On Wed, Nov 20, 2013 at 08:54:02AM -0500, Jason Wang wrote:
>>
>> - 原始邮件 -
>>> On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
 When mergeable buffer were used, we only put the first page buf leave the
 rest
 of buffers in the virt queue. This will cause the driver could not get the
 correct head buffer any more. Fix this by dropping the rest of buffers for
 this
 packet.

 The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
 (virtio_net: Defer skb allocation in receive path).

 Cc: Rusty Russell 
 Cc: Michael S. Tsirkin 
 Cc: Michael Dalton 
 Cc: Eric Dumazet 
 Cc: Shirley Ma 
 Signed-off-by: Jason Wang 
>>> Just to clarify my previous comment: it was not about the
>>> idea of adding drop_mergeable_buffer - rather, I think that
>>> adding knowledge about mergeable buffers into page_to_skb creates an
>>> ugly internal API.
>>>
>>> Let's move the call to page_to_skb within receive_mergeable instead:
>>> it's also nice that int offset = buf - page_address(page) logic
>>> is not spread around like it was.
>>>
>>> Also, it's not nice that we ignore length errors when we drop
>>> packets because of OOM.
>>>
>>> So I came up with the following - it seems to work but I didn't
>>> stress test yet.
>> I've no objection on this. But I've rather like my small and direct patch 
>> to be applied to -net first. It has lower risk and was much more easier to 
>> be backported to stable trees. Then we can do the re-factor like this in 
>> net-next. 
> It makes the interfaces too messy. 

Do you mean receive_mergeable() should call page_to_skb()? It has been
there several years. And even with your changes, for big and small
packets, page_to_skb() were still called directly receive_buf() and the
error handling were done there.
> We are not in code freeze in -net -
> only feature freeze, so no reason to make code like spagetty,
> and it's only 25 lines changed as compared to 40.
> It's not a huge refactoring.

The problem is your patch mixes several bugs fixing with re-factoring,
only one of the bug fixing was really needed for stable. At least we
should make incremental patches on this.
>
> It's just as easy to backport my patch too.
> You just drop the goto in the new code path we added.

It may not be so easy for the people who are not familiar with
virtio-net and the s/skb->dev/dev/g changes looks irrelevant here.
> Let me show you (untested) - your patch is not smaller.
>
> Signed-off-by: Michael S. Tsirkin 
>
>
>
> commit 9b442fe970d5c71311d4314edef26ee2eb16e7fb
> Author: Michael S. Tsirkin 
> Date:   Wed Nov 20 12:44:14 2013 +0200
>
> virtio_net: fix resource leak on alloc failure
> 
> virtio net got confused, started dropping packets.
> 
> Signed-off-by: Michael S. Tsirkin 
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9fbdfcd..df4b9d0 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -297,13 +297,22 @@ static struct sk_buff *page_to_skb(struct receive_queue 
> *rq,
>   return skb;
>  }
>  
> -static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
> +static struct sk_buff *receive_mergeable(struct net_device *dev,
> +  struct receive_queue *rq,
> +  struct page *page,
> +  unsigned int len)
>  {
> - struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
> - struct page *page;
> - int num_buf, i, len;
> + struct skb_vnet_hdr *hdr = page_address(buf);
> + int num_buf = hdr->mhdr.num_buffers;
> + struct sk_buff *skb = page_to_skb(rq, page, len);
> + int i;
> +
> + skb = page_to_skb(rq, page, len);
> +
> + if (unlikely(!skb))
> + goto err_skb;
> +
>  
> - num_buf = hdr->mhdr.num_buffers;
>   while (--num_buf) {
>   i = skb_shinfo(skb)->nr_frags;
>   if (i >= MAX_SKB_FRAGS) {
> @@ -313,10 +322,10 @@ static int receive_mergeable(struct receive_queue *rq, 
> struct sk_buff *skb)
>   }
>   page = virtqueue_get_buf(rq->vq, &len);
>   if (!page) {
> - pr_debug("%s: rx error: %d buffers missing\n",
> -  skb->dev->name, hdr->mhdr.num_buffers);
> - skb->dev->stats.rx_length_errors++;
> - return -EINVAL;
> + pr_debug("%s: rx error: %d buffers %d missing\n",
> +  dev->name, hdr->mhdr.num_buffers, num_buf);
> + dev->stats.rx_length_errors++;
> + goto err_buf;
>   }
>  
>   if (len > PAGE_SIZE)
> @@ -326,7 +335,25 @@ static int receive_mergeable(struct receive_queue *rq, 
> struct sk_buff *skb)
>  
>   --rq->num;
>   }
> - return 0;
> + return skb

Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Masami Hiramatsu

(2013/11/21 2:36), Frank Ch. Eigler wrote:
> Hi -
> 
>>> Does this new blacklist cover enough that the kernel now survives a 
>>> broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?
>>
>> That's generally the purpose of the annotations - if it doesn't then 
>> that's a bug.
> 
> AFAIK, no kernel since kprobes was introduced has ever stood up to
> that test.  perf probe lacks the wildcarding powers of systemtap, so
> one needs to resort to something like:
> 
> # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
>perf probe $symbol
> done
> 
> then wait for a few hours for that to finish. Then, or while the loop
> is still running, run
> 
> # perf record -e 'probe:*' -aR sleep 1
> 
> to take a kernel down.

Um, indeed, current blacklist is not perfect. As I reported in this
series, I've found 2 more patterns. I guess there still have some
others.

But anyway, I don't think it is good to fix all such bugs
in this series.
This is just the first step to do it. :)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Josh Stone

On 11/20/2013 09:56 AM, Steven Rostedt wrote:
> On Wed, 20 Nov 2013 12:36:00 -0500
> "Frank Ch. Eigler"  wrote:
> 
>> Hi -
>>
 Does this new blacklist cover enough that the kernel now survives a 
 broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?
>>>
>>> That's generally the purpose of the annotations - if it doesn't then 
>>> that's a bug.
>>
>> AFAIK, no kernel since kprobes was introduced has ever stood up to
>> that test.  perf probe lacks the wildcarding powers of systemtap, so
>> one needs to resort to something like:
>>
>> # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
>>perf probe $symbol
>> done
> 
> I'm curious to why one would do that. IIUC, perf now has function
> tracing support.

Then consider something like probing all inline "call" sites, which will
be sprinkled in the middle where ftrace doesn't apply.

The point is not whether there's an alternative - kprobes really ought
to be wholly safe regardless.  Slow, if you did such broad probing,
sure, but still safe.

And a real use-case probably wouldn't probe *all* functions/inlines, but
it illustrates that there are at least a few in the full set that don't
behave well.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Frank Ch. Eigler

Hi -

> > Does this new blacklist cover enough that the kernel now survives a 
> > broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?
> 
> That's generally the purpose of the annotations - if it doesn't then 
> that's a bug.

AFAIK, no kernel since kprobes was introduced has ever stood up to
that test.  perf probe lacks the wildcarding powers of systemtap, so
one needs to resort to something like:

# cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
   perf probe $symbol
done

then wait for a few hours for that to finish. Then, or while the loop
is still running, run

# perf record -e 'probe:*' -aR sleep 1

to take a kernel down.


- FChE
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Frank Ch. Eigler


masami.hiramatsu.pt wrote:

> [...]  This series also includes a change which prohibits probing on
> the address in .entry.text because the code is used for very
> low-level sensitive interrupt/syscall entries. Probing such code may
> cause unexpected result (actually most of that area is already in
> the kprobe blacklist).  So I've decide to prohibit probing all of
> them. [...]

Does this new blacklist cover enough that the kernel now survives a
broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?


- FChE
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Steven Rostedt

On Wed, 20 Nov 2013 12:36:00 -0500
"Frank Ch. Eigler"  wrote:

> Hi -
> 
> > > Does this new blacklist cover enough that the kernel now survives a 
> > > broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?
> > 
> > That's generally the purpose of the annotations - if it doesn't then 
> > that's a bug.
> 
> AFAIK, no kernel since kprobes was introduced has ever stood up to
> that test.  perf probe lacks the wildcarding powers of systemtap, so
> one needs to resort to something like:
> 
> # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
>perf probe $symbol
> done

I'm curious to why one would do that. IIUC, perf now has function
tracing support.

-- Steve

> 
> then wait for a few hours for that to finish. Then, or while the loop
> is still running, run
> 
> # perf record -e 'probe:*' -aR sleep 1
> 
> to take a kernel down.
> 
> 
> - FChE

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 08:14:21AM -0800, Eric Dumazet wrote:
> On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:
> 
> > Hmm some kind of disconnect here.
> > I got you rmanagement about bufferbloat.
> > 
> > What I am saying is that maybe we should drop packets more
> > aggressively: when we drop one packet of a flow, why not
> > drop everything that's queued and is for the same flow?
> 
> I really hope your TCP flows use SACK ;)
> 
> Please read the rfc 2018 introduction for details.
> 
> If a packet is dropped, it doesn't mean following packets _have_ to be
> dropped.


Got it, thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet

On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote:

> Hmm some kind of disconnect here.
> I got you rmanagement about bufferbloat.
> 
> What I am saying is that maybe we should drop packets more
> aggressively: when we drop one packet of a flow, why not
> drop everything that's queued and is for the same flow?

I really hope your TCP flows use SACK ;)

Please read the rfc 2018 introduction for details.

If a packet is dropped, it doesn't mean following packets _have_ to be
dropped.


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 07:16:33AM -0800, Eric Dumazet wrote:
> On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
> > On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
> > > On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
> > > 
> > > > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
> > > > it didn't drop packets received from host as far as I can tell.
> > > > virtio is more like a pipe than a real NIC in this respect.
> > > 
> > > Prior/after to this patch, you were not posting buffers, so if packets
> > > were received on a physical NIC, you were dropping the packets anyway.
> > >
> > > It makes no difference at all, adding a cushion might make you feel
> > > better, but its really not worth it.
> > > 
> > > Under memory stress, it makes better sense to drop a super big GRO
> > > packet (The one needing frag_list extension ...)
> > > 
> > > It gives a better signal to the sender to reduce its pressure, and gives
> > > opportunity to free more of your memory.
> > > 
> > 
> > OK, but in that case one wonders whether we should do more to free memory?
> > 
> > E.g. imagine that we dropped a packet of a specific TCP flow
> > because we couldn't allocate a new packet.
> > 
> > What happens now is that the old packet is freed as well.
> > 
> > So quite likely the next packet in queue will get processed
> > since it will reuse the memory we have just freed.
> > 
> > The next packet and the next after it etc all will have to go through
> > the net stack until they get at the socket and are dropped then
> > because we missed a segment.  Even worse, GRO gets disabled so the load
> > on receiver goes up instead of down.
> > 
> > Sounds like a problem doesn't it?
> 
> I see no problem at all. GRO is a hint for high rates (and obviously
> when there is enough memory)
> 
> > 
> > GRO actually detects it's the same flow and can see packet is
> > out of sequence. Why doesn't it drop the packet then?
> > Alternatively, we could (for example using the pre-allocated skb
> > like I suggested) notify GRO that it should start dropping packets
> > of this flow.
> > 
> > What do you think?
> > 
> 
> I think we disagree a lot on memory management on networking stacks.
> 
> We did a lot of work in TCP stack and Qdisc layers to lower memory
> pressure (and bufferbloat), an you seem to try hard to introduce yet
> another layer of buffer bloat in virtio_net.
> 
> So add whatever you want to proudly state to your management :
> 
> "Look how smart we are : we drop no packets in our layer"
> 

Hmm some kind of disconnect here.
I got you rmanagement about bufferbloat.

What I am saying is that maybe we should drop packets more
aggressively: when we drop one packet of a flow, why not
drop everything that's queued and is for the same flow?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Ingo Molnar


* Frank Ch. Eigler  wrote:

> masami.hiramatsu.pt wrote:
> 
> > [...]  This series also includes a change which prohibits probing 
> > on the address in .entry.text because the code is used for very 
> > low-level sensitive interrupt/syscall entries. Probing such code 
> > may cause unexpected result (actually most of that area is already 
> > in the kprobe blacklist).  So I've decide to prohibit probing all 
> > of them. [...]
> 
> Does this new blacklist cover enough that the kernel now survives a 
> broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms?

That's generally the purpose of the annotations - if it doesn't then 
that's a bug.

Thanks,

Ingo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH RFC 3/3] virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

2013-11-20 Thread Heinz Graalfs

virtio_ccw's notify() callback for the common IO layer invokes
virtio_driver's notify() callback to pass-on information to a
backend driver if an online device disappeared.

Signed-off-by: Heinz Graalfs 
---
 drivers/s390/kvm/virtio_ccw.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c
index 35b9aaa..420010d 100644
--- a/drivers/s390/kvm/virtio_ccw.c
+++ b/drivers/s390/kvm/virtio_ccw.c
@@ -1064,8 +1064,18 @@ out_free:
 
 static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event)
 {
-   /* TODO: Check whether we need special handling here. */
-   return 0;
+   int rc;
+   struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev);
+
+   switch (event) {
+   case CIO_GONE:
+   rc = notify_virtio_device(&vcdev->vdev, VDEV_GONE);
+   break;
+   default:
+   rc = NOTIFY_DONE;
+   break;
+   }
+   return rc;
 }
 
 static struct ccw_device_id virtio_ids[] = {
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback

2013-11-20 Thread Heinz Graalfs

Add virtblk_notify() as virtio_driver's notify() callback.

When a transport driver is notified that a device disappeared it
should invoke this callback to prevent further request queueing.

Subsequent block layer calls of virtio_blk's request function will
fail, resulting in appropriate I/O errors.

Signed-off-by: Heinz Graalfs 
---
 drivers/block/virtio_blk.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d43be4..7fc1d62 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev)
ida_simple_remove(&vd_index_ida, index);
 }
 
+static int virtblk_notify(struct virtio_device *vdev, int event)
+{
+   struct virtio_blk *vblk = vdev->priv;
+
+   if (event == VDEV_GONE) {
+   queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk->disk->queue);
+   queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk->disk->queue);
+   queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES,
+   vblk->disk->queue);
+   }
+   return NOTIFY_DONE;
+}
+
 #ifdef CONFIG_PM
 static int virtblk_freeze(struct virtio_device *vdev)
 {
@@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = {
.probe  = virtblk_probe,
.remove = virtblk_remove,
.config_changed = virtblk_config_changed,
+   .notify = virtblk_notify,
 #ifdef CONFIG_PM
.freeze = virtblk_freeze,
.restore= virtblk_restore,
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver

2013-11-20 Thread Heinz Graalfs

Hi,

when an active virtio block device is hot-unplugged from a KVM guest, running
affected guest user applications are not aware of any errors that occur due
to the lost device. This patch-set adds code to avoid further request queueing
when a lost block device is detected, resulting in appropriate error info.

On System z there exists no handshake mechanism between host and guest
when a device is hot-unplugged. The device is removed and no further I/O
is possible.

When an online channel device disappears on System z the kernel's CIO layer
informs the driver (virtio_ccw) about the lost device.

It's just the block device drivers that care to provide a notify
callback.

Here are some more error details:

For a particular block device virtio's request function virtblk_request()
is called by the block layer to queue requests to be handled by the host.
In case of a lost device requests can still be queued, but an appropriate
subsequent host kick usually fails. This leads to situations where no error
feedback is shown.

In order to prevent request queueing for lost devices appropriate settings
in the block layer should be made. Exploiting System z's CIO notify handler
callback, and adding a corresponding new virtio_driver notify() handler to
'inform' the block layer, solve this task.

Patch 1 adds an optional notify() callback to virtio_driver.

Patch 2 adds a new notify() callback for the virtio_blk driver. When called
for a lost device settings are made to prevent future request queueing.

Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass
on the lost device info to virtio's backend driver virtio_blk.

Heinz Graalfs (3):
  virtio: add notify() callback to virtio_driver
  virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
  virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification

 drivers/block/virtio_blk.c| 14 ++
 drivers/s390/kvm/virtio_ccw.c | 14 --
 drivers/virtio/virtio.c   |  8 
 include/linux/virtio.h| 10 ++
 4 files changed, 44 insertions(+), 2 deletions(-)

-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH RFC 1/3] virtio: add notify() callback to virtio_driver

2013-11-20 Thread Heinz Graalfs

Add an optional notify() callback to virtio_driver. A backend
driver can provide this callback to perform actions for a lost
device.

notify() event values are inherited from virtio_ccw's notify()
callback. We might want to support even more of them lateron.

notify() return values are defined in include/linux/notifier.h.

Signed-off-by: Heinz Graalfs 
---
 drivers/virtio/virtio.c |  8 
 include/linux/virtio.h  | 10 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index ee59b74..a09abb4 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -186,6 +186,14 @@ void unregister_virtio_driver(struct virtio_driver *driver)
 }
 EXPORT_SYMBOL_GPL(unregister_virtio_driver);
 
+int notify_virtio_device(struct virtio_device *vdev, int event)
+{
+   struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver);
+
+   return drv->notify ? drv->notify(vdev, event) : NOTIFY_DONE;
+}
+EXPORT_SYMBOL_GPL(notify_virtio_device);
+
 int register_virtio_device(struct virtio_device *dev)
 {
int err;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index f15f6e7..da18e9a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 
 /**
+ * notify event values
+ * @VDEV_GONE: device gone
+ */
+enum {
+   VDEV_GONE   = 1,
+};
+int notify_virtio_device(struct virtio_device *dev, int event);
+
+/**
  * virtio_driver - operations for a virtio I/O driver
  * @driver: underlying device driver (populate name and owner).
  * @id_table: the ids serviced by this driver.
@@ -129,6 +138,7 @@ struct virtio_driver {
void (*scan)(struct virtio_device *dev);
void (*remove)(struct virtio_device *dev);
void (*config_changed)(struct virtio_device *dev);
+   int (*notify)(struct virtio_device *dev, int event);
 #ifdef CONFIG_PM
int (*freeze)(struct virtio_device *dev);
int (*restore)(struct virtio_device *dev);
-- 
1.8.3.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Eric Dumazet

On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
> > On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
> > 
> > > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
> > > it didn't drop packets received from host as far as I can tell.
> > > virtio is more like a pipe than a real NIC in this respect.
> > 
> > Prior/after to this patch, you were not posting buffers, so if packets
> > were received on a physical NIC, you were dropping the packets anyway.
> >
> > It makes no difference at all, adding a cushion might make you feel
> > better, but its really not worth it.
> > 
> > Under memory stress, it makes better sense to drop a super big GRO
> > packet (The one needing frag_list extension ...)
> > 
> > It gives a better signal to the sender to reduce its pressure, and gives
> > opportunity to free more of your memory.
> > 
> 
> OK, but in that case one wonders whether we should do more to free memory?
> 
> E.g. imagine that we dropped a packet of a specific TCP flow
> because we couldn't allocate a new packet.
> 
> What happens now is that the old packet is freed as well.
> 
> So quite likely the next packet in queue will get processed
> since it will reuse the memory we have just freed.
> 
> The next packet and the next after it etc all will have to go through
> the net stack until they get at the socket and are dropped then
> because we missed a segment.  Even worse, GRO gets disabled so the load
> on receiver goes up instead of down.
> 
> Sounds like a problem doesn't it?

I see no problem at all. GRO is a hint for high rates (and obviously
when there is enough memory)

> 
> GRO actually detects it's the same flow and can see packet is
> out of sequence. Why doesn't it drop the packet then?
> Alternatively, we could (for example using the pre-allocated skb
> like I suggested) notify GRO that it should start dropping packets
> of this flow.
> 
> What do you think?
> 

I think we disagree a lot on memory management on networking stacks.

We did a lot of work in TCP stack and Qdisc layers to lower memory
pressure (and bufferbloat), an you seem to try hard to introduce yet
another layer of buffer bloat in virtio_net.

So add whatever you want to proudly state to your management :

"Look how smart we are : we drop no packets in our layer"



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

2014 World Conference on Information Systems and Technologies; Submission: Nov. 29

2013-11-20 Thread WorldCIST

Apologies if you are receiving this mail more than once...


**
 WorldCIST'14
The 2014 World Conference on Information Systems and Technologies
April 15 - 18, Madeira Island, Portugal
   http://www.aisti.eu/worldcist14/
**

The 2014 World Conference on Information Systems and Technologies 
(WorldCIST'14: http://www.aisti.eu/worldcist14) is a global forum for 
researchers and practitioners to present and discuss the most recent 
innovations, trends, results, experiences and concerns in the several 
perspectives of Information Systems and Technologies.

We are pleased to invite you to submit your papers to WorldCISTI'14. All 
submissions will be reviewed on the basis of relevance, originality, importance 
and clarity.

 
THEMES

Submitted papers should be related with one or more of the main themes proposed 
for the Conference:

A) Information and Knowledge Management (IKM);

B) Organizational Models and Information Systems (OMIS);

C) Intelligent and Decision Support Systems (IDSS);

D) Software Systems, Architectures, Applications and Tools (SSAAT);

E) Computer Networks, Mobility and Pervasive Systems (CNMPS);

F) Human-Computer Interaction (HCI);

G) Health Informatics (HIS);

H) Information Technologies in Education (ITE).


TYPES OF SUBMISSIONS AND DECISIONS

Four types of papers can be submitted:

Full paper: Finished or consolidated R&D works, to be included in one of the 
Conference themes. These papers are assigned a 10-page limit.

Short paper: Ongoing works with relevant preliminary results, open to 
discussion. These papers are assigned a 7-page limit.

Poster paper: Initial work with relevant ideas, open to discussion. These 
papers are assigned to a 4-page limit.

Company paper: Companies' papers that show practical experience, R & D, tools, 
etc., focused on some topics of the conference. These papers are assigned to a 
4-page limit.

Submitted papers must comply with the format of Advances in Intelligent Systems 
and Computing Series (see Instructions for Authors at Springer Website or 
download a DOC example) be written in English, must not have been published 
before, not be under review for any other conference or publication and not 
include any information leading to the authors identification. Therefore, the 
authors names, affiliations and bibliographic references should not be 
included in the version for evaluation by the Program Committee. This 
information should only be included in the camera-ready version, saved in Word 
or Latex format and also in PDF format. These files must be accompanied by the 
Consent to Publication form filled out, in a ZIP file, and uploaded at the 
conference management system.

All papers will be subjected to a double-blind review by at least two members 
of the Program Committee.

Based on Program Committee evaluation, a paper can be rejected or accepted by 
the Conference Chairs. In the later case, it can be accepted as the type 
originally submitted or as another type. Thus, full papers can be accepted as 
short papers or poster papers only. Similarly, short papers can be accepted as 
poster papers only. In these cases, the authors will be allowed to maintain the 
original number of pages in the camera-ready version.

The authors of accepted poster papers must also build and print a poster to be 
exhibited during the Conference. This poster must follow an A1 or A2 vertical 
format. The Conference includes Work Sessions where these posters are presented 
and orally discussed, with a 5 minute limit per poster.

The authors of accepted full papers will have 15 minutes to present their work 
in a Conference Work Session; approximately 5 minutes of discussion will follow 
each presentation. The authors of accepted short papers and company papers will 
have 11 minutes to present their work in a Conference Work Session; 
approximately 4 minutes of discussion will follow each presentation.


PUBLICATION AND INDEXING

To ensure that a full paper, short paper, poster paper or company paper is 
published in the Proceedings, at least one of the authors must be fully 
registered by the 24th of January 2014, and the paper must comply with the 
suggested layout and page-limit. Additionally, all recommended changes must be 
addressed by the authors before they submit the camera-ready version.

No more than one paper per registration will be published in the Conference 
Proceedings. An extra fee must be paid for publication of additional papers, 
with a maximum of one additional paper per registration.

Full and short papers will be published in Proceedings by Springer, in Advances 
in Intelligent Systems and Computing Series. Poster and company papers will be 
published in Proceedings by AISTI.

Published full and short papers will be submitted

Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 08:54:02AM -0500, Jason Wang wrote:
> 
> 
> - 原始邮件 -
> > On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
> > > When mergeable buffer were used, we only put the first page buf leave the
> > > rest
> > > of buffers in the virt queue. This will cause the driver could not get the
> > > correct head buffer any more. Fix this by dropping the rest of buffers for
> > > this
> > > packet.
> > > 
> > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
> > > (virtio_net: Defer skb allocation in receive path).
> > > 
> > > Cc: Rusty Russell 
> > > Cc: Michael S. Tsirkin 
> > > Cc: Michael Dalton 
> > > Cc: Eric Dumazet 
> > > Cc: Shirley Ma 
> > > Signed-off-by: Jason Wang 
> > 
> > Just to clarify my previous comment: it was not about the
> > idea of adding drop_mergeable_buffer - rather, I think that
> > adding knowledge about mergeable buffers into page_to_skb creates an
> > ugly internal API.
> > 
> > Let's move the call to page_to_skb within receive_mergeable instead:
> > it's also nice that int offset = buf - page_address(page) logic
> > is not spread around like it was.
> > 
> > Also, it's not nice that we ignore length errors when we drop
> > packets because of OOM.
> > 
> > So I came up with the following - it seems to work but I didn't
> > stress test yet.
> 
> I've no objection on this. But I've rather like my small and direct patch 
> to be applied to -net first. It has lower risk and was much more easier to 
> be backported to stable trees. Then we can do the re-factor like this in 
> net-next. 

It makes the interfaces too messy. We are not in code freeze in -net -
only feature freeze, so no reason to make code like spagetty,
and it's only 25 lines changed as compared to 40.
It's not a huge refactoring.

It's just as easy to backport my patch too.
You just drop the goto in the new code path we added.

Let me show you (untested) - your patch is not smaller.

Signed-off-by: Michael S. Tsirkin 



commit 9b442fe970d5c71311d4314edef26ee2eb16e7fb
Author: Michael S. Tsirkin 
Date:   Wed Nov 20 12:44:14 2013 +0200

virtio_net: fix resource leak on alloc failure

virtio net got confused, started dropping packets.

Signed-off-by: Michael S. Tsirkin 

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9fbdfcd..df4b9d0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -297,13 +297,22 @@ static struct sk_buff *page_to_skb(struct receive_queue 
*rq,
return skb;
 }
 
-static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb)
+static struct sk_buff *receive_mergeable(struct net_device *dev,
+struct receive_queue *rq,
+struct page *page,
+unsigned int len)
 {
-   struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb);
-   struct page *page;
-   int num_buf, i, len;
+   struct skb_vnet_hdr *hdr = page_address(buf);
+   int num_buf = hdr->mhdr.num_buffers;
+   struct sk_buff *skb = page_to_skb(rq, page, len);
+   int i;
+
+   skb = page_to_skb(rq, page, len);
+
+   if (unlikely(!skb))
+   goto err_skb;
+
 
-   num_buf = hdr->mhdr.num_buffers;
while (--num_buf) {
i = skb_shinfo(skb)->nr_frags;
if (i >= MAX_SKB_FRAGS) {
@@ -313,10 +322,10 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *skb)
}
page = virtqueue_get_buf(rq->vq, &len);
if (!page) {
-   pr_debug("%s: rx error: %d buffers missing\n",
-skb->dev->name, hdr->mhdr.num_buffers);
-   skb->dev->stats.rx_length_errors++;
-   return -EINVAL;
+   pr_debug("%s: rx error: %d buffers %d missing\n",
+dev->name, hdr->mhdr.num_buffers, num_buf);
+   dev->stats.rx_length_errors++;
+   goto err_buf;
}
 
if (len > PAGE_SIZE)
@@ -326,7 +335,25 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *skb)
 
--rq->num;
}
-   return 0;
+   return skb;
+err_skb:
+   put_page(page);
+err_buf:
+   dev->stats.rx_dropped++;
+   dev_kfree_skb(head_skb);
+   while (--num_buf) {
+   buf = virtqueue_get_buf(rq->vq, &len);
+   if (unlikely(!buf)) {
+   pr_debug("%s: rx error: %d buffers missing\n",
+dev->name, num_buf);
+   dev->stats.rx_length_errors++;
+   break;
+   }
+   page = buf;
+   give_pages(rq, page);
+   --rq->num;
+   }
+   return NULL;
 }
 
 static void receive_buf(struct receive_queue *rq, void *buf, unsigned i

Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Jason Wang



- 原始邮件 -
> On Wed, Nov 20, 2013 at 07:08:50AM -0500, Jason Wang wrote:
> > 
> > 
> > - 原始邮件 -
> > > On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
> > > > We need decrease the rq->num after we can get a buf through
> > > > virtqueue_get_buf() even if we could not allocate frag skb. Otherwise,
> > > > the
> > > > refill routine won't be triggered under heavy memory stress since the
> > > > driver may
> > > > still think there's enough room.
> > > > 
> > > > This bug was introduced by commit
> > > > 2613af0ed18a11d5c566a81f9a6510b73180660a
> > > > (virtio_net: migrate mergeable rx buffers to page frag allocators).
> > > > 
> > > > Cc: Rusty Russell 
> > > > Cc: Michael S. Tsirkin 
> > > > Cc: Michael Dalton 
> > > > Cc: Eric Dumazet 
> > > > Signed-off-by: Jason Wang 
> > > 
> > > So let's wrap virtqueue_get_buf to make sure we get it right?
> > > 
> > 
> > Ok. good idea.
> 
> So I did this (below) but the compiler is not smart enough to
> avoid two branches on data path.
> So I'm not sure anymore: with my patch it's pretty clear how
> the code works.
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> but I don't think we need to apply this.
> 

True, another point is we'd better to handle both increasing and decreasing in 
the 
same way. We do not increase it in a helper.
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 11d9cc9..1cc2e43 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -296,6 +296,14 @@ static struct sk_buff *page_to_skb(struct receive_queue
> *rq,
>   return skb;
>  }
>  
> +static void *rq_get_buf(struct receive_queue *rq, unsigned int *len)
> +{
> + void *buf = virtqueue_get_buf(rq->vq, len);
> + if (buf)
> + rq->num--;
> + return buf;
> +}
> +
>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>struct receive_queue *rq,
>void *buf,
> @@ -315,7 +323,7 @@ static struct sk_buff *receive_mergeable(struct
> net_device *dev,
>   while (--num_buf) {
>   int num_skb_frags;
>  
> - buf = virtqueue_get_buf(rq->vq, &len);
> + buf = rq_get_buf(rq, &len);
>   if (unlikely(!buf)) {
>   pr_debug("%s: rx error: %d buffers out of %d missing\n",
>dev->name, num_buf, hdr->mhdr.num_buffers);
> @@ -329,7 +337,6 @@ static struct sk_buff *receive_mergeable(struct
> net_device *dev,
>   }
>  
>   page = virt_to_head_page(buf);
> - --rq->num;
>  
>   num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
>   if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
> @@ -370,7 +377,7 @@ err_buf:
>   dev->stats.rx_dropped++;
>   dev_kfree_skb(head_skb);
>   while (--num_buf) {
> - buf = virtqueue_get_buf(rq->vq, &len);
> + buf = rq_get_buf(rq, &len);
>   if (unlikely(!buf)) {
>   pr_debug("%s: rx error: %d buffers missing\n",
>dev->name, num_buf);
> @@ -379,7 +386,6 @@ err_buf:
>   }
>   page = virt_to_head_page(buf);
>   put_page(page);
> - --rq->num;
>   }
>   return NULL;
>  }
> @@ -675,9 +681,8 @@ static int virtnet_poll(struct napi_struct *napi, int
> budget)
>  
>  again:
>   while (received < budget &&
> -(buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> +(buf = rq_get_buf(rq, &len)) != NULL) {
>   receive_buf(rq, buf, len);
> - --rq->num;
>   received++;
>   }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Jason Wang



- 原始邮件 -
> On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
> > When mergeable buffer were used, we only put the first page buf leave the
> > rest
> > of buffers in the virt queue. This will cause the driver could not get the
> > correct head buffer any more. Fix this by dropping the rest of buffers for
> > this
> > packet.
> > 
> > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
> > (virtio_net: Defer skb allocation in receive path).
> > 
> > Cc: Rusty Russell 
> > Cc: Michael S. Tsirkin 
> > Cc: Michael Dalton 
> > Cc: Eric Dumazet 
> > Cc: Shirley Ma 
> > Signed-off-by: Jason Wang 
> 
> Just to clarify my previous comment: it was not about the
> idea of adding drop_mergeable_buffer - rather, I think that
> adding knowledge about mergeable buffers into page_to_skb creates an
> ugly internal API.
> 
> Let's move the call to page_to_skb within receive_mergeable instead:
> it's also nice that int offset = buf - page_address(page) logic
> is not spread around like it was.
> 
> Also, it's not nice that we ignore length errors when we drop
> packets because of OOM.
> 
> So I came up with the following - it seems to work but I didn't
> stress test yet.

I've no objection on this. But I've rather like my small and direct patch 
to be applied to -net first. It has lower risk and was much more easier to 
be backported to stable trees. Then we can do the re-factor like this in 
net-next. 
> 
> commit ebffb3fe4335ffe07124e4518e76d6e05844fa18
> Author: Michael S. Tsirkin 
> Date:   Wed Nov 20 14:41:29 2013 +0200
> 
> virtio_net: fix error handling for mergeable buffers
> 
> Eric Dumazet noticed that if we encounter an error
> when processing a mergeable buffer, we don't
> dequeue all of the buffers from this packet,
> the result is almost sure to be loss of networking.
> 
> Jason Wang noticed that we also leak a page and that we don't decrement
> the rq buf count, so we won't repost buffers (a resource leak).
> 
> Cc: Rusty Russell 
> Cc: Michael Dalton 
> Reported-by: Eric Dumazet 
> Reported-by: Jason Wang 
> Signed-off-by: Michael S. Tsirkin 
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 01f4eb5..42f6a1e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -296,41 +296,53 @@ static struct sk_buff *page_to_skb(struct receive_queue
> *rq,
>   return skb;
>  }
>  
> -static int receive_mergeable(struct receive_queue *rq, struct sk_buff
> *head_skb)
> +static struct sk_buff *receive_mergeable(struct net_device *dev,
> +  struct receive_queue *rq,
> +  void *buf,
> +  unsigned int len)
>  {
> - struct skb_vnet_hdr *hdr = skb_vnet_hdr(head_skb);
> + struct skb_vnet_hdr *hdr = buf;
> + int num_buf = hdr->mhdr.num_buffers;
> + struct page *page = virt_to_head_page(buf);
> + int offset = buf - page_address(page);
> + struct sk_buff *head_skb = page_to_skb(rq, page, offset, len,
> +MAX_PACKET_LEN);
>   struct sk_buff *curr_skb = head_skb;
> - char *buf;
> - struct page *page;
> - int num_buf, len, offset;
>  
> - num_buf = hdr->mhdr.num_buffers;
> - while (--num_buf) {
> - int num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
> + if (unlikely(!curr_skb))
> + goto err_skb;
> +
> + while (--num_buf) {
> + int num_skb_frags;
> +
>   buf = virtqueue_get_buf(rq->vq, &len);
>   if (unlikely(!buf)) {
> - pr_debug("%s: rx error: %d buffers missing\n",
> -  head_skb->dev->name, hdr->mhdr.num_buffers);
> - head_skb->dev->stats.rx_length_errors++;
> - return -EINVAL;
> + pr_debug("%s: rx error: %d buffers out of %d missing\n",
> +  dev->name, num_buf, hdr->mhdr.num_buffers);
> + dev->stats.rx_length_errors++;
> + goto err_buf;
>   }
>   if (unlikely(len > MAX_PACKET_LEN)) {
>   pr_debug("%s: rx error: merge buffer too long\n",
> -  head_skb->dev->name);
> +  dev->name);
>   len = MAX_PACKET_LEN;
>   }
> +
> + page = virt_to_head_page(buf);
> + --rq->num;
> +
> + num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
>   if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
>   struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
> - if (unlikely(!nskb)) {
> - head_skb->dev->stats.rx_dropped++;
> - return -ENOMEM;
> - }
> +
> + if (unlikely(!ns

Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 07:08:50AM -0500, Jason Wang wrote:
> 
> 
> - 原始邮件 -
> > On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
> > > We need decrease the rq->num after we can get a buf through
> > > virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
> > > refill routine won't be triggered under heavy memory stress since the
> > > driver may
> > > still think there's enough room.
> > > 
> > > This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
> > > (virtio_net: migrate mergeable rx buffers to page frag allocators).
> > > 
> > > Cc: Rusty Russell 
> > > Cc: Michael S. Tsirkin 
> > > Cc: Michael Dalton 
> > > Cc: Eric Dumazet 
> > > Signed-off-by: Jason Wang 
> > 
> > So let's wrap virtqueue_get_buf to make sure we get it right?
> > 
> 
> Ok. good idea.

So I did this (below) but the compiler is not smart enough to
avoid two branches on data path.
So I'm not sure anymore: with my patch it's pretty clear how
the code works.

Signed-off-by: Michael S. Tsirkin 

but I don't think we need to apply this.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11d9cc9..1cc2e43 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -296,6 +296,14 @@ static struct sk_buff *page_to_skb(struct receive_queue 
*rq,
return skb;
 }
 
+static void *rq_get_buf(struct receive_queue *rq, unsigned int *len)
+{
+   void *buf = virtqueue_get_buf(rq->vq, len);
+   if (buf)
+   rq->num--;
+   return buf;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 struct receive_queue *rq,
 void *buf,
@@ -315,7 +323,7 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
while (--num_buf) {
int num_skb_frags;
 
-   buf = virtqueue_get_buf(rq->vq, &len);
+   buf = rq_get_buf(rq, &len);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers out of %d missing\n",
 dev->name, num_buf, hdr->mhdr.num_buffers);
@@ -329,7 +337,6 @@ static struct sk_buff *receive_mergeable(struct net_device 
*dev,
}
 
page = virt_to_head_page(buf);
-   --rq->num;
 
num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
@@ -370,7 +377,7 @@ err_buf:
dev->stats.rx_dropped++;
dev_kfree_skb(head_skb);
while (--num_buf) {
-   buf = virtqueue_get_buf(rq->vq, &len);
+   buf = rq_get_buf(rq, &len);
if (unlikely(!buf)) {
pr_debug("%s: rx error: %d buffers missing\n",
 dev->name, num_buf);
@@ -379,7 +386,6 @@ err_buf:
}
page = virt_to_head_page(buf);
put_page(page);
-   --rq->num;
}
return NULL;
 }
@@ -675,9 +681,8 @@ static int virtnet_poll(struct napi_struct *napi, int 
budget)
 
 again:
while (received < budget &&
-  (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+  (buf = rq_get_buf(rq, &len)) != NULL) {
receive_buf(rq, buf, len);
-   --rq->num;
received++;
}
 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 07:08:02AM -0500, Jason Wang wrote:
> 
> 
> - 原始邮件 -
> > On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
> > > When mergeable buffer were used, we only put the first page buf leave the
> > > rest
> > > of buffers in the virt queue. This will cause the driver could not get the
> > > correct head buffer any more. Fix this by dropping the rest of buffers for
> > > this
> > > packet.
> > > 
> > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
> > > (virtio_net: Defer skb allocation in receive path).
> > > 
> > > Cc: Rusty Russell 
> > > Cc: Michael S. Tsirkin 
> > > Cc: Michael Dalton 
> > > Cc: Eric Dumazet 
> > > Cc: Shirley Ma 
> > > Signed-off-by: Jason Wang 
> > > ---
> > > This patch was needed for stable
> > > ---
> > >  drivers/net/virtio_net.c | 18 +-
> > >  1 file changed, 17 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 7bab4de..24fd502 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq)
> > >   netif_wake_subqueue(vi->dev, vq2txq(vq));
> > >  }
> > >  
> > > +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf)
> > > +{
> > > + char *buf;
> > > + int len;
> > > +
> > > + while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > + --rq->num;
> > > + put_page(virt_to_head_page(buf));
> > > + }
> > > +}
> > > +
> > 
> > This is the same code we have in receive_mergeable anyway.
> > So let's reuse that.
> > 
> > 
> 
> receive_mergeable() was called after page_to_skb() was called and 
> there's lots of conditions check there. I'm not sure how could we 
> reuse them.

I posted a patch showing how :)

> > >  /* Called from bottom half context */
> > >  static struct sk_buff *page_to_skb(struct receive_queue *rq,
> > >  struct page *page, unsigned int offset,
> > > @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct
> > > receive_queue *rq,
> > >  
> > >   /* copy small packet so we can reuse these pages for small data */
> > >   skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
> > > - if (unlikely(!skb))
> > > + if (unlikely(!skb)) {
> > > + if (vi->mergeable_rx_bufs) {
> > > + hdr = (struct skb_vnet_hdr *)p;
> > > + drop_mergeable_buffer(rq, hdr->mhdr.num_buffers);
> > > + }
> > >   return NULL;
> > > + }
> > >  
> > >   hdr = skb_vnet_hdr(skb);
> > >  
> > > --
> > > 1.8.3.2
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH RFC] virtio_net: fix error handling for mergeable buffers

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
> When mergeable buffer were used, we only put the first page buf leave the rest
> of buffers in the virt queue. This will cause the driver could not get the
> correct head buffer any more. Fix this by dropping the rest of buffers for 
> this
> packet.
> 
> The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
> (virtio_net: Defer skb allocation in receive path).
> 
> Cc: Rusty Russell 
> Cc: Michael S. Tsirkin 
> Cc: Michael Dalton 
> Cc: Eric Dumazet 
> Cc: Shirley Ma 
> Signed-off-by: Jason Wang 

Just to clarify my previous comment: it was not about the
idea of adding drop_mergeable_buffer - rather, I think that
adding knowledge about mergeable buffers into page_to_skb creates an
ugly internal API.

Let's move the call to page_to_skb within receive_mergeable instead:
it's also nice that int offset = buf - page_address(page) logic
is not spread around like it was.

Also, it's not nice that we ignore length errors when we drop
packets because of OOM.

So I came up with the following - it seems to work but I didn't
stress test yet.

commit ebffb3fe4335ffe07124e4518e76d6e05844fa18
Author: Michael S. Tsirkin 
Date:   Wed Nov 20 14:41:29 2013 +0200

virtio_net: fix error handling for mergeable buffers

Eric Dumazet noticed that if we encounter an error
when processing a mergeable buffer, we don't
dequeue all of the buffers from this packet,
the result is almost sure to be loss of networking.

Jason Wang noticed that we also leak a page and that we don't decrement
the rq buf count, so we won't repost buffers (a resource leak).

Cc: Rusty Russell 
Cc: Michael Dalton 
Reported-by: Eric Dumazet 
Reported-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 01f4eb5..42f6a1e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -296,41 +296,53 @@ static struct sk_buff *page_to_skb(struct receive_queue 
*rq,
return skb;
 }
 
-static int receive_mergeable(struct receive_queue *rq, struct sk_buff 
*head_skb)
+static struct sk_buff *receive_mergeable(struct net_device *dev,
+struct receive_queue *rq,
+void *buf,
+unsigned int len)
 {
-   struct skb_vnet_hdr *hdr = skb_vnet_hdr(head_skb);
+   struct skb_vnet_hdr *hdr = buf;
+   int num_buf = hdr->mhdr.num_buffers;
+   struct page *page = virt_to_head_page(buf);
+   int offset = buf - page_address(page);
+   struct sk_buff *head_skb = page_to_skb(rq, page, offset, len,
+  MAX_PACKET_LEN);
struct sk_buff *curr_skb = head_skb;
-   char *buf;
-   struct page *page;
-   int num_buf, len, offset;
 
-   num_buf = hdr->mhdr.num_buffers;
-   while (--num_buf) {
-   int num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
+   if (unlikely(!curr_skb))
+   goto err_skb;
+
+   while (--num_buf) {
+   int num_skb_frags;
+
buf = virtqueue_get_buf(rq->vq, &len);
if (unlikely(!buf)) {
-   pr_debug("%s: rx error: %d buffers missing\n",
-head_skb->dev->name, hdr->mhdr.num_buffers);
-   head_skb->dev->stats.rx_length_errors++;
-   return -EINVAL;
+   pr_debug("%s: rx error: %d buffers out of %d missing\n",
+dev->name, num_buf, hdr->mhdr.num_buffers);
+   dev->stats.rx_length_errors++;
+   goto err_buf;
}
if (unlikely(len > MAX_PACKET_LEN)) {
pr_debug("%s: rx error: merge buffer too long\n",
-head_skb->dev->name);
+dev->name);
len = MAX_PACKET_LEN;
}
+
+   page = virt_to_head_page(buf);
+   --rq->num;
+
+   num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
-   if (unlikely(!nskb)) {
-   head_skb->dev->stats.rx_dropped++;
-   return -ENOMEM;
-   }
+
+   if (unlikely(!nskb))
+   goto err_skb;
if (curr_skb == head_skb)
skb_shinfo(curr_skb)->frag_list = nskb;
else
curr_skb->next = nskb;
-   curr_skb = nskb;
head_skb->truesize += nskb->truesize;
+   curr_skb = ns

Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Jason Wang



- 原始邮件 -
> On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
> > We need decrease the rq->num after we can get a buf through
> > virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
> > refill routine won't be triggered under heavy memory stress since the
> > driver may
> > still think there's enough room.
> > 
> > This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
> > (virtio_net: migrate mergeable rx buffers to page frag allocators).
> > 
> > Cc: Rusty Russell 
> > Cc: Michael S. Tsirkin 
> > Cc: Michael Dalton 
> > Cc: Eric Dumazet 
> > Signed-off-by: Jason Wang 
> 
> So let's wrap virtqueue_get_buf to make sure we get it right?
> 

Ok. good idea.
> > ---
> > The patch was needed for 3.12 stable.
> > ---
> >  drivers/net/virtio_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 24fd502..de1d6ca 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq,
> > struct sk_buff *head_skb)
> > head_skb->dev->stats.rx_length_errors++;
> > return -EINVAL;
> > }
> > +   --rq->num;
> > if (unlikely(len > MERGE_BUFFER_LEN)) {
> > pr_debug("%s: rx error: merge buffer too long\n",
> >  head_skb->dev->name);
> > @@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq,
> > struct sk_buff *head_skb)
> > skb_add_rx_frag(curr_skb, num_skb_frags, page,
> > offset, len, MERGE_BUFFER_LEN);
> > }
> > -   --rq->num;
> > }
> > return 0;
> >  }
> > --
> > 1.8.3.2
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb

2013-11-20 Thread Jason Wang



- 原始邮件 -
> On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
> > When mergeable buffer were used, we only put the first page buf leave the
> > rest
> > of buffers in the virt queue. This will cause the driver could not get the
> > correct head buffer any more. Fix this by dropping the rest of buffers for
> > this
> > packet.
> > 
> > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
> > (virtio_net: Defer skb allocation in receive path).
> > 
> > Cc: Rusty Russell 
> > Cc: Michael S. Tsirkin 
> > Cc: Michael Dalton 
> > Cc: Eric Dumazet 
> > Cc: Shirley Ma 
> > Signed-off-by: Jason Wang 
> > ---
> > This patch was needed for stable
> > ---
> >  drivers/net/virtio_net.c | 18 +-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 7bab4de..24fd502 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq)
> > netif_wake_subqueue(vi->dev, vq2txq(vq));
> >  }
> >  
> > +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf)
> > +{
> > +   char *buf;
> > +   int len;
> > +
> > +   while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > +   --rq->num;
> > +   put_page(virt_to_head_page(buf));
> > +   }
> > +}
> > +
> 
> This is the same code we have in receive_mergeable anyway.
> So let's reuse that.
> 
> 

receive_mergeable() was called after page_to_skb() was called and 
there's lots of conditions check there. I'm not sure how could we 
reuse them.
> >  /* Called from bottom half context */
> >  static struct sk_buff *page_to_skb(struct receive_queue *rq,
> >struct page *page, unsigned int offset,
> > @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct
> > receive_queue *rq,
> >  
> > /* copy small packet so we can reuse these pages for small data */
> > skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
> > -   if (unlikely(!skb))
> > +   if (unlikely(!skb)) {
> > +   if (vi->mergeable_rx_bufs) {
> > +   hdr = (struct skb_vnet_hdr *)p;
> > +   drop_mergeable_buffer(rq, hdr->mhdr.num_buffers);
> > +   }
> > return NULL;
> > +   }
> >  
> > hdr = skb_vnet_hdr(skb);
> >  
> > --
> > 1.8.3.2
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote:
> We need decrease the rq->num after we can get a buf through
> virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
> refill routine won't be triggered under heavy memory stress since the driver 
> may
> still think there's enough room.
> 
> This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
> (virtio_net: migrate mergeable rx buffers to page frag allocators).
> 
> Cc: Rusty Russell 
> Cc: Michael S. Tsirkin 
> Cc: Michael Dalton 
> Cc: Eric Dumazet 
> Signed-off-by: Jason Wang 

So let's wrap virtqueue_get_buf to make sure we get it right?

> ---
> The patch was needed for 3.12 stable.
> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 24fd502..de1d6ca 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq, 
> struct sk_buff *head_skb)
>   head_skb->dev->stats.rx_length_errors++;
>   return -EINVAL;
>   }
> + --rq->num;
>   if (unlikely(len > MERGE_BUFFER_LEN)) {
>   pr_debug("%s: rx error: merge buffer too long\n",
>head_skb->dev->name);
> @@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq, 
> struct sk_buff *head_skb)
>   skb_add_rx_frag(curr_skb, num_skb_frags, page,
>   offset, len, MERGE_BUFFER_LEN);
>   }
> - --rq->num;
>   }
>   return 0;
>  }
> -- 
> 1.8.3.2
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb

2013-11-20 Thread Michael S. Tsirkin

On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote:
> When mergeable buffer were used, we only put the first page buf leave the rest
> of buffers in the virt queue. This will cause the driver could not get the
> correct head buffer any more. Fix this by dropping the rest of buffers for 
> this
> packet.
> 
> The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
> (virtio_net: Defer skb allocation in receive path).
> 
> Cc: Rusty Russell 
> Cc: Michael S. Tsirkin 
> Cc: Michael Dalton 
> Cc: Eric Dumazet 
> Cc: Shirley Ma 
> Signed-off-by: Jason Wang 
> ---
> This patch was needed for stable
> ---
>  drivers/net/virtio_net.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7bab4de..24fd502 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq)
>   netif_wake_subqueue(vi->dev, vq2txq(vq));
>  }
>  
> +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf)
> +{
> + char *buf;
> + int len;
> +
> + while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> + --rq->num;
> + put_page(virt_to_head_page(buf));
> + }
> +}
> +

This is the same code we have in receive_mergeable anyway.
So let's reuse that.


>  /* Called from bottom half context */
>  static struct sk_buff *page_to_skb(struct receive_queue *rq,
>  struct page *page, unsigned int offset,
> @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct receive_queue 
> *rq,
>  
>   /* copy small packet so we can reuse these pages for small data */
>   skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
> - if (unlikely(!skb))
> + if (unlikely(!skb)) {
> + if (vi->mergeable_rx_bufs) {
> + hdr = (struct skb_vnet_hdr *)p;
> + drop_mergeable_buffer(rq, hdr->mhdr.num_buffers);
> + }
>   return NULL;
> + }
>  
>   hdr = skb_vnet_hdr(skb);
>  
> -- 
> 1.8.3.2
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb

2013-11-20 Thread Jason Wang

When mergeable buffer were used, we only put the first page buf leave the rest
of buffers in the virt queue. This will cause the driver could not get the
correct head buffer any more. Fix this by dropping the rest of buffers for this
packet.

The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399
(virtio_net: Defer skb allocation in receive path).

Cc: Rusty Russell 
Cc: Michael S. Tsirkin 
Cc: Michael Dalton 
Cc: Eric Dumazet 
Cc: Shirley Ma 
Signed-off-by: Jason Wang 
---
This patch was needed for stable
---
 drivers/net/virtio_net.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7bab4de..24fd502 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq)
netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
+static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf)
+{
+   char *buf;
+   int len;
+
+   while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+   --rq->num;
+   put_page(virt_to_head_page(buf));
+   }
+}
+
 /* Called from bottom half context */
 static struct sk_buff *page_to_skb(struct receive_queue *rq,
   struct page *page, unsigned int offset,
@@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct receive_queue 
*rq,
 
/* copy small packet so we can reuse these pages for small data */
skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
-   if (unlikely(!skb))
+   if (unlikely(!skb)) {
+   if (vi->mergeable_rx_bufs) {
+   hdr = (struct skb_vnet_hdr *)p;
+   drop_mergeable_buffer(rq, hdr->mhdr.num_buffers);
+   }
return NULL;
+   }
 
hdr = skb_vnet_hdr(skb);
 
-- 
1.8.3.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH net 3/3] virtio-net: fix resources leaking when fail to allocate frag skb

2013-11-20 Thread Jason Wang

When we fail to allocate a frag skb, we need put the refcnt of the page and drop
the rest of the buffers. Otherwise the page was leaked and driver won't get a
correct head buffer after this failure.

This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
(virtio_net: migrate mergeable rx buffers to page frag allocators).

Cc: Rusty Russell 
Cc: Michael S. Tsirkin 
Cc: Michael Dalton 
Cc: Eric Dumazet 
Signed-off-by: Jason Wang 
---
The patch was needed for 3.12 stable.
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index de1d6ca..f6f1e20 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -339,9 +339,12 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
 head_skb->dev->name);
len = MERGE_BUFFER_LEN;
}
+   page = virt_to_head_page(buf);
if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) {
struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC);
if (unlikely(!nskb)) {
+   put_page(page);
+   drop_mergeable_buffer(rq, num_buf);
head_skb->dev->stats.rx_dropped++;
return -ENOMEM;
}
@@ -358,7 +361,6 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
head_skb->len += len;
head_skb->truesize += MERGE_BUFFER_LEN;
}
-   page = virt_to_head_page(buf);
offset = buf - (char *)page_address(page);
if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
put_page(page);
-- 
1.8.3.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure

2013-11-20 Thread Jason Wang

We need decrease the rq->num after we can get a buf through
virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the
refill routine won't be triggered under heavy memory stress since the driver may
still think there's enough room.

This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a
(virtio_net: migrate mergeable rx buffers to page frag allocators).

Cc: Rusty Russell 
Cc: Michael S. Tsirkin 
Cc: Michael Dalton 
Cc: Eric Dumazet 
Signed-off-by: Jason Wang 
---
The patch was needed for 3.12 stable.
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 24fd502..de1d6ca 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
head_skb->dev->stats.rx_length_errors++;
return -EINVAL;
}
+   --rq->num;
if (unlikely(len > MERGE_BUFFER_LEN)) {
pr_debug("%s: rx error: merge buffer too long\n",
 head_skb->dev->name);
@@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq, 
struct sk_buff *head_skb)
skb_add_rx_frag(curr_skb, num_skb_frags, page,
offset, len, MERGE_BUFFER_LEN);
}
-   --rq->num;
}
return 0;
 }
-- 
1.8.3.2

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin

On Tue, Nov 19, 2013 at 01:38:16PM -0800, Michael Dalton wrote:
> Great catch Jason. I agree this now raises the larger issue of how to
> handle a memory alloc failure in the middle of receive. As Eric mentioned,
> we can drop the packet and free the remaining (num_buf) frags.
> 
> Michael, perhaps I'm missing something, but why would you prefer
> pre-allocating buffers in this case? If the guest kernel is OOM'ing,
> dropping packets should provide backpressure.
> 
> Also, we could just as easily fail the initial skb alloc in page_to_skb,
> and I think that case also needs to be handled now in the same fashion as
> a memory allocation failure in receive_mergeable.
> 
> Best,
> 
> Mike

Yes I missed this last night. Thanks a lot Eric and Michael for pointing
this out.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin

On Tue, Nov 19, 2013 at 05:34:16PM -0800, Michael Dalton wrote:
> Hi,
> 
> After further reflection I think we're looking at two related issues:
> (a) a memory leak that Jason has identified that occurs when a memory
> allocation fails in receive_mergeable. Jasons commit solves this issue.
> (b) virtio-net does not dequeue all buffers for a packet in the
> case that an error occurs on receive and mergeable receive buffers is
> enabled.
> 
> For (a), this bug is new and due to changes in 2613af0ed18a, and the
> net impact is memory leak on the physical page. However, I believe (b)
> has always been possible in some form because if page_to_skb() returns
> NULL (e.g., due to SKB allocation failure), receive_mergeable is never
> called. AFAICT this is also the behavior prior to 2613af0ed18a.
> 
> The net impact of (b) would be that virtio-net would interpret a packet
> buffer that is in the middle of a mergeable packet as the start of a
> new packet, which is definitely also a bug (and the buffer contents
> could contain bytes that resembled a valid virtio-net header).
> 
> A solution for (b) will require handling both the page_to_skb memory
> allocation failures and the memory allocation failures in
> receive_mergeable introduced by 2613af0ed18a.
> 
> Best,
> 
> Mike


Absolutely. I missed this fact yesterday night but I can see it clearly
in the morning.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb

2013-11-20 Thread Michael S. Tsirkin

On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote:
> On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote:
> 
> > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a
> > it didn't drop packets received from host as far as I can tell.
> > virtio is more like a pipe than a real NIC in this respect.
> 
> Prior/after to this patch, you were not posting buffers, so if packets
> were received on a physical NIC, you were dropping the packets anyway.
>
> It makes no difference at all, adding a cushion might make you feel
> better, but its really not worth it.
> 
> Under memory stress, it makes better sense to drop a super big GRO
> packet (The one needing frag_list extension ...)
> 
> It gives a better signal to the sender to reduce its pressure, and gives
> opportunity to free more of your memory.
> 

OK, but in that case one wonders whether we should do more to free memory?

E.g. imagine that we dropped a packet of a specific TCP flow
because we couldn't allocate a new packet.

What happens now is that the old packet is freed as well.

So quite likely the next packet in queue will get processed
since it will reuse the memory we have just freed.

The next packet and the next after it etc all will have to go through
the net stack until they get at the socket and are dropped then
because we missed a segment.  Even worse, GRO gets disabled so the load
on receiver goes up instead of down.

Sounds like a problem doesn't it?

GRO actually detects it's the same flow and can see packet is
out of sequence. Why doesn't it drop the packet then?
Alternatively, we could (for example using the pre-allocated skb
like I suggested) notify GRO that it should start dropping packets
of this flow.

What do you think?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

38 matches

Mail list logo