TUN_F_UFO change breaks live migration

2014-11-11 Thread Stefan Hajnoczi
Commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 ("drivers/net: Disable
UFO through virtio") breaks live migration of KVM guests from older to
newer host kernels:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3d0ad09412ffe00c9afa201d01effdb6023d09b4

The problem occurs when a guest running on a host kernel without commit
3d0ad0941 in tun.ko attempts to live migration to a host with commit
3d0ad0941.

Live migration fails in QEMU with the following error message:

  virtio-net: saved image requires TUN_F_UFO support

The old host tun.ko advertised support for TUN_F_UFO.  The new host
tun.ko does not and that's why QEMU aborts live migration.  QEMU cannot
change the features of a running virtio-net device.

tuxcrafter provided logs from two Debian hosts migrating from
3.2.60-1+deb7u3 to 3.2.63-2+deb7u1:

http://paste.debian.net/131264/

I haven't investigated enough to suggest a fix, just wanted to bring it
to your attention.  Soon a lot of people will be hitting this problem as
they upgrade their infrastructure and migrate guests - seems like a
critical issue.

Stefan


pgpJ6PBwnKGBc.pgp
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: TUN_F_UFO change breaks live migration

2014-11-11 Thread Ben Hutchings
On Tue, 2014-11-11 at 10:58 +, Stefan Hajnoczi wrote:
> Commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 ("drivers/net: Disable
> UFO through virtio") breaks live migration of KVM guests from older to
> newer host kernels:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3d0ad09412ffe00c9afa201d01effdb6023d09b4
> 
> The problem occurs when a guest running on a host kernel without commit
> 3d0ad0941 in tun.ko attempts to live migration to a host with commit
> 3d0ad0941.
> 
> Live migration fails in QEMU with the following error message:
> 
>   virtio-net: saved image requires TUN_F_UFO support
> 
> The old host tun.ko advertised support for TUN_F_UFO.  The new host
> tun.ko does not and that's why QEMU aborts live migration.  QEMU cannot
> change the features of a running virtio-net device.

Yes, this is known and was mentioned in the DSA.

> tuxcrafter provided logs from two Debian hosts migrating from
> 3.2.60-1+deb7u3 to 3.2.63-2+deb7u1:
> 
> http://paste.debian.net/131264/
> 
> I haven't investigated enough to suggest a fix, just wanted to bring it
> to your attention.  Soon a lot of people will be hitting this problem as
> they upgrade their infrastructure and migrate guests - seems like a
> critical issue.

You can work around this by making macvtap and tun still claim to
support UFO.  They continue to support it even if it's not advertised
because the tap features don't reliably get propagated to virtio
devices.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
 - Carolyn Scheppner


signature.asc
Description: This is a digitally signed message part
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-11 Thread Pawel Moll
On Tue, 2014-11-04 at 09:35 +, Shannon Zhao wrote:
> As the current virtio-mmio only support single irq,
> so some advanced features such as vhost-net with irqfd
> are not supported. And the net performance is not
> the best without vhost-net and irqfd supporting.

Could you, please, help understanding me where does the main issue is?
Is it about:

1. The fact that the existing implementation blindly kicks all queues,
instead only of the updated ones?

or:

2. Literally having a dedicated interrupt line (remember, we're talking
"real" interrupts here, not message signalled ones) per queue, so they
can be handled by different processors at the same time?

Now, if it's only about 1, the simplest solution would be to extend the
VIRTIO_MMIO_INTERRUPT_STATUS register to signal up to 30 queues
"readiness" in bits 2-31, still keeping bit 0 as a "combined"
VIRTIO_MMIO_INT_VRING. In case when VIRTIO_MMIO_INT_VRING is set and
none of the "individual" bits is (a device which doesn't support this
feature or one that has more than 30 queues and of of those is ready) we
would fall back to the original "kick all queues" approach. This could
be a useful (and pretty simple) extension. In the worst case scenario it
could be a post-1.0 standard addition, as it would provide backward
compatibility.

However, if it's about 2, we're talking larger changes here. From the
device perspective, we can define it as having per-queue (plus one for
config) interrupt output *and* a "combined" output, being simple logical
"or" of all the others. Then, the Device Tree bindings would be used to
express the implementation choices (I'd keep the kernel parameter
approach supporting the single interrupt case only). This is a very
popular and well understood approach for memory mapped peripherals (for
example, see the . It allows the system integrator to make a decision
when it's coming to latency vs number interrupt lines trade off. The
main issue is that we can't really impose a limit on a number of queues,
therefore on a number of interrupts. This would require adding a new
"interrupt acknowledge" register, which would take a number of the queue
(or a symbolic value for the config one) instead of a bit mask. And I
must say that I'm not enjoying the idea of such substantial change to
the specification that late in the process... (in other words: you'll
have to put extra effort into convincing me :-)

> This patch support virtio-mmio to request multiple
> irqs like virtio-pci. With this patch and qemu assigning
> multiple irqs for virtio-mmio device, it's ok to use
> vhost-net with irqfd on arm/arm64.

Could you please tell me how many queues (interrupts) are we talking
about in this case? 5? A dozen? Hundreds?

Disclaimer: I have no personal experience with virtio and network (due
to the fact how our Fast Models are implemented, I mostly us block
devices and 9p protocol over virtio and I get enough performance from
them :-).

> As arm doesn't support msi-x now, 

To be precise: "ARM" does "support" MSI-X :-) (google for GICv2m)

The correct statement would be: "normal memory mapped devices have no
interface for message signalled interrupts (like MSI-X)"

> we use GSI for multiple irq. 

I'm not sure what GSI stands for, but looking at the code I assume it's
just a "normal" peripheral interrupt.

> In this patch we use "vm_try_to_find_vqs"
> to check whether multiple irqs are supported like
> virtio-pci.

Yeah, I can see that you have followed virtio-pci quite literally. I'm
particularly not convinced to the one interrupt for config, one for all
queues option. Doesn't make any sense to me here.

> Is this the right direction? is there other ways to
> make virtio-mmio support multiple irq? Hope for feedback.

One point I'd like to make is that the device was intentionally designed
with simplicity in mind first, performance later (something about
"embedded" etc" :-). Changing this assumption is of course possible, but
- I must say - makes me slightly uncomfortable... The extensions we're
discussing here seem doable, but I've noticed your other patches doing
with a shared memory region and I didn't like them at all, sorry.

I see the subject has been already touched in the discussions, but let
me bring PCI to the surface again. We're getting more server-class SOCs
in the market, which obviously bring PCI with them to both arm and arm64
world, something unheard of in the "mobile past". I believe the PCI
patches for the arm64 have been already merged in the kernel.

Therefore: I'm not your boss so, obviously, I can't tell you what to do,
but could you consider redirecting your efforts into getting the "ARM
PCI" up and running in qemu so you can simply use the existing
infrastructure? This would save us a lot of work and pain in doing late
functional changes to the standard and will be probably more
future-proof from your perspective (PCI will happen, sooner or later -
you can make it sooner ;-)

Regards

Pawel

__

Re: kernel BUG at drivers/block/virtio_blk.c:172!

2014-11-11 Thread Dongsu Park
Hi Ming,

On 11.11.2014 08:56, Ming Lei wrote:
> On Tue, Nov 11, 2014 at 7:31 AM, Jens Axboe  wrote:
> > Known, I'm afraid, Ming is looking into it.

Actually I had also tried to reproduce this bug, without success.
But today I happened to know how to trigger the bug, by coincidence,
during testing other things.

Try to run xfstests/generic/034. You'll see the crash immediately.
Tested on a QEMU VM with kernel 3.18-rc4, virtio-blk, dm-flakey and xfs.

> There is one obvious bug which should have been fixed by below
> patch(0001-block-blk-merge-fix-blk_recount_segments.patch"):
> 
> http://marc.info/?l=linux-virtualization&m=141562191719405&q=p3

This patch didn't bring anything to me, as Lukas said.

> And there might be another one, I appreciate someone can post
> log which is printed by patch("blk-seg.patch") in below link if the
> bug still can be triggered even with above fix:
> 
> http://marc.info/?l=linux-virtualization&m=141473040618467&q=p3

"blk_recount_segments: 1-1-1 vcnt-128 segs-128"

As long as I understood so far, the reason is that bi_phys_segments gets
sometimes bigger than queue_max_sectors() after blk_recount_segments().
That happens no matter whether segments are recalculated or not.

I'm not completely sure about what to do, but how about the attached patch?
It seems to work, according to several xfstests runs.

Cheers,
Dongsu



>From 1db98323931eb9ab430116c4d909d2c16e22 Mon Sep 17 00:00:00 2001
From: Dongsu Park 
Date: Tue, 11 Nov 2014 13:10:59 +0100
Subject: [RFC PATCH] blk-merge: make bi_phys_segments consider also
 queue_max_segments()

When recounting the number of physical segments, the number of max
segments of request_queue must be also taken into account.
Otherwise bio->bi_phys_segments could get bigger than
queue_max_segments(). Then this results in virtio_queue_rq() seeing
req->nr_phys_segments that is greater than expected. Although the
initial queue_max_segments was set to (vblk->sg_elems - 2), a request
comes in with a larger value of nr_phys_segments, which triggers the
BUG_ON() condition.

This commit should fix a kernel crash in virtio_blk, which occurs
especially frequently when it runs with blk-mq, device mapper, and xfs.
The simplest way to reproduce this bug is to run xfstests/generic/034.
Note, test 034 requires dm-flakey to be turned on in the kernel config.

See the kernel trace below:
[ cut here ]
kernel BUG at drivers/block/virtio_blk.c:172!
invalid opcode:  [#1] SMP
CPU: 1 PID: 3343 Comm: mount Not tainted 3.18.0-rc4+ #55
RIP: 0010:[]
 [] virtio_queue_rq+0x277/0x280
Call Trace:
 [] __blk_mq_run_hw_queue+0x1a8/0x300
 [] blk_mq_run_hw_queue+0x6d/0x90
 [] blk_sq_make_request+0x23e/0x360
 [] generic_make_request+0xc0/0x110
 [] submit_bio+0x69/0x130
 [] _xfs_buf_ioapply+0x2bd/0x410
 [] ? xlog_bread_noalign+0xa8/0xe0
 [] xfs_buf_submit_wait+0x61/0x1d0
 [] xlog_bread_noalign+0xa8/0xe0
 [] xlog_bread+0x27/0x60
 [] xlog_find_verify_cycle+0xe1/0x190
 [] xlog_find_head+0x2d1/0x3c0
 [] xlog_find_tail+0x2d/0x3f0
 [] xlog_recover+0x1e/0xf0
 [] xfs_log_mount+0x24c/0x2c0
 [] xfs_mountfs+0x44b/0x7a0
 [] xfs_fs_fill_super+0x2ba/0x330
 [] mount_bdev+0x194/0x1d0
 [] ? xfs_parseargs+0xbe0/0xbe0
 [] xfs_fs_mount+0x15/0x20
 [] mount_fs+0x39/0x1b0
 [] ? __alloc_percpu+0x15/0x20
 [] vfs_kern_mount+0x67/0x110
 [] do_mount+0x204/0xad0
 [] SyS_mount+0x8b/0xe0
 [] system_call_fastpath+0x12/0x17
RIP [] virtio_queue_rq+0x277/0x280
---[ end trace ae3ec6426f011b5d ]---

Signed-off-by: Dongsu Park 
Tested-by: Dongsu Park 
Cc: Ming Lei 
Cc: Jens Axboe 
Cc: Rusty Russell 
Cc: Jeff Layton 
Cc: Dave Chinner 
Cc: "Michael S. Tsirkin" 
Cc: Lukas Czerner 
Cc: Christoph Hellwig 
Cc: virtualization@lists.linux-foundation.org
---
 block/blk-merge.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b3ac40a..d808601 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -103,13 +103,16 @@ void blk_recount_segments(struct request_queue *q, struct 
bio *bio)
 
if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) &&
merge_not_need)
-   bio->bi_phys_segments = bio->bi_vcnt;
+   bio->bi_phys_segments = min_t(unsigned int, bio->bi_vcnt,
+   queue_max_segments(q));
else {
struct bio *nxt = bio->bi_next;
 
bio->bi_next = NULL;
-   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio,
-   no_sg_merge && merge_not_need);
+   bio->bi_phys_segments = min_t(unsigned int,
+   __blk_recalc_rq_segments(q, bio, no_sg_merge
+   && merge_not_need),
+   queue_max_segments(q));
bio->bi_next = nxt;
}
 
-- 
1.9.3

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
ht

Re: TUN_F_UFO change breaks live migration

2014-11-11 Thread Michael S. Tsirkin
On Tue, Nov 11, 2014 at 12:17:26PM +, Ben Hutchings wrote:
> On Tue, 2014-11-11 at 10:58 +, Stefan Hajnoczi wrote:
> > Commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 ("drivers/net: Disable
> > UFO through virtio") breaks live migration of KVM guests from older to
> > newer host kernels:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3d0ad09412ffe00c9afa201d01effdb6023d09b4
> > 
> > The problem occurs when a guest running on a host kernel without commit
> > 3d0ad0941 in tun.ko attempts to live migration to a host with commit
> > 3d0ad0941.
> > 
> > Live migration fails in QEMU with the following error message:
> > 
> >   virtio-net: saved image requires TUN_F_UFO support
> > 
> > The old host tun.ko advertised support for TUN_F_UFO.  The new host
> > tun.ko does not and that's why QEMU aborts live migration.  QEMU cannot
> > change the features of a running virtio-net device.
> 
> Yes, this is known and was mentioned in the DSA.
> 
> > tuxcrafter provided logs from two Debian hosts migrating from
> > 3.2.60-1+deb7u3 to 3.2.63-2+deb7u1:
> > 
> > http://paste.debian.net/131264/
> > 
> > I haven't investigated enough to suggest a fix, just wanted to bring it
> > to your attention.  Soon a lot of people will be hitting this problem as
> > they upgrade their infrastructure and migrate guests - seems like a
> > critical issue.
> 
> You can work around this by making macvtap and tun still claim to
> support UFO.

If this is what we want userspace to do, let's just put the
feature flag back?


Basically userspace assumed that features will only
ever be added, never removed, so this change is
breaking it.

>  They continue to support it even if it's not advertised
> because the tap features don't reliably get propagated to virtio
> devices.
> 
> Ben.

Hmm I don't understand this last sentence.
features are actually reliably propagated to virtio devices.


> -- 
> Ben Hutchings
> Experience is directly proportional to the value of equipment destroyed.
>  - Carolyn Scheppner


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: TUN_F_UFO change breaks live migration

2014-11-11 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 11 Nov 2014 17:57:50 +0200

> Basically userspace assumed that features will only
> ever be added, never removed, so this change is
> breaking it.

I essentially agree.

We can't just toss feature bits like this which have been present for so
long.

"There is no way I can figure out how to easily make it work" is not an
excuse for breaking existing userspace like this.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: kernel BUG at drivers/block/virtio_blk.c:172!

2014-11-11 Thread Ming Lei
On Tue, Nov 11, 2014 at 11:42 PM, Dongsu Park
 wrote:
> Hi Ming,
>
> On 11.11.2014 08:56, Ming Lei wrote:
>> On Tue, Nov 11, 2014 at 7:31 AM, Jens Axboe  wrote:
>> > Known, I'm afraid, Ming is looking into it.
>
> Actually I had also tried to reproduce this bug, without success.
> But today I happened to know how to trigger the bug, by coincidence,
> during testing other things.
>
> Try to run xfstests/generic/034. You'll see the crash immediately.
> Tested on a QEMU VM with kernel 3.18-rc4, virtio-blk, dm-flakey and xfs.
>
>> There is one obvious bug which should have been fixed by below
>> patch(0001-block-blk-merge-fix-blk_recount_segments.patch"):
>>
>> http://marc.info/?l=linux-virtualization&m=141562191719405&q=p3
>
> This patch didn't bring anything to me, as Lukas said.
>
>> And there might be another one, I appreciate someone can post
>> log which is printed by patch("blk-seg.patch") in below link if the
>> bug still can be triggered even with above fix:
>>
>> http://marc.info/?l=linux-virtualization&m=141473040618467&q=p3
>
> "blk_recount_segments: 1-1-1 vcnt-128 segs-128"
>
> As long as I understood so far, the reason is that bi_phys_segments gets
> sometimes bigger than queue_max_sectors() after blk_recount_segments().
> That happens no matter whether segments are recalculated or not.

I know the problem now, since bi_vcnt can't be used for cloned bio,
and the patch I sent last time is wrong too.

> I'm not completely sure about what to do, but how about the attached patch?
> It seems to work, according to several xfstests runs.
>
> Cheers,
> Dongsu
>
> 
>
> From 1db98323931eb9ab430116c4d909d2c16e22 Mon Sep 17 00:00:00 2001
> From: Dongsu Park 
> Date: Tue, 11 Nov 2014 13:10:59 +0100
> Subject: [RFC PATCH] blk-merge: make bi_phys_segments consider also
>  queue_max_segments()
>
> When recounting the number of physical segments, the number of max
> segments of request_queue must be also taken into account.
> Otherwise bio->bi_phys_segments could get bigger than
> queue_max_segments(). Then this results in virtio_queue_rq() seeing
> req->nr_phys_segments that is greater than expected. Although the
> initial queue_max_segments was set to (vblk->sg_elems - 2), a request
> comes in with a larger value of nr_phys_segments, which triggers the
> BUG_ON() condition.
>
> This commit should fix a kernel crash in virtio_blk, which occurs
> especially frequently when it runs with blk-mq, device mapper, and xfs.
> The simplest way to reproduce this bug is to run xfstests/generic/034.
> Note, test 034 requires dm-flakey to be turned on in the kernel config.
>
> See the kernel trace below:
> [ cut here ]
> kernel BUG at drivers/block/virtio_blk.c:172!
> invalid opcode:  [#1] SMP
> CPU: 1 PID: 3343 Comm: mount Not tainted 3.18.0-rc4+ #55
> RIP: 0010:[]
>  [] virtio_queue_rq+0x277/0x280
> Call Trace:
>  [] __blk_mq_run_hw_queue+0x1a8/0x300
>  [] blk_mq_run_hw_queue+0x6d/0x90
>  [] blk_sq_make_request+0x23e/0x360
>  [] generic_make_request+0xc0/0x110
>  [] submit_bio+0x69/0x130
>  [] _xfs_buf_ioapply+0x2bd/0x410
>  [] ? xlog_bread_noalign+0xa8/0xe0
>  [] xfs_buf_submit_wait+0x61/0x1d0
>  [] xlog_bread_noalign+0xa8/0xe0
>  [] xlog_bread+0x27/0x60
>  [] xlog_find_verify_cycle+0xe1/0x190
>  [] xlog_find_head+0x2d1/0x3c0
>  [] xlog_find_tail+0x2d/0x3f0
>  [] xlog_recover+0x1e/0xf0
>  [] xfs_log_mount+0x24c/0x2c0
>  [] xfs_mountfs+0x44b/0x7a0
>  [] xfs_fs_fill_super+0x2ba/0x330
>  [] mount_bdev+0x194/0x1d0
>  [] ? xfs_parseargs+0xbe0/0xbe0
>  [] xfs_fs_mount+0x15/0x20
>  [] mount_fs+0x39/0x1b0
>  [] ? __alloc_percpu+0x15/0x20
>  [] vfs_kern_mount+0x67/0x110
>  [] do_mount+0x204/0xad0
>  [] SyS_mount+0x8b/0xe0
>  [] system_call_fastpath+0x12/0x17
> RIP [] virtio_queue_rq+0x277/0x280
> ---[ end trace ae3ec6426f011b5d ]---
>
> Signed-off-by: Dongsu Park 
> Tested-by: Dongsu Park 
> Cc: Ming Lei 
> Cc: Jens Axboe 
> Cc: Rusty Russell 
> Cc: Jeff Layton 
> Cc: Dave Chinner 
> Cc: "Michael S. Tsirkin" 
> Cc: Lukas Czerner 
> Cc: Christoph Hellwig 
> Cc: virtualization@lists.linux-foundation.org
> ---
>  block/blk-merge.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index b3ac40a..d808601 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -103,13 +103,16 @@ void blk_recount_segments(struct request_queue *q, 
> struct bio *bio)
>
> if (no_sg_merge && !bio_flagged(bio, BIO_CLONED) &&
> merge_not_need)
> -   bio->bi_phys_segments = bio->bi_vcnt;
> +   bio->bi_phys_segments = min_t(unsigned int, bio->bi_vcnt,
> +   queue_max_segments(q));
> else {
> struct bio *nxt = bio->bi_next;
>
> bio->bi_next = NULL;
> -   bio->bi_phys_segments = __blk_recalc_rq_segments(q, bio,
> -   no_sg_merge && merge_not_need);
> +   bio->bi_phys_segments 

Re: TUN_F_UFO change breaks live migration

2014-11-11 Thread Ben Hutchings
On Tue, 2014-11-11 at 17:57 +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 11, 2014 at 12:17:26PM +, Ben Hutchings wrote:
> > On Tue, 2014-11-11 at 10:58 +, Stefan Hajnoczi wrote:
> > > Commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 ("drivers/net: Disable
> > > UFO through virtio") breaks live migration of KVM guests from older to
> > > newer host kernels:
> > > 
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3d0ad09412ffe00c9afa201d01effdb6023d09b4
> > > 
> > > The problem occurs when a guest running on a host kernel without commit
> > > 3d0ad0941 in tun.ko attempts to live migration to a host with commit
> > > 3d0ad0941.
> > > 
> > > Live migration fails in QEMU with the following error message:
> > > 
> > >   virtio-net: saved image requires TUN_F_UFO support
> > > 
> > > The old host tun.ko advertised support for TUN_F_UFO.  The new host
> > > tun.ko does not and that's why QEMU aborts live migration.  QEMU cannot
> > > change the features of a running virtio-net device.
> > 
> > Yes, this is known and was mentioned in the DSA.
> > 
> > > tuxcrafter provided logs from two Debian hosts migrating from
> > > 3.2.60-1+deb7u3 to 3.2.63-2+deb7u1:
> > > 
> > > http://paste.debian.net/131264/
> > > 
> > > I haven't investigated enough to suggest a fix, just wanted to bring it
> > > to your attention.  Soon a lot of people will be hitting this problem as
> > > they upgrade their infrastructure and migrate guests - seems like a
> > > critical issue.
> > 
> > You can work around this by making macvtap and tun still claim to
> > support UFO.
> 
> If this is what we want userspace to do, let's just put the
> feature flag back?

Let's not *just* put the feature flag back.  I accept this is probably
needed as a workaround, but UFO/IPv6 still won't work correctly over
virtio.

> Basically userspace assumed that features will only
> ever be added, never removed, so this change is
> breaking it.

OK.

> >  They continue to support it even if it's not advertised
> > because the tap features don't reliably get propagated to virtio
> > devices.
> > 
> > Ben.
> 
> Hmm I don't understand this last sentence.
> features are actually reliably propagated to virtio devices.

They might be when using current QEMU and libvirt on the host.  They
weren't when I tested on Debian stable.  The warnings about 'using
disabled UFO feature' are reliably triggered if the host's tap driver is
patched and the guest's virtio_net driver is not.

Ben.

-- 
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
 - Carolyn Scheppner


signature.asc
Description: This is a digitally signed message part
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH net] Revert "drivers/net: Disable UFO through virtio" in macvtap and tun

2014-11-11 Thread Ben Hutchings
This reverts commit 88e0e0e5aa722b193c8758c8b45d041de5316924 for
the tap drivers, but leaves UFO disabled in virtio_net.

libvirt at least assumes that tap features will never be dropped
in new kernel versions, and doing so prevents migration of VMs to
the never kernel version while they are running with virtio net
devices.

Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
Signed-off-by: Ben Hutchings 
---
Compile-tested only.

Ben.

 drivers/net/macvtap.c | 13 -
 drivers/net/tun.c | 19 ---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 6f226de..aeaeb6d 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -66,7 +66,7 @@ static struct cdev macvtap_cdev;
 static const struct proto_ops macvtap_socket_ops;
 
 #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
- NETIF_F_TSO6)
+ NETIF_F_TSO6 | NETIF_F_UFO)
 #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
 #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
 
@@ -570,8 +570,6 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
-   pr_warn_once("macvtap: %s: using disabled UFO feature; 
please fix this program\n",
-current->comm);
gso_type = SKB_GSO_UDP;
if (skb->protocol == htons(ETH_P_IPV6))
ipv6_proxy_select_ident(skb);
@@ -619,6 +617,8 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff 
*skb,
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+   else if (sinfo->gso_type & SKB_GSO_UDP)
+   vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
else
BUG();
if (sinfo->gso_type & SKB_GSO_TCP_ECN)
@@ -953,6 +953,9 @@ static int set_offload(struct macvtap_queue *q, unsigned 
long arg)
if (arg & TUN_F_TSO6)
feature_mask |= NETIF_F_TSO6;
}
+
+   if (arg & TUN_F_UFO)
+   feature_mask |= NETIF_F_UFO;
}
 
/* tun/tap driver inverts the usage for TSO offloads, where
@@ -963,7 +966,7 @@ static int set_offload(struct macvtap_queue *q, unsigned 
long arg)
 * When user space turns off TSO, we turn off GSO/LRO so that
 * user-space will not receive TSO frames.
 */
-   if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
+   if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
features |= RX_OFFLOADS;
else
features &= ~RX_OFFLOADS;
@@ -1064,7 +1067,7 @@ static long macvtap_ioctl(struct file *file, unsigned int 
cmd,
case TUNSETOFFLOAD:
/* let the user check for future flags */
if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
-   TUN_F_TSO_ECN))
+   TUN_F_TSO_ECN | TUN_F_UFO))
return -EINVAL;
 
rtnl_lock();
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7302398..a0987d1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -175,7 +175,7 @@ struct tun_struct {
struct net_device   *dev;
netdev_features_t   set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
- NETIF_F_TSO6)
+ NETIF_F_TSO6|NETIF_F_UFO)
 
int vnet_hdr_sz;
int sndbuf;
@@ -1152,20 +1152,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
struct tun_file *tfile,
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
-   {
-   static bool warned;
-
-   if (!warned) {
-   warned = true;
-   netdev_warn(tun->dev,
-   "%s: using disabled UFO feature; 
please fix this program\n",
-   current->comm);
-   }
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
if (skb->protocol == htons(ETH_P_IPV6))
ipv6_proxy_select_ident(skb);
break;
-   }
default:
tun->dev->stats.rx_frame_errors++;
kfree_skb(skb);
@@ -1265,6 +1255,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,

Re: [PATCH net] Revert "drivers/net: Disable UFO through virtio" in macvtap and tun

2014-11-11 Thread David Miller
From: Ben Hutchings 
Date: Tue, 11 Nov 2014 17:12:58 +

> This reverts commit 88e0e0e5aa722b193c8758c8b45d041de5316924 for
> the tap drivers, but leaves UFO disabled in virtio_net.
> 
> libvirt at least assumes that tap features will never be dropped
> in new kernel versions, and doing so prevents migration of VMs to
> the never kernel version while they are running with virtio net
> devices.
> 
> Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
> Signed-off-by: Ben Hutchings 
> ---
> Compile-tested only.

After you get some Tested-by:, please resubmit to netdev.

Thanks.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


CLUSTER 2015 - Call for Workshops

2014-11-11 Thread Ioan Raicu

IEEE International Conference on Cluster Computing
September 8-11, 2015
Chicago, IL, USA
https://press3.mcs.anl.gov/ieeecluster2015/

--
...Follow us on Facebook athttps://www.facebook.com/ieee.cluster
...Follow us on Twitter athttps://twitter.com/IEEECluster
...Follow us on Linkedin at
https://www.linkedin.com/groups/IEEE-International-Conference-on-Cluster-7428925 


...Follow us on RenRen athttp://page.renren.com/601871401

--

CALL FOR WORKSHOPS

Experts in cluster computing are invited to share their expertise with
the community by submitting proposals for workshops to be co-located
with the Cluster 2015 conference that will be held on September 8-11,
2015 in Chicago, Illinois. The Cluster 2015 workshop program will
provide an opportunity for authors to present preliminary work related
to the topics covered by Cluster 2015 and, thus, will give attendees
access to the latest work in progress in the field. Cluster 2015
invites proposals for workshops on a broad range of topics on the
design of clusters and related technologies and on effectively
managing or using cluster resources and services. All workshops will
be held on September 8, 2015.


TOPIC CATEGORIES

Topic Categories of interest include but are not limited to the following:

Cluster node and system hardware;
Cluster/distributed operating and runtime systems;
Parallel programming models, language, runtimes and environments;
Performance evaluation, analysis and optimization;
Fault-tolerance, resilience, reliability and high availability;
Data management and distributed and parallel file systems;
Visualization and data analytics;
Cluster management tools including resource management;
Power management, energy and cooling;
High performance networks and communication systems;
Clouds and distributed systems;
Security.


SUBMISSION GUIDELINES

Proposals are limited to a 3 page description that should include the 
following:


A title, an abstract (400 words maximum),and a syllabus;
Discussion of the topic's relevance and workshop goals;
A description of the target audience;
An indication of half-day or full-day duration;
If applicable, details on previous instantiations of the workshop,
including attendance numbers.
In addition to the 3 page description, please include the following:

A brief biographical sketch of each workshop organizer (2 pages per 
organizer).

Workshop proposal will be evaluated as they are received. Decisions
will be made based on the proposal content and its relationship to any
workshop proposals that have been accepted or received at the time the
proposal is received. Since this policy may result in a the workshop
program being completed early, Cluster 2015 reserves the right to
close workshop submissions prior to the date listed below.

A single pdf file with all the material described above should be
submitted to the Cluster 2015 conference on-line submission system
under the workshop track
(https://easychair.org/conferences/?conf=cluster2015workshops).

Workshop length will be determined based on the following policy:

Workshops that accept ten or more papers will be provided space for a
full day program;
Workshops that accept seven to nine papers will be provided space for
four and a half hours (not including breaks);
Workshops that accept five or six papers will be provided space for
three hours (not including breaks);
Workshops that accept three or four papers will be provided space for
a half hour per paper;
Workshops with fewer accepted papers will be canceled.
Cluster 2015 will consider increasing the time allocated to a workshop
if space is available.

Cluster 2015 will provide free conference registration for one keynote
speaker to workshops that accept ten or more papers. The free
registration cannot be used by a workshop organizer, even if the
organizer presents a keynote.


IMPORTANT DATES

Workshop proposals are selected and notifications are sent on a
first-come basis. Three workshops have already been selected!


CLUSTER 2015 WORKSHOP AND TUTORIAL CHAIR

Bronis R. de Supinski, Lawrence Livermore National Laboratory (first
name at llnl dot gov).

--
=
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=
Editor: IEEE TCC, Springer Cluster, Springer JoCCASA
Chair:  IEEE/ACM MTAGS, ACM ScienceCloud
=
Cel:  1-847-722-0876
Office:   1-312-567-5704
Email:ira...@cs.iit.edu
Web:  http://www.cs.iit.edu/~iraicu/
Web:  http://datasys.cs.iit.edu/
LinkedIn: http://www.linkedin.com/in/ioanr

Re: [PATCH v4 3/6] hw_random: use reference counts on each struct hwrng.

2014-11-11 Thread Rusty Russell
Amos Kong  writes:
> From: Rusty Russell 
>
> current_rng holds one reference, and we bump it every time we want
> to do a read from it.
>
> This means we only hold the rng_mutex to grab or drop a reference,
> so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
> block on read of /dev/hwrng.
>
> Using a kref is overkill (we're always under the rng_mutex), but
> a standard pattern.
>
> This also solves the problem that the hwrng_fillfn thread was
> accessing current_rng without a lock, which could change (eg. to NULL)
> underneath it.
>
> v4: decrease last reference for triggering the cleanup

This doesn't make any sense:

> +static void drop_current_rng(void)
> +{
> + struct hwrng *rng = current_rng;
> +
> + BUG_ON(!mutex_is_locked(&rng_mutex));
> + if (!current_rng)
> + return;
> +
> + /* release current_rng reference */
> + kref_put(¤t_rng->ref, cleanup_rng);
> + current_rng = NULL;
> +
> + /* decrease last reference for triggering the cleanup */
> + kref_put(&rng->ref, cleanup_rng);
> +}

Why would it drop the refcount twice?  This doesn't make sense.

Hmm, because you added kref_init, which initializes the reference count
to 1, you created this bug.

Leave out the kref_init, and let it naturally be 0 (until, and if, it
becomes current_rng).  Add a comment if you want.

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 4/6] hw_random: fix unregister race.

2014-11-11 Thread Rusty Russell
Amos Kong  writes:
> From: Rusty Russell 
>
> The previous patch added one potential problem: we can still be
> reading from a hwrng when it's unregistered.  Add a wait for zero
> in the hwrng_unregister path.
>
> v4: add cleanup_done flag to insure that cleanup is done

That's a bit weird.  The usual pattern would be to hold a reference
until we're actually finished, but this reference is a bit weird.

We hold the mutex across cleanup, so we could grab that but we have to
take care sleeping inside wait_event, otherwise Peter will have to fix
my code again :)

AFAICT the wake_woken() stuff isn't merged yet, so your patch will
have to do for now.

> @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
>  
>   if (rng->cleanup)
>   rng->cleanup(rng);
> + rng->cleanup_done = true;
> + wake_up_all(&rng_done);
>  }
>  
>  static void set_current_rng(struct hwrng *rng)
> @@ -536,6 +539,11 @@ void hwrng_unregister(struct hwrng *rng)
>   kthread_stop(hwrng_fill);
>   } else
>   mutex_unlock(&rng_mutex);
> +
> + /* Just in case rng is reading right now, wait. */
> + wait_event(rng_done, rng->cleanup_done &&
> +atomic_read(&rng->ref.refcount) == 0);
> +

The atomic_read() isn't necessary here.

However, you should probably init cleanup_done in hwrng_register().
(Probably noone does unregister then register, but let's be clear).

Thanks,
Rusty.

>  }
>  EXPORT_SYMBOL_GPL(hwrng_unregister);
>  
> diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
> index c212e71..7832e50 100644
> --- a/include/linux/hw_random.h
> +++ b/include/linux/hw_random.h
> @@ -46,6 +46,7 @@ struct hwrng {
>   /* internal. */
>   struct list_head list;
>   struct kref ref;
> + bool cleanup_done;
>  };
>  
>  /** Register a new Hardware Random Number Generator driver. */
> -- 
> 1.9.3
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 4/6] hw_random: fix unregister race.

2014-11-11 Thread Herbert Xu
On Mon, Nov 10, 2014 at 09:47:27PM +0800, Herbert Xu wrote:
> On Mon, Nov 03, 2014 at 11:56:24PM +0800, Amos Kong wrote:
> >
> > @@ -98,6 +99,8 @@ static inline void cleanup_rng(struct kref *kref)
> >  
> > if (rng->cleanup)
> > rng->cleanup(rng);
> 
> You need a compiler barrier here to prevent reordering.

Michael Büsch pointed out that we should actually have a memory
barrier here.  I thought we didn't need it because I was only
worried about the code in my original complaint.  However, expecting
driver writers to use correct synchronisation primitives is surely
asking too much.

So let's add an smp_wmb() here to ensure all side-effects of
cleanup is visible.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization