Re: [Qemu-devel] [libvirt] Problems using netdev_del+netdev_add w/o corresponding device_del+device_add

2012-10-16 Thread Stefan Hajnoczi
On Mon, Oct 15, 2012 at 10:25:58AM +0100, Daniel P. Berrange wrote:
 On Mon, Oct 15, 2012 at 10:30:07AM +0200, Stefan Hajnoczi wrote:
  On Sat, Oct 13, 2012 at 04:47:14PM -0400, Laine Stump wrote:
   Here is the sequence sent to disconnect only the host side, then
   reconnect it with a new tap device. (although the fd is the same, this
   is because the old tap device had already been closed, so the number is
   just being used - the same thing happens when doing sequential full
   detach/attach cycles, and they all work with no problems):
   
   
   168.750  0x7f8e2c90
   {"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-30"}
   168.762  0x7f8e2c90 {"return": {}, "id": "libvirt-30"}
   168.800  0x7f8e2c90
   {"execute":"getfd","arguments":{"fdname":"fd-net0"},"id":"libvirt-31"}
   (fd=27)
   168.801  0x7f8e2c90 {"return": {}, "id": "libvirt-31"}
   168.801  0x7f8e2c90
   {"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net0","id":"hostnet0"},"id":"libvirt-32"}
   168.802  0x7f8e2c90 {"return": {}, "id": "libvirt-32"}
   168.802  0x7f8e2c90
   {"execute":"set_link","arguments":{"name":"net0","up":true},"id":"libvirt-33"}
   168.803  0x7f8e2c90 {"return": {}, "id": "libvirt-33"}
   
   After this sequence is done, everything about the network device
   *appears* normal on both the guest and host (at least the things I know
   to look at), but no traffic from the host shows up in a tcpdump of the
   interface on the guest, and no traffic from the guest shows up in a
   tcpdump of the tap device on the host.
  
  What you are trying to do isn't possible today.
  
  The device associates with the netdev during initialization only - there
  is no command to associate at a later point in time.  That is why your
  example works only when the device is deleted together with the netdev.
  
  It is certainly possible to implement a command to switch netdevs but
  I'm curious what the use case is.  Is this necessary just because QEMU
  doesn't provide a way to modify the existing netdev or because you
  really want to switch to a completely different netdev?
 
 We have end users who want to be able to dynamically change the guest's
 networking attachment, without restarting/hotplugging devices in the
 guest[1]. If it is just a case of changing from one bridge to another
 bridge, we can do that just by moving the TAP device from one to another.
 This doesn't work if we want to support more general changes in config,
 e.g. from a macvtap setup to a TAP setup, or vice versa.
 
 Another requirement is to be able to start a guest with a null backend
 (akin to not plugging in the ethernet cable on a physical host), and
 then attach it to a bridge/macvtap device on the fly later on (akin
 to then plugging in the ethernet cable once running).

virtio-net presents a challenge because checksum offload and other
advanced features are announced in the virtio feature bits.  Virtio
feature bits don't change during the lifetime of the device and there's
no way to notify existing guests to re-negotiate them besides taking
down the device.

The offload feature bits are tied to the netdev in QEMU, especially the
tap driver's vnet_hdr feature which allows the guest to pass through
offload flags to the host network stack.  QEMU does not emulate these
today and only enables them when the netdev supports vnet_hdr.

In other words, virtio-net is tied to its netdev.  Changing from a -netdev
tap,vhost=on setup to a -netdev user setup is difficult.

Two possibilities:

1. Add offload emulation code to QEMU.

2. Place sufficient checks in QEMU and libvirt so that only safe
   netdev changes can be made.

#1 is unattractive because this code path will rarely be used but is
complex (a bunch of buffer munging and memory management).
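
For illustration, the kind of check that option 2 implies could be sketched as
below. This is only a minimal Python sketch, not QEMU or libvirt code: the
function name netdev_switch_is_safe and the idea of describing the new backend
by a single "has vnet_hdr" boolean are assumptions made for the example. The
bit numbers are the virtio-net offload feature bits from the virtio
specification.

```python
# Virtio-net offload feature bits (bit positions from the virtio specification).
VIRTIO_NET_F_CSUM = 0
VIRTIO_NET_F_GUEST_CSUM = 1
VIRTIO_NET_F_GUEST_TSO4 = 7
VIRTIO_NET_F_GUEST_TSO6 = 8
VIRTIO_NET_F_GUEST_ECN = 9
VIRTIO_NET_F_GUEST_UFO = 10
VIRTIO_NET_F_HOST_TSO4 = 11
VIRTIO_NET_F_HOST_TSO6 = 12
VIRTIO_NET_F_HOST_ECN = 13
VIRTIO_NET_F_HOST_UFO = 14

OFFLOAD_BITS = (
    VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
    VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
    VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
    VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
    VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_HOST_UFO,
)


def offload_mask() -> int:
    """Return a bitmask with every offload-related feature bit set."""
    mask = 0
    for bit in OFFLOAD_BITS:
        mask |= 1 << bit
    return mask


def netdev_switch_is_safe(negotiated_features: int,
                          new_backend_has_vnet_hdr: bool) -> bool:
    """Hypothetical safety check for switching the netdev under a live guest.

    Simplifying assumption: a backend with vnet_hdr can honour every offload
    the guest negotiated; a backend without it can honour none of them.
    """
    if new_backend_has_vnet_hdr:
        return True
    # Without vnet_hdr, the switch is only safe if no offload was negotiated.
    return (negotiated_features & offload_mask()) == 0
```

A guest that negotiated checksum offload could then not be moved to a backend
without vnet_hdr, while a guest that negotiated no offloads could be.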

Are you trying to change netdev without involving the guest?  In that
case the link must stay up and libvirt needs to ensure that the new
netdev will have a compatible network configuration (subnet, gateway, IP
address details).

Stefan



Re: [Qemu-devel] [libvirt] Problems using netdev_del+netdev_add w/o corresponding device_del+device_add

2012-10-16 Thread Markus Armbruster
Laine Stump la...@redhat.com writes:

 On 10/15/2012 05:25 AM, Daniel P. Berrange wrote:
 On Mon, Oct 15, 2012 at 10:30:07AM +0200, Stefan Hajnoczi wrote:
 On Sat, Oct 13, 2012 at 04:47:14PM -0400, Laine Stump wrote:
 Here is the sequence sent to disconnect only the host side, then
 reconnect it with a new tap device. (although the fd is the same, this
 is because the old tap device had already been closed, so the number is
 just being used - the same thing happens when doing sequential full
 detach/attach cycles, and they all work with no problems):


 168.750  0x7f8e2c90
 {"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-30"}
 168.762  0x7f8e2c90 {"return": {}, "id": "libvirt-30"}

This deletes the backend, and leaves the frontend NIC without a backend.
Such a NIC behaves / should behave like it's not connected to anything
(link down).

 168.800  0x7f8e2c90
 {"execute":"getfd","arguments":{"fdname":"fd-net0"},"id":"libvirt-31"}
 (fd=27)
 168.801  0x7f8e2c90 {"return": {}, "id": "libvirt-31"}
 168.801  0x7f8e2c90
 {"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net0","id":"hostnet0"},"id":"libvirt-32"}
 168.802  0x7f8e2c90 {"return": {}, "id": "libvirt-32"}
 168.802  0x7f8e2c90

This creates a new backend, not connected to any frontend.  The fact
that it has the same ID as some deleted backend is completely
immaterial.

 {"execute":"set_link","arguments":{"name":"net0","up":true},"id":"libvirt-33"}
 168.803  0x7f8e2c90 {"return": {}, "id": "libvirt-33"}

This orders the NIC to change the link status to up.  Can't work,
because it's still not connected to anything.  It succeeds anyway, which
could be regarded as a bug.
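
The three annotated steps can be mimicked with a toy model. This is a
deliberately simplified Python sketch of the behaviour described above, not a
representation of QEMU's internals: the class ToyMonitor and its method
can_pass_traffic are hypothetical names invented for the illustration.

```python
class ToyMonitor:
    """Toy model: NICs bind to a backend only when the NIC is created."""

    def __init__(self):
        self.backends = {}     # netdev id -> backend object
        self.nic_peer = {}     # NIC id -> bound backend object, or None
        self.nic_link_up = {}  # NIC id -> link flag

    def netdev_add(self, netdev_id):
        # Creates a backend that is not connected to any frontend.
        self.backends[netdev_id] = object()

    def device_add(self, nic_id, netdev_id):
        # The only place where frontend and backend get connected.
        self.nic_peer[nic_id] = self.backends[netdev_id]
        self.nic_link_up[nic_id] = True

    def netdev_del(self, netdev_id):
        # Deletes the backend and leaves any attached NIC without a peer.
        backend = self.backends.pop(netdev_id)
        for nic_id, peer in list(self.nic_peer.items()):
            if peer is backend:
                self.nic_peer[nic_id] = None

    def set_link(self, nic_id, up):
        # Succeeds regardless of whether the NIC has a peer.
        self.nic_link_up[nic_id] = up

    def can_pass_traffic(self, nic_id):
        return self.nic_link_up[nic_id] and self.nic_peer[nic_id] is not None


monitor = ToyMonitor()
monitor.netdev_add("hostnet0")
monitor.device_add("net0", "hostnet0")
working_before = monitor.can_pass_traffic("net0")

# The sequence from the log: same ID, but a brand-new, unconnected backend.
monitor.netdev_del("hostnet0")
monitor.netdev_add("hostnet0")
monitor.set_link("net0", True)
working_after = monitor.can_pass_traffic("net0")
```

In this model working_before is True and working_after is False: the link
flag is up, but the NIC has no peer, which matches the "appears normal but
passes no traffic" symptom.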

 After this sequence is done, everything about the network device
 *appears* normal on both the guest and host (at least the things I know
 to look at), but no traffic from the host shows up in a tcpdump of the
 interface on the guest, and no traffic from the guest shows up in a
 tcpdump of the tap device on the host.
 What you are trying to do isn't possible today.

 Well, at least it's good to know that I should stop trying to make it
 work :-)

 Actually, it's a bit disconcerting that 1) the act of creating a guest
 device is split into two commands, implying that they don't necessarily
 have a hardwired a<-->b relationship although that is the case, and that

It isn't really the case.

Network frontend and backend are really separate things, but...

 2) netdev_add even returns success when you use it in this way. Although
 hindsight is 20/20 and all that, if both a and b are required, and must
 always be in the same order, wouldn't it have made more sense for the
 two steps to be a single command? I suppose this is a byproduct of the
 monitor commands being a direct reflection of the commandline options.
 (At the very least, though, I think netdev_add should report an error if
 the device name alias it uses is already in use by a device.)


 The device associates with the netdev during initialization only - there

... the connection between the two can only be made during frontend
initialization.  Not because of design limitations, just because more
dynamic connecting hasn't been implemented.

 is no command to associate at a later point in time.  That is why your
 example works only when the device is deleted together with the netdev.
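
A rough sketch of the full detach/attach cycle that does work, with the
frontend replaced together with the backend, might look like this in Python.
The command names (device_del, netdev_del, getfd, netdev_add, device_add) are
existing QMP commands and the IDs hostnet0, net0 and fd-net0 come from the log
above; the helper name full_replug_commands, the driver name virtio-net-pci
and the "example-N" command IDs are assumptions for illustration. The sketch
only builds the JSON; it does not talk to a monitor socket.

```python
import json


def full_replug_commands(netdev_id, device_id, fdname, driver):
    """Build QMP commands that replace both the NIC and its backend."""
    return [
        # Remove the frontend first, then the backend it was bound to.
        {"execute": "device_del", "arguments": {"id": device_id}},
        {"execute": "netdev_del", "arguments": {"id": netdev_id}},
        # getfd needs the tap fd passed alongside it (SCM_RIGHTS); not shown.
        {"execute": "getfd", "arguments": {"fdname": fdname}},
        {"execute": "netdev_add",
         "arguments": {"type": "tap", "fd": fdname, "id": netdev_id}},
        # The new frontend binds to the new backend during initialization.
        {"execute": "device_add",
         "arguments": {"driver": driver, "netdev": netdev_id,
                       "id": device_id}},
    ]


commands = full_replug_commands("hostnet0", "net0", "fd-net0",
                                "virtio-net-pci")
for number, command in enumerate(commands, start=1):
    command["id"] = f"example-{number}"
    print(json.dumps(command))
```

Note that device_del for a PCI NIC only requests the unplug and the guest has
to cooperate, so a real client would need to wait for the device to actually
go away before continuing with netdev_del.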

 It is certainly possible to implement a command to switch netdevs

 At this point yes, it would be better to have a new command rather than
 to make netdev_add work in the way I've attempted - this way there would
 be a new command whose presence libvirt could use to decide whether or
 not to support this functionality.

Besides, I'd oppose ID magic like making netdev_add behave differently
when the ID matches some previously used ID.

  but
 I'm curious what the use case is.  Is this necessary just because QEMU
 doesn't provide a way to modify the existing netdev or because you
 really want to switch to a completely different netdev?
 We have end users who want to be able to dynamically change the guest's
 networking attachment, without restarting/hotplugging devices in the
 guest[1]. If it is just a case of changing from one bridge to another
 bridge, we can do that just by moving the TAP device from one to another.
 This doesn't work if we want to support more general changes in config,
 e.g. from a macvtap setup to a TAP setup, or vice versa.

 Beyond that, I haven't determined it conclusively yet, but it so far
 looks to me like a macvtap device can only be linked to a physdev when
 it is created - there is no netlink message to re-link it to a different
 physdev (this is based on my naive examination of the relevant kernel
 source). So if you want to change the attach point for a macvtap-type
 connection, you again need to discard the old macvtap device and create
 a new one, implying that you need to do a new netdev_add.

Wanting to connect a frontend NIC to a different backend seems entirely
fair to me.  Patches welcome :)



Re: [Qemu-devel] [libvirt] Problems using netdev_del+netdev_add w/o corresponding device_del+device_add

2012-10-16 Thread Stefan Hajnoczi
On Mon, Oct 15, 2012 at 11:15:30AM -0400, Laine Stump wrote:
 On 10/15/2012 05:25 AM, Daniel P. Berrange wrote:
  On Mon, Oct 15, 2012 at 10:30:07AM +0200, Stefan Hajnoczi wrote:
  On Sat, Oct 13, 2012 at 04:47:14PM -0400, Laine Stump wrote:
  Here is the sequence sent to disconnect only the host side, then
  reconnect it with a new tap device. (although the fd is the same, this
  is because the old tap device had already been closed, so the number is
  just being used - the same thing happens when doing sequential full
  detach/attach cycles, and they all work with no problems):
 
 
  168.750  0x7f8e2c90
  {"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-30"}
  168.762  0x7f8e2c90 {"return": {}, "id": "libvirt-30"}
  168.800  0x7f8e2c90
  {"execute":"getfd","arguments":{"fdname":"fd-net0"},"id":"libvirt-31"}
  (fd=27)
  168.801  0x7f8e2c90 {"return": {}, "id": "libvirt-31"}
  168.801  0x7f8e2c90
  {"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net0","id":"hostnet0"},"id":"libvirt-32"}
  168.802  0x7f8e2c90 {"return": {}, "id": "libvirt-32"}
  168.802  0x7f8e2c90
  {"execute":"set_link","arguments":{"name":"net0","up":true},"id":"libvirt-33"}
  168.803  0x7f8e2c90 {"return": {}, "id": "libvirt-33"}
 
  After this sequence is done, everything about the network device
  *appears* normal on both the guest and host (at least the things I know
  to look at), but no traffic from the host shows up in a tcpdump of the
  interface on the guest, and no traffic from the guest shows up in a
  tcpdump of the tap device on the host.
  What you are trying to do isn't possible today.
 
 Well, at least it's good to know that I should stop trying to make it
 work :-)
 
 Actually, it's a bit disconcerting that 1) the act of creating a guest
 device is split into two commands, implying that they don't necessarily
 have a hardwired a<-->b relationship although that is the case, and that
 2) netdev_add even returns success when you use it in this way. Although
 hindsight is 20/20 and all that, if both a and b are required, and must
 always be in the same order, wouldn't it have made more sense for the
 two steps to be a single command? I suppose this is a byproduct of the
 monitor commands being a direct reflection of the commandline options.
 (At the very least, though, I think netdev_add should report an error if
 the device name alias it uses is already in use by a device.)

The commands are historic (at least to me) and we have to make the best
of them.

   but
  I'm curious what the use case is.  Is this necessary just because QEMU
  doesn't provide a way to modify the existing netdev or because you
  really want to switch to a completely different netdev?
  We have end users who want to be able to dynamically change the guest's
  networking attachment, without restarting/hotplugging devices in the
  guest[1]. If it is just a case of changing from one bridge to another
  bridge, we can do that just by moving the TAP device from one to another.
  This doesn't work if we want to support more general changes in config,
  e.g. from a macvtap setup to a TAP setup, or vice versa.
 
 Beyond that, I haven't determined it conclusively yet, but it so far
 looks to me like a macvtap device can only be linked to a physdev when
 it is created - there is no netlink message to re-link it to a different
 physdev (this is based on my naive examination of the relevant kernel
 source). So if you want to change the attach point for a macvtap-type
 connection, you again need to discard the old macvtap device and create
 a new one, implying that you need to do a new netdev_add.

Yep, I just checked too.  macvlan_dev->lowerdev is only set in
macvlan_common_newlink().  There is no way to change it once the link
has been created.
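
Because the lower device is fixed when the link is created, moving a macvtap
attachment to another physdev comes down to deleting and recreating the link.
The following Python sketch only builds the corresponding iproute2 command
lines and does not run them; the helper name macvtap_reattach_commands and the
device names macvtap0 and eth1 are hypothetical, and bridge mode is just an
assumed default.

```python
def macvtap_reattach_commands(name, new_lowerdev, mode="bridge"):
    """Build `ip` commands that recreate a macvtap link on a new lower device.

    There is no way to re-link an existing macvtap device, so the old link is
    deleted and a new one with the same name is created on the new lowerdev.
    """
    return [
        ["ip", "link", "delete", name],
        ["ip", "link", "add", "link", new_lowerdev, "name", name,
         "type", "macvtap", "mode", mode],
        ["ip", "link", "set", name, "up"],
    ]


for argv in macvtap_reattach_commands("macvtap0", "eth1"):
    print(" ".join(argv))
```

The recreated link is a new device as far as the host is concerned, so a fresh
file descriptor would have to be opened for it and handed to QEMU, which is
where the new netdev_add comes in.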

Stefan



Re: [Qemu-devel] [libvirt] Problems using netdev_del+netdev_add w/o corresponding device_del+device_add

2012-10-16 Thread Daniel P. Berrange
On Tue, Oct 16, 2012 at 10:08:21AM +0200, Stefan Hajnoczi wrote:
 On Mon, Oct 15, 2012 at 10:25:58AM +0100, Daniel P. Berrange wrote:
  On Mon, Oct 15, 2012 at 10:30:07AM +0200, Stefan Hajnoczi wrote:
   On Sat, Oct 13, 2012 at 04:47:14PM -0400, Laine Stump wrote:
    Here is the sequence sent to disconnect only the host side, then
    reconnect it with a new tap device. (although the fd is the same, this
    is because the old tap device had already been closed, so the number is
    just being used - the same thing happens when doing sequential full
    detach/attach cycles, and they all work with no problems):


    168.750  0x7f8e2c90
    {"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-30"}
    168.762  0x7f8e2c90 {"return": {}, "id": "libvirt-30"}
    168.800  0x7f8e2c90
    {"execute":"getfd","arguments":{"fdname":"fd-net0"},"id":"libvirt-31"}
    (fd=27)
    168.801  0x7f8e2c90 {"return": {}, "id": "libvirt-31"}
    168.801  0x7f8e2c90
    {"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net0","id":"hostnet0"},"id":"libvirt-32"}
    168.802  0x7f8e2c90 {"return": {}, "id": "libvirt-32"}
    168.802  0x7f8e2c90
    {"execute":"set_link","arguments":{"name":"net0","up":true},"id":"libvirt-33"}
    168.803  0x7f8e2c90 {"return": {}, "id": "libvirt-33"}

    After this sequence is done, everything about the network device
    *appears* normal on both the guest and host (at least the things I know
    to look at), but no traffic from the host shows up in a tcpdump of the
    interface on the guest, and no traffic from the guest shows up in a
    tcpdump of the tap device on the host.
   
   What you are trying to do isn't possible today.
   
   The device associates with the netdev during initialization only - there
   is no command to associate at a later point in time.  That is why your
   example works only when the device is deleted together with the netdev.
   
   It is certainly possible to implement a command to switch netdevs but
   I'm curious what the use case is.  Is this necessary just because QEMU
   doesn't provide a way to modify the existing netdev or because you
   really want to switch to a completely different netdev?
  
  We have end users who want to be able to dynamically change the guest's
  networking attachment, without restarting/hotplugging devices in the
  guest[1]. If it is just a case of changing from one bridge to another
  bridge, we can do that just by moving the TAP device from one to another.
  This doesn't work if we want to support more general changes in config,
  e.g. from a macvtap setup to a TAP setup, or vice versa.
  
  Another requirement is to be able to start a guest with a null backend
  (akin to not plugging in the ethernet cable on a physical host), and
  then attach it to a bridge/macvtap device on the fly later on (akin
  to then plugging in the ethernet cable once running).
 
 virtio-net presents a challenge because checksum offload and other
 advanced features are announced in the virtio feature bits.  Virtio
 feature bits don't change during the lifetime of the device and there's
 no way to notify existing guests to re-negotiate them besides taking
 down the device.
 
 The offload feature bits are tied to the netdev in QEMU, especially the
 tap driver's vnet_hdr feature which allows the guest to pass through
 offload flags to the host network stack.  QEMU does not emulate these
 today and only enables them when the netdev supports vnet_hdr.
 
 In other words, virtio-net is tied to its netdev.  Changing from a -netdev
 tap,vhost=on setup to a -netdev user setup is difficult.

Urgh, so much for there being a clean separation between frontend
and backend :-(

 Two possibilities:
 
 1. Add offload emulation code to QEMU.
 
 2. Place sufficient checks in QEMU and libvirt so that only safe
    netdev changes can be made.
 
 #1 is unattractive because this code path will rarely be used but is
 complex (a bunch of buffer munging and memory management).

Agreed, sounds like 2 is the only practical option.

 Are you trying to change netdev without involving the guest?  In that
 case the link must stay up and libvirt needs to ensure that the new
 netdev will have a compatible network configuration (subnet, gateway, IP
 address details).

We'd leave the decision about that up to the management tool using these
APIs. In some cases the subnet/ip/gateway/etc might remain the same, in
other cases, the mgmt tool may want to set the link down, change backend
then set the link online again, to make NetworkManager (or whatever)
redo DHCP in the guest.
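
As a rough illustration of the second case, a management tool could bracket
the backend change with set_link commands. This is only a sketch in Python:
set_link and its name/up arguments are the existing QMP command seen in the
log earlier in the thread, but the helper name link_bounce_commands is
hypothetical, and the backend-change step is left as an empty placeholder
because no QMP command for it exists as of this thread.

```python
import json


def link_bounce_commands(nic_id, change_backend_steps):
    """Wrap backend-change commands in set_link down / set_link up.

    Taking the link down and up again is what lets the guest notice the
    change and, for example, redo DHCP.
    """
    link_down = {"execute": "set_link",
                 "arguments": {"name": nic_id, "up": False}}
    link_up = {"execute": "set_link",
               "arguments": {"name": nic_id, "up": True}}
    return [link_down, *change_backend_steps, link_up]


# Placeholder: the commands that switch the backend would go in this list.
backend_change = []
for command in link_bounce_commands("net0", backend_change):
    print(json.dumps(command))
```

When the subnet, gateway and addresses stay the same, the tool could skip the
two set_link commands and leave the link up throughout.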

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [libvirt] Problems using netdev_del+netdev_add w/o corresponding device_del+device_add

2012-10-15 Thread Daniel P. Berrange
On Mon, Oct 15, 2012 at 10:30:07AM +0200, Stefan Hajnoczi wrote:
 On Sat, Oct 13, 2012 at 04:47:14PM -0400, Laine Stump wrote:
  Here is the sequence sent to disconnect only the host side, then
  reconnect it with a new tap device. (although the fd is the same, this
  is because the old tap device had already been closed, so the number is
  just being used - the same thing happens when doing sequential full
  detach/attach cycles, and they all work with no problems):
  
  
  168.750  0x7f8e2c90
  {"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-30"}
  168.762  0x7f8e2c90 {"return": {}, "id": "libvirt-30"}
  168.800  0x7f8e2c90
  {"execute":"getfd","arguments":{"fdname":"fd-net0"},"id":"libvirt-31"}
  (fd=27)
  168.801  0x7f8e2c90 {"return": {}, "id": "libvirt-31"}
  168.801  0x7f8e2c90
  {"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net0","id":"hostnet0"},"id":"libvirt-32"}
  168.802  0x7f8e2c90 {"return": {}, "id": "libvirt-32"}
  168.802  0x7f8e2c90
  {"execute":"set_link","arguments":{"name":"net0","up":true},"id":"libvirt-33"}
  168.803  0x7f8e2c90 {"return": {}, "id": "libvirt-33"}
  
  After this sequence is done, everything about the network device
  *appears* normal on both the guest and host (at least the things I know
  to look at), but no traffic from the host shows up in a tcpdump of the
  interface on the guest, and no traffic from the guest shows up in a
  tcpdump of the tap device on the host.
 
 What you are trying to do isn't possible today.
 
 The device associates with the netdev during initialization only - there
 is no command to associate at a later point in time.  That is why your
 example works only when the device is deleted together with the netdev.
 
 It is certainly possible to implement a command to switch netdevs but
 I'm curious what the use case is.  Is this necessary just because QEMU
 doesn't provide a way to modify the existing netdev or because you
 really want to switch to a completely different netdev?

We have end users who want to be able to dynamically change the guest's
networking attachment, without restarting/hotplugging devices in the
guest[1]. If it is just a case of changing from one bridge to another
bridge, we can do that just by moving the TAP device from one to another.
This doesn't work if we want to support more general changes in config,
e.g. from a macvtap setup to a TAP setup, or vice versa.

Another requirement is to be able to start a guest with a null backend
(akin to not plugging in the ethernet cable on a physical host), and
then attach it to a bridge/macvtap device on the fly later on (akin
to then plugging in the ethernet cable once running).

Regards,
Daniel

[1] Obviously the guest might need to reconfigure its IP or re-run DHCP
though.



Re: [Qemu-devel] [libvirt] Problems using netdev_del+netdev_add w/o corresponding device_del+device_add

2012-10-15 Thread Laine Stump
On 10/15/2012 05:25 AM, Daniel P. Berrange wrote:
 On Mon, Oct 15, 2012 at 10:30:07AM +0200, Stefan Hajnoczi wrote:
 On Sat, Oct 13, 2012 at 04:47:14PM -0400, Laine Stump wrote:
 Here is the sequence sent to disconnect only the host side, then
 reconnect it with a new tap device. (although the fd is the same, this
 is because the old tap device had already been closed, so the number is
 just being used - the same thing happens when doing sequential full
 detach/attach cycles, and they all work with no problems):


 168.750  0x7f8e2c90
 {"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-30"}
 168.762  0x7f8e2c90 {"return": {}, "id": "libvirt-30"}
 168.800  0x7f8e2c90
 {"execute":"getfd","arguments":{"fdname":"fd-net0"},"id":"libvirt-31"}
 (fd=27)
 168.801  0x7f8e2c90 {"return": {}, "id": "libvirt-31"}
 168.801  0x7f8e2c90
 {"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net0","id":"hostnet0"},"id":"libvirt-32"}
 168.802  0x7f8e2c90 {"return": {}, "id": "libvirt-32"}
 168.802  0x7f8e2c90
 {"execute":"set_link","arguments":{"name":"net0","up":true},"id":"libvirt-33"}
 168.803  0x7f8e2c90 {"return": {}, "id": "libvirt-33"}

 After this sequence is done, everything about the network device
 *appears* normal on both the guest and host (at least the things I know
 to look at), but no traffic from the host shows up in a tcpdump of the
 interface on the guest, and no traffic from the guest shows up in a
 tcpdump of the tap device on the host.
 What you are trying to do isn't possible today.

Well, at least it's good to know that I should stop trying to make it
work :-)

Actually, it's a bit disconcerting that 1) the act of creating a guest
device is split into two commands, implying that they don't necessarily
have a hardwired a<-->b relationship although that is the case, and that
2) netdev_add even returns success when you use it in this way. Although
hindsight is 20/20 and all that, if both a and b are required, and must
always be in the same order, wouldn't it have made more sense for the
two steps to be a single command? I suppose this is a byproduct of the
monitor commands being a direct reflection of the commandline options.
(At the very least, though, I think netdev_add should report an error if
the device name alias it uses is already in use by a device.)


 The device associates with the netdev during initialization only - there
 is no command to associate at a later point in time.  That is why your
 example works only when the device is deleted together with the netdev.

 It is certainly possible to implement a command to switch netdevs

At this point yes, it would be better to have a new command rather than
to make netdev_add work in the way I've attempted - this way there would
be a new command whose presence libvirt could use to decide whether or
not to support this functionality.

  but
 I'm curious what the use case is.  Is this necessary just because QEMU
 doesn't provide a way to modify the existing netdev or because you
 really want to switch to a completely different netdev?
 We have end users who want to be able to dynamically change the guest's
 networking attachment, without restarting/hotplugging devices in the
 guest[1]. If it is just a case of changing from one bridge to another
 bridge, we can do that just by moving the TAP device from one to another.
 This doesn't work if we want to support more general changes in config,
 e.g. from a macvtap setup to a TAP setup, or vice versa.

Beyond that, I haven't determined it conclusively yet, but it so far
looks to me like a macvtap device can only be linked to a physdev when
it is created - there is no netlink message to re-link it to a different
physdev (this is based on my naive examination of the relevant kernel
source). So if you want to change the attach point for a macvtap-type
connection, you again need to discard the old macvtap device and create
a new one, implying that you need to do a new netdev_add.