Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-10 Thread Eric W. Biederman
Serge Hallyn serge.hal...@canonical.com writes:

 Quoting Daniel P. Berrange (berra...@redhat.com):
 On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:
  Now, when using 'nova volume-attach':
  
# nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 
  a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf
  
  nova will import an iSCSI LUN from the nova volume service, on the compute
  node. The kernel will assign it the next free SCSI drive letter, in my
  case '/dev/sdc'.
  
  The libvirt nova driver will then do a mknod, using the volume name
  passed to 'nova volume-attach'.
  eg it will do
  
mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
 
 Opps, I'm slightly wrong here. What it actually does is
 
   mount --bind /dev/sdc 
 /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
 
 so you get a 'sdf' device, but with the major/minor number of the 'sdc'
 device. I can't say I particularly like this approach. Ultimately I
 think we need the kernel support to make this work correctly. In any

 Yes, that's what the 'devices namespace' is meant to address.  I'm hoping
 we can some serious design discussion on that in the next few months.

This is not the device namespace problem.

This is the setns problem for mount namespaces, and the unprivileged
mount problem.

There may also be a notification issue, so that user space can perform
actions in a container when a device shows up.

But it should be very possible on the host to call:

  setns(containers_mount_namespace);
  mknod("/dev/foo");
  chown("/dev/foo", CONTAINER_ROOT_UID, CONTAINER_ROOT_GID);
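
A minimal sketch of those three calls, assuming a kernel where setns()
accepts a mount namespace handle and a dedicated, single-threaded helper
process running as root on the host. The names container_pid, host_dev and
guest_path are hypothetical, and setns() is reached through ctypes so the
sketch does not depend on a particular Python version:

```python
import ctypes
import os
import stat

CLONE_NEWNS = 0x00020000  # mount-namespace flag from <sched.h>


def mnt_ns_path(pid):
    """Path of the mount-namespace handle of a process (hypothetical helper)."""
    return f"/proc/{pid}/ns/mnt"


def block_node_mode(perms=0o660):
    """Mode argument for mknod describing a block device node."""
    return stat.S_IFBLK | perms


def attach_block_device(container_pid, host_dev, guest_path, uid=0, gid=0):
    """Create guest_path inside the container's mount namespace.

    Assumption: runs as root in a throwaway single-threaded process,
    because the caller stays in the container's mount namespace afterwards.
    """
    # Read the major:minor pair while the host's /dev is still visible.
    rdev = os.stat(host_dev).st_rdev
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(mnt_ns_path(container_pid), os.O_RDONLY)
    try:
        if libc.setns(fd, CLONE_NEWNS) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    # From here on, paths resolve inside the container.
    os.mknod(guest_path, block_node_mode(), rdev)
    os.chown(guest_path, uid, gid)
```

Reading the device number before the setns() call matters in this sketch:
once the helper has switched namespaces, the host's /dev/sdc is no longer
reachable by path.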

And then from inside the container, especially once I get the rest of
the user namespace merged, it should be very possible to manipulate
the block device, because you have permission, and to mount the
partitions of the block device, because you are root in your container.

But until the user namespace is merged, you really are root, so you can
mount whatever.

Daniel, does that sound like the support you are looking for?

Eric



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp


Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-10 Thread Eric W. Biederman
Daniel P. Berrange berra...@redhat.com writes:

 On Thu, Jul 05, 2012 at 06:49:06PM -0700, Eric W. Biederman wrote:
 Serge Hallyn serge.hal...@canonical.com writes:
 
  Quoting Daniel P. Berrange (berra...@redhat.com):
  On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:
   Now, when using 'nova volume-attach':
   
 # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 
   a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf
   
   nova will import an iSCSI LUN from the nova volume service, on the 
   compute
   node. The kernel will assign it the next free SCSI drive letter, in my
   case '/dev/sdc'.
   
   The libvirt nova driver will then do a mknod, using the volume name
   passed to 'nova volume-attach'.
   eg it will do
   
 mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
  
  Opps, I'm slightly wrong here. What it actually does is
  
mount --bind /dev/sdc 
  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
  
  so you get a 'sdf' device, but with the major/minor number of the 'sdc'
  device. I can't say I particularly like this approach. Ultimately I
  think we need the kernel support to make this work correctly. In any
 
  Yes, that's what the 'devices namespace' is meant to address.  I'm hoping
  we can some serious design discussion on that in the next few months.
 
 This is not the device namespace problem.
 
 This is the setns problem for mount namespaces, and the unprivilged
 mount problem.
 
 There may be a notification issue so use space can perform actions
 in a container when a device shows up.
 
 But it should be very possible on the host to call.
 setns(containers_mount_namespace);
 mknod(/dev/foo);
 chown(/dev/foo, CONTAINER_ROOT_UID, CONTAINER_ROOT_GID);
 
 And then from inside the container especially when I get the rest of
 the user namespace merged it should be very possible to manipulate
 the block device because you have permission, and to mount the
 partitions of the block device, because you are root in your container.
 
 But until the user namespace is merged you really are root so you can
 mount whatever.
 
 Daniel does that sound like the support you are looking for?

 Yes, the setns(mnt) approach you describe above is exactly what I'd
 like to be able todo, to solve the first half of the problem.

 The part of the problem is that I have a /dev/sdf, or even a
 /dev/volgroup00/logvol3 in the host (with whatever major:minor
 number that implies), and I want to be able to make it always
 appear as /dev/sda  in the container (with the correspondingly
 different major:minor number).  I'm guessing this is what Serge
 was refering to as the 'device' namespace problem

Getting the device to always appear with the name /dev/sda is easy.

Where does the need to have a specific device come from?  I would have
thought by now that hotplug had been around long enough that in general
user space would not care.

The only case that I know of where keeping the same device number seems
reasonable is live migration of an application, in order to avoid
issues with stat changing for the same file over the transition,
and I think a synthesized hotplug event could probably handle that case.

Is there another case, besides buggy applications with hard-coded
device numbers, that needs specific device numbers?

Eric




Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-10 Thread Eric W. Biederman
Daniel P. Berrange berra...@redhat.com writes:

 On Fri, Jul 06, 2012 at 02:35:14AM -0700, Eric W. Biederman wrote:
 Daniel P. Berrange berra...@redhat.com writes:
  The part of the problem is that I have a /dev/sdf, or even a
  /dev/volgroup00/logvol3 in the host (with whatever major:minor
  number that implies), and I want to be able to make it always
  appear as /dev/sda  in the container (with the correspondingly
  different major:minor number).  I'm guessing this is what Serge
  was refering to as the 'device' namespace problem
 
 Getting the device to always appear with the name /dev/sda is easy.
 
 Where does the need to have a specific device come from?  I would have
 thought by now that hotplug had been around long enough that in general
 user space would not care.
 
 The only case that I know of where keeping the same device number seems
 reasonable is in the case of live migration an application, in order to
 avoid issues with stat changing for the same file over the transition,
 and I think a synthesized hotplug event could probably handle that case.
 
 Is there another case besides buggy applications that have hard
 coded device numbers that need specific device numbers?

 There isn't any particular buggy application we're trying to avoid
 here. We're just trying to provide an piece of OpenStack functionality
 to LXC in the same way as its provided to KVM.

 With a basic OpenStack instance, you just get the root filesystem
 from the image you booted, whose contents are transient (ie thrown
 away on shutdown). It is possible to tell OpenStack to attach one
 or more block devices to a running instance, which give you some
 persistent storage.

 The end user API for this lets the host admin specify the device
 name that the block device will appear as inside the instance.

 eg, with KVM you'd invoke:

  # nova volume-attach myguest  mystoragevol1 /dev/vdb
  # nova volume-attach myguest  mystoragevol2 /dev/vdc

 Obviously with KVM this just works, because you have a level of
 indirection between host  guest device names via virtio-blk.

 The desire is to be able to wire up LXC in a similar way

  # nova volume-attach myguest  mystoragevol1 /dev/sdb
  # nova volume-attach myguest  mystoragevol2 /dev/sdc

 So it is really the host admin specifying that they want to provide
 the container with a '/dev/sdb' device, regardless of what the actual
 device node on the host is (it could be an iSCSI LUN, multipath LUN,
 LVM volume, or whatever). So I'm really looking to have the container
 visible device name be independent of host name.

And there is a level of indirection in Linux too, between the
device and the device name.  That level of indirection is the
device number.

So you should have no trouble specifying the device name.

Regardless it looks like setns is enough for this problem.

There is a challenge with some of the more advanced parts of
this, things like creating loopback block devices from files,
etc.  But I think I just need to get my setns patch for the
mount namespace merged and you should have what you need for
libvirt and lxc.

Eric



Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-06 Thread Daniel P. Berrange
On Thu, Jul 05, 2012 at 06:49:06PM -0700, Eric W. Biederman wrote:
 Serge Hallyn serge.hal...@canonical.com writes:
 
  Quoting Daniel P. Berrange (berra...@redhat.com):
  On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:
   Now, when using 'nova volume-attach':
   
 # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 
   a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf
   
   nova will import an iSCSI LUN from the nova volume service, on the 
   compute
   node. The kernel will assign it the next free SCSI drive letter, in my
   case '/dev/sdc'.
   
   The libvirt nova driver will then do a mknod, using the volume name
   passed to 'nova volume-attach'.
   eg it will do
   
 mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
  
  Opps, I'm slightly wrong here. What it actually does is
  
mount --bind /dev/sdc 
  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
  
  so you get a 'sdf' device, but with the major/minor number of the 'sdc'
  device. I can't say I particularly like this approach. Ultimately I
  think we need the kernel support to make this work correctly. In any
 
  Yes, that's what the 'devices namespace' is meant to address.  I'm hoping
  we can some serious design discussion on that in the next few months.
 
 This is not the device namespace problem.
 
 This is the setns problem for mount namespaces, and the unprivilged
 mount problem.
 
 There may be a notification issue so use space can perform actions
 in a container when a device shows up.
 
 But it should be very possible on the host to call.
 setns(containers_mount_namespace);
 mknod(/dev/foo);
 chown(/dev/foo, CONTAINER_ROOT_UID, CONTAINER_ROOT_GID);
 
 And then from inside the container especially when I get the rest of
 the user namespace merged it should be very possible to manipulate
 the block device because you have permission, and to mount the
 partitions of the block device, because you are root in your container.
 
 But until the user namespace is merged you really are root so you can
 mount whatever.
 
 Daniel does that sound like the support you are looking for?

Yes, the setns(mnt) approach you describe above is exactly what I'd
like to be able to do, to solve the first half of the problem.

The other part of the problem is that I have a /dev/sdf, or even a
/dev/volgroup00/logvol3 in the host (with whatever major:minor
number that implies), and I want to be able to make it always
appear as /dev/sda in the container (with the correspondingly
different major:minor number).  I'm guessing this is what Serge
was referring to as the 'device' namespace problem.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-06 Thread Daniel P. Berrange
On Fri, Jul 06, 2012 at 02:35:14AM -0700, Eric W. Biederman wrote:
 Daniel P. Berrange berra...@redhat.com writes:
  The part of the problem is that I have a /dev/sdf, or even a
  /dev/volgroup00/logvol3 in the host (with whatever major:minor
  number that implies), and I want to be able to make it always
  appear as /dev/sda  in the container (with the correspondingly
  different major:minor number).  I'm guessing this is what Serge
  was refering to as the 'device' namespace problem
 
 Getting the device to always appear with the name /dev/sda is easy.
 
 Where does the need to have a specific device come from?  I would have
 thought by now that hotplug had been around long enough that in general
 user space would not care.
 
 The only case that I know of where keeping the same device number seems
 reasonable is in the case of live migration an application, in order to
 avoid issues with stat changing for the same file over the transition,
 and I think a synthesized hotplug event could probably handle that case.
 
 Is there another case besides buggy applications that have hard
 coded device numbers that need specific device numbers?

There isn't any particular buggy application we're trying to avoid
here. We're just trying to provide a piece of OpenStack functionality
to LXC in the same way as it's provided to KVM.

With a basic OpenStack instance, you just get the root filesystem
from the image you booted, whose contents are transient (ie thrown
away on shutdown). It is possible to tell OpenStack to attach one
or more block devices to a running instance, which give you some
persistent storage.

The end user API for this lets the host admin specify the device
name that the block device will appear as inside the instance.

eg, with KVM you'd invoke:

 # nova volume-attach myguest  mystoragevol1 /dev/vdb
 # nova volume-attach myguest  mystoragevol2 /dev/vdc

Obviously with KVM this just works, because you have a level of
indirection between host and guest device names via virtio-blk.

The desire is to be able to wire up LXC in a similar way:

 # nova volume-attach myguest  mystoragevol1 /dev/sdb
 # nova volume-attach myguest  mystoragevol2 /dev/sdc

So it is really the host admin specifying that they want to provide
the container with a '/dev/sdb' device, regardless of what the actual
device node on the host is (it could be an iSCSI LUN, multipath LUN,
LVM volume, or whatever). So I'm really looking to have the
container-visible device name be independent of the host device name.
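
Since the host node name cannot be predicted, whatever performs the attach
has to look it up at attach time and pair it with the name requested for
the container. A small sketch of that bookkeeping, assuming the host offers
a stable symlink to the volume (for example a udev-maintained link under
/dev/disk/by-path, which is an assumption and not something stated in this
thread). The helper names are hypothetical and this is not Nova's code:

```python
import os


def resolve_host_node(stable_link):
    """Follow a stable symlink to the kernel-assigned node it points at."""
    return os.path.realpath(stable_link)


def plan_attach(stable_link, guest_name):
    """Pair the name requested for the guest with the current host node."""
    return {"guest": guest_name, "host": resolve_host_node(stable_link)}
```

With this shape the caller of 'nova volume-attach' only ever supplies the
guest-side name; the host-side name is looked up, never guessed.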

Regards,
Daniel



Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-06 Thread Serge Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
 Daniel P. Berrange berra...@redhat.com writes:
 
  On Thu, Jul 05, 2012 at 06:49:06PM -0700, Eric W. Biederman wrote:
  Serge Hallyn serge.hal...@canonical.com writes:
  
   Quoting Daniel P. Berrange (berra...@redhat.com):
   On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:
Now, when using 'nova volume-attach':

  # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 
a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf

nova will import an iSCSI LUN from the nova volume service, on the 
compute
node. The kernel will assign it the next free SCSI drive letter, in my
case '/dev/sdc'.

The libvirt nova driver will then do a mknod, using the volume name
passed to 'nova volume-attach'.
eg it will do

  mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
   
   Opps, I'm slightly wrong here. What it actually does is
   
 mount --bind /dev/sdc 
   /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
   
   so you get a 'sdf' device, but with the major/minor number of the 'sdc'
   device. I can't say I particularly like this approach. Ultimately I
   think we need the kernel support to make this work correctly. In any
  
   Yes, that's what the 'devices namespace' is meant to address.  I'm hoping
   we can some serious design discussion on that in the next few months.
  
  This is not the device namespace problem.
  
  This is the setns problem for mount namespaces, and the unprivilged
  mount problem.
  
  There may be a notification issue so use space can perform actions
  in a container when a device shows up.
  
  But it should be very possible on the host to call.
  setns(containers_mount_namespace);
  mknod(/dev/foo);
  chown(/dev/foo, CONTAINER_ROOT_UID, CONTAINER_ROOT_GID);
  
  And then from inside the container especially when I get the rest of
  the user namespace merged it should be very possible to manipulate
  the block device because you have permission, and to mount the
  partitions of the block device, because you are root in your container.
  
  But until the user namespace is merged you really are root so you can
  mount whatever.
  
  Daniel does that sound like the support you are looking for?
 
  Yes, the setns(mnt) approach you describe above is exactly what I'd
  like to be able todo, to solve the first half of the problem.
 
  The part of the problem is that I have a /dev/sdf, or even a
  /dev/volgroup00/logvol3 in the host (with whatever major:minor
  number that implies), and I want to be able to make it always
  appear as /dev/sda  in the container (with the correspondingly
  different major:minor number).  I'm guessing this is what Serge
  was refering to as the 'device' namespace problem

Right.

 Getting the device to always appear with the name /dev/sda is easy.

It's easy to log in and make it look that way.  It's not easy to
make all distros see it that way across boot.

 Where does the need to have a specific device come from?  I would have
 thought by now that hotplug had been around long enough that in general
 user space would not care.

Yes, the *primary* need for the devices namespace is to prevent udev
storms in the host and to send uevents to the right place, plus macvtap
and loop devices.

 The only case that I know of where keeping the same device number seems
 reasonable is in the case of live migration an application, in order to
 avoid issues with stat changing for the same file over the transition,
 and I think a synthesized hotplug event could probably handle that case.
 
 Is there another case besides buggy applications that have hard
 coded device numbers that need specific device numbers?

Other cases where specific device maj-min numbers are important
are things like makedev.  There is lots of software, and especially
automatic update software, which insists that things have specific
'correct' maj-minor numbers.

FWIW my (presumably naive) view is that for each non-init devicens
we'd have a list of

type-major:minor :: type2-major2:minor2

(:: meaning maps-to).  Then if a uevent comes through not aimed at
any type2-major2:minor2 valid in the namespace, that ns doesn't get
the uevent.
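
A toy model of that mapping list, purely to illustrate the idea. No such
kernel interface exists in this discussion; the class and function names
are hypothetical, and devices are written as (type, major, minor) tuples:

```python
from typing import Dict, Optional, Tuple

Dev = Tuple[str, int, int]  # (type, major, minor), e.g. ("block", 8, 32)


class DeviceNamespace:
    """Hypothetical model of one non-init device namespace."""

    def __init__(self, mapping: Dict[Dev, Dev]):
        # host device -> device as seen inside the namespace
        self.host_to_ns = dict(mapping)

    def translate(self, host_dev: Dev) -> Optional[Dev]:
        return self.host_to_ns.get(host_dev)


def route_uevent(namespaces, action, host_dev):
    """Deliver a uevent only to namespaces that map the host device.

    Returns {namespace_name: (action, device_as_seen_in_namespace)}.
    """
    delivered = {}
    for name, ns in namespaces.items():
        seen_as = ns.translate(host_dev)
        if seen_as is not None:
            delivered[name] = (action, seen_as)
    return delivered
```

A namespace with no entry for the host device simply never hears about the
event, which is the filtering behaviour described above.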

-serge



[Openstack] Libvirt LXC with volume-attach broken ?

2012-07-05 Thread Daniel P. Berrange
In the Libvirt driver there is special-case code for LXC to deal with
the volume-attach functionality, since there is no block device attach
functionality in libvirt for LXC. The code in question was added in

  commit e40b659d320b3c6894862b87adf1011e31cbf8fc
  Author: Chuck Short chuck.sh...@canonical.com
  Date:   Tue Jan 31 20:53:24 2012 -0500

      Add support for LXC volumes.

      This introduces volume support for LXC containers in Nova.
      The way that this works is that when a device is attached to an
      LXC container is that, the xml is parsed to find out which device to
      connect to the LXC container, binds the device to the LXC container,
      and allow the device through cgroups.

      This bug fixes LP: #924601.

      Change-Id: I00b41426ae8354b3cd4212655ecb48319a63aa9b
      Signed-off-by: Chuck Short chuck.sh...@canonical.com

First a little background

The way LXC works with Nova is that the image file assigned to the
instance, eg

  /var/lib/nova/instances/instance-000e/disk

is exported via qemu-nbd, and then mounted on the host at

  /var/lib/nova/instances/instance-000e/rootfs


When libvirt starts the container it uses that directory as the root
filesystem. libvirt will *also* mount a private /dev, /dev/pts, /proc
and /sys for the container. This is all fine.

Now, when using 'nova volume-attach':

  # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 \
      a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf

nova will import an iSCSI LUN from the nova volume service, on the compute
node. The kernel will assign it the next free SCSI drive letter, in my
case '/dev/sdc'.

The libvirt nova driver will then do a mknod, using the volume name
passed to 'nova volume-attach'.
eg it will do

  mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf

this is where it has all gone horribly wrong...

  * The iSCSI LUN is completely randomly allocated, and unrelated to the
    block device name the user will give to 'nova volume-attach'. So there
    is no association between the /dev/sdf in the container and the
    /dev/sdc in the host, and you can't expect the caller of 'volume-attach'
    to be able to predict what the next assigned LUN will be on the host.

  * The /var/lib/nova/instances/instance-000e/rootfs/dev/ directory
    where nova did the mknod is a completely different filesystem to
    the one seen by the container. The /dev in the container is a tmpfs
    that is never visible to the host, so a mknod in the host won't
    appear to the container.
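
One way to check the second point on a running system is to look at the
container's own mount table, which the host can read from
/proc/<pid>/mountinfo for a process inside the container. A minimal sketch
of parsing that format; the sample text is made up for illustration and the
function names are hypothetical:

```python
def parse_mountinfo(text):
    """Yield (mount_point, fs_type, source) for each mountinfo line."""
    for line in text.splitlines():
        if " - " not in line:
            continue
        # Fields before the " - " separator: mount id, parent id,
        # major:minor, root, mount point, options, optional fields.
        left, right = line.split(" - ", 1)
        mount_point = left.split()[4]
        fs_type, source = right.split()[:2]
        yield mount_point, fs_type, source


def fs_type_at(text, mount_point):
    """Filesystem type of the last mount on mount_point, or None."""
    found = None
    for point, fs_type, _source in parse_mountinfo(text):
        if point == mount_point:
            found = fs_type
    return found


# Illustrative sample only, not captured from a real container.
SAMPLE = """\
20 1 43:0 / / rw,relatime - ext4 /dev/nbd0 rw
21 20 0:17 / /dev rw,nosuid - tmpfs devfs rw,mode=755
"""
```

If fs_type_at() reports tmpfs for /dev in the container's table, a node
created under rootfs/dev on the host lands on the underlying image
filesystem, hidden beneath that tmpfs.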

AFAIK, there is no way to resolve either of these problems given the
current level of kernel support for LXC, which is why libvirt has never
implemented block volume attach itself.

Thus I'm wondering how this LXC volume-attach code in Nova has ever
worked, or how it was tested? My testing of Nova shows no sign of it
working today. Unless someone can demonstrate a flaw in my logic, I'm
inclined to simply revert this whole commit from Nova.

Regards,
Daniel



Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-05 Thread Chuck Short
On Thu, 5 Jul 2012 15:00:26 +0100
Daniel P. Berrange berra...@redhat.com wrote:

 In the Libvirt driver there is special-case code for LXC to deal with
 the volume-attach functionality, since there is no block device
 attach functionality in libvirt for LXC. The code in question was
 added in
 
   commit e40b659d320b3c6894862b87adf1011e31cbf8fc
   Author: Chuck Short chuck.sh...@canonical.com
   Date:   Tue Jan 31 20:53:24 2012 -0500
 
 Add support for LXC volumes.
 
 This introduces volume support for LXC containers in Nova.
 The way that this works is that when a device is attached to an
 LXC container is that, the xml is parsed to find out which device
 to connect to the LXC container, binds the device to the LXC
 container, and allow the device through cgroups.
 
 This bug fixes LP: #924601.
 
 Change-Id: I00b41426ae8354b3cd4212655ecb48319a63aa9b
 Signed-off-by: Chuck Short chuck.sh...@canonical.com
 
 First a little background
 
 The way LXC works with Nova, is that the image file assigned to the
 instance eg 
 
   /var/lib/nova/instances/instance-000e/disk
 
 is exported via qemu-nbd, and then mounted on the host at
 
   /var/lib/nova/instances/instance-000e/rootfs
 
 
 When libvirt starts the container it uses that directory as the root
 filesystem. libvirt will *also* mount a private /dev, /dev/pts, /proc
 and /sys for the container. This is all fine
 
 Now, when using 'nova volume-attach':
 
   # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736
 a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf
 
 nova will import an iSCSI LUN from the nova volume service, on the
 compute node. The kernel will assign it the next free SCSI drive
 letter, in my case '/dev/sdc'.
 
 The libvirt nova driver will then do a mknod, using the volume name
 passed to 'nova volume-attach'.
 eg it will do
 
   mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
 
 this is where it has all gone horribly wrong...
 
   * The iSCSI LUN is completely randomly allocated, and unrelated to
 the block device name the user will give to 'nova volume-attach'. So
 there is no association between the /dev/sdf in the container and the
 /dev/sdc in the host, and you can't expect the caller of
 'volume-attach' to be able to predict what the next assigned LUN will
 be on the host.
 
   * The  /var/lib/nova/instances/instance-000e/rootfs/dev/
 directory where nova did the mknod is a completely different
 filesystem to the one seen by the container. The /dev in the
 container is a tmpfs that is never visible to the host, so a mknod in
 the host won't appear to the container.
 
 AFAIK, there is no way to resolve either of these problems given the
 current level kernel support for LXC, which is why libvirt has never
 implemented block volume attach itself.
 
 Thus I'm wondering how this LXC volume-attach code in Nova has ever
 worked, or was tested ? My testing of Nova shows no sign of it working
 today. Unless someone can demonstrate a flaw in my logic, I'm inclined
 to simply revert this whole commit from Nova.
 
 Regards,
 Daniel

Hi,

It *was* working at one point. It's on my to-do list to make sure that it
still works properly. Otherwise I'll remove it myself.

Regards
chuck



Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-05 Thread Daniel P. Berrange
On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:
 Now, when using 'nova volume-attach':
 
   # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 
 a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf
 
 nova will import an iSCSI LUN from the nova volume service, on the compute
 node. The kernel will assign it the next free SCSI drive letter, in my
 case '/dev/sdc'.
 
 The libvirt nova driver will then do a mknod, using the volume name
 passed to 'nova volume-attach'.
 eg it will do
 
   mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf

Oops, I'm slightly wrong here. What it actually does is

  mount --bind /dev/sdc /var/lib/nova/instances/instance-000e/rootfs/dev/sdf

so you get a 'sdf' device, but with the major/minor number of the 'sdc'
device. I can't say I particularly like this approach. Ultimately I
think we need kernel support to make this work correctly. In any
case, even using mount --bind doesn't deal with the fact that the guest's
/dev is not visible from the host.
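
The "same device, different name" effect is easy to confirm by comparing
the device numbers behind two paths. A minimal sketch using only the
standard library; the helper names are hypothetical:

```python
import os
import stat


def describe_node(path):
    """Return (kind, major, minor) for a device node, else None."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "char"
    else:
        return None  # regular file, directory, etc.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


def same_device(path_a, path_b):
    """True if both paths are device nodes with identical type and numbers."""
    a, b = describe_node(path_a), describe_node(path_b)
    return a is not None and a == b
```

Run on the host against /dev/sdc and the bind-mounted rootfs/dev/sdf,
same_device() would be expected to return True, since a bind mount exposes
the same inode under a second path.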

 this is where it has all gone horribly wrong...
 
   * The iSCSI LUN is completely randomly allocated, and unrelated to the
 block device name the user will give to 'nova volume-attach'. So there
 is no association between the /dev/sdf in the container and the
 /dev/sdc in the host, and you can't expect the caller of 'volume-attach'
 to be able to predict what the next assigned LUN will be on the host.
 
   * The  /var/lib/nova/instances/instance-000e/rootfs/dev/ directory
 where nova did the mknod is a completely different filesystem to
 the one seen by the container. The /dev in the container is a tmpfs
 that is never visible to the host, so a mknod in the host won't
 appear to the container.
 
 AFAIK, there is no way to resolve either of these problems given the
 current level kernel support for LXC, which is why libvirt has never
 implemented block volume attach itself.
 
 Thus I'm wondering how this LXC volume-attach code in Nova has ever
 worked, or was tested ? My testing of Nova shows no sign of it working
 today. Unless someone can demonstrate a flaw in my logic, I'm inclined
 to simply revert this whole commit from Nova.
 

Regards,
Daniel



Re: [Openstack] Libvirt LXC with volume-attach broken ?

2012-07-05 Thread Serge Hallyn
Quoting Daniel P. Berrange (berra...@redhat.com):
 On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:
  Now, when using 'nova volume-attach':
  
# nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 
  a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf
  
  nova will import an iSCSI LUN from the nova volume service, on the compute
  node. The kernel will assign it the next free SCSI drive letter, in my
  case '/dev/sdc'.
  
  The libvirt nova driver will then do a mknod, using the volume name
  passed to 'nova volume-attach'.
  eg it will do
  
mknod  /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
 
 Opps, I'm slightly wrong here. What it actually does is
 
   mount --bind /dev/sdc 
 /var/lib/nova/instances/instance-000e/rootfs/dev/sdf
 
 so you get a 'sdf' device, but with the major/minor number of the 'sdc'
 device. I can't say I particularly like this approach. Ultimately I
 think we need the kernel support to make this work correctly. In any

Yes, that's what the 'devices namespace' is meant to address.  I'm hoping
we can have some serious design discussion on that in the next few months.

-serge
