Re: [Openstack] Libvirt LXC with volume-attach broken ?
Quoting Eric W. Biederman (ebied...@xmission.com):
Daniel P. Berrange berra...@redhat.com writes:
On Thu, Jul 05, 2012 at 06:49:06PM -0700, Eric W. Biederman wrote:
Serge Hallyn serge.hal...@canonical.com writes:
Quoting Daniel P. Berrange (berra...@redhat.com):
On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:

Now, when using 'nova volume-attach':

    # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf

nova will import an iSCSI LUN from the nova volume service, on the compute node. The kernel will assign it the next free SCSI drive letter, in my case '/dev/sdc'. The libvirt nova driver will then do a mknod, using the volume name passed to 'nova volume-attach', e.g. it will do

    mknod /var/lib/nova/instances/instance-000e/rootfs/dev/sdf

Oops, I'm slightly wrong here. What it actually does is

    mount --bind /dev/sdc /var/lib/nova/instances/instance-000e/rootfs/dev/sdf

so you get a 'sdf' device, but with the major/minor number of the 'sdc' device. I can't say I particularly like this approach. Ultimately I think we need the kernel support to make this work correctly. In any

Yes, that's what the 'devices namespace' is meant to address. I'm hoping we can have some serious design discussion on that in the next few months.

This is not the device namespace problem. This is the setns problem for mount namespaces, and the unprivileged mount problem. There may be a notification issue so user space can perform actions in a container when a device shows up. But it should be very possible on the host to call:

    setns(containers_mount_namespace);
    mknod(/dev/foo);
    chown(/dev/foo, CONTAINER_ROOT_UID, CONTAINER_ROOT_GID);

And then from inside the container, especially when I get the rest of the user namespace merged, it should be very possible to manipulate the block device because you have permission, and to mount the partitions of the block device, because you are root in your container.
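Eric's host-side sequence (enter the container's mount namespace, then mknod and chown) can be sketched in Python. This is a minimal sketch, not nova or libvirt code: it assumes root on the host (setns into a mount namespace needs CAP_SYS_ADMIN and CAP_SYS_CHROOT), a single-threaded caller, and that the container's init PID is known. The helper names and parameters are hypothetical. The work runs in a forked child because setns() would otherwise leave the calling process inside the container's mount namespace.

```python
import ctypes
import os
import stat

# Fallback value for Pythons older than 3.12, which lack os.CLONE_NEWNS.
CLONE_NEWNS = getattr(os, "CLONE_NEWNS", 0x00020000)


def mount_ns_path(init_pid: int) -> str:
    """procfs handle for a process's mount namespace (Linux)."""
    return f"/proc/{init_pid}/ns/mnt"


def block_dev_mode(perms: int) -> int:
    """mknod() mode for a block special file with the given permission bits."""
    return stat.S_IFBLK | perms


def _setns(fd: int, nstype: int) -> None:
    """Call setns(2): os.setns on Python 3.12+, libc through ctypes otherwise."""
    if hasattr(os, "setns"):
        os.setns(fd, nstype)
        return
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.setns(fd, nstype) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def mknod_in_container(init_pid: int, path: str, major: int, minor: int,
                       uid: int, gid: int, perms: int = 0o660) -> None:
    """Hypothetical helper: create a block node inside a container's mount
    namespace, owned by the container's root uid/gid."""
    child = os.fork()
    if child == 0:
        status = 1
        try:
            fd = os.open(mount_ns_path(init_pid), os.O_RDONLY)
            _setns(fd, CLONE_NEWNS)   # root and cwd now belong to the container
            os.close(fd)
            os.mknod(path, block_dev_mode(perms), os.makedev(major, minor))
            os.chown(path, uid, gid)
            os.chmod(path, perms)     # mknod() modes are filtered by the umask
            status = 0
        finally:
            os._exit(status)          # never fall back into the parent's code
    _, wstatus = os.waitpid(child, 0)
    if not (os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0):
        raise RuntimeError(f"mknod of {path} in container {init_pid} failed")
```

A call such as `mknod_in_container(init_pid, "/dev/sdf", 8, 32, CONTAINER_ROOT_UID, CONTAINER_ROOT_GID)` would expose the host's /dev/sdc (block 8:32, with traditional numbering) as /dev/sdf inside the container, owned by container root.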
But until the user namespace is merged you really are root, so you can mount whatever. Daniel, does that sound like the support you are looking for?

Yes, the setns(mnt) approach you describe above is exactly what I'd like to be able to do, to solve the first half of the problem. The other part of the problem is that I have a /dev/sdf, or even a /dev/volgroup00/logvol3 in the host (with whatever major:minor number that implies), and I want to be able to make it always appear as /dev/sda in the container (with the correspondingly different major:minor number). I'm guessing this is what Serge was referring to as the 'device' namespace problem.

Right. Getting the device to always appear with the name /dev/sda is easy. It's easy to log in and make it look that way. It's not easy to make all distros see it that way across boot.

Where does the need to have a specific device come from? I would have thought by now that hotplug had been around long enough that in general user space would not care.

Yes, the *primary* need for the devices namespace is to prevent udev storms in the host, to send uevents to the right place, and to handle macvtap and loop devices.

The only case that I know of where keeping the same device number seems reasonable is in the case of live migration of an application, in order to avoid issues with stat changing for the same file over the transition, and I think a synthesized hotplug event could probably handle that case. Is there another case, besides buggy applications that have hard-coded device numbers, that needs specific device numbers?

Other cases where specific device maj-min numbers are important are things like makedev. There is lots of software, and especially automatic update software, which insists that things have specific 'correct' maj-min numbers.

FWIW my (presumably naive) view is that for each non-init devicens we'd have a list of type-major:minor::type2-major2:minor2 (:: meaning maps-to).
Then if a uevent comes through not aimed at any type2-major2:minor2 valid in the namespace, that ns doesn't get the uevent.

-serge

___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
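Serge's per-namespace table can be sketched as a lookup keyed by host device number. This is one reading of the proposal, not an existing kernel interface: it assumes uevents arrive carrying the host's type-major:minor, and that a namespace receives only events for devices mapped into it, reported under the mapped number. `DeviceNamespace`, `route_uevent` and the `"b-8:32::b-8:0"` entry format are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class DevNum:
    kind: str   # 'b' for block, 'c' for char
    major: int
    minor: int


def parse_dev(text: str) -> DevNum:
    """Parse 'type-major:minor', e.g. 'b-8:32'."""
    kind, numbers = text.split("-", 1)
    major, minor = numbers.split(":")
    return DevNum(kind, int(major), int(minor))


def parse_mapping(entry: str) -> Tuple[DevNum, DevNum]:
    """Parse 'type-major:minor::type2-major2:minor2' into (host, namespace)."""
    host, ns = entry.split("::")
    return parse_dev(host), parse_dev(ns)


class DeviceNamespace:
    """Hypothetical devicens: maps host device numbers to the numbers
    the namespace should see."""

    def __init__(self, entries: Iterable[str]) -> None:
        self.host_to_ns: Dict[DevNum, DevNum] = dict(parse_mapping(e) for e in entries)

    def route_uevent(self, host_dev: DevNum) -> Optional[DevNum]:
        """Number to report inside the namespace, or None if this
        namespace should not receive the uevent at all."""
        return self.host_to_ns.get(host_dev)
```

With the single entry `"b-8:32::b-8:0"`, a uevent for the host's block 8:32 (/dev/sdc) would be delivered as 8:0 (/dev/sda), and a uevent for any unmapped device would not reach the namespace.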
Re: [Openstack] Libvirt LXC with volume-attach broken ?
Quoting Daniel P. Berrange (berra...@redhat.com):
On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:

Now, when using 'nova volume-attach':

    # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf

nova will import an iSCSI LUN from the nova volume service, on the compute node. The kernel will assign it the next free SCSI drive letter, in my case '/dev/sdc'. The libvirt nova driver will then do a mknod, using the volume name passed to 'nova volume-attach', e.g. it will do

    mknod /var/lib/nova/instances/instance-000e/rootfs/dev/sdf

Oops, I'm slightly wrong here. What it actually does is

    mount --bind /dev/sdc /var/lib/nova/instances/instance-000e/rootfs/dev/sdf

so you get a 'sdf' device, but with the major/minor number of the 'sdc' device. I can't say I particularly like this approach. Ultimately I think we need the kernel support to make this work correctly. In any

Yes, that's what the 'devices namespace' is meant to address. I'm hoping we can have some serious design discussion on that in the next few months.

-serge
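The name/number mismatch Daniel describes can be checked from the host by comparing a node's st_rdev with the numbers its name conventionally implies. A minimal sketch, assuming the traditional sd numbering (major 8, 16 minors per disk, so only sda..sdp are handled); the helper names are hypothetical.

```python
import os
import string
from typing import Tuple

SCSI_DISK_MAJOR = 8     # first sd major; covers sda..sdp
MINORS_PER_DISK = 16    # whole disk plus up to 15 partitions


def expected_sd_numbers(name: str) -> Tuple[int, int]:
    """(major, minor) that a whole-disk name sda..sdp conventionally has."""
    letters = string.ascii_lowercase[:16]
    if len(name) != 3 or not name.startswith("sd") or name[2] not in letters:
        raise ValueError(f"only sda..sdp are handled here: {name!r}")
    return SCSI_DISK_MAJOR, letters.index(name[2]) * MINORS_PER_DISK


def actual_numbers(path: str) -> Tuple[int, int]:
    """(major, minor) of the device node at path, taken from st_rdev."""
    rdev = os.stat(path).st_rdev
    return os.major(rdev), os.minor(rdev)


def name_matches_number(path: str) -> bool:
    """True if a /dev/sdX node carries the numbers its name implies."""
    return actual_numbers(path) == expected_sd_numbers(os.path.basename(path))
```

For the bind-mounted `.../rootfs/dev/sdf`, `actual_numbers()` would report the 8:32 of /dev/sdc rather than the 8:80 expected for sdf, so `name_matches_number()` returns False.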
Re: [Openstack] [libvirt] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up
Quoting Serge E. Hallyn (serge.hal...@canonical.com):
Quoting Daniel P. Berrange (berra...@redhat.com):
On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:

Nova (openstack) calls libvirt to create a container, then periodically checks using GetInfo to see whether the container is up. If it does this too quickly, then libvirt returns an error, which in libvirt.py causes an exception to be raised, the same type as if the container was bad.

lxcDomainGetInfo() holds a mutex on 'dom' for the duration of its execution. It checks for virDomainObjIsActive() before trying to use the cgroups.

Yes, it does, but lxcDomainStart() holds the mutex on 'dom' for the duration of its execution, and does not return until the container is running and cgroups are present.

No. It calls the lxc_controller with --background. The controller main task in turn exits before the cgroups have been set up. There is the race.

So what is the right fix here? Should the controller write out another file when it is past the part which should be locked, and the driver wait for that file to exist before it drops the driver mutex? If we do that, do we risk having the driver hang when the controller has hung?

-serge
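Serge's proposed fix, with a bound on the wait to address the hang concern, might look like this on the driver side. A minimal sketch only: the readiness-file path, timeout and polling interval are assumptions, `wait_for_ready_file` is hypothetical, and this is not how libvirt's lxc driver is implemented.

```python
import os
import time


def wait_for_ready_file(path: str, timeout: float, interval: float = 0.05) -> bool:
    """Poll until `path` exists or `timeout` seconds pass.

    Returns True if the file appeared and False on timeout, so the caller
    can fail the start and release its lock instead of hanging forever."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The driver would call this while still holding the domain mutex and treat False as a failed start; the timeout turns a hung controller into an error rather than a hung driver.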
Re: [Openstack] [libvirt] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up
Quoting Daniel P. Berrange (berra...@redhat.com):
On Thu, Sep 29, 2011 at 10:12:17PM -0500, Serge E. Hallyn wrote:
Quoting Daniel P. Berrange (berra...@redhat.com):
On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:

Nova (openstack) calls libvirt to create a container, then periodically checks using GetInfo to see whether the container is up. If it does this too quickly, then libvirt returns an error, which in libvirt.py causes an exception to be raised, the same type as if the container was bad.

lxcDomainGetInfo() holds a mutex on 'dom' for the duration of its execution. It checks for virDomainObjIsActive() before trying to use the cgroups.

Yes, it does, but lxcDomainStart() holds the mutex on 'dom' for the duration of its execution, and does not return until the container is running and cgroups are present.

No. It calls the lxc_controller with --background. The controller main task in turn exits before the cgroups have been set up. There is the race.

The lxcDomainStart() method isn't actually waiting on the child pid directly, so the --background flag ought not to matter. We have a pipe that we pass into the controller, which we wait on for a notification after running the process. The controller does not notify the 'handshake' FD until after cgroups have been set up, unless I'm misinterpreting our code.

That's the call to lxcContainerWaitForContinue(), right? If so, that's done by lxcContainerChild(), which is called by the lxc_controller. AFAICS there is nothing in the lxc_driver which will wait on that before dropping the driver-lock mutex.

-serge
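The pipe handshake Daniel describes, and Serge's point about where the wait has to happen, can be illustrated in Python. A minimal sketch, not libvirt code: a forked child stands in for the controller and exits right after the handshake only to keep the example self-contained (the real controller keeps running), and `start_with_handshake` and its timeout are hypothetical.

```python
import os
import select
import signal
from typing import Callable


def start_with_handshake(setup: Callable[[], None], timeout: float) -> bool:
    """Fork a stand-in 'controller' that runs setup() and then writes one
    byte to a pipe; return True only if that byte arrives within timeout.

    The wait happens in the caller (the driver, while it still holds its
    lock), not somewhere inside the child."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                      # child: stands in for the controller
        os.close(r)
        code = 1
        try:
            setup()                   # e.g. cgroup setup
            os.write(w, b"1")         # handshake: setup is done
            code = 0
        except BaseException:
            pass                      # exit without writing; parent sees EOF
        finally:
            os._exit(code)
    os.close(w)                       # parent keeps only the read end
    try:
        readable, _, _ = select.select([r], [], [], timeout)
        ok = bool(readable) and os.read(r, 1) == b"1"
    finally:
        os.close(r)
    if not ok:
        os.kill(pid, signal.SIGKILL)  # hung or failed child: do not wait forever
    os.waitpid(pid, 0)                # reap the child
    return ok
```

If the caller dropped its lock before `select()` returned, GetInfo could again run before setup finished, which is the race described above.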
Re: [Openstack] detecting errors when determining libvirt vm power state
Quoting Serge Hallyn (serge.hal...@canonical.com):

Hi,

I'm looking at what first manifested as a bug when launching multiple lxc containers simultaneously, i.e. 'euca-run-instances -n 4', as reported at https://bugs.launchpad.net/ubuntu/+source/nova/+bug/842845.

The problem appears to be that nova uses self.driver.get_info(). Libvirt can raise exceptions on this for several reasons: the vm could be bad or not exist, or it could be in a transient state, e.g. cgroups are not set up yet.

What is the right way to handle this? Should the drivers categorize their exceptions into either 'broken' or 'transient' ones, so that nova can detect the former and bail, and retry on the latter?

Now that I've sent that, I guess it seems pretty clear that the lxc getinfo helper should understand that -ENOENT from getcgroup means it's not yet ready, and set the values to 0 as it does if the domain is not running.

-serge
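Serge's conclusion can be expressed as a Python sketch rather than libvirt's C. The cgroup file names (cgroup v1 `cpuacct.usage`, in nanoseconds, and `memory.usage_in_bytes`) and the `DomainInfo` shape are assumptions for illustration, and `lxc_get_info` is hypothetical. Only a missing file is mapped to "not ready"; any other error still propagates.

```python
from dataclasses import dataclass


@dataclass
class DomainInfo:
    cpu_time_ns: int
    memory_kib: int


def read_cgroup_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())


def lxc_get_info(cpuacct_usage: str, memory_usage: str) -> DomainInfo:
    """Read usage from cgroup files. A missing file (ENOENT) means the
    cgroup is not set up yet, so report zeros as for a stopped domain."""
    try:
        cpu_ns = read_cgroup_int(cpuacct_usage)
        mem_bytes = read_cgroup_int(memory_usage)
    except FileNotFoundError:         # -ENOENT: not ready yet, not broken
        return DomainInfo(cpu_time_ns=0, memory_kib=0)
    return DomainInfo(cpu_time_ns=cpu_ns, memory_kib=mem_bytes // 1024)
```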
[Openstack] detecting errors when determining libvirt vm power state
Hi,

I'm looking at what first manifested as a bug when launching multiple lxc containers simultaneously, i.e. 'euca-run-instances -n 4', as reported at https://bugs.launchpad.net/ubuntu/+source/nova/+bug/842845.

The problem appears to be that nova uses self.driver.get_info(). Libvirt can raise exceptions on this for several reasons: the vm could be bad or not exist, or it could be in a transient state, e.g. cgroups are not set up yet.

What is the right way to handle this? Should the drivers categorize their exceptions into either 'broken' or 'transient' ones, so that nova can detect the former and bail, and retry on the latter?

Note that while the bug was raised for lxc, I suspect the same should be possible with kvm instances. However, the qemu GetInfo method doesn't get its cpu/mem usage info from cgroups, so it would not happen in exactly the same way.

-serge
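The 'broken' versus 'transient' split can be sketched on the nova side as two exception classes and a bounded retry loop. A minimal sketch under assumptions: `TransientInfoError`, `BrokenInfoError` and `poll_info` are hypothetical names, not nova or libvirt API, and the thread leaves open how a driver would map libvirt's errors onto the two classes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientInfoError(Exception):
    """Hypothetical: the instance exists but is not ready (e.g. no cgroups yet)."""


class BrokenInfoError(Exception):
    """Hypothetical: the instance is bad or does not exist."""


def poll_info(get_info: Callable[[], T], attempts: int = 10, delay: float = 0.5) -> T:
    """Retry get_info() on transient errors and bail at once on broken ones.

    BrokenInfoError is deliberately not caught, so it propagates on its
    first occurrence. If every attempt is transient, give up as broken."""
    last_error = None
    for attempt in range(attempts):
        try:
            return get_info()
        except TransientInfoError as exc:
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(delay)
    raise BrokenInfoError(f"not ready after {attempts} attempts") from last_error
```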