Hello, we are experiencing failures when starting new instances on our local OpenStack (Folsom) installation. The problem seems to be that VMs are being scheduled onto nodes that do not have enough free disk space. Relevant extract from `nova-compute.log` on a compute node:
    ... 19314 TRACE nova.compute.manager [instance: ...] ProcessExecutionError: Unexpected error while running command.
    ... 19314 TRACE nova.compute.manager [instance: ...] Command: qemu-img convert -O raw /var/lib/nova/instances/_base/77f4a9fce5e923d379c1514ca8078ff3c7e2835f.part /var/lib/nova/instances/_base/77f4a9fce5e923d379c1514ca8078ff3c7e2835f.converted
    ... 19314 TRACE nova.compute.manager [instance: ...] Exit code: 1
    ... 19314 TRACE nova.compute.manager [instance: ...] Stdout: ''
    ... 19314 TRACE nova.compute.manager [instance: ...] Stderr: 'qemu-img: error while writing sector 1720383: No space left on device\n'

We have two questions:

(1) Why does this happen? That is, how do we debug the misreporting of free disk space?

(2) Most of the disk space seems to be wasted by old (and presumably failed) image conversion attempts in `/var/lib/nova/instances/_base`. How can we clean those up? Can we simply delete the files, or is some special care / database edit needed?

More details on the node state follow.

(1) There is a curious mismatch between the values that Nova gets from the hypervisor and those that it apparently reports to the scheduler:

    ... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free ram (MB): 30284 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:470
    ... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free disk (GB): 7 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:471
    ... 19314 DEBUG nova.compute.resource_tracker [-] Hypervisor: free VCPUs: 6 _report_hypervisor_resource_view /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:476
    ... 19314 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 30189
    ... 19314 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 87
    ... 19314 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 5
    ... 19314 INFO nova.compute.resource_tracker [-] Compute_service record updated for node-09-02-11

Why are 7 GB of free disk detected, but 87 GB reported? And why does this not happen for the free RAM? There are only two instances actually running on the node:

    # pgrep -lf kvm
    1419 kvm-irqfd-clean
    3225 /usr/bin/kvm -name instance-00002ab1 ...
    3228 kvm-pit-wq
    24110 /usr/bin/kvm -name instance-00002800 ...
    24113 kvm-pit-wq

(2) The disk on the node is nearly full (`/var/lib/nova` is on the root fs):

    root@node-09-02-11:/var/lib/nova/instances# df -h
    Filesystem                         Size  Used Avail Use% Mounted on
    /dev/mapper/node--09--02--11-root  103G   96G  2,0G  99% /
    ...

Most of the space is used in the `instances/_base` subdirectory:

    root@node-09-02-11:/var/lib/nova/instances# du -sch *
    93G    _base
    672M   instance-00002800
    8,0K   instance-00002835
    17M    instance-00002ab1
    8,0K   instance-00002ab2
    8,0K   instance-00002ab3
    4,0K   snapshots
    93G    total

    root@node-09-02-11:/var/lib/nova/instances/_base# ls -l
    total 96645084
    -rw-r--r-- 1 nova         nova   5368709120 mag 14 19:12 04df06d78ce815d1e1dbac931318253f5423480f
    -rw-r--r-- 1 libvirt-qemu kvm    5368709120 mag 14 19:13 04df06d78ce815d1e1dbac931318253f5423480f_5
    -rw-r--r-- 1 nova         nova  10737418240 mag 22 14:24 0c09def8c324019bff1d98d88fa3266a964acd78
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 22 14:25 0c09def8c324019bff1d98d88fa3266a964acd78_20
    -rw-rw-r-- 1 nova         nova   1276903424 mag 22 14:24 0c09def8c324019bff1d98d88fa3266a964acd78.part
    -rw-r--r-- 1 nova         nova  21474836480 mag 31 11:33 23f7cddeb51dd8892cbaa076eef7693ee43fa295
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 31 11:34 23f7cddeb51dd8892cbaa076eef7693ee43fa295_20
    -rw-r--r-- 1 libvirt-qemu kvm   42949672960 giu  3 10:11 23f7cddeb51dd8892cbaa076eef7693ee43fa295_40
    -rw-rw-r-- 1 nova         nova   2400321536 mag 31 11:32 23f7cddeb51dd8892cbaa076eef7693ee43fa295.part
    -rw-r--r-- 1 nova         nova  21474836480 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549_20
    -rw-r--r-- 1 libvirt-qemu kvm   42949672960 mag 31 10:04 3ea65202fb7a4664fc1983dff33bf3d007627549_40
    -rw-rw-r-- 1 nova         nova   1581711360 mag 30 15:30 3ea65202fb7a4664fc1983dff33bf3d007627549.part
    -rw-r--r-- 1 nova         nova  21474836480 mag  2 22:02 4a00a749fd7741fde579b69202475606c800a053
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag  2 22:03 4a00a749fd7741fde579b69202475606c800a053_5
    -rw-r--r-- 1 nova         nova  21474836480 apr 26 14:27 7e93ee4745fe5a825aff33d0731b12ade576fffc
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 apr 26 14:35 7e93ee4745fe5a825aff33d0731b12ade576fffc_20
    -rw-r--r-- 1 nova         nova   5368709120 apr 10 11:24 90551e161412bd971a2f323fd9cb9d8b96a56f5f
    -rw-r--r-- 1 libvirt-qemu kvm  107374182400 apr 29 17:16 90551e161412bd971a2f323fd9cb9d8b96a56f5f_100
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 apr 10 11:37 90551e161412bd971a2f323fd9cb9d8b96a56f5f_20
    -rw-r--r-- 1 libvirt-qemu kvm    5368709120 apr 10 11:24 90551e161412bd971a2f323fd9cb9d8b96a56f5f_5
    -rw-r--r-- 1 libvirt-qemu kvm   85899345920 apr 17 20:42 90551e161412bd971a2f323fd9cb9d8b96a56f5f_80
    -rw-r--r-- 1 nova         nova  21474836480 mag 14 16:40 db5ff3aeff506b37035f0ff1c07acf96311f1172
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 14 16:43 db5ff3aeff506b37035f0ff1c07acf96311f1172_20
    -rw-r--r-- 1 nova         nova 107374182400 apr 29 17:17 ephemeral_0_100_None
    -rw-r--r-- 1 libvirt-qemu kvm  107374182400 apr 29 17:17 ephemeral_0_100_None_100
    -rw-r--r-- 1 nova         nova  21474836480 mag 27 14:46 ephemeral_0_20_None
    -rw-r--r-- 1 libvirt-qemu kvm   21474836480 mag 27 14:46 ephemeral_0_20_None_20

However, only one file in `instances/_base` is in use (both instances use the same base image):

    root@node-09-02-11:/var/lib/nova/instances/_base# fuser -v *
                         USER          PID ACCESS COMMAND
    /var/lib/nova/instances/_base/90551e161412bd971a2f323fd9cb9d8b96a56f5f_5:
                         libvirt-qemu 3225 f.... kvm
                         libvirt-qemu 24110 f.... kvm

Can the other files in `_base` be safely deleted?
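For what it's worth, our current guess about question (1), which may well be wrong: the AUDIT "Free disk" figure seems to be derived from the disk sizes claimed by instance flavors, not from the hypervisor's view of the filesystem, so space consumed by files in `_base` (including leftover conversion attempts) would simply not be counted. The numbers below are hypothetical, chosen only so that they reproduce the logged 87 GB:

```shell
# Hypothetical sketch of the arithmetic we suspect the resource tracker
# performs: reported free disk = configured node disk minus flavor disk
# claimed by running instances, regardless of what df says.
local_gb=103    # total disk configured for the node (hypothetical)
claimed_gb=16   # sum of root+ephemeral GB over instances (hypothetical)
echo $(( local_gb - claimed_gb ))   # prints 87, matching the AUDIT line
```

If that reading is right, it would also explain why RAM shows no such gap: actual RAM usage tracks the flavor sizes closely, while actual disk usage is dominated by `_base`.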
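Regarding question (2), this is the kind of dry-run check we would consider scripting before deleting anything; it is only a sketch (the function name is ours, and it re-implements the `fuser` check above by scanning `/proc/*/fd`), and it prints candidates rather than removing files:

```shell
# classify_base DIR: print "IN-USE" or "UNUSED" for every regular file in
# DIR, by checking whether any process holds the file open (the same
# information the manual `fuser -v *` check gave us). Dry run only: the
# UNUSED lines are candidates to review by hand before any rm.
classify_base() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        state=UNUSED
        for fd in /proc/[0-9]*/fd/*; do
            if [ "$(readlink "$fd" 2>/dev/null)" = "$f" ]; then
                state=IN-USE
                break
            fi
        done
        printf '%s %s\n' "$state" "$f"
    done
}

# e.g. on the node:  classify_base /var/lib/nova/instances/_base
```

We would of course still like to know whether an UNUSED file can really be removed without confusing Nova, or whether a database edit is needed.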
Thanks for any help,
Riccardo

--
Riccardo Murri
http://www.gc3.uzh.ch/people/rm
Grid Computing Competence Centre, University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222    Fax: +41 44 635 6888

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp