[ovirt-users] Problem with starting vdsmd during hosted-engine --deploy
Hi Support,

Could you help me with my problem? When I try to deploy a host on a fresh CentOS 7.1 installation I get an error that the vdsmd service cannot be started, and the deploy is terminated after that:

[root@vmsrv1 ~]# hosted-engine --deploy
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli
[ INFO  ] Stage: Initializing
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
          During customization use CTRL-D to abort.
          Continuing will configure this host for serving as hypervisor and create a VM where you have to install the engine afterwards.
          Are you sure you want to continue? (Yes, No)[Yes]:
          It has been detected that this program is executed through an SSH connection without using screen.
          Continuing with the installation may lead to broken installation if the network connection fails.
          It is highly recommended to abort the installation and run it inside a screen session using command "screen".
          Do you want to continue anyway? (Yes, No)[No]: Yes
[ INFO  ] Hardware supports virtualization
          Configuration files: []
          Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20161024074216-iagc54.log
          Version: otopi-1.5.2 (otopi-1.5.2-1.el7.centos)
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
[ ERROR ] Failed to execute stage 'Environment setup': Failed to start service 'vdsmd'
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20161024074222.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20161024074216-iagc54.log
[root@vmsrv1 ~]#

First I tried to do it through the oVirt portal, but it failed every time,
so I tried hosted-engine --deploy instead, and this time I got more information, as shown above. (SELinux is disabled; the firewall is disabled.) The only thing that confuses me is vdsm. So I checked it, and it is definitely installed (version vdsm-4.18.13-1.el7.centos.x86_64):

[root@vmsrv1 ~]# yum install vdsm
Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
 * base: centos.trisect.eu
 * epel: epel.mirrors.ovh.net
 * extras: centos.trisect.eu
 * ovirt-4.0: ftp.nluug.nl
 * ovirt-4.0-epel: epel.mirrors.ovh.net
 * updates: centos.trisect.eu
Package vdsm-4.18.13-1.el7.centos.x86_64 already installed and latest version
Nothing to do

After that I tried to find the service and restart it:

[root@vmsrv1 ~]# systemctl vdsmd restart
Unknown operation 'vdsmd'.

So the question is: if vdsm has been installed, what is going on? So:

[root@vmsrv1 ~]# systemctl | grep -i vdsm
supervdsmd.service    loaded active running  Auxiliary vdsm service for running helper functions as root
vdsm-network.service  loaded active exited   Virtual Desktop Server Manager network restoration
[root@vmsrv1 ~]#

As you can see above, the naming convention is completely different; there should be a unit like vdsmd.service. What is strange is that this unit exists on the host where the Portal is, but it is not possible to start it there because of "vdsm-network.service". The question is how to work around this, because the hosted-engine script apparently does not know how to start vdsm. Additionally, I tried to reconfigure vdsm:

[root@vmsrv1 ~]# vdsm-tool configure --force
/usr/lib/python2.7/site-packages/vdsm/tool/dump_volume_chains.py:28: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import vdscli
Checking configuration status...
Current revision of multipath.conf detected, preserving
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Running configure...
Reconfiguration of sebool is done.
Reconfiguration of libvirt is done.
Done configuring modules to VDSM.
[root@vmsrv1 ~]# systemctl start vdsmd
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
[root@vmsrv1 ~]# journalctl -xe
-- Subject: Unit libvirtd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has finished starting up.
--
-- The start-up result is done.
Oct 24 07:57:12 vmsrv1.szypa.net systemd[1]: Configuration file /usr/lib/systemd/system/ebtables.service is marked executable. Please remove executable permission bits. Proceeding anyway.
Oct 24 07:57:12 vmsrv1.szypa.net systemd[1]: Started Auxiliary vdsm service for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished start-up
-- Defined-By: systemd
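A side note on the "Unknown operation 'vdsmd'" error in the transcript above: systemctl takes the operation first and the unit name last, so `systemctl vdsmd restart` makes systemctl parse 'vdsmd' as the verb. A minimal sketch of the two forms (the actual dependency failure is a separate issue, diagnosed with journalctl as shown above):

```shell
# systemctl argument order: the verb comes first, the unit name last.
wrong="systemctl vdsmd restart"    # fails with: Unknown operation 'vdsmd'.
right="systemctl restart vdsmd"    # correct invocation
echo "$right"
```

With the order fixed, `systemctl status vdsmd` and `systemctl list-dependencies vdsmd` are the usual next steps to see which dependency job failed.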
Re: [ovirt-users] about hosted engine gluster support
----- Original Message -----
> From: "张余歌"
> To: users@ovirt.org
> Sent: Monday, October 24, 2016 10:21:15 AM
> Subject: [ovirt-users] about hosted engine gluster support
>
> hey, friends!
> Refer to
> https://www.ovirt.org/develop/release-management/features/engine/self-hosted-engine-gluster-support/
> I meet some problems when I run 'hosted-engine --deploy':
> it shows support for iscsi, nfs3 and nfs4, but not gluster.
>
> It should be: Please specify the storage you would like to use (glusterfs,
> iscsi, nfs3, nfs4)[nfs3]: glusterfs
>
> I followed the steps in the link, but I failed to make hosted-engine support
> gluster. Maybe there is something else I should configure? I am so
> confused! Please help me!
>
> thanks.
>
> my ovirt version is 3.5.6.

Gluster is not supported as storage for the hosted engine in oVirt 3.5.
I would strongly suggest you use the latest oVirt 4.0 unless you have a specific reason not to.

Regards,
Ramesh

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
[ovirt-users] about hosted engine gluster support
hey, friends!

Refer to https://www.ovirt.org/develop/release-management/features/engine/self-hosted-engine-gluster-support/
I meet some problems when I run 'hosted-engine --deploy': it shows support for iscsi, nfs3 and nfs4, but not gluster.

It should be: Please specify the storage you would like to use (glusterfs, iscsi, nfs3, nfs4)[nfs3]: glusterfs

I followed the steps in the link, but I failed to make hosted-engine support gluster. Maybe there is something else I should configure? I am so confused! Please help me!

thanks.

my ovirt version is 3.5.6.
Re: [ovirt-users] low level Image copy failed
I apparently was unable to connect the dots when I was working on this yesterday. So, just to test, I have now manually changed the size value in the meta file: 67108864 --> 73924608. And after that I was able to import the VM. So perhaps the real problem is in the export?

Rgds Jonas

On 23/10/16 20:57, Jonas Israelsson wrote:
On 23/10/16 20:06, Nir Soffer wrote:
On Sun, Oct 23, 2016 at 5:34 PM, Jonas Israelsson wrote:

Greetings.

We are in the process of migrating from oVirt 3.6 to 4.0. To properly test 4.0 we have set up a parallel 4.0 environment.

For the non-critical VMs we thought we would try the "export vms --> move storage domain to the other DC --> import vms" method.

While many imports are successful, quite a few fail with 'low level Image copy failed'.

One of these VMs that is impossible to import has the following disk layout:

* Disk 1 - 100GB (Thin)
* Disk 2 - 32GB (Preallocated)

According to the volume .meta file below, this is COW/SPARSE, not preallocated.

It's because I'm an idiot and gave you information about the wrong disk. My apologies.

$ /usr/bin/qemu-img.org info /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0
image: /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0
file format: raw
virtual size: 35G (37849399296 bytes)
disk size: 35G

[root@patty tmp]# cat /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0.meta
DOMAIN=61842ad9-42da-40a9-8ec8-dd7807a82916
VOLTYPE=LEAF
CTIME=1476880543
FORMAT=RAW
IMAGE=9eb60288-27b6-4fb1-aef1-4246455d588e
DISKTYPE=2
PUUID=----
LEGALITY=LEGAL
MTIME=0
POOL_UUID=
SIZE=67108864
TYPE=PREALLOCATED
DESCRIPTION=
EOF

Can you share the original vm disk metadata before the export?
Could you please instruct me how to? It's on an FC LUN, so it's hiding on an LV somewhere. I could perhaps just move it to an NFS data domain?

Looking at the metadata before the export, after the export, and after the import, we can understand what the root cause is. It will be hard to find the metadata after the failed copy, since vdsm tries hard to clean up after errors, but the information should be available in the vdsm log.

Yes I noticed, hence the qemu-img wrapper.

* Disk 3 - 32GB (Thin)

Where the two thin disks (1 & 3) are successfully imported, but disk 2, the preallocated one, always fails.
...
and from vdsm.log
...
CopyImageError: low level Image copy failed: ('ecode=1, stdout=, stderr=qemu-img: error while writing sector 73912303: No space left on device\n, message=None',)

We need a log from the entire flow, starting at "Run and protect: copyImage..."
...
The first checks the size of the image (37849399296), and the second the size of the logical volume (34359738368) just created to hold this image. And as you can see, the volume is smaller than the image it should hold, so we are under the impression that something made an incorrect decision when creating that volume.

The destination image size depends on the destination format. If the destination is preallocated, the logical volume size *must* be the virtual size (32G). If it is sparse, the logical volume should be the file size on the export domain (35G).

According to your findings, we created a destination image for a preallocated disk (32G), and then tried to run "qemu-img convert" with qcow2 format as both source and destination. However, this is only a guess, since I don't have the log showing the actual qemu-img command.
12:37:15 685557156 --- Identifier: 51635 , Arguments: convert -p -t none -T none -f raw /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0 -O raw /rhev/data-center/mnt/blockSD/cb64e1fc-98b6-4b8c-916e-418d05bcd467/images/a1d70c22-cace-48d2-9809-caadc70b77e7/71f5fe82-81dd-47e9-aa3f-1a66622db4cb

Please share complete engine and vdsm logs showing the entire flow.

http://whs1.elementary.se/logs.tar.gz

In vdsm.log search for 12:37:15
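The manual fix above (SIZE 67108864 --> 73924608) is consistent with the numbers elsewhere in this thread if SIZE in the .meta file is counted in 512-byte sectors rather than bytes. A quick arithmetic check of both values (the 512-byte sector unit is an assumption inferred from the thread, not stated in it):

```shell
# Check the .meta SIZE values from this thread, assuming 512-byte sectors.
old_bytes=$((67108864 * 512))   # the original SIZE in the metadata
new_bytes=$((73924608 * 512))   # the manually corrected SIZE
echo "$old_bytes"               # exactly 32 GiB, the "preallocated" disk size
echo "$new_bytes"               # matches qemu-img info's "37849399296 bytes"
```

The old value works out to exactly 32 GiB, matching the 32GB preallocated disk, while the corrected value matches the 35G (37849399296 bytes) virtual size that qemu-img reported, which supports the guess that the export wrote stale metadata.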
Re: [ovirt-users] low level Image copy failed
On 23/10/16 20:06, Nir Soffer wrote:
On Sun, Oct 23, 2016 at 5:34 PM, Jonas Israelsson wrote:

Greetings.

We are in the process of migrating from oVirt 3.6 to 4.0. To properly test 4.0 we have set up a parallel 4.0 environment.

For the non-critical VMs we thought we would try the "export vms --> move storage domain to the other DC --> import vms" method.

While many imports are successful, quite a few fail with 'low level Image copy failed'.

One of these VMs that is impossible to import has the following disk layout:

* Disk 1 - 100GB (Thin)
* Disk 2 - 32GB (Preallocated)

According to the volume .meta file below, this is COW/SPARSE, not preallocated.

It's because I'm an idiot and gave you information about the wrong disk. My apologies.

$ /usr/bin/qemu-img.org info /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0
image: /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0
file format: raw
virtual size: 35G (37849399296 bytes)
disk size: 35G

[root@patty tmp]# cat /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0.meta
DOMAIN=61842ad9-42da-40a9-8ec8-dd7807a82916
VOLTYPE=LEAF
CTIME=1476880543
FORMAT=RAW
IMAGE=9eb60288-27b6-4fb1-aef1-4246455d588e
DISKTYPE=2
PUUID=----
LEGALITY=LEGAL
MTIME=0
POOL_UUID=
SIZE=67108864
TYPE=PREALLOCATED
DESCRIPTION=
EOF

Can you share the original vm disk metadata before the export?

Could you please instruct me how to? It's on an FC LUN, so it's hiding on an LV somewhere. I could perhaps just move it to an NFS data domain?

Looking at the metadata before the export, after the export, and after the import, we can understand what the root cause is.
It will be hard to find the metadata after the failed copy, since vdsm tries hard to clean up after errors, but the information should be available in the vdsm log.

Yes I noticed, hence the qemu-img wrapper.

* Disk 3 - 32GB (Thin)

Where the two thin disks (1 & 3) are successfully imported, but disk 2, the preallocated one, always fails.
...
and from vdsm.log
...
CopyImageError: low level Image copy failed: ('ecode=1, stdout=, stderr=qemu-img: error while writing sector 73912303: No space left on device\n, message=None',)

We need a log from the entire flow, starting at "Run and protect: copyImage..."
...
The first checks the size of the image (37849399296), and the second the size of the logical volume (34359738368) just created to hold this image. And as you can see, the volume is smaller than the image it should hold, so we are under the impression that something made an incorrect decision when creating that volume.

The destination image size depends on the destination format. If the destination is preallocated, the logical volume size *must* be the virtual size (32G). If it is sparse, the logical volume should be the file size on the export domain (35G).

According to your findings, we created a destination image for a preallocated disk (32G), and then tried to run "qemu-img convert" with qcow2 format as both source and destination. However, this is only a guess, since I don't have the log showing the actual qemu-img command.

12:37:15 685557156 --- Identifier: 51635 , Arguments: convert -p -t none -T none -f raw /rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0 -O raw /rhev/data-center/mnt/blockSD/cb64e1fc-98b6-4b8c-916e-418d05bcd467/images/a1d70c22-cace-48d2-9809-caadc70b77e7/71f5fe82-81dd-47e9-aa3f-1a66622db4cb

Please share complete engine and vdsm logs showing the entire flow.
http://whs1.elementary.se/logs.tar.gz

In vdsm.log search for 12:37:15
Re: [ovirt-users] low level Image copy failed
On Sun, Oct 23, 2016 at 5:34 PM, Jonas Israelsson wrote:
> Greetings.
>
> We are in the process of migrating from oVirt 3.6 to 4.0. To properly test
> 4.0 we have setup a parallel 4.0 environment.
>
> For the non critical vm:s we thought we try the "export vms --> move storage
> domain to the other DC --> import vms" method.
>
> While many imports are successful quite a few fails with 'low level Image
> copy failed'
>
> One of these vm impossible to import have the following disk layout.
>
> * Disk 1 - 100GB (Thin)
>
> * Disk2 - 32GB (Preallocated)

According to the volume .meta file below, this is COW/SPARSE, not preallocated.

Can you share the original vm disk metadata before the export?

Looking at the metadata before the export, after the export, and after the import, we can understand what the root cause is.

It will be hard to find the metadata after the failed copy, since vdsm tries hard to clean up after errors, but the information should be available in the vdsm log.

> * Disk3 - 32GB (Thin)
>
> Where the two thin disk (1 & 3) are successfully imported but disk2, the
> preallocated always fail.
> ...
> and from vdsm.log
> ...
> CopyImageError: low level Image copy failed: ('ecode=1, stdout=,
> stderr=qemu-img: error while writing sector 73912303: No space left on
> device\n, message=None',)

We need a log from the entire flow, starting at "Run and protect: copyImage..."
...
> The first checking the size of the image (37849399296) , and the second the
> size of logical volume (34359738368) just created to hold this image.
> And as you can see the volume is smaller in size than the image it should
> hold, whereas we are under the impression something made an incorrect
> decision when creating that volume.

The destination image size depends on the destination format. If the destination is preallocated, the logical volume size *must* be the virtual size (32G). If it is sparse, the logical volume should be the file size on the export domain (35G).
According to your findings, we created a destination image for a preallocated disk (32G), and then tried to run "qemu-img convert" with qcow2 format as both source and destination. However, this is only a guess, since I don't have the log showing the actual qemu-img command.

Please share complete engine and vdsm logs showing the entire flow.

Nir
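The sizing rule Nir describes can be sketched with the concrete numbers from this thread (the values are taken from the qemu-img and lvs output quoted above; the rule itself is as stated, not an official vdsm algorithm):

```shell
# Destination sizing rule for block storage, per the explanation above:
# preallocated -> LV must be the virtual size; sparse -> the file size
# on the export domain.
virtual_size=34359738368   # 32 GiB, the disk's virtual size
file_size=37849399296      # ~35 GiB, per qemu-img info on the export domain
fmt="preallocated"
if [ "$fmt" = "preallocated" ]; then
    lv_size=$virtual_size
else
    lv_size=$file_size
fi
echo "$lv_size"
```

With fmt=preallocated this yields the 32 GiB LV that was actually created, which is too small for the 35G image being copied in; with fmt=sparse it would have yielded a large enough volume.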
[ovirt-users] low level Image copy failed
Greetings.

We are in the process of migrating from oVirt 3.6 to 4.0. To properly test 4.0 we have set up a parallel 4.0 environment.

For the non-critical VMs we thought we would try the "export vms --> move storage domain to the other DC --> import vms" method.

While many imports are successful, quite a few fail with 'low level Image copy failed'.

One of these VMs that is impossible to import has the following disk layout:

* Disk 1 - 100GB (Thin)
* Disk 2 - 32GB (Preallocated)
* Disk 3 - 32GB (Thin)

Where the two thin disks (1 & 3) are successfully imported, but disk 2, the preallocated one, always fails.

From engine.log:

2016-10-19 18:50:28,096 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) [2dc919bd] BaseAsyncTask::onTaskEndSuccess: Task '30832827-078e-4359-8552-0dccdc9821ff' (Parent Command 'ImportVm', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended successfully.
2016-10-19 18:50:28,096 INFO [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (DefaultQuartzScheduler2) [2dc919bd] Task with DB Task ID '64829f3d-194b-434f-8997-4723770e4638' and VDSM Task ID 'bccae407-0c28-4556-80d3-6b61887ce045' is in state Polling. End action for command 39bbd979-e9f8-4cf6-901f-55d109baa9cc will proceed when all the entity's tasks are completed.
2016-10-19 18:50:40,231 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler2) [2dc919bd] Failed in 'HSMGetAllTasksStatusesVDS' method
2016-10-19 18:50:40,243 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler2) [2dc919bd] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM fattony command failed: low level Image copy failed
2016-10-19 18:50:40,243 INFO [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) [2dc919bd] SPMAsyncTask::PollTask: Polling task 'bccae407-0c28-4556-80d3-6b61887ce045' (Parent Command 'ImportVm', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2016-10-19 18:50:40,296 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) [2dc919bd] BaseAsyncTask::logEndTaskFailure: Task 'bccae407-0c28-4556-80d3-6b61887ce045' (Parent Command 'ImportVm', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 261',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 261'

and from vdsm.log:

bccae407-0c28-4556-80d3-6b61887ce045::DEBUG::2016-10-19 18:50:36,451::resourceManager::661::Storage.ResourceManager::(releaseResource) No one is waiting for resource '61842ad9-42da-40a9-8ec8-dd7807a82916_imageNS.9eb60288-27b6-4fb1-aef1-4246455d588e', Clearing records.
bccae407-0c28-4556-80d3-6b61887ce045::ERROR::2016-10-19 18:50:36,452::task::868::Storage.TaskManager.Task::(_setError) Task=`bccae407-0c28-4556-80d3-6b61887ce045`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 334, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 78, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1558, in copyImage
    postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 902, in copyCollapsed
    raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ('ecode=1, stdout=, stderr=qemu-img: error while writing sector 73912303: No space left on device\n, message=None',)

To further figure out what is going on, we created a wrapper, replacing qemu-img with a script that runs qemu-img through strace. What caught our attention are the following two lseek/stat sequences:

stat("/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0", {st_mode=S_IFREG|0660, st_size=37849399296, ...}) = 0
open("/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0", O_RDONLY|O_DIRECT|O_CLOEXEC) = 12
fstat(12, {st_mode=S_IFREG|0660, st_size=37849399296, ...}) = 0
lseek(12, 0, SEEK_END) = 37849399296

AND

stat("/rhev/data-center/mnt/blockSD/cb64e1fc-98b6-4b8c-916e-418d05bcd467/images/a1d70c22-cace-48d2-9809-caadc70b77e7/71f5fe82-81dd-47e9-aa3f-1a66622db4cb", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 42), ...}) = 0
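The strace numbers above line up with the "No space left on device" error. A small sanity check, using the source end offset from lseek, the 32 GiB destination LV size quoted later in the thread, and the failing sector number from the traceback (sector size of 512 bytes is an assumption):

```shell
# Compare the source image's end offset against the destination LV size,
# and locate the byte offset of the sector qemu-img failed on.
src_end=37849399296              # lseek(12, 0, SEEK_END) on the source image
dst_lv=34359738368               # destination logical volume size (32 GiB)
fail_offset=$((73912303 * 512))  # byte offset of the failing write
echo "$fail_offset"
[ "$fail_offset" -gt "$dst_lv" ] && echo "write lands past the LV end -> ENOSPC"
```

The failing write sits past the 32 GiB LV boundary but still within the 35G source image, which is exactly what an undersized destination volume would produce.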
Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
Do you know when .34 will be released?
http://mirror.centos.org/centos/7/virt/x86_64/ovirt-3.6/
The latest version there is: vdsm-cli-4.17.32-1.el7.noarch.rpm 08-Aug-2016 17:36

On Fri, Oct 14, 2016 at 1:11 AM, Francesco Romani wrote:
> ----- Original Message -----
> > From: "Simone Tiraboschi"
> > To: "Steve Dainard" , "Francesco Romani" <from...@redhat.com>
> > Cc: "users"
> > Sent: Friday, October 14, 2016 9:59:49 AM
> > Subject: Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition
> >
> > On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard wrote:
> > > Hello,
> > >
> > > I had a hypervisor semi-crash this week; 4 of ~10 VMs continued to run,
> > > but the others were killed off somehow, and all VMs running on this host
> > > had '?' status in the ovirt UI.
> > >
> > > This appears to have been caused by vdsm logs filling up disk space on the
> > > logging partition.
> > >
> > > I've attached the log file vdsm.log.27.xz which shows this error:
> > >
> > > vdsm.Scheduler::DEBUG::2016-10-11
> > > 16:42:09,318::executor::216::Executor::(_discard)
> > > Worker discarded: action=<Operation action=<... 'virt.periodic.DriveWatermarkMonitor'> at 0x7f8e90021210> at 0x7f8e90021250> discarded at 0x7f8dd123e850>
> > >
> > > which happens more and more frequently throughout the log.
> > >
> > > It was a bit difficult to understand what caused the failure, but the logs
> > > were getting really large, then being xz'd, which compressed 11G+ into a few
> > > MB. Once this happened the disk space would be freed, and nagios wouldn't
> > > hit the 3rd check to throw a warning, until pretty much right at the crash.
> > >
> > > I was able to restart vdsmd to resolve the issue, but I still need to know
> > > why these logs started to stack up so I can avoid this issue in the future.
> >
> > We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
> > but in your case the logs are rotating.
> > Francesco?
>
> Hi,
>
> yes, it is a different issue.
> Here the log messages are caused by the Worker threads
> of the periodic subsystem, which are leaking [1].
> This was a bug in Vdsm (insufficient protection against rogue domains), but the
> real problem is that some of your domains are being unresponsive at
> hypervisor level. The most likely cause is, in turn, unresponsive storage.
>
> Fixes have been committed and shipped with Vdsm 4.17.34.
>
> See: https://bugzilla.redhat.com/1364925
>
> HTH,
>
> +++
>
> [1] actually, they are replaced too quickly, leading to unbound growth.
> So those aren't actually "leaking"; Vdsm is just overzealous handling one
> error condition, making things worse than before.
> Still a serious issue, no doubt, but quite a different cause.
>
> --
> Francesco Romani
> Red Hat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani