[ovirt-users] Problem with starting VDSMD during hosted-engine --deploy

2016-10-23 Thread Grzegorz Szypa
Hi Support,

Could you help me with my problem? When I try to deploy a host on a fresh
CentOS 7.1 installation I get an error that the vdsmd service cannot be
started, and the deploy is terminated after that:

[root@vmsrv1 ~]#
[root@vmsrv1 ~]# hosted-engine --deploy
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15:
DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is
deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli
[ INFO  ] Stage: Initializing
[ INFO  ] Generating a temporary VNC password.
[ INFO  ] Stage: Environment setup
  During customization use CTRL-D to abort.
  Continuing will configure this host for serving as hypervisor and
create a VM where you have to install the engine afterwards.
  Are you sure you want to continue? (Yes, No)[Yes]:
  It has been detected that this program is executed through an SSH
connection without using screen.
  Continuing with the installation may lead to broken installation
if the network connection fails.
  It is highly recommended to abort the installation and run it
inside a screen session using command "screen".
  Do you want to continue anyway? (Yes, No)[No]: Yes
[ INFO  ] Hardware supports virtualization
  Configuration files: []
  Log file:
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20161024074216-iagc54.log
  Version: otopi-1.5.2 (otopi-1.5.2-1.el7.centos)
[ INFO  ] Stage: Environment packages setup
[ INFO  ] Stage: Programs detection
[ INFO  ] Stage: Environment setup
*[ ERROR ] Failed to execute stage 'Environment setup': Failed to start
service 'vdsmd'*
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file
'/var/lib/ovirt-hosted-engine-setup/answers/answers-20161024074222.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed
  Log file is located at
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20161024074216-iagc54.log
[root@vmsrv1 ~]#


First I tried to do it through the oVirt portal, but it failed every time, so I
tried it with hosted-engine --deploy, and this time I got the more detailed
information shown above. (SELinux is disabled, the firewall is disabled.) The
only thing that confuses me is vdsm. So I checked it, and it has definitely
been installed (version vdsm-4.18.13-1.el7.centos.x86_64):

[root@vmsrv1 ~]# yum install vdsm
Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
 * base: centos.trisect.eu
 * epel: epel.mirrors.ovh.net
 * extras: centos.trisect.eu
 * ovirt-4.0: ftp.nluug.nl
 * ovirt-4.0-epel: epel.mirrors.ovh.net
 * updates: centos.trisect.eu
Package vdsm-4.18.13-1.el7.centos.x86_64 already installed and latest version
Nothing to do

After that I tried to find it and restart it:
[root@vmsrv1 ~]# systemctl vdsmd restart
Unknown operation 'vdsmd'.
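
(Note that systemctl expects the operation before the unit name, which is why
the command above is rejected. A minimal sketch of the intended invocation:

systemctl restart vdsmd       # restart the unit
systemctl status vdsmd -l     # and see why it is (not) running
)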

So the question is: if vdsm has been installed, what is going on? So:
[root@vmsrv1 ~]# systemctl |grep -i vdsm
  supervdsmd.service    loaded active running  Auxiliary vdsm service for running helper functions as root
  vdsm-network.service  loaded active exited   Virtual Desktop Server Manager network restoration
[root@vmsrv1 ~]#


As you can see above, the naming convention is completely different, and there
should be something like vdsmd.service. What is strange is that vdsmd.service
does exist on the host where the Portal is, but it is not possible to start it,
because of "vdsm-network.service".

The question is how to work around this, because the hosted-engine script
clearly does not know how to start vdsm.
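
(systemctl with no arguments lists only units that are currently loaded, so an
installed but stopped vdsmd.service will not show up in the grep above. A small
sketch of how to confirm the unit file itself is present, assuming the standard
vdsm packaging:

systemctl list-unit-files | grep vdsm    # lists vdsmd.service even when it is stopped
rpm -ql vdsm | grep vdsmd.service        # shows the unit file shipped by the vdsm package
)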

Additionally, I tried to reconfigure vdsm:

[root@vmsrv1 ~]# vdsm-tool configure --force
/usr/lib/python2.7/site-packages/vdsm/tool/dump_volume_chains.py:28:
DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is
deprecated, please use vdsm.jsonrpcvdscli
  from vdsm import vdscli

Checking configuration status...

Current revision of multipath.conf detected, preserving
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts

Running configure...
Reconfiguration of sebool is done.
Reconfiguration of libvirt is done.

Done configuring modules to VDSM.


[root@vmsrv1 ~]# systemctl start vdsmd
A dependency job for vdsmd.service failed. See 'journalctl -xe' for details.
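
(The dependency that actually failed can usually be pinned down with something
along these lines, shown only as a sketch:

systemctl list-dependencies vdsmd               # units vdsmd pulls in
systemctl status vdsm-network.service -l        # the dependency seen above
journalctl -b -u vdsmd -u vdsm-network.service  # logs for both units
)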


[root@vmsrv1 ~]# journalctl -xe
-- Subject: Unit libvirtd.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit libvirtd.service has finished starting up.
--
-- The start-up result is done.
Oct 24 07:57:12 vmsrv1.szypa.net systemd[1]: Configuration file
/usr/lib/systemd/system/ebtables.service is marked executable. Please
remove executable permission bits. Proceeding anyway.
Oct 24 07:57:12 vmsrv1.szypa.net systemd[1]: Started Auxiliary vdsm service
for running helper functions as root.
-- Subject: Unit supervdsmd.service has finished start-up
-- Defined-By: systemd

Re: [ovirt-users] about hosted engine gluster support

2016-10-23 Thread Ramesh Nachimuthu




- Original Message -
> From: "张余歌" 
> To: users@ovirt.org
> Sent: Monday, October 24, 2016 10:21:15 AM
> Subject: [ovirt-users] about hosted engine gluster support
> 
> Hey, friends!
> Refer to
> https://www.ovirt.org/develop/release-management/features/engine/self-hosted-engine-gluster-support/
> I have run into a problem: when I run 'hosted-engine --deploy',
> it only offers iscsi, nfs3 and nfs4 as storage, but not gluster.
> 
> It should be: Please specify the storage you would like to use (glusterfs,
> iscsi, nfs3, nfs4)[nfs3]: glusterfs
> 
> I followed the steps from that link, but I could not get hosted-engine to
> offer gluster. Maybe there is something else I should configure? I am very
> confused. Why? Please help me!
> 
> Thanks.
> 
> My oVirt version is 3.5.6.

Gluster is not supported as storage for hosted-engine in oVirt 3.5. I would
strongly suggest using the latest oVirt 4.0 unless you have a specific
reason not to.
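
(If you do upgrade, a minimal sketch of pulling in the 4.0 release package,
assuming the standard oVirt repository location:

yum install http://resources.ovirt.org/pub/yum-repo/ovirt-release40.rpm
yum install ovirt-hosted-engine-setup

With 4.0, 'hosted-engine --deploy' should then offer glusterfs among the
storage types, as in the prompt quoted above.)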

Regards,
Ramesh

> 
> 
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] about hosted engine gluster support

2016-10-23 Thread 张余歌
Hey, friends! Refer to
https://www.ovirt.org/develop/release-management/features/engine/self-hosted-engine-gluster-support/
I have run into a problem: when I run 'hosted-engine --deploy', it only offers
iscsi, nfs3 and nfs4 as storage, but not gluster.

It should be: Please specify the storage you would like to use (glusterfs, iscsi,
nfs3, nfs4)[nfs3]: glusterfs

I followed the steps from that link, but I could not get hosted-engine to offer
gluster. Maybe there is something else I should configure? I am very confused.
Why? Please help me!

Thanks.
My oVirt version is 3.5.6.
  ___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] low level Image copy failed

2016-10-23 Thread Jonas Israelsson
I apparently was unable to connect the dots when I was working on this 
yesterday.


So, just to test I now manually changed the size value in the meta file

67108864 --> 73924608
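
(Assuming the SIZE field in the .meta file is expressed in 512-byte sectors,
the arithmetic matches the qemu-img output quoted below:
  67108864 * 512 = 34359738368 bytes (32 GiB, the size of the logical volume)
  73924608 * 512 = 37849399296 bytes (the 35G virtual size qemu-img reports))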

And after that I was able to import the vm.

So perhaps the real problem is in the export ?

Rgds Jonas


On 23/10/16 20:57, Jonas Israelsson wrote:

On 23/10/16 20:06, Nir Soffer wrote:


On Sun, Oct 23, 2016 at 5:34 PM, Jonas Israelsson
 wrote:

Greetings.

We are in the process of migrating from oVirt 3.6 to 4.0. To properly test
4.0 we have set up a parallel 4.0 environment.

For the non-critical VMs we thought we would try the "export vms --> move
storage domain to the other DC --> import vms" method.

While many imports are successful, quite a few fail with 'low level Image
copy failed'.

One of these VMs that is impossible to import has the following disk layout:

* Disk 1 - 100GB  (Thin)

* Disk2 - 32GB (Preallocated)

According to the volume .meta file below, this is COW/SPARSE,
not preallocated.
It's because I'm an idiot and gave you information about the wrong
disk. My apologies.


$ /usr/bin/qemu-img.org info 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0


image: 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0

file format: raw
virtual size: 35G (37849399296 bytes)
disk size: 35G


[root@patty tmp]# cat 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0.meta 


DOMAIN=61842ad9-42da-40a9-8ec8-dd7807a82916
VOLTYPE=LEAF
CTIME=1476880543
FORMAT=RAW
IMAGE=9eb60288-27b6-4fb1-aef1-4246455d588e
DISKTYPE=2
PUUID=----
LEGALITY=LEGAL
MTIME=0
POOL_UUID=
SIZE=67108864
TYPE=PREALLOCATED
DESCRIPTION=
EOF





Can you share the original vm disk metadata before the export?
Could you please instruct me how to? It's on an FC LUN, so it's hiding on an
LV somewhere. I could perhaps just move it to an NFS data domain..?

Looking at the metadata before the export, after the export, and after
the import, we can understand what is the root cause.

It will be hard to find the metadata after the failed copy since vdsm tries
hard to clean up after errors, but the information should be available
in the vdsm log.

Yes I noticed, hence the qemu-img wrapper

* Disk3 - 32GB (Thin)

Where the two thin disks (1 & 3) are successfully imported, but Disk2, the
preallocated one, always fails.


...

and from vdsm.log


...

CopyImageError: low level Image copy failed: ('ecode=1, stdout=,
stderr=qemu-img: error while writing sector 73912303: No space left on
device\n, message=None',)
We need the log from the entire flow, starting at "Run and protect: copyImage..."


...
The first checks the size of the image (37849399296), and the second the
size of the logical volume (34359738368) just created to hold this image.
As you can see, the volume is smaller than the image it should
hold, so we are under the impression that something made an incorrect
decision when creating that volume.
The destination image size depends on the destination format. If the destination
is preallocated, the logical volume size *must* be the virtual size (32G).
If it is sparse, the logical volume should be the file size on the export
domain (35G).


According to your findings, we created a destination image for a preallocated
disk (32G), and then tried to run "qemu-img convert" with qcow2 format as
both source and destination. However this is only a guess, since I don't have
the log showing the actual qemu-img command.
12:37:15 685557156   ---   Identifier: 51635 , Arguments: convert -p 
-t none -T none -f raw 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0 
-O raw 
/rhev/data-center/mnt/blockSD/cb64e1fc-98b6-4b8c-916e-418d05bcd467/images/a1d70c22-cace-48d2-9809-caadc70b77e7/71f5fe82-81dd-47e9-aa3f-1a66622db4cb

Please share complete engine and vdsm logs showing the entire flow.

http://whs1.elementary.se/logs.tar.gz
In vdsm.log search for  12:37:15


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] low level Image copy failed

2016-10-23 Thread Jonas Israelsson

On 23/10/16 20:06, Nir Soffer wrote:


On Sun, Oct 23, 2016 at 5:34 PM, Jonas Israelsson
 wrote:

Greetings.

We are in the process of migrating from oVirt 3.6 to 4.0. To properly test
4.0 we have set up a parallel 4.0 environment.

For the non-critical VMs we thought we would try the "export vms --> move storage
domain to the other DC --> import vms" method.

While many imports are successful, quite a few fail with 'low level Image
copy failed'.

One of these VMs that is impossible to import has the following disk layout:

* Disk 1 - 100GB  (Thin)

* Disk2 - 32GB (Preallocated)

According to the volume .meta file below, this is COW/SPARSE,
not preallocated.
It's because I'm an idiot and gave you information about the wrong disk.
My apologies.


$ /usr/bin/qemu-img.org info 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0


image: 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0

file format: raw
virtual size: 35G (37849399296 bytes)
disk size: 35G


[root@patty tmp]# cat 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0.meta 


DOMAIN=61842ad9-42da-40a9-8ec8-dd7807a82916
VOLTYPE=LEAF
CTIME=1476880543
FORMAT=RAW
IMAGE=9eb60288-27b6-4fb1-aef1-4246455d588e
DISKTYPE=2
PUUID=----
LEGALITY=LEGAL
MTIME=0
POOL_UUID=
SIZE=67108864
TYPE=PREALLOCATED
DESCRIPTION=
EOF





Can you share the original vm disk metadata before the export?
Could you please instruct me how to? It's on an FC LUN, so it's hiding on an
LV somewhere. I could perhaps just move it to an NFS data domain..?

Looking at the metadata before the export, after the export, and after
the import, we can understand what is the root cause.

It will be hard to find the metadata after the failed copy since vdsm tries
hard to clean up after errors, but the information should be available
in the vdsm log.

Yes I noticed, hence the qemu-img wrapper

* Disk3 - 32GB (Thin)

Where the two thin disks (1 & 3) are successfully imported, but Disk2, the
preallocated one, always fails.


...

and from vdsm.log


...

CopyImageError: low level Image copy failed: ('ecode=1, stdout=,
stderr=qemu-img: error while writing sector 73912303: No space left on
device\n, message=None',)

We need the log from the entire flow, starting at "Run and protect: copyImage..."

...

The first checks the size of the image (37849399296), and the second the
size of the logical volume (34359738368) just created to hold this image.
As you can see, the volume is smaller than the image it should
hold, so we are under the impression that something made an incorrect
decision when creating that volume.

The destination image size depends on the destination format. If the destination
is preallocated, the logical volume size *must* be the virtual size (32G).
If it is sparse, the logical volume should be the file size on the export
domain (35G).

According to your findings, we created a destination image for a preallocated
disk (32G), and then tried to run "qemu-img convert" with qcow2 format as
both source and destination. However this is only a guess, since I don't have
the log showing the actual qemu-img command.
12:37:15 685557156   ---   Identifier: 51635 , Arguments: convert -p -t 
none -T none -f raw 
/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0 
-O raw 
/rhev/data-center/mnt/blockSD/cb64e1fc-98b6-4b8c-916e-418d05bcd467/images/a1d70c22-cace-48d2-9809-caadc70b77e7/71f5fe82-81dd-47e9-aa3f-1a66622db4cb

Please share complete engine and vdsm logs showing the entire flow.

http://whs1.elementary.se/logs.tar.gz
In vdsm.log search for  12:37:15
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] low level Image copy failed

2016-10-23 Thread Nir Soffer
On Sun, Oct 23, 2016 at 5:34 PM, Jonas Israelsson
 wrote:
> Greetings.
>
> We are in the process of migrating from oVirt 3.6 to 4.0. To properly test
> 4.0 we have set up a parallel 4.0 environment.
>
> For the non-critical VMs we thought we would try the "export vms --> move storage
> domain to the other DC --> import vms" method.
>
> While many imports are successful, quite a few fail with 'low level Image
> copy failed'.
>
> One of these VMs that is impossible to import has the following disk layout:
>
> * Disk 1 - 100GB  (Thin)
>
> * Disk2 - 32GB (Preallocated)

According to the volume .meta file below, this is COW/SPARSE,
not preallocated.

Can you share the original vm disk metadata before the export?

Looking at the metadata before the export, after the export, and after
the import, we can understand what is the root cause.

It will be hard to find the metadata after the failed copy since vdsm tries
hard to clean up after errors, but the information should be available
in the vdsm log.

> * Disk3 - 32GB (Thin)
>
> Where the two thin disks (1 & 3) are successfully imported, but Disk2, the
> preallocated one, always fails.
>
...
> and from vdsm.log
>
...
> CopyImageError: low level Image copy failed: ('ecode=1, stdout=,
> stderr=qemu-img: error while writing sector 73912303: No space left on
> device\n, message=None',)

We need the log from the entire flow, starting at "Run and protect: copyImage..."

...
> The first checks the size of the image (37849399296), and the second the
> size of the logical volume (34359738368) just created to hold this image.
> And as you can see, the volume is smaller than the image it should
> hold, so we are under the impression that something made an incorrect
> decision when creating that volume.

The destination image size depends on the destination format. If the destination
is preallocated, the logical volume size *must* be the virtual size (32G).
If it is sparse, the logical volume should be the file size on the export
domain (35G).
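
(For this disk that means: a preallocated destination needs a logical volume of
exactly 32 GiB = 34359738368 bytes, which is the LV size quoted above, while a
sparse destination would need the exported file size of 37849399296 bytes,
roughly 35 GiB.)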

According to your findings, we created a destination image for a preallocated
disk (32G), and then tried to run "qemu-img convert" with qcow2 format as
both source and destination. However this is only a guess, since I don't have
the log showing the actual qemu-img command.

Please share complete engine and vdsm logs showing the entire flow.

Nir
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


[ovirt-users] low level Image copy failed

2016-10-23 Thread Jonas Israelsson

Greetings.

We are in the process of migrating from oVirt 3.6 to 4.0. To properly
test 4.0 we have set up a parallel 4.0 environment.

For the non-critical VMs we thought we would try the "export vms --> move
storage domain to the other DC --> import vms" method.


While many imports are successful, quite a few fail with 'low level
Image copy failed'.


One of these VMs that is impossible to import has the following disk layout:

* Disk 1 - 100GB  (Thin)

* Disk2 - 32GB (Preallocated)

* Disk3 - 32GB (Thin)

Where the two thin disks (1 & 3) are successfully imported, but Disk2, the
preallocated one, always fails.


From engine.log

2016-10-19 18:50:28,096 INFO 
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) 
[2dc919bd] BaseAsyncTask::onTaskEndSuccess: Task 
'30832827-078e-4359-8552-0dccdc9821ff' (Parent Command 'ImportVm', 
Parameters Type 
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended 
successfully.
2016-10-19 18:50:28,096 INFO 
[org.ovirt.engine.core.bll.CommandMultiAsyncTasks] 
(DefaultQuartzScheduler2) [2dc919bd] Task with DB Task ID 
'64829f3d-194b-434f-8997-4723770e4638' and VDSM Task ID 
'bccae407-0c28-4556-80d3-6b61887ce045' is in state Polling. End action 
for command 39bbd979-e9f8-4cf6-901f-55d109baa9cc will proceed when all 
the entity's tasks are completed.
2016-10-19 18:50:40,231 ERROR 
[org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] 
(DefaultQuartzScheduler2) [2dc919bd] Failed in 
'HSMGetAllTasksStatusesVDS' method
2016-10-19 18:50:40,243 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler2) [2dc919bd] Correlation ID: null, Call Stack: 
null, Custom Event ID: -1, Message: VDSM fattony command failed: low 
level Image copy failed
2016-10-19 18:50:40,243 INFO 
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) 
[2dc919bd] SPMAsyncTask::PollTask: Polling task 
'bccae407-0c28-4556-80d3-6b61887ce045' (Parent Command 'ImportVm', 
Parameters Type 
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned 
status 'finished', result 'cleanSuccess'.
2016-10-19 18:50:40,296 ERROR 
[org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler2) 
[2dc919bd] BaseAsyncTask::logEndTaskFailure: Task 
'bccae407-0c28-4556-80d3-6b61887ce045' (Parent Command 'ImportVm', 
Parameters Type 
'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended 
with failure:

-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to 
HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 
261',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to 
HSMGetAllTasksStatusesVDS, error = low level Image copy failed, code = 261'


and from vdsm.log

bccae407-0c28-4556-80d3-6b61887ce045::DEBUG::2016-10-19 
18:50:36,451::resourceManager::661::Storage.ResourceManager::(releaseResource) 
No one is waiting for resource 
'61842ad9-42da-40a9-8ec8-dd7807a82916_imageNS.9eb60288-27b6-4fb1-aef1-4246455d588e', 
Clearing records.
bccae407-0c28-4556-80d3-6b61887ce045::ERROR::2016-10-19 
18:50:36,452::task::868::Storage.TaskManager.Task::(_setError) 
Task=`bccae407-0c28-4556-80d3-6b61887ce045`::Unexpected error

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 334, in run
return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 78, in wrapper
return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1558, in copyImage
postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 902, in copyCollapsed
raise se.CopyImageError(str(e))
CopyImageError: low level Image copy failed: ('ecode=1, stdout=, 
stderr=qemu-img: error while writing sector 73912303: No space left on 
device\n, message=None',)



To further figure out what is going on, we created a wrapper, replacing
qemu-img with a script that runs qemu-img through strace.
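
The wrapper was roughly of the following shape (only a sketch: it assumes the
original binary was renamed to /usr/bin/qemu-img.org, as the qemu-img.org
invocation earlier in the thread suggests, and the log and strace paths here
are made up for illustration):

#!/bin/sh
# Log the arguments, then run the original qemu-img under strace.
LOG=/var/tmp/qemu-img-wrapper.log
echo "$(date +%H:%M:%S) $$   ---   Arguments: $*" >> "$LOG"
exec strace -f -o "/var/tmp/qemu-img.strace.$$" /usr/bin/qemu-img.org "$@"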

What caught our attention are the following two lseek calls:

stat("/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0", 
{st_mode=S_IFREG|0660, st_size=37849399296, ...}) = 0
open("/rhev/data-center/9d200b26-359e-48b6-972a-90da179e4829/61842ad9-42da-40a9-8ec8-dd7807a82916/images/9eb60288-27b6-4fb1-aef1-4246455d588e/ddf8b402-514c-4a3c-9683-26810a7c41c0", 
O_RDONLY|O_DIRECT|O_CLOEXEC) = 12

fstat(12, {st_mode=S_IFREG|0660, st_size=37849399296, ...}) = 0
lseek(12, 0, SEEK_END)  = 37849399296

AND

stat("/rhev/data-center/mnt/blockSD/cb64e1fc-98b6-4b8c-916e-418d05bcd467/images/a1d70c22-cace-48d2-9809-caadc70b77e7/71f5fe82-81dd-47e9-aa3f-1a66622db4cb", 
{st_mode=S_IFBLK|0660, st_rdev=makedev(253, 42), ...}) = 0

Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill partition

2016-10-23 Thread Steve Dainard
Do you know when .34 will be released?

http://mirror.centos.org/centos/7/virt/x86_64/ovirt-3.6/
Latest version is:
vdsm-cli-4.17.32-1.el7.noarch.rpm 08-Aug-2016 17:36
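
(A quick way to see what a host is running and whether the fixed build has
reached the repo yet, shown only as a sketch:

rpm -q vdsm                        # vdsm build currently installed on the host
yum --showduplicates list vdsm     # vdsm builds available in the enabled repos
)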

On Fri, Oct 14, 2016 at 1:11 AM, Francesco Romani 
wrote:

>
> - Original Message -
> > From: "Simone Tiraboschi" 
> > To: "Steve Dainard" , "Francesco Romani" <
> from...@redhat.com>
> > Cc: "users" 
> > Sent: Friday, October 14, 2016 9:59:49 AM
> > Subject: Re: [ovirt-users] Ovirt Hypervisor vdsm.Scheduler logs fill
> partition
> >
> > On Fri, Oct 14, 2016 at 1:12 AM, Steve Dainard 
> wrote:
> >
> > > Hello,
> > >
> > > I had a hypervisor semi-crash this week, 4 of ~10 VM's continued to
> run,
> > > but the others were killed off somehow and all VM's running on this
> host
> > > had '?' status in the ovirt UI.
> > >
> > > This appears to have been caused by vdsm logs filling up disk space on
> the
> > > logging partition.
> > >
> > > I've attached the log file vdsm.log.27.xz which shows this error:
> > >
> > > vdsm.Scheduler::DEBUG::2016-10-11
> > > 16:42:09,318::executor::216::Executor::(_discard)
> > > Worker discarded:  > > action= > > 'virt.periodic.DriveWatermarkMonitor'>
> > > at 0x7f8e90021210> at 0x7f8e90021250> discarded at 0x7f8dd123e850>
> > >
> > > which happens more and more frequently throughout the log.
> > >
> > > It was a bit difficult to understand what caused the failure, but the
> logs
> > > were getting really large, then being xz'd which compressed 11G+ into
> a few
> > > MB. Once this happened the disk space would be freed, and nagios
> wouldn't
> > > hit the 3rd check to throw a warning, until pretty much right at the
> crash.
> > >
> > > I was able to restart vdsmd to resolve the issue, but I still need to
> know
> > > why these logs started to stack up so I can avoid this issue in the
> future.
> > >
> >
> > We had this one: https://bugzilla.redhat.com/show_bug.cgi?id=1383259
> > but in your case the logs are rotating.
> > Francesco?
>
> Hi,
>
> yes, it is a different issue. Here the log messages are caused by the Worker
> threads of the periodic subsystem, which are leaking [1].
> This was a bug in Vdsm (insufficient protection against rogue domains), but
> the real problem is that some of your domains are being unresponsive at the
> hypervisor level. The most likely cause is, in turn, unresponsive storage.
>
> Fixes have been committed and shipped with Vdsm 4.17.34.
>
> See: https://bugzilla.redhat.com/1364925
>
> HTH,
>
> +++
>
> [1] Actually, they are replaced too quickly, leading to unbounded growth.
> So those aren't actually "leaking"; Vdsm is just overzealous in handling one
> error condition, making things worse than before.
> Still a serious issue, no doubt, but with quite a different cause.
>
> --
> Francesco Romani
> Red Hat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users