Re: [Users] VM crashes and doesn't recover

2013-04-09 Thread Federico Simoncelli
- Original Message -
> From: "Yuval M" 
> To: "Dan Kenigsberg" 
> Cc: users@ovirt.org, "Nezer Zaidenberg" 
> Sent: Friday, March 29, 2013 2:19:43 PM
> Subject: Re: [Users] VM crashes and doesn't recover
> 
> Any ideas on what can cause that storage crash?
> could it be related to using an SSD?

What the logs say is that I/O on the storage domain is failing (both the
oop timeouts and the sanlock log point to this), and this is what triggers
the VDSM restart.

> On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
> > I am running vdsm from packages as my interest is in developing for the
> > engine and not vdsm.
> > I noticed that when the storage domain crashes I can't even do "df -h"
> > (hangs)

This is also consistent with the unreachable domain.

The dmesg log that you attached doesn't contain timestamps, so it's hard to
correlate it with the rest.
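
(Side note, a sketch assuming your util-linux is recent enough to support
the flag: "dmesg -T" prints human-readable timestamps, and printk
timestamps can also be enabled at runtime through sysfs.)

# print the kernel ring buffer with human-readable timestamps, if supported
dmesg -T
# or enable printk timestamps at runtime (standard sysfs parameter path)
echo 1 | sudo tee /sys/module/printk/parameters/time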

If you want, you can try to reproduce the issue and resubmit the logs:

/var/log/vdsm/vdsm.log
/var/log/sanlock.log
/var/log/messages

(It would also help to state the exact time at which the issue begins to
appear.)
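
For example, a minimal collection sketch (the archive name is just a
placeholder):

# record the exact time before reproducing the issue
date
# ...reproduce the failure, then bundle the logs into one xz archive
tar cJf vdsm-debug-$(date +%Y%m%d-%H%M).tar.xz \
    /var/log/vdsm/vdsm.log /var/log/sanlock.log /var/log/messages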

In the logs I noticed that you're using only one NFS domain, and I think that
the SSD (on the storage side) shouldn't be a problem. When you experience such
a failure, are you able to read/write from/to the SSD on the machine that is
serving the share? (If it's the same machine, check using the "real" path
where it's mounted, not the NFS share.)
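
A minimal read/write check, assuming the export lives at /srv/nfs (adjust
the path to your setup):

# on the machine serving the share, write and read a test file through the
# local path of the export, bypassing the page cache with direct I/O
dd if=/dev/zero of=/srv/nfs/io-test bs=1M count=100 oflag=direct
dd if=/srv/nfs/io-test of=/dev/null bs=1M iflag=direct
rm /srv/nfs/io-test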

-- 
Federico


Re: [Users] VM crashes and doesn't recover

2013-03-31 Thread Ayal Baron
Can you attach the sanlock log and the full vdsm log? (Compress it if it's
too big and it isn't xz-compressed already.)
Thanks.

- Original Message -
> Any ideas on what can cause that storage crash?
> could it be related to using an SSD?
> 
> Thanks,
> 
> Yuval Meir
> 
> [earlier quoted messages and the log excerpt snipped; they appear in full
> in the messages below]

Re: [Users] VM crashes and doesn't recover

2013-03-29 Thread Yuval M
Any ideas on what can cause that storage crash?
could it be related to using an SSD?

Thanks,

Yuval Meir


On Wed, Mar 27, 2013 at 6:08 PM, Yuval M  wrote:

> Still getting crashes with the patch:
> # rpm -q vdsm
> vdsm-4.10.3-0.281.git97db188.fc18.x86_64
>
> attached excerpts from vdsm.log and from dmesg.
>
> Yuval
>
>
> On Wed, Mar 27, 2013 at 11:02 AM, Dan Kenigsberg wrote:
>
>> [Dan's reply and the quoted log snipped; see his message of 2013-03-27
>> below]

Re: [Users] VM crashes and doesn't recover

2013-03-28 Thread Limor Gavish
Concerning the following error in dmesg:

[ 2235.638814] device-mapper: table: 253:0: multipath: error getting device
[ 2235.638816] device-mapper: ioctl: error adding target to table

I tried to debug it, but multipath gives me some problems:

[wil@bufferoverflow vdsm]$ sudo multipath -l
Mar 28 18:28:19 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:19 | multipath.conf +18, invalid keyword: getuid_callout
[wil@bufferoverflow vdsm]$ sudo multipath -F
Mar 28 18:28:30 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:30 | multipath.conf +18, invalid keyword: getuid_callout
[wil@bufferoverflow vdsm]$  sudo multipath -v2
Mar 28 18:28:35 | multipath.conf +5, invalid keyword: getuid_callout
Mar 28 18:28:35 | multipath.conf +18, invalid keyword: getuid_callout
Mar 28 18:28:35 | sda: rport id not found
Mar 28 18:28:35 | Corsair_Force_GS_13057914977000C3: ignoring map

Any idea if those multipath errors are related to the storage crash?

Here is the multipath.conf:

[wil@bufferoverflow vdsm]$ sudo cat /etc/multipath.conf
# RHEV REVISION 1.0

defaults {
    polling_interval        5
    getuid_callout          "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
    no_path_retry           fail
    user_friendly_names     no
    flush_on_last_del       yes
    fast_io_fail_tmo        5
    dev_loss_tmo            30
    max_fds                 4096
}

devices {
    device {
        vendor              "HITACHI"
        product             "DF.*"
        getuid_callout      "/usr/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
    }
    device {
        vendor              "COMPELNT"
        product             "Compellent Vol"
        no_path_retry       fail
    }
}
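
The "invalid keyword" messages suggest that getuid_callout is no longer a
valid option in your multipath-tools version. A possible cleanup sketch,
assuming each getuid_callout setting sits on a single line in the real file
(the wrapping above looks like a mail client artifact):

# back up the config, drop the rejected keyword, then re-run the scan
sudo cp /etc/multipath.conf /etc/multipath.conf.bak
sudo sed -i '/getuid_callout/d' /etc/multipath.conf
sudo multipath -v2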

Thanks,
Limor G


On Wed, Mar 27, 2013 at 6:08 PM, Yuval M  wrote:

> [quoted thread and log excerpt snipped; see the messages below]

Re: [Users] VM crashes and doesn't recover

2013-03-27 Thread Dan Kenigsberg
On Sun, Mar 24, 2013 at 09:50:02PM +0200, Yuval M wrote:
> I am running vdsm from packages as my interest is in developing for the
> engine and not vdsm.
> I updated the vdsm package in an attempt to solve this, now I have:
> # rpm -q vdsm
> vdsm-4.10.3-10.fc18.x86_64

I'm afraid that this build still does not have the patch mentioned
earlier.

> 
> I noticed that when the storage domain crashes I can't even do "df -h"
> (hangs)

That's expected, since the master domain is still mounted (due to that
missing patch), but unreachable.

Would you be so kind as to try out my little patch, in order to advance a bit
in the research to solve this bug?
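
(As an aside for anyone debugging this: a hard NFS mount that stops
responding blocks "df" in uninterruptible I/O, so even timeout(1) may fail
to reclaim the probe. A sketch that at least narrows down which mount is
stuck — the mount point below is a placeholder:)

# list mounts without touching the filesystems themselves
cat /proc/mounts
# probe one suspect mount point; timeout only helps while the stat is still
# killable, which uninterruptible NFS I/O may not be
timeout 5 stat -f /rhev/data-center/mnt/<your-nfs-mount> || echo "mount not responding"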


> I'm also getting some errors in /var/log/messages:
> 
> Mar 24 19:57:44 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:45 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:46 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:47 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:48 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:49 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:50 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
> [4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
> Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
> [4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
> Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
> [4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
> Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
> [4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
> Mar 24 19:57:51 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:52 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:53 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:54 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:55 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
> failed [Errno 2] No such file or directory
> Mar 24 19:57:55 bufferoverflow vdsm Storage.Misc ERROR Panic: Couldn't
> connect to supervdsm
> Mar 24 19:57:55 bufferoverflow respawn: slave '/usr/share/vdsm/vdsm' died,
> respawning slave
> Mar 24 19:57:55 bufferoverflow vdsm fileUtils WARNING Dir
> /rhev/data-center/mnt already exists
> Mar 24 19:57:58 bufferoverflow vdsm vds WARNING Unable to load the json rpc
> server module. Please make sure it is installed.
> Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
> vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device:
> '{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address':
> {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port':
> u'1'}}' found
> Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
> vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device:
> '{'device': u'unix', 'alias': u'channel1', 'type': u'channel', 'address':
> {u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port':
> u'2'}}' found
> Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
> vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::_readPauseCode unsupported by
> libvirt vm
> Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: hard resetting
> link
> Mar 24 19:57:59 bufferoverflow kernel: [ 7402.994510] ata1: SATA link up
> 6.0 Gbps (SStatus 133 SControl 300)
> Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005510] ACPI Error: [DSSP]
> Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
> Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005517] ACPI Error: Method
> parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node 880407c74d48),
> AE_NOT_FOUND (20120711/psparse-536)
> Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015485] ACPI Error: [DSSP]
> Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
> Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015493] ACPI Error: Method
> parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node 880407c74d48),
> AE_NOT_FOUND (20120711/psparse-536)
> Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016061] ata1.00: configured
> for UDMA/133
> Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016066] ata1: EH complete
> Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
> [4759]: 1083422e close_task_aio 0 

Re: [Users] VM crashes and doesn't recover

2013-03-24 Thread Yuval M
I am running vdsm from packages as my interest is in developing for the
engine and not vdsm.
I updated the vdsm package in an attempt to solve this, now I have:
# rpm -q vdsm
vdsm-4.10.3-10.fc18.x86_64

I noticed that when the storage domain crashes I can't even do "df -h"
(hangs)
I'm also getting some errors in /var/log/messages:

Mar 24 19:57:44 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:45 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:46 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:47 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:48 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:49 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:50 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
Mar 24 19:57:51 bufferoverflow sanlock[1208]: 2013-03-24 19:57:51+0200 7412
[4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
Mar 24 19:57:51 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:52 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:53 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:54 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:55 bufferoverflow vdsm SuperVdsmProxy WARNING Connect to svdsm
failed [Errno 2] No such file or directory
Mar 24 19:57:55 bufferoverflow vdsm Storage.Misc ERROR Panic: Couldn't
connect to supervdsm
Mar 24 19:57:55 bufferoverflow respawn: slave '/usr/share/vdsm/vdsm' died,
respawning slave
Mar 24 19:57:55 bufferoverflow vdsm fileUtils WARNING Dir
/rhev/data-center/mnt already exists
Mar 24 19:57:58 bufferoverflow vdsm vds WARNING Unable to load the json rpc
server module. Please make sure it is installed.
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device:
'{'device': u'unix', 'alias': u'channel0', 'type': u'channel', 'address':
{u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port':
u'1'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::Unknown type found, device:
'{'device': u'unix', 'alias': u'channel1', 'type': u'channel', 'address':
{u'bus': u'0', u'controller': u'0', u'type': u'virtio-serial', u'port':
u'2'}}' found
Mar 24 19:57:58 bufferoverflow vdsm vm.Vm WARNING
vmId=`4d3d81b3-d083-4569-acc2-8e631ed51843`::_readPauseCode unsupported by
libvirt vm
Mar 24 19:57:58 bufferoverflow kernel: [ 7402.688177] ata1: hard resetting
link
Mar 24 19:57:59 bufferoverflow kernel: [ 7402.994510] ata1: SATA link up
6.0 Gbps (SStatus 133 SControl 300)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005510] ACPI Error: [DSSP]
Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.005517] ACPI Error: Method
parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node 880407c74d48),
AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015485] ACPI Error: [DSSP]
Namespace lookup failure, AE_NOT_FOUND (20120711/psargs-359)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.015493] ACPI Error: Method
parse/execution failed [\_SB_.PCI0.SAT0.SPT0._GTF] (Node 880407c74d48),
AE_NOT_FOUND (20120711/psparse-536)
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016061] ata1.00: configured
for UDMA/133
Mar 24 19:57:59 bufferoverflow kernel: [ 7403.016066] ata1: EH complete
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 0 0x7ff3740008c0 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 1 0x7ff374000910 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 2 0x7ff374000960 busy
Mar 24 19:58:01 bufferoverflow sanlock[1208]: 2013-03-24 19:58:01+0200 7422
[4759]: 1083422e close_task_aio 3 0x7ff3740009b0 busy
Mar 24 19:58:01 bufferoverflow kernel: [ 7405.714145] device-mapper: table:
253:0: multipath: error getting device

Re: [Users] VM crashes and doesn't recover

2013-03-24 Thread Dan Kenigsberg
On Fri, Mar 22, 2013 at 08:24:35PM +0200, Limor Gavish wrote:
> Hello,
> 
> I am using oVirt 3.2 on Fedora 18:
> [wil@bufferoverflow ~]$ rpm -q vdsm
> vdsm-4.10.3-7.fc18.x86_64
> 
> (the engine is built from sources).
> 
> I seem to have hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=922515

This bug is only one part of the problem, but it's nasty enough that I
have just suggested it as a fix to the ovirt-3.2 branch of vdsm:
http://gerrit.ovirt.org/13303

Could you test whether, with it, vdsm relinquishes its SPM role and recovers
as operational?
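
For reference, fetching a Gerrit change for local testing usually looks like
this (the patchset suffix /1 is a guess; check the change page for the
current one):

# Gerrit publishes changes as refs/changes/<last-2-digits>/<change>/<patchset>
git clone http://gerrit.ovirt.org/vdsm
cd vdsm
git fetch http://gerrit.ovirt.org/vdsm refs/changes/03/13303/1
git checkout FETCH_HEAD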

> 
> in the following configuration:
> Single host (no migrations)
> Created a VM, installed an OS inside (Fedora18)
> stopped the VM.
> created template from it.
> Created an additional VM from the template using thin provision.
> Started the second VM.
> 
> In addition to the errors in the logs, the storage domains (both data and
> ISO) crashed, i.e. went to "unknown" and "inactive" states respectively.
> (see the attached engine.log)
> 
> I attached the VDSM and engine logs.
> 
> is there a way to work around this problem?
> It happens repeatedly.
> 
> Yuval Meir


Re: [Users] VM crashes and doesn't recover

2013-03-24 Thread Dafna Ron

https://bugzilla.redhat.com/show_bug.cgi?id=890365

Try restarting the vdsm service.
You had a problem with the storage, and vdsm did not recover properly.
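
On Fedora 18 that would be something along these lines (assuming the systemd
unit is named vdsmd, as in the vdsm packages of that time):

# restart vdsm and verify it comes back up
sudo systemctl restart vdsmd
sudo systemctl status vdsmd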



On 03/24/2013 11:40 AM, Yuval M wrote:
> [quoted exchange snipped; see Yuval's and Maor's messages below]

--
Dafna Ron


Re: [Users] VM crashes and doesn't recover

2013-03-24 Thread Yuval M
sanlock is at the latest version (this solved another problem we had a few
days ago):

$ rpm -q sanlock
sanlock-2.6-7.fc18.x86_64

the storage is on the same machine as the engine and vdsm.
iptables is up but there is a rule to allow all localhost traffic.


On Sun, Mar 24, 2013 at 11:34 AM, Maor Lipchuk  wrote:

> [Maor's message snipped; it appears in full below]


Re: [Users] VM crashes and doesn't recover

2013-03-24 Thread Maor Lipchuk
From the VDSM log, it seems that the master storage domain was not
responding.

Thread-23::DEBUG::2013-03-22
18:50:20,263::domainMonitor::216::Storage.DomainMonitorThread::(_monitorDomain)
Domain 1083422e-a5db-41b6-b667-b9ef1ef244f0 changed its status to Invalid

Traceback (most recent call last):
  File "/usr/share/vdsm/storage/domainMonitor.py", line 186, in
_monitorDomain
self.domain.selftest()
  File "/usr/share/vdsm/storage/nfsSD.py", line 108, in selftest
fileSD.FileStorageDomain.selftest(self)
  File "/usr/share/vdsm/storage/fileSD.py", line 480, in selftest
self.oop.os.statvfs(self.domaindir)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 280, in
callCrabRPCFunction
*args, **kwargs)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 180, in
callCrabRPCFunction
rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 146, in _recvAll
raise Timeout()
Timeout
.

I also see a sanlock issue, but I think that is because the storage
could not be reached:
ReleaseHostIdFailure: Cannot release host id:
('1083422e-a5db-41b6-b667-b9ef1ef244f0', SanlockException(16, 'Sanlock
lockspace remove failure', 'Device or resource busy'))

Can you check whether iptables is running on your host, and if so, whether
it is blocking the storage server by any chance?
Can you try to manually mount this NFS export and see if it works?
Is it possible the storage server has connectivity issues?
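
A quick manual check along those lines might be (the export path and mount
point are placeholders):

# look for iptables rules that could block NFS traffic
sudo iptables -nvL
# mount the export by hand and try to read from it
sudo mkdir -p /mnt/nfstest
sudo mount -t nfs bufferoverflow:/path/to/export /mnt/nfstest
ls /mnt/nfstest
sudo umount /mnt/nfstest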


Regards,
Maor

On 03/22/2013 08:24 PM, Limor Gavish wrote:
> [Limor's original message snipped; it is quoted in full in Dan
> Kenigsberg's reply above]