[ovirt-users] Re: Lots of storage.MailBox.SpmMailMonitor

2022-01-06 Thread Petr Kyselák
Hello Nir, 
I recently upgraded the oVirt engine from 4.3.10 to 4.4.9 (the hosts will follow ASAP).
I found the same strange messages in vdsm.log:
2022-01-06 10:35:41,333+0100 ERROR (mailbox-spm) 
[storage.MailBox.SpmMailMonitor] mailbox 65 checksum failed, not clearing 
mailbox, clearing new mail (data='\xff\xff\xff\xff\...lot of 
data...\x00\x00\x00', checksum=, 
expected='\xbfG\x00\x00') (mailbox:603)
2022-01-06 10:35:41,334+0100 ERROR (mailbox-spm) 
[storage.MailBox.SpmMailMonitor] mailbox 66 checksum failed, not clearing 
mailbox, clearing new mail (data='\x00\x00\x00\x00\...lot of data...\xff\xff', 
checksum=, expected='\x04\xf0\x0b\x00') 
(mailbox:603)
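
To see at a glance which mailbox ids are affected, the errors can be counted per mailbox. This is only a sketch: the function name `failing_mailboxes` is made up for illustration, and it assumes each error sits on a single line in the real log (the excerpts above are wrapped by the mail client).

```shell
#!/bin/sh
# Count "checksum failed" errors per mailbox id in a vdsm log.
# Hypothetical helper; the log path is supplied by the caller.
failing_mailboxes() {
    log_file=$1
    # Keep only the matching message fragment, then tally by mailbox number.
    grep -o 'mailbox [0-9][0-9]* checksum failed' "$log_file" |
        awk '{ count[$2]++ } END { for (id in count) print id, count[id] }' |
        sort -n
}

# Example (path is an assumption, adjust to your host):
#   failing_mailboxes /var/log/vdsm/vdsm.log
```

Each output line is a mailbox id followed by the number of matching errors.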

We have 7 iSCSI storage domains and 1 NFS domain (an old export domain).

lvscan | grep inbox
  ACTIVE            '/dev/8ee251ed-a50b-4235-a279-0829d7e8e9a0/inbox' [128.00 MiB] inherit
  ACTIVE            '/dev/dfd0134d-2d63-432f-af9f-b60aa6e1fefb/inbox' [128.00 MiB] inherit
  ACTIVE            '/dev/0633a601-d73a-4750-8ff6-c893fe064469/inbox' [128.00 MiB] inherit
  ACTIVE            '/dev/47814a07-b6bc-4d1f-b01d-1919c07878a6/inbox' [128.00 MiB] inherit
  ACTIVE            '/dev/1c61030e-91a5-4e17-af37-92e1dada7c19/inbox' [128.00 MiB] inherit
  ACTIVE            '/dev/a74c32e3-ddd5-4f06-9d9c-3ba7aa153d98/inbox' [128.00 MiB] inherit
  ACTIVE            '/dev/333a7e7e-0da9-4db8-b486-fea1f1ee8171/inbox' [128.00 MiB] inherit

I am not sure which inbox/outbox I should clear manually. Could you
help me with this, please?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/L7WD2FY25XJCNMB3YMTA4ASKMZGKCDZM/


[ovirt-users] Re: Lots of storage.MailBox.SpmMailMonitor

2018-11-23 Thread Fabrice Bacchella


> On 22 Nov 2018, at 20:53, Nir Soffer wrote:
> 
> On Thu, Nov 22, 2018, 13:43 Fabrice Bacchella wrote:
> My vdsm log files are huge:
> 
> -rw-r--r--  1 vdsm kvm  1.8G Nov 22 11:32 vdsm.log
> 
> And this is just half an hour of logs:
> 
> $ head -1 vdsm.log
> 2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm) 
> [storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, not clearing 
> mailbox, clearing new mail (data='...lots of data', 
> expected='\xa4\x06\x08\x00') (mailbox:612)
> 

> 
> blkdiscard -z /dev/domain-uuid/{inbox,outbox}

This command solved my problem for now. There were too many logs to send, so I
purged all my logs and will open a bug if this problem comes back.


[ovirt-users] Re: Lots of storage.MailBox.SpmMailMonitor

2018-11-22 Thread Nir Soffer
On Thu, Nov 22, 2018 at 10:17 PM Nir Soffer  wrote:

> On Thu, Nov 22, 2018 at 9:53 PM Nir Soffer  wrote:
>
>> On Thu, Nov 22, 2018, 13:43 Fabrice Bacchella <
>> fabrice.bacche...@icloud.com wrote:
>>
>>> ...
>>
>> As a first-aid fix you can clear the inbox and outbox files like this:
>>
> Or this less intrusive fix - restart vdsm on the host with host id 2.
>
> To find the host with host id 2, look for this log on all hosts:
>
> [vdsm.api] START connectStoragePool(..., hostID=2, ...
>
> This will clear the mailbox with the bad checksum. Once cleared
> the SPM will stop complaining about the bad checksum.
>

Finally, once you find host 2, we need the vdsm log from that host. The log
may help us understand why there was a bad checksum, and why the bad
checksum was not fixed when the next message was sent to the mailbox.


[ovirt-users] Re: Lots of storage.MailBox.SpmMailMonitor

2018-11-22 Thread Nir Soffer
On Thu, Nov 22, 2018 at 9:53 PM Nir Soffer  wrote:

> On Thu, Nov 22, 2018, 13:43 Fabrice Bacchella <
> fabrice.bacche...@icloud.com wrote:
>
>> My vdsm log files are huge:
>>
>> -rw-r--r--  1 vdsm kvm  1.8G Nov 22 11:32 vdsm.log
>>
>> And this is just half an hour of logs:
>>
>> $ head -1 vdsm.log
>> 2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm)
>> [storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, not clearing
>> mailbox, clearing new mail (data='...lots of data',
>> expected='\xa4\x06\x08\x00') (mailbox:612)
>>
>
> Are you sure this is the log line? The error from line 612 should be:
>
> mailbox %s checksum failed, not clearing mailbox, clearing new mail
> (data=%r, checksum=%r, expected=%r)
>
> Please open a bug and attach the interesting part of the log, from
> starting the SPM until the first error was seen, plus some errors after that.
>
> It would be useful if you can share the compressed log somehow.
>
> We also need the contents of the inbox and outbox:
>
> For iSCSI/FC domain, the logical volumes
>
> /dev/domain-uuid/{inbox,outbox}
>
> For NFS/Gluster, the files:
>
> /rhev/data-center/mnt/server:_path/domain-uuid/dom_md/{inbox,outbox}
>
> You can copy them with dd, compress, and attach to the bug.
>
>
>> I just upgraded vdsm:
>> $ rpm -qi vdsm
>> Name: vdsm
>> Version : 4.20.43
>>
>
> This started after the upgrade?
>
> As a first-aid fix you can clear the inbox and outbox files like this:
>

Or this less intrusive fix - restart vdsm on the host with host id 2.

To find the host with host id 2, look for this log on all hosts:

[vdsm.api] START connectStoragePool(..., hostID=2, ...

This will clear the mailbox with the bad checksum. Once cleared
the SPM will stop complaining about the bad checksum.
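
The log search above could be scripted roughly as follows. This is a sketch, not a vdsm tool: `find_host_id_lines` is a hypothetical helper, and the log location in the example is an assumption.

```shell
#!/bin/sh
# Print connectStoragePool log lines that carry the given host id.
# Hypothetical helper: first argument is the host id, the rest are log files.
find_host_id_lines() {
    host_id=$1
    shift
    # Require a non-digit (or end of line) after the id so that
    # hostID=2 does not also match hostID=20.
    grep -E "START connectStoragePool\(.*hostID=${host_id}([^0-9]|\$)" "$@"
}

# Example, run on each host (path is an assumption):
#   find_host_id_lines 2 /var/log/vdsm/vdsm.log
```

A host that prints at least one line connected to the storage pool with that host id.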

Nir


[ovirt-users] Re: Lots of storage.MailBox.SpmMailMonitor

2018-11-22 Thread Nir Soffer
On Thu, Nov 22, 2018, 13:43 Fabrice Bacchella wrote:

> My vdsm log files are huge:
>
> -rw-r--r--  1 vdsm kvm  1.8G Nov 22 11:32 vdsm.log
>
> And this is just half an hour of logs:
>
> $ head -1 vdsm.log
> 2018-11-22 11:01:12,132+0100 ERROR (mailbox-spm)
> [storage.MailBox.SpmMailMonitor] mailbox 2 checksum failed, not clearing
> mailbox, clearing new mail (data='...lots of data',
> expected='\xa4\x06\x08\x00') (mailbox:612)
>

Are you sure this is the log line? The error from line 612 should be:

mailbox %s checksum failed, not clearing mailbox, clearing new mail
(data=%r, checksum=%r, expected=%r)

Please open a bug and attach the interesting part of the log, from
starting the SPM until the first error was seen, plus some errors after that.

It would be useful if you can share the compressed log somehow.

We also need the contents of the inbox and outbox:

For iSCSI/FC domain, the logical volumes

/dev/domain-uuid/{inbox,outbox}

For NFS/Gluster, the files:

/rhev/data-center/mnt/server:_path/domain-uuid/dom_md/{inbox,outbox}

You can copy them with dd, compress, and attach to the bug.
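
One possible way to do the copy is sketched below. The helper name `save_mailbox` and the output paths are made up for illustration, and the dd options are an assumption rather than a required invocation.

```shell
#!/bin/sh
# Copy one mailbox (logical volume or file) to an image file and compress it.
# Hypothetical helper: $1 is the source path, $2 the destination image;
# gzip replaces the image with "$2.gz".
save_mailbox() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=1M conv=fsync && gzip -f "$dst"
}

# Example for a block domain (domain-uuid is a placeholder, as above):
#   save_mailbox /dev/domain-uuid/inbox  /tmp/inbox.img
#   save_mailbox /dev/domain-uuid/outbox /tmp/outbox.img
```

The resulting `.gz` files are what would be attached to the bug.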


> I just upgraded vdsm:
> $ rpm -qi vdsm
> Name: vdsm
> Version : 4.20.43
>

This started after the upgrade?

As a first-aid fix you can clear the inbox and outbox files like this:

1. Stop vdsm on the SPM host

systemctl stop vdsmd

2. Clear the mailboxes

For iSCSI/FC:

blkdiscard -z /dev/domain-uuid/{inbox,outbox}

For NFS/Gluster:

dd if=/dev/zero \
    of=/rhev/data-center/mnt/server:_path/domain-uuid/dom_md/inbox \
    bs=1M count=16 oflag=direct conv=fsync
dd if=/dev/zero \
    of=/rhev/data-center/mnt/server:_path/domain-uuid/dom_md/outbox \
    bs=1M count=16 oflag=direct conv=fsync

3. Start vdsm on the SPM host

systemctl start vdsmd

Nir