On Sun, Jul 8, 2018 at 12:23 PM, Yaniv Kaul <yk...@redhat.com> wrote:

>
>
> On Fri, Jul 6, 2018 at 1:01 PM, Sandro Bonazzola <sbona...@redhat.com>
> wrote:
>
>> https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326
>>
>> fails on add host test with:
>>
>> Error: The response content type 'text/html; charset=iso-8859-1' isn't the 
>> expected XML
>>
>>
>> Something bad happened during the deployment, because the engine complains
>> about a host not included in the cluster:
>>
>> 2018-07-05 21:34:47,768-04 WARN  
>> [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] 
>> (DefaultQuartzScheduler6) [3009952a] Could not add brick 
>> 'lago-hc-basic-suite-4-2-host1:/rhs/brick1/engine' to volume 
>> 'c1146520-3bf7-4b81-b31a-7cc5475b6438' - server uuid 
>> '50e37ed8-86f3-4b50-9258-f516169025ea' not found in cluster 
>> '3125aa60-80bb-11e8-a143-00163e24d363'
>>
>>
> In [2] we can see:
> 2018-07-05 22:03:42,975-0400 ERROR (monitor/f6c4ab4) [storage.Monitor]
> Error checking domain f6c4ab4a-005d-4ab7-acda-03810014c841 (monitor:424)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/monitor.py", line
> 405, in _checkDomainStatus
>     self.domain.selftest()
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 48,
> in __getattr__
>     return getattr(self.getRealDomain(), attrName)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 51,
> in getRealDomain
>     return self._cache._realProduce(self._sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 134,
> in _realProduce
>     domain = self._findDomain(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sdc.py", line 151,
> in _findDomain
>     return findMethod(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/glusterSD.py", line
> 55, in findDomain
>     return GlusterStorageDomain(GlusterStorageDomain.
> findDomainPath(sdUUID))
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line
> 391, in __init__
>     validateFileSystemFeatures(manifest.sdUUID, manifest.mountpoint)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/fileSD.py", line
> 104, in validateFileSystemFeatures
>     oop.getProcessPool(sdUUID).directTouch(testFilePath)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/outOfProcess.py",
> line 320, in directTouch
>     ioproc.touch(path, flags, mode)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line
> 567, in touch
>     self.timeout)
>   File "/usr/lib/python2.7/site-packages/ioprocess/__init__.py", line
> 451, in _sendCommand
>     raise OSError(errcode, errstr)
> OSError: [Errno 30] Read-only file system
>
> And just before that:
>
> 2018-07-05 22:03:33,214-0400 INFO  (libvirt/events) [virt.vm] 
> (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') abnormal vm stop device 
> ua-c0592bd6-20e6-4dbf-9610-9a35e3f566ab error eother (vm:5116)
> 2018-07-05 22:03:33,214-0400 INFO  (libvirt/events) [virt.vm] 
> (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') CPU stopped: onIOError (vm:6157)
> 2018-07-05 22:03:33,222-0400 INFO  (libvirt/events) [virt.vm] 
> (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') CPU stopped: onSuspend (vm:6157)
> 2018-07-05 22:03:33,225-0400 WARN  (libvirt/events) [virt.vm] 
> (vmId='a2f514e6-81ca-4d41-acf9-77cc910f6eaf') device vda reported I/O error 
> (vm:4065)
>
>
> And indeed, in [3]:
>
> [2018-07-05 22:04:38,936] WARNING [utils - 298:publish_to_webhook] - Event 
> push failed to URL: http://hc-engine:80/ovirt-engine/services/glusterevents, 
> Event: {"event": "QUORUM_LOST", "message": {"volume": "vmstore"}, "nodeid": 
> "59bf7956-60a4-4152-9cf9-99fcdccb211f", "ts": 1530842614}, Status: 
> ('Connection aborted.', error(113, 'No route to host'))
>
>
> We can also see https://bugzilla.redhat.com/show_bug.cgi?id=1595436
> there.
>
>
> Sahina, Gobinda, can you please investigate?
>>
>> Ondra, no idea why the engine is returning text/html instead of xml here,
>> can you please check?
>>
>
> Because of the exception[1].
> Y.
>

Thanks Yaniv!

Adding the hosts failed because the engine was down due to quorum loss.
The HC suite has failed in the past with similar errors, and even the runs
that pass show quorum loss messages (as glusterd is restarted whenever a
host is added). I need to dig into the reason for the quorum loss: whether
it is caused by adding the hosts in parallel, or by something else. I will
update this thread.
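
As a starting point for that digging, here is a minimal sketch (not part of
the suite) that pulls the QUORUM_LOST events out of a glusterfs events.log
such as [3]. It assumes each record sits on a single line in the real file,
in the same shape as the line quoted above; the names EVENT_RE,
quorum_lost_events and SAMPLE are made up for this illustration.

```python
import json
import re
from datetime import datetime, timezone

# Matches "Event push failed" records in the shape quoted from events.log.
# Assumption: one record per line (the mail client wrapped the quoted line).
EVENT_RE = re.compile(
    r"^\[(?P<logged>[^\]]+)\].*?Event: (?P<event>\{.*\}), Status: (?P<status>.*)$"
)


def quorum_lost_events(lines):
    """Yield one dict per QUORUM_LOST event found in the given log lines."""
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # not an event-push record
        try:
            event = json.loads(match.group("event"))
        except ValueError:
            continue  # payload is not valid JSON; skip rather than guess
        if event.get("event") != "QUORUM_LOST":
            continue
        yield {
            "logged": match.group("logged"),
            "volume": event.get("message", {}).get("volume"),
            "nodeid": event.get("nodeid"),
            # "ts" is seconds since the epoch; render it in UTC.
            "event_time_utc": datetime.fromtimestamp(
                event["ts"], tz=timezone.utc
            ).isoformat(),
            "status": match.group("status"),
        }


# The record quoted earlier in this thread, joined back onto one line.
SAMPLE = (
    "[2018-07-05 22:04:38,936] WARNING [utils - 298:publish_to_webhook] - "
    "Event push failed to URL: "
    "http://hc-engine:80/ovirt-engine/services/glusterevents, "
    'Event: {"event": "QUORUM_LOST", "message": {"volume": "vmstore"}, '
    '"nodeid": "59bf7956-60a4-4152-9cf9-99fcdccb211f", "ts": 1530842614}, '
    "Status: ('Connection aborted.', error(113, 'No route to host'))"
)

for found in quorum_lost_events([SAMPLE]):
    print(found["event_time_utc"], found["volume"], found["nodeid"])
```

For a real log, pass the open file object instead of [SAMPLE]. For the
quoted record, the "ts" value converts to 02:03:34 UTC on 2018-07-06, which
would be 22:03:34 at the -0400 offset used in the vdsm log, so the event
itself seems to predate the failed push logged at 22:04:38.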


> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-engine/_var_log/ovirt-engine/server.log
> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-host0/_var_log/vdsm/vdsm.log
> [3] https://jenkins.ovirt.org/job/ovirt-system-tests_hc-basic-suite-4.2/326/artifact/exported-artifacts/test_logs/hc-basic-suite-4.2/post-002_bootstrap.py/lago-hc-basic-suite-4-2-host0/_var_log/glusterfs/events.log
>
>>
>>
>> --
>>
>> SANDRO BONAZZOLA
>>
>> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>>
>> Red Hat EMEA <https://www.redhat.com/>
>>
>> sbona...@redhat.com
>> <https://red.ht/sig>
>>
>> _______________________________________________
>> Devel mailing list -- devel@ovirt.org
>> To unsubscribe send an email to devel-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/3FSX6M23CN2ZKIBMGUOLKOQ36LNGL4MH/
>>
>>
>
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/EUVFVHBWLUKQUCLXCUF7FSPR5ZVT4KLJ/
