On Fri, Mar 22, 2019 at 3:21 PM Marcin Sobczyk <msobc...@redhat.com> wrote:

> Dafna,
>
> in 'verify_add_hosts' we specifically wait for a single host to be up, with a
> timeout:
>
>  144     up_hosts = hosts_service.list(search='datacenter={} AND status=up'.format(DC_NAME))
>  145     if len(up_hosts):
>  146         return True
>
> The log files say that it took ~50 secs for one of the hosts to be up
> (seems reasonable) and no timeout is being reported.
> Just after running 'verify_add_hosts', we run 'add_master_storage_domain',
> which calls '_hosts_in_dc' function.
> That function does the exact same check, but it fails:
>
>  113     hosts = hosts_service.list(search='datacenter={} AND status=up'.format(dc_name))
>  114     if hosts:
>  115         if random_host:
>  116             return random.choice(hosts)
>
I don't think it is relevant to our current failure, but I consider
random_host=True bad practice. As if we did not have enough moving
parts, we are adding intentional randomness. Reproducibility is far more
important than coverage - particularly for a shared system test like OST.

>  117         else:
>  118             return sorted(hosts, key=lambda host: host.name)
>  119     raise RuntimeError('Could not find hosts that are up in DC %s' % dc_name)
>
>
> I'm also not able to reproduce this issue locally on my server. The
> investigation continues...
>

I think that it would be fair to take the filtering by host state out of
Engine and into the test, where we can easily log the current status of
each host. Then we'd have a better understanding of the next failure.
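
To make the suggestion concrete, here is a rough sketch of what test-side
filtering could look like. It is not the actual OST code: 'up_hosts_in_dc'
and 'FakeHost' are hypothetical names, and I am assuming each host object
exposes 'name' and 'status' attributes and that the status prints as 'up'
for a running host. It also returns the hosts sorted by name rather than
picking one at random.

```python
import logging
from collections import namedtuple

LOGGER = logging.getLogger(__name__)

# Hypothetical stand-in for the SDK host objects; only .name and .status
# are used, and the status is assumed to print as e.g. 'up'.
FakeHost = namedtuple('FakeHost', ['name', 'status'])


def up_hosts_in_dc(hosts, dc_name):
    """Return the hosts that are up, sorted by name, logging every status."""
    ordered = sorted(hosts, key=lambda host: host.name)

    # Log each host's status so a failed run shows what state it was in.
    for host in ordered:
        LOGGER.info('DC %s: host %s has status %s',
                    dc_name, host.name, host.status)

    # Filter in the test instead of in the Engine search query.
    up = [host for host in ordered if str(host.status).lower() == 'up']
    if not up:
        statuses = ', '.join(
            '%s=%s' % (host.name, host.status) for host in ordered)
        raise RuntimeError(
            'Could not find hosts that are up in DC %s (statuses: %s)'
            % (dc_name, statuses))
    return up
```

In the real test the 'hosts' argument would presumably come from
hosts_service.list(search='datacenter={}'.format(dc_name)), i.e. the same
query without the 'status=up' part, so that hosts in every state are
returned and can be logged.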

On 3/22/19 1:17 PM, Marcin Sobczyk wrote:
>
> Hi,
>
> sure, I'm on it - it's weird though, I did run the 4.3 basic suite for this
> patch manually and everything was ok.
> On 3/22/19 1:05 PM, Dafna Ron wrote:
>
> Hi,
>
> We are failing branch 4.3 for test: 002_bootstrap.add_master_storage_domain
>
> It seems that on one of the hosts vdsm is not starting;
> there is nothing in vdsm.log or in supervdsm.log.
>
> CQ identified this patch as the suspected root cause:
>
> https://gerrit.ovirt.org/#/c/98748/ - vdsm: client: Add support for flow
> id
>
> Milan, Marcin, can you please have a look?
>
> full logs:
>
>
> http://jenkins.ovirt.org/job/ovirt-4.3_change-queue-tester/326/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-4.3/post-002_bootstrap.py/
>
> the only error I can see is about the host not being up (makes sense as vdsm
> is not running)
>
> Stacktrace
>
>   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
>     testMethod()
>   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>     self.test(*self.arg)
>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 142, in wrapped_test
>     test()
>   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 60, in wrapper
>     return func(get_test_prefix(), *args, **kwargs)
>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 417, in add_master_storage_domain
>     add_iscsi_storage_domain(prefix)
>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 561, in add_iscsi_storage_domain
>     host=_random_host_from_dc(api, DC_NAME),
>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 122, in _random_host_from_dc
>     return _hosts_in_dc(api, dc_name, True)
>   File "/home/jenkins/workspace/ovirt-4.3_change-queue-tester/ovirt-system-tests/basic-suite-4.3/test-scenarios/002_bootstrap.py", line 119, in _hosts_in_dc
>     raise RuntimeError('Could not find hosts that are up in DC %s' % dc_name)
> 'Could not find hosts that are up in DC test-dc\n-------------------- >> 
> begin captured logging << --------------------\nlago.ssh: DEBUG: start 
> task:937bdea7-a2a3-47ad-9383-36647ea37ddf:Get ssh client for 
> lago-basic-suite-4-3-engine:\nlago.ssh: DEBUG: end 
> task:937bdea7-a2a3-47ad-9383-36647ea37ddf:Get ssh client for 
> lago-basic-suite-4-3-engine:\nlago.ssh: DEBUG: Running c07b5ee2 on 
> lago-basic-suite-4-3-engine: cat /root/multipath.txt\nlago.ssh: DEBUG: 
> Command c07b5ee2 on lago-basic-suite-4-3-engine returned with 0\nlago.ssh: 
> DEBUG: Command c07b5ee2 on lago-basic-suite-4-3-engine output:\n 
> 3600140516f88cafa71243648ea218995\n360014053e28f60001764fed9978ec4b3\n360014059edc777770114a6484891dcf1\n36001405d93d8585a50d43a4ad0bd8d19\n36001405e31361631de14bcf87d43e55a\n\n-----------
>
> _______________________________________________
> Devel mailing list -- devel@ovirt.org
> To unsubscribe send an email to devel-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/devel@ovirt.org/message/J4NCHXTK5ZYLXWW36DZKAUL5DN7WBNW4/
>