On Sun, Mar 7, 2021 at 5:12 AM <jenk...@jenkins.phx.ovirt.org> wrote:
>
> Project: 
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
> Build: 
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1943/

Failed during 
-basic-suite-master.test-scenarios.test_004_basic_sanity.test_ha_recovery
due to timeout. Can't tell exactly why. Now rebuilt, to see if it
happens again. Basic suite did pass this point with (mostly) the same
code, and there shouldn't be anything HE-specific in this test. FYI.

Some more details:

The function is:

@order_by(_TEST_LIST)
def test_ha_recovery(engine_api, get_ansible_host_for_vm):
    engine = engine_api.system_service()
    with engine_utils.wait_for_event(engine, [119, 9602, 506]):
        # VM_DOWN_ERROR event(119)
        # HA_VM_FAILED event event(9602)
        # VDS_INITIATED_RUN_VM event(506)
        ansible_host = get_ansible_host_for_vm(VM2_NAME)
        pid = ansible_host.shell('pgrep -f qemu.*guest=vm2')['stdout'].strip()
        ansible_host.shell('kill -KILL {}'.format(pid))

    vm_service = test_utils.get_vm_service(engine, VM2_NAME)
    assertions.assert_true_within_long(
        lambda:
        vm_service.get().status == types.VmStatus.UP
    )
    with engine_utils.wait_for_event(engine, 33): # USER_STOP_VM event
        vm_service.stop()

It failed inside the 'with' statement. Can't find any of the expected
events in engine.log. In last successful I see all three of them.

We do not actively log anything about running the pgrep command, but
it's logged by ansible on the host, to /var/log/messages. I see it in
last successful, but not in above failed run, in any of the two hosts.
So most likely it didn't even get to run pgrep.

IMO we probably want some more information logged in such a case, to
allow further investigation.

Best regards,

> Build Number: 1943
> Build Status:  Still Failing
> Triggered By: Started by timer
>
> -------------------------------------
> Changes Since Last Success:
> -------------------------------------
> Changes for Build #1939
> [hbraha] ost: Check if user custom repos are used on VM
>
>
> Changes for Build #1940
> [hbraha] network: add check_repos fixture
>
>
> Changes for Build #1941
> [Andrej Cernek] network, pylint: Simplify pylintrc file
>
>
> Changes for Build #1942
> [Andrej Cernek] network, pylint: Simplify pylintrc file
>
>
> Changes for Build #1943
> [Andrej Cernek] network, pylint: Simplify pylintrc file
>
>
>
>
> -----------------
> Failed Tests:
> -----------------
> 1 tests failed.
> FAILED:  
> he-basic-suite-master.test-scenarios.test_004_basic_sanity.test_ha_recovery
>
> Error Message:
> AssertionError: False != True after 600 seconds
>
> Stack Trace:
> engine_api = <ovirtsdk4.Connection object at 0x7f408114c1d0>
> get_ansible_host_for_vm = <function 
> get_ansible_host_for_vm.<locals>.ansible_host_for at 0x7f4081171950>
>
>     @order_by(_TEST_LIST)
>     def test_ha_recovery(engine_api, get_ansible_host_for_vm):
>         engine = engine_api.system_service()
>         with engine_utils.wait_for_event(engine, [119, 9602, 506]):
>             # VM_DOWN_ERROR event(119)
>             # HA_VM_FAILED event event(9602)
>             # VDS_INITIATED_RUN_VM event(506)
>             ansible_host = get_ansible_host_for_vm(VM2_NAME)
>             pid = ansible_host.shell('pgrep -f 
> qemu.*guest=vm2')['stdout'].strip()
> >           ansible_host.shell('kill -KILL {}'.format(pid))
>
> ../he-basic-suite-master/test-scenarios/test_004_basic_sanity.py:1070:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> /usr/lib64/python3.6/contextlib.py:88: in __exit__
>     next(self.gen)
> ../ost_utils/ost_utils/engine_utils.py:42: in wait_for_event
>     lambda:
> ../ost_utils/ost_utils/assertions.py:96: in assert_true_within_long
>     assert_equals_within_long(func, True, allowed_exceptions)
> ../ost_utils/ost_utils/assertions.py:81: in assert_equals_within_long
>     func, value, LONG_TIMEOUT, allowed_exceptions=allowed_exceptions
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
>
> func = <function wait_for_event.<locals>.<lambda> at 0x7f40810d5bf8>
> value = True, timeout = 600, allowed_exceptions = [], initial_wait = 0
> error_message = 'False != True after 600 seconds'
>
>     def assert_equals_within(
>         func, value, timeout, allowed_exceptions=None, initial_wait=10,
>         error_message=None
>     ):
>         allowed_exceptions = allowed_exceptions or []
>         res = '<no-result-obtained>'
>         with _EggTimer(timeout) as timer:
>             while not timer.elapsed():
>                 try:
>                     res = func()
>                     if res == value:
>                         return
>                 except Exception as exc:
>                     if _instance_of_any(exc, allowed_exceptions):
>                         time.sleep(3)
>                         continue
>
>                     LOGGER.exception("Unhandled exception in %s", func)
>                     raise
>
>                 if initial_wait == 0:
>                     time.sleep(3)
>                 else:
>                     time.sleep(initial_wait)
>                     initial_wait = 0
>         try:
>             if error_message is None:
>                 error_message = '%s != %s after %s seconds' % (res, value, 
> timeout)
> >           raise AssertionError(error_message)
> E           AssertionError: False != True after 600 seconds
>
> ../ost_utils/ost_utils/assertions.py:59: AssertionError



-- 
Didi
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/WSJ6W6VEOFLWJWG6SG4MTCQVQLKCIA3N/

Reply via email to