On 3 April 2018 at 17:43, Dan Kenigsberg <dan...@redhat.com> wrote:
> On Tue, Apr 3, 2018 at 3:57 PM, Piotr Kliczewski <pklic...@redhat.com> wrote:
>> Dan,
>>
>> It looks like it was one of the calls triggered while vdsm was down:
>>
>> 2018-04-03 05:30:16,065-0400 INFO (mailbox-hsm)
>> [storage.MailBox.HsmMailMonitor] HSM_MailMonitor sending mail to SPM -
>> ['/usr/bin/dd',
>> 'of=/rhev/data-center/ddb765d2-2137-437d-95f8-c46dbdbc7711/mastersd/dom_md/inbox',
>> 'iflag=fullblock', 'oflag=direct', 'conv=notrunc', 'bs=4096', 'count=1',
>> 'seek=1'] (mailbox:387)
>> 2018-04-03 05:31:22,441-0400 INFO (MainThread) [vds] (PID: 20548) I am the
>> actual vdsm 4.20.23-28.gitd11ed44.el7.centos lago-basic-suite-4-2-host-0
>> (3.10.0-693.21.1.el7.x86_64) (vdsmd:149)
>>
>> which failed and caused a timeout.
>>
>> Thanks,
>> Piotr
>>
>> On Tue, Apr 3, 2018 at 1:57 PM, Dan Kenigsberg <dan...@redhat.com> wrote:
>>>
>>> On Tue, Apr 3, 2018 at 2:07 PM, Barak Korren <bkor...@redhat.com> wrote:
>>> > Test failed: [ 006_migrations.prepare_migration_attachments_ipv6 ]
>>> >
>>> > Link to suspected patches:
>>> >
>>> > (The patch seems unrelated - do we have sporadic communication issues
>>> > arising in PST?)
>>> > https://gerrit.ovirt.org/c/89737/1 - vdsm - automation: check-patch:
>>> > attempt to install vdsm-gluster
>>> >
>>> > Link to Job:
>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/
>>> >
>>> > Link to all logs:
>>> > http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/
>>> >
>>> > Error snippet from log:
>>> >
>>> > <error>
>>> >
>>> > Traceback (most recent call last):
>>> >   File "/usr/lib64/python2.7/unittest/case.py", line 369, in run
>>> >     testMethod()
>>> >   File "/usr/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
>>> >     self.test(*self.arg)
>>> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 129, in wrapped_test
>>> >     test()
>>> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 59, in wrapper
>>> >     return func(get_test_prefix(), *args, **kwargs)
>>> >   File "/usr/lib/python2.7/site-packages/ovirtlago/testlib.py", line 78, in wrapper
>>> >     prefix.virt_env.engine_vm().get_api(api_ver=4), *args, **kwargs
>>> >   File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test-scenarios/006_migrations.py", line 139, in prepare_migration_attachments_ipv6
>>> >     engine, host_service, MIGRATION_NETWORK, ip_configuration)
>>> >   File "/home/jenkins/workspace/ovirt-4.2_change-queue-tester/ovirt-system-tests/basic-suite-4.2/test_utils/network_utils_v4.py", line 71, in modify_ip_config
>>> >     check_connectivity=True)
>>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 36729, in setup_networks
>>> >     return self._internal_action(action, 'setupnetworks', None, headers, query, wait)
>>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 299, in _internal_action
>>> >     return future.wait() if wait else future
>>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 55, in wait
>>> >     return self._code(response)
>>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 296, in callback
>>> >     self._check_fault(response)
>>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
>>> >     self._raise_error(response, body)
>>> >   File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
>>> >     raise error
>>> > Error: Fault reason is "Operation Failed". Fault detail is "[Network
>>> > error during communication with the Host.]". HTTP response code is 400.
>>>
>>> The error occurred sometime in the interval
>>>
>>> 09:32:58 [basic-suit] @ Run test: 006_migrations.py:
>>> 09:33:55 [basic-suit] Error occured, aborting
>>>
>>> and indeed
>>>
>>> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-engine/_var_log/ovirt-engine/engine.log/*view*/
>>>
>>> shows the Engine disconnecting from the host at
>>>
>>> 2018-04-03 05:33:32,307-04 ERROR
>>> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
>>> (EE-ManagedThreadFactory-engineScheduled-Thread-39) [] Unable to
>>> RefreshCapabilities: VDSNetworkException: VDSGenericException:
>>> VDSNetworkException: Vds timeout occured
>>>
>>> Maybe Piotr can read more into it.
>
> I should have thought of a down vdsm; but it was down because of what
> seems to be soft fencing:
>
> http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1521/artifact/exported-artifacts/basic-suit-4.2-el7/test_logs/basic-suite-4.2/post-006_migrations.py/lago-basic-suite-4-2-host-0/_var_log/messages/*view*/
>
> Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Started Session 46 of user root.
> Apr 3 05:30:01 lago-basic-suite-4-2-host-0 systemd: Starting Session 46 of user root.
> Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopped MOM instance configured for VDSM purposes.
> Apr 3 05:30:07 lago-basic-suite-4-2-host-0 systemd: Stopping Virtual Desktop Server Manager...
> Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: scsi_verify_blk_ioctl: 33 callbacks suppressed
> Apr 3 05:30:16 lago-basic-suite-4-2-host-0 kernel: dd: sending ioctl 80306d02 to a partition!
> Apr 3 05:30:17 lago-basic-suite-4-2-host-0 systemd: vdsmd.service stop-sigterm timed out. Killing.
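[Editor's note: for readers unfamiliar with the HSM mailbox write quoted in the vdsm log above, it is a plain dd that writes one 4 KiB block at block offset 1 (seek=1, bs=4096) into the master storage domain's inbox, without truncating the rest of the file (conv=notrunc). The sketch below mimics that block layout against a local temporary file; the payload string is made up, and oflag=direct/iflag=fullblock are dropped because they are not reliable on ordinary temp filesystems. This is an illustration, not the actual vdsm code path.]

```shell
# Simulated mailbox: two 4 KiB blocks of zeros (real target is
# .../mastersd/dom_md/inbox on the master storage domain).
INBOX=$(mktemp)
dd if=/dev/zero of="$INBOX" bs=4096 count=2 status=none

# Write a (made-up) mail payload into block 1, i.e. bytes 4096-8191.
# conv=notrunc keeps the rest of the file intact, exactly as in the
# logged command; the write lands at offset seek*bs = 4096.
printf 'HSM->SPM mail' | dd of="$INBOX" bs=4096 count=1 seek=1 conv=notrunc status=none

# The file is still 8192 bytes: block 0 untouched, block 1 updated.
stat -c %s "$INBOX"
```

When the inbox sits on unreachable shared storage, this same dd simply blocks, which is consistent with the mailbox thread hanging while vdsm was being stopped.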
This failure looks like another instance of the same issue:
http://jenkins.ovirt.org/job/ovirt-4.2_change-queue-tester/1525/

--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted

_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel