Re: [CQ]: 87428, 2 (vdsm) failed "ovirt-master" system tests, but isn't the failure root cause

2018-02-27 Thread Eyal Edri
On Tue, Feb 27, 2018 at 2:15 PM, Sandro Bonazzola wrote:

>
>
> 2018-02-22 22:42 GMT+01:00 oVirt Jenkins :
>
>> A system test invoked by the "ovirt-master" change queue including change
>> 87428,2 (vdsm) failed. However, this change seems not to be the root cause for
>> this failure. Change 87944,3 (vdsm) that this change depends on or is based on,
>> was detected as the cause of the testing failures.
>>
>> This change had been removed from the testing queue. Artifacts built from this
>> change will not be released until either change 87944,3 (vdsm) is fixed and
>> this change is updated to refer to or rebased on the fixed version, or this
>> change is modified to no longer depend on it.
>>
>> For further details about the change see:
>> https://gerrit.ovirt.org/#/c/87428/2
>>
>> For further details about the change that seems to be the root cause behind the
>> testing failures see:
>> https://gerrit.ovirt.org/#/c/87944/3
>>
>> For failed test results see:
>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5841/
>
>
>
> This fails due to multiple reasons.
> http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/3224/
> failed on
>
> *22:23:15* E   OSError: [Errno 24] Too many open files
>
> which seems to be an infra issue on the slave / dirty slave
>


Just FYI, check-merged jobs have nothing to do with OST/CQ results; they
run functional tests written by VDSM developers.


>
>
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5841/testReport/junit/(root)/002_bootstrap/add_hosts/
>
> Host lago-upgrade-from-release-suite-master-host0 is in non responsive state
>
>
> The host has vdsm failures with:
>
> 2018-02-22 16:34:04,208-0500 ERROR (MainThread) [MOM] MOM's RPC interface is disabled (momIF:50)
> 2018-02-22 16:34:04,208-0500 ERROR (MainThread) [vds] failed to init clientIF, shutting down storage dispatcher (clientIF:148)
> 2018-02-22 16:34:04,208-0500 INFO  (MainThread) [vdsm.api] START prepareForShutdown(options=None) from=internal, task_id=f45a0864-9bd6-4116-add9-a55a05d72909 (api:46)
> 2018-02-22 16:34:04,220-0500 INFO  (MainThread) [storage.Monitor] Shutting down domain monitors (monitor:222)
> 2018-02-22 16:34:04,220-0500 INFO  (MainThread) [storage.check] Stopping check service (check:104)
> 2018-02-22 16:34:04,221-0500 INFO  (check/loop) [storage.asyncevent] Stopping (asyncevent:220)
> 2018-02-22 16:34:04,221-0500 INFO  (MainThread) [storage.udev] Stopping multipath event listener (udev:149)
> 2018-02-22 16:34:04,221-0500 INFO  (MainThread) [vdsm.api] FINISH prepareForShutdown return=None from=internal, task_id=f45a0864-9bd6-4116-add9-a55a05d72909 (api:52)
> 2018-02-22 16:34:04,222-0500 ERROR (MainThread) [vds] Exception raised (vdsmd:158)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/vdsmd.py", line 156, in run
>     serve_clients(log)
>   File "/usr/lib/python2.7/site-packages/vdsm/vdsmd.py", line 103, in serve_clients
>     cif = clientIF.getInstance(irs, log, scheduler)
>   File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 251, in getInstance
>     cls._instance = clientIF(irs, log, scheduler)
>   File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 121, in __init__
>     self.mom = MomClient(config.get("mom", "socket_path"))
>   File "/usr/lib/python2.7/site-packages/vdsm/momIF.py", line 51, in __init__
>     raise MomNotAvailableError()
> MomNotAvailableError
>
> and failure on mom side:
>
> 2018-02-22 16:34:00,168 - mom - INFO - MOM starting
> 2018-02-22 16:34:00,185 - mom.HostMonitor - INFO - Host Monitor starting
> 2018-02-22 16:34:00,186 - mom - INFO - hypervisor interface vdsmjsonrpcbulk
> 2018-02-22 16:34:00,280 - mom.VdsmRpcBase - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused
> 2018-02-22 16:34:00,280 - mom - ERROR - Failed to initialize MOM threads
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
>     hypervisor_iface = self.get_hypervisor_interface()
>   File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface
>     return module.instance(self.config)
>   File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py", line 47, in instance
>     return JsonRpcVdsmBulkInterface()
>   File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py", line 29, in __init__
>     super(JsonRpcVdsmBulkInterface, self).__init__()
>   File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py", line 41, in __init__
>     .orRaise(RuntimeError, 'No connection to VDSM.')
>   File "/usr/lib/python2.7/site-packages/mom/optional.py", line 28, in orRaise
>     raise exception(*args, **kwargs)
> RuntimeError: No connection to VDSM.
>
> vdsm upgrade log shows:

Re: [CQ]: 87428, 2 (vdsm) failed "ovirt-master" system tests, but isn't the failure root cause

2018-02-27 Thread Sandro Bonazzola
2018-02-22 22:42 GMT+01:00 oVirt Jenkins :

> A system test invoked by the "ovirt-master" change queue including change
> 87428,2 (vdsm) failed. However, this change seems not to be the root cause for
> this failure. Change 87944,3 (vdsm) that this change depends on or is based on,
> was detected as the cause of the testing failures.
>
> This change had been removed from the testing queue. Artifacts built from this
> change will not be released until either change 87944,3 (vdsm) is fixed and
> this change is updated to refer to or rebased on the fixed version, or this
> change is modified to no longer depend on it.
>
> For further details about the change see:
> https://gerrit.ovirt.org/#/c/87428/2
>
> For further details about the change that seems to be the root cause behind the
> testing failures see:
> https://gerrit.ovirt.org/#/c/87944/3
>
> For failed test results see:
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5841/



This fails due to multiple reasons.
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/3224/
failed on

*22:23:15* E   OSError: [Errno 24] Too many open files

which seems to be an infra issue on the slave / dirty slave
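
One way to check whether a slave is running out of descriptors is to compare a process's open descriptor count with its RLIMIT_NOFILE soft limit. This is only a minimal sketch, assuming a Linux slave with /proc mounted; the helper name fd_usage is hypothetical and is not part of vdsm or the CI tooling:

```python
import os
import resource


def fd_usage(fd_dir="/proc/self/fd"):
    """Return (open_fds, soft_limit) for the current process.

    Assumes Linux: every entry in /proc/self/fd is one open descriptor.
    The count includes the descriptor os.listdir() opens to read the
    directory, so it can be one higher than the steady-state value.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir(fd_dir))
    return open_fds, soft


if __name__ == "__main__":
    used, limit = fd_usage()
    print("open fds: %d, soft limit: %d" % (used, limit))
```

Running this at the start and at the end of a test run would show whether the count keeps climbing toward the limit.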


http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5841/testReport/junit/(root)/002_bootstrap/add_hosts/

Host lago-upgrade-from-release-suite-master-host0 is in non responsive state


The host has vdsm failures with:

2018-02-22 16:34:04,208-0500 ERROR (MainThread) [MOM] MOM's RPC interface is disabled (momIF:50)
2018-02-22 16:34:04,208-0500 ERROR (MainThread) [vds] failed to init clientIF, shutting down storage dispatcher (clientIF:148)
2018-02-22 16:34:04,208-0500 INFO  (MainThread) [vdsm.api] START prepareForShutdown(options=None) from=internal, task_id=f45a0864-9bd6-4116-add9-a55a05d72909 (api:46)
2018-02-22 16:34:04,220-0500 INFO  (MainThread) [storage.Monitor] Shutting down domain monitors (monitor:222)
2018-02-22 16:34:04,220-0500 INFO  (MainThread) [storage.check] Stopping check service (check:104)
2018-02-22 16:34:04,221-0500 INFO  (check/loop) [storage.asyncevent] Stopping (asyncevent:220)
2018-02-22 16:34:04,221-0500 INFO  (MainThread) [storage.udev] Stopping multipath event listener (udev:149)
2018-02-22 16:34:04,221-0500 INFO  (MainThread) [vdsm.api] FINISH prepareForShutdown return=None from=internal, task_id=f45a0864-9bd6-4116-add9-a55a05d72909 (api:52)
2018-02-22 16:34:04,222-0500 ERROR (MainThread) [vds] Exception raised (vdsmd:158)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/vdsmd.py", line 156, in run
    serve_clients(log)
  File "/usr/lib/python2.7/site-packages/vdsm/vdsmd.py", line 103, in serve_clients
    cif = clientIF.getInstance(irs, log, scheduler)
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 251, in getInstance
    cls._instance = clientIF(irs, log, scheduler)
  File "/usr/lib/python2.7/site-packages/vdsm/clientIF.py", line 121, in __init__
    self.mom = MomClient(config.get("mom", "socket_path"))
  File "/usr/lib/python2.7/site-packages/vdsm/momIF.py", line 51, in __init__
    raise MomNotAvailableError()
MomNotAvailableError

and failure on mom side:

2018-02-22 16:34:00,168 - mom - INFO - MOM starting
2018-02-22 16:34:00,185 - mom.HostMonitor - INFO - Host Monitor starting
2018-02-22 16:34:00,186 - mom - INFO - hypervisor interface vdsmjsonrpcbulk
2018-02-22 16:34:00,280 - mom.VdsmRpcBase - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused
2018-02-22 16:34:00,280 - mom - ERROR - Failed to initialize MOM threads
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 29, in run
    hypervisor_iface = self.get_hypervisor_interface()
  File "/usr/lib/python2.7/site-packages/mom/__init__.py", line 217, in get_hypervisor_interface
    return module.instance(self.config)
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py", line 47, in instance
    return JsonRpcVdsmBulkInterface()
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcbulkInterface.py", line 29, in __init__
    super(JsonRpcVdsmBulkInterface, self).__init__()
  File "/usr/lib/python2.7/site-packages/mom/HypervisorInterfaces/vdsmjsonrpcInterface.py", line 41, in __init__
    .orRaise(RuntimeError, 'No connection to VDSM.')
  File "/usr/lib/python2.7/site-packages/mom/optional.py", line 28, in orRaise
    raise exception(*args, **kwargs)
RuntimeError: No connection to VDSM.
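
Since the vdsm and mom logs use different line formats, it can help to sort the ERROR lines from both by timestamp to see which side gave up first. The following is only a rough sketch: the sample lines are copied from the two excerpts, the function name error_events is hypothetical, and it assumes both logs were written in the same local timezone (the mom log carries no UTC offset).

```python
import re
from datetime import datetime

# Matches "2018-02-22 16:34:04,208" at the start of a vdsm or mom log line.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def error_events(source, lines):
    """Yield (timestamp, source, line) for every line containing ERROR."""
    for line in lines:
        m = TS_RE.match(line)
        if m and "ERROR" in line:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            # The three digits after the comma are milliseconds.
            ts = ts.replace(microsecond=int(m.group(2)) * 1000)
            yield ts, source, line


VDSM = [
    "2018-02-22 16:34:04,208-0500 ERROR (MainThread) [MOM] MOM's RPC interface is disabled (momIF:50)",
    "2018-02-22 16:34:04,222-0500 ERROR (MainThread) [vds] Exception raised (vdsmd:158)",
]
MOM = [
    "2018-02-22 16:34:00,168 - mom - INFO - MOM starting",
    "2018-02-22 16:34:00,280 - mom.VdsmRpcBase - ERROR - Cannot connect to VDSM! [Errno 111] Connection refused",
]

# Merge both sources and order them by time (same-timezone assumption).
events = sorted(list(error_events("vdsm", VDSM)) + list(error_events("mom", MOM)))
for ts, source, line in events:
    print(ts.time(), source, line.split("ERROR", 1)[1].strip(" -"))
```

With these sample lines the mom error sorts first, roughly four seconds before the vdsm errors, which would be consistent with each service failing because the other was not yet reachable.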

vdsm upgrade log shows:

MainThread::DEBUG::2018-02-22 16:31:34,113::libvirtconnection::167::root::(get) trying to connect libvirt
MainThread::DEBUG::2018-02-22 16:31:34,134::cmdutils::150::root::(exec_cmd) lshw -json -disable usb -disable pcmcia -disable isapnp -disable ide -disable scsi -disable dmi -disable memory -disable cpuinfo (cwd None)
MainThread::DEBUG::2018-02-22 16:31:34,242::cmdutils::158::root::(exec_cmd) SUCCESS: <err> = ''; <rc> = 0

[CQ]: 87428, 2 (vdsm) failed "ovirt-master" system tests, but isn't the failure root cause

2018-02-22 Thread oVirt Jenkins
A system test invoked by the "ovirt-master" change queue including change
87428,2 (vdsm) failed. However, this change seems not to be the root cause for
this failure. Change 87944,3 (vdsm) that this change depends on or is based on,
was detected as the cause of the testing failures.

This change had been removed from the testing queue. Artifacts built from this
change will not be released until either change 87944,3 (vdsm) is fixed and
this change is updated to refer to or rebased on the fixed version, or this
change is modified to no longer depend on it.

For further details about the change see:
https://gerrit.ovirt.org/#/c/87428/2

For further details about the change that seems to be the root cause behind the
testing failures see:
https://gerrit.ovirt.org/#/c/87944/3

For failed test results see:
http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/5841/
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra