[ovirt-devel] OST failed during test_003_00_metrics_bootstrap

2021-04-06 Thread Eyal Shenitzky
Hi all,

OST failed to run due to the following error in test_003_00_metrics_bootstrap:

ost_utils.ansible.module_mappers.AnsibleExecutionError: Error running ansible: rc=2, stdout=/usr/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.4) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
lago-basic-suite-master-engine | FAILED | rc=1 >>
This command will collect system configuration and diagnostic information from this system.
The generated archive may contain data considered sensitive and its content should be reviewed by the originating organization before being passed to any third party.
No changes will be made to system configuration.
Use the -h option to see usage.
DEBUG: Configuration:
DEBUG: command: collect
DEBUG: Traceback (most recent call last):
DEBUG:   File "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py", line 2067, in
DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"])
DEBUG: Exception: /dev/shm/iSX3ZN directory is not empty.
ERROR: /dev/shm/iSX3ZN directory is not empty.
non-zero return code
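
For anyone reading the log above: the failing condition appears to be a guard that refuses to use a non-empty temporary directory. Below is a minimal sketch of such a guard, assuming it only inspects the directory listing. The helper name ensure_empty_dir is hypothetical and this is not the actual ovirt-log-collector code; only the error message format is taken from the traceback.

```python
import os
import tempfile


def ensure_empty_dir(path):
    """Hypothetical guard: refuse to proceed if `path` already has entries."""
    if os.listdir(path):
        # Same message format as the one shown in the traceback.
        raise Exception('%s directory is not empty.' % path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ensure_empty_dir(tmp)  # passes: directory is empty
        open(os.path.join(tmp, "leftover"), "w").close()
        try:
            ensure_empty_dir(tmp)
        except Exception as exc:
            print(exc)
```

Under that assumption, anything that leaves files in the directory before the check runs would trigger the same error.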

Stacktrace

ansible_engine = 
ansible_hosts = 

    def test_metrics_and_log_collector(ansible_engine, ansible_hosts):
        vt = utils.VectorThread(
            [
                functools.partial(configure_metrics, ansible_engine,
                                  ansible_hosts),
                functools.partial(run_log_collector, ansible_engine),
            ],
        )
        vt.start_all()
>       vt.join_all()

basic-suite-master/test-scenarios/test_003_00_metrics_bootstrap.py:96:
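
The traceback points at vt.join_all() rather than at the log collector call itself, which is what one would expect if the two functions run in separate threads and a failure is re-raised when the threads are joined. A minimal sketch of that pattern follows. VectorThreadSketch is a hypothetical stand-in written for illustration; the real utils.VectorThread in OST may behave differently.

```python
import functools
import threading


class VectorThreadSketch:
    """Hypothetical stand-in for utils.VectorThread (not the OST code)."""

    def __init__(self, targets):
        self._targets = list(targets)
        self._threads = []
        self._errors = []

    def _run(self, target):
        try:
            target()
        except Exception as exc:
            # Remember the failure so join_all() can report it.
            self._errors.append(exc)

    def start_all(self):
        for target in self._targets:
            thread = threading.Thread(target=self._run, args=(target,))
            thread.start()
            self._threads.append(thread)

    def join_all(self):
        for thread in self._threads:
            thread.join()
        if self._errors:
            # The first recorded failure surfaces here, in the caller.
            raise self._errors[0]


def fail(message):
    raise RuntimeError(message)


if __name__ == "__main__":
    vt = VectorThreadSketch([
        functools.partial(print, "configuring metrics"),
        functools.partial(fail, "log collector failed"),
    ])
    vt.start_all()
    try:
        vt.join_all()
    except RuntimeError as exc:
        print("raised at join_all:", exc)
```

With this design, the test's stack trace shows only the join, so the underlying cause has to be read from the captured command output.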



Logs can be found at:
https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/16188/testReport/junit/basic-suite-master.test-scenarios/test_003_00_metrics_bootstrap/test_metrics_and_log_collector/



Can someone have a look?


-- 
Regards,
Eyal Shenitzky
___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/LNO63CJBAT42GHZY24TDS6GVO5TWTIVO/


[ovirt-devel] Re: [oVirt Jenkins] ovirt-system-tests_he-basic-suite-master - Build # 1974 - Still Failing!

2021-04-06 Thread Marcin Sobczyk



On 4/6/21 9:55 AM, Yedidyah Bar David wrote:
> On Tue, Apr 6, 2021 at 9:24 AM Marcin Sobczyk wrote:
> >
> > Hi,
> >
> > On 4/6/21 7:23 AM, Yedidyah Bar David wrote:
> > > On Mon, Apr 5, 2021 at 5:53 AM wrote:
> > > > Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
> > > > Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1974/
> > > FYI: This failed twice in a row (1973 and 1974), for the same reason.
> > > I reproduced locally, looked a bit, failed to find the root cause.
> > > When I connected to host-1's console, it was stuck in emergency after
> > > reboot. I checked a bit, there was some error about kdump failing to
> > > read the kernel image ( /boot/vmlinuz-4.18.0-240.15.1.el8_3.x86_64 ),
> > > when I tried manually as root I did manage to read it. I rebooted, and
> > > the VM came up fine. I decided to try OST again, cleaned up and ran it,
> > > and opened a 'lago console' on the vm after it was up, but OST passed.
> > > Tried again, passed again. Then I manually ran in CI 1975 and it
> > > passed, and also the nightly 1976 passed. So I am going to ignore for
> > > now.
> > >
> > > I think we need a patch to make lago/OST log consoles of all the VMs.
> > > I might try to work on this.
> > Also stumbled upon this. Please take a look at
> > https://gerrit.ovirt.org/#/c/ovirt-system-tests/+/114050/
>
> Yes, I did notice this change and wondered if it's related...
>
> But it's not merged yet, and still HE passed at least 4 times (two locally,
> two on CI). Obviously this does not prove that the issue is fixed.
>
> Anyway, in addition to merely fixing it (which perhaps your patch does),
> I also wanted to emphasize the importance of making it easier to fix
> future such cases. How did you manage to find the root cause?

My case was similar - HE suite was failing for me constantly. I noticed
host-1 drops to emergency shell, so I just 'virsh console'd inside
and went through the logs. That's when I spotted the problem with
the additional '/var/tmp' disk. I tried the fix on my machine and HE
suite started working again. Moments later I tried running HE suite
without the patch and it was successful again.

I couldn't figure out what's the real cause behind these problems,
but removing the unnecessary additional disk from host-1 seemed
to do the trick.

+1 for logging consoles of the VMs - that should help with these kinds
of problems in the future.
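
One possible way to get console logs, sketched here under assumptions: libvirt character devices accept an optional <log file="..."/> sub-element, so the domain XML of each VM could be patched before the VM is defined. The function name add_console_log, the log path, and the sample XML are hypothetical, and how lago actually builds its domain XML is not covered in this thread.

```python
import xml.etree.ElementTree as ET


def add_console_log(domain_xml, log_path):
    """Add a <log> element to every serial/console device that lacks one."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        return domain_xml
    for dev in devices:
        if dev.tag in ("serial", "console") and dev.find("log") is None:
            # Assumes the installed libvirt supports <log> on char devices.
            ET.SubElement(dev, "log", {"file": log_path, "append": "on"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, heavily trimmed domain definition.
    sample = (
        "<domain type='kvm'><name>host-1</name><devices>"
        "<serial type='pty'><target port='0'/></serial>"
        "</devices></domain>"
    )
    print(add_console_log(sample, "/var/log/ost/host-1-console.log"))
```

The resulting log file could then be collected with the other test artifacts, so a host dropping to the emergency shell would be visible without reproducing locally.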

Regards, Marcin



List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/3H2HXEGUTWYV23EL7QT6NJETCLHN6MWG/


[ovirt-devel] Re: [oVirt Jenkins] ovirt-system-tests_he-basic-suite-master - Build # 1974 - Still Failing!

2021-04-06 Thread Yedidyah Bar David
On Tue, Apr 6, 2021 at 9:24 AM Marcin Sobczyk wrote:
>
> Hi,
>
> On 4/6/21 7:23 AM, Yedidyah Bar David wrote:
> > On Mon, Apr 5, 2021 at 5:53 AM wrote:
> > > Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
> > > Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1974/
> > FYI: This failed twice in a row (1973 and 1974), for the same reason.
> > I reproduced locally, looked a bit, failed to find the root cause.
> > When I connected to host-1's console, it was stuck in emergency after
> > reboot. I checked a bit, there was some error about kdump failing to
> > read the kernel image ( /boot/vmlinuz-4.18.0-240.15.1.el8_3.x86_64 ),
> > when I tried manually as root I did manage to read it. I rebooted, and
> > the VM came up fine. I decided to try OST again, cleaned up and ran it,
> > and opened a 'lago console' on the vm after it was up, but OST passed.
> > Tried again, passed again. Then I manually ran in CI 1975 and it
> > passed, and also the nightly 1976 passed. So I am going to ignore for
> > now.
> >
> > I think we need a patch to make lago/OST log consoles of all the VMs.
> > I might try to work on this.
> Also stumbled upon this. Please take a look at
> https://gerrit.ovirt.org/#/c/ovirt-system-tests/+/114050/

Yes, I did notice this change and wondered if it's related...

But it's not merged yet, and still HE passed at least 4 times (two locally,
two on CI). Obviously this does not prove that the issue is fixed.
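
To put a rough number on why four passes prove little, here is a small sketch. It assumes every run fails independently with the same probability, which is a simplification, and the failure rates used are hypothetical values for illustration, not measurements from CI.

```python
import math


def prob_all_pass(fail_rate, runs):
    """Chance that `runs` independent runs all pass despite a flaky failure."""
    return (1.0 - fail_rate) ** runs


def runs_needed(fail_rate, confidence):
    """Smallest number of consecutive passes whose chance of happening by
    luck alone is at most 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    # Hypothetical failure rates, for illustration only.
    for rate in (0.3, 0.5):
        print(rate, round(prob_all_pass(rate, 4), 4), runs_needed(rate, 0.95))
```

Under these assumptions, a bug that fails 30% of runs would still let four runs in a row pass about 24% of the time, and it would take nine consecutive passes to reach 95% confidence.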

Anyway, in addition to merely fixing it (which perhaps your patch does),
I also wanted to emphasize the importance of making it easier to fix
future such cases. How did you manage to find the root cause?

Best regards,
-- 
Didi
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/U6I3ZGEPJF4PA2NRMP7HKNMAQIL5REV2/


[ovirt-devel] Re: [oVirt Jenkins] ovirt-system-tests_he-basic-suite-master - Build # 1974 - Still Failing!

2021-04-06 Thread Marcin Sobczyk

Hi,

On 4/6/21 7:23 AM, Yedidyah Bar David wrote:
> On Mon, Apr 5, 2021 at 5:53 AM wrote:
> > Project: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/
> > Build: https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1974/
> FYI: This failed twice in a row (1973 and 1974), for the same reason.
> I reproduced locally, looked a bit, failed to find the root cause.
> When I connected to host-1's console, it was stuck in emergency after
> reboot. I checked a bit, there was some error about kdump failing to
> read the kernel image ( /boot/vmlinuz-4.18.0-240.15.1.el8_3.x86_64 ),
> when I tried manually as root I did manage to read it. I rebooted, and
> the VM came up fine. I decided to try OST again, cleaned up and ran it,
> and opened a 'lago console' on the vm after it was up, but OST passed.
> Tried again, passed again. Then I manually ran in CI 1975 and it
> passed, and also the nightly 1976 passed. So I am going to ignore for
> now.
>
> I think we need a patch to make lago/OST log consoles of all the VMs.
> I might try to work on this.

Also stumbled upon this. Please take a look at
https://gerrit.ovirt.org/#/c/ovirt-system-tests/+/114050/


Regards, Marcin



List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/T6STZLXWV3QG2IN2QZ43XZ5QAKXSRW4L/