On Wed, Mar 17, 2021 at 5:38 PM Yedidyah Bar David <d...@redhat.com> wrote:
>
> On Wed, Mar 17, 2021 at 4:48 PM Martin Perina <mper...@redhat.com> wrote:
> >
> > On Wed, Mar 17, 2021 at 3:24 PM Michal Skrivanek
> > <michal.skriva...@redhat.com> wrote:
> >>
> >> On 17. 3. 2021, at 13:53, Dana Elfassy <delfa...@redhat.com> wrote:
> >>
> >> Adding +Marcin Sobczyk
> >>
> >> On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >>>
> >>> On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >>> >
> >>> > Hi all,
> >>> >
> >>> > This started a few days ago [1] and randomly happens since then:
> >>> >
> >>> > E DEBUG: Configuration:
> >>> > E DEBUG: command: collect
> >>> > E DEBUG: Traceback (most recent call last):
> >>> > E DEBUG:   File "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py", line 2067, in <module>
> >>> > E DEBUG:     '%s directory is not empty.' % (conf["local_tmp_dir"])
> >>> > E DEBUG: Exception: /dev/shm/log directory is not empty.
> >>> > ERROR: /dev/shm/log directory is not empty.
> >>> > non-zero return code
> >>> >
> >>> > Michal tried to fix this by using a random directory but it still fails [2]:
> >>> >
> >>> > DEBUG: command: collect
> >>> > DEBUG: Traceback (most recent call last):
> >>> > DEBUG:   File "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py", line 2067, in <module>
> >>> > DEBUG:     '%s directory is not empty.' % (conf["local_tmp_dir"])
> >>> > DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.
> >>> > ERROR: /dev/shm/kaN7uY directory is not empty.
> >>> > non-zero return code
> >>> >
> >>> > Since I suppose that the randomness of mktemp is good enough, it must
> >>> > be something else. Also, the last successful run before [1] used the
> >>> > same OST git commit (same code), so I do not think it's something in
> >>> > OST's code.
> >>> >
> >>> > Any idea?
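[Editor's note: the traceback quoted above comes from a check that the
collector's local tmp directory is empty before it starts. A minimal
sketch of such a check, for illustration only — the function name
`check_local_tmp_dir` is hypothetical, the real code lives in
`ovirt_log_collector/__main__.py` around line 2067:

```python
import os

def check_local_tmp_dir(local_tmp_dir):
    """Sketch (hypothetical helper): refuse to run if the working
    directory already contains anything. A leftover from a previous
    run, or a second collector run racing on the same directory,
    trips this check -- matching the quoted traceback."""
    if os.path.isdir(local_tmp_dir) and os.listdir(local_tmp_dir):
        raise Exception('%s directory is not empty.' % local_tmp_dir)
```

This is why a per-run `mktemp -d` directory should normally be safe:
`mktemp` creates a fresh, empty directory, so the check can only fire
if something else writes into it before (or while) the collector runs.]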
> >>> >
> >>> > I think I'll push a patch to create and use the directory right before
> >>> > calling ovirt-log-collector, which is probably better in other ways.
> >>>
> >>> My patch [1] still fails, with a somewhat different error message, but
> >>> this made me check further, and while I still do not understand, I have
> >>> this to add:
> >>>
> >>> In the failing runs, ovirt-log-collector is called *twice* in parallel.
> >>> E.g. in [2] (the check-patch of [1]):
> >>>
> >>> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]:
> >>> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> >>> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> >>> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]:
> >>> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> >>> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> >>> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>>
> >>> It also generates two logs, which you can check/compare.
> >>>
> >>> It's the same for previous ones, e.g.
> >>> latest nightly [3][4]:
> >>>
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]:
> >>> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> >>> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> >>> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> >>> the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For
> >>> complete SELinux messages run: sealert -l
> >>> d03a8655-9430-4fcf-9892-3b4df1939899
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> >>> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> >>> the sock_file
> >>> ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012*****
> >>> Plugin catchall (100. confidence) suggests
> >>> **************************#012#012If you believe that systemd should
> >>> be allowed unlink access on the
> >>> ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by
> >>> default.#012Then you should report this as a bug.#012You can generate
> >>> a local policy module to allow this access.#012Do#012allow this access
> >>> for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow
> >>> -M my-systemd#012# semodule -X 300 -i my-systemd.pp#012
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]:
> >>> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> >>> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>>
> >>> Any idea what might have caused this to start happening?
> >>> Perhaps a bug in ansible, or ansible-runner? It reminds me of [5].
> >>> Adding Dana and Martin.
> >>>
> >>> I think [5] is quite a serious bug, btw, should be a 4.4.5 blocker.
> >>
> >> It's from January and there are no comments there. Dana, any update?
> >> It does look serious, but perhaps not really hit in real-world scenarios?
> >
> > It's reproducible outside OST; Dana has been investigating intensively for
> > the last 2 weeks, but so far we haven't been able to find where events are
> > multiplied. AFAIK this issue is visible only with ansible-runner-service,
> > specifically in the part where the engine fetches events from ARS and
> > creates audit_log records from them. It doesn't mean that playbook tasks
> > are executed multiple times!
>
> Are you sure? Please check also bz 1930565. I closed it as a
> duplicate, but if you are certain that tasks are never executed
> multiple times, then we should reopen. I looked at the attached logs,
> and they were hard to follow (due to the duplication), but at least I
> didn't notice there any bug around imgbased.
>
> > OST is using ansible-runner directly, so it doesn't seem to me like the
> > same issue
>
> Indeed. That's also what I currently think myself. It seems like a bug
> in OST. I still didn't manage to find the root cause - I am pushing my
> debugging changes to [1], for anyone who wants to follow, and:
>
> 1. run_tc basic-suite-master/test-scenarios/test_003_01_dummy.py > ~/run_tc/run-tc-$(date +%Y-%m-%d-%H%M%S).log 2>&1
>
> 2. cat $(ls -t1 run-tc* | head -1) | grep -Ev 'do_something_2.*something 2|do_something_1.*something 1' | grep -v 'Start'
>
> 3.
> Pick a line from the output (there are always a few, among 100 runs),
> check its timestamp, then look at pytest.log and search for (currently):
>
> $timestamp|something..|_run_ansible_runner|_run_ansible_runner.*before run|_run_ansible_runner.*after run|Starting functools.partial|.*test_dummy: Starting iteration.*|ModuleArgsMapper
>
> Hope to update tomorrow. For now, we can merge the workaround patch to
> run them serially, to unbreak basic-suite at this point.
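[Editor's note: the "run them serially" workaround mentioned above can be
sketched with a simple advisory file lock. This is an illustration only,
not the actual OST patch — `acquire_collector_lock` and the lock path are
hypothetical names:

```python
import fcntl
import os

def acquire_collector_lock(lock_path):
    """Sketch: take an exclusive, non-blocking flock before running
    ovirt-log-collector, so two parallel invocations cannot collect
    at the same time. Returns the open fd on success (keep it open
    for the duration of the run), or None if another run holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another run already holds the lock; skip or wait instead.
        os.close(fd)
        return None
    return fd
```

A caller would run the collector only when it gets a non-None fd, and
close the fd (releasing the lock) afterwards. With LOCK_NB the loser
fails fast; dropping LOCK_NB would make it wait, i.e. true serialization.]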
OK. I now pushed two alternative patches that fix the bug. Please pick
the one you like and merge. I tested both of them on [1] and both seem
ok.

shorter fix: https://gerrit.ovirt.org/c/ovirt-system-tests/+/113937
clearer fix, imo: https://gerrit.ovirt.org/c/ovirt-system-tests/+/113938

[1] https://github.com/didib/ovirt-system-tests/commits/add-test_003_01_dummy

>
> [1] https://github.com/didib/ovirt-system-tests/commits/ansible-logs
>
> > So [5] is not a blocker, it's just really annoying if you need to work
> > with audit_log records
> >
> >>
> >>> Best regards,
> >>>
> >>> [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875
> >>>
> >>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/15980/artifact/check-patch.basic_suite_master.el8.x86_64/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
> >>>
> >>> [3] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
> >>>
> >>> [4] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/artifact/exported-artifacts/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
> >>>
> >>> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707
> >>>
> >>> >
> >>> > Best regards,
> >>> >
> >>> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/949/
> >>> >
> >>> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
> >>> >
> >>> > --
> >>> > Didi
> >>>
> >>> --
> >>> Didi
> >>>
> >> _______________________________________________
> >> Infra mailing list -- in...@ovirt.org
> >> To unsubscribe send an email to infra-le...@ovirt.org
> >> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives:
> >> https://lists.ovirt.org/archives/list/in...@ovirt.org/message/PSO5IRX73FSH2RA5VCDNSPIFZRDDAWCQ/
> >>
> >> _______________________________________________
> >> Devel mailing list -- devel@ovirt.org
> >> To unsubscribe send an email to devel-le...@ovirt.org
> >> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/6DZRAMSAJCXNSMZYCSAROOWDGKVF6GQX/
> >
> > --
> > Martin Perina
> > Manager, Software Engineering
> > Red Hat Czech s.r.o.
>
> --
> Didi

--
Didi
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/47I2PBFZSSQSSXZLWXMA2V24WVZVFURM/