On Wed, Mar 17, 2021 at 5:38 PM Yedidyah Bar David <d...@redhat.com> wrote:
>
> On Wed, Mar 17, 2021 at 4:48 PM Martin Perina <mper...@redhat.com> wrote:
> >
> >
> >
> > On Wed, Mar 17, 2021 at 3:24 PM Michal Skrivanek 
> > <michal.skriva...@redhat.com> wrote:
> >>
> >>
> >>
> >> On 17. 3. 2021, at 13:53, Dana Elfassy <delfa...@redhat.com> wrote:
> >>
> >> Adding +Marcin Sobczyk
> >>
> >> On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >>>
> >>> On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David <d...@redhat.com> 
> >>> wrote:
> >>> >
> >>> > Hi all,
> >>> >
> >>> > This started a few days ago [1] and randomly happens since then:
> >>> >
> >>> > E           DEBUG: Configuration:
> >>> > E           DEBUG: command: collect
> >>> > E           DEBUG: Traceback (most recent call last):
> >>> > E           DEBUG:   File
> >>> > "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> >>> > line 2067, in <module>
> >>> > E           DEBUG:     '%s directory is not empty.' % 
> >>> > (conf["local_tmp_dir"])
> >>> > E           DEBUG: Exception: /dev/shm/log directory is not
> >>> > empty.ERROR: /dev/shm/log directory is not empty.non-zero return code
> >>> >
> >>> > Michal tried to fix this by using a random directory but it still fails 
> >>> > [2]:
> >>> >
> >>> > DEBUG: command: collect
> >>> > DEBUG: Traceback (most recent call last):
> >>> > DEBUG:   File 
> >>> > "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> >>> > line 2067, in <module>
> >>> > DEBUG:     '%s directory is not empty.' % (conf["local_tmp_dir"])
> >>> > DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.ERROR:
> >>> > /dev/shm/kaN7uY directory is not empty.non-zero return code
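[For context, the check that raises this (around __main__.py line 2067) presumably amounts to something like the sketch below — the function name and exact logic are my guess; only the error string is taken from the traceback:

```python
import os
import tempfile

def check_local_tmp_dir(path):
    # Guess at the guard behind the traceback: refuse to reuse a
    # scratch directory that already contains files.
    if os.listdir(path):
        raise Exception('%s directory is not empty.' % path)

fresh = tempfile.mkdtemp()   # stand-in for: lctmp=$(mktemp -d -p /dev/shm)
check_local_tmp_dir(fresh)   # empty directory -> passes

# Simulate another run having left files behind in the same directory:
open(os.path.join(fresh, 'leftover.log'), 'w').close()
try:
    check_local_tmp_dir(fresh)
    outcome = 'passed'
except Exception as e:
    outcome = str(e)
print(outcome)
```

So a second collector run sharing the same local-tmp dir would trip exactly this check.]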
> >>> >
> >>> > Since I suppose that the randomness of mktemp is good enough, it must
> >>> > be something else. Also, the last successful run before [1] used the
> >>> > same OST git commit (same code), so I do not think it's something in
> >>> > OST's code.
> >>> >
> >>> > Any idea?
> >>> >
> >>> > I think I'll push a patch to create and use the directory right before
> >>> > calling ovirt-log-collector, which is probably better in other ways.
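[The shape of that change, as it later shows up in the journal excerpts below, would be roughly the following; the collector invocation itself is commented out here and the surrounding check is only illustrative:

```shell
# Sketch of the patched call: create the scratch dir immediately before
# invoking the collector, so nothing else can have populated it first.
lctmp=$(mktemp -d -p /dev/shm)
[ -z "$(ls -A "$lctmp")" ] && echo "fresh empty dir: $lctmp"
# ovirt-log-collector --verbose --batch --no-hypervisors \
#     --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
rmdir "$lctmp"
```
]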
> >>>
> >>> My patch [1] still fails, with a somewhat different error message, but
> >>> this made me check further, and while I still do not understand, I have
> >>> this to add:
> >>>
> >>> In the failing runs, ovirt-log-collector is called *twice* in parallel. 
> >>> E.g.
> >>> in [2] (the check-patch of [1]):
> >>>
> >>> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]:
> >>> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> >>> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> >>> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]:
> >>> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> >>> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> >>> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>>
> >>> It also generates two logs, which you can check/compare.
> >>>
> >>> It's the same for previous ones, e.g. latest nightly [3][4]:
> >>>
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]:
> >>> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> >>> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> >>> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> >>> the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For
> >>> complete SELinux messages run: sealert -l
> >>> d03a8655-9430-4fcf-9892-3b4df1939899
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> >>> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> >>> the sock_file 
> >>> ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012*****
> >>>  Plugin catchall (100. confidence) suggests
> >>> **************************#012#012If you believe that systemd should
> >>> be allowed unlink access on the
> >>> ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by
> >>> default.#012Then you should report this as a bug.#012You can generate
> >>> a local policy module to allow this access.#012Do#012allow this access
> >>> for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow
> >>> -M my-systemd#012# semodule -X 300 -i my-systemd.pp#012
> >>> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]:
> >>> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> >>> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> >>> _uses_shell=True warn=True stdin_add_newline=True
> >>> strip_empty_ends=True argv=None chdir=None executable=None
> >>> creates=None removes=None stdin=None
> >>>
> >>> Any idea what might have caused this to start happening? Perhaps
> >>> a bug in ansible, or ansible-runner? It reminds me of [5].
> >>> Adding Dana and Martin.
> >>>
> >>> I think [5] is quite a serious bug, btw, should be a 4.4.5 blocker.
> >>
> >>
> >> It's from January and there are no comments there. Dana, any update?
> >> It does look serious, but perhaps it is not really hit in real-world scenarios?
> >
> >
> > It's reproducible outside OST; Dana has been investigating it intensively for
> > the last 2 weeks, but so far we haven't been able to find where events are
> > multiplied. But AFAIK this issue is visible only with ansible-runner-service,
> > specifically in the part where the engine fetches events from ARS and creates
> > audit_log records from them. It doesn't mean that playbook tasks are
> > executed multiple times!
>
> Are you sure? Please check also bz 1930565. I closed it as a
> duplicate, but if you are certain that tasks are never executed
> multiple times, then we should reopen. I looked at the attached logs,
> and they were hard to follow (due to the duplication), but at least I
> didn't notice there any bug around imgbased.
>
> > OST is using ansible-runner directly, so it doesn't seem to me like the 
> > same issue
>
> Indeed. That's also what I currently think myself. It seems like a bug
> in OST. Still didn't manage to find the root cause - I am pushing my
> debugging changes to [1], for anyone that wants to follow, and:
>
> 1. run_tc basic-suite-master/test-scenarios/test_003_01_dummy.py >
> ~/run_tc/run-tc-$(date +%Y-%m-%d-%H%M%S).log 2>&1
>
> 2. cat $(ls -t1 run-tc* | head -1) | grep -Ev
> 'do_something_2.*something 2|do_something_1.*something 1' | grep -v
> 'Start'
>
> 3. Pick a line from the output (there are always a few, among 100
> runs), check its timestamp, then look at pytest.log and search for
> (currently):
>
> $timestamp|something..|_run_ansible_runner|_run_ansible_runner.*before
> run|_run_ansible_runner.*after run|Starting
> functools.partial|.*test_dummy: Starting iteration.*|ModuleArgsMapper
>
> Hope to update tomorrow. For now, we can merge the workaround patch to
> run them serially, to unbreak basic-suite at this point.

OK. I now pushed two alternative patches that fix the bug. Please pick
the one you like and merge. I tested both of them on [1] and both seem ok.

shorter fix: https://gerrit.ovirt.org/c/ovirt-system-tests/+/113937

clearer fix, imo: https://gerrit.ovirt.org/c/ovirt-system-tests/+/113938

[1] https://github.com/didib/ovirt-system-tests/commits/add-test_003_01_dummy

>
> [1] https://github.com/didib/ovirt-system-tests/commits/ansible-logs
>
> >
> > So [5] is not a blocker, it's just really annoying if you need to work
> > with audit_log records
> >
> >>
> >>>
> >>> Best regards,
> >>>
> >>> [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875
> >>>
> >>> [2] 
> >>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/15980/artifact/check-patch.basic_suite_master.el8.x86_64/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
> >>>
> >>> [3] 
> >>> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
> >>>
> >>> [4] 
> >>> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/artifact/exported-artifacts/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
> >>>
> >>> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707
> >>>
> >>> >
> >>> > Best regards,
> >>> >
> >>> > [1] 
> >>> > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/949/
> >>> >
> >>> > [2] 
> >>> > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
> >>> >
> >>> >
> >>> > --
> >>> > Didi
> >>>
> >>>
> >>>
> >>> --
> >>> Didi
> >>>
> >> _______________________________________________
> >> Infra mailing list -- in...@ovirt.org
> >> To unsubscribe send an email to infra-le...@ovirt.org
> >> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >> oVirt Code of Conduct: 
> >> https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives: 
> >> https://lists.ovirt.org/archives/list/in...@ovirt.org/message/PSO5IRX73FSH2RA5VCDNSPIFZRDDAWCQ/
> >>
> >>
> >> _______________________________________________
> >> Devel mailing list -- devel@ovirt.org
> >> To unsubscribe send an email to devel-le...@ovirt.org
> >> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >> oVirt Code of Conduct: 
> >> https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives: 
> >> https://lists.ovirt.org/archives/list/devel@ovirt.org/message/6DZRAMSAJCXNSMZYCSAROOWDGKVF6GQX/
> >
> >
> >
> > --
> > Martin Perina
> > Manager, Software Engineering
> > Red Hat Czech s.r.o.
>
>
>
> --
> Didi



-- 
Didi
