[ovirt-devel] Re: bz 1915329: [Stream] Add host fails with: Destination /etc/pki/ovirt-engine/requests not writable
On 1/18/21 9:58 AM, Yedidyah Bar David wrote:
> On Mon, Jan 18, 2021 at 10:53 AM Martin Perina wrote:
>> On Mon, Jan 18, 2021 at 9:08 AM Yedidyah Bar David wrote:
>>> On Sun, Jan 17, 2021 at 3:11 PM Yedidyah Bar David wrote:
>>>> On Thu, Jan 14, 2021 at 1:41 PM Yedidyah Bar David wrote:
>>>>> On Thu, Jan 14, 2021 at 8:35 AM Yedidyah Bar David wrote:
>>>>>> On Wed, Jan 13, 2021 at 5:34 PM Yedidyah Bar David wrote:
>>>>>>> On Wed, Jan 13, 2021 at 2:48 PM Yedidyah Bar David wrote:
>>>>>>>> On Wed, Jan 13, 2021 at 1:57 PM Marcin Sobczyk wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> my guess is it's selinux-related.
>>>>>>>>>
>>>>>>>>> Unfortunately I can't find any meaningful errors in audit.log in a
>>>>>>>>> scenario where host deployment fails.
>>>>>>>>> However, switching selinux to permissive mode before adding hosts
>>>>>>>>> makes the problem go away, so it's probably not an error somewhere
>>>>>>>>> in the logic.
>>>>>>>>
>>>>>>>> It's getting weirder: Under strace, it succeeds:
>>>>>>>>
>>>>>>>> https://gerrit.ovirt.org/c/ovirt-system-tests/+/112948
>>>>>>>>
>>>>>>>> (Can't see the actual log, as I didn't add '-A', so it was
>>>>>>>> overwritten on restart...)
>>>>>>>
>>>>>>> After updating it to use '-A', it indeed shows that it worked:
>>>>>>>
>>>>>>> 43664 14:16:55.997639 access("/etc/pki/ovirt-engine/requests", W_OK
>>>>>>> 43664 14:16:55.997695 <... access resumed>) = 0
>>>>>>>
>>>>>>> Weird.
>>>>>>>
>>>>>>> Now ran in parallel 'ci test' for this patch and another one from
>>>>>>> master, for comparison:
>>>>>>
>>>>>> Again, the same:
>>>>>>
>>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14916/
>>>>>>
>>>>>> With strace, passed,
>>>>>>
>>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1883/
>>>>>>
>>>>>> Without strace, failed.
>>>>>>
>>>>>> Last nightly run that passed [1] used:
>>>>>>
>>>>>> ost-images-el8-host-installed-1-202101100446.x86_64
>>>>>> ovirt-engine-appliance-4.4-20210109182828.1.el8.x86_64
>>>>>>
>>>>>> Trying now with these - not sure it's possible to put specific versions
>>>>>> inside automation/*packages, let's see:
>>>>>>
>>>>>> https://gerrit.ovirt.org/c/ovirt-system-tests/+/112977
>>>>>
>>>>> Indeed, with a fixed ost-images and removing updates, it passes. The
>>>>> network suite failed, but he-basic passed:
>>>>>
>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14920/artifact/ci_build_summary.html
>>>>>
>>>>> So I am quite certain this is an OS issue. Not sure why we do not see
>>>>> this in basic-suite. Perhaps it's related to nested-kvm, or to
>>>>> load/slowness caused by that? Weird.
>>>>>
>>>>> When this fails, we do not collect all of the engine's /var/log, only
>>>>> messages and ovirt-engine/. So it's not easy to get a list of the
>>>>> packages that were updated.
>>>>>
>>>>> Pushed now:
>>>>>
>>>>> https://github.com/oVirt/ovirt-ansible-collection/pull/202
>>>>>
>>>>> to get all of the engine's /var/log, and ran a manual HE job with it:
>>>>>
>>>>> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7680/
>>>>
>>>> This one I accidentally ran with the wrong repo, then ran another one
>>>> with the correct repo [1]. But:
>>>>
>>>> 1. The repo wasn't used. Emailed about this in a separate thread:
>>>> "manual job does not use custom repo"
>>>>
>>>> 2. It passed! Being what seems like a heisenbug, I understand why when
>>>> you run it under strace it works differently. But even if you just intend
>>>> to collect more logs it also causes it to behave differently? :-) This
>>>> does not mean that "problem solved" - latest nightly run [2] did fail
>>>> with the same error.
>>>
>>> Status:
>>>
>>> 1. he-basic-suite is still failing.
>>>
>>> 2. Patch to collect all of /var/log from the engine merged.
>>>
>>> Dana, can you please update? Did you have any progress?
>>>
>>> IMO it's an OS bug. If Marcin says it's an selinux issue, I do not argue :-).
>>> So, how do we continue?
>>
>> Switching to CentOS Stream development/testing is a big effort, I'm not sure
>> we can do this and still deliver all the RFEs/bugs planned for 4.4.5 ...

+1

> IMO we should now revert appliance and node to CentOS 8.3, and then continue
> the discussion. Having he-basic-suite broken for a week is too much.

+1 The testing infrastructure for Stream is here, but if it doesn't work yet,
then let's stick to the plan and focus on 8.3.
>>>> [1] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7681/
>>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1887/
>>>>
>>>>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1879/
>>>>> --
>>>>> Didi
>>>> --
>>>> Didi
>>> --
>>> Didi
>> --
>> Martin Perina
>> Manager, Software Engineering
>> Red Hat Czech s.r.o.

___
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/MHE2YPXUYGDX6IBS265BPXEXLGCEOZWI/
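The two trace lines quoted in this thread show how 'strace -f' reports a system call that is still in progress when another traced process or thread produces output: the call is first written without a return value, and later completed by a '<... access resumed>' line for the same PID. When comparing traced runs, it can help to stitch these halves back together automatically. A minimal Python sketch, assuming the PID-plus-timestamp layout shown above (as written by 'strace -f -tt -o FILE', with '-A' to append across restarts); access_results is a hypothetical helper, not part of OST or oVirt.

```python
import re
from typing import Dict, List, Tuple

# "<pid> <hh:mm:ss.micro> <call...>" as written by `strace -f -tt -o FILE`.
LINE_RE = re.compile(r"^(\d+)\s+(\d\d:\d\d:\d\d\.\d+)\s+(.*)$")
# Continuation of a split call, e.g. "<... access resumed>) = 0".
RESUMED_RE = re.compile(r"^<\.\.\. (\w+) resumed>(.*)$")
# Return value after the closing parenthesis, plus any errno text.
RESULT_RE = re.compile(r"\)\s+=\s+(-?\d+)(.*)$")
# Start of a syscall line such as "access(" (skips "--- SIG..." / "+++ ...").
CALL_RE = re.compile(r"^\w+\(")


def access_results(lines: List[str], path: str) -> List[Tuple[int, int, str]]:
    """Return (pid, return value, errno text) for each access() call on `path`.

    Calls that strace split into an unfinished part and a "<... resumed>"
    part are joined back together per PID before being inspected.
    """
    pending: Dict[int, str] = {}
    results: List[Tuple[int, int, str]] = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not in "-f -tt" layout; ignored by this sketch
        pid, body = int(m.group(1)), m.group(3)
        resumed = RESUMED_RE.match(body)
        if resumed:
            start = pending.pop(pid, None)
            if start is None:
                continue  # the first half of this call is not in the log
            body = start + resumed.group(2)
        elif not RESULT_RE.search(body):
            if CALL_RE.match(body):
                # No return value yet: remember the first half of the call.
                pending[pid] = body.replace("<unfinished ...>", "").rstrip()
            continue
        if not body.startswith(f'access("{path}"'):
            continue
        res = RESULT_RE.search(body)
        if res:
            results.append((pid, int(res.group(1)), res.group(2).strip()))
    return results
```

Fed the two lines quoted above, access_results(lines, "/etc/pki/ovirt-engine/requests") returns [(43664, 0, "")], i.e. one successful writability check. On a failing run the same call would presumably end in -1 with an errno such as EACCES, although the thread only includes a trace from a passing run.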
[ovirt-devel] Re: bz 1915329: [Stream] Add host fails with: Destination /etc/pki/ovirt-engine/requests not writable
On Mon, Jan 18, 2021 at 10:53 AM Martin Perina wrote:
> On Mon, Jan 18, 2021 at 9:08 AM Yedidyah Bar David wrote:
>> IMO it's an OS bug. If Marcin says it's an selinux issue, I do not argue :-).
>> So, how do we continue?
>
> Switching to CentOS Stream development/testing is a big effort, I'm not sure
> we can do this and still deliver all the RFEs/bugs planned for 4.4.5 ...

IMO we should now revert appliance and node to CentOS 8.3, and then continue
the discussion. Having he-basic-suite broken for a week is too much.
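Marcin's observation that switching SELinux to permissive mode makes the failure go away, together with the access(..., W_OK) call seen in the strace output, suggests a small probe to run on the engine VM just before hosts are added. A minimal Python sketch, assuming the standard selinuxfs mount where /sys/fs/selinux/enforce reads 1 in enforcing mode and 0 in permissive mode; selinux_mode and probe are hypothetical helpers, not part of OST or oVirt.

```python
import os
from typing import Dict, Union

# Standard selinuxfs location; reads "1" when enforcing, "0" when permissive.
SELINUX_ENFORCE = "/sys/fs/selinux/enforce"


def selinux_mode(enforce_path: str = SELINUX_ENFORCE) -> str:
    """Return 'enforcing', 'permissive', or 'unavailable'."""
    try:
        with open(enforce_path) as f:
            value = f.read().strip()
    except OSError:
        # selinuxfs not mounted (e.g. SELinux disabled) or file unreadable.
        return "unavailable"
    return "enforcing" if value == "1" else "permissive"


def probe(path: str, enforce_path: str = SELINUX_ENFORCE) -> Dict[str, Union[str, bool]]:
    """Report the SELinux mode and whether `path` passes an access(W_OK) check.

    os.access() checks against the real UID/GID, like the access(2) call in
    the trace, but it runs in this process's SELinux domain.
    """
    return {
        "path": path,
        "selinux": selinux_mode(enforce_path),
        "writable": os.access(path, os.W_OK),
    }
```

For example, print(probe("/etc/pki/ovirt-engine/requests")) on the engine VM. Because such a probe runs in the SELinux domain of whoever launches it, which is likely different from the domain of the process that actually fails, a writable result here would not by itself rule out a denial for that process.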
[ovirt-devel] Re: bz 1915329: [Stream] Add host fails with: Destination /etc/pki/ovirt-engine/requests not writable
On Mon, Jan 18, 2021 at 9:08 AM Yedidyah Bar David wrote:
> Status:
>
> 1. he-basic-suite is still failing.
>
> 2. Patch to collect all of /var/log from the engine merged.
>
> Dana, can you please update? Did you have any progress?
>
> IMO it's an OS bug. If Marcin says it's an selinux issue, I do not argue :-).
> So, how do we continue?

Switching to CentOS Stream development/testing is a big effort, I'm not sure
we can do this and still deliver all the RFEs/bugs planned for 4.4.5 ...

--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
[ovirt-devel] Re: bz 1915329: [Stream] Add host fails with: Destination /etc/pki/ovirt-engine/requests not writable
On Sun, Jan 17, 2021 at 3:11 PM Yedidyah Bar David wrote:
> 2. It passed! Being what seems like a heisenbug, I understand why when
> you run it under strace it works differently. But even if you just intend
> to collect more logs it also causes it to behave differently? :-) This
> does not mean that "problem solved" - latest nightly run [2] did fail
> with the same error.
>
> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1887/

Status:

1. he-basic-suite is still failing.

2. Patch to collect all of /var/log from the engine merged.

Dana, can you please update? Did you have any progress?

IMO it's an OS bug. If Marcin says it's an selinux issue, I do not argue :-).
So, how do we continue?

--
Didi

List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/SMAL3FXKOKNZA3N6YDC6EWIXP4U3WWA2/
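One way to test the "OS bug" theory is to compare the installed package sets of the last passing run and a failing one, once both are available (for example as two 'rpm -qa' listings; the thread does not say such listings are part of the collected logs). A minimal Python sketch, assuming each line is a name-version-release.arch string like the ost-images-el8-host-installed-1-202101100446.x86_64 package quoted earlier, and relying on the RPM rule that Version and Release contain no hyphens; parse_key, index and diff_packages are hypothetical helpers, not part of OST.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def parse_key(nvra: str) -> Tuple[str, str]:
    """Split 'name-version-release.arch' into (name, arch).

    Version and release contain no '-', so the name is everything before
    the second-to-last hyphen; the arch is the text after the last '.'.
    """
    name, _version, rel_arch = nvra.rsplit("-", 2)
    arch = rel_arch.rsplit(".", 1)[1] if "." in rel_arch else ""
    return name, arch


def index(packages: Iterable[str]) -> Dict[Tuple[str, str], Set[str]]:
    """Group full NVRA strings by (name, arch).

    Installonly packages such as kernels may appear several times.
    """
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for line in packages:
        line = line.strip()
        if line:
            groups[parse_key(line)].add(line)
    return groups


def diff_packages(good: Iterable[str], bad: Iterable[str]) -> Dict[str, List]:
    """Compare a passing run's package list (good) with a failing one (bad)."""
    g, b = index(good), index(bad)
    return {
        "added": sorted(n for k in b.keys() - g.keys() for n in b[k]),
        "removed": sorted(n for k in g.keys() - b.keys() for n in g[k]),
        "changed": sorted(
            (sorted(g[k]), sorted(b[k])) for k in g.keys() & b.keys() if g[k] != b[k]
        ),
    }
```

Keying on (name, arch) keeps multilib packages and multiple installed kernels from masking each other; any key whose set of versions differs between the two runs ends up under "changed", which is the short list worth checking against the failure.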