On Mon, Nov 25, 2019 at 11:00 AM Dominik Holler <dhol...@redhat.com> wrote:
> On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <dhol...@redhat.com> wrote:
>> On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler <dhol...@redhat.com> wrote:
>>> On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer <nsof...@redhat.com> wrote:
>>>> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <msobc...@redhat.com> wrote:
>>>>> On 11/22/19 4:54 PM, Martin Perina wrote:
>>>>>> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <dhol...@redhat.com> wrote:
>>>>>>> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <dhol...@redhat.com> wrote:
>>>>>>>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <mdbarr...@redhat.com> wrote:
>>>>>>>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <vjura...@redhat.com> wrote:
>>>>>>>>>> On Friday, November 22, 2019, 9:56:56 CET Miguel Duarte de Mora Barroso wrote:
>>>>>>>>>>> On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <vjura...@redhat.com> wrote:
>>>>>>>>>>>> On Friday, November 22, 2019, 9:41:26 CET Dominik Holler wrote:
>>>>>>>>>>>>> On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <dhol...@redhat.com> wrote:
>>>>>>>>>>>>>> On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsof...@redhat.com> wrote:
>>>>>>>>>>>>>>> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjura...@redhat.com> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
>>>>>>>>>>>>>>>>  Problem 1: cannot install the best update candidate for package vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64
>>>>>>>>>>>>>>>>   - nothing provides nmstate needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64
>>>>>>>>>>>>>>>>  Problem 2: package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires vdsm-network = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed
>>>>>>>>>>>>>>>>   - cannot install the best update candidate for package vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch
>>>>>>>>>>>>>>>>   - nothing provides nmstate needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> nmstate should be provided by the copr repo enabled by ovirt-release-master.
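[Editor's note: for anyone scripting around OST failures, the capability dnf could not resolve can be pulled out of a depsolve message like the one above with a few lines of Python. This is an illustrative helper, not part of OST or vdsm; the message constant below is abbreviated from the error quoted above.]

```python
import re

# Abbreviated version of the ansible/dnf depsolve message quoted above
# (including dnf's own "occured" spelling).
DEPSOLVE_MSG = (
    "Depsolve Error occured: \n"
    " Problem 1: cannot install the best update candidate for package "
    "vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n"
    "  - nothing provides nmstate needed by "
    "vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n"
)

def missing_capabilities(msg):
    """Return the capabilities dnf reported as unavailable."""
    return sorted(set(re.findall(r"nothing provides (\S+) needed by", msg)))

print(missing_capabilities(DEPSOLVE_MSG))  # ['nmstate']
```

An empty provider list for such a capability on the OST host is exactly what produces the "nothing provides nmstate" failure above.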
>>>>>>>>>>>>>> I re-triggered as https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
>>>>>>>>>>>>>> maybe https://gerrit.ovirt.org/#/c/104825/ was missing
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
>>>>>>>>>>>>
>>>>>>>>>>>> maybe not. You re-triggered with [1], which really missed this patch.
>>>>>>>>>>>> I did a rebase and am now running with this patch in build #6132 [2].
>>>>>>>>>>>> Let's wait for it to see if gerrit #104825 helps.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
>>>>>>>>>>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>>>>>>>>>>>>>
>>>>>>>>>>>>> Miguel, do you think merging
>>>>>>>>>>>>> https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
>>>>>>>>>>>>> would solve this?
>>>>>>>>>>>
>>>>>>>>>>> I've split the patch Dominik mentions above in two, one of them adding
>>>>>>>>>>> the nmstate / NetworkManager copr repos - [3].
>>>>>>>>>>> Let's see if it fixes it.
>>>>>>>>>>
>>>>>>>>>> it fixes the original issue, but OST still fails in
>>>>>>>>>> 098_ovirt_provider_ovn.use_ovn_provider:
>>>>>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>>>>>>>>>
>>>>>>>>> I think Dominik was looking into this issue; +Dominik Holler please confirm.
>>>>>>>>> Let me know if you need any help, Dominik.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> The problem is that the hosts lost connection to storage:
>>>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>>>>>>>
>>>>>>>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|", "r|.*|"] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min=50 retain_days=0 }' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name (cwd None) (commands:153)
>>>>>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 499, in _pathChecked
>>>>>>>>     delay = result.delay()
>>>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 391, in delay
>>>>>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>>>>>>>> vdsm.storage.exception.MiscFileReadException: Internal file read failure: ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata', 1, 'Read timeout')
>>>>>>>> 2019-11-22 05:39:12,416-0500 INFO (check/loop) [storage.Monitor] Domain d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
>>>>>>>>
>>>>>>>> I failed to reproduce this locally to analyze it; I will try again, any hints welcome.
>>>>>>>
>>>>>>> https://gerrit.ovirt.org/#/c/104925/1/ shows that 008_basic_ui_sanity.py triggers the problem.
>>>>>>> Is there someone with knowledge about the basic_ui_sanity around?
>>>>>
>>>>> How do you think it's related? By commenting out the ui sanity tests and seeing OST finish successfully?
>>>>>
>>>>> Looking at the 6134 run you were discussing:
>>>>>
>>>>> - timing of the ui sanity set-up [1]:
>>>>>   11:40:20 @ Run test: 008_basic_ui_sanity.py
>>>>>
>>>>> - timing of the first encountered storage error [2]:
>>>>>   2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error checking path /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata (monitor:501)
>>>>>   [same MiscFileReadException traceback as quoted above, ending in (..., 1, 'Read timeout')]
>>>>>
>>>>> Timezone difference aside, it seems to me that these storage errors occurred before doing anything ui-related.
>>
>> You are right, a time.sleep(8*60) in https://gerrit.ovirt.org/#/c/104925/2 triggers the issue the same way.

So this is a test issue, assuming that the UI tests can complete in less than 8 minutes?

> Nir or Steve, can you please confirm that this is a storage problem?

Why do you think we have a storage problem?

>>>>> I remember talking with Steven Rosenberg on IRC a couple of days ago about some storage metadata issues, and he said he got a response from Nir that "it's a known issue".
>>>>>
>>>>> Nir, Amit, can you comment on this?
>>>>
>>>> The error mentioned here is not a vdsm error but a warning about storage accessibility. We should convert the tracebacks to warnings.
>>>>
>>>> The reason for such an issue can be a misconfigured network (maybe the network team is testing negative flows?),
>>>
>>> No.
>>>
>>>> or some issue in the NFS server.
>>>
>>> The only hint I found is "Exiting Time2Retain handler because session_reinstatement=1", but I have no idea what this means or whether it is relevant at all.
>>>
>>>> One read timeout is not an issue. We have a real issue only if we have consistent read timeouts or errors for a couple of minutes; after that the engine can deactivate the storage domain, or some hosts if only those hosts are having trouble accessing storage.
>>>>
>>>> In OST we never expect such conditions, since we don't test negative flows and we should have good connectivity with the VMs running on the same host.
>>>
>>> Ack, this seems to be the problem.
>>>>
>>>> Nir
>>>>
>>>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
>>>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>>>>>
>>>>>> Marcin, could you please take a look?
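[Editor's note: the policy Nir describes — a single read timeout is only a warning, and the domain is treated as failed only after failures persist for a couple of minutes — can be sketched as a small state machine. The grace period, class name, and API below are illustrative assumptions, not vdsm's or the engine's actual implementation.]

```python
import time

GRACE_SECONDS = 5 * 60  # illustrative threshold, not the real value

class DomainMonitor:
    """Flag a storage domain invalid only after sustained read failures."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._first_failure = None
        self.valid = True

    def report(self, ok):
        if ok:
            # Any successful check clears the failure window.
            self._first_failure = None
            self.valid = True
            return
        now = self._clock()
        if self._first_failure is None:
            # First failure: worth a warning, but the domain stays valid.
            self._first_failure = now
        elif now - self._first_failure >= GRACE_SECONDS:
            # Failures have persisted through the whole grace period.
            self.valid = False
```

Under this model, the single 'Read timeout' in the 6134 run would never invalidate the domain on its own; only a run of failures spanning the grace window would.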
>>>>>>>>>>> [3] - https://gerrit.ovirt.org/#/c/104897/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Who installs this rpm in OST?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not understand the question.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> See [2] for full error.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can someone please take a look?
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Vojta
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
>>>>>>>>>>>>>>>> [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>>>>>>
>>>>>> --
>>>>>> Martin Perina
>>>>>> Manager, Software Engineering
>>>>>> Red Hat Czech s.r.o.

_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/5FP3QJ3QTPOIR5L3PJE2HB4WDNHWWUHY/