On Fri, Nov 22, 2019 at 8:57 PM Dominik Holler <dhol...@redhat.com> wrote:

>
>
> On Fri, Nov 22, 2019 at 5:54 PM Dominik Holler <dhol...@redhat.com> wrote:
>
>>
>>
>> On Fri, Nov 22, 2019 at 5:48 PM Nir Soffer <nsof...@redhat.com> wrote:
>>
>>>
>>>
>>> On Fri, Nov 22, 2019, 18:18 Marcin Sobczyk <msobc...@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On 11/22/19 4:54 PM, Martin Perina wrote:
>>>>
>>>>
>>>>
>>>> On Fri, Nov 22, 2019 at 4:43 PM Dominik Holler <dhol...@redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Fri, Nov 22, 2019 at 12:17 PM Dominik Holler <dhol...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 22, 2019 at 12:00 PM Miguel Duarte de Mora Barroso <
>>>>>> mdbarr...@redhat.com> wrote:
>>>>>>
>>>>>>> On Fri, Nov 22, 2019 at 11:54 AM Vojtech Juranek <
>>>>>>> vjura...@redhat.com> wrote:
>>>>>>> >
>>>>>>> > On pátek 22. listopadu 2019 9:56:56 CET Miguel Duarte de Mora
>>>>>>> Barroso wrote:
>>>>>>> > > On Fri, Nov 22, 2019 at 9:49 AM Vojtech Juranek <
>>>>>>> vjura...@redhat.com>
>>>>>>> > > wrote:
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > On pátek 22. listopadu 2019 9:41:26 CET Dominik Holler wrote:
>>>>>>> > > >
>>>>>>> > > > > On Fri, Nov 22, 2019 at 8:40 AM Dominik Holler <
>>>>>>> dhol...@redhat.com>
>>>>>>> > > > > wrote:
>>>>>>> > > > >
>>>>>>> > > > > > On Thu, Nov 21, 2019 at 10:54 PM Nir Soffer <nsof...@redhat.com> wrote:
>>>>>>> > > > > >
>>>>>>> > > > > >> On Thu, Nov 21, 2019 at 11:24 PM Vojtech Juranek <vjura...@redhat.com> wrote:
>>>>>>> > > > > >>
>>>>>>> > > > > >> > Hi,
>>>>>>> > > > > >> > OST fails (see e.g. [1]) in 002_bootstrap.check_update_host. It fails with
>>>>>>> > > > > >> >
>>>>>>> > > > > >> >  FAILED! => {"changed": false, "failures": [], "msg": "Depsolve Error occured:
>>>>>>> > > > > >> > \n Problem 1: cannot install the best update candidate for package
>>>>>>> > > > > >> > vdsm-network-4.40.0-1236.git63ea8cb8b.el8.x86_64\n  - nothing provides nmstate
>>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n Problem 2:
>>>>>>> > > > > >> > package vdsm-python-4.40.0-1271.git524e08c8a.el8.noarch requires vdsm-network
>>>>>>> > > > > >> > = 4.40.0-1271.git524e08c8a.el8, but none of the providers can be installed\n
>>>>>>> > > > > >> > - cannot install the best update candidate for package
>>>>>>> > > > > >> > vdsm-python-4.40.0-1236.git63ea8cb8b.el8.noarch\n  - nothing provides nmstate
>>>>>>> > > > > >> > needed by vdsm-network-4.40.0-1271.git524e08c8a.el8.x86_64\n
>>>>>>> > > > > >>
>>>>>>> > > > > >> nmstate should be provided by the copr repo enabled by ovirt-release-master.
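If this recurs, a quick way to check is to ask dnf which enabled repo (if any) provides nmstate on the host. This is only an illustrative sketch; the actual repo ids depend on what ovirt-release-master enabled:

```shell
# Sketch: list packages that provide nmstate and the repo they come from.
# Guarded so it degrades gracefully on machines without dnf.
if command -v dnf >/dev/null 2>&1; then
    out=$(dnf -q repoquery --whatprovides nmstate \
        --qf '%{name}-%{version}-%{release} (repo: %{reponame})' 2>/dev/null)
    echo "${out:-no provider for nmstate in the enabled repos}"
    checked=dnf
else
    echo "dnf not available on this machine"
    checked=skipped
fi
```

An empty result here would match the "nothing provides nmstate" depsolve failure above.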
>>>>>>> > > > > >
>>>>>>> > > > > >
>>>>>>> > > > > >
>>>>>>> > > > > > I re-triggered as
>>>>>>> > > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6131
>>>>>>> > > > > > maybe https://gerrit.ovirt.org/#/c/104825/ was missing
>>>>>>> > > > >
>>>>>>> > > > > Looks like https://gerrit.ovirt.org/#/c/104825/ is ignored by OST.
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > maybe not. You re-triggered with [1], which really missed this patch.
>>>>>>> > > > I did a rebase and am now running with this patch in build #6132 [2].
>>>>>>> > > > Let's wait for it to see if gerrit #104825 helps.
>>>>>>> > > >
>>>>>>> > > > [1] https://jenkins.ovirt.org/job/standard-manual-runner/909/
>>>>>>> > > > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6132/
>>>>>>> > > >
>>>>>>> > > > > Miguel, do you think merging
>>>>>>> > > > > https://gerrit.ovirt.org/#/c/104495/15/common/yum-repos/ovirt-master-host-cq.repo.in
>>>>>>> > > > > would solve this?
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > I've split the patch Dominik mentions above in two, one of them
>>>>>>> > > adding the nmstate / NetworkManager copr repos - [3].
>>>>>>> > >
>>>>>>> > > Let's see if it fixes it.
>>>>>>> >
>>>>>>> > it fixes the original issue, but OST still fails in
>>>>>>> > 098_ovirt_provider_ovn.use_ovn_provider:
>>>>>>> >
>>>>>>> > https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134
>>>>>>>
>>>>>>> I think Dominik was looking into this issue; +Dominik Holler, please
>>>>>>> confirm.
>>>>>>>
>>>>>>> Let me know if you need any help, Dominik.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>> The problem is that the hosts lost connection to storage:
>>>>>>
>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>>>>> :
>>>>>>
>>>>>> 2019-11-22 05:39:12,326-0500 DEBUG (jsonrpc/5) [common.commands] 
>>>>>> /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgs --config 
>>>>>> 'devices {  preferred_names=["^/dev/mapper/"]  
>>>>>> ignore_suspended_devices=1  write_cache_state=0  
>>>>>> disable_after_error_count=3  
>>>>>> filter=["a|^/dev/mapper/36001405107ea8b4e3ac4ddeb3e19890f$|^/dev/mapper/360014054924c91df75e41178e4b8a80c$|^/dev/mapper/3600140561c0d02829924b77ab7323f17$|^/dev/mapper/3600140582feebc04ca5409a99660dbbc$|^/dev/mapper/36001405c3c53755c13c474dada6be354$|",
>>>>>>  "r|.*|"] } global {  locking_type=1  prioritise_write_locks=1  
>>>>>> wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 
>>>>>> }' --noheadings --units b --nosuffix --separator '|' 
>>>>>> --ignoreskippedcluster -o 
>>>>>> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name
>>>>>>  (cwd None) (commands:153)
>>>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error 
>>>>>> checking path 
>>>>>> /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata
>>>>>>  (monitor:501)
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line 
>>>>>> 499, in _pathChecked
>>>>>>     delay = result.delay()
>>>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line 
>>>>>> 391, in delay
>>>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>>>>>> vdsm.storage.exception.MiscFileReadException: Internal file read 
>>>>>> failure: 
>>>>>> ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata',
>>>>>>  1, 'Read timeout')
>>>>>> 2019-11-22 05:39:12,416-0500 INFO  (check/loop) [storage.Monitor] Domain 
>>>>>> d10879c6-8de1-40ba-87fa-f447844eed2a became INVALID (monitor:472)
>>>>>>
>>>>>>
>>>>>> I failed to reproduce this locally to analyze it; I will try again. Any
>>>>>> hints are welcome.
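For reference, the monitor's failing check can be approximated by hand on the affected host. The 10-second deadline and direct-I/O single-block read below are assumptions meant to mimic vdsm's checker, not its exact implementation; the path is the one from the vdsm.log excerpt above:

```shell
# Read one block of the domain metadata with a deadline, roughly what the
# storage monitor does when it reports 'Read timeout'.
METADATA='/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata'
if timeout 10 dd if="$METADATA" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null; then
    result="read ok"
else
    result="read timed out or failed"
fi
echo "$result"
```

Running this on lago-basic-suite-master-host-0 while the UI suite (or the sleep) is in progress should show whether the NFS export itself stops responding.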
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> https://gerrit.ovirt.org/#/c/104925/1/ shows that
>>>>> 008_basic_ui_sanity.py triggers the problem.
>>>>> Is there someone with knowledge about the basic_ui_sanity around?
>>>>>
>>>> How do you think it's related? By commenting out the UI sanity tests
>>>> and seeing OST finish successfully?
>>>>
>>>> Looking at 6134 run you were discussing:
>>>>
>>>>  - timing of the ui sanity set-up [1]:
>>>>
>>>> 11:40:20 @ Run test: 008_basic_ui_sanity.py:
>>>>
>>>> - timing of first encountered storage error [2]:
>>>>
>>>> 2019-11-22 05:39:12,415-0500 ERROR (check/loop) [storage.Monitor] Error
>>>> checking path 
>>>> /rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata
>>>> (monitor:501)
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/monitor.py", line
>>>> 499, in _pathChecked
>>>>     delay = result.delay()
>>>>   File "/usr/lib/python3.6/site-packages/vdsm/storage/check.py", line
>>>> 391, in delay
>>>>     raise exception.MiscFileReadException(self.path, self.rc, self.err)
>>>> vdsm.storage.exception.MiscFileReadException: Internal file read
>>>> failure: 
>>>> ('/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata',
>>>> 1, 'Read timeout')
>>>>
>>>> Timezone difference aside, it seems to me that these storage errors
>>>> occurred before doing anything UI-related.
>>>>
>>>
>
> You are right, a time.sleep(8*60) in
> https://gerrit.ovirt.org/#/c/104925/2
> triggered the issue in the same way.
>
>
Nir or Steve, can you please confirm that this is a storage problem?


>
>
>>>> I remember talking with Steven Rosenberg on IRC a couple of days ago
>>>> about some storage metadata issues, and he said he got a response from
>>>> Nir that "it's a known issue".
>>>>
>>>> Nir, Amit, can you comment on this?
>>>>
>>>
>>> The error mentioned here is not a vdsm error but a warning about storage
>>> accessibility. We should convert the tracebacks to warnings.
>>>
>>> The reason for such an issue can be a misconfigured network (maybe the
>>> network team is testing negative flows?),
>>>
>>
>> No.
>>
>>
>>> or some issue in the NFS server.
>>>
>>>
>> The only hint I found is
>> "Exiting Time2Retain handler because session_reinstatement=1"
>> but I have no idea what it means or whether it is relevant at all.
>>
>>
>>> One read timeout is not an issue. We have a real issue only if we have
>>> consistent read timeouts or errors for a couple of minutes; after that,
>>> the engine can deactivate the storage domain, or some hosts if only those
>>> hosts have trouble accessing storage.
>>>
>>> In OST we never expect such conditions, since we don't test negative
>>> flows and we should have good connectivity with the VMs running on the
>>> same host.
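To tell a one-off timeout from the sustained failure Nir describes, one could poll the metadata path repeatedly. The count, interval, and deadline below are illustrative defaults, not the engine's actual thresholds:

```shell
# Count failed metadata reads over several attempts; a single failure is
# noise, while failing every attempt for minutes is what gets a domain
# deactivated. Tune COUNT/INTERVAL for a real ~5 minute window.
METADATA="${METADATA:-/rhev/data-center/mnt/192.168.200.4:_exports_nfs_share2/d10879c6-8de1-40ba-87fa-f447844eed2a/dom_md/metadata}"
COUNT="${COUNT:-3}"
INTERVAL="${INTERVAL:-1}"
failures=0
i=0
while [ "$i" -lt "$COUNT" ]; do
    timeout 10 dd if="$METADATA" of=/dev/null bs=4096 count=1 iflag=direct 2>/dev/null \
        || failures=$((failures + 1))
    sleep "$INTERVAL"
    i=$((i + 1))
done
echo "failed reads: $failures/$COUNT"
```

If only a few scattered reads fail, the monitor warnings are harmless; if every read fails for minutes, the NFS server or its network path is the suspect.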
>>>
>>>
>> Ack, this seems to be the problem.
>>
>>
>>> Nir
>>>
>>>
>>> [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/console
>>>> [2]
>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6134/artifact/exported-artifacts/test_logs/basic-suite-master/post-098_ovirt_provider_ovn.py/lago-basic-suite-master-host-0/_var_log/vdsm/vdsm.log
>>>>
>>>>
>>>>>
>>>> Marcin, could you please take a look?
>>>>
>>>>>
>>>>>
>>>>>
>>>>>>> > > [3] - https://gerrit.ovirt.org/#/c/104897/
>>>>>>> > >
>>>>>>> > > > > >> Who installs this rpm in OST?
>>>>>>> > > > > >
>>>>>>> > > > > > I do not understand the question.
>>>>>>> > > > > >
>>>>>>> > > > > >> > [...]
>>>>>>> > > > > >> >
>>>>>>> > > > > >> > See [2] for full error.
>>>>>>> > > > > >> >
>>>>>>> > > > > >> > Can someone please take a look?
>>>>>>> > > > > >> > Thanks
>>>>>>> > > > > >> > Vojta
>>>>>>> > > > > >> >
>>>>>>> > > > > >> > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/
>>>>>>> > > > > >> > [2]
>>>>>>> > > > > >> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6128/artifact/exported-artifacts/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-engine/_var_log/ovirt-engine/engine.log
>>>> --
>>>> Martin Perina
>>>> Manager, Software Engineering
>>>> Red Hat Czech s.r.o.
>>>>
>>>>
>>>>
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/devel@ovirt.org/message/MTQW6SWMBZ6U6CD5SNXLA33GATZ4R2GM/
