On Tue, Mar 10, 2020 at 7:03 PM Amit Bawer <aba...@redhat.com> wrote:
>
> Seems like a reproduction of 
> https://bugzilla.redhat.com/show_bug.cgi?id=1807050#c1

Agree, because...

> Snipped from 
> https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/21146/artifact/basic-suite.el7.x86_64/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-1/_var_log/vdsm/vdsm.log:
>
> 2020-03-10 05:59:18,549-0400 ERROR (jsonrpc/3) [storage.LVM] vg 
> cceb9d83-7b76-4840-a189-c82f3c18760e has pv_count 2 but pv_names 
> ('/dev/mapper/3600140544bef7e411164e5f94e13b5d8',) (lvm:578)
> 2020-03-10 05:59:18,551-0400 INFO  (jsonrpc/3) [storage.StorageDomain] 
> sdUUID=cceb9d83-7b76-4840-a189-c82f3c18760e (blockSD:1192)
> 2020-03-10 05:59:18,551-0400 DEBUG (jsonrpc/3) [common.commands] 
> /usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/lvm vgck --config 
> 'devices {  preferred_names=["^/dev/mapper/"]  ignore_suspended_devices=1  
> write_cache_state=0  disable_after_error_count=3  
> filter=["a|^/dev/mapper/3600140544bef7e411164e5f94e13b5d8$|", "r|.*|"]  
> hints="none" } global {  locking_type=1  prioritise_write_locks=1  
> wait_for_locks=1  use_lvmetad=0 } backup {  retain_min=50  retain_days=0 }' 
> cceb9d83-7b76-4840-a189-c82f3c18760e (cwd None) (commands:153)
> 2020-03-10 05:59:18,634-0400 DEBUG (jsonrpc/3) [common.commands] FAILED: 
> <err> = b"  WARNING: Couldn't find device with uuid 
> FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\n  WARNING: VG 
> cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV 
> FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\n  The volume group is missing 1 
> physical volumes.\n"; <rc> = 5 (commands:185)
> 2020-03-10 05:59:18,637-0400 INFO  (jsonrpc/3) [vdsm.api] FINISH 
> getStorageDomainInfo error=Domain is either partially accessible or entirely 
> inaccessible: ('cceb9d83-7b76-4840-a189-c82f3c18760e: ["  WARNING: Couldn\'t 
> find device with uuid FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.", \'  WARNING: 
> VG cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV 
> FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\', \'  The volume group is missing 1 
> physical volumes.\']',) from=::ffff:192.168.201.4,47796, 
> flow_id=5f02a1ec-db37-470d-b329-41b22f23582b, 
> task_id=9be86ca4-49ac-47ea-b0e2-8182e33924ff (api:52)

This command was run only once. Usually, when a command using a specific filter
(e.g. filter=["a|^/dev/mapper/3600140544bef7e411164e5f94e13b5d8$|", "r|.*|"])
fails, we rebuild the filter. If the new filter is different (e.g. it has more
devices), we run the command again.
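
A minimal sketch of the retry behavior described above, for illustration only.
This is not the vdsm code: build_filter, run_with_retry, and the run and
list_devices callables are hypothetical names, and using one accept entry per
device is an assumption (the log only shows the single-device case).

```python
from typing import Callable, Sequence, Tuple


def build_filter(devices: Sequence[str]) -> str:
    """Build an LVM filter that accepts only the given devices.

    Hypothetical helper; one accept entry per device is an assumption.
    """
    accept = ['"a|^%s$|"' % dev for dev in sorted(devices)]
    reject_all = ['"r|.*|"']
    return "filter=[%s]" % ", ".join(accept + reject_all)


def run_with_retry(
    run: Callable[[str], Tuple[int, str]],
    list_devices: Callable[[], Sequence[str]],
    initial_devices: Sequence[str],
) -> Tuple[int, str, int]:
    """Run a command with a device filter, retrying once if the filter changed.

    run(filter) returns (rc, err). Returns (rc, err, attempts).
    """
    current = build_filter(initial_devices)
    rc, err = run(current)
    attempts = 1
    if rc != 0:
        # The command failed: rebuild the filter from the devices seen now.
        rebuilt = build_filter(list_devices())
        if rebuilt != current:
            # New devices showed up, so a second attempt may succeed.
            rc, err = run(rebuilt)
            attempts += 1
    return rc, err, attempts
```

In this sketch, a failed command that ran only once (attempts == 1) means the
rebuilt filter was identical to the original one, which is the situation seen
in the log above.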

Since we ran the command only once, we know that the filter was correct, so
only /dev/mapper/3600140544bef7e411164e5f94e13b5d8 was present on the host. The
other PV was not available when this command was run.

We started the connection here:

2020-03-10 05:59:17,364-0400 DEBUG (jsonrpc/2) [common.commands]
/usr/bin/taskset --cpu-list 0-1 /usr/bin/sudo -n /sbin/iscsiadm -m
node -T iqn.2014-07.org.ovirt:storage -I default -p
192.168.200.4:3260,1 -l (cwd None) (commands:153)
2020-03-10 05:59:17,504-0400 DEBUG (jsonrpc/2) [common.commands]
SUCCESS: <err> = b''; <rc> = 0 (commands:98)

And finished here:

2020-03-10 05:59:17,610-0400 DEBUG (jsonrpc/2) [common.commands]
/usr/bin/taskset --cpu-list 0-1 /sbin/udevadm settle --timeout=5 (cwd
None) (commands:153)
2020-03-10 05:59:17,787-0400 DEBUG (jsonrpc/2) [common.commands]
SUCCESS: <err> = b''; <rc> = 0 (commands:98)

In /var/log/messages we see the connection starting here:

Mar 10 05:59:17 lago-basic-suite-master-host-1 iscsid[21973]: iscsid:
Connection2:0 to [target: iqn.2014-07.org.ovirt:storage, portal:
192.168.200.4,3260] through [iface: default] is operational now

Adding devices:

Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:0:
[sdf] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:4:
[sdg] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:3:
[sdh] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:2:
[sdi] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)
Mar 10 05:59:17 lago-basic-suite-master-host-1 kernel: sd 3:0:0:1:
[sdj] 41943040 512-byte logical blocks: (21.5 GB/20.0 GiB)

Multipath adding devices to maps:

Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sdb
[8:16]: path added to devmap 36001405b39cc4e33bd24f35a81c0c140
Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sdc
[8:32]: path added to devmap 36001405a70a062950224fc985825aa0d
Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sda
[8:0]: path added to devmap 3600140559c49ea12b0d4dc1994ba4ef0
Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sde
[8:64]: path added to devmap 36001405f277c71b13814669926ffbae4
Mar 10 05:59:18 lago-basic-suite-master-host-1 multipathd[21959]: sdi
[8:128]: path added to devmap 3600140544bef7e411164e5f94e13b5d8  <<<
This is probably the missing device

So we need to wait for a while, until multipath handles all the devices.
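
One way to picture such a wait is a polling loop with a timeout. The sketch
below is only an illustration under stated assumptions and does not describe
what the patch linked in this message does: wait_for_devices and the
list_devices callable are hypothetical, and it assumes the caller already
knows which multipath devices to expect.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_devices(
    expected: Iterable[str],
    list_devices: Callable[[], Iterable[str]],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Set[str]:
    """Poll until all expected devices are listed or the timeout expires.

    Returns the set of devices still missing (empty on success).
    """
    wanted = set(expected)
    deadline = clock() + timeout
    while True:
        missing = wanted - set(list_devices())
        if not missing:
            return set()
        if clock() >= deadline:
            # Give up and report what never showed up.
            return missing
        sleep(interval)
```

A caller would run storage commands such as vgck only after this returns an
empty set, and treat a non-empty result as a real failure.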

https://gerrit.ovirt.org/c/107206/ should avoid this issue.

Benny, please try to run OST.

> 2020-03-10 05:59:18,637-0400 ERROR (jsonrpc/3) [storage.TaskManager.Task] 
> (Task='9be86ca4-49ac-47ea-b0e2-8182e33924ff') Unexpected error (task:880)
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/task.py", line 887, in 
> _run
>     return fn(*args, **kargs)
>   File "<decorator-gen-129>", line 2, in getStorageDomainInfo
>   File "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in 
> method
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 2752, in 
> getStorageDomainInfo
>     dom = self.validateSdUUID(sdUUID)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/hsm.py", line 310, in 
> validateSdUUID
>     sdDom.validate()
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/blockSD.py", line 1193, 
> in validate
>     lvm.chkVG(self.sdUUID)
>   File "/usr/lib/python3.6/site-packages/vdsm/storage/lvm.py", line 1278, in 
> chkVG
>     raise se.StorageDomainAccessError("%s: %s" % (vgName, err))
> vdsm.storage.exception.StorageDomainAccessError: Domain is either partially 
> accessible or entirely inaccessible: ('cceb9d83-7b76-4840-a189-c82f3c18760e: 
> ["  WARNING: Couldn\'t find device with uuid 
> FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.", \'  WARNING: VG 
> cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV 
> FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\', \'  The volume group is missing 1 
> physical volumes.\']',)
> 2020-03-10 05:59:18,637-0400 INFO  (jsonrpc/3) [storage.TaskManager.Task] 
> (Task='9be86ca4-49ac-47ea-b0e2-8182e33924ff') aborting: Task is aborted: 
> 'value=Domain is either partially accessible or entirely inaccessible: 
> (\'cceb9d83-7b76-4840-a189-c82f3c18760e: ["  WARNING: Couldn\\\'t find device 
> with uuid FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.", \\\'  WARNING: VG 
> cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV 
> FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\\\', \\\'  The volume group is 
> missing 1 physical volumes.\\\']\',) abortedcode=379' (task:1190)
> 2020-03-10 05:59:18,638-0400 ERROR (jsonrpc/3) [storage.Dispatcher] FINISH 
> getStorageDomainInfo error=Domain is either partially accessible or entirely 
> inaccessible: ('cceb9d83-7b76-4840-a189-c82f3c18760e: ["  WARNING: Couldn\'t 
> find device with uuid FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.", \'  WARNING: 
> VG cceb9d83-7b76-4840-a189-c82f3c18760e is missing PV 
> FH6lfD-DZus-6Ndn-tkr8-5Hsy-lt2c-CDRPDU.\', \'  The volume group is missing 1 
> physical volumes.\']',) (dispatcher:83)
>
>
> Suggest to try again once the BZ is fixed on master.
>
> On Tue, Mar 10, 2020 at 1:36 PM Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > Hi all,
> >
> > Anyone looking at this?
> >
> > See e.g.:
> >
> > https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/21146/
> >
> > Thanks,
> > --
> > Didi
> > _______________________________________________
> > Devel mailing list -- devel@ovirt.org
> > To unsubscribe send an email to devel-le...@ovirt.org
> > Privacy Statement: https://www.ovirt.org/privacy-policy.html
> > oVirt Code of Conduct: 
> > https://www.ovirt.org/community/about/community-guidelines/
> > List Archives: 
> > https://lists.ovirt.org/archives/list/devel@ovirt.org/message/ED57V5XW4B3WC7AM5GRYDE6CJJL7PWPM/