[ovirt-users] Multipath flapping with SAS via FCP

2021-02-25 Thread Benoit Chatelain
Hi,

I am having some trouble with multipath.
When I add a SAS disk over FCP as a Storage Domain via the oVirt WebUI,
the first link comes up active, but the second is stuck as failed.

The volume is provided by a Dell Compellent over FCP, and the disk is
transported over SAS.

multipath is flapping on every hypervisor for the same storage domain disk:

[root@isildur-adm ~]# tail -f /var/log/messages
Feb 25 11:48:21 isildur-adm kernel: device-mapper: multipath: 253:3: Failing path 8:32.
Feb 25 11:48:24 isildur-adm multipathd[659460]: 36000d31003d5c210: sdc - tur checker reports path is up
Feb 25 11:48:24 isildur-adm multipathd[659460]: 8:32: reinstated
Feb 25 11:48:24 isildur-adm multipathd[659460]: 36000d31003d5c210: remaining active paths: 2
Feb 25 11:48:24 isildur-adm kernel: device-mapper: multipath: 253:3: Reinstating path 8:32.
Feb 25 11:48:24 isildur-adm kernel: sd 1:0:1:2: alua: port group f01c state S non-preferred supports toluSNA
Feb 25 11:48:24 isildur-adm kernel: sd 1:0:1:2: alua: port group f01c state S non-preferred supports toluSNA
Feb 25 11:48:24 isildur-adm kernel: device-mapper: multipath: 253:3: Failing path 8:32.
Feb 25 11:48:25 isildur-adm multipathd[659460]: sdc: mark as failed
Feb 25 11:48:25 isildur-adm multipathd[659460]: 36000d31003d5c210: remaining active paths: 1
--- 
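For what it's worth, the kernel's "state S non-preferred supports toluSNA" line is the ALUA report for that port group: the state letter 'S' means standby, and in the "supports" string each position is one ALUA state, uppercase if the target port group supports it (T=transitioning, O=offline, L=lba-dependent, U=unavailable, S=standby, N=active/non-optimized, A=active/optimized). So the second path is being reported in standby, which would explain why the TUR checker sees it "up" while I/O on it fails. A small sketch to decode such a string (the flag order is my reading of the kernel's scsi_dh_alua output format):

```shell
#!/bin/sh
# Decode the ALUA "supports" flag string printed by scsi_dh_alua
# (e.g. "toluSNA"): each position maps to one ALUA state, and an
# uppercase letter means the target port group supports that state.
decode_alua() {
    flags=$1
    names="transitioning offline lba-dependent unavailable standby active/non-optimized active/optimized"
    i=1
    for name in $names; do
        c=$(printf '%s' "$flags" | cut -c"$i")
        case $c in
            [A-Z]) echo "supported:   $name" ;;
            *)     echo "unsupported: $name" ;;
        esac
        i=$((i + 1))
    done
}

decode_alua "toluSNA"
```

For "toluSNA" this reports standby, active/non-optimized, and active/optimized as supported, everything else unsupported.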
[root@isildur-adm ~]# multipath -ll
36000d31003d5c210 dm-3 COMPELNT,Compellent Vol
size=1.5T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=25 status=active
  |- 1:0:0:2 sdb 8:16 active ready running
  `- 1:0:1:2 sdc 8:32 failed ready running
---
VDSM generates a multipath.conf like this (I have removed the commented
lines for readability):

[root@isildur-adm ~]# cat /etc/multipath.conf 
# VDSM REVISION 2.0

# This file is managed by vdsm.
defaults {
    polling_interval            5
    no_path_retry               16
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

blacklist {
    protocol "(scsi:adt|scsi:sbp)"
}

overrides {
    no_path_retry               16
}
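(Side note: if any setting needs to be tuned locally, vdsm rewrites /etc/multipath.conf on upgrade unless its second line is "# VDSM PRIVATE", so a drop-in under /etc/multipath/conf.d is the usual place for local changes. A hypothetical sketch, assuming one wanted to try ALUA-aware path grouping for the Compellent volume instead of the built-in multibus policy:)

```
# /etc/multipath/conf.d/compellent.conf -- hypothetical local override;
# files in conf.d are left alone by vdsm.
devices {
    device {
        vendor  "COMPELNT"
        product "Compellent Vol"
        path_grouping_policy "group_by_prio"
        prio "alua"
        hardware_handler "1 alua"
        failback immediate
    }
}
```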

Do you have any idea why this link is flapping on my two hypervisors?

Thanks a lot in advance.
- Benoit Chatelain
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/M426QWSHTUYSNROKH5CRUIU44PFFZD4Y/


[ovirt-users] Re: Multipath flapping with SAS via FCP

2021-02-26 Thread Benoit Chatelain
Hi Nir Soffer,
Thanks for your reply.

Indeed, the device fails immediately after it was reinstated.

Here is my 'multipathd show config' dump:

defaults {
verbosity 2
polling_interval 5
max_polling_interval 20
reassign_maps "no"
multipath_dir "/lib64/multipath"
path_selector "service-time 0"
path_grouping_policy "failover"
uid_attribute "ID_SERIAL"
prio "const"
prio_args ""
features "0"
path_checker "tur"
alias_prefix "mpath"
failback "manual"
rr_min_io 1000
rr_min_io_rq 1
max_fds 4096
rr_weight "uniform"
no_path_retry 16
queue_without_daemon "no"
flush_on_last_del "yes"
user_friendly_names "no"
fast_io_fail_tmo 5
dev_loss_tmo 60
bindings_file "/etc/multipath/bindings"
wwids_file "/etc/multipath/wwids"
prkeys_file "/etc/multipath/prkeys"
log_checker_err always
all_tg_pt "no"
retain_attached_hw_handler "yes"
detect_prio "yes"
detect_checker "yes"
force_sync "no"
strict_timing "no"
deferred_remove "no"
config_dir "/etc/multipath/conf.d"
delay_watch_checks "no"
delay_wait_checks "no"
san_path_err_threshold "no"
san_path_err_forget_rate "no"
san_path_err_recovery_time "no"
marginal_path_err_sample_time "no"
marginal_path_err_rate_threshold "no"
marginal_path_err_recheck_gap_time "no"
marginal_path_double_failed_time "no"
find_multipaths "on"
uxsock_timeout 4000
retrigger_tries 3
retrigger_delay 10
missing_uev_wait_timeout 30
skip_kpartx "no"
disable_changed_wwids ignored
remove_retries 0
ghost_delay "no"
find_multipaths_timeout -10
enable_foreign ""
marginal_pathgroups "no"
}
blacklist {
devnode "!^(sd[a-z]|dasd[a-z]|nvme[0-9])"
wwid "36f402700f232e40026b41bd43a0812e5"
protocol "(scsi:adt|scsi:sbp)"
...
}
blacklist_exceptions {
protocol "scsi:sas"
}
devices {
...
device {
vendor "COMPELNT"
product "Compellent Vol"
path_grouping_policy "multibus"
no_path_retry "queue"
}
...
}
overrides {
no_path_retry 16
}

And here are my SCSI disks (sdb & sdc):

[root@anarion-adm ~]# lsscsi -l
[0:2:0:0]diskDELL PERC H330 Adp   4.30  /dev/sda
  state=running queue_depth=256 scsi_level=6 type=0 device_blocked=0 timeout=90
[1:0:0:2]diskCOMPELNT Compellent Vol   0704  /dev/sdb 
  state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 timeout=30
[1:0:1:2]diskCOMPELNT Compellent Vol   0704  /dev/sdc 
  state=running queue_depth=254 scsi_level=6 type=0 device_blocked=0 timeout=30
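One per-path check that might help, assuming the paths are still sdb and sdc: print the ALUA access state the kernel currently sees for each path device (the access_state sysfs attribute exists on reasonably recent kernels; sg_rtpg from sg3_utils reports the same information directly from the target):

```shell
#!/bin/sh
# Hypothetical diagnostic: show the kernel's view of the ALUA access
# state for each path device; prints "n/a" if the attribute is absent.
for dev in sdb sdc; do
    printf '%s: ' "$dev"
    cat "/sys/block/$dev/device/access_state" 2>/dev/null || echo "n/a"
done
```

If sdc reports "standby" while sdb reports "active/optimized", the flapping is the array demoting that port group rather than a host-side problem.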


My disk model is covered by multipath's built-in device configuration, and
the Dell EMC documentation & white papers don't specify any exotic
multipathd configuration. (Am I wrong?)

I looked at the SAS & FCP driver modules; they look fine:

[root@anarion-adm ~]# lsmod | grep sas
mpt3sas   303104  4
raid_class 16384  1 mpt3sas
megaraid_sas  172032  2
scsi_transport_sas 45056  1 mpt3sas

[root@anarion-adm ~]# lsmod | grep fc
bnx2fc110592  0
cnic   69632  1 bnx2fc
libfcoe77824  2 qedf,bnx2fc
libfc 147456  3 qedf,bnx2fc,libfcoe
scsi_transport_fc  69632  3 qedf,libfc,bnx2fc

Do you think my device is misconfigured? Should I check on the vendor side?
Any other ideas? :)

Regards,
Benoit Chatelain