Grant,
On 6/4/24 12:29 AM, Joost Roeleveld wrote:
I don't use FibreChannel myself (can't justify the cost).
I hear that. I manage an old EMC for a friend who's still using it
as clustered storage for an old VMware install. It started giving us
occasional hiccups (corrected memory errors) so we're assembling a
cold spare. He's footing the bill and I get to play with the
equipment. }:-) Though honestly, the bill isn't that bad any more.
But with SAS drives with dual expanders and 2 HBAs, I do get
multipath to work.
ACK
The OS sees every drive 4 times (2 HBAs, each 2 paths to every drive).
Four times surprises me. -- I know that SAS drives have two SAS
channels / lanes / ports (not sure of the term) on them. But I don't know how those are
both connected to two controllers. I would have naively assumed each
drive showed up as two drives, one for each channel / lane / ???. I
guess if there is some sort of multiplexor / breakout / backplane
there's a way to connect both controllers to both channels / lanes /
??? on each drive and four paths would make sense.
Aside: I'd like to know more about how you're doing that physical connection.
I have 2 HBAs.
Each HBA is connected to expander A and expander B (separate expander
chips on the same backplane)
Each SAS drive has 2 connections. Connection 1 is connected to
expander A. 2 is connected to B.
(might be the other way round, but this explains the situation)
That means that each HBA has 2 paths to each drive.
And the OS can use either HBA to each drive.
This totals to 4 times :)
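Drawn out, the paths look roughly like this:
HBA 0 --- expander A --- drive port 1
HBA 0 --- expander B --- drive port 2
HBA 1 --- expander A --- drive port 1
HBA 1 --- expander B --- drive port 2
2 HBAs x 2 expanders = 4 paths to each drive.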
On top of this, I run multipath with the default configuration and
use the corresponding /dev/mapper/... entries.
So dev-mapper multipathing is working in a contemporary Gentoo
system. Thank you for the confirmation. That's what I was wanting.
Yes, I have dev-mapper multipath support in the kernel:
# zcat /proc/config.gz | grep -i multipath
# CONFIG_NVME_MULTIPATH is not set
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_MULTIPATH_HST=m
# CONFIG_DM_MULTIPATH_IOA is not set
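For reference, those options live here in menuconfig (from memory, so
the exact wording may differ slightly between kernel versions):
Device Drivers  --->
  [*] Multiple devices driver support (RAID and LVM)  --->
    <*> Device mapper support
    <*>   Multipath target
    <M>     I/O Path Selector based on the number of in-flight I/Os
    <M>     I/O Path Selector based on the path service time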
I installed multipath:
# eix -I multipath
[U] sys-fs/multipath-tools
Available versions: 0.9.7^t 0.9.7-r1^t{tbz2} 0.9.8^t{tbz2} {systemd test}
Installed versions: 0.9.7-r1^t{tbz2}(04:27:38 PM 04/10/2024)(-systemd -test)
Homepage: http://christophe.varoqui.free.fr/
Description: Device mapper target autoconfig
(I don't update this particular system constantly as it's critical to
the infrastructure)
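(In case you still need it on your end, installing it should just be:
# emerge --ask sys-fs/multipath-tools
)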
I added 'multipath' and 'multipathd' to the default runlevel:
# rc-status | grep multipath
multipath [ started ]
multipathd [ started ]
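For completeness, that was done with:
# rc-update add multipath default
# rc-update add multipathd default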
The configfiles:
# cat /etc/multipath.conf
defaults {
path_grouping_policy multibus
path_selector "queue-length 0"
rr_min_io_rq 100
}
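To see the configuration multipath actually ends up with after merging
these settings with the built-in defaults (including the per-vendor
device entries), recent multipath-tools versions can dump it:
# multipath -T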
# ls /etc/multipath
bindings wwids
san1 ~ # cat /etc/multipath/bindings
# Multipath bindings, Version : 1.0
# NOTE: this file is automatically maintained by the multipath program.
# You should not need to edit this file in normal circumstances.
#
# Format:
# alias wwid
#
san1 ~ # cat /etc/multipath/wwids
# Multipath wwids, Version : 1.0
# NOTE: This file is automatically maintained by multipath and multipathd.
# You should not need to edit this file in normal circumstances.
#
# Valid WWIDs:
With all this, I got multipath working:
# multipath -l
35000cca0c444c380 dm-11 HGST,HUS726T4TAL5204
size=3.6T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
|- 0:0:10:0 sdk 8:160 active undef running
|- 0:0:35:0 sdai 66:32 active undef running
|- 1:0:10:0 sdbg 67:160 active undef running
`- 1:0:35:0 sdce 69:32 active undef running
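(The 'undef' entries are because -l only reads the current state from
the kernel. If you want multipath to actually run the path checkers
and priority callouts while listing, use:
# multipath -ll
)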
I'm going to assume that I have something mis-configured in
dev-mapper as it wasn't originally used on my test system and only
added for the FC testing.
All the config I have is listed above, hope it helps
I NEVER use any of the other entries in the /dev/... tree.
I understand what you're saying. I'd be following suit if multi-path
was working. -- I'm /well/ aware of the hazards of using the member
/ backing disks vs the multi-path disk.
I just checked and noticed some new entries under by-id:
/dev/disk/by-id/dm-uuid-mpath-....
These didn't exist when I originally set up this system, and I am not
going to risk issues by switching.
Based on that, do you see the fibrechannel drives show up as
/dev/sd... entries on your system at all?
Yes, both test LUNs (1 x 10 GB and 1 x 100 GB) show up once per
client HBA port.
Aside: I don't have the fiber switch in place yet, so I'm using
cables directly out of an HBA port into an EMC controller port.
I am seeing the expected number of /dev/sd* for the LUNs.
/dev/sdd = 10 GB
/dev/sdf = 100 GB
/dev/sdg = 10 GB
/dev/sdh = 100 GB
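You can double-check which HBA port each of those arrives through via
the by-path symlinks; each LUN should show up there once per port:
# ls -l /dev/disk/by-path/ | grep -E 'sdd|sdf|sdg|sdh'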
There is 1 thing that might cause issues. If multipath doesn't detect
that "sdd" and "sdg" are the same physical disk, it won't automagically
link them. On my system, it identifies them due to them having the same
serial number. For Fibre Channel to something else (like your EMC), you
might need to configure this.
Can you check the output of the following commands:
cat /sys/block/sdd/device/wwid
cat /sys/block/sdf/device/wwid
cat /sys/block/sdg/device/wwid
cat /sys/block/sdh/device/wwid
For me, using device names that are in the same multipath group, the
output is identical.
For different groups, I get different outputs.
I am assuming multipath uses this to group them. Example (for 2 of the
4 entries shown above):
# cat /sys/block/sdai/device/wwid
naa.5000cca0c444c380
# cat /sys/block/sda/device/wwid
naa.5000cca25d8c17ec
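If the /sys wwid entries don't match for paths that should be the same
LUN, it might also be worth comparing what udev derives, since
multipath's default uid_attribute (ID_SERIAL) comes from there (the
scsi_id path may be /usr/lib/udev/scsi_id on some systems):
# /lib/udev/scsi_id -g -u -d /dev/sdd
# /lib/udev/scsi_id -g -u -d /dev/sdg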
If yes, then multipath should pick them up.
That's what I thought.
As I get into the multipath command and debug things, it seems like
something thinks the paths are offline despite the underlying member
/ backing disks being online and accessible.
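Running multipath by hand in dry-run mode with the verbosity turned up
might show why it considers them offline:
# multipath -d -v3
(-d means it won't actually create maps, just show what it would do)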
Are the devices used somewhere already? Maybe mounted (automounter) or
LVM or.... ?
If not, I would expect you need to get them seen via all the paths first.
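A quick way to check whether something has already claimed them:
# lsblk /dev/sdd /dev/sdg
# dmsetup ls
lsblk lists any partitions or holders sitting on top of each device,
and dmsetup shows which device-mapper maps already exist.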
Yep. I can see them via their underlying member / backing paths
without any problem.
This feels like a device-mapper -> multipath issue to me.
It /may/ be complicated by the old EMC but I'd be surprised by that.
It could. Check the entries in the /sys/... filesystem referenced
above. That might show a possible cause.
I think there is a way to force multipath to group disks together even
when those IDs are different. But that is something we'd need to
investigate.
Thank you for confirming that device-mapper multipathing should be
working. -- Now that I know that I'm not chasing a ghost, I'll spend
some more time making sure that device-mapper and multipathing are
installed and configured properly (USE flags, kernel options, user
space utilities, etc.) before taking another swing at dm-multipath for
the SAN LUNs.
Check my config above and we might be able to figure this out.
It's been running stable for me for over 7 years now with most disks
being that age as well. :)
--
Joost