Grant,

On 6/4/24 12:29 AM, Joost Roeleveld wrote:
I don't use FibreChannel myself (can't justify the cost).

I hear that. I manage an old EMC for a friend who's still using it as clustered storage for an old VMware install. It started giving us occasional hiccups (corrected memory errors) so we're assembling a cold spare. He's footing the bill and I get to play with the equipment. }:-) Though honestly, the bill isn't that bad any more.

But with SAS drives with dual expanders and 2 HBAs, I do get multipath to work.

ACK

The OS sees every drive 4 times (2 HBAs, each 2 paths to every drive).

Four times surprises me. -- I know that SAS drives have two SAS ports (channels / lanes / whatever the right term is) on them. But I don't know how those both get connected to two controllers. I would have naively assumed each drive showed up as two drives, one per port. I guess if there is some sort of multiplexer / breakout / backplane, there's a way to connect both controllers to both ports on each drive, and then four paths would make sense.

Aside: I'd like to know more about how you're doing that physical connection.

I have 2 HBAs.
Each HBA is connected to expander A and expander B (separate expander chips on the same backplane). Each SAS drive has 2 connections: connection 1 is connected to expander A, connection 2 to expander B.
(might be the other way round, but this explains the situation)

That means that each HBA has 2 paths to each drive.
And the OS can use either HBA to each drive.

This totals to 4 times :)
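
In other words:

 HBA 1 -- expander A -- drive port 1
 HBA 1 -- expander B -- drive port 2
 HBA 2 -- expander A -- drive port 1
 HBA 2 -- expander B -- drive port 2

2 HBAs x 2 expanders = 4 paths to each drive.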

On top of this, I run multipath with the default configuration and use the corresponding /dev/mapper/... entries.
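
For example, a filesystem would go on the mapper node, something like (the device name here is taken from the multipath output further down; the mount point is just an example):

# mount /dev/mapper/35000cca0c444c380 /mnt/data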

So dev-mapper multipathing is working in a contemporary Gentoo system. Thank you for the confirmation. That's what I was wanting.

Yes, I have dev-mapper multipath support in the kernel:
# zcat /proc/config.gz | grep -i multipath
# CONFIG_NVME_MULTIPATH is not set
CONFIG_DM_MULTIPATH=y
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_MULTIPATH_HST=m
# CONFIG_DM_MULTIPATH_IOA is not set
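
(DM_MULTIPATH=y is the essential bit; the QL/ST/HST modules are optional path selectors. QL is what provides the "queue-length" selector used in the config further down.)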

I installed multipath:
# eix -I multipath
[U] sys-fs/multipath-tools
 Available versions: 0.9.7^t 0.9.7-r1^t{tbz2} 0.9.8^t{tbz2} {systemd test}
 Installed versions: 0.9.7-r1^t{tbz2}(04:27:38 PM 04/10/2024)(-systemd -test)
 Homepage: http://christophe.varoqui.free.fr/
 Description: Device mapper target autoconfig

(I don't update this particular system constantly as it's critical to the infrastructure)

I added 'multipath' and 'multipathd' to the default runlevel:
# rc-status | grep multipath
 multipath [ started ]
 multipathd [ started ]
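
For reference, getting them there amounts to something like:

# rc-update add multipath default
# rc-update add multipathd default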

The config files:

# cat /etc/multipath.conf
defaults {
 path_grouping_policy multibus
 path_selector "queue-length 0"
 rr_min_io_rq 100
}

# ls /etc/multipath
bindings wwids
san1 ~ # cat /etc/multipath/bindings
# Multipath bindings, Version : 1.0
# NOTE: this file is automatically maintained by the multipath program.
# You should not need to edit this file in normal circumstances.
#
# Format:
# alias wwid
#
san1 ~ # cat /etc/multipath/wwids
# Multipath wwids, Version : 1.0
# NOTE: This file is automatically maintained by multipath and multipathd.
# You should not need to edit this file in normal circumstances.
#
# Valid WWIDs:

With all this, I got multipath working:

# multipath -l
35000cca0c444c380 dm-11 HGST,HUS726T4TAL5204
size=3.6T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=0 status=active
 |- 0:0:10:0 sdk 8:160 active undef running
 |- 0:0:35:0 sdai 66:32 active undef running
 |- 1:0:10:0 sdbg 67:160 active undef running
 `- 1:0:35:0 sdce 69:32 active undef running
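
(Reading the H:C:T:L numbers: host 0 and host 1 are the two HBAs, and each host sees the drive at two different targets, one per expander. That's the 4 paths.)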

I'm going to assume that I have something mis-configured in dev-mapper, as it wasn't originally used on my test system and was only added for the FC testing.

All the config I have is listed above; hope it helps.

I NEVER use any of the other entries in the /dev/... tree.

I understand what you're saying. I'd be following suit if multi-path were working. -- I'm /well/ aware of the hazards of using the member / backing disks vs the multi-path disk.

I just checked and noticed some new entries under by-id:
/dev/disk/by-id/dm-uuid-mpath-....

These didn't exist when I originally set up this system, and I'm not going to risk issues by switching.

Based on that, do you see the fibrechannel drives show up as /dev/sd... entries on your system at all?

Yes, both test LUNs (1 x 10 GB and 1 x 100 GB) show up once per client HBA port.

Aside: I don't have the fiber switch in place yet, so I'm using cables directly out of an HBA port into an EMC controller port.

I am seeing the expected number of /dev/sd* for the LUNs.

/dev/sdd = 10 GB
/dev/sdf = 100 GB
/dev/sdg = 10 GB
/dev/sdh = 100 GB

There is 1 thing that might cause issues. If multipath doesn't detect that "sdd" and "sdg" are the same physical disk, it won't automagically link them. On my system, it identifies them because both paths report the same serial number. For FibreChannel it might be something else, so you may need to configure this.
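
If it comes to that, the wwid can be whitelisted explicitly, e.g.:

# multipath -a /dev/sdd

(which records sdd's wwid in /etc/multipath/wwids so multipath will claim it), or pinned to an alias in /etc/multipath.conf (the wwid below is a placeholder):

multipaths {
 multipath {
  wwid <wwid-of-the-10GB-LUN>
  alias emc_10g
 }
}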

Can you check the output of the following commands:
cat /sys/block/sdd/device/wwid
cat /sys/block/sdf/device/wwid
cat /sys/block/sdg/device/wwid
cat /sys/block/sdh/device/wwid

For me, device names that are in the same multipath group give identical output.
For different groups, I get different outputs.
I am assuming multipath uses this to group them. Example (sdai is one of the 4 paths shown above; sda belongs to a different drive):

# cat /sys/block/sdai/device/wwid
naa.5000cca0c444c380
# cat /sys/block/sda/device/wwid
naa.5000cca25d8c17ec

If yes, then multipath should pick them up.

That's what I thought.

As I get into the multipath command and debug things, it seems like something thinks the paths are offline despite the underlying member / backing disks being online and accessible.
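
(The sort of thing I'm poking at, for anyone following along:

# multipath -ll -v3
# multipathd show paths

The first re-scans and prints verbose state; the second asks the running daemon for its view of each path.)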

Are the devices used somewhere already? Maybe mounted (automounter) or LVM or.... ?

If not, I would expect you need to get them seen via all the paths first.

Yep. I can see them via their underlying member / backing paths without any problem.

This feels like a device-mapper -> multipath issue to me.

It /may/ be complicated by the old EMC but I'd be surprised by that.

It could. Check the entries in the /sys/... filesystem referenced above. That might show a possible cause. I think there is a way to force multipath to group disks together even when those IDs are different. But that is something we'd need to investigate.
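
One knob that might be relevant (untested by me): uid_attribute in a devices section controls which udev attribute multipath uses as the ID, something like:

devices {
 device {
  vendor "DGC"
  product ".*"
  uid_attribute "ID_SERIAL"
 }
}

(The vendor/product strings are guesses for an EMC; check what the array actually reports, e.g. with lsscsi.)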

Thank you for confirming that device-mapper multipathing should be working. -- Now that I know that I'm not chasing a ghost, I'll spend some more time making sure that device-mapper and multipathing are installed and configured properly (USE flags, kernel options, user space utilities, etc.) before taking another swing at dm-multipath for the SAN LUNs.

Check my config above and we might be able to figure this out.
It's been running stable for me for over 7 years now with most disks being that age as well. :)

--
Joost