On 6/4/24 11:03 AM, Joost Roeleveld wrote:
I have 2 HBAs.
Each HBA is connected to expander A and expander B (separate expander
chips on the same backplane)
Each SAS drive has 2 connections. Connection 1 is connected to expander
A. 2 is connected to B.
(might be the other way round, but this explains the situation)
That means that each HBA has 2 paths to each drive.
And the OS can use either HBA to each drive.
This totals to 4 times :)
That explains things. Thank you for helping me understand your working
configuration.
All the config I have is listed above, hope it helps
I hope that it will too.
It definitely gives me things to check and compare against. Thank you!
I just checked and noticed some new entries under by-id:
/dev/disk/by-id/dm-uuid-mpath-....
These didn't exist when I originally set up this system and I am not
going to risk issues by switching.
ACK and +10
There is 1 thing that might cause issues. If multipath doesn't detect
"sdd" and "sdg" are the same physical disk, it won't automagically link
them. On my system, it identifies it due to them having the same serial
number. For Fibre Channel to something else, you might need to configure
this.
Agreed.
I believe I checked and found that the two disks for each LUN did have
the same serial number.
Can you check the output of the following commands:
cat /sys/block/sdd/device/wwid
cat /sys/block/sdf/device/wwid
cat /sys/block/sdg/device/wwid
cat /sys/block/sdh/device/wwid
I will when I work on things this weekend.
For me, using device names that are in the same multipath-group, the
output is identical.
ACK
For different groups, I get different outputs.
As it should be.
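To make that comparison easier across many devices, here is a minimal sketch that prints every wwid next to its device name, sorted so that paths to the same LUN land on adjacent lines. The function name list_wwids and its optional directory argument are my own illustration (the argument only exists so it can be pointed at a fake tree); it assumes the sd* naming and the device/wwid files shown in the commands above.

```shell
# Sketch: print "wwid device" pairs sorted by wwid, so that paths to the
# same LUN end up next to each other. The optional argument is a
# hypothetical override of /sys/block, useful for trying it on a fake tree.
list_wwids() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/device/wwid; do
        [ -r "$f" ] || continue        # glob did not match or file unreadable
        dev=${f#"$root"/}              # sdX/device/wwid
        dev=${dev%%/*}                 # sdX
        printf '%s %s\n' "$(cat "$f")" "$dev"
    done | sort
}
```

Running list_wwids with no argument reads the live /sys/block tree; devices that share a first column should be the ones multipath is expected to group.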
I am assuming multipath uses this to group them. Example (for 2 of the 4
entries shown above):
# cat /sys/block/sdai/device/wwid
naa.5000cca0c444c380
# cat /sys/block/sda/device/wwid
naa.5000cca25d8c17ec
Based on the OUI of 00:0c:ca, that looks like HGST (Hitachi?), a Western
Digital Company, according to the recent OUI list I have.
Network Address Authority 2, 5, and 6 are interesting. :-)
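For anyone who wants to repeat the OUI lookup, a rough sketch of how I read those strings: after the "naa." prefix, the first hex digit is the NAA type, and for NAA 5 and NAA 6 the next six hex digits are the OUI. The helper name naa_oui is made up for illustration, and it deliberately does not attempt NAA 2, which lays out its fields differently.

```shell
# Sketch: pull the NAA type and OUI out of a wwid string such as
# naa.5000cca0c444c380. Only NAA 5 and NAA 6 are handled here.
naa_oui() {
    hex=${1#naa.}                               # drop the "naa." prefix
    naa=$(printf '%s' "$hex" | cut -c1)         # first hex digit: NAA type
    case "$naa" in
        5|6)
            # hex digits 2-7 hold the 24-bit OUI; print it as aa:bb:cc
            oui=$(printf '%s' "$hex" | cut -c2-7 |
                  sed 's/\(..\)\(..\)\(..\)/\1:\2:\3/')
            printf 'NAA %s, OUI %s\n' "$naa" "$oui"
            ;;
        *)
            printf 'NAA %s, layout not handled here\n' "$naa"
            return 1
            ;;
    esac
}
```

Feeding it either of the two wwid values quoted above should give NAA 5 with the 00:0c:ca OUI.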
Are the devices used somewhere already? Maybe mounted (automounter) or
LVM or.... ?
No. Not yet.
I did partition one of the backend member LUNs, caused the system to
release all of the LUNs (echo 1 > /proc or /sys ... /delete), and then
re-scan (echo "- - -" > /proc or /sys ... /host#/scan).
But nothing is actually using the LUNs yet.
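One way to double-check that claim, as a sketch only: look for holders stacked on the disk (device-mapper or md devices show up there) and for mounts of the disk or its partitions. The function name dev_in_use and its second and third arguments are hypothetical, added so it can be run against a fake tree. I am assuming the usual holders directory under /sys/block/<dev> and the /proc/mounts format, and it does not look at holders of individual partitions.

```shell
# Sketch: report whether a whole disk has holders or mounted file systems.
# Returns 0 if something is using it, 1 if nothing was found.
# Arguments 2 and 3 are hypothetical overrides for /sys/block and /proc/mounts.
dev_in_use() {
    dev=$1
    sysblock=${2:-/sys/block}
    mounts=${3:-/proc/mounts}
    used=1
    for h in "$sysblock/$dev"/holders/*; do
        [ -e "$h" ] || continue                 # empty holders directory
        printf '%s held by %s\n' "$dev" "${h##*/}"
        used=0
    done
    # Match /dev/sdX or /dev/sdXN at the start of a mounts line.
    if grep -q "^/dev/$dev[0-9]* " "$mounts"; then
        printf '%s (or a partition) is mounted\n' "$dev"
        used=0
    fi
    return $used
}
```

For example, dev_in_use sdd prints nothing and returns non-zero when it finds neither holders nor mounts for sdd.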
It could. Check the entries in the /sys/... filesystem referenced above.
That might show a possible cause.
I'm not finding anything wrong with the backing / member LUNs.
The only problem that I've found is that multipath / multipathd didn't
like something about the paths when I cranked verbosity / debug way up.
However, this could be because I don't have device-mapper, much less
multipath, installed / configured properly yet.
I added an HBA to my workstation as a test system. As such I'm adding
DM & MP MANY months after the system was configured.
I think there is a way to force multipath to group disks together even
when those IDs are different. But that is something we'd need to
investigate.
I've found that forcing things like this usually doesn't work out as
well as I would like it to. :-/
Check my config above and we might be able to figure this out.
I definitely will.
It's been running stable for me for over 7 years now with most disks
being that age as well. :)
Thank you Joost.
--
Grant. . . .