On 6/4/24 11:03 AM, Joost Roeleveld wrote:
I have 2 HBAs.
Each HBA is connected to expander A and expander B (separate expander chips on the same backplane). Each SAS drive has 2 connections: connection 1 goes to expander A, and connection 2 goes to expander B.
(might be the other way round, but this explains the situation)

That means that each HBA has 2 paths to each drive.
And the OS can use either HBA to each drive.

This totals to 4 times :)

That explains things. Thank you for helping me understand your working configuration.

All the config I have is listed above, hope it helps

I hope that it will too.

It definitely gives me things to check and compare against.  Thank you!

I just checked and noticed some new entries under by-id:
/dev/disk/by-id/dm-uuid-mpath-....

These didn't exist when I originally set up this system, and I am not going to risk issues by switching.

ACK and +10

There is 1 thing that might cause issues. If multipath doesn't detect that "sdd" and "sdg" are the same physical disk, it won't automagically link them. On my system, it identifies them because they have the same serial number. For Fibre Channel to something else, you might need to configure this.

Agreed.

I believe I checked and found that the two disks for each LUN did have the same serial number.

Can you check the output of the following commands:
cat /sys/block/sdd/device/wwid
cat /sys/block/sdf/device/wwid
cat /sys/block/sdg/device/wwid
cat /sys/block/sdh/device/wwid

I will when I work on things this weekend.

For me, using devicenames that are in the same multipath-group, the output is identical.

ACK

For different groups, I get different outputs.

As it should be.
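
To compare all paths at once rather than one `cat` at a time, something like the following could work. It is only a rough sketch: the function name `group_by_wwid` and its optional directory argument are made up for illustration, and it assumes the WWIDs contain no whitespace (true for the `naa.` form shown in this thread, not necessarily for every device). It only reads the sysfs `wwid` files and does not ask multipath anything.

```shell
#!/bin/sh
# Group sd* block devices by the contents of their sysfs wwid file.
# Prints one line per WWID:  <wwid> <path-count> <dev1> <dev2> ...
# The optional first argument (hypothetical, for testing) replaces /sys/block.
# Assumption: WWIDs contain no whitespace (e.g. "naa.5000cca0c444c380").
group_by_wwid() {
    root="${1:-/sys/block}"
    for f in "$root"/sd*/device/wwid; do
        [ -r "$f" ] || continue          # glob matched nothing, or unreadable
        dev=${f%/device/wwid}            # strip the trailing path components
        dev=${dev##*/}                   # keep only the device name (sdX)
        printf '%s %s\n' "$(cat "$f")" "$dev"
    done | LC_ALL=C sort | awk '
        { count[$1]++; devs[$1] = devs[$1] " " $2 }
        END { for (w in count) print w, count[w] devs[w] }
    ' | LC_ALL=C sort
}
```

Running `group_by_wwid` with no argument reads the real /sys/block. Devices that are paths to the same disk should land on the same output line, so in a 2 HBA x 2 expander setup like the one described earlier, each line would be expected to show a count of 4.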

I am assuming multipath uses this to group them. Example (for 2 of the 4 entries shown above):

# cat /sys/block/sdai/device/wwid
naa.5000cca0c444c380
# cat /sys/block/sda/device/wwid
naa.5000cca25d8c17ec

Based on the OUI of 00:0c:ca, that looks like HGST (Hitachi?), a Western Digital company, according to the recent OUI list I have.
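
For anyone wanting to repeat that lookup: in an NAA type 5 identifier, the first hex digit is the NAA type and the next six hex digits are the IEEE company ID (OUI). A small sketch of pulling that out follows. The function name `naa5_oui` is hypothetical, and it deliberately handles only type 5 identifiers of 16 hex digits, like the two shown above.

```shell
#!/bin/sh
# Extract the IEEE OUI from an NAA type 5 identifier such as
# "naa.5000cca0c444c380". Layout assumed: 1 hex digit NAA type (5),
# 6 hex digits OUI, 9 hex digits vendor-specific.
naa5_oui() {
    id=${1#naa.}                         # drop the "naa." prefix if present
    case $id in
        5???????????????) ;;             # "5" followed by 15 more characters
        *) echo "not an NAA type 5 identifier: $1" >&2; return 1 ;;
    esac
    # Characters 2-7 are the OUI; print it as colon-separated byte pairs.
    printf '%s\n' "$id" | cut -c2-7 | sed 's/\(..\)\(..\)\(..\)/\1:\2:\3/'
}
```

Both `naa5_oui naa.5000cca0c444c380` and `naa5_oui naa.5000cca25d8c17ec` print `00:0c:ca`, which is the value to look up in an OUI list.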

Network Address Authority (NAA) types 2, 5, and 6 are interesting.  :-)

Are the devices used somewhere already? Maybe mounted (automounter) or LVM or.... ?

No.  Not yet.

I did partition one of the backend member LUNs, made the system release all of the LUNs (echo 1 > /proc or /sys ... /delete), and then re-scanned (echo "- - -" > /proc or /sys ... /host#/scan).

But nothing is actually using the LUNs yet.

It could. Check the entries in the /sys/... filesystem referenced above. That might show a possible cause.

I'm not finding anything wrong with the backing / member LUNs.

The only problem that I've found is that multipath / multipathd didn't like something about the paths when I cranked verbosity / debug way up.

However, this could be because I don't have device-mapper, much less multipath, installed / configured properly yet.

I added an HBA to my workstation as a test system. As such I'm adding DM & MP MANY months after the system was configured.

I think there is a way to force multipath to group disks together even when those IDs are different. But that is something we'd need to investigate.

I've found that forcing things like this usually doesn't work out as well as I would like it to. :-/

Check my config above and we might be able to figure this out.

I definitely will.

It's been running stable for me for over 7 years now with most disks being that age as well. :)

Thank you Joost.



--
Grant. . . .
