On 3/14/24 00:23, Strahil Nikolov wrote:
Hi All,

Do we have a thread about the HA iSCSI article ?
I saw a few things that can be optimized/updated:

Hello Strahil,

Thank you for reaching out and starting this thread! I just want to say up top that we don't often receive feedback on our tech guides, but we absolutely value and appreciate it.



1. Point 3.7.1 Shouldn't be done as pcs has a native way to auth, assemble and start the cluster and corosync tinkering is no longer needed. Something like this:

echo 'somepass' | passwd --stdin hacluster
pcs host auth node1 addr=node1.example.com node2 addr=node2.example.com

pcs cluster setup CLUSTERNAME node1 node2 totem token=10000 --enable --start


Great point. While I was aware of this, I'm very used to the older manual process of configuring the cluster communication layer. This comment applies to a lot of our newer guides that use pcs, and we'll work on addressing this throughout them all.


2. Point 3.8 goes against any high availability best practice and against Red Hat's support policy - it should have a red label stating that this is done only for the demo!

Absolutely agree. There is a big red "WARNING" at the bottom of section 3.8 that touches on this point, but we can move it to the top for better visibility.

It is my understanding that Red Hat doesn't support a few Pacemaker configurations that LINBIT does, including clusters without node-level fencing - although we strongly suggest fencing whenever possible.


3. In point 3.10 I highly recommend setting scsi_sn for the ocf:heartbeat:iSCSILogicalUnit resource - by default the software picks the same SN for the first LUN, and when one client attaches two LUNs (from two separate clusters), multipath will treat the two sources as one and aggregate the paths - it becomes a real mess.

Good catch. The automation we use internally for testing HA iSCSI deployments sets the scsi_sn, so I'm surprised that I missed it in this guide. I also recall that some older VMware products required the scsi_sn to match for smooth failover. Either way, we will update the guide.
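For anyone following along, a setting like this is what Strahil is describing. This is an illustrative sketch only - the resource name, IQN, LUN number, DRBD device path, and serial number below are placeholders, not values from the guide:

```shell
# Create an iSCSI LU resource with an explicitly set, unique scsi_sn
# so multipath on the initiator never confuses LUNs from different
# clusters that happen to share a default serial number.
# All names/values here are hypothetical examples.
pcs resource create iscsi_lun0 ocf:heartbeat:iSCSILogicalUnit \
    target_iqn=iqn.2024-03.com.example:target0 \
    lun=0 \
    path=/dev/drbd1000 \
    scsi_sn=f9a2b7c1 \
    op monitor interval=15s
```

The key point is that scsi_sn must be unique per LUN across every cluster a given initiator connects to.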


4. Consider an LVM filter for the DRBD device - a client might use the LUN as a PV in a volume group, and then the situation will get messy - the cluster won't be able to demote the node and Pacemaker will fence it.

Excellent suggestion for a common use case. Will include.
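For reference, the idea is to keep LVM on the cluster nodes from scanning and activating a PV signature that an initiator wrote inside the exported LUN. A minimal lvm.conf sketch, assuming the OS disk is /dev/sda (adjust the accept pattern for your hardware; device names here are placeholders):

```shell
# /etc/lvm/lvm.conf excerpt (illustrative): accept only the local
# system disk and reject everything else, including /dev/drbd* and
# the DRBD backing device, so a client-created PV inside the LUN is
# never activated on the iSCSI target nodes.
devices {
    filter = [ "a|^/dev/sda.*|", "r|.*|" ]
}
```

After changing the filter, regenerating the initramfs is usually needed as well so early-boot LVM scanning honors it.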


5. Consider using fencing delay when using 2-node clusters - in case of split brain scenario the node with more resources will survive:

pcs resource defaults update priority=1
pcs property set priority-fencing-delay=10
<snip>

We definitely practice this for 2-node clusters we deploy with fencing configured.

I don't think it makes sense to include this in a guide where we do not configure fencing, but it should be included in a more general reference on how to properly configure fencing. I will make sure that we have this information somewhere for public consumption, and possibly link to it within all our tech guides.

Best Regards,
Matt Kereczman

P.S. We are happy to receive feedback and suggestions through this mailing list. Thanks again for what you have provided. You can also make suggestions about user's guides through opening issues in our GitHub repository (https://github.com/LINBIT/linbit-documentation). For technical "how-to" guides or other documentation content that isn't a user's guide, such as a knowledge base article or blog article, another option is to reach out to [email protected].
