Re: [ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

Vladislav Bogdanov Tue, 04 Apr 2023 13:09:55 -0700

I know that iSCSI initiators are very sensitive to connection drops. That's why in all my setups with iSCSI I use a special m/s resource agent which in slave mode drops all packets to/from the portals. That prevents initiators from receiving FIN packets from the target when it migrates, and they usually behave much better. I can share that RA and setup instructions if that is interesting to someone.
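
(Illustration only, not the RA mentioned above, which is not shown in this thread. A minimal Python sketch of the packet-dropping idea, assuming iptables is the firewall, the portal listens on the default iSCSI TCP port 3260, and the hypothetical helper name portal_block_rules. It only builds the command lines; running them is left to the caller.)

```python
from typing import List

ISCSI_PORT = 3260  # assumption: default iSCSI portal port


def portal_block_rules(portal_ip: str, port: int = ISCSI_PORT,
                       action: str = "-I") -> List[List[str]]:
    """Build iptables commands that silently drop portal traffic.

    action "-I" inserts the rules (standby/slave role),
    action "-D" deletes them again (before the target becomes active).
    """
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' (insert) or '-D' (delete)")
    return [
        # Drop packets arriving from initiators for the portal address.
        ["iptables", action, "INPUT", "-p", "tcp",
         "-d", portal_ip, "--dport", str(port), "-j", "DROP"],
        # Drop packets leaving the portal address, including FIN/RST,
        # so initiators never see the connection being closed.
        ["iptables", action, "OUTPUT", "-p", "tcp",
         "-s", portal_ip, "--sport", str(port), "-j", "DROP"],
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address used as a placeholder portal IP.
    for cmd in portal_block_rules("192.0.2.10"):
        print(" ".join(cmd))
```

Because the packets are dropped rather than rejected, the initiator sees a stalled connection instead of a closed one and can retry against the portal address once it is up on the other node.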


Reid Wahl <nw...@redhat.com> wrote on April 4, 2023, 20:20:52:

On Tue, Apr 4, 2023 at 7:08 AM Ken Gaillot <kgail...@redhat.com> wrote:

On Mon, 2023-04-03 at 02:47 +0300, Александр via Users wrote:
> Pacemaker + corosync cluster with 2 virtual machines (ubuntu 22.04,
> 16 Gb RAM, 8 CPU each) are assembled into a cluster, an HBA is
> forwarded to each of them to connect to a disk shelf according to the
> instructions https://netbergtw.com/top-support/articles/zfs-cib/. A

That looks like a well-thought-out guide. One minor correction: since
Corosync 3, no-quorum-policy=ignore is no longer needed. Instead, set
"two_node: 1" in corosync.conf (which may be automatic depending on
what tools you're using).

That's unlikely to be causing any issues, though.
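
(Illustration only: a sketch of the quorum section of corosync.conf that the two_node setting refers to, assuming the standard votequorum provider. The rest of the file, such as the totem and nodelist sections, is omitted, and tools like pcs may already write this section for a two-node cluster.)

```
quorum {
    provider: corosync_votequorum
    # Two-node mode: the cluster keeps quorum when one node is lost.
    # This also enables wait_for_all by default.
    two_node: 1
}
```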

> ZFS pool was assembled from 4 disks in draid1, resources were
> configured - virtual IP, iSCSITarget, iSCSILun. LUN connected in
> VMware. During an abnormal shutdown of the node, resources move, but

How are you testing abnormal shutdown? For something like a power
interruption, I'd expect that the node would be fenced, but in your
logs it looks like recovery is taking place between clean nodes.


See also discussion starting at this comment:
https://github.com/ClusterLabs/resource-agents/issues/1852#issuecomment-1479119045

Happy to see this on the mailing list :)


> at the moment this happens, VMware loses contact with the LUN, which
> should not happen. The journalctl log at the time of the move is
> here: https://pastebin.com/eLj8DdtY. I also tried to build a common
> storage on drbd with cloned VIP and Target resources, but this also
> does not work, besides, every time I move, there are always some
> problems with the start of resources. Any ideas what can be done
> about this? Loss of communication with the LUN even for a couple of
> seconds is already critical.
>
> corosync-qdevice/jammy,now 3.0.1-1 amd64 [installed]
> corosync-qnetd/jammy,now 3.0.1-1 amd64 [installed]
> corosync/jammy,now 3.1.6-1ubuntu1 amd64 [installed]
> pacemaker-cli-utils/jammy,now 2.1.2-1ubuntu3 amd64
> [installed,automatic]
> pacemaker-common/jammy,now 2.1.2-1ubuntu3 all [installed,automatic]
> pacemaker-resource-agents/jammy,now 2.1.2-1ubuntu3 all
> [installed,automatic]
> pacemaker/jammy,now 2.1.2-1ubuntu3 amd64 [installed]
> pcs/jammy,now 0.10.11-2ubuntu3 all [installed]
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
--
Ken Gaillot <kgail...@redhat.com>

--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker
