On Wed, 2017-09-20 at 12:46 +0300, Vladislav Bogdanov wrote:
> Hi,
>
> as 1.1.17 received a lot of care in pcmk_remote, I decided to try it
> again in a rather big setup (smaller than the previous one, so I'm not
> hit by IPC disconnects here).
>
> From the first runs, there are still some severe issues when cluster
> nodes are fenced.
>
> The following results were obtained by killing the DC node
> (md12k-3-srv), which was hosting remote resources for nodes
> es7700-3-srv, es12kxe-3-srv and es12kxe-7-srv.
> After the fence, the new DC (md12k-1-srv) moved those resources as
> follows:
> =
> Sep 20 08:53:28 md12k-1-srv pengine[2525]: notice: Move    es12kxe-3-srv (Started md12k-3-srv -> md12k-4-srv)
> Sep 20 08:53:28 md12k-1-srv pengine[2525]: notice: Move    es12kxe-7-srv (Started md12k-3-srv -> md12k-1-srv)
> Sep 20 08:53:28 md12k-1-srv pengine[2525]: notice: Move    es7700-3-srv (Started md12k-3-srv -> md12k-2-srv)
> =
>
> * The first issue I found (and I expect it to be the reason for some
> other issues) is that pacemaker_remote does not drop an old crmd's
> connection after a new crmd connects.
> As IPC proxy connections are kept in a hash table, there is a 50%
> chance that remoted tries to reach the old crmd, e.g. to proxy checks
> of node attributes when resources are reprobed.
> That leads to timeouts of those resources' probes, with the consequent
> reaction from the cluster.
> A solution here could be to drop the old IPC proxy connection as soon
> as the new one is established.
We can't drop connections from the pacemaker_remoted side because it
doesn't know anything about the cluster state (e.g. whether the cluster
connection resource is live-migrating).
However we can simply always use the most recently connected provider,
which I think solves the issue. See commit e9a7e3bb, one of a few
recent bugfixes in the master branch for pacemaker_remoted. It will
most likely not make it into 2.0 (which I'm trying to keep focused on
deprecated syntax removals), but should be in the next release after
that.
> * I _suspect_ that the issue above could lead to the following lines
> in a cluster node's logs.
> I didn't check, but I suspect that when remoted decides to disconnect
> an old connection after the fenced node comes back up and TCP
> connections are reset, it disconnects the new one too.
pacemaker_remoted doesn't really initiate disconnections except when it
exits. All connections and disconnections should be initiated on the
cluster side.
> At least this issue happens at the same time the fenced node rejoins
> the cluster.
> These logs are for the case where no resources operate on node
> attributes (I removed those resources from the CIB and set a
> stickiness for all others).
> =
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Timed out (1 ms) while waiting for remote data
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Unable to receive expected reply, disconnecting.
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Remote lrmd server disconnected while waiting for reply with id 9823.
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Unexpected disconnect on remote-node es12kxe-7-srv
The messages above may be confusing, but the disconnect is originating
on the cluster side, after a timeout waiting for a reply from
pacemaker_remoted.
I don't know of any reason why a cluster node rejoining would cause
pacemaker_remoted to stop working, aside from when the cluster node is
the same host running pacemaker_remoted (which I've done accidentally
during testing).
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Result of monitor operation for es12kxe-7-srv on md12k-1-srv: Error
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Couldn't perform lrmd_rsc_info operation (timeout=0): -107: Success (0)
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: LRMD disconnected
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Could not add resource ost0033-es03a to LRM es12kxe-7-srv
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:    error: Invalid resource definition for ost0033-es03a
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input version="3.0.13" subt="request" reference="lrm_invoke-tengine-1505897731-3872" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="md12k-1-srv" acl_target="hacluster" crm_user="hacluster">
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input operation_key="ost0033-es03a_monitor_0" on_node="es12kxe-7-srv" on_node_uuid="es12kxe-7-srv" router_node="md12k-1-srv" transition-key="221:23:7:7df0b5ef-5bfa-4d21-82e3-4dc964d6cffd">
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input provider="ddn" type="lustre-server"/>
> Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input CRM_meta_on_node_uuid="es12kxe-7-srv" CRM_meta_op_target_rc="7" CRM_meta_record_pending="false" CRM_meta_timeout="3