On Mon, 2019-08-12 at 11:15 +0200, Ulrich Windl wrote:
> Hi!
>
> Back in December 2011 I had written a script to retrieve all failed
> resource operations by using "cibadmin -Q -o lrm_resources" as the
> database. I was querying lrm_rsc_op for op-status != 0.
> In a newer release this does not seem to work anymore.
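A roughly equivalent query can be run against the full CIB rather than the lrm_resources scope; a minimal sketch, assuming xmllint is installed (it is not part of pacemaker):

    # Dump the live CIB and select operation entries whose op-status is non-zero:
    cibadmin --query | xmllint --xpath '//lrm_rsc_op[@op-status != "0"]' -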
On Mon, 2019-08-12 at 18:09 +0200, Lentes, Bernd wrote:
> Hi,
>
> last Friday (9th of August) I had to install patches on my two-node
> cluster.
> I put one of the nodes (ha-idg-2) into standby (crm node standby ha-
> idg-2), patched it, rebooted,
> started the cluster (systemctl start pacemaker) again, put the node
> online again, everything fine.
On Mon, 2019-08-12 at 17:46 +0200, Ulrich Windl wrote:
> Hi!
>
> I just noticed that a "crm resource cleanup" caused some
> unexpected behavior and the syslog message:
> crmd[7281]: warning: new_event_notification (7281-97955-15): Broken
> pipe (32)
>
> It's SLES12 SP4, last updated Sept. 2018 (up since then,
> pacemaker-1.1.19+20180928.0d2680780-1.8.x86_64).
On Mon, 2019-08-12 at 23:09 +0300, Andrei Borzenkov wrote:
>
>
> On Mon, Aug 12, 2019 at 4:12 PM Michael Powell <
> michael.pow...@harmonicinc.com> wrote:
> > At 07:44:49, the ss agent discovers that the master instance has
> > failed on node mgraid…-0 as a result of a failed ssadm request in
> > response to an ss_monitor() operation.
On 07/08/19 16:06 +0200, Momcilo Medic wrote:
> On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger wrote:
>
>> On 8/7/19 12:26 PM, Momcilo Medic wrote:
>>
>>> We have a three-node cluster that is set up to stop resources on lost
>>> quorum. Failure (network going down) handling is done properly,
>>>
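For context, the stop-resources-on-lost-quorum behavior described above is what the no-quorum-policy cluster property controls; a minimal crmsh sketch ('stop' is the default value):

    # Tell pacemaker to stop all resources in a partition without quorum:
    crm configure property no-quorum-policy=stop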
On Mon, Aug 12, 2019 at 4:12 PM Michael Powell <
michael.pow...@harmonicinc.com> wrote:
> At 07:44:49, the ss agent discovers that the master instance has failed on
> node *mgraid…-0* as a result of a failed *ssadm* request in response to
> an *ss_monitor()* operation. It issues a *crm_master -Q
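The crm_master invocation is cut off above; for reference, a master/slave resource agent typically lowers or clears its own promotion score after such a failure, along these lines (the flags are standard crm_master options, but the exact call the ss agent makes is an assumption):

    # Drop this node's master preference until the next successful monitor:
    crm_master -Q -l reboot -D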
When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2; for
example:
Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info: pcmk_quorum_notification:
Quorum retained | membership=1320 members=1
After ~20s (the dc-deadtime parameter), ha-idg-2 was marked 'unclean' and
STONITHed.
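If the peer regularly needs longer than that to come up, the window can be widened; a minimal crmsh sketch (the 60s value is only an example):

    # Give a starting node longer to hear from its peers before acting:
    crm configure property dc-deadtime=60s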
Hello,
Postgres version: 9.6
OS: RHEL 7.6
We are working on an HA setup for a Postgres cluster of two nodes in
active-passive mode.
Installed:
Pacemaker 1.1.19
Corosync 2.4.3
The pacemaker agent with this installation doesn't support automatic
failback. What I mean by that is explained below:
1. Start the services on both nodes; node A comes up as master.
2. Kill services on A; node B will come up as master.
3. When node A is ready to join the cluster, we have to delete the lock
file it creates on any one of the nodes and execute the cleanup command to
get the node back as standby.
Step 3 is manual, so HA is not achieved.
Questions:
1. Is there any resource agent which supports automatic failback, avoiding
the generation of the lock file and the need to delete it?
2. If there is no such support and we need such functionality, do we have
to modify the existing code? How can this be achieved? Please suggest.
Thanks.
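Until an agent handles this itself, step 3 can at least be scripted; a rough sketch of the manual procedure, assuming the stock pgsql resource agent's default lock location (the LOCKFILE path and the resource name are assumptions and must be adjusted to the actual configuration):

    #!/bin/sh
    # Hypothetical helper, run on any node after node A has rejoined.
    LOCKFILE=/var/lib/pgsql/tmp/PGSQL.lock   # assumption: pgsql RA default tmpdir
    [ -f "$LOCKFILE" ] && rm -f "$LOCKFILE"
    # Clear the failed state so the rejoined node comes back as standby:
    pcs resource cleanup pgsql-ha            # 'pgsql-ha' is a placeholder name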
Hi,
last Friday (9th of August) I had to install patches on my two-node cluster.
I put one of the nodes (ha-idg-2) into standby (crm node standby ha-idg-2),
patched it, rebooted,
started the cluster (systemctl start pacemaker) again, put the node again
online, everything fine.
Then I wanted
Hi!
I just noticed that a "crm resource cleanup" caused some unexpected
behavior and the syslog message:
crmd[7281]: warning: new_event_notification (7281-97955-15): Broken pipe (32)
It's SLES12 SP4, last updated Sept. 2018 (up since then,
pacemaker-1.1.19+20180928.0d2680780-1.8.x86_64).
On 8/12/19 3:24 PM, Klaus Wenninger wrote:
> On 8/12/19 2:30 PM, Yan Gao wrote:
>> Hi Klaus,
>>
>> On 8/12/19 1:39 PM, Klaus Wenninger wrote:
>>> On 8/9/19 9:06 PM, Yan Gao wrote:
On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
> On 09.08.2019 16:34, Yan Gao wrote:
>> Hi,
>>
>> With disk-less sbd, it's fine to stop the cluster service on all the
>> cluster nodes at the same time.
On 8/12/19 2:30 PM, Yan Gao wrote:
> Hi Klaus,
>
> On 8/12/19 1:39 PM, Klaus Wenninger wrote:
>> On 8/9/19 9:06 PM, Yan Gao wrote:
>>> On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
On 09.08.2019 16:34, Yan Gao wrote:
> Hi,
>
> With disk-less sbd, it's fine to stop the cluster service on all the
> cluster nodes at the same time.
On 8/12/19 8:42 AM, Ulrich Windl wrote:
> Hi!
>
> One motivation to stop all nodes at the same time is to avoid needless moving
> of resources, like the following:
> You stop node A, then resources are stopped on A and started elsewhere.
> You stop node B, and resources are stopped and moved to the remaining
> nodes.
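That incremental resource shuffling is one reason to bring everything down in a single operation; with pcs that is one command (crmsh-based clusters would use their own equivalent):

    # Stop pacemaker/corosync on every node at once:
    pcs cluster stop --all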
Hi Klaus,
On 8/12/19 1:39 PM, Klaus Wenninger wrote:
> On 8/9/19 9:06 PM, Yan Gao wrote:
>> On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
>>> On 09.08.2019 16:34, Yan Gao wrote:
Hi,
With disk-less sbd, it's fine to stop the cluster service on all the
cluster nodes at the same time.
On 8/9/19 9:06 PM, Yan Gao wrote:
> On 8/9/19 6:40 PM, Andrei Borzenkov wrote:
>> On 09.08.2019 16:34, Yan Gao wrote:
>>> Hi,
>>>
>>> With disk-less sbd, it's fine to stop the cluster service on all the
>>> cluster nodes at the same time.
>>>
>>> But if the nodes are stopped one by one, for example with a
>>> Roger Zhou wrote on 12.08.2019 at 10:55 in message
<7249e013-1256-675a-3cea-3572f4615...@suse.com>:
> On 8/12/19 2:48 PM, Ulrich Windl wrote:
> Andrei Borzenkov wrote on 09.08.2019 at 18:40 in
>> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
>>> On 09.08.2019 16:34, Yan Gao wrote:
Hi!
Back in December 2011 I had written a script to retrieve all failed resource
operations by using "cibadmin -Q -o lrm_resources" as the database. I was
querying lrm_rsc_op for op-status != 0.
In a newer release this does not seem to work anymore.
I see resource IDs ending with "_last_0",
On 8/12/19 2:48 PM, Ulrich Windl wrote:
Andrei Borzenkov wrote on 09.08.2019 at 18:40 in
> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
>> On 09.08.2019 16:34, Yan Gao wrote:
[...]
>>
>> Lack of a cluster-wide shutdown mode was mentioned more than once on
>> this list.
Andrei Borzenkov wrote:
Sent from iPhone
On 12 Aug 2019, at 8:46, Jan Friesse wrote:
Олег Самойлов wrote:
On 9 Aug 2019, at 9:25, Jan Friesse wrote:
Please do not set dpd_interval that high. dpd_interval on qnetd side is not
about how often the ping is sent.
You should be able to increase this timeout by running:
pcs stonith update shell_timeout=10
Oyvind
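Presumably that update targets the fence-device resource by name; a usage sketch with a made-up device id (shell_timeout is a standard fence-agent parameter):

    # Give the fence agent's shell commands 10s instead of the default:
    pcs stonith update my-fence-device shell_timeout=10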
On 08/08/19 12:13 -0600, Casey & Gina wrote:
Hi, I'm currently running into periodic premature killing of nodes due to the
fence monitor timeout being set to 5 seconds. Here is an example
Sent from iPhone
On 12 Aug 2019, at 9:48, Ulrich Windl wrote:
Andrei Borzenkov wrote on 09.08.2019 at 18:40 in
> message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
>> On 09.08.2019 16:34, Yan Gao wrote:
>>> Hi,
>>>
>>> With disk-less sbd, it's fine to stop the cluster service on all the
>>> cluster nodes at the same time.
>>> Andrei Borzenkov wrote on 09.08.2019 at 18:40 in
message <217d10d8-022c-eaf6-28ae-a4f58b2f9...@gmail.com>:
> On 09.08.2019 16:34, Yan Gao wrote:
>> Hi,
>>
>> With disk-less sbd, it's fine to stop the cluster service on all the
>> cluster nodes at the same time.
>>
>> But if the nodes are stopped one by one, for example with a
Hi!
One motivation to stop all nodes at the same time is to avoid needless moving
of resources, like the following:
You stop node A, then resources are stopped on A and started elsewhere.
You stop node B, and resources are stopped and moved to the remaining nodes.
...until the last node stops, or
Sent from iPhone
> On 12 Aug 2019, at 8:46, Jan Friesse wrote:
>
> Олег Самойлов wrote:
>>> On 9 Aug 2019, at 9:25, Jan Friesse wrote:
>>> Please do not set dpd_interval that high. dpd_interval on qnetd side is not
>>> about how often the ping is sent. Could you please
>>> Roger Zhou wrote on 09.08.2019 at 10:19 in message
<06f700cb-d941-2f53-aee5-2d64c499c...@suse.com>:
>
> On 8/9/19 3:39 PM, Jan Friesse wrote:
>> Roger Zhou wrote:
>>>
>>> On 8/9/19 2:27 PM, Roger Zhou wrote:
On 7/29/19 12:24 AM, Andrei Borzenkov wrote:
>
Nickle, Richard wrote:
I've built a two-node DRBD cluster with SBD and STONITH, following advice
from ClusterLabs, LINBIT, and Beekhof's blog on SBD.
I still cannot get automated failover when I down one of the nodes. I
thought that perhaps I needed to have an odd-numbered quorum so I
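If the odd-numbered quorum was only meant to let one node survive alone, corosync's two-node mode is the usual answer; a minimal corosync.conf fragment (a sketch, not the poster's actual configuration):

    quorum {
        provider: corosync_votequorum
        two_node: 1
        # two_node implicitly enables wait_for_all
    }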