On 13/08/19 09:44 +0200, Ulrich Windl wrote:
Harvey Shepherd wrote on 12.08.2019 at 23:38 in message:
>> I've been experiencing exactly the same issue. Pacemaker prioritises
>> restarting the failed resource over maintaining a master instance. In my
>> case
>> I used
Thank you, I reached the same conclusion after reading through the script.
Another question - I am no longer seeing the error quoted below as I've
increased shell_timeout to 30 seconds, but failovers are still happening. From
the logs, it appears that the cluster simply loses communication
On Tue, 2019-08-13 at 11:06 +0200, Ulrich Windl wrote:
> Hi,
>
> an update:
> After setting a failure-timeout for the resource that stale monitor
> failure
> was removed automatically at next cluster recheck (it seems).
> Still I wonder why a resource cleanup didn't do that (bug?).
Possibly ...
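(A minimal sketch of setting that failure-timeout, for readers following along; "r_app" is a hypothetical resource name, and the 120-second value is illustrative:)

```
# Sketch only; "r_app" is a hypothetical resource name.
# With crmsh:
crm resource meta r_app set failure-timeout 120
# Equivalent with the low-level tool:
crm_resource --resource r_app --meta --set-parameter failure-timeout \
    --parameter-value 120
```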
Олег Самойлов wrote:
On Aug 13, 2019, at 15:55, Jan Friesse wrote:
There is going to be a slightly different solution (setting these timeouts
based on the corosync token timeout) which I'm working on, but it's a huge
amount of work and not super high priority (a workaround exists), so no ETA
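(For context, a sketch of the corosync.conf knobs involved; the values are illustrative assumptions, and the qnetd hostname is hypothetical. quorum.device.timeout and sync_timeout are documented in corosync-qdevice(8) and should comfortably exceed the totem token timeout:)

```
totem {
    token: 3000              # ms; totem token timeout
}
quorum {
    provider: corosync_votequorum
    device {
        model: net
        timeout: 10000       # ms; should be well above the token timeout
        sync_timeout: 30000  # ms; variant used during corosync sync
        net {
            host: qnetd.example.com   # hypothetical qnetd host
            algorithm: ffsplit
        }
    }
}
```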
- On Aug 13, 2019, at 3:14 PM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
> You said you booted the hosts sequentially. From the logs they were starting
> in
> parallel.
>
No. last says:
ha-idg-1:
reboot system boot 4.12.14-95.29-de Fri Aug 9 17:42 - 15:56 (3+22:14)
- On Aug 13, 2019, at 3:34 PM, Matthias Ferdinand m...@14v.de wrote:
>> 17:26:35 crm node standby ha-idg1-
>
> if that is not a copy error (ha-idg1- vs. ha-idg-1), then ha-idg-1
> was not set to standby, and installing updates may have done some
> meddling with corosync/pacemaker (like
> On Aug 13, 2019, at 15:55, Jan Friesse wrote:
>
> There is going to be a slightly different solution (setting these timeouts
> based on the corosync token timeout) which I'm working on, but it's a huge
> amount of work and not super high priority (a workaround exists), so no ETA
> yet.
Is it will
ct, this was sufficient to
> > promote the other instance to master, but in the current product,
> > that does not happen. Currently, the failed application is
> > restarted, as expected, and is promoted to master, but this takes tens of
> > seconds.
>
> Did you try to disable resource stickiness for this ms?
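(A sketch of what "disable resource stickiness" looks like in practice; "ms_app" is a hypothetical resource name:)

```
# Sketch: relax stickiness so the scheduler may promote the surviving
# instance instead of preferring the status quo. "ms_app" is hypothetical.
crm resource meta ms_app set resource-stickiness 0
# or cluster-wide:
crm configure rsc_defaults resource-stickiness=0
```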
On Mon, Aug 12, 2019 at 04:09:48PM -0400, users-requ...@clusterlabs.org wrote:
> Date: Mon, 12 Aug 2019 18:09:24 +0200 (CEST)
> From: "Lentes, Bernd"
> To: Pacemaker ML
> Subject: [ClusterLabs] why is node fenced ?
> Message-ID:
>
You said you booted the hosts sequentially. From the logs they were starting in
parallel.
>>> "Lentes, Bernd" wrote on 13.08.2019 at 13:53 in message
<767205671.1953556.1565697218136.javamail.zim...@helmholtz-muenchen.de>:
> ‑ On Aug 12, 2019, at 7:47 PM, Chris Walker cwal...@cray.com
Олег Самойлов wrote:
On Aug 12, 2019, at 8:46, Jan Friesse wrote:
Let me try to shed some light on this:
- dpd_interval is a qnetd variable controlling how often qnetd walks through
the list of all clients (qdevices) and checks the timestamp of the last
message sent. If the difference between the current timestamp
> On Aug 12, 2019, at 8:46, Jan Friesse wrote:
>
> Let me try to shed some light on this:
>
> - dpd_interval is a qnetd variable controlling how often qnetd walks through
> the list of all clients (qdevices) and checks the timestamp of the last
> message sent. If the difference between the current timestamp and the last sent
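(Rather than guessing at these timers, qnetd's client bookkeeping can be observed directly; a sketch using the standard status tools:)

```
# Sketch: observe qnetd's view of its clients (qdevices).
# On the qnetd host:
corosync-qnetd-tool -s        # daemon status
corosync-qnetd-tool -l -v     # connected clients (qdevices) with details
# On a cluster node:
corosync-qdevice-tool -s -v   # qdevice state and timer details
```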
- On Aug 12, 2019, at 7:47 PM, Chris Walker cwal...@cray.com wrote:
> When ha-idg-1 started Pacemaker around 17:43, it did not see ha-idg-2, for
> example,
>
> Aug 09 17:43:05 [6318] ha-idg-1 pacemakerd: info:
> pcmk_quorum_notification:
> Quorum retained | membership=1320 members=1
>
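(When a starting node reports members=1 like this, membership can be cross-checked from several angles; a minimal sketch:)

```
# Sketch: cross-check membership when a starting node only sees itself.
corosync-quorumtool -s            # votequorum state and member list
corosync-cmapctl | grep members   # runtime membership straight from cmap
crm_mon -1                        # pacemaker's one-shot view of the cluster
```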
I am happy to announce the latest release of pcs, version 0.9.168.
Source code is available at:
https://github.com/ClusterLabs/pcs/archive/0.9.168.tar.gz
or
https://github.com/ClusterLabs/pcs/archive/0.9.168.zip
Complete change log for this release:
## [0.9.168] - 2019-08-02
### Added
- It is
>>> "Lentes, Bernd" wrote on 13.08.2019 at 10:54 in message
<848962511.1856599.1565686469666.javamail.zim...@helmholtz-muenchen.de>:
>
> ‑ On Aug 13, 2019, at 9:00 AM, Ulrich Windl
ulrich.wi...@rz.uni‑regensburg.de
> wrote:
>
>> Personally I feel more save with updates when the whole
Hi,
an update:
After setting a failure-timeout for the resource that stale monitor failure
was removed automatically at next cluster recheck (it seems).
Still I wonder why a resource cleanup didn't do that (bug?).
Regards,
Ulrich
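(The interplay described above can be inspected by hand; a sketch, with "r_app" as a hypothetical resource name:)

```
# Sketch; "r_app" is a hypothetical resource name.
crm_failcount --query --resource r_app      # inspect the current fail count
crm_resource --cleanup --resource r_app     # clear failures by hand
# failure-timeout is only evaluated on scheduler runs, which happen at
# least every cluster-recheck-interval:
crm_attribute --type crm_config --name cluster-recheck-interval --query
```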
>>> "Ulrich Windl" wrote on 13.08.2019 at 10:07 in message
>>> "Ulrich Windl" wrote on 13.08.2019 at 09:00 in message
<5d52600102a100032...@gwsmtp.uni-regensburg.de>:
...
> Personally I feel more save with updates when the whole cluster node is
Of course I meant "more safe"... Time for caffeine it seems ;-)
...
On 07/08/2019 16:48, lejeczek wrote:
> hi guys,
>
> after a reboot Ganesha exports are not there. It suffices to do: $ systemctl
> restart nfs-ganesha - and all is good again.
>
> Would you have any ideas why?
>
> I'm on Centos 7.6 with nfs-ganesha-gluster-2.7.6-1.el7.x86_64;
>
- On Aug 13, 2019, at 9:00 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
> Personally I feel more save with updates when the whole cluster node is
> offline, not standby. When you are going to boot anyway, it won't make much of
> a difference. Also you don't have to remember
>>> Ken Gaillot wrote on 13.08.2019 at 01:03 in
message
:
> On Mon, 2019-08-12 at 17:46 +0200, Ulrich Windl wrote:
>> Hi!
>>
>> I just noticed that a "crm resource cleanup " caused some
>> unexpected behavior and the syslog message:
>> crmd[7281]: warning: new_event_notification
>>> Harvey Shepherd wrote on 12.08.2019 at 23:38
in message:
> I've been experiencing exactly the same issue. Pacemaker prioritises
> restarting the failed resource over maintaining a master instance. In my case
> I used crm_simulate to analyse the actions planned and taken by
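(A sketch of that crm_simulate workflow, for readers who want to repeat the analysis:)

```
# Sketch: see what the scheduler plans without touching the cluster.
crm_simulate -sL                  # allocation scores against the live CIB
cibadmin --query > /tmp/cib.xml   # snapshot the CIB for offline analysis
crm_simulate --xml-file /tmp/cib.xml --simulate
```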
>>> Jan Pokorný wrote on 12.08.2019 at 22:30 in
message
<20190812203037.gm25...@redhat.com>:
[...]
> Is it OK for lower level components to do autonomous decisions
> without at least informing the higher level wrt. what exactly is
> going on, as we could observe here?
[...]
Excuse me for
>>> "Lentes, Bernd" wrote on 12.08.2019 at 18:09 in message
<546330844.1686419.1565626164456.javamail.zim...@helmholtz-muenchen.de>:
> Hi,
>
> last Friday (9th of August) I had to install patches on my two-node
cluster.
> I put one of the nodes (ha-idg-2) into standby (crm node standby
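(The patch workflow under discussion, sketched; note the exact node-name hyphenation — "ha-idg-2", not "ha-idg2-" — since a typo can leave the intended node active, as a later reply in this thread suspects:)

```
# Sketch of the standby-then-patch workflow discussed in this thread.
crm node standby ha-idg-2
# ... install updates, reboot if required ...
crm node online ha-idg-2
```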
On 8/12/19 9:24 PM, Klaus Wenninger wrote:
[...]
> If you shutdown solely pacemaker one-by-one on all nodes
> and these shutdowns are considered graceful then you are
> not gonna experience any reboots (e.g. 3 node cluster).
While revisiting what you said, I ran `systemctl stop pacemaker`
Momcilo
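(A sketch of the one-by-one graceful shutdown Klaus describes, run on each node in turn:)

```
# Sketch: stopping the stack one node at a time. Stopping pacemaker alone
# leaves corosync membership intact, so the remaining nodes see a graceful
# exit rather than a failure.
systemctl stop pacemaker    # graceful pacemaker shutdown on this node
systemctl stop corosync     # only then drop out of the membership
```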
On Wed, Aug 7, 2019 at 1:00 PM Klaus Wenninger wrote:
On 8/7/19 12:26 PM, Momcilo Medic wrote:
We have a three-node cluster that is set up to stop resources on lost quorum.
Failure (network going down) handling is done properly, but recovery
doesn't seem to work.
What do you mean by
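(For reference, the quorum-loss behavior described above is governed by the no-quorum-policy cluster property; a sketch of checking and setting it:)

```
# Sketch: behavior on quorum loss is controlled by the no-quorum-policy
# cluster property (stop | ignore | freeze | suicide).
crm_attribute --type crm_config --name no-quorum-policy --query
crm_attribute --type crm_config --name no-quorum-policy --update stop
```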