Hi,
Looking at lib/common/ipc.c, Pacemaker recommends setting
PCMK_ipc_buffer to 4 times the *uncompressed* size of the biggest
message seen:
error: Could not compress the message (2309508 bytes) into less than the
configured ipc limit (131072 bytes). Set PCMK_ipc_buffer to a higher value
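As a hedged sketch (the 4x rule is taken from the recommendation above, and the environment-file path varies by distribution: /etc/sysconfig/pacemaker on RH-style systems, /etc/default/pacemaker on Debian), the value can be computed from the logged message size:

```shell
# Compute a PCMK_ipc_buffer value of 4x the largest uncompressed
# message reported in the log (2309508 bytes in the error above).
largest=2309508
echo "PCMK_ipc_buffer=$((4 * largest))"
```

The printed line would then go into the environment file read by Pacemaker's init script or systemd unit.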
"Lentes, Bernd" writes:
> 2018-12-03T16:03:02.836145+01:00 ha-idg-2 libvirtd[3117]: 2018-12-03
> 15:03:02.835+: 4515: error : qemuMigrationCheckJobStatus:1456 : operation
> failed: migration job: unexpectedly failed
The above message is a hint at the real problem. It comes from
libvirtd,
Patrick Whitney writes:
> I have a two node (test) cluster running corosync/pacemaker with DLM
> and CLVM.
>
> I was running into an issue where when one node failed, the remaining node
> would appear to do the right thing, from the pcmk perspective, that is.
> It would create a new cluster (of
Christine Caulfield writes:
> I'm also looking into high-res timestamps for logfiles too.
Wouldn't that be a useful option for the syslog output as well? I'm
sometimes concerned by the batching effect added by the transport
between the application and the (local) log server (rsyslog or
Ken Gaillot writes:
> libqb would simply provide the API for reopening the log, and clients
> such as pacemaker would intercept the signal and call the API.
Just for posterity: you needn't restrict yourself to signals. Logrotate
has nothing to do with signals. Signals are a rather limited
Ken Gaillot writes:
> On Thu, 2018-09-27 at 09:36 +0200, Ulrich Windl wrote:
>
>> Obviously you violated the most important cluster rule that is "be
>> patient". Maybe the next important is "Don't change the
>> configuration while the cluster is not in IDLE state" ;-)
>
> Agreed -- although
Christine Caulfield writes:
> TBH I would be quite happy to leave this to logrotate but the message I
> was getting here is that we need additional help from libqb. I'm willing
> to go with a consensus on this though
Yes, to do a proper job logrotate has to have a way to get the log files
Christine Caulfield writes:
> I'm looking into new features for libqb and the option in
> https://github.com/ClusterLabs/libqb/issues/142#issuecomment-76206425
> looks like a good option to me.
It feels backwards to me: traditionally, increasing numbers signify
older rotated logs, while this
Hi,
The current behavior of cancelled migration with Pacemaker 1.1.16 with a
resource implementing push migration:
# /usr/sbin/crm_resource --ban -r vm-conv-4
vhbl03 crmd[10017]: notice: State transition S_IDLE -> S_POLICY_ENGINE
vhbl03 pengine[10016]: notice: Migrate vm-conv-4#011(Started
Jan Friesse writes:
> wagner.fer...@kifu.gov.hu writes:
>
>> triggered by your favourite IPC mechanism (SIGHUP and SIGUSRx are common
>> choices, but logging.* cmap keys probably fit Corosync better). That
>> would enable proper log rotation.
>
> What is the reason that you find "copytruncate"
Jan Friesse writes:
> Default example config should be definitively ported to newer style of
> nodelist without interface section. example.udpu can probably be
> deleted as well as example.xml (whole idea of having XML was because
> of cluster config tools like pcs, but these tools never used
>
Jan Friesse writes:
> Have you had a time to play with packaging current alpha to find out
> if there are no issues? I had no problems with Fedora, but Debian has
> a lot of patches, and I would be really grateful if we could reduce
> them a lot - so please let me know if there is patch which
Jan Friesse writes:
> Currently I'm pretty happy with current Corosync alpha stability so it
> would be possible to release final right now, but because I want to
> give us some room to break protocol/abi (only if needed and right now
> I don't see any strong reason for such breakage), I didn't
Jan Friesse writes:
> try corosync 3.x (current Alpha4 is pretty stable [...]
Hi Honza,
Can you provide an estimate for the Corosync 3 release timeline? We
have to plan the ABI transition in Debian and the freeze date is drawing
closer.
--
Thanks,
Feri
wf...@niif.hu (Ferenc Wágner) writes:
> David Tolosa writes:
>
>> I tried to install corosync 3.x and it works pretty well.
>> But when I install pacemaker, it installs previous version of corosync as
>> dependency and breaks all the setup.
>> Any suggestions?
David Tolosa writes:
> I tried to install corosync 3.x and it works pretty well.
> But when I install pacemaker, it installs previous version of corosync as
> dependency and breaks all the setup.
> Any suggestions?
Install the equivs package to create a dummy corosync package
representing your
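For posterity, a minimal sketch of the equivs approach (package name, version and maintainer below are made-up placeholders; the version should mirror or exceed what pacemaker's dependency asks for):

```
# Control file for equivs-build, saved e.g. as corosync-dummy.ctl:
Section: admin
Priority: optional
Standards-Version: 3.9.2
Package: corosync
Version: 3.0.0-0dummy1
Maintainer: Local Admin <root@localhost>
Description: dummy package satisfying pacemaker's corosync dependency
# Then: equivs-build corosync-dummy.ctl
#       dpkg -i corosync_3.0.0-0dummy1_all.deb
```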
Jan Friesse writes:
> Is that system VM or physical machine? Because " Corosync main process
> was not scheduled for..." is usually happening on VMs where hosts are
> highly overloaded.
Or when physical hosts use BMC watchdogs. But Prasad didn't encounter
such logs in the setup at hand, as far
FeldHost™ Admin writes:
> rule of thumb is use separate dedicated network for corosync traffic.
> For ex. we use two corosync rings, first and active one on separate
> network card and switch, second passive one on team (bond) device vlan.
Hi,
That's fine in principle, but this is a
David Teigland writes:
> On Thu, Aug 09, 2018 at 06:11:48PM +0200, Ferenc Wágner wrote:
>
>> Almost ten years ago you requested more info in a similar case, let's
>> see if we can get further now!
>
> Hi, the usual cause is that a network message from the dlm has be
wf...@niif.hu (Ferenc Wágner) writes:
> For a start I attached the dump output from another node.
I meant to...
146 dlm_controld 4.0.5 started
146 our_nodeid 167773708
146 found /dev/misc/dlm-control minor 58
146 found /dev/misc/dlm-monitor minor 57
146 found /dev/misc/dlm_plock minor 56
Hi David,
Almost ten years ago you requested more info in a similar case, let's
see if we can get further now!
We're running a 6-node Corosync cluster. DLM is started by systemd:
● dlm.service - dlm control daemon
Loaded: loaded (/lib/systemd/system/dlm.service; enabled)
Active: active
Jan Pokorný writes:
> 1. [X] Do you edit CIB by hand (as opposed to relying on crm/pcs or
> their UI counterparts)?
For debugging one has to understand the CIB anyway, so why learn
additional syntaxes? :) Most of our configuration changes are scripted
via a home-grown domain-specific
Jan Pokorný writes:
> On 12/04/18 14:33 +0200, Jan Friesse wrote:
>
>> This release contains a lot of fixes, including fix for
>> CVE-2018-1084.
>
> Security related updates would preferably provide more context
Absolutely, thanks for providing that! Looking at the git
Ken Gaillot writes:
> A couple of regressions have been found in the recent Pacemaker 1.1.18
> release.
>
> Fixes for these, plus one finishing an incomplete fix in 1.1.18, are in
> the master branch, and have been backported to the 1.1 branch for ease
> of patching. It is
Andrei Borzenkov writes:
> 25.11.2017 10:05, Andrei Borzenkov пишет:
>
>> In one of guides suggested procedure to simulate split brain was to kill
>> corosync process. It actually worked on one cluster, but on another
>> corosync process was restarted after being killed
Ken Gaillot writes:
> When an operation completes, a history entry () is added to
> the pe-input file. If the agent supports reload, the entry will include
> op-force-restart and op-restart-digest fields. Now I see those are
> present in the vm-alder_last_0 entry, so agent
Ken Gaillot writes:
> The pe-input is indeed entirely sufficient.
>
> I forgot to check why the reload was not possible in this case. It
> turns out it is this:
>
> trace: check_action_definition: Resource vm-alder doesn't know
> how to reload
>
> Does the resource
Dennis Jacobfeuerborn writes:
> if I create a new unit file for the new file the services would not
> depend on it so it wouldn't get automatically mounted when they start.
Put the new unit file under /etc/systemd/system/x.service.requires to
have x.service require it. I
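As a sketch (the mount unit name mnt-data.mount is a made-up example), either symlink the unit into the .requires directory, or use a drop-in with the same effect:

```
# Symlink approach:
#   mkdir -p /etc/systemd/system/x.service.requires
#   ln -s /lib/systemd/system/mnt-data.mount \
#         /etc/systemd/system/x.service.requires/mnt-data.mount
#
# Equivalent drop-in, /etc/systemd/system/x.service.d/mount.conf:
[Unit]
Requires=mnt-data.mount
After=mnt-data.mount
```

After a systemctl daemon-reload, starting x.service pulls the mount in.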
Ken Gaillot <kgail...@redhat.com> writes:
> On Fri, 2017-10-20 at 15:52 +0200, Ferenc Wágner wrote:
>
>> Ken Gaillot <kgail...@redhat.com> writes:
>>
>>> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>>>
>>>> Ken Gaillo
Norberto Lopes <nlopes...@gmail.com> writes:
> On Fri, 27 Oct 2017 at 06:41 Ferenc Wágner <wf...@niif.hu> wrote:
>
>> Norberto Lopes <nlopes...@gmail.com> writes:
>>
>>> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master
Norberto Lopes writes:
> colocation backup-vip-not-with-master -inf: backupVIP postgresMS:Master
> colocation backup-vip-not-with-master inf: backupVIP postgresMS:Slave
>
> Basically what's occurring in my cluster is that the first rule stops the
> Sync node from being
Ken Gaillot <kgail...@redhat.com> writes:
> On Fri, 2017-09-22 at 18:30 +0200, Ferenc Wágner wrote:
>> Ken Gaillot <kgail...@redhat.com> writes:
>>
>>> Hmm, stop+reload is definitely a bug. Can you attach (or email it to
>>> me privately, or file a
Václav Mach <ma...@cesnet.cz> writes:
> On 10/11/2017 09:00 AM, Ferenc Wágner wrote:
>
>> Václav Mach <ma...@cesnet.cz> writes:
>>
>>> allow-hotplug eth0
>>> iface eth0 inet dhcp
>>
>> Try replacing allow-hotplug with auto. Ifupdow
Donat Zenichev writes:
> then resource is stopped, but nothing occurred on e-mail destination.
> Where I did wrong actions?
Please note that ClusterMon notifications are becoming deprecated (they
should still work, but I've got no experience with them). Try using
Václav Mach writes:
> allow-hotplug eth0
> iface eth0 inet dhcp
Try replacing allow-hotplug with auto. Ifupdown simply runs ifup -a
before network-online.target, which excludes allow-hotplug interfaces.
That means allow-hotplug interfaces are not waited for before corosync
is
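The suggested change in /etc/network/interfaces would look like:

```
# 'auto' interfaces are raised by ifup -a before network-online.target,
# so corosync starts with eth0 already configured
auto eth0
iface eth0 inet dhcp
```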
Ken Gaillot writes:
> Hmm, stop+reload is definitely a bug. Can you attach (or email it to me
> privately, or file a bz with it attached) the above pe-input file with
> any sensitive info removed?
I sent you the pe-input file privately. It indeed shows the issue:
$
Ken Gaillot writes:
> * undocumented LRMD_MAX_CHILDREN environment variable
> (PCMK_node_action_limit is the current syntax)
By the way, is the current syntax documented somewhere? Looking at
crmd/throttle.c, throttle_update_job_max() is only ever invoked with a
NULL
Jan Friesse writes:
> Back to problem you have. It's definitively HW issue but I'm thinking
> how to solve it in software. Right now, I can see two ways:
> 1. Set dog FD to be non blocking right at the end of setup_watchdog -
> This is preferred but I'm not sure if it's
Klaus Wenninger writes:
> Just for my understanding: You are using watchdog-handling in corosync?
Yes, I was.
--
Feri
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project
Valentin Vidic <valentin.vi...@carnet.hr> writes:
> On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote:
>
>> Confirmed: setting watchdog_device: off cluster wide got rid of the
>> above warnings.
>
> Interesting, what brand or version of IPMI has this pro
wf...@niif.hu (Ferenc Wágner) writes:
> Jan Friesse <jfrie...@redhat.com> writes:
>
>> wf...@niif.hu writes:
>>
>>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>>> (in August; in May, it happened 0-2 times a day only, it's slowl
Jan Friesse writes:
> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
Digimer <li...@alteeve.ca> writes:
> On 2017-08-29 10:45 AM, Ferenc Wágner wrote:
>
>> Digimer <li...@alteeve.ca> writes:
>>
>>> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>>>
>>>> [...]
>>>> While dlm_tool status repo
Jan Friesse writes:
> wf...@niif.hu writes:
>
>> Jan Friesse writes:
>>
>>> wf...@niif.hu writes:
>>>
In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
(in August; in May, it happened 0-2 times a day only, it's slowly
Jan Friesse writes:
> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
Klaus Wenninger writes:
> Just seen that you are hosting VMs which might make you use KSM ...
> Don't fully remember at the moment but I have some memory of
> issues with KSM and page-locking.
> iirc it was some bug in the kernel memory-management that should
> be fixed a
Jan Friesse writes:
> wf...@niif.hu writes:
>
>> Jan Friesse writes:
>>
>>> wf...@niif.hu writes:
>>>
In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
(in August; in May, it happened 0-2 times a day only, it's slowly
Digimer <li...@alteeve.ca> writes:
> On 2017-08-28 12:07 PM, Ferenc Wágner wrote:
>
>> [...]
>> While dlm_tool status reports (similar on all nodes):
>>
>> cluster nodeid 167773705 quorate 1 ring seq 3088 3088
>> daemon now 2941405 fence_pid 0
>&g
Jan Friesse writes:
> wf...@niif.hu writes:
>
>> In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
>> (in August; in May, it happened 0-2 times a day only, it's slowly
>> ramping up):
>>
>> vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new
Hi,
In a 6-node cluster (vhbl03-08) the following happens 1-5 times a day
(in August; in May, it happened 0-2 times a day only, it's slowly
ramping up):
vhbl08 corosync[3687]: [TOTEM ] A processor failed, forming new configuration.
vhbl03 corosync[3890]: [TOTEM ] A processor failed, forming
Ken Gaillot writes:
> The most significant change in this release is a new cluster option to
> improve scalability.
>
> As users start to create clusters with hundreds of resources and many
> nodes, one bottleneck is a complete reprobe of all resources (for
> example, after
Digimer <li...@alteeve.ca> writes:
> On 19/06/17 11:40 PM, Andrei Borzenkov wrote:
>
>> 20.06.2017 02:15, Digimer пишет:
>>
>>> On 19/06/17 06:59 PM, Ferenc Wágner wrote:
>>>
>>>> Digimer <li...@alteeve.ca> writes:
>>>&
Digimer writes:
> So we have a tool that watches for changes to clvmd by running
> pvscan/vgscan/lvscan, but this seems to be expensive and occasionally
> cause trouble.
What kind of trouble did you experience?
> Is there any other way to be notified or to check when
James Booth writes:
> Sorry for the repeat mails, but I had issues subscribing last time
> (Looks like it has worked successfully now!).
>
> Anywho, I'm really desperate for some help on my issue in
>
Ken Gaillot <kgail...@redhat.com> writes:
> On 04/13/2017 11:11 AM, Ferenc Wágner wrote:
>
>> I encountered several (old) statements on various forums along the lines
>> of: "the CIB is not a transactional database and shouldn't be used as
>> one" or &
Hi,
I encountered several (old) statements on various forums along the lines
of: "the CIB is not a transactional database and shouldn't be used as
one" or "resource parameters should only uniquely identify a resource,
not configure it" and "the CIB was not designed to be a configuration
database
kgronl...@suse.com (Kristoffer Grönlund) writes:
> I discovered today that a location constraint with score=INFINITY
> doesn't actually restrict resources to running only on particular
> nodes.
Yeah, I made the same "discovery" some time ago. Since then I've been
using something like the
Jeffrey Westgate writes:
> We use Nagios to monitor, and once every 20 to 40 hours - sometimes
> longer, and we cannot set a clock by it - while the machine is 95%
> idle (or more according to 'top'), the host load shoots up to 50 or
> 60%. It takes about 20
Oscar Segarra writes:
> In my environment I have 5 guestes that have to be started up in a
> specified order starting for the MySQL database server.
We use a somewhat redesigned resource agent, which connects to the guest
using a virtio channel and waits for a signal
Jehan-Guillaume de Rorthais writes:
> PAF use private attribute to give informations between actions. We
> detect the failure during the notify as well, but raise the error
> during the promotion itself. See how I dealt with this in PAF:
>
>
Ken Gaillot writes:
> On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>
>> Ken Gaillot writes:
>>
>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>
Isn't the question: Is crmd a process that is expected to die (and
thus need restarting)? Or
Ken Gaillot <kgail...@redhat.com> writes:
> On 02/03/2017 07:00 AM, RaSca wrote:
>>
>> On 03/02/2017 11:06, Ferenc Wágner wrote:
>>> Ken Gaillot <kgail...@redhat.com> writes:
>>>
>>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>&g
Hi,
There was an interesting discussion on this list about "Doing reload
right" last July (which I still haven't digested entirely). Now I've
got a related question about the current and intended behavior: what
happens if a reload operation fails? I found some suggestions in
Ken Gaillot writes:
> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>
>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>> seems to be working ok including the STONITH.
>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>>
Marco Marino writes:
> Ferenc, regarding the flag use_lvmetad in
> /usr/lib/ocf/resource.d/heartbeat/LVM I read:
>
>> lvmetad is a daemon that caches lvm metadata to improve the
>> performance of LVM commands. This daemon should never be used when
>> volume groups exist
Marco Marino writes:
> I agree with you for
> use_lvmetad = 0 (setting it = 1 in a clustered environment is an error)
Where does this information come from? AFAIK, if locking_type=3 (LVM
uses internal clustered locking, that is, clvmd), lvmetad is not used
anyway, even if
Ken Gaillot writes:
> * When you move the VM, the cluster detects that it is not running on
> the node you told it to keep it running on. Because there is no
> "Stopped" monitor, the cluster doesn't immediately realize that a new
> rogue instance is running on another node.
Jan Friesse <jfrie...@redhat.com> writes:
> Ferenc Wágner napsal(a):
>
>> Have you got any plans/timeline for 2.4.2 yet?
>
> Yep, I'm going to release it in few minutes/hours.
Man, that was quick. I've got a bunch of typo fixes queued..:) Please
consider announcing up
Ken Gaillot writes:
> This spurred me to complete a long-planned overhaul of Pacemaker
> Explained's "Upgrading" appendix:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_upgrading.html
>
> Feedback is welcome.
Since you asked for it..:)
1.
Jan Friesse writes:
> Please note that because of required changes in votequorum,
> libvotequorum is no longer binary compatible. This is reason for
> version bump.
Er, what version bump? Corosync 2.4.1 still produces
libvotequorum.so.7.0.0 for me, just like Corosync
Ken Gaillot writes:
> Does anyone know of an RA that uses reload correctly?
My resource agents advertise a no-op reload action for handling their
"private" meta attributes. Meta in the sense that they are used by the
resource agent when performing certain operations, not
"Lentes, Bernd" writes:
> i don't have neither an init-script nor a systemd service file.
> The only packages i find in the repositories concerning dlm are:
> libdlm3-3.00.01-0.31.87
> libdlm-3.00.01-0.31.87
> And i have a kernel module for dlm.
> Nothing
"Lentes, Bernd" writes:
> is it possible to have a DLM running without CRM?
Yes. You'll need to configure fencing, though, since by default DLM
will try to use stonithd (from Pacemaker). But DLM fencing didn't
handle fencing failures correctly for me,
Hi,
Could somebody please elaborate a little why the pacemaker systemd
service file contains "Restart=on-failure"? I mean that a failed node
gets fenced anyway, so most of the time this would be a futile effort.
On the other hand, one could argue that restarting failed services
should be the
Klaus Wenninger <kwenn...@redhat.com> writes:
> On 06/16/2016 11:05 AM, Ferenc Wágner wrote:
>
>> Klaus Wenninger <kwenn...@redhat.com> writes:
>>
>>> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>>>
>>>> I think the default timestamp
Klaus Wenninger <kwenn...@redhat.com> writes:
> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>
>> Please find some random notes about my adventures testing the new alert
>> system.
>>
>> The first alert example in the documentation has no recipient:
>&
Ilia Sokolinski writes:
> We have a custom Master-Slave resource running on a 3-node pcs cluster on
> CentOS 7.1
>
> As part of what is supposed to be an NDU we do update some properties of the
> resource.
> For some reason this causes both Master and Slave instances of
Nikhil Utane writes:
> Would like to know the best and easiest way to add a new node to an already
> running cluster.
>
> Our limitation:
> 1) pcsd cannot be used since (as per my understanding) it communicates over
> ssh which is prevented.
> 2) No manual editing of
"Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes:
> - On Jun 7, 2016, at 3:53 PM, Ferenc Wágner wf...@niif.hu wrote:
>
>> "Lentes, Bernd" <bernd.len...@helmholtz-muenchen.de> writes:
>>
>>> Ok. Does DLM takes care that a L
"Lentes, Bernd" writes:
> Ok. Does DLM take care that an LV can only be used on one host?
No. Even plain LVM uses locks to serialize access to its metadata
(to avoid concurrent writes corrupting it). These locks are provided by
the host kernel
"Stephano-Shachter, Dylan" writes:
> I can not figure out why version 4 is not supported.
Have you got fsid=root (or fsid=0) on your root export?
See man exports.
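A sketch of an /etc/exports with an explicit NFSv4 pseudo-root (the paths and extra options are illustrative, not from the original thread):

```
# fsid=0 marks the NFSv4 pseudo-filesystem root; without it,
# version-4 mounts of the tree can be refused
/srv/nfs       *(ro,fsid=0,crossmnt)
/srv/nfs/data  *(rw,no_subtree_check)
```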
--
Feri
Andrey Rogovsky writes:
> I have deb rules, comes from 1.12 and try apply it to current release.
1.1.14 is available in sid, stretch and jessie-backports, any reason you
can't use those packages?
> In the building I get an error:
> dh_testroot -a
> rm -rf
David Teigland writes:
> On Tue, Apr 26, 2016 at 09:57:06PM +0200, Valentin Vidic wrote:
>
>> The bug is caused by the missing braces in the expanded if
>> statement.
>>
>> Do you think we can get a new version out with this patch as the
>> fencing in 4.0.4 does not work
Hi,
Are recurring monitor operations constrained by the batch-limit cluster
option? I ask because I'd like to limit the number of parallel start
and stop operations (because they are resource hungry and potentially
take long) without starving other operations, especially monitors.
--
Thanks,
Ken Gaillot writes:
> Each alert may have any number of recipients configured. These values
> will simply be passed to the script as arguments. The first recipient
> will also be passed as the CRM_alert_recipient environment variable,
> for compatibility with existing
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes:
> Ferenc Wágner <wf...@niif.hu> schrieb am 19.04.2016 um 13:42 in Nachricht
>
>> "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes:
>>
>>> Ferenc Wágner <wf..
"Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> writes:
> Ferenc Wágner <wf...@niif.hu> schrieb am 18.04.2016 um 17:07 in Nachricht
>
>> I'm using the "balanced" placement strategy with good success. It
>> distributes our VM resources accord
Hi,
I'm using the "balanced" placement strategy with good success. It
distributes our VM resources according to memory size perfectly.
However, I'd like to take the NUMA topology into account. That means
each host should have several capacity pools (of each capacity type) to
arrange the
Hi,
On a freshly rebooted cluster node (after crm_mon reports it as
'online'), I get the following:
wferi@vhbl08:~$ sudo crm_resource -r vm-cedar --cleanup
Cleaning up vm-cedar on vhbl03, removing fail-count-vm-cedar
Cleaning up vm-cedar on vhbl04, removing fail-count-vm-cedar
Cleaning up
"Ulrich Windl" writes:
> Actually form my SLES11 SP[1-4] experience, the cluster always
> distributes resources across all available nodes, and only if don't
> want that, I'll have to add constraints. I wonder why that does not
> seem to work for you.
Because
Ken Gaillot <kgail...@redhat.com> writes:
> On 03/30/2016 08:37 PM, Ferenc Wágner wrote:
>
>> I've got a couple of resources (A, B, C, D, ... more than cluster nodes)
>> that I want to spread out to different nodes as much as possible. They
>> are all the same
(Please post only to the list, or at least keep it amongst the Cc-s.)
Momcilo Medic <fedorau...@fedoraproject.org> writes:
> On Wed, Mar 23, 2016 at 1:56 PM, Ferenc Wágner <wf...@niif.hu> wrote:
>> Momcilo Medic <fedorau...@fedoraproject.org> writes:
>>
>&g
Momcilo Medic writes:
> I have three hosts setup in my test environment.
> They each have two connections to the SAN which has GFS2 on it.
>
> Everything works like a charm, except when I reboot a host.
> Once it tries to stop gfs2-utils service it will just hang.
Ken Gaillot writes:
> There is a fence parameter pcmk_host_check that specifies how pacemaker
> determines which fence devices can fence which nodes. The default is
> dynamic-list, which means to run the fence agent's list command to get
> the nodes. [...]
>
> You can
Andrei Borzenkov <arvidj...@gmail.com> writes:
> On Wed, Mar 16, 2016 at 2:22 PM, Ferenc Wágner <wf...@niif.hu> wrote:
>
>> Pacemaker explained says about this cluster option:
>>
>> Advanced Use Only: Should the cluster shoot unseen nodes? Not using
>&g
Hi,
Pacemaker explained says about this cluster option:
Advanced Use Only: Should the cluster shoot unseen nodes? Not using
the default is very unsafe!
1. What are those "unseen" nodes?
And a possibly related question:
2. If I've got UNCLEAN (offline) nodes, is there a way to clean
Andrei Borzenkov <arvidj...@gmail.com> writes:
> On Wed, Mar 16, 2016 at 4:18 PM, Lars Ellenberg <lars.ellenb...@linbit.com>
> wrote:
>
>> On Wed, Mar 16, 2016 at 01:47:52PM +0100, Ferenc Wágner wrote:
>>
>>>>> And some more about fencing:
>>
Hi,
I'm referring here to an ancient LKML thread introducing DLM. In
http://article.gmane.org/gmane.linux.kernel/299788 David Teigland
states:
GFS requires that a failed node be fenced prior to gfs being told to
begin recovery for that node
which sounds very plausible as according to
Ken Gaillot <kgail...@redhat.com> writes:
> On 03/07/2016 02:03 PM, Ferenc Wágner wrote:
>
>> The transition-keys match, does this mean that the above is a late
>> result from the monitor operation which was considered timed-out
>> previously? How did it reach
Andrew Beekhof <abeek...@redhat.com> writes:
> On Tue, Mar 8, 2016 at 7:03 AM, Ferenc Wágner <wf...@niif.hu> wrote:
>
>> Ken Gaillot <kgail...@redhat.com> writes:
>>
>>> On 03/07/2016 07:31 AM, Ferenc Wágner wrote:
>>>
>>>>