date:20161026

[ClusterLabs] Antw: Troubleshooting Faulty Networks / Heartbeat Rings

2016-10-26 Thread Ulrich Windl

>>> Martin Schlegel wrote on 26.10.2016 at 13:55 in message
<1875006565.5761.eadc80df-ed0f-4dcf-bc75-89991bd8c2a1.open-xchange@email.1und1.d>:
> Hello all
>
> On one of our test clusters the network seems to be dropping messages at
> different times of the day - we know it was not a network latency issue. We
> could prove it via iperf - a local network test utility.
>
> However, I wish there were more detailed logs than the retransmit log
> messages we are seeing. Even with debug enabled in Corosync it was next to
> impossible for me to get confirmation from the logs about what is causing it
> and how it affects the heartbeat ring.
>
> How can I track the heartbeat ring in action using time stamps, first to
> understand how it operates in detail and finally to tune its configuration
> parameters and troubleshoot it adequately?
>
> It seems there is little documentation on this topic (besides the source
> code). Could somebody please point me to some useful sources of information?

The best thing I ever found was corosync-blackbox ;-)
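
Alongside corosync-blackbox, the retransmit messages mentioned in the question can at least be bucketed by time to see when the drops cluster. The following is only a minimal sketch: it assumes each log line starts with a syslog-style timestamp such as "Oct 26 13:55:01" and that retransmits appear as lines containing "Retransmit List:". The function name, the sample lines, and the prefix length are illustrative assumptions and need adjusting to the actual log format.

```python
from collections import Counter


def retransmits_per_minute(lines, marker="Retransmit List:", prefix_len=12):
    """Count lines containing `marker`, grouped by their timestamp prefix.

    prefix_len=12 keeps "Oct 26 13:55" from an assumed syslog-style
    timestamp, i.e. it truncates the time to the minute.
    """
    counts = Counter()
    for line in lines:
        if marker in line:
            counts[line[:prefix_len]] += 1
    return counts


# Made-up sample lines for illustration only; real corosync output may differ.
sample = [
    "Oct 26 13:55:01 [2345] node1 corosync notice  [TOTEM ] Retransmit List: 2f 30",
    "Oct 26 13:55:40 [2345] node1 corosync notice  [TOTEM ] Retransmit List: 31",
    "Oct 26 13:56:02 [2345] node1 corosync notice  [QUORUM] Members[2]: 1 2",
    "Oct 26 13:57:10 [2345] node1 corosync notice  [TOTEM ] Retransmit List: 40",
]

per_minute = retransmits_per_minute(sample)
for minute in sorted(per_minute):
    print(minute, per_minute[minute])
```

To run it against a real log, pass an open file object instead of `sample`; comparing the busy minutes across nodes may help narrow down when the network misbehaves.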

> 
> 
> Regards,
> Martin Schlegel

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

[ClusterLabs] Troubleshooting Faulty Networks / Heartbeat Rings

2016-10-26 Thread Martin Schlegel

Hello all

On one of our test clusters the network seems to be dropping messages at
different times of the day - we know it was not a network latency issue. We
could prove it via iperf - a local network test utility.

However, I wish there were more detailed logs than the retransmit log
messages we are seeing. Even with debug enabled in Corosync it was next to
impossible for me to get confirmation from the logs about what is causing it and
how it affects the heartbeat ring.

How can I track the heartbeat ring in action using time stamps, first to
understand how it operates in detail and finally to tune its configuration
parameters and troubleshoot it adequately?

It seems there is little documentation on this topic (besides the source code).
Could somebody please point me to some useful sources of information?


Regards,
Martin Schlegel


[ClusterLabs] Live migration not working on shutdown

2016-10-26 Thread Rainer Nerb


Hello all,

We are currently testing a two-node cluster with two VMs and live migration
on CentOS 7.2 and Pacemaker 1.1.13-10, with disks on iSCSI targets and
migration via the ssh method.


Live migration works when we issue "pcs resource move ...", "pcs
cluster standby", "pcs cluster stop" and even "systemctl rescue".
The latter only worked after we added the following additional
dependencies to pacemaker.service and left the management of those
services to systemd:


  *   After/Requires=systemd-machined.service
  *   After/Requires=systemd-machine-id-commit.service
  *   After/Requires=remote-fs.target
  *   After/Requires=libvirtd.service
  *   After/Requires=iscsi.service
  *   After/Requires=iscsid.service
  *   After/Requires=sshd.service
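
The dependency list above could be written as a systemd drop-in for pacemaker.service. The following is a minimal sketch that only renders the drop-in text. The helper name `render_dropin` is an illustrative assumption, and the thread does not say how the dependencies were actually added.

```python
# Units taken from the list above.
DEPENDENCIES = [
    "systemd-machined.service",
    "systemd-machine-id-commit.service",
    "remote-fs.target",
    "libvirtd.service",
    "iscsi.service",
    "iscsid.service",
    "sshd.service",
]


def render_dropin(units):
    """Return [Unit] drop-in text with After= and Requires= on `units`."""
    joined = " ".join(units)
    return "[Unit]\nAfter={0}\nRequires={0}\n".format(joined)


dropin_text = render_dropin(DEPENDENCIES)
print(dropin_text)
```

The output could be saved as, for example, /etc/systemd/system/pacemaker.service.d/dependencies.conf (a hypothetical file name), followed by "systemctl daemon-reload". Because systemd stops units in the reverse of their start order, After= also means pacemaker.service is stopped before the listed units at shutdown.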

When shutting down or rebooting, migration fails and not even the
regular shutdown of the VMs succeeds. systemd seems to tear down the
VMs by terminating something they depend on.


Is this a known issue? Did we miss any further dependencies?

Tia
Rainer


---
IT Nerb GmbH
Lessingstraße 8
85098 Großmehring

Phone: +49 700 ITNERBGMBH
Fax: +49 8407 939 284
Email: i...@it-nerb.de
Website: www.it-nerb.de

Managing Director: Rainer Nerb
Commercial Register: HRB 2592
Register Court: Ingolstadt
---

[ClusterLabs] Change permissions from command line

2016-10-26 Thread Auer, Jens

Hi,

Is it possible to change user permissions from the command line? I am currently
changing them via the web interface, but I have to write a manual, and pasting
commands is easier than showing screenshots. I know pcs can edit ACL
settings, but I don't know how to change permissions.

I found the permissions in /var/lib/pcsd/pcs_settings.conf. Is it OK to edit
this file while the cluster is stopped and then just restart the cluster?

Best wishes,
  Jens

--
Jens Auer | CGI | Software-Engineer
CGI (Germany) GmbH & Co. KG
Rheinstraße 95 | 64295 Darmstadt | Germany
T: +49 6151 36860 154
jens.a...@cgi.com
Our mandatory disclosures pursuant to § 35a GmbHG / §§ 161, 125a HGB can be
found at de.cgi.com/pflichtangaben.


Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-26 Thread renayama19661014

Hi Klaus,
Hi Jan,
Hi All,

Our members discussed the watchdog approach that uses the WD service.

1) The WD service will not be removed.
2) With pacemaker_remote, it can be used by running corosync bound to localhost.
3) Contention for the watchdog device needs to be considered.
4) Because I am considering the case where sbd is not used, I do not plan to
add an interface similar to the corosync API to sbd for the moment.

Users will choose between the method using sbd and the method using the WD
service. Having two methods may cause confusion, but there is value for
users who do not use sbd.

We want to include the WD-service-based watchdog in Pacemaker.
I intend to prepare an official patch.

What do you think?

Best Regards,
Hideo Yamauchi.



- Original Message -
> From: "renayama19661...@ybb.ne.jp"
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Cc:
> Date: 2016/10/20, Thu 19:08
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is
> frozen, cluster decisions are delayed infinitely
>
> Hi Klaus,
> Hi Jan,
>
> Thank you for your comments.
>
> I will wait a little longer for further comments.
> We will discuss this matter next week.
>
> Best Regards,
> Hideo Yamauchi.
>
>
> - Original Message -
>>  From: Jan Friesse
>>  To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source
>>  clustering welcomed
>>  Cc:
>>  Date: 2016/10/20, Thu 15:46
>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd
>>  is frozen, cluster decisions are delayed infinitely
>>
>>> 
>>>   On 10/14/2016 11:21 AM, renayama19661...@ybb.ne.jp wrote:
>>>>  Hi Klaus,
>>>>  Hi All,
>>>>
>>>>  I tried a prototype of the watchdog using the WD service.
>>>>    - https://github.com/HideoYamauchi/pacemaker/commit/3ee97b76e0212b1790226864dfcacd1a327dbcc9
>>>>
>>>>  Please comment.
>>>   Thank you Hideo for providing the prototype.
>>>   Added the patch to my build and it seems to
>>>   be working as expected.
>>> 
>>>   A few thoughts triggered by this approach:
>>> 
>>>   - we have to alert the corosync-people as in
>>>      a chat with Jan Friesse he pointed me to the
>>>      fact that for corosync 3.x the wd-service was
>>>      planned to be removed
>> 
>>  Actually I didn't express myself correctly. What I wanted to say was
>>  "I'm considering the idea of removing it", simply because it's disabled
>>  downstream.
>>
>>  BUT keep in mind that removing functionality means asking the community
>>  whether somebody is actively using it.
>>
>>  And because there are active users and a future use case, removing wd is
>>  not an option.
>> 
>> 
>>> 
>>>      especially delicate as the binding is very loose
>>>      so that - as is - it builds against a corosync with
>>>      disabled wd-service without any complaints...
>>> 
>>>   - as of now if you enable wd-service in the
>>>      corosync-build it is on by default and would
>>>      be hogging the watchdog presumably
>>>      (there is obviously a pull request that makes
>>>      it default to off)
>>> 
>>>   - with my thoughts about adding an API to
>>>      sbd previously in the thread I was trying to
>>>      target closer observation of pacemaker_remoted
>>>      as well (remote-nodes don't have corosync
>>>      running)
>>> 
>>>      I guess it would be possible to run corosync
>>>      with a static config as single-node cluster
>>>      bound to localhost for that purpose.
>>> 
>>>      I read the thread about corosync-remote and
>>>      that happening might make the special-handling
>>>      for pacemaker-remote obsolete anyway ...
>>> 
>>>   - to enable the approach to live alongside
>>>      sbd it would be possible to make sbd use
>>>      the corosync-API as well for watchdog purposes
>>>      instead of opening the watchdog directly
>>> 
>>>      This shouldn't be a big deal for sbd used to
>>>      observe a pacemaker-node as cluster-watcher
>>>      (the part of sbd that sends cpg-pings to corosync)
>>>      already builds against corosync.
>>>      The blockdevice-part of sbd being basically
>>>      generic it might be an issue though.
>>> 
>>>   Regards,
>>>   Klaus
>>> 
 
 
>>>>  Best Regards,
>>>>  Hideo Yamauchi.
>>>>
>>>>
>>>>  - Original Message -
>>>>>  From: "renayama19661...@ybb.ne.jp"
>>>>>  To: "users@clusterlabs.org"
>>>>>  Cc:
>>>>>  Date: 2016/10/11, Tue 17:58
>>>>>  Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the
>>>>>  DC crmd is frozen, cluster decisions are delayed infinitely
>>>>>
>>>>>  Hi Klaus,
>>>>>
>>>>>  Thank you for your comment.
>>>>>
>>>>>  I am preparing a prototype patch that uses the WD service.
>>>>>
>>>>>  Please wait a little.
>>>>>
>>>>>  Best Regards,
>>>>>  Hideo Yamauchi.
>>>>>
>>>>>
>>>>>  - Original Message -
>>>>>>  From: Klaus Wenninger
>>>>>>  To: users@clusterlabs.org
>>>>>>  Cc:
>>>>>>  Date: 2016/10/10, Mon 21:03
