[ClusterLabs] did anyone manage to combine ClusterMon RA with HP systems insight manager ?

2017-07-11 Thread Lentes, Bernd
Hi,

I established a two-node cluster and would now like to start a test period with 
some not very important resources.
I'd like to monitor the cluster via SNMP, so I notice when it is, e.g., 
migrating resources.
I followed 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#s-notification-snmp .
Configuration of the RA went fine, and the cluster is sending traps. As our 
central management station we use HP Systems Insight Manager (SIM), because we 
have some HP servers and SIM can monitor the hardware quite well.
I tried to integrate the respective MIB into SIM. Compiling and adding it 
seemed to work fine, but SIM does not relate the traps to the MIB.
The traps arrive just fine; I checked with tcpdump and wireshark. For SIM the 
traps are "unregistered", meaning it does not relate them to any MIB.
I also tried the most recent MIB from 
https://github.com/ClusterLabs/pacemaker/blob/master/extra/PCMK-MIB.txt . Same 
problem. I also tried the one from https://github.com/sys4/pacemaker-snmp 
(combined with the respective Perl script), but it also did not work.
I'd like to stay with SIM, because of our HP hardware. And maintaining a second 
system, e.g. Nagios, just for the cluster ... is that really necessary?
I have already read a lot about SNMP and MIBs, and what I've learned so far is 
that SNMP may be "simple", but it is not trivial. The same goes for MIBs.
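One cheap way to narrow down where the mismatch is, before touching SIM again, is to compare the trap OIDs captured with tcpdump/wireshark against the subtree the compiled MIB actually describes. A minimal sketch (the subtree below is assumed from PCMK-MIB.txt; verify the root OID in your copy of the MIB, and the example OIDs are illustrative):

```shell
# Check whether a trap OID seen on the wire falls under the enterprise
# subtree that PCMK-MIB is assumed to describe. If it does not, no amount
# of MIB compiling on the SIM side can register the trap.
PACEMAKER_SUBTREE=".1.3.6.1.4.1.32723"

is_pacemaker_trap() {
    case "$1" in
        "$PACEMAKER_SUBTREE"*) echo "yes" ;;
        *)                     echo "no"  ;;
    esac
}

is_pacemaker_trap ".1.3.6.1.4.1.32723.1.1"   # illustrative pacemaker trap OID
is_pacemaker_trap ".1.3.6.1.4.1.232.0.1"     # illustrative foreign (HP) OID
```

If the captured traps turn out to sit under a different subtree than the MIB compiled into SIM, that would explain why SIM files them as "unregistered".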

Did anyone combine these two successfully?

We use SLES 11 SP4 and pacemaker 1.1.12.


Bernd

-- 
Bernd Lentes 

Systemadministration 
institute of developmental genetics 
Gebäude 35.34 - Raum 208 
HelmholtzZentrum München 
bernd.len...@helmholtz-muenchen.de 
phone: +49 (0)89 3187 1241 
fax: +49 (0)89 3187 2294 

no backup - no mercy
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antwort: Antw: Antwort: Re: reboot node / cluster standby

2017-07-11 Thread Ken Gaillot
On 07/11/2017 07:34 AM, philipp.achmuel...@arz.at wrote:
> 
> 
> Mit freundlichen Grüßen / best regards
> *
> Dipl.-Ing. (FH) Philipp Achmüller*
> *
> ARZ Allgemeines Rechenzentrum GmbH*
> UNIX Systems
> 
> A-6020 Innsbruck, Tschamlerstraße 2
> Tel: +43 / (0)50 4009-1917 _
> __philipp.achmuel...@arz.at _ _
> __http://www.arz.at_ 
> Landes- als Handelsgericht Innsbruck, FN 38653v
> DVR: 0419427
> _
> _
> 
> 
> 
> "Ulrich Windl"  wrote on 06.07.2017
> 09:24:12:
> 
>> From: "Ulrich Windl" 
>> To: 
>> Date: 06.07.2017 09:28
>> Subject: [ClusterLabs] Antw:  Antwort: Re:  reboot node / cluster standby
>>
>> >>>  wrote on 03.07.2017 at 15:30 in
>> message:
>> > Ken Gaillot  wrote on 29.06.2017 21:15:59:
>> >
>> >> From: Ken Gaillot 
>> >> To: Ludovic Vaugeois-Pepin , Cluster Labs - All
>> >> topics related to open-source clustering welcomed
>> > 
>> >> Date: 29.06.2017 21:19
>> >> Subject: Re: [ClusterLabs] reboot node / cluster standby
>> >>
>> >> On 06/29/2017 01:38 PM, Ludovic Vaugeois-Pepin wrote:
>> >> > On Thu, Jun 29, 2017 at 7:27 PM, Ken Gaillot 
>> > wrote:
>> >> >> On 06/29/2017 04:42 AM, philipp.achmuel...@arz.at wrote:
>> >> >>> Hi,
>> >> >>>
>> >> >>> In order to reboot a cluster node I would like to set the node to
>> >> >>> standby first, so a clean takeover of running resources can take
>> >> >>> place. Is there a default way I can set in pacemaker, or do I have
>> >> >>> to set up my own systemd implementation?
>> >> >>>
>> >> >>> thank you!
>> >> >>> regards
>> >> >>> 
>> >> >>> env:
>> >> >>> Pacemaker 1.1.15
>> >> >>> SLES 12.2
>> >> >>
>> >> >> If a node cleanly shuts down or reboots, pacemaker will move all
>> >> >> resources off it before it exits, so that should happen as you're
>> >> >> describing, without needing an explicit standby.
>> >> >
>> >
>> > how does this work when evacuating e.g. 5 nodes out of a 10 node
>> > cluster at the same time?
>>
>> What is the command to do that? If doing it sequentially, I'd
>> wait until the DC returns to the IDLE state before starting the next
>> command. One rule of clustering is "be patient!" ;-)
>> [...]
> 
> On other cluster software I used the standby function to free several
> nodes from resources in parallel and issued a distributed shutdown from
> my jumphost afterwards. When the resource move is initiated during
> server shutdown, I think I have to do it sequentially - or can pacemaker
> handle shutdown commands from several nodes in parallel?
> 
>>
>> Regards,
>> Ulrich

Pacemaker can shut down any number of nodes in parallel. Of course, if
there is some time between each shutdown, there may be unnecessary
resource migrations, as resources move to a node that is itself about to
be shut down and so have to move again. If the nodes are shut down within
a few seconds of each other, Pacemaker will only have to move (or stop)
resources once.
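A sketch of what the parallel evacuation could look like in practice (node names are placeholders; `crm node standby` and `crm_resource --wait` are assumed to be available in this pacemaker/crmsh version). The plan is generated as text first, so it can be reviewed before being piped to a shell on the jumphost:

```shell
# Generate an evacuation plan: put all nodes into standby first (so
# resources move in one transition), wait for the cluster to settle,
# then power the nodes off in parallel.
evacuation_plan() {
    for n in "$@"; do
        echo "crm node standby $n"
    done
    echo "crm_resource --wait"          # blocks until the DC is idle again
    for n in "$@"; do
        echo "ssh root@$n systemctl poweroff &"
    done
    echo "wait"
}

evacuation_plan node1 node2 node3 node4 node5
```

Review the printed plan, then run `evacuation_plan ... | sh` from the jumphost; afterwards `crm node online <node>` brings each node back into service.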



[ClusterLabs] Antwort: Re: PCSD - Change Port 2224

2017-07-11 Thread philipp . achmueller
Tomas Jelinek  wrote on 10.07.2017 14:49:45:

> From: Tomas Jelinek 
> To: users@clusterlabs.org
> Date: 10.07.2017 14:54
> Subject: Re: [ClusterLabs] PCSD - Change Port 2224
> 
> On 6.7.2017 at 10:47, philipp.achmuel...@arz.at wrote:
> > Hi,
> >
> > I would like to change the default port for web access - currently this is
> > "hardcoded" to 2224. Are there any plans to move this into a config file
> > so it could be changed more easily?
> 
> Hi,
> 
> Yes, we plan to make the port configurable. The feature request is 
> tracked here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1415197

thank you!
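For anyone reading this later: once that feature request lands, the port is expected to be read from the pcsd environment file. A sketch, assuming a pcs version with `PCSD_PORT` support (check your pcsd documentation before relying on it):

```shell
# /etc/sysconfig/pcsd (on Debian-based systems: /etc/default/pcsd)
# PCSD_PORT is assumed to be honored by your pcsd version.
PCSD_PORT=8443
```

Then restart the daemon, e.g. `systemctl restart pcsd`.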

> 
> Regards,
> Tomas
> 
> >
> > thank you!
> > regards
> > Philipp
> 



[ClusterLabs] Antwort: Antw: Antwort: Re: reboot node / cluster standby

2017-07-11 Thread philipp . achmueller
Mit freundlichen Grüßen / best regards 

Dipl.-Ing. (FH) Philipp Achmüller

ARZ Allgemeines Rechenzentrum GmbH
UNIX Systems

A-6020 Innsbruck, Tschamlerstraße 2 
Tel: +43 / (0)50 4009-1917 
philipp.achmuel...@arz.at 
http://www.arz.at 
Landes- als Handelsgericht Innsbruck, FN 38653v 
DVR: 0419427




"Ulrich Windl"  wrote on 06.07.2017 
09:24:12:

> From: "Ulrich Windl" 
> To: 
> Date: 06.07.2017 09:28
> Subject: [ClusterLabs] Antw:  Antwort: Re:  reboot node / cluster 
standby
> 
> >>>  wrote on 03.07.2017 at 15:30 in 
message
> :
> > Ken Gaillot  wrote on 29.06.2017 21:15:59:
> > 
> >> From: Ken Gaillot 
> >> To: Ludovic Vaugeois-Pepin , Cluster Labs - All
> >> topics related to open-source clustering welcomed 
> > 
> >> Date: 29.06.2017 21:19
> >> Subject: Re: [ClusterLabs] reboot node / cluster standby
> >> 
> >> On 06/29/2017 01:38 PM, Ludovic Vaugeois-Pepin wrote:
> >> > On Thu, Jun 29, 2017 at 7:27 PM, Ken Gaillot  
> > wrote:
> >> >> On 06/29/2017 04:42 AM, philipp.achmuel...@arz.at wrote:
> >> >>> Hi,
> >> >>>
> >> >>> In order to reboot a cluster node I would like to set the node to
> >> >>> standby first, so a clean takeover of running resources can take
> >> >>> place. Is there a default way I can set in pacemaker, or do I have
> >> >>> to set up my own systemd implementation?
> >> >>>
> >> >>> thank you!
> >> >>> regards
> >> >>> 
> >> >>> env:
> >> >>> Pacemaker 1.1.15
> >> >>> SLES 12.2
> >> >>
> >> >> If a node cleanly shuts down or reboots, pacemaker will move all
> >> >> resources off it before it exits, so that should happen as you're
> >> >> describing, without needing an explicit standby.
> >> > 
> > 
> > how does this work when evacuating e.g. 5 nodes out of a 10 node
> > cluster at the same time?
> 
> What is the command to do that? If doing it sequentially, I'd
> wait until the DC returns to the IDLE state before starting the next
> command. One rule of clustering is "be patient!" ;-)
> [...]

On other cluster software I used the standby function to free several 
nodes from resources in parallel and issued a distributed shutdown from my 
jumphost afterwards. When the resource move is initiated during server 
shutdown, I think I have to do it sequentially - or can pacemaker handle 
shutdown commands from several nodes in parallel?

> 
> Regards,
> Ulrich
> 
> 
> 



Re: [ClusterLabs] fence_vbox Unable to connect/login to fencing device

2017-07-11 Thread ArekW
Adding login_timeout=30 solved the stonith problem. Thank you very much!

Pozdrawiam,
Arek

2017-07-11 13:06 GMT+02:00 Marek Grac :

> Hi,
>
> On Tue, Jul 11, 2017 at 11:13 AM, ArekW  wrote:
>
>> Hi,
>> I may be wrong, but it doesn't seem to be a timeout problem, because the log
>> repeats the same way every few minutes and it contains "Unable to connect"
>> followed just after by the list of VMs etc., so it has connected
>> successfully.
>>
>
> After an unsuccessful monitor attempt, your settings may trigger another
> attempt. In some cases the second ssh connection can be much faster, so the
> second attempt succeeds more often.
>
>
>> I described an active-active failover problem in a separate mail. When a
>> node is powered off, the cluster enters UNCLEAN status and the whole thing
>> hangs. Could it be related to the stonith problem? I'm out of ideas about
>> what is wrong, because it seems to work manually but not as a fence process.
>> How can I increase the login_timeout (is it for stonith?)
>>
>
> add login_timeout=XXs (or look at manual pages for other timeout options)
>
> m,
>
>
>> Thanks
>> Arek
>>
>> 2017-07-10 13:10 GMT+02:00 Marek Grac :
>>
>>>
>>>
>>> On Fri, Jul 7, 2017 at 1:45 PM, ArekW  wrote:
>>>
 The reason for --force is:
 Error: missing required option(s): 'ipaddr, login, plug' for resource
 type: stonith:fence_vbox (use --force to override)

>>>
>>> It looks like you are using an unreleased upstream version of fence-agents
>>> without a similarly new version of pcs (one that includes commit
>>> 7f85340b7aa4e8c016720012cf42c304e68dd1fe)
>>>
>>>

 I have selinux disabled on both nodes:
 [root@nfsnode1 ~]# cat /etc/sysconfig/selinux
 SELINUX=disabled

 pcs stonith update vbox-fencing verbose=true
 Error: resource option(s): 'verbose', are not recognized for resource
 type: 'stonith::fence_vbox' (use --force to override)

>>>
>>> It should be fixed in commit b47558331ba6615aa5720484301d644cc8e973fd
>>> (Jun 12)
>>>
>>>


>>>

 Jul  7 13:37:49 nfsnode1 fence_vbox: Unable to connect/login to fencing
 device
 Jul  7 13:37:49 nfsnode1 stonith-ng[2045]: warning: fence_vbox[4765]
 stderr: [ Running command: /usr/bin/ssh -4  AW23321@10.0.2.2 -i
 /root/.ssh/id_rsa -p 22 -t '/bin/bash -c "PS1=\\[EXPECT\\]#\  /bin/bash
 --noprofile --norc"' ]

>>>
>>> OK, so sometimes it works and sometimes not. It looks like our timeouts
>>> are quite strict for your environment. Try increasing login_timeout above
>>> the default of 30s.
>>>
>>> m,
>>>


Re: [ClusterLabs] fence_vbox Unable to connect/login to fencing device

2017-07-11 Thread Marek Grac
Hi,

On Tue, Jul 11, 2017 at 11:13 AM, ArekW  wrote:

> Hi,
> I may be wrong, but it doesn't seem to be a timeout problem, because the log
> repeats the same way every few minutes and it contains "Unable to connect"
> followed just after by the list of VMs etc., so it has connected
> successfully.
>

After an unsuccessful monitor attempt, your settings may trigger another
attempt. In some cases the second ssh connection can be much faster, so the
second attempt succeeds more often.


> I described an active-active failover problem in a separate mail. When a node
> is powered off, the cluster enters UNCLEAN status and the whole thing hangs.
> Could it be related to the stonith problem? I'm out of ideas about what is
> wrong, because it seems to work manually but not as a fence process.
> How can I increase the login_timeout (is it for stonith?)
>

add login_timeout=XXs (or look at manual pages for other timeout options)

m,
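A possible pcs invocation for that, assuming the stonith resource is named `vbox-fencing` as earlier in this thread (the value 30 is only an example; pick one suited to your environment):

```shell
# Raise the fence agent's login timeout on the existing stonith resource.
# Resource name and value are assumptions taken from this thread.
pcs stonith update vbox-fencing login_timeout=30
```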


> Thanks
> Arek
>
> 2017-07-10 13:10 GMT+02:00 Marek Grac :
>
>>
>>
>> On Fri, Jul 7, 2017 at 1:45 PM, ArekW  wrote:
>>
>>> The reason for --force is:
>>> Error: missing required option(s): 'ipaddr, login, plug' for resource
>>> type: stonith:fence_vbox (use --force to override)
>>>
>>
>> It looks like you are using an unreleased upstream version of fence-agents
>> without a similarly new version of pcs (one that includes commit
>> 7f85340b7aa4e8c016720012cf42c304e68dd1fe)
>>
>>
>>>
>>> I have selinux disabled on both nodes:
>>> [root@nfsnode1 ~]# cat /etc/sysconfig/selinux
>>> SELINUX=disabled
>>>
>>> pcs stonith update vbox-fencing verbose=true
>>> Error: resource option(s): 'verbose', are not recognized for resource
>>> type: 'stonith::fence_vbox' (use --force to override)
>>>
>>
>> It should be fixed in commit b47558331ba6615aa5720484301d644cc8e973fd (Jun
>> 12)
>>
>>
>>>
>>>
>>
>>>
>>> Jul  7 13:37:49 nfsnode1 fence_vbox: Unable to connect/login to fencing
>>> device
>>> Jul  7 13:37:49 nfsnode1 stonith-ng[2045]: warning: fence_vbox[4765]
>>> stderr: [ Running command: /usr/bin/ssh -4  AW23321@10.0.2.2 -i
>>> /root/.ssh/id_rsa -p 22 -t '/bin/bash -c "PS1=\\[EXPECT\\]#\  /bin/bash
>>> --noprofile --norc"' ]
>>>
>>
>> OK, so sometimes it works and sometimes not. It looks like our timeouts
>> are quite strict for your environment. Try increasing login_timeout above
>> the default of 30s.
>>
>> m,
>>


Re: [ClusterLabs] fence_vbox Unable to connect/login to fencing device

2017-07-11 Thread ArekW
Hi,
I may be wrong, but it doesn't seem to be a timeout problem, because the log
repeats the same way every few minutes and it contains "Unable to connect"
followed just after by the list of VMs etc., so it has connected
successfully.
I described an active-active failover problem in a separate mail. When a node
is powered off, the cluster enters UNCLEAN status and the whole thing hangs.
Could it be related to the stonith problem? I'm out of ideas about what is
wrong, because it seems to work manually but not as a fence process.
How can I increase the login_timeout (is it for stonith?)
Thanks
Arek

2017-07-10 13:10 GMT+02:00 Marek Grac :

>
>
> On Fri, Jul 7, 2017 at 1:45 PM, ArekW  wrote:
>
>> The reason for --force is:
>> Error: missing required option(s): 'ipaddr, login, plug' for resource
>> type: stonith:fence_vbox (use --force to override)
>>
>
> It looks like you are using an unreleased upstream version of fence-agents
> without a similarly new version of pcs (one that includes commit
> 7f85340b7aa4e8c016720012cf42c304e68dd1fe)
>
>
>>
>> I have selinux disabled on both nodes:
>> [root@nfsnode1 ~]# cat /etc/sysconfig/selinux
>> SELINUX=disabled
>>
>> pcs stonith update vbox-fencing verbose=true
>> Error: resource option(s): 'verbose', are not recognized for resource
>> type: 'stonith::fence_vbox' (use --force to override)
>>
>
> It should be fixed in commit b47558331ba6615aa5720484301d644cc8e973fd (Jun
> 12)
>
>
>>
>>
>
>>
>> Jul  7 13:37:49 nfsnode1 fence_vbox: Unable to connect/login to fencing
>> device
>> Jul  7 13:37:49 nfsnode1 stonith-ng[2045]: warning: fence_vbox[4765]
>> stderr: [ Running command: /usr/bin/ssh -4  AW23321@10.0.2.2 -i
>> /root/.ssh/id_rsa -p 22 -t '/bin/bash -c "PS1=\\[EXPECT\\]#\  /bin/bash
>> --noprofile --norc"' ]
>>
>
> OK, so sometimes it works and sometimes not. It looks like our timeouts
> are quite strict for your environment. Try increasing login_timeout above
> the default of 30s.
>
> m,
>