Re: [ClusterLabs] [ClusterLabs Developers] Resource Agent language discussion

2015-08-10 Thread Andrew Beekhof

 On 8 Aug 2015, at 1:14 am, Jehan-Guillaume de Rorthais j...@dalibo.com 
 wrote:
 
 Hi Jan,
 
 On Fri, 7 Aug 2015 15:36:57 +0200
 Jan Pokorný jpoko...@redhat.com wrote:
 
 On 07/08/15 12:09 +0200, Jehan-Guillaume de Rorthais wrote:
 Now, I would like to discuss the language used to write an RA in
 Pacemaker. I have never seen a discussion or page about this so far.
 
 it wasn't in such a heretical :) tone, but a few months back I tried to
 show that it is extremely hard (if not impossible in some instances) to
 write bullet-proof code in bash (or POSIX shell, for that matter). It is
 cumbersome to move back and forth between whitespace-delimited words
 treated as a single argument and words treated as standalone arguments,
 and that is compounded by the madness of quoting being desired in some
 places and counterproductive in others (what if one indeed wants to pass
 quotation marks as legitimate characters within the passed value, etc.):
 
 http://clusterlabs.org/pipermail/users/2015-May/000403.html
 (even on developers list, but with fewer replies and broken threading:
 http://clusterlabs.org/pipermail/developers/2015-May/23.html).
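
To make the word-splitting problem concrete, here is a minimal sketch. It uses Python's `shlex` module as a rough stand-in for POSIX shell word splitting; the command name `myagent` and its `--greeting` option are hypothetical and only serve the illustration.

```python
import shlex

# A value that legitimately contains spaces and quotation marks.
value = 'say "hello world"'

# Naive interpolation: the string is re-split on whitespace and the
# quotation marks are consumed as shell syntax instead of being data.
naive = shlex.split("myagent --greeting " + value)

# Quoting the value before interpolation keeps it as one argument,
# with its quotation marks intact.
quoted = shlex.split("myagent --greeting " + shlex.quote(value))

print(naive)
print(quoted)
```

In the naive case the value arrives as two arguments and loses its quotation marks; in the quoted case it arrives unchanged as a single argument. Passing an argument list directly, without going through a shell string at all, avoids the problem entirely.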
 
 Thanks for the links and history. You add some more arguments to my points :)
 
 HINT: I don't want to discuss (nor troll about) what is the best
 language. I would like to know why **ALL** the RAs are written in
 bash
 
 I would expect the original influence was the init scripts (as RAs
 are mostly just enriched variants to support more flexible
 configuration and better diagnostics back to the cluster stack),
 which in turn were born having simplicity and ease of debugging
 (maintainability) in mind.
 
 That sounds legitimate. And bash is still appropriate for some simple RAs.
 
 But for the same reasons of ease of debugging and maintainability (and many
 others), complex RAs shouldn't use shell as their language.

You can and should use whatever language you like for your own private RAs.
But if you want it accepted and maintained by the resource-agents project, you 
would be advised to use the language they have standardised on.

As always, the people doing the work get to make the rules.

 
 and if there are traps (hidden deep in ocf-shellfuncs, for instance)
 to avoid when using a different language. And is it acceptable to
 include new libs for other languages?
 
 https://github.com/ClusterLabs/resource-agents/blob/v3.9.6/doc/dev-guides/ra-dev-guide.txt#L33
 doesn't make any assumption about the target language besides stating
 what's a common one.
 
 Yes, I know that page. But this dev guide focuses on shell and makes some
 assumptions about ocf-shellfuncs.
 
 I'll take the same example as in my previous message: there's nothing
 about best practices for logging. In the Script variables section, some
 come from the environment, others from ocf-shellfuncs.
 
 We rewrote the RA in perl, mostly because of me. I was bored with bash/sh
 limitations AND syntax AND useless code complexity for some easy tasks AND
 traps (return code etc). In my opinion, bash/sh are fine if your RA code is
 short and simple, which was mostly the case back in the time of heartbeat,
 which was stateless only. But it became a nightmare with multi-state agents
 struggling with complex code to fit Pacemaker's behavior. Have a look
 at the mysql or pgsql agents.
 
 Moreover, with bash, I had some weird behaviors (timeouts) from the RA
 between runuser/su/sudo and systemd/pamd some months ago. The three of them
 have system implications or side effects deep in the system you need to
 take care of. Using a language able to seteuid/setuid after forking is
 much more natural and clean to drop root privileges and start the daemon
 (PostgreSQL refuses to start as root and is not able to drop its privileges
 to another system user itself).
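
As an illustration of the fork-then-setuid approach described above, here is a minimal sketch in Python. The helper name `run_as` is hypothetical, and this is not the code of the agent discussed in this thread; it only shows the order of operations, assuming the caller is root and passes the numeric uid and gid of the target account.

```python
import os

def run_as(uid, gid, argv):
    """Fork, drop privileges in the child if needed, exec argv.

    Returns the exit code of the command (negative signal number if the
    command was killed by a signal).
    """
    pid = os.fork()
    if pid == 0:  # child process
        try:
            if (os.getuid(), os.getgid()) != (uid, gid):
                # Order matters: supplementary groups and the gid must be
                # changed while still root; setuid() has to come last.
                os.setgroups([gid])
                os.setgid(gid)
                os.setuid(uid)
            os.execvp(argv[0], argv)
        finally:
            # Reached only if dropping privileges or exec failed.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

The uid and gid could come from `pwd.getpwnam()` for whichever account runs the daemon. The parent stays root and simply waits, so no runuser, su or sudo process sits between the agent and the daemon.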
 
 Another disadvantage of shell scripts is that frequently many processes
 are spawned for simple changes within the filesystem and for string
 parsing/reformatting, which in turn creates a dependency on plenty
 of external executables.
 
 True. Either you need to pipe multiple small programs, forking all of them
 (cat|grep|cut|...), sometimes with different behavior depending on the
 system, or use a complex one most people don't want to hear about anymore
 (sed, awk, perl, ...).
 In the latter case, you not only have to master bash, but other languages as
 well.
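
For comparison, a minimal sketch of the same kind of extraction done in-process, without forking anything. The function name `field_for_key` and the sample text are hypothetical; the function roughly mirrors `grep "^key" | cut -d: -f2`, except that it also strips surrounding whitespace and stops at the first match.

```python
def field_for_key(text, key, sep=":", index=1):
    """Return one field of the first line starting with key, or None.

    index is zero-based, so index=1 corresponds to cut's -f2.
    """
    for line in text.splitlines():
        if line.startswith(key):
            parts = line.split(sep)
            if len(parts) > index:
                return parts[index].strip()
    return None

# Hypothetical sample input, for illustration only.
sample = "name: pgsql\nstate: master\n"
print(field_for_key(sample, "state"))
```

No external executable is involved, so the behavior does not depend on which grep or cut the system ships.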
 
 We are far from having enterprise-class certified code (our RA passed its
 very first tests only yesterday), but here is some quick feedback. The
 downside of picking a language other than bash/sh is that there is no OCF
 module/library available for it. This is quite inconvenient when you need
 system-specific variables or logging shortcuts only defined in
 ocf-shellfuncs (and, I would guess, patched by packagers?).
 
 For instance, I had to capture the values of $HA_SBIN_DIR and $HA_RSCTMP from
 my perl code.
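
One way such values could be captured from a non-shell agent is sketched below in Python: source the shell file in a child `sh` and print the variables of interest. The helper name `capture_shell_vars` is hypothetical, and this is not necessarily how the perl agent mentioned above does it. It assumes an absolute path to the file and values that do not contain newlines.

```python
import re
import subprocess

def capture_shell_vars(funcs_path, names):
    """Source funcs_path in a child sh and return {name: value}.

    funcs_path should be an absolute path; values must not contain newlines.
    """
    for name in names:
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError("not a valid shell variable name: %r" % name)
    # Source the file quietly, then print each variable on its own line.
    script = '. "$1" >/dev/null 2>&1; ' + "; ".join(
        'printf "%s\\n" "${' + name + ':-}"' for name in names
    )
    out = subprocess.run(
        ["sh", "-c", script, "sh", funcs_path],
        check=True, capture_output=True, text=True,
    ).stdout
    values = out.split("\n")[: len(names)]
    return dict(zip(names, values))
```

It would be called with the location of ocf-shellfuncs on the target system, for example `capture_shell_vars(path, ["HA_SBIN_DIR", "HA_RSCTMP"])`. The exact path, and any environment variables the file expects to be set before it is sourced, depend on the distribution and are left as assumptions here.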
 
 There could be a shell wrapper that would put these values into the
 environment and then execute the target itself, for its disposal
 (generic 

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: Re: Antw: pacemaker doesn't correctly handle a resource after time/date change

2015-08-10 Thread Kostiantyn Ponomarenko
  I still fail to see the use case for setting the time backwards while
the cluster is up and running.
Essentially, the issue can appear if the NTP server is not reachable at boot
time, before the cluster is started (automatically), and a node for some
reason has the wrong time (say, 1 hour ahead).
Then, once NTP becomes reachable and the time is corrected, the bug appears.

Thank you,
Kostya

On Mon, Aug 10, 2015 at 9:13 AM, Ulrich Windl 
ulrich.wi...@rz.uni-regensburg.de wrote:

  Kostiantyn Ponomarenko konstantin.ponomare...@gmail.com wrote on
 07.08.2015 at 16:43 in message
 caenth0do6w8_extevpus6ehobke7liktppraqpyark7fwou...@mail.gmail.com:
  Hi Andrew,
 
  So the issue is:
 
  Having one node up and running, set the time on the node backward by, say,
  15 min (generally more than 10 min), then stop a resource.
  This leads to the following: the cluster fails the resource once, then
  shows it as started, but the resource actually remains stopped.

 I guess it's due to chronology saying the resource had been stopped 15
 minutes before it was started. I still fail to see the use case for setting
 the time backwards while the cluster is up and running. Every cluster I
 know has the requirement of synchronized time. Synchronized time, in turn,
 implies that the time doesn't go backwards.

 Some databases use sequence numbers instead of time stamps, but it
 wouldn't surprise me if there were some US patent on that ;-)
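
To illustrate the guess above, here is a minimal sketch of how ordering operations by wall-clock time can go wrong when the clock is set back, and how a sequence number avoids it. This is not Pacemaker's actual logic; the `Op` record and the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Op:
    seq: int          # counter incremented for every recorded operation
    wall_time: float  # wall-clock timestamp (seconds) when it was recorded
    name: str

# "start" is recorded at t=1000. The clock is then set back 15 minutes
# (900 s), and "stop" is recorded one real minute after the start.
history = [
    Op(seq=1, wall_time=1000.0, name="start"),
    Op(seq=2, wall_time=1000.0 + 60 - 900, name="stop"),
]

latest_by_time = max(history, key=lambda op: op.wall_time).name
latest_by_seq = max(history, key=lambda op: op.seq).name
print(latest_by_time, latest_by_seq)
```

Ordered by timestamp, the most recent operation appears to be the start, which matches the reported symptom of a resource shown as started while it is actually stopped. Ordered by sequence number, the stop is correctly seen as the latest operation.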

 
  Do you need more input from me on the issue?
 
  Thank you,
  Kostya
 
  On Wed, Aug 5, 2015 at 3:01 AM, Andrew Beekhof and...@beekhof.net
 wrote:
 
 
   On 4 Aug 2015, at 7:31 pm, Kostiantyn Ponomarenko 
  konstantin.ponomare...@gmail.com wrote:
  
  
   On Tue, Aug 4, 2015 at 3:57 AM, Andrew Beekhof and...@beekhof.net
  wrote:
   Github might be another.
  
   I am not able to open an issue/bug here
  https://github.com/ClusterLabs/pacemaker
 
  Oh, for pacemaker bugs see http://clusterlabs.org/help.html
  Can someone clearly state what the issue is?  The thread was quite
  fractured and hard to follow.
 
  
   Thank you,
   Kostya
   ___
   Users mailing list: Users@clusterlabs.org
   http://clusterlabs.org/mailman/listinfo/users
  
   Project Home: http://www.clusterlabs.org
   Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
   Bugs: http://bugs.clusterlabs.org
 
 


[ClusterLabs] Memory leak in crm_mon ?

2015-08-10 Thread Attila Megyeri
Hi!

We are building a new cluster on top of pacemaker/corosync and several times 
during the past days we noticed that crm_mon -Af used up all the memory+swap 
and caused high CPU usage. Killing the process solves the issue.

We are using the binary package versions available in the latest Ubuntu Trusty,
namely:

crmsh                1.2.5+hg1034-1ubuntu4
pacemaker            1.1.10+git20130802-1ubuntu2.3
pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
corosync             2.3.3-1ubuntu1

Kernel is 3.13.0-46-generic

Looking back at some atop data, the CPU went to 100% many times during the
last couple of days, at various times, more often at exactly midnight (strange).

08.05 14:00
08.06 21:41
08.07 00:00
08.07 00:00
08.08 00:00
08.09 06:27

I checked the corosync log and syslog, but did not find any correlation between
the entries in the logs around the specific times.
For most of the time, the node running the crm_mon was the DC as well - not 
running any resources (e.g. a pairless node for quorum).


We have another running system where everything works perfectly, even though it
is almost the same:

crmsh                1.2.5+hg1034-1ubuntu4
pacemaker            1.1.10+git20130802-1ubuntu2.1
pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
corosync             2.3.3-1ubuntu1

Kernel is 3.13.0-8-generic


Is this perhaps a known issue? Any hints?

Thanks!