Re: [Linux-ha-dev] interface monitoring

2018-08-01 Thread Alan Robertson
The Linux-HA project has been obsolete since around 2007. You should switch to 
the Pacemaker project. It's capable of doing what you want - and it's not dead 
;-)

On Wed, Aug 1, 2018, at 9:02 AM, Chanandler Bong wrote:
> Hello,
> I have two clusters. The master cluster gives up resources when the
> given ping IP address is unreachable, but what I want to do is release
> resources when a specific network interface is down. How can I do that?
> Thank you.
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/


-- 
  Alan Robertson
  al...@unix.sh
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Looking for a few good queries ;-)

2015-01-08 Thread Alan Robertson
Hi,

If anyone would like to try their hand at it, there are a few Cypher
queries I'd like to have written...

They relate to dependencies...

 1. Compute the set of all services which depend directly or indirectly
upon the given starting service
 2. Same as #1 - but with a server as a starting place
 3. Compute the set of all services upon which the given starting
service depends (inverse of #1)
 4. Same as #3 - but with a server as a starting place

And it would be nice if we could have another four queries which
delivered servers as output instead of services  :-D

These are likely to be pretty complex Cypher queries...

One could even imagine versions of these 4 (or 8) queries where there
was an input that said "follow no more than n levels of indirection" -
or you could just write them that way in the first place...

By the way, I suspect the outputs of these queries probably ought to be
paths...  At least that's what comes to my mind...

The reason I think this is valuable is the GUI.  This is
exactly what we need for displaying a group of related servers/services
in the GUI...


-- Alan Robertson
   al...@unix.sh


___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Encryption progress...

2014-12-08 Thread Alan Robertson
Hi all,

I've drawn a few diagrams to represent my current ideas about how to
manage encryption.  I'll be posting them later this week along with text
explaining how I see this working for another round of criticism.  I
have a talk to give tomorrow, and I'll likely include them in the slide
deck.  It's a technical talk on distributed computing.  Seems
appropriate ;-).

A short summary:
  * Every nanoprobe has its own keypair
  * We will use Trust On First Use (TOFU) for nanoprobes
  * CMA public keys will be distributed with the software
  * We are able to deal with having more than one CMA public key, making
    it easier to eventually deal with compromised CMA keys

The low-level code to support this is written, is in the repository, and it
works(!).  None of the high-level policy stuff is there yet.  The code
that works is basically a ping testing program that deliberately loses
packets to encourage the protocol to recover from lost packets.  This
code turned out to be simpler than I thought it would be.  That's a
rarity, for sure!

-- Alan Robertson
   al...@unix.sh
   
   

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-24 Thread Alan Robertson
On 10/24/2014 03:32 AM, Lars Marowsky-Bree wrote:
 On 2014-10-23T20:36:38, Lars Ellenberg lars.ellenb...@linbit.com wrote:

 If we want to require presence of start-stop-daemon,
 we could make all this somebody elses problem.
 I need to find some time to browse through the code
 to see if it can be improved further.
 But in any case, using (a tool like) start-stop-daemon consistently
 throughout all RAs would improve the situation already.

 Do we want to do that?
 Dejan? David? Anyone?
 I'm showing my age, but Linux FailSafe had such a tool as well. ;-) So
 that might make sense.

 Though in Linux nowadays, I wonder if one might not directly want to add
 container support to the LRM, or directly use systemd. With a container,
 all processes that the RA started would be easily tracked.

Process groups do that nicely.  The LRM (at least used to) put
everything in a process group.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-22 Thread Alan Robertson
On 10/22/2014 03:33 AM, Dejan Muhamedagic wrote:
 Hi Alan,

 On Mon, Oct 20, 2014 at 02:52:13PM -0600, Alan Robertson wrote:
 For the Assimilation code I use the full pathname of the binary from
 /proc to tell if it's one of mine.  That's not perfect if you're using
 an interpreted language.  It works quite well for compiled languages.
 Yes, though not perfect, that may be good enough. I supposed that
 the probability that the very same program gets the same recycled
 pid is rather low. (Or is it?)
From my 'C' code I could touch the lock file to match the timestamp of
the /proc/pid/stat (or /proc/pid/exe) symlink -- and verify that they
match.  If there is no /proc/pid/stat, then you won't get that extra
safeguard.  But as you suggest, it decreases the probability by orders
of magnitude even without the

The /proc/pid/exe symlink appears to have the same timestamp as
/proc/pid/stat

Does anyone know which OSes have either or both of those /proc names?
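
In the meantime, a minimal sketch of the timestamp idea on Linux, assuming
GNU touch/stat (the pidfile path is an example, and it uses the /proc/<pid>
directory itself as the reference; /proc/<pid>/stat or /proc/<pid>/exe would
work the same way if their timestamps are reliable):

    pidfile=/var/run/myservice.pid
    pid=$(head -n1 "$pidfile")

    # At start time: stamp the pidfile with the timestamp of /proc/$pid,
    # which is set when the process is created.
    touch -r "/proc/$pid" "$pidfile"

    # Later, before killing: if the timestamps no longer match, the pid
    # has probably been recycled by an unrelated process.
    if [ "$(stat -c %Y "/proc/$pid" 2>/dev/null)" = "$(stat -c %Y "$pidfile")" ]
    then
        echo "$pid still looks like the original process"
    else
        echo "$pid is gone or has been recycled"
    fi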

-- Alan Robertson
   al...@unix.sh


___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-22 Thread Alan Robertson
On 10/22/2014 07:09 AM, Dejan Muhamedagic wrote:
 On Wed, Oct 22, 2014 at 06:50:37AM -0600, Alan Robertson wrote:
 On 10/22/2014 03:33 AM, Dejan Muhamedagic wrote:
 Hi Alan,

 On Mon, Oct 20, 2014 at 02:52:13PM -0600, Alan Robertson wrote:
 For the Assimilation code I use the full pathname of the binary from
 /proc to tell if it's one of mine.  That's not perfect if you're using
 an interpreted language.  It works quite well for compiled languages.
 Yes, though not perfect, that may be good enough. I supposed that
 the probability that the very same program gets the same recycled
 pid is rather low. (Or is it?)
 From my 'C' code I could touch the lock file to match the timestamp of
 the /proc/pid/stat (or /proc/pid/exe) symlink -- and verify that they
 match.  If there is no /proc/pid/stat, then you won't get that extra
 safeguard.  But as you suggest, it decreases the probability by orders
 of magnitude even without the

 The /proc/pid/exe symlink appears to have the same timestamp as
 /proc/pid/stat
 Hmm, not here:

 $ sudo ls -lt /proc/1
 ...
 lrwxrwxrwx 1 root root 0 Aug 27 13:51 exe -> /sbin/init
 dr-x------ 2 root root 0 Aug 27 13:51 fd
 -r--r--r-- 1 root root 0 Aug 27 13:20 cmdline
 -r--r--r-- 1 root root 0 Aug 27 13:18 stat

 And the process (init) has been running since July:

 $ ps auxw | grep -w [i]nit
 root 1  0.0  0.0  10540   780 ?Ss   Jul07   1:03 init [3]

 Interesting.
And a little worrisome for these strategies...

Here is what I see for timestamps that look to be about the time of
system boot:

-r-------- 1 root root 0 Oct 21 15:42 environ
lrwxrwxrwx 1 root root 0 Oct 21 15:42 root -> /
-r--r--r-- 1 root root 0 Oct 21 15:42 limits
dr-x------ 2 root root 0 Oct 21 15:42 fd
lrwxrwxrwx 1 root root 0 Oct 21 15:42 exe -> /sbin/init
-r--r--r-- 1 root root 0 Oct 21 15:42 stat
-r--r--r-- 1 root root 0 Oct 21 15:42 cgroup
-r--r--r-- 1 root root 0 Oct 21 15:42 cmdline

servidor:/proc/1 $ ls -l /var/log/boot.log
-rw-r--r-- 1 root root 5746 Oct 21 15:42 /var/log/boot.log

servidor:/proc/1 $ ls -ld .
dr-xr-xr-x 9 root root 0 Oct 21 15:42 .

So, you can open file descriptors (fd), change your environment, cmdline,
and (soft) limits.  You can't change your exe or root.  Cgroup
is new, and I suspect you can't change it.  I suspect that the directory
timestamp (/proc/<pid>/) won't change either.

I wonder if it will change on BSD or Solaris or AIX.

/proc info for AIX:
   
http://www-01.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.files/proc.htm
It doesn't say anything about file timestamps.
Solaris info is here:
http://docs.oracle.com/cd/E23824_01/html/821-1473/proc-4.html#scrolltoc
It also doesn't mention timestamps.
FreeBSD is here:
http://www.unix.com/man-page/freebsd/5/procfs/



___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-22 Thread Alan Robertson
On 10/22/2014 07:11 AM, Tim Small wrote:
 On 22/10/14 13:50, Alan Robertson wrote:
 Does anyone know which OSes have either or both of those /proc names?
 Once again, can I recommend taking a look at the start-stop-daemon
 source (see earlier posting), which does this stuff, and includes checks
 for Linux/Hurd/Sun/OpenBSD/FreeBSD/NetBSD/DragonFly, and whilst I've
 only ever used it on Linux, at the very least the BSD side seems to be
 maintained:

 http://anonscm.debian.org/cgit/dpkg/dpkg.git/tree/utils/start-stop-daemon.c
According to how you described it earlier, it didn't seem to solve the
problems described in this thread. At best it does pretty much exactly
what my previously-implemented solution does.

This discussion has been a bit esoteric.  Although my method (and also
start-stop-daemon) is highly unlikely to err, it can make mistakes in
some circumstances.

-- Alan Robertson
   al...@unix.sh
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-21 Thread Alan Robertson

On 10/21/2014 2:29 AM, Lars Ellenberg wrote:

On Mon, Oct 20, 2014 at 11:21:36PM +0200, Lars Ellenberg wrote:

On Mon, Oct 20, 2014 at 03:04:31PM -0600, Alan Robertson wrote:

On 10/20/2014 02:52 PM, Alan Robertson wrote:

For the Assimilation code I use the full pathname of the binary from
/proc to tell if it's one of mine.  That's not perfect if you're using
an interpreted language.  It works quite well for compiled languages.

It works just as well (or as bad) from interpreted languages:
readlink /proc/$pid/exe
(very old linux has a fsid:inode encoding there, but I digress)

But that does solve a different subset of problems,
has race conditions in itself, and breaks if you have updated the binary
since start of that service (which does happen).

Sorry, I lost the original.
Alan then wrote:


It only breaks if you change the *name* of the binary.  Updating the
binary contents has no effect.  Changing the name of the binary is
pretty unusual - or so it seems to me.  Did I miss something?

And if you do, you should stop with the binary with the old version and
start it with the new one.  Very few methods are going to deal well with
radical changes in the service without stopping it with the old script,
updating, and starting with the new script.

Well, the pid starttime method does...


I don't believe I see the race condition.

Does not matter.


It won't loop, and it's not fooled by pid wraparound.  What else are you
looking for? [Guess I missed something else here]

pid + exe is certainly better than the pid alone.
It may even be good enough.

But it still has shortcomings.

/proc/pid/exe is not stable,
(changes to deleted if the binary is deleted)
could be accounted for.

/proc/pid/exe links to the interpreter (python, bash, java, whatever)

Even if it is a real binary, (pid, /proc/pid/exe) is
still NOT unique for pid re-use after wrap around:
think different instances of mysql or whatever.
(yes, it gets increasingly unlikely...)
For most cases, a persistent daemon is a compiled language.  Of course 
not all, but all the ones I personally care about ;-)


However, (pid, starttime) *is* unique (for the lifetime of the pidfile,
as long as that is stored on tmpfs resp. cleared after reboot).
(unless you tell me you can eat through pid_max, or at least the
currently unused pids, within the granularity of starttime...)

So that's why I propose to use (pid, starttime) tuple.

If you see problems with (pid, starttime), please speak up.
If you have something *better*, please speak up.
If you just have something different,
feel free to tell us anyways :-)


The contents of the pidfile are specified by the LSB (or at least they 
were at some time in the past).  That's why I use just the pid.  The 
current version specifies that the first line of a pidfile consists of 
one or more numbers, and any subsequent lines should be ignored.  If you 
go the way you do, I'd suggest other data be put on separate lines.
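
A small sketch of that suggestion, assuming Linux /proc (the function name
write_pidfile is hypothetical; starttime is field 22 of /proc/<pid>/stat,
i.e. field 20 once the leading "pid (comm) " portion is stripped, as in
Lars's proposal):

    write_pidfile() {
        pid=$1 pidfile=$2
        # starttime: field 20 after stripping the "pid (comm) " prefix
        starttime=$(sed -e 's/^.*) //' "/proc/$pid/stat" | cut -d' ' -f 20)
        {
            echo "$pid"         # first line: numbers only, as the LSB expects
            echo "$starttime"   # extra data on its own line, ignored by LSB consumers
        } > "$pidfile"
    }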


You might compare what you're doing to 
http://refspecs.linuxbase.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html


Instead of storing the start time explicitly, you could touch the pid 
file's creation time to match that of the process ;-)  That's harder to 
do in the shell, unfortunately...


-- Alan
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-20 Thread Alan Robertson
For the Assimilation code I use the full pathname of the binary from
/proc to tell if it's one of mine.  That's not perfect if you're using
an interpreted language.  It works quite well for compiled languages.


On 10/20/2014 01:17 PM, Lars Ellenberg wrote:
 Recent discussions with Dejan made me again more prominently aware of a
 few issues we probably all know about, but usually dismiss as having not
 much relevance in the real world.

 The facts:

  * a pidfile typically only stores a pid
  * a pidfile may go stale, i.e. not be properly cleaned up
    when the pid it references died
  * pids are recycled

This is more an issue if kernel.pid_max is small
wrt the number of processes created per unit time,
for example on some embedded systems,
or on some very busy systems.

But it may be an issue on any system,
even a mostly idle one, given bad luck^W timing,
see below.

 A common idiom in resource agents is to

 kill_that_pid_and_wait_until_dead()
 {
   local pid=$1
   is_alive $pid || return 0
   kill -TERM $pid
   while is_alive $pid ; do sleep 1; done
   return 0
 }

 The naïve implementation of is_alive() is
 is_alive() { kill -0 $1 ; }

 This is the main issue:
 ---

 If the last-used-pid is just a bit smaller than $pid,
 during the sleep 1, $pid may die,
 and the OS may already have created a new process with that exact pid.

 Using above is_alive, kill_that_pid() will not notice that the
 to-be-killed pid has actually terminated while that new process runs.
 Which may be a very long time if that is some other long running daemon.

 This may result in stop failure and resulting node level fencing.

 The question is, which better way do we have to detect if some pid died
 after we killed it. Or, related, and even better: how to detect if the
 process currently running with some pid is in fact still the process
 referenced by the pidfile.

 I have two suggestions.

 (I am trying to avoid bashisms in here.
  But maybe I overlook some.
  Also, the code is typed, not sourced from some working script,
  so there may be logic bugs and typos.
  My intent should be obvious enough, though.)

 using cd /proc/$pid; stat .
 -

 # this is most likely linux specific
 kill_that_pid_and_wait_until_dead()
 {
   local pid=$1
   (
   cd /proc/$pid || return 0
   kill -TERM $pid
   while stat . ; do sleep 1; done
   )
   return 0
 }

 Once pid dies, /proc/$pid will become stale (but not completely go away,
 because it is our cwd), and stat . will return "No such process".

 Variants:

 using test -ef
 --

   exec 7< /proc/$pid || return 0
   kill -TERM $pid
   while :; do
   exec 8< /proc/$pid || break
   test /proc/self/fd/7 -ef /proc/self/fd/8 || break
   sleep 1
   done
   exec 7<&- 8<&-

 using stat -c %Y /proc/$pid
 ---

   ctime0=$(stat -c %Y /proc/$pid)
   kill -TERM $pid
   while ctime=$(stat -c %Y /proc/$pid) && [ "$ctime" = "$ctime0" ] ; do sleep 1; done


 Why not use the inode number I hear you say.
 Because it is not stable. Sorry.
 Don't believe me? Don't want to read kernel source?
 Try it yourself:

   sleep 120 & k=$!
   stat /proc/$k
   echo 3 > /proc/sys/vm/drop_caches
   stat /proc/$k

 But that leads me to another proposal:
 store the starttime together with the pid in a pidfile.

 For linux that would be:

 (see proc(5) for /proc/pid/stat field meanings.
  note that (comm) may contain both whitespace and ),
  which is the reason for my sed | cut below)

 spawn_create_exclusive_pid_starttime()
 {
   local pidfile=$1
   shift
   local reset
   case $- in *C*) reset=:;; *) set -C; reset="set +C";; esac
   if ! exec 3>$pidfile ; then
   $reset
   return 1
   fi

   $reset
   setsid sh -c '
   read pid _ < /proc/self/stat
   starttime=$(sed -e "s/^.*) //" /proc/$pid/stat | cut -d" " -f 20)
   >&3 echo $pid $starttime
   3>&- exec "$@"
   ' -- "$@" &
   return 0
 }

 It does not seem possible to cycle through all available pids
 within fractions of time smaller than the granularity of starttime,
 so pid starttime should be a unique tuple (until the next reboot --
 at least on linux, starttime is measured as strictly monotonic uptime).


 If we have pid starttime in the pidfile,
 we can:

 get_proc_pid_starttime()
 {
   proc_pid_starttime=$(sed -e 's/^.*) //' /proc/$pid/stat) || return 1
   proc_pid_starttime=$(echo $proc_pid_starttime | cut -d' ' -f 20)
 }

 kill_using_pidfile()
 {
   local pidfile=$1
   local pid starttime proc_pid_starttime

   test -e $pidfile || return # already dead
   read pid starttime < $pidfile || return # unreadable

   # check pid and starttime are both present, numeric only, ...
   # 

Re: [Linux-ha-dev] RFC: pidfile handling; current worst case: stop failure and node level fencing

2014-10-20 Thread Alan Robertson
On 10/20/2014 03:21 PM, Lars Ellenberg wrote:
 On Mon, Oct 20, 2014 at 03:04:31PM -0600, Alan Robertson wrote:
 On 10/20/2014 02:52 PM, Alan Robertson wrote:
 For the Assimilation code I use the full pathname of the binary from
 /proc to tell if it's one of mine.  That's not perfect if you're using
 an interpreted language.  It works quite well for compiled languages.
 It works just as well (or as bad) from interpreted languages:
 readlink /proc/$pid/exe
 (very old linux has a fsid:inode encoding there, but I digress)

 But that does solve a different subset of problems,
 has race conditions in itself, and breaks if you have updated the binary
 since start of that service (which does happen).

 It does not fully address what I am talking about.
It only breaks if you change the *name* of the binary.  Updating the
binary contents has no effect.  Changing the name of the binary is
pretty unusual - or so it seems to me.  Did I miss something?

And if you do, you should stop with the binary with the old version and
start it with the new one.  Very few methods are going to deal well with
radical changes in the service without stopping it with the old script,
updating, and starting with the new script.

I don't believe I see the race condition.

It won't loop, and it's not fooled by pid wraparound.  What else are you
looking for? [Guess I missed something else here]
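
For reference, a minimal sketch of the pathname check under discussion
(Linux-specific; the pidfile path and daemon path are examples only):

    pid=$(head -n1 /var/run/myservice.pid)
    expected=/usr/sbin/mydaemon

    # readlink fails once the pid is gone; the comparison fails if the pid
    # now belongs to some other program.  Note that the link gains a
    # " (deleted)" suffix if the binary was replaced on disk.
    if [ "$(readlink "/proc/$pid/exe" 2>/dev/null)" = "$expected" ]; then
        echo "pid $pid still runs $expected"
    else
        echo "pid $pid is dead or belongs to something else"
    fi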
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Prototype REST code

2013-11-25 Thread Alan Robertson
Hi folks,

I just added the first bit of prototype REST code to the system.  Its
purpose is to provide an interface for JavaScript UI code yet to be
written ;-).

It uses the Flask library to provide the http support.

There is currently only one interesting URL supported by the code:
doquery/queryname

So, you perform a query by sending an http request to the
doquery/queryname URL.

Parameters to the query are supplied in the usual ?name=value query string format.

Here are a few examples of queries:

http://localhost:5000/doquery/GetAllQueries
retrieve information (metadata) about all valid queries

http://localhost:5000/doquery/GetAQuery?queryname=somequeryname:*
get detailed information about a particular query (metadata)

http://localhost:5000/doquery/ListDrones
lists all known machines and their status (in detail)

http://localhost:5000/doquery/DownDrones
lists all known machines which are currently down for any reason


http://localhost:5000/doquery/CrashedDrones
lists all known machines which are currently down because
they crashed
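
A hedged usage sketch from the shell (the query names come from the examples
above, though using ListDrones as a GetAQuery parameter is an assumption;
python -m json.tool is just for pretty-printing the JSON):

    curl -s 'http://localhost:5000/doquery/ListDrones' | python -m json.tool
    curl -s 'http://localhost:5000/doquery/GetAQuery?queryname=ListDrones' | python -m json.tool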


Also, there is a hack in the code, where it reads some data from a
location that needs to be installed -- but isn't yet.  So it's hardwired
to a pathname on my machine.  That'll get fixed tomorrow.

But the cool thing is that it works, and that it spits out what looks to
me like good JSON.

Lots to be tested, but it's a reasonable start (a minor milestone).

The main other thing that comes to mind is to allow a client to
subscribe to changes to things.

But before that it would be cool to have any kind of a primitive piece
of code that used this interface.




-- 
Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Linux-HA] Announcing release 0.1.0 of the Assimilation Monitoring Project!

2013-05-02 Thread Alan Robertson
I ran across it while doing some research.  As you said, it's a
different focus - but related.

Thanks for the heads-up!


On 04/24/2013 04:09 AM, Lars Ellenberg wrote:
 On Fri, Apr 19, 2013 at 03:01:51PM -0600, Alan Robertson wrote:
 Hi all,
 Hi Alan!

 Good to see progress on this Project.

 Did you know about NeDi www.nedi.ch ?

 I put Remo Rickli on Cc; apparently NeDi prefers forum over mailing list.

 Nedi has a few years head start, and a different focus maybe.
 But as both projects seem to have a lot in common at least for the
 discovery part, you still should be able to find some potential
 synergies, or at least productively cooperate.

 Cheers,
   Lars

 This announcement is likely to be of interest to people like you who are
 concerned about availability.

 I founded the Linux-HA project in 1998 and led it for nearly 10 years. 
 Back in about November 2010, I announced the beginnings of what would
 become the Assimilation Monitoring Project on this mailing list.

 The Assimilation Monitoring project [http://assimmon.org/] is a new open
 source monitoring project with a revolutionary architecture.  It provides
 highly scalable [~O(1)] monitoring driven by integrated continuous
 Stealth Discovery(TM).

 This first release is intended as a proof of concept, to demonstrate the
 architecture, get feedback, add early adopters, and grow the community.

 The project has basically two thrusts:

   * It provides /extremely/ scalable exception monitoring (100K servers
 -- no problem)
   * It discovers all the details of your infrastructure (servers,
 services, dependencies, switches, switch port connections, etc.),
 builds a Neo4j graph database of all the gory details and updates it
 as things change - without setting off network security alarms.
   * The two functions are integrated in a way that will permit much
 easier configuration than traditional systems, and support the
 creation of simple audits to see if everything is being monitored.

 Release description:   
 http://linux-ha.org/source-doc/assimilation/html/_release_descriptions.html
 Technology video:http://bit.ly/OD6bY6
 TechTarget Interview:   http://bit.ly/17M6DK2

 Join the mailing list: 
 http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation


 Join the mailing list, download the code, try it out, and send your
 comments and questions to the list!


 Thanks and have a great weekend!



 -- 
 Alan Robertson al...@unix.sh - @OSSAlanR

 Openness is the foundation and preservative of friendship...  Let me claim 
 from you at all times your undisguised opinions. - William Wilberforce

 ___
 Linux-HA mailing list
 linux...@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems


-- 
Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Announcing release 0.1.0 of the Assimilation Monitoring Project!

2013-04-19 Thread Alan Robertson
Hi all,

This announcement is likely to be of interest to people like you who are
concerned about availability.

I founded the Linux-HA project in 1998 and led it for nearly 10 years. 
Back in about November 2010, I announced the beginnings of what would
become the Assimilation Monitoring Project on this mailing list.

The Assimilation Monitoring project [http://assimmon.org/] is a new open
source monitoring project with a revolutionary architecture.  It provides
highly scalable [~O(1)] monitoring driven by integrated continuous
Stealth Discovery(TM).

This first release is intended as a proof of concept, to demonstrate the
architecture, get feedback, add early adopters, and grow the community.

The project has basically two thrusts:

  * It provides /extremely/ scalable exception monitoring (100K servers
-- no problem)
  * It discovers all the details of your infrastructure (servers,
services, dependencies, switches, switch port connections, etc.),
builds a Neo4j graph database of all the gory details and updates it
as things change - without setting off network security alarms.
  * The two functions are integrated in a way that will permit much
easier configuration than traditional systems, and support the
creation of simple audits to see if everything is being monitored.

Release description:   
http://linux-ha.org/source-doc/assimilation/html/_release_descriptions.html
Technology video:http://bit.ly/OD6bY6
TechTarget Interview:   http://bit.ly/17M6DK2

Join the mailing list: 
http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation


Join the mailing list, download the code, try it out, and send your
comments and questions to the list!


Thanks and have a great weekend!



-- 
Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-12 Thread Alan Robertson
On 09/12/2012 05:14 AM, Lars Marowsky-Bree wrote:
 On 2012-09-11T15:04:55, Alan Robertson al...@unix.sh wrote:

 Depends. Pacemaker may still care about the status of these agents.
 If it can't start or stop them, what can it do with them?
 The status from these agents may feed into operations on other
 resources that are fully managed.

Understood.

I believe it will care about those other agents - not these.   It 
shouldn't know about these, AFAIK.

The fact that the other agents might call these is an implementation 
detail - not something it should care about directly.  Just as the 
resource agents should only rely on things that the OCF RA spec says are 
provided, consumers of those agents (like pacemaker) shouldn't go beyond 
the spec in their expectations of, or observations about, resource 
agents.  Or at least that's how it seems to me.

It's still my intent to have the exit codes, argument passing, etc. be 
fully compliant with the OCF RA specification.  The only exception I 
plan on is no start or stop (or reload, etc) actions. They will 
implement the meta-data and monitor and validate-all actions.  I'm not 
sure whether validate-all makes sense for them or not(?).  I'll think 
about that...
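
A hedged sketch of what the action dispatch for such a monitor-only agent
might look like (the helper function names are hypothetical;
OCF_ERR_UNIMPLEMENTED is the standard OCF return code for actions an agent
does not implement):

    case "$1" in
        meta-data)     print_metadata;  exit $OCF_SUCCESS ;;
        monitor)       do_monitor;      exit $? ;;
        validate-all)  do_validate;     exit $? ;;
        start|stop)    exit $OCF_ERR_UNIMPLEMENTED ;;
        *)             exit $OCF_ERR_UNIMPLEMENTED ;;
    esac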



-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-11 Thread Alan Robertson
On 09/08/2012 02:53 PM, Lars Marowsky-Bree wrote:
 On 2012-09-07T13:46:27, Alan Robertson al...@unix.sh wrote:

 Well, I presume that one would not tell pacemaker about such agents, as
 they would not be useful to pacemaker.  From the point of view of the
 crm command, you wouldn't consider them as valid resource agents to
 put in a configuration for pacemaker.
 Depends. Pacemaker may still care about the status of these agents.
If it can't start or stop them, what can it do with them?   And 
presuming it can't do anything with them, then it doesn't make sense to 
include them in a configuration.

Am I missing something here?

-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-07 Thread Alan Robertson
On 09/05/2012 03:32 AM, Dejan Muhamedagic wrote:

 This would be for my new monitoring project, of course
 ;-).  But it could then be called by all the HTTP resource agents - or
 used directly - for example by the Assimilation project.

 This would be a slight but useful bending of OCF resource agent APIs.
 We could create some new metadata to document it, and also not put start
 and stop into the actions in the operations section. Or just the latter.

 What do you think?
 Right now, there's a bunch of resource agents faking the state
 (e.g. ping), that is pretending to be able to start and stop.
 If we could somehow do without it, that would obviously be
 beneficial. Not sure if/how the pacemaker could deal with such
 agents.
Well, I presume that one would not tell pacemaker about such agents, as 
they would not be useful to pacemaker.  From the point of view of the 
crm command, you wouldn't consider them as valid resource agents to 
put in a configuration for pacemaker.

People would instead use the nginx or apache agents that _do_ know how 
to start and stop things.

-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Slight bending of OCF specs: Re: Issues found in Apache resource agent

2012-09-04 Thread Alan Robertson
Hi Dejan,

If the resource agent is not running correctly, it needs to be 
restarted.  My memory says that OCF_ERR_GENERIC will not cause that 
behavior.  I believe the spec says you should exit with "not running" if 
it is not functioning correctly.  (But I didn't check, and my memory 
isn't that clear in this case.)

I will likely write a monitor-only resource agent for web servers.  What 
would you think about calling it from the other web resource agents?

This resource agent will not look at any config files, and will require 
everything explicitly in parameters, and will not know how to start or 
stop anything.  This would be for my new monitoring project, of course 
;-).  But it could then be called by all the HTTP resource agents - or 
used directly - for example by the Assimilation project.

This would be a slight but useful bending of OCF resource agent APIs.  
We could create some new metadata to document it, and also not put start 
and stop into the actions in the operations section. Or just the latter.

What do you think?



On 08/29/2012 05:31 AM, Dejan Muhamedagic wrote:
 Hi Alan,

 On Mon, Aug 27, 2012 at 10:51:15AM -0600, Alan Robertson wrote:
 Hi,

 I was recently using the Apache resource agent, and discovered a few
 problems:

   The exit code from grep was used directly as an OCF exit code.
   It is NOT an OCF exit code, and should not be directly used
 in this way.
 I guess you mean the greps in monitor_apache_extended and
 monitor_apache_basic? These lines:

 267   $whattorun $test_url | grep -Ei $test_regex > /dev/null
 277   ${ourhttpclient}_func $STATUSURL | grep -Ei $TESTREGEX > /dev/null

   This caused a not running error to become a generic error.
 These lines are invoked _only_ in case it was previously
 established that the apache server is running. So, they should
 return OCF_ERR_GENERIC if the test fails. grep exits with code 1
 which matches OCF_ERR_GENERIC. But indeed the OCF error code
 should be returned explicitly.

   Pacemaker reacts very differently to the two kinds of errors.

 This code occurred in two places.

 The resource agent used OCF_CHECK_LEVEL improperly.

 The specification says that if you receive an OCF_CHECK_LEVEL which you
 do not support, you are required to interpret it as the next lower
 supported value for OCF_CHECK_LEVEL.

 In effect, there are no invalid OCF_CHECK_LEVEL values.  The Apache
 agent declared all values but one to be errors.  This is not the correct
 behavior.
 OK. That somehow slipped while I had been reading the OCF standard.

 BTW, it'd be great if nginx shared some code with apache. The
 latter has already been split into three scripts.

 Cheers,

 Dejan

 -- 
   Alan Robertson al...@unix.sh - @OSSAlanR

 Openness is the foundation and preservative of friendship...  Let me claim 
 from you at all times your undisguised opinions. - William Wilberforce
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/


-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Issues found in Apache resource agent

2012-08-27 Thread Alan Robertson
Hi,

I was recently using the Apache resource agent, and discovered a few 
problems:

 The exit code from grep was used directly as an OCF exit code.
 It is NOT an OCF exit code, and should not be directly used
 in this way.
 This caused a "not running" error to become a generic error.
 Pacemaker reacts very differently to the two kinds of errors.

 This code occurred in two places.
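
A hedged sketch of the kind of explicit mapping intended here (the variable
and function names follow the agent lines quoted earlier in this thread;
the actual fix in the agent may differ):

    if ${ourhttpclient}_func "$STATUSURL" | grep -Ei "$TESTREGEX" >/dev/null
    then
        return $OCF_SUCCESS
    else
        return $OCF_ERR_GENERIC   # explicit OCF code, not grep's exit status
    fi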

The resource agent used OCF_CHECK_LEVEL improperly.

The specification says that if you receive an OCF_CHECK_LEVEL which you 
do not support, you are required to interpret it as the next lower 
supported value for OCF_CHECK_LEVEL.

In effect, there are no invalid OCF_CHECK_LEVEL values.  The Apache 
agent declared all values but one to be errors.  This is not the correct 
behavior.
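
A minimal sketch of the required fallback, assuming an agent that supports
only check levels 0 and 10 (the local variable check_level is mine):

    : ${OCF_CHECK_LEVEL:=0}
    if [ "$OCF_CHECK_LEVEL" -ge 10 ]; then
        check_level=10    # deepest check this agent supports
    else
        check_level=0     # next lower supported value
    fi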

-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] LRM bug

2012-08-07 Thread Alan Robertson
On 08/07/2012 08:18 AM, Dejan Muhamedagic wrote:
 Hi Alan,

 On Mon, Jul 30, 2012 at 10:14:27AM -0600, Alan Robertson wrote:
 The LRM treats operation timeouts as ERROR:s - not just failed
 operations that give warnings.  This violates the meaning of ERROR:
 messages in the code.

 We reserved ERROR: messages for things that the software did not expect
 - and therefore possibly could not be properly recovered from.  In this
 case, the behavior is perfectly expected and the condition will be
 properly recovered from.  It just means the operation in question failed.

  A sample message:
   ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000
 (47) Timed Out (timeout=6ms)

 Because of this one message, you can't tell customers If you ever have
 an ERROR: message, the HA software has failed.

 This ought to just be a warning, like any other failed action...
 I guess that ERROR is used because resource agents use the same
 severity when reporting failures they cannot recover from. In
 this case, the RA won't log anything, so the lrmd does that on
 its behalf. That seems OK to me. The other option would be to
 remove the ERROR severity log messages in all RA, because a
 resource problem should normally always be recoverable.
The exceptions that print ERROR: should be relegated to things like "The 
CRM gave me a command I didn't understand" or "referenced a resource that 
I don't know about" -- and similar things that really shouldn't happen.

Or that's how it seems to me anyway...


-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Probable sub-optimal behavior in ucast

2012-07-31 Thread Alan Robertson
On 07/31/2012 01:58 AM, Lars Ellenberg wrote:
 Besides that a ten node cluster is likely to break the 64k message size
 limit, even after compression...
The CIB is about 20K before compression...  So I think we're not in as
bad a shape as I would have guessed.

 You probably should re-organize the code so that you only have
 one receiving ucast socket per nic/ip/port.
That would be a big change or so it seems to me.  Right now, the parent
code doesn't look at the parameters given to its children...

 But I think that a single UDP packet will be delivered to
 a single socket, even though you have 18 receiving sockets
 bound to the same port (possible because of SO_REUSEPORT, only).
I was having various troubles with the system and wasn't sure debugging
was actually taking effect.  But your explanation may be the right one. 
I will get some more time on one of the systems in the next few days and
verify that.
 If we, as I think we do, receive on just one of them, where which one is
 determined by the kernel, not us, your suggested ingress filter on
 expected source IP would break communications.
Good point.

 Do you have evidence for the assumption that you receive incoming
 packets on all sockets, and not on just one of them?
I wasn't sure, actually - because of the troubles mentioned above.  I'll
check back in and let you know...

I saw the IPC (!) having troubles on one of the systems - and the CIB
was trying to send packets that were getting lost - and eventually the
CIB lost its connection to Heartbeat.  I could not imagine what could
cause that - so this was my theory.  We had a resource that we were
trying to restart but because of some disk problem it wouldn't actually
restart.   About this time on a different machine (the DC) we saw this
IPC issue.

If you have an idea what could cause IPC to behave this way I'd be happy
to know what it was...

-- Alan Robertson
   al...@unix.sh
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Probable sub-optimal behavior in ucast

2012-07-31 Thread Alan Robertson
On 07/31/2012 06:08 AM, Alan Robertson wrote:

 I wasn't sure, actually - because of the troubles mentioned above.  I'll
 check back in and let you know...
Only two of the read processes are accumulating any CPU - it's the last 
one on each interface.

You hit it spot on.

 Thanks Lars!


-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] LRM bug

2012-07-30 Thread Alan Robertson
The LRM treats operation timeouts as ERROR:s - not just failed 
operations that give warnings.  This violates the meaning of ERROR: 
messages in the code.

We reserved ERROR: messages for things that the software did not expect 
- and therefore possibly could not be properly recovered from.  In this 
case, the behavior is perfectly expected and the condition will be 
properly recovered from.  It just means the operation in question failed.

A sample message:
 ERROR: process_lrm_event: LRM operation agent-da:3_monitor_5000 
(47) Timed Out (timeout=6ms)

Because of this one message, you can't tell customers If you ever have 
an ERROR: message, the HA software has failed.

This ought to just be a warning, like any other failed action...

-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Probable sub-optimal behavior in ucast

2012-07-30 Thread Alan Robertson
Hi,

I have a 10-node system with each system having 2 interfaces, and 
therefore each ha.cf file has 18 ucast lines in it.

If I read the code correctly, I think each heartbeat packet is then 
being received 18 times and sent to the master control process - where 
each is then uncompressed and 17 of them are thrown away...

Could someone else offer your thoughts on this?

It looks to be a 2 or 3 line fix in ucast.c to throw away ucast packets 
that aren't from the address we expect - which would cut us down to only 
one of them being sent from each of the interfaces - a 9 to 1 reduction 
in work on the master control process.  And I don't have to uncompress 
them to throw them away - I can just look at the source IP address...

What do you think?





-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] STONITH agent for SoftLayer API

2012-06-07 Thread Alan Robertson
Red Hat invented their own API then disabled the working API in their 
version of the code.   Of course, they don't have as many agents, and 
they're not as well tested and the API is a bit odd.  But what do I 
know, since I invented the Linux-HA API.



On 6/7/2012 3:50 PM, Brad Jones wrote:
 Can someone help me understand the correct spec to build a STONITH
 plugin against?  Through a bunch of trial and error today, I've
 discovered that there may be a few different methods of passing
 configuration options to a plugin, or perhaps this differs
 significantly across distributions and I'm not getting the nuance.

 The plugin in question is at
 https://github.com/bradjones1/stonith-softlayer/blob/master/softlayer.php
 - you'll notice I now have options to pull from STDIN, command-line
 and environment variables (in a global, since this is php.)

 I originally built this according to
 https://fedorahosted.org/cluster/wiki/FenceAgentAPI but this seems to
 be very different from the plugins I am finding at
 http://hg.linux-ha.org/glue/file/c69dc6ace936/lib/plugins/stonith/external,
 for instance.

 FWIW I am using cluster-glue 1.0.8.

 I'd be happy to help write some documentation once I figure out
 exactly why this is/was so confusing.  Thanks...
 --
 Brad Jones
 b...@jones.name
 Mobile: 303-219-0795
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.

2012-06-04 Thread Alan Robertson
On 06/04/2012 12:32 AM, Keisuke MORI wrote:
 Hi Alan,

 Thank you for your comments.
 It's an interesting idea, but I don't think we need to care about IPv4
 link-local addresses
 because users can configure using the same manner as a regular IP address.
 (and it's used very rarely)

 In the case of IPv6 link-local addresses it is almost always a wrong
 configuration if nic is missing
 (the socket API mandate it) so we want to check it.

 However, for addresses which are not yet up (which is unfortunately what
 you're concerned with),  ipv6 link-local addresses take the form
fe80:: -- followed by 64-bits of MAC addresses (48 bit
 MACs are padded out)

 http://en.wikipedia.org/wiki/Link-local_address

 MAC addresses never begin with 4 bytes of zeros, so the regular expression
 to match this is pretty straightforward.  This isn't a bad approximation
 (but could easily be made better):
 Yes, you are right. Matching to 'fe80::' should be pretty easy and good 
 enough.
 Why I could not think of such a simple idea :)
I'm delighted to have been of service.

I'm best at simple things ;-).


-- 
 Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] The role of dependencies in managing computer systems

2012-06-04 Thread Alan Robertson

Hi,

I recently wrote a blog post on the importance of dependencies in 
managing computer systems.


Of course, a small number of dependencies are modelled very nicely by 
things like Pacemaker, but the picture is much bigger than those 
dependencies.


Read more here:

http://techthoughts.typepad.com/managing_computers/2012/06/dependency-information-in-computer-systems.html


--
Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim from you 
at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] USENIX Configuration Management Summit

2012-06-04 Thread Alan Robertson

Hi,

For those of you interested in managing servers beyond high-availability 
(which is likely many of you), you might be interested in next week's 
USENIX Configuration Management Summit.


It will be held in Boston, and will feature a series of interesting 
talks, including one by me on the Assimilation Monitoring Project - 
http://assimmon.org/.


https://www.usenix.org/conference/ucms12

Look forward to seeing you there!

--
Alan Robertson al...@unix.sh - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim from you 
at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.

2012-05-31 Thread Alan Robertson
It's straightforward to determine if an IP address is link-local or not
- for an already configured address.

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP qlen 1000
link/ether 94:db:c9:3f:7c:20 brd ff:ff:ff:ff:ff:ff
inet 10.10.10.30/24 brd 10.10.10.255 scope global eth1
inet6 fe80::96db:c9ff:fe3f:7c20/64 scope link
   valid_lft forever preferred_lft forever

This works uniformly for both ipv4 and ipv6 addresses (quite nice!)

However, for addresses which are not yet up (which is unfortunately what
you're concerned with), ipv6 link-local addresses take the form
  fe80:: -- followed by 64-bits of MAC addresses (48 bit
MACs are padded out)

http://en.wikipedia.org/wiki/Link-local_address

MAC addresses never begin with 4 bytes of zeros, so the regular
expression to match this is pretty straightforward.  This isn't a bad
approximation (but could easily be made better):

islinklocal() {
  if
     echo "$1" | grep -i '^fe80::[^:]*:[^:]*:[^:]*:[^:]*$' >/dev/null
  then
    echo "$1 is link-local"
  else
    echo "$1 is NOT link-local"
  fi
}




On 05/31/2012 12:29 AM, Keisuke MORI wrote:
 I would like to propose an enhancement of IPaddr2 to support IPv6 as
 well as IPv4.

 I've submitted this as a pull request #97 but also posting to the ML
 for a wider audience.

 I would appreciate your comments and suggestions for merging this into
 the upstream.

 
 [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
 https://github.com/ClusterLabs/resource-agents/pull/97


 ## Benefits:

 * Unify the usage, behavior and the code maintenance between IPv4 and
   IPv6 on Linux.

   The usage of IPaddr2 and IPv6addr are similar but they have
   different parameters and different behaviors.
   In particular, they may choose a different interface depending
   on your configuration even if you provided similar parameters
   in the past.

   IPv6addr is written in C and rather hard to make improvements.
   As /bin/ip already supports both IPv4 and IPv6, we can share
   the most of the code of IPaddr2 written in bash.

 * usable for LVS on IPv6.

   IPv6addr does not support lvs_support=true and unfortunately
   there is no possible way to use LVS on IPv6 right now.

   IPaddr2(/bin/ip) works for LVS configurations without
   enabling lvs_support both for IPv4 and IPv6.

   (You don't have to remove an address on the loopback interface
   if the virtual address is assigned by using /bin/ip.)

   See also:
   http://www.gossamer-threads.com/lists/linuxha/dev/76429#76429

 * retire the old 'findif' binary.

   'findif' binary is replaced by a shell script version of
   findif, originally developed by lge.
   See findif could be rewritten in shell :
   https://github.com/ClusterLabs/resource-agents/issues/53

 * easier support for other pending issues

   These pending issues can be fix based on this new IPaddr2.
   * Allow ipv6addr to mark new address as deprecated
 https://github.com/ClusterLabs/resource-agents/issues/68
   * New RA that controls IPv6 address in loopback interface
 https://github.com/ClusterLabs/resource-agents/pull/77


 ## Notes / Changes:

 * findif semantics changes

   There are some incompatibility in deciding which interface to
   be used when your configuration is ambiguous. But in reality
   it should not be a problem as long as it's configured properly.

   The changes mostly came from fixing a bug in the findif binary
   (returns a wrong broadcast) or merging the difference between
   (old)IPaddr2 and IPv6addr.
   See the ofct test cases for details.
   (case No.6, No.9, No.10, No.12, No.15 in IPaddr2v4 test cases)

   Other notable changes are described below.

 * broadcast parameter for IPv4

   broadcast parameter may be required along with cidr_netmask
   when you want use a different subnet mask from the static IP address.
   It's because doing such calculation is difficult in the shell
   script version of findif.

   See the ofct test cases for details.
   (case No.11, No.14, No.16, No.17 in IPaddr2v4 test cases)

   This limitation may be eliminated if we would remove
   brd options from the /bin/ip command line.

 * loopback(lo) now requires cidr_netmask or broadcast.

   See the ofct test case in the IPaddr2 ocft script.
   The reason is similar to the previous one.

 * loose error check for nic for a IPv6 link-local address.

   IPv6addr was able to check this, but in the shell script it is
   hard to determine a link-local address (requires bitmask calculation).
   I do not think it's worth to implement it in shell.

 * send_ua: a new binary

   We need one new binary as a replacement of send_arp for IPv6 support.
   IPv6addr.c is reused to make this command.


 Note that IPv6addr RA is still there and you can continue to use
 it for the backward compatibility.


 ## Acknowledgement

 Thanks to Tomo Nozawa-san for his hard work for writing and
 testing this patch.

 Thanks 

Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.

2012-05-08 Thread Alan Robertson
FYI: there is code in the heartbeat communication layer which is quite 
happy to simulate lost packets.

I made it difficult to turn on accidentally.  Read the code for details 
if you're interested.



On 04/30/2012 10:21 PM, renayama19661...@ybb.ne.jp wrote:
 Hi Lars,

 We confirmed that this problem occurred with v1 mode of Heartbeat.
   * The problem happens with the v2 mode in the same way.

 We confirmed a problem in the next procedure.

 Step 1) Put a special device extinguishing a communication packet of 
 Heartbeat in the network.

 Step 2) Between nodes, the retransmission of the message is carried out 
 repeatedly.

 Step 3) Then the memory of the master process increases little by little.


  As a result of the ps command of the master process --
 * node1
 (start)
 32126 ?SLs0:00  0   182 53989  7128  0.0 heartbeat: master 
 control process
 (One hour later)
 32126 ?SLs0:03  0   182 54729  7868  0.0 heartbeat: master 
 control process
 (Two hour later)
 32126 ?SLs0:08  0   182 55317  8456  0.0 heartbeat: master 
 control process
 (Four hours later)
 32126 ?SLs0:24  0   182 56673  9812  0.0 heartbeat: master 
 control process

 * node2
 (start)
 31928 ?SLs0:00  0   182 53989  7128  0.0 heartbeat: master 
 control process
 (One hour later)
 31928 ?SLs0:02  0   182 54481  7620  0.0 heartbeat: master 
 control process
 (Two hour later)
 31928 ?SLs0:08  0   182 55353  8492  0.0 heartbeat: master 
 control process
 (Four hours later)
 31928 ?SLs0:23  0   182 56689  9828  0.0 heartbeat: master 
 control process


 The amount of memory leaked seems to vary from node to node with the 
 quantity of retransmissions.

 The memory growth disappears after applying my patch.

 A similar fix seems to be necessary in send_reqnodes_msg(), 
 but that leak appears to be small.

 Best Regards,
 Hideo Yamauchi.


 --- On Sat, 2012/4/28, renayama19661...@ybb.ne.jprenayama19661...@ybb.ne.jp 
  wrote:

 Hi Lars,

 Thank you for comments.

 Have you actually been able to measure that memory leak you observed,
 and you can confirm this patch will fix it?

 Because I don't think this patch has any effect.
 Yes.
 I really measured leak.
 I can show a result next week.
 #Japan is a holiday until Tuesday.

 send_rexmit_request() is only used as parameter to
 Gmain_timeout_add_full, and it returns FALSE always,
 which should cause the respective sourceid to be auto-removed.
 It seems to be necessary to release gsource somehow or other.
 The similar liberation seems to be carried out in lrmd.

 Best Regards,
 Hideo Yamauchi.


 --- On Fri, 2012/4/27, Lars Ellenberglars.ellenb...@linbit.com  wrote:

 On Thu, Apr 26, 2012 at 10:56:30AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi All,

 We gave test that assumed remote cluster environment.
 And we tested packet lost.

 The retransmission timer of Heartbeat causes memory leak.

 I donate a patch.
 Please confirm the contents of the patch.
 And please reflect a patch in a repository of Heartbeat.
 Have you actually been able to measure that memory leak you observed,
 and you can confirm this patch will fix it?

 Because I don't think this patch has any effect.

 send_rexmit_request() is only used as a parameter to
 Gmain_timeout_add_full, and it returns FALSE always,
 which should cause the respective sourceid to be auto-removed.


 diff -r 106ca984041b heartbeat/hb_rexmit.c
 --- a/heartbeat/hb_rexmit.c    Thu Apr 26 19:28:26 2012 +0900
 +++ b/heartbeat/hb_rexmit.c    Thu Apr 26 19:31:44 2012 +0900
 @@ -164,6 +164,8 @@
      seqno_t seq = (seqno_t) ri->seq;
      struct node_info* node = ri->node;
      struct ha_msg*    hmsg;
 +    unsigned long     sourceid;
 +    gpointer          value;
 
      if (STRNCMP_CONST(node->status, UPSTATUS) != 0
          && STRNCMP_CONST(node->status, ACTIVESTATUS) !=0) {
 @@ -196,11 +198,17 @@
 
      node->track.last_rexmit_req = time_longclock();
 
 -    if (!g_hash_table_remove(rexmit_hash_table, ri)){
 -        cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
 -               " for seq/node(%ld %s)",
 -               __FUNCTION__, ri->seq, ri->node->nodename);
 -        return FALSE;
 +    value = g_hash_table_lookup(rexmit_hash_table, ri);
 +    if ( value != NULL) {
 +        sourceid = (unsigned long) value;
 +        Gmain_timeout_remove(sourceid);
 +
 +        if (!g_hash_table_remove(rexmit_hash_table, ri)){
 +            cl_log(LOG_ERR, "%s: entry not found in rexmit_hash_table"
 +                   " for seq/node(%ld %s)",
 +                   __FUNCTION__, ri->seq, ri->node->nodename);
 +            return FALSE;
 +        }
      }
 
      schedule_rexmit_request(node, seq, max_rexmit_delay);

 -- 
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting 

Re: [Linux-ha-dev] [Patch] The patch which revises memory leak.

2012-05-02 Thread Alan Robertson
This is very interesting.  My apologies for missing this memory leak 
:-(.  The code logs memory usage periodically exactly to help notice 
such a thing.

In my new open source project [http://assimmon.org], I am death on 
memory leaks.  But I can assure you that back when that code was 
written, it was not at all clear who deleted what memory when - when it 
came to the glib.  I'm not sure if valgrind was out back then, but I 
certainly didn't know about it.

I confess that even on this new project I had a heck of a time making 
all the glib objects go away.  I finally got them cleaned up - but it 
took weeks of running under valgrind before I worked out when to do what 
to make it throw the objects away - but not crash due to a bad reference.
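
For anyone chasing similar glib leaks, the kind of invocation that finally
made the reports useful looked roughly like this (the binary name below is
just a placeholder; the G_SLICE/G_DEBUG settings are the standard glib knobs
for valgrind-friendly allocation):

    # make glib bypass its slice allocator so valgrind sees every allocation
    G_SLICE=always-malloc G_DEBUG=gc-friendly \
        valgrind --leak-check=full --show-reachable=yes ./your-program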

By the way, I suspect Lars' suggestion would work fine.  I would 
certainly explain what the better patch is in the comments when you 
apply this one.


On 05/02/2012 04:57 PM, renayama19661...@ybb.ne.jp wrote:
 Hi Lars,

 Thank you for comments.

 And when it passes more than a full day

 * node1
 32126 ?SLs   79:52  0   182 71189 24328  0.1 heartbeat: master 
 control process   

 * node2
 31928 ?SLs   77:01  0   182 70869 24008  0.1 heartbeat: master 
 control process
 Oh, I see.

 This is a design choice (maybe not even intentional) of the Gmain_*
 wrappers used throughout the heartbeat code.

 The real glib g_timeout_add_full(), and most other similar functions,
 internally do
   id = g_source_attach(source, ...);
   g_source_unref(source);
   return id;

 Thus in g_main_dispatch, the
   need_destroy = ! dispatch (...)
   if (need_destroy)
   g_source_destroy_internal()

 in fact ends up destroying it,
 if dispatch() returns FALSE,
 as documented:
  The function is called repeatedly until it returns FALSE, at
  which point the timeout is automatically destroyed and the
  function will not be called again.

 Not so with the heartbeat/glue/libplumbing Gmain_timeout_add_full.
 It does not g_source_unref(), so we keep the extra reference around
 until someone explicitly calls Gmain_timeout_remove().

 Talk about principle of least surprise :(

 Changing this behaviour to match glib's, i.e. unref'ing after
 g_source_attach, would seem like the correct thing to do,
 but is likely to break other pieces of code in subtle ways,
 so it may not be the right thing to do at this point.
 Thank you for detailed explanation.
 If you found the method that is appropriate than the correction that I 
 suggested, I approve of it.

 I'm going to take your patch more or less as is.
 If it does not show up soon, prod me again.

 All right.

 Many Thanks!
 Hideo Yamauchi.


 Thank you for tracking this down.

 Best Regards,
 Hideo Yamauchi.


 --- On Tue, 2012/5/1, 
 renayama19661...@ybb.ne.jprenayama19661...@ybb.ne.jp  wrote:

 Hi Lars,

 We confirmed that this problem occurred with v1 mode of Heartbeat.
* The problem happens with the v2 mode in the same way.

 We confirmed a problem in the next procedure.

 Step 1) Put a special device in the network that drops Heartbeat
 communication packets.

 Step 2) Between nodes, the retransmission of the message is carried out 
 repeatedly.

 Step 3) Then the memory of the master process increases little by little.


  As a result of the ps command of the master process --
 * node1
 (start)
 32126 ?SLs0:00  0   182 53989  7128  0.0 heartbeat: master 
 control process
 (One hour later)
 32126 ?SLs0:03  0   182 54729  7868  0.0 heartbeat: master 
 control process
 (Two hour later)
 32126 ?SLs0:08  0   182 55317  8456  0.0 heartbeat: master 
 control process
 (Four hours later)
 32126 ?SLs0:24  0   182 56673  9812  0.0 heartbeat: master 
 control process

 * node2
 (start)
 31928 ?SLs0:00  0   182 53989  7128  0.0 heartbeat: master 
 control process
 (One hour later)
 31928 ?SLs0:02  0   182 54481  7620  0.0 heartbeat: master 
 control process
 (Two hour later)
 31928 ?SLs0:08  0   182 55353  8492  0.0 heartbeat: master 
 control process
 (Four hours later)
 31928 ?SLs0:23  0   182 56689  9828  0.0 heartbeat: master 
 control process


 The size of the memory leak seems to vary per node, according to the
 quantity of retransmissions.

 The increase of this memory disappears by applying my patch.

 A similar fix seems to be necessary in
 send_reqnodes_msg(), but that leak appears to be small.

 Best Regards,
 Hideo Yamauchi.


 --- On Sat, 2012/4/28, 
 renayama19661...@ybb.ne.jprenayama19661...@ybb.ne.jp  wrote:

 Hi Lars,

 Thank you for comments.

 Have you actually been able to measure that memory leak you observed,
 and you can confirm this patch will fix it?

 Because I don't think this patch has any effect.
 Yes.
 I really measured leak.
 I can show a result next week.
 #Japan is a holiday until Tuesday.

Re: [Linux-ha-dev] Core Dump When Sending to Other Node That's Resetting

2012-04-18 Thread Alan Robertson

Have you tried running this under valgrind?

On 04/13/2012 05:22 PM, Nguyen Dinh Phong wrote:

Hi,
I wrote a wrapper using hbclient api for an application that manages 
the redundancy of our system. The application uses the wrapper to 
send/receive messages (string) between the primary and secondary.
In our testing of reset and switchover, once in a while there is a
core dump in the send, with a double free in libc, and I do not know
whether it is caused by my wrapper around the hbclient API.


/lib/libc.so.6[0xf7d71629]
/lib/libc.so.6(cfree+0x59)[0xf7d719e9]
/usr/lib/libplumb.so.2[0xf7e88dcf]
/usr/lib/libplumb.so.2[0xf7e9a03e]
/usr/lib/libplumb.so.2[0xf7e9a1a4]
/usr/lib/libplumb.so.2[0xf7e9922f]
/usr/lib/libplumb.so.2(msg2ipcchan+0xb8)[0xf7e891ea]
/usr/lib/libhbclient.so.1[0xf7e6a736]
/usr/lib/libha_lib.so(hb_send+0x204)[0xf7e61e15]  <--- my wrapper

I use send_ordered_nodemsg() to send and readmsg() to read (based on 
api_test.c). However in sample codes of ipfail or drbd, I saw the 
setting up of IPChannel and usage of msg2ipcchan(). Which is more 
appropriate?


I'd also like to know if I should add more code to handle node status
changes, because the crashes always occur when the other node goes through a reset.


Snippet of my codes:

1. Initialization:
if (mhm_hb->llc_ops->signon(mhm_hb, "ping") != HA_OK) { // I pasted the common "ping",
                                                        // plan to change to a different name
    cl_log(LOG_ERR, "Cannot sign on with heartbeat");
...

2. Send:
int hb_send(ll_cluster_t *hb, char *dest, void *buf, size_t sz)
{
  HA_Message *msg;
  if (hb==NULL) return HA_FAIL;
  msg = ha_msg_new(0);
  if (ha_msg_add(msg, F_TYPE, T_MHM_MSG) != HA_OK) {
    cl_log(LOG_ERR, "hb_send: cannot add field TYPE\n");
    ZAPMSG(msg);
    return HA_FAIL;
  }
  if (ha_msg_add(msg, F_ORIG, node_name) != HA_OK) {
    cl_log(LOG_ERR, "hb_send: cannot add field ORIG\n");
    ZAPMSG(msg);
    return HA_FAIL;
  }
  char *payload = malloc(sz+1);
  if (payload==NULL) {
    ZAPMSG(msg);
    return HA_FAIL;
  }
  memset(payload, 0, sz+1);   // Add a Null byte at the end
  memcpy(payload, buf, sz);
  if (ha_msg_add(msg, F_MHM_PAYLOAD, payload) != HA_OK) {
    cl_log(LOG_ERR, "hb_send: cannot add field PAYLOAD\n");
    ZAPMSG(msg);
    return HA_FAIL;
  }
  if (hb->llc_ops->send_ordered_nodemsg(hb, msg, peer_name) != HA_OK) {
    ZAPMSG(msg);
    return HA_FAIL;
  }
  else {
    ZAPMSG(msg);
    return sz;
  }
}

3. Receive:
int hb_recv(ll_cluster_t *hb, void *buf, size_t sz)
{
    int msgcount=0;
    HA_Message *reply;

    if (hb==NULL) return HA_FAIL;
    memset(buf, 0, sz);
    for(; (reply=hb->llc_ops->readmsg(hb, 1)) != NULL;) {   <--- Blocking receiving

        const char *type;
        const char *orig;
        const char *payload;
        ++msgcount;
        if ((type = ha_msg_value(reply, F_TYPE)) == NULL) {
            type = "?";
        }
        if ((orig = ha_msg_value(reply, F_ORIG)) == NULL) {
            orig = "?";
        }
        cl_log(LOG_DEBUG, "Got message %d of type [%s] from [%s]"
               , msgcount, type, orig);
        if (strcmp(type, T_MHM_MSG) == 0) {
            payload = ha_msg_value(reply, F_MHM_PAYLOAD);

            int p_sz = strlen(payload);
            cl_log(LOG_DEBUG, "payload %s sz %d p_sz %d\n", payload, sz,
                   p_sz);

            if (p_sz <= sz) {
                char *tmp = (char*) buf;
                strncpy(tmp, payload, p_sz);
                cl_log(LOG_DEBUG, "return buf %s sz %d ret_val %d", buf,
                       strlen(buf), p_sz);

                ZAPMSG(reply);
                return(p_sz);
            } else {
                cl_log(LOG_ERR, "Receive buffer %d too small for payload"
                       " %d", sz, p_sz);

                ZAPMSG(reply);
                return HA_FAIL;
            }
        }
        ZAPMSG(reply);   <--- Could we delete a message that's not
                              meant for our module, or should we let it go?

    }
    if (reply==NULL) {
        cl_log(LOG_ERR, "read_hb_msg returned NULL");
        cl_log(LOG_ERR, "REASON: %s", hb->llc_ops->errmsg(hb));
    }
    return 0;
}

Thanks,
Phong



___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/



--
Alan Robertsonal...@unix.sh  - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim from you 
at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Another problem with the heartbeat init script...

2012-04-01 Thread Alan Robertson
On 03/30/2012 04:58 PM, Lars Ellenberg wrote:
 On Mon, Mar 26, 2012 at 02:28:42PM -0600, Alan Robertson wrote:
 Earlier I mentioned the problem with portreserve (which was apparently
 ignored?)
 No.
 But I cached a cold.
Sorry to hear that :-(.  Hope you're feeling better now.  You have my 
full sympathies - since I had bronchitis that lasted over two weeks.
 And you did not send a patch, did you?
Good point.  Sorry to be a whiner...  I was hoping for a little more 
conversation in any case.
 Now I have run into another problem.  When you set LRM parameters in
 /etc/sysconfig, the code assumes that the LRM will start within 20
 seconds of starting heartbeat.  That is not the case.

 lrmd was changed to getenv() its max-children meanwhile.

 You need to cherry pick that patch, or update glue.
Good to hear that's changed.  I put a patch of my own into my local copy 
- just extending the loop and making it not print those annoying '.'s or 
delay startup while waiting.  So, I'm good locally - and there's no need 
for a patch for the future.

I guess I need to make a workspace so I can submit patches properly (as 
you noted above).

On the other hand, the good news is that by upping that limit to 
16 and switching from a group to explicit dependencies, the failover 
time was cut from about 60 seconds to about 18 seconds - so I'm happy.


-- 
 Alan Robertsonal...@unix.sh  - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Patch: pgsql streaming replication

2012-03-26 Thread Alan Robertson
On 03/20/2012 03:40 AM, Lars Marowsky-Bree wrote:
 On 2012-03-19T16:29:23, Soffen, Matthewmsof...@iso-ne.com  wrote:

  I believe that the reason for not using #bash is that it is NOT part of 
  the default install on non-Linux systems.
 That is what package dependencies are for.
Matt's point is simple:  Avoiding dependencies is far better than 
declaring them.

There is nothing in bash which cannot be easily done in the standard 
POSIX shell.
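
As a hedged illustration of the sort of thing we mean - a common bash-ism
next to its portable POSIX equivalent (the variable name is just an example):

    # bash-only pattern match:
    #   if [[ $OCF_RESKEY_ip == 10.* ]]; then ... ; fi
    # portable POSIX equivalent:
    case "$OCF_RESKEY_ip" in
        10.*) ocf_log info "private address" ;;
    esac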

We have avoided these things in most of our RAs - and there is no reason 
to change that.

-- 
 Alan Robertsonal...@unix.sh  - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Occasionally heartbeat doesn't start...

2012-03-12 Thread Alan Robertson
In this case, it was actually rpcbind that grabbed the port.  AFAIK, 
there is no way to tell it to use or not use a particular port - except 
to grab it first.  That's what portreserve does.  If it is run first, 
and given the right config files, it _will_ keep anyone else from using 
that port.

It makes sense to put port 694 in /etc/portreserve/heartbeat as part of 
our package and include that invocation.

If someone chooses a different port they can always edit that file.

Redhat provides portreserve and starts it by default before rpcbind.

If other distros don't provide it or use it - no harm comes from 
installing the file and attempting to run portrelease.

But for those that provide it, it is a help.



On 03/12/2012 05:43 AM, Lars Ellenberg wrote:
 On Fri, Mar 09, 2012 at 11:52:56AM -0700, Alan Robertson wrote:
 Hi,

 I've been investigating an HA configuration for a customer.  One
 time in testing heartbeat didn't start, because rpcbind had stolen
 its reserved port.  Restarting rpcbind made it choose a different
 random port.  This is definitely an interesting problem - even if it
 doesn't happen very often.

 The best solution to this, AFAIK is to make a file
 /etc/portreserve/heartbeat with this one line in it:
 694/udp

 and then add portrelease heartbeat to the init script.
 rpcbind used to be portmap.

 You would need the portreserve daemon available, installed,
 and started at the right time during your boot sequence.
 So that's only a hackish workaround.

 On Debian (Ubuntu, other derivatives) you'd simply add a line
 to /etc/bindresvport.blacklist. But that may fail as well,
 there have been reports where this was ignored for some reason.
 So that again is just a workaround.

 If you know exactly what will register with portmap (rpcbind),
 you can tell those services to request fixed ports instead.

 Typically you do, and those are just a few nfs related services.
 So just edit /etc/sysconfig/* or /etc/defaults/*
 to e.g. include -o and -p options for rpc.statd, and similar.

 This really is a fix, as long as you know all services
 that are started before heartbeat, and can tell them
 to use specific ports.



-- 
 Alan Robertsonal...@unix.sh  - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Occasionally heartbeat doesn't start...

2012-03-09 Thread Alan Robertson
Hi,

I've been investigating an HA configuration for a customer.  One time in 
testing heartbeat didn't start, because rpcbind had stolen its reserved 
port.  Restarting rpcbind made it choose a different random port.  This 
is definitely an interesting problem - even if it doesn't happen very often.

The best solution to this, AFAIK is to make a file 
/etc/portreserve/heartbeat with this one line in it:
694/udp

and then add portrelease heartbeat to the init script.
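
For illustration only - a minimal sketch of what that init-script addition
might look like (the portrelease path is an assumption, not tested code):

    # /etc/portreserve/heartbeat contains the single line: 694/udp
    # just before heartbeat binds its port, hand the reservation back:
    if [ -x /sbin/portrelease ]; then
        /sbin/portrelease heartbeat
    fi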

Thoughts?

-- 
 Alan Robertsonal...@unix.sh  - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Assimilation Monitoring Project on Twitter

2011-12-24 Thread Alan Robertson
Hi,

I'm giving a little more detailed status on the Assimilation project's 
development progress on Twitter - on @OSSAlanR.

-- 
 Alan Robertsonal...@unix.sh  - @OSSAlanR

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Additional changes made via DHCPD review process

2011-12-06 Thread Alan Robertson
I agree about avoiding the feature to sync config files.  My typical  
recommendation is to use drbdlinks and put it on replicated or shared  
storage.  In fact, I do that at home, and are doing it for a current  
customer.

By the way, Sean has recently revised drbdlinks to support the OCF  
API.  (In fact, it supports all of the OCF, heartbeat-v1 and LSB APIs).

http://www.tummy.com/Community/software/drbdlinks/

You can find his source control for it on github:
 https://github.com/linsomniac/drbdlinks




Quoting Florian Haas flor...@hastexo.com:

 On Tue, Dec 6, 2011 at 4:44 PM, Dejan Muhamedagic de...@suse.de wrote:
 Hi,

 On Tue, Dec 06, 2011 at 10:59:20AM -0400, Chris Bowlby wrote:
 Hi Everyone,

   I would like to thank Florian, Andreas and Dejan for making
suggestions and pointing out some additional changes I should make. At
 this point the following additional changes have been made:

 - A test case in the validation function for ocf_is_probe has been
  reversed to ! ocf_is_probe, and the test/[ ] wrappers removed to
  ensure the validation is not occurring if the partition is not mounted or
 under a probe.
 - An extraneous return code has been removed from the else clause of
 the probe test, to ensure the rest of the validation can finish.
 - The call to the DHCPD daemon itself during the start phase has been
 wrapped with the ocf_run helper function, to ensure that is somewhat
 standardized.

 The first two changes corrected the Failed Action... Not installed
 issue on the secondary node, as well as the fail-over itself. I've been
 able to fail over to secondary and primary nodes multiple times and the
 service follows the rest of the grouped services.

 There are a few things I'd like to add to the script, now that the main
 issues/code changes have been addressed, and they are as follows:

 - Add a means of copying /etc/dhcpd.conf from node1 to node2...nodeX
 from within the script. The logic behind this is as follows:

 I'd say that this is admin's responsibility. There are tools such
 as csync2 which can deal with that. Doing it from the RA is
 possible, but definitely very error prone and I'd be very
 reluctant to do that. Note that we have many RAs which keep
 additional configuration in a file and none if them tries to keep
 the copies of that configuration in sync itself.

 Seconded. Whatever configuration doesn't live _in_ the CIB proper, is
 not Pacemaker's job to replicate. The admin gets to either sync files
 manually across the nodes (csync2 greatly simplifies this; no need to
 reinvent the wheel), or put the config files on storage that's
 available to all cluster nodes.

 Cheers,
 Florian
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/


___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Stonith turns node names to lowercase

2011-10-20 Thread Alan Robertson
On 10/19/2011 04:11 AM, Lars Marowsky-Bree wrote:
 On 2011-10-18T12:40:40, Florian Haasflor...@hastexo.com  wrote:

g_strdown(nodecopy);

 Is there a reason for this ?
 I suppose Dejan will accept a patch making this configurable.
 Please, no. We fence by hostname; hostnames are case insensitive by
 definition. Plugins need to handle that.
More specifically - this patch was put in to make this work in the real 
world.  In the real world, host names correspond to DNS names (or you 
will go crazy).  DNS names are case-insensitive - and that's how it is.


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] attrd and repeated changes

2011-10-20 Thread Alan Robertson
On 10/20/2011 03:41 AM, Philipp Marek wrote:
 Hello,

 when constantly sending new data via attrd the changes are never used.


 Example:
 while sleep 1
   do attrd_updater -l reboot -d 5 -n rep_chg -U try$SECONDS
   cibadmin -Ql | grep rep_chg
 done

 This always returns the same value - the one that was given with more than 5
 seconds delay afterwards, so that the dampen interval wasn't broken by the
 next change.


 I've attached two draft patches; one for allowing the _first_ value in a
 dampen interval to be used (effectively ignoring changes until this value is
 written), and one for using the _last_ value in the dampen interval (by not
 changing the dampen timer). [1]


 ***  Note: they are for discussion only!
 ***  I didn't test them, not even for compilation.


 Perhaps this bug [2] was introduced with one of these changes (the hashes
 are the GIT numbers)

  High: crmd: Bug lf#2528 - Introduce a slight delay when
creating a transition to allow attrd time to perform its updates
  e7f5da92490844d190609931f434e08c0440da0f

  Low: attrd: Indicate when attrd clients are updating fields
  69b49b93ff6fd25ac91f589d8149f2e71a5114c5


 What is the correct way to handle multiple updates within the dampen
 interval?
Personally, I'd vote for the last value.  I agree with you about this 
being a bug.


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] endian oddity in cluster glue

2011-09-06 Thread Alan Robertson
On 09/05/2011 09:02 AM, Pádraig Brady wrote:
 I was looking at extricating some logic from cluster-glue-libs
 and noticed strangeness wrt the endianess checking.
 CONFIG_BIG_ENDIAN is defined on my x86 machine, which is
 due to configure.ac referencing a non existent byteorder_test.c

 To fix this I would suggest the following patch.
 However, I'm wary that this may introduce compatibility
 issues with generated md5 sums which is the only code
 that inspects the above defines.  If we stick with BIG_ENDIAN
 always then there will be interoperability issues between
 x86 and ppc hosts for example (which may not be an issue)?

 cheers,
 Pádraig.

Strange as it seems, there are some mixed PPC and x86 clusters out there...

-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Announcing - the Assimilation monitoring system - a sub-project of Linux-HA

2011-08-19 Thread Alan Robertson
On 08/17/2011 09:15 PM, Digimer wrote:

 Linux is fairly described as an ecosystem. Differing branches and
 methods of solving a given problem are tried, and the one with the most
 backing and merit wins. It's part of what makes open-source what it is.
 So, from my point of view, best of luck to both. :)
Thanks!

And, I share your last point of view as well - I wish all parties good luck!

-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Announcing - the Assimilation monitoring system - a sub-project of Linux-HA

2011-08-17 Thread Alan Robertson
On 08/16/2011 05:08 PM, Angus Salkeld wrote:
 On Fri, Aug 12, 2011 at 03:11:36PM -0600, Alan Robertson wrote:
 Hi,

 Back last November or so, I started work on a new monitoring project -
 for monitoring servers and services.

  Its aims are:
 - Scalable virtually without limit - tens of thousands of servers is
 not a problem
 - Easy installation and upkeep - includes integrated discovery of
 servers, services
   and switches - without setting off security alarms ;-)

 This project isn't ready for a public release yet (it's in a fairly
 early stage), but it seemed worthwhile to let others know that the
 project exists, and start getting folks to read over the code, and
 perhaps begin to play with it a bit as well.

 The project has two arenas of operation:
   nanoprobes - which run in (nearly) every monitored machine
 Why not matahari (http://matahariproject.org/)?

   Collective management - running in a central server (or HA cluster).
  Quite similar to http://pacemaker-cloud.org/. Seems a
 shame not to be working together.

 -Angus
This is a set of ideas I've been working on for the last four years or 
so.  My most grandiose vision of it I called a Data Center Operating 
System.   This is about the same time that Amazon announced their first 
cloud offering (unknown to me).  There are a few hints about it a couple 
of years ago in my blog.

I heard a little about Andrew's project when I announced this back in 
November.  Andrew has made it perfectly clear that he doesn't want to 
work with me (really, absolutely, abundantly, perfectly, crystal clear) 
and there is evidence that he doesn't work well with others besides me, 
that's not a possibility.

In the short term I'm not specially concerned with clouds - just with 
any collection of computers which range from 4 up to and above cloud 
scale.  That includes clouds of course - but we'll get a lot more users 
at the small scale than we well at cloud scale.

There are several reasons for this approach:
   - Existing monitoring software sucks.
   - Many more collections of computers besides clouds exist and need 
help - although this would work very well with clouds

This problem has dimensions that a cloud environment doesn't have.  In a 
cloud, all deployment is automated, so you can _know_ what is running 
where.  In a more conventional data center, having a way to discover 
what's in your data center, and what's running on those servers is 
important.

-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Uniquness OCF Parameters

2011-06-20 Thread Alan Robertson
On 06/17/2011 02:43 AM, Lars Ellenberg wrote:
 On Thu, Jun 16, 2011 at 03:52:37PM -0600, Alan Robertson wrote:
 On 06/16/2011 02:51 AM, Lars Ellenberg wrote:
 On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote:
 On 2011-06-16 09:03, Lars Ellenberg wrote:
 With the current unique=true/false, you cannot express that.
 Thanks. You learn something every day. :)
 Sorry that I left off the As you are well aware of,
 introductionary phrase. ;-)

 I just summarized the problem:

 Depending on what we chose the meaning to be,
 parameters marked unique=true would be required to
 either be all _independently_ unique,
 or be unique as a tuple.
 And made a suggestion how to solve it:

 If we want to be able to express both, we need a different markup.

 Of course, we can move the markup out of the parameter description,
 into an additional markup, that spells them out,
 likeunique params=foo,bar /unique params=bla /.

 But using unique=0 as the current non-unique meaning, then
 unique=small-integer-or-even-named-label-who-cares, would
 name the scope for this uniqueness requirement,
 where parameters marked with the same such label
 would form a unique tuple.
 Enables us to mark multiple tuples, and individual parameters,
 at the same time.
 If we really think it _is_ a problem.
 If one wanted to, one could say
   unique=1,3
 or
   unique=1
   unique=3

 Then parameters which share the same uniqueness list are part of the
 same uniqueness grouping.  Since RAs today normally say unique=1, if one
 excluded the unique group 0 from being unique, then this could be done
 in a completely upwards-compatible way for nearly all resources.
 That is what I suggested, yes.
 Where unique=0 is basically not mentioning the unique hint.
Originally that's what I thought you said.  But somehow read it 
differently later.  Perhaps I got my comment authorship cross-wired.  
Wouldn't be hard to imagine ;-)


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Uniquness OCF Parameters

2011-06-16 Thread Alan Robertson
On 06/16/2011 02:51 AM, Lars Ellenberg wrote:
 On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote:
 On 2011-06-16 09:03, Lars Ellenberg wrote:
 With the current unique=true/false, you cannot express that.
 Thanks. You learn something every day. :)
 Sorry that I left off the As you are well aware of,
 introductionary phrase. ;-)

 I just summarized the problem:

 Depending on what we chose the meaning to be,
 parameters marked unique=true would be required to
either be all _independently_ unique,
or be unique as a tuple.
 And made a suggestion how to solve it:

 If we want to be able to express both, we need a different markup.

 Of course, we can move the markup out of the parameter description,
 into an additional markup, that spells them out,
 likeunique params=foo,bar /unique params=bla /.

 But using unique=0 as the current non-unique meaning, then
 unique=small-integer-or-even-named-label-who-cares, would
 name the scope for this uniqueness requirement,
 where parameters marked with the same such label
 would form a unique tuple.
 Enables us to mark multiple tuples, and individual parameters,
 at the same time.
 If we really think it _is_ a problem.
If one wanted to, one could say
 unique=1,3
or
 unique=1
 unique=3

Then parameters which share the same uniqueness list are part of the 
same uniqueness grouping.  Since RAs today normally say unique=1, if one 
excluded the unique group 0 from being unique, then this could be done 
in a completely upwards-compatible way for nearly all resources.


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Uniquness OCF Parameters

2011-06-15 Thread Alan Robertson
On 06/14/2011 06:03 AM, Florian Haas wrote:
 On 2011-06-14 13:08, Dejan Muhamedagic wrote:
 Hi Alan,

 On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote:
 On 06/13/2011 04:12 AM, Simon Talbot wrote:
 A couple of observations (I am sure there are more) on the uniqueness flag 
 for OCF script parameters:

  Would it be wise for the index parameter of the SFEX ocf script to 
  have its unique flag set to 1, so that the crm tool (and others) would warn if 
  one inadvertently tried to create two SFEX resource primitives with the 
  same index?
 
  Also, an example of the opposite: the Stonith/IPMI script has parameters 
  such as interface, username and password with their unique flags set to 1, 
  causing erroneous warnings if you use the same interface, username or 
  password for multiple IPMI stonith primitives, which of course is often 
  the case in large clusters?

 When we designed it, we intended that Unique applies to the complete set
 of parameters - not to individual parameters.  It's like a multi-part
 unique key.  It takes all 3 to create a unique instance (for the example
 you gave).
 That makes sense.
 Does it really? Then what would be the point of having some params that
 are unique, and some that are not? Or would the tuple of _all_
 parameters marked as unique be considered unique?

I don't know what you think I said, but A multi-part key to a database 
is a tuple which consists of all marked parameters.  You just said what 
I said in a different way.

So we agree.

-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Uniquness OCF Parameters

2011-06-15 Thread Alan Robertson
On 06/14/2011 07:21 AM, Dejan Muhamedagic wrote:
 Hi Florian,

 On Tue, Jun 14, 2011 at 02:03:19PM +0200, Florian Haas wrote:
 On 2011-06-14 13:08, Dejan Muhamedagic wrote:
 Hi Alan,

 On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote:
 On 06/13/2011 04:12 AM, Simon Talbot wrote:
 A couple of observations (I am sure there are more) on the uniqueness 
 flag for OCF script parameters:

  Would it be wise for the index parameter of the SFEX ocf script 
  to have its unique flag set to 1, so that the crm tool (and others) would 
  warn if one inadvertently tried to create two SFEX resource primitives 
  with the same index?
 
  Also, an example of the opposite: the Stonith/IPMI script has parameters 
  such as interface, username and password with their unique flags set to 
  1, causing erroneous warnings if you use the same interface, username or 
  password for multiple IPMI stonith primitives, which of course is often 
  the case in large clusters?

 When we designed it, we intended that Unique applies to the complete set
 of parameters - not to individual parameters.  It's like a multi-part
 unique key.  It takes all 3 to create a unique instance (for the example
 you gave).
 That makes sense.
 Does it really? Then what would be the point of having some params that
 are unique, and some that are not? Or would the tuple of _all_
 parameters marked as unique be considered unique?
 Consider the example above for sfex. It has a device and index
 which together determine which part of the disk the RA should
 use. Only the device:index tuple must be unique.  Currently,
 neither device nor index is a unique parameter (in the
 meta-data). Otherwise we'd have false positives for the
 following configuration:

 disk1:1
 disk1:2
 disk2:1
 disk2:2

 Now, stuff such as configfile and pidfile obviously both must be
 unique independently of each other. There are probably other
 examples of both kinds.
Although what you said makes sense and is obviously upwards compatible 
and useful - I don't think we thought it through to that detail 
originally.  It was a long time ago, and we were trying to think all 
these kind of things through - and I don't recall that kind of detail in 
the discussions.  I think we did a pretty good job.  It's one of the 
things I think we did right.  That's not saying it's perfect - nothing 
is.  But, it's not bad, and I think it has stood the test of time pretty 
well.

We met face to face for these discussions, and not every word we said 
was archived for posterity ;-).  We wrote up the specs afterwards - 
several weeks later due to things like getting the web site set up and 
so on.

Folks from two proprietary stacks, people from the user community, and 
folks from the Linux-HA community met to do these things together.  
About 8-12 people total.  I remember most of them.

Although we invited Red Hat - they didn't send anyone.  It's a testament 
to the spec that, in spite of not participating in the standard, 
they implemented it anyway.  I was certainly pleased with that.

-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Uniquness OCF Parameters

2011-06-13 Thread Alan Robertson
On 06/13/2011 04:12 AM, Simon Talbot wrote:
 A couple of observations (I am sure there are more) on the uniqueness flag 
 for OCF script parameters:

  Would it be wise for the index parameter of the SFEX ocf script to 
  have its unique flag set to 1, so that the crm tool (and others) would warn if 
  one inadvertently tried to create two SFEX resource primitives with the same 
  index?
 
  Also, an example of the opposite: the Stonith/IPMI script has parameters 
  such as interface, username and password with their unique flags set to 1, 
  causing erroneous warnings if you use the same interface, username or 
  password for multiple IPMI stonith primitives, which of course is often the 
  case in large clusters?


When we designed it, we intended that Unique applies to the complete set 
of parameters - not to individual parameters.  It's like a multi-part 
unique key.  It takes all 3 to create a unique instance (for the example 
you gave).


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again

2011-05-16 Thread Alan Robertson
That's probably OK.  If you're really having a problem, it should 
ordinarily show it up before it causes a false failover.


Then you can figure out if you want to raise your timeout or figure out 
what's causing the slow processing.



On 05/14/2011 09:08 AM, gilmarli...@agrovale.com.br wrote:

Thanks again.
Are deadtime 30 and warntime 15 good?

 BUT also either make warntime smaller or deadtime larger...


 On 5/13/2011 7:48 PM, gilmarli...@agrovale.com.br wrote:
 Thank you for your attention.
 His recommendation and wait, if only to continue the logs I get
 following warning if the services do not migrate to another server
 just keep watching the logs warning.

  I typically make deadtime something like 3 times warntime. That way
  you'll get data before you get into trouble. When your heartbeats
  exceed warntime, you get information on how late it is. I would
  typically make deadtime AT LEAST twice the latest time you've 
ever seen

  with warntime.
 
  If the worst case you ever saw was this 60ms instead of 50ms, I'd 
look
  somewhere else for the problem. However, it is possible that you 
have a

  hardware trouble, or a kernel bug. Possible, but unlikely.
 
  More logs are always good when looking at a problem like this.
  hb_report will get you lots of logs and so on for the next time it
 happens.
 
  On 05/13/2011 11:44 AM, gilmarli...@agrovale.com.br wrote:
  Thanks for the help.
 
  I had a problem the 30 days that began with this post, and after two
  days the heartbeat message that the accused had fallen server1 and
  services migrated to server2
  Now with this change to eth1 and eth2 for drbd and heartbeat to the
  amendment of warntime deadtime 20 to 15 and do not know if this will
  happen again.
  Thanks
 
   That's related to process dispatch time in the kernel. It might
 be the
   case that this expectation is a bit aggressive (mea culpa).
  
   In the mean time, as long as those timings remain close to the
   expectations (60 vs 50ms) I'd ignore them.
  
   Those messages are meant to debug real-time problems - which you
 don't
   appear to be having.
  
   -- Alan Robertson
   al...@unix.sh
  
  
   On 05/12/2011 12:54 PM, gilmarli...@agrovale.com.br wrote:
   Hello!
   I'm using heartbeat version 3.0.3-2 on debian squeeze with 
dedicated

   gigabit ethernet interface for the heartbeat.
   But even this generates the following message:
   WARN: Gmain_timeout_dispatch: Dispatch function for send local
 status
    took too long to execute: 60 ms (> 50 ms) (GSource: 0x101c350)
   I'm using eth1 to eth2 and to Synchronize DRBD(eth1) HEARBEAT
 (eth2).
   I tried increasing the values deadtime = 20 and 15 warntime
   Interface Gigabit Ethernet controller: Intel Corporation 82575GB
   Serv.1 and the Ethernet controller: Broadcom Corporation
 NetXtreme II
   BCM5709 in Serv.2
   Tested using two Broadcom for the heartbeat, also without 
success.

  
   Thanks
  
   --
 


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/



___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/



--
Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim from you 
at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again

2011-05-13 Thread Alan Robertson
That's related to process dispatch time in the kernel.  It might be the 
case that this expectation is a bit aggressive (mea culpa).

In the mean time, as long as those timings remain close to the 
expectations (60 vs 50ms) I'd ignore them.

Those messages are meant to debug real-time problems - which you don't 
appear to be having.

 -- Alan Robertson
 al...@unix.sh


On 05/12/2011 12:54 PM, gilmarli...@agrovale.com.br wrote:
 Hello!
 I'm using heartbeat version 3.0.3-2 on debian squeeze with dedicated 
 gigabit ethernet interface for the heartbeat.
 But even this generates the following message:
 WARN: Gmain_timeout_dispatch: Dispatch function for send local status 
 took too long to execute: 60 ms (> 50 ms) (GSource: 0x101c350)
 I'm using eth1 and eth2: eth1 to synchronize DRBD and eth2 for HEARTBEAT.
 I tried increasing the values deadtime = 20 and 15 warntime
 Interface Gigabit Ethernet controller: Intel Corporation 82575GB 
 Serv.1 and the Ethernet controller: Broadcom Corporation NetXtreme II 
 BCM5709 in Serv.2
 Tested using two Broadcom for the heartbeat, also without success.

 Thanks

-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Problem WARN: Gmain_timeout_dispatch Again

2011-05-13 Thread Alan Robertson
I typically make deadtime something like 3 times warntime.  That way 
you'll get data before you get into trouble.  When your heartbeats 
exceed warntime, you get information on how late it is.  I would 
typically make deadtime AT LEAST twice the latest time you've ever seen 
with warntime.
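
As a hedged example, an ha.cf fragment following that rule of thumb might
read (the numbers are purely illustrative, not a recommendation for any
particular hardware):

    warntime  10    # log how late heartbeats are once they pass this
    deadtime  30    # roughly 3x warntime before declaring a node dead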


If the worst case you ever saw was this 60ms instead of 50ms, I'd look 
somewhere else for the problem.  However, it is possible that you have a 
hardware trouble, or a kernel bug.  Possible, but unlikely.


More logs are always good when looking at a problem like this.  
hb_report will get you lots of logs and so on for the next time it happens.


On 05/13/2011 11:44 AM, gilmarli...@agrovale.com.br wrote:

Thanks for the help.

I had a problem the 30 days that began with this post, and after two 
days the heartbeat message that the accused had fallen server1 and 
services migrated to server2
Now with this change to eth1 and eth2 for drbd and heartbeat to the 
amendment of warntime deadtime 20 to 15 and do not know if this will 
happen again.

Thanks

 That's related to process dispatch time in the kernel. It might be the
 case that this expectation is a bit aggressive (mea culpa).

 In the mean time, as long as those timings remain close to the
 expectations (60 vs 50ms) I'd ignore them.

 Those messages are meant to debug real-time problems - which you don't
 appear to be having.

 -- Alan Robertson
 al...@unix.sh


 On 05/12/2011 12:54 PM, gilmarli...@agrovale.com.br wrote:
 Hello!
 I'm using heartbeat version 3.0.3-2 on debian squeeze with dedicated
 gigabit ethernet interface for the heartbeat.
 But even this generates the following message:
 WARN: Gmain_timeout_dispatch: Dispatch function for send local status
  took too long to execute: 60 ms (> 50 ms) (GSource: 0x101c350)
  I'm using eth1 and eth2: eth1 to synchronize DRBD and eth2 for HEARTBEAT.
 I tried increasing the values deadtime = 20 and 15 warntime
 Interface Gigabit Ethernet controller: Intel Corporation 82575GB
 Serv.1 and the Ethernet controller: Broadcom Corporation NetXtreme II
 BCM5709 in Serv.2
 Tested using two Broadcom for the heartbeat, also without success.

 Thanks

 --


___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/



--
Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim from you 
at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] nginx resource agent

2011-01-01 Thread Alan Robertson
On 12/14/2010 02:42 AM, Dejan Muhamedagic wrote:
 Hi Alan,

 On Mon, Dec 06, 2010 at 10:41:48AM -0700, Alan Robertson wrote:
 Hi,

 Attached is a resource agent for the nginx web server/proxy package.

  http://en.wikipedia.org/wiki/Nginx
  http://nginx.org/

 I'd like for it to be added to the next release of the resource
 agents package.
 It's a pity that we cannot share code with the apache RA. We need
 to setup some place for that.

 Pushed to the repository.

#
# I'm not convinced this is a wonderful idea (AlanR)
#
for sig in SIGTERM SIGHUP SIGKILL
do
  if
    pgrep -f "$NGINXD.*$CONFIGFILE" >/dev/null
  then
    pkill -$sig -f "$NGINXD.*$CONFIGFILE" >/dev/null
    ocf_log info "nginxd children were signalled ($sig)"
    sleep 1
  else
    break
  fi
done
 Can't recall anymore the details, there was a bit of discussion
 on the matter a few years ago, but NTT insisted on killing httpd
 children. Or do you mind the implementation?

Hi Dejan,

I know it's been a long time.  Sorry about that.  If I _hated_ the idea, 
I would have left it out.  It definitely leaves me feeling a bit 
unsettled.  If it causes a problem, it will no doubt eventually show 
up.  It looks like it's just masking a bug in Apache - that is, that 
giving it a shutdown request doesn't really work...

Perhaps I shouldn't have kept it in the nginx code - since it does seem 
to be a bit specific to some circumstance in Apache...  On the other 
hand, it shouldn't hurt anything either...


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] nginx resource agent

2010-12-06 Thread Alan Robertson

Hi,

Attached is a resource agent for the nginx web server/proxy package.

http://en.wikipedia.org/wiki/Nginx
http://nginx.org/

I'd like for it to be added to the next release of the resource agents 
package.


Thanks!

--
Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim from you 
at all times your undisguised opinions. - William Wilberforce

#!/bin/sh
#
#   High-Availability nginx OCF resource agent
# 
# nginx
#
# Description:  starts/stops nginx servers.
#
# Author:   Alan Robertson
#   Dejan Muhamedagic
#   This code is based significantly on the apache resource agent
#
# Support:  linux...@lists.linux-ha.org
#
# License:  GNU General Public License (GPL)
#
# Copyright:(C) 2002-2010 International Business Machines
#
#
# Our parsing of the nginx config files is very rudimentary.
# It'll work with lots of different configurations - but not every
# possible configuration.
#
# Patches are being accepted ;-)
#
# OCF parameters:
#  OCF_RESKEY_configfile
#  OCF_RESKEY_nginx
#  OCF_RESKEY_port
#  OCF_RESKEY_options
#  OCF_RESKEY_status10regex
#  OCF_RESKEY_status10url
#  OCF_RESKEY_client
#  OCF_RESKEY_testurl
#  OCF_RESKEY_test20regex
#  OCF_RESKEY_test20conffile
#  OCF_RESKEY_test20name
#  OCF_RESKEY_external_monitor30_cmd
#
#
#   TO DO:
#   More extensive tests of extended monitor actions
#   Look at the --with-http_stub_status_module for validating
#   the configuration?  (or is that automatically done?)
#   Checking could certainly result in better error
#   messages.
#   Allow for the fact that the config file and so on might all be
#   on shared disks - this affects the validate-all option.


: ${OCF_FUNCTIONS_DIR=$OCF_ROOT/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs
HA_VARRUNDIR=${HA_VARRUN}

###
#
#   Configuration options - usually you don't need to change these
#
###
#
NGINXDLIST="/usr/sbin/nginx /usr/local/sbin/nginx"

# default options for http clients
# NB: We _always_ test a local resource, so it should be
# safe to connect from the local interface.
WGETOPTS="-O- -q -L --no-proxy --bind-address=127.0.0.1"
CURLOPTS="-o - -Ss -L --interface lo"

LOCALHOST="http://localhost"
NGINXDOPTS=""
#
#
#   End of Configuration options
###

CMD=`basename $0`

#   The config-file-pathname is the pathname to the configuration
#   file for this web server.  Various appropriate defaults are
#   assumed if no config file is specified.
usage() {
  cat <<-!
usage: $0 action

action:
start   start nginx

stopstop nginx

reload  reload the nginx configuration

status  return the status of web server, running or stopped

monitor  return TRUE if the web server appears to be working.
For this to be supported you must configure mod_status
and give it a server-status URL - or configure what URL
you wish to be monitored.  You have to have installed
either curl or wget for this to work.

meta-data   show meta data message

validate-allvalidate the instance parameters
!
  exit $1
}

#
# run the http client
#
curl_func() {
    cl_opts="$CURLOPTS $test_httpclient_opts"
    if
      [ x != "x$test_user" ]
    then
      echo "-u $test_user:$test_password" |
      curl -K - $cl_opts "$1"
    else
      curl $cl_opts "$1"
    fi
}
wget_func() {
    auth=""
    cl_opts="$WGETOPTS $test_httpclient_opts"
    [ x != "x$test_user" ] &&
        auth="--http-user=$test_user --http-passwd=$test_password"
    wget $auth $cl_opts "$1"
}
#
# rely on whatever the user provided
userdefined() {
    $test_httpclient $test_httpclient_opts "$1"
}

#
# find a good http client
#
findhttpclient() {
# prefer curl if present...
    if
      [ "x$CLIENT" != x ]
    then
        echo $CLIENT
    elif
      which curl >/dev/null 2>&1
    then
        echo curl
    elif
      which wget >/dev/null 2>&1
    then
        echo wget
    else
        return 1
    fi
}
gethttpclient() {
    [ -z "$test_httpclient" ] &&
        test_httpclient=$ourhttpclient
case $test_httpclient in
curl|wget) echo ${test_httpclient}_func;;  #these are supported
*) echo userdefined;;
esac
}

# test configuration good?
is_testconf_sane() {
if
  [ x$test_regex = x -o x$test_url = x ]
then
  ocf_log err test regular expression or test url empty
  return 1
fi

Re: [Linux-ha-dev] nginx resource agent

2010-12-06 Thread Alan Robertson
Let the list know how it works out.  It has some improvements over 
how the original Apache resource agent works.

On 12/06/2010 10:59 AM, Raoul Bhatia [IPAX] wrote:
 On 12/06/2010 06:41 PM, Alan Robertson wrote:
 Hi,

 Attached is a resource agent for the nginx web server/proxy package.

  http://en.wikipedia.org/wiki/Nginx
  http://nginx.org/

 I'd like for it to be added to the next release of the resource agents
 package.
 nice - thank you! :)

 cheers,
 raoul


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] OCF RA dev guide: final heads up

2010-12-06 Thread Alan Robertson
On 12/06/2010 09:35 AM, Dejan Muhamedagic wrote:
 On a different matter:

 Perhaps it would be good to add a section about ocf-tester. Or
 would you consider that out of scope?

Let me second that request.  If you don't know about ocf-tester, then 
you don't really know much about building OCF RAs (IMHO).

-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Thinking about a new communications plugin

2010-11-28 Thread Alan Robertson
On 11/27/2010 04:19 PM, Lars Ellenberg wrote:
 On Sun, Nov 28, 2010 at 12:03:23AM +0100, Lars Ellenberg wrote:
 But until then, you could probably already have implemented your
 original proposal in the cumulative man hours spent writing and reading
 this thread, and I'm sure I will get used. So please, just go ahead.
 tztztz.
 Though possibly I get used, too, sometimes, I obviously meant
 ..., and I'm sure _it_ will be used.
 And I'm going to be one of those that use it, probably...
It wasn't that bad to read it all.  I hadn't realized the messages had 
gotten so large.

We put in compression exactly to deal with this situation.  All that 
bulky XML is extremely compressible.

I didn't write that part of the code, and hadn't noticed that it did all 
that excessive compression/decompression.  But you will note that this 
only really happens during a cluster transition.   Most of the time 
nothing happens - and nothing but heartbeats go over the network - or 
has that changed too?
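
If you want to see how much room compression has to work with there, a rough
check on a running Pacemaker cluster (assuming cibadmin is in the path) is:

    cibadmin -Q | wc -c             # raw CIB XML size in bytes
    cibadmin -Q | gzip -9 | wc -c   # what's left after compression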

On a completely different subject, I'm modernizing my home production 
cluster.  Switching to Ubuntu, replacing motherboard with multi-core 
CPUs, replacing hard drives, adding striping.  I was planning on putting 
the DRBD metadata in an SSD - but there seems to some incompatibility 
between the SSD I bought and Linux and/or my motherboard.  On the other 
hand the SSD works nicely with non-Linux disk testing utilities.  Sigh...


-- 
 Alan Robertsonal...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Thinking about a new communications plugin

2010-11-22 Thread Alan Robertson
Hi,

I've been thinking about a new unicast communications plugin that would 
work slightly differently from the current ucast plugin.

It would take a filename giving the hostnames or ipv4 or ipv6 unicast 
addresses that one wants to send heartbeats to.

When heartbeat receives a SIGHUP, this plugin would reread this file and 
reconfigure the hosts to send heartbeats to.

This would mean that there would be no reason to have to restart 
heartbeat just to add or delete a host from the list being sent heartbeats.
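
To make the idea concrete, the admin-side workflow might look something like
this (the file location and the pid-file path are assumptions at this point,
not design decisions):

    # one hostname or IPv4/IPv6 address per line:
    echo 'node3.example.com' >> /etc/ha.d/ucastlist
    # tell the running heartbeat to re-read the list:
    kill -HUP "$(cat /var/run/heartbeat.pid)"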

Some environments (notably clouds) don't allow either broadcasts or 
multicasts.  This would allow such environments to add hosts to and 
delete hosts from the cluster without having to restart heartbeat - as 
is required now...  [and I'd like to support ipv6 for heartbeats].
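
To make the reload behaviour concrete, here is a minimal sketch (Python and 
the file name are stand-ins for illustration only; the real thing would be a 
C media plugin inside heartbeat):

    import signal, socket, time

    HOSTFILE = "/etc/ha.d/ucast.hosts"   # hypothetical path for the host list
    PORT = 694                           # heartbeat's UDP port
    targets = []

    def load_targets(signum=None, frame=None):
        # Reread the host list; '#' comments and blank lines are skipped.
        global targets
        with open(HOSTFILE) as f:
            targets = [ln.split()[0] for ln in f
                       if ln.strip() and not ln.lstrip().startswith("#")]

    signal.signal(signal.SIGHUP, load_targets)   # add/remove hosts, no restart
    load_targets()

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # IPv4 only for brevity
    while True:
        for host in targets:
            sock.sendto(b"heartbeat", (host, PORT))
        time.sleep(1)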

Any thoughts about this?

Would anyone else like such a plugin?

-- 
 Alan Robertson al...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] a scalable membership and LRM proxy proposal

2010-11-18 Thread Alan Robertson
On 11/18/2010 02:47 AM, Andrew Beekhof wrote:
 Its not that I'm against your proposal, I just don't know of enough
 resources to build, test and stabilize a new communication protocol.
 In that context, an off-the-shelf component that gives us a couple of
 magnitudes worth of additional scaling looks pretty attractive - and
 should provide some valuable feedback for how to take it to the next
 level.
Right.  I don't think this is as complex as for example the original 
heartbeat protocol (with error recovery and so on).  Time will tell - 
assuming I have enough time to do anything with it at all ;-).

But, I won't be dealing with the overhead my company imposes on official 
efforts - which will at least triple my effectiveness ;-).


-- 
 Alan Robertson al...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] a scalable membership and LRM proxy proposal

2010-11-16 Thread Alan Robertson
Thanks for this information.  This is _SOOO_ much better than trying to 
dig it all out of the web site.

On 11/16/2010 03:04 AM, Andrew Beekhof wrote:
 Alan Robertson wrote:
 I was hoping for something a little more lightweight - although I
 clearly understand the benefits of it already exists and having some
 credible claims to security as a goal (since nothing is ever secure).

 I wonder if you really want that kind of very strongly guaranteed
 message delivery
 Not always, possibly not ever.
 But happily this is configurable, so we'd only ask for those
 guarantees if we needed them.
That's good to know.  Is this an Apache extension, or is this part of 
the standard?

Does Qpid support IPv6?
 - since messages sent to a node that crashes before
 receiving them are delivered after it comes back up.  But, of course,
 there's always a way to work around things that don't do what you need
 for them to.  Presumably you'd also need to clean up those messages out
 of the queues of all senders if the node is going away permanently - at
 least once you figure that out...  Messages to clients seem to better
 match the semantics of RDS.  Messages back to overlords could use AMQP
 without obvious corresponding issues.

 I wonder about latency - particularly when federated - and taking
 garbage collection into account...  I see that QPID claims to be
 extremely fast.  It probably is pretty fast for a large and complex
 Java program.
 Here are the numbers from their website:

 Red Hat MRG product built on Qpid has shown 760,000msg/sec ingress on
 an 8 way box or 6,000,000 msg/sec
Is there something missing from this sentence, or am I just dense?  I'm 
guessing that this is intended to imply that it can process 760K 
msgs/sec per CPU, giving a projected 6M msgs/sec for an 8-way...
 Latencies have been recorded as low as 180-250us (.18ms-.3ms) for TCP
 round trip and 60-80us for RDMA round trip using the C++ broker
For latencies, something more like 99th percentile guarantees are a 
better measure than best case latencies.  And, if it uses TCP, then the 
overhead of holding 10K TCP connections open at once seems a bit high - 
just to do nothing most of the time...  This model is different from the 
design point for this protocol.  I expect that most of the time these 
connections would sit idle.

One of the cool things about the proposal I made is that that the 
overlords incur near-zero ongoing overhead to monitor a very very big 
network, and no network congestion.  This work to do this monitoring is 
spread pretty evenly among all the nodes in the system such that no node 
has to keep track of more than a handful of peers (most only have two 
peers - it looks like it could be bounded to 4 peers worst case).  
Ring-structured heartbeat communication looks like it should work out 
very well.
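
To sketch what I mean by bounded peer sets, a toy illustration (this helper is 
hypothetical - it is not from any existing code):

    def ring_peers(nodes, redundancy=2):
        # Each node heartbeats to the next 'redundancy' nodes around the ring
        # and expects heartbeats from the previous 'redundancy' nodes, so no
        # node ever tracks more than 2 * redundancy peers.
        n = len(nodes)
        peers = {}
        for i, node in enumerate(nodes):
            peers[node] = {
                "sends_to":   [nodes[(i + k) % n] for k in range(1, redundancy + 1)],
                "hears_from": [nodes[(i - k) % n] for k in range(1, redundancy + 1)],
            }
        return peers

    # With redundancy=2 every node tracks at most 4 peers, whatever the cluster size.
    print(ring_peers(["n1", "n2", "n3", "n4", "n5"])["n3"])
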
 Nevertheless, I see the attraction.  Not sure it's what I want, but
 since I don't know yet quite what I want - that would be hard to say :-).
 Yep, nothing forcing everyone down the same path.
Got that. I see advantages to having at least some common 
APIs/libraries/interfaces/something.  Cross-pollination of ideas is 
good.  Sharing code and having alternatives is better - if not too 
expensive in code, organizational overhead and emotional energy.

Thanks for taking time to share ideas and educate me,

-- 
 Alan Robertson al...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] a scalable membership and LRM proxy proposal

2010-11-15 Thread Alan Robertson
This sounds like a reasonable approach - since it's very similar to one 
I'd been advocating for a number of years ;-)

Liveness being provided directly to the top tier through QMF interfaces 
sounds unlikely to be scalable to the degree I'm looking for.  If a node 
receives tens of thousands of "I'm alive" messages per second, that 
sounds wasteful at the least...  But, I haven't read the specs - so 
maybe this is all happily taken care of.

I'll go read some specs.

 Thanks for the info!

 -- Alan Robertson
al...@unix.sh

On 11/11/2010 05:28 AM, Andrew Beekhof wrote:
 Some of your thinking mirrors our own.

 What we're moving towards is indeed two tiers of membership.
 One being a small but fully meshed set of, to use your terminology,
 Overlords running a traditional cluster stack.
 The other being a much larger set of independent nodes or VMs running
 only an lrm-like proxy.

 Members of the second tier have no knowledge of each other's
 existence, nor even of the cluster itself.

 The transport layer we plan on using to talk to these nodes is QMF
 (which implements AMQP).
 QMF has the nice properties of being cross-platform (ie. windows),
 standards based and something that already exists.
 We also know that it is secure, fast, and scales well through federation.
 Happily it also gives us node up/down information for free.

 As Lars mentioned, a Matahari agent (essentially the lrm with a QMF
 interface on top) is intended to act as the proxy.
 He also mentioned container resources, but this was a red herring.
 Whether the entities running Matahari are also guests being managed by
 Pacemaker is irrelevant.  They can equally be physical machines or
 cloud instances.

 The Matahari and QMF pieces are both generically useful components
 with no ties to Pacemaker.
 There will still need to be integration done to hook up the node
 liveliness and add the ability to send resource commands via the QMF
 bus.  What form this work takes will depend on which parts of
 Pacemaker are being used in the overall architecture.

 On Thu, Nov 4, 2010 at 3:48 PM, Alan Robertson al...@unix.sh wrote:
 I've been thinking about the idea of very highly scalable membership, and
 also about the LRM proxy function which is currently being performed by
 Pacemaker.  Towards this end I wrote up a high-level design (or
 architecture, or design philosophy or something) for such a scalable
 membership/LRM proxy service.  The design is not specific to working with
 Pacemaker - it could work with Pacemaker, or a number of other kinds of
 management entities.

 The kind of membership outlined here would be (in Pacemaker terms) sort of a
 second-class membership - which has advantages and disadvantages.

 The blog post can be found here:
 http://techthoughts.typepad.com/managing_computers/2010/10/big-clusters-scalable-membership-proposal.html

 Please feel free to comment on it on the blog, or on the mailing list.  I've
 reproduced the blog posting below:

 Really Big Clusters: A Scalable membership proposal

 This blog entry is a bit different than previous entries - I'm proposing
 some enhanced capabilities to go with the LRM and friends from the Linux-HA
 project.  I will update this entry on an ongoing basis to match my current
 thinking about this  proposal.

 This post outlines a proposed server liveness (membership) design which is
 intended to scale up to tens of thousands of servers to be managed as an
 entity.

 Scalability depends on a lot of factors - processor overhead, network
 bandwidth, and network load.  A highly scalable system will take all of
 these factors into account.  From the perspective of the server software
 author (like, for example, me), one of the easiest to overlook is network
 load.  Network load depends on a number of factors - number of packets, size
 of the packets, how many switches or routers it has to go through, and how
 many endpoints will receive the packet.  To best accomplish this task, it is
 desirable that the majority of normal traffic be network topology aware.
 To scale up to very large collections of computers, it is also necessary that
 as much as possible be monitored as locally as possible.  In addition, since
 switching gear is not optimized for multicast packets, and multicast packets
 consume significant resources when compared to unicast packets, it is
 desirable to avoid using multicast packets during normal operation.

 The Basic Concept - network aware liveness

 Although the LRM in Linux-HA is not network-enabled, it tries to minimize
 monitoring overhead by distributing the task of monitoring resources to the
 machine providing the resource, and only reporting failures upstream.

 To extend this idea of local monitoring into system liveness monitoring, one
 might imagine a standard 48-port network switch with 48 servers attached to
 it.  If one were to choose a server to act as proxy for monitoring the
 servers on that switch, then the other 47 nodes on the switch would

Re: [Linux-ha-dev] a scalable membership and LRM proxy proposal

2010-11-15 Thread Alan Robertson
Hi,

I missed the "through federation" part.  Sorry...   As a point of 
comparison - the proposal as described on my blog does not require 
federation.  Probably at least as scalable, and it's very probable that 
it's lower latency - and it's pretty much dead certain that it's lower 
traffic on the network.

I assume that QMF is the Qpid Management Framework found here?
 https://cwiki.apache.org/qpid/qpid-management-framework.html

I was hoping for something a little more lightweight - although I 
clearly understand the benefits of it already exists and having some 
credible claims to security as a goal (since nothing is ever secure).

I wonder if you really want that kind of very strongly guaranteed 
message delivery - since messages sent to a node that crashes before 
receiving them are delivered after it comes back up.  But, of course, 
there's always a way to work around things that don't do what you need 
for them to.  Presumably you'd also need to clean up those messages out 
of the queues of all senders if the node is going away permanently - at 
least once you figure that out...  Messages to clients seem to better 
match the semantics of RDS.  Messages back to overlords could use AMQP 
without obvious corresponding issues.

I wonder about latency - particularly when federated - and taking 
garbage collection into account...  I see that QPID claims to be 
extremely fast.  It probably is pretty fast for a large and complex 
Java program.

Nevertheless, I see the attraction.  Not sure it's what I want, but 
since I don't know yet quite what I want - that would be hard to say :-).

 -- Alan Robertson
 al...@unix.sh





On 11/11/2010 05:28 AM, Andrew Beekhof wrote:
 Some of your thinking mirrors our own.

 What we're moving towards is indeed two tiers of membership.
 One being a small but fully meshed set of, to use your terminology,
 Overlords running a traditional cluster stack.
 The other being a much larger set of independent nodes or VMs running
 only an lrm-like proxy.

 Members of the second tier have no knowledge of each other's
 existence, nor even of the cluster itself.

 The transport layer we plan on using to talk to these nodes is QMF
 (which implements AMQP).
 QMF has the nice properties of being cross-platform (ie. windows),
 standards based and something that already exists.
 We also know that it is secure, fast, and scales well through federation.
 Happily it also gives us node up/down information for free.

 As Lars mentioned, a Matahari agent (essentially the lrm with a QMF
 interface on top) is intended to act as the proxy.
 He also mentioned container resources, but this was a red herring.
 Whether the entities running Matahari are also guests being managed by
 Pacemaker is irrelevant.  They can equally be physical machines or
 cloud instances.

 The Matahari and QMF pieces are both generically useful components
 with no ties to Pacemaker.
 There will still need to be integration done to hook up the node
 liveliness and add the ability to send resource commands via the QMF
 bus.  What form this work takes will depend on which parts of
 Pacemaker are being used in the overall architecture.

 On Thu, Nov 4, 2010 at 3:48 PM, Alan Robertson al...@unix.sh wrote:
 I've been thinking about the idea of very highly scalable membership, and
 also about the LRM proxy function which is currently being performed by
 Pacemaker.  Towards this end I wrote up a high-level design (or
 architecture, or design philosophy or something) for such a scalable
 membership/LRM proxy service.  The design is not specific to working with
 Pacemaker - it could work with Pacemaker, or a number of other kinds of
 management entities.

 The kind of membership outlined here would be (in Pacemaker terms) sort of a
 second-class membership - which has advantages and disadvantages.

 The blog post can be found here:
 http://techthoughts.typepad.com/managing_computers/2010/10/big-clusters-scalable-membership-proposal.html

 Please feel free to comment on it on the blog, or on the mailing list.  I've
 reproduced the blog posting below:

 Really Big Clusters: A Scalable membership proposal

 This blog entry is a bit different than previous entries - I'm proposing
 some enhanced capabilities to go with the LRM and friends from the Linux-HA
 project.  I will update this entry on an ongoing basis to match my current
 thinking about this  proposal.

 This post outlines a proposed server liveness (membership) design which is
 intended to scale up to tens of thousands of servers to be managed as an
 entity.

 Scalability depends on a lot of factors - processor overhead, network
 bandwidth, and network load.  A highly scalable system will take all of
 these factors into account.  From the perspective of the server software
 author (like, for example, me), one of the easiest to overlook is network
 load.  Network load depends on a number of factors - number of packets, size
 of the packets, how many switches or routers

[Linux-ha-dev] a scalable membership and LRM proxy proposal

2010-11-04 Thread Alan Robertson
I've been thinking about the idea of very highly scalable membership, 
and also about the LRM proxy function which is currently being performed 
by Pacemaker.  Towards this end I wrote up a high-level design (or 
architecture, or design philosophy or something) for such a scalable 
membership/LRM proxy service.  The design is not specific to working 
with Pacemaker - it could work with Pacemaker, or a number of other 
kinds of management entities.


The kind of membership outlined here would be (in Pacemaker terms) sort 
of a second-class membership - which has advantages and disadvantages.


The blog post can be found here:
http://techthoughts.typepad.com/managing_computers/2010/10/big-clusters-scalable-membership-proposal.html

Please feel free to comment on it on the blog, or on the mailing list.  I've 
reproduced the blog posting below:



 Really Big Clusters: A Scalable membership proposal

This blog entry is a bit different than previous entries - I'm proposing 
some enhanced capabilities to go with the LRM and friends from the 
Linux-HA project.  I will update this entry on an ongoing basis to match 
my current thinking about this  proposal.


This post outlines a proposed server liveness (membership) design 
which is intended to scale up to tens of thousands of servers to be 
managed as an entity.


Scalability depends on a lot of factors - processor overhead, network 
bandwidth, and network load.  A highly scalable system will take all of 
these factors into account.  From the perspective of the server software 
author (like, for example, me), one of the easiest to overlook is 
network load.  Network load depends on a number of factors - number of 
packets, size of the packets, how many switches or routers it has to go 
through, and how many endpoints will receive the packet.  To best 
accomplish this task, it is desirable that the majority of normal 
traffic be network topology aware.  To scale up to very large 
collections of computers, it is also necessary that as much as possible be 
monitored as locally as possible.  In addition, since switching gear is 
not optimized for multicast packets, and multicast packets consume 
significant resources when compared to unicast packets, it is desirable 
to avoid using multicast packets during normal operation.


*The Basic Concept - network aware liveness*

Although the LRM in Linux-HA is not network-enabled, it tries to 
minimize monitoring overhead by distributing the task of monitoring 
resources to the machine providing the resource, and only reporting 
failures upstream.


To extend this idea of local monitoring into system liveness monitoring, 
one might imagine a standard 48-port network switch with 48 servers 
attached to it.  If one were to choose a server to act as proxy for 
monitoring the servers on that switch, then the other 47 nodes on the 
switch would send unicast heartbeats to that node.  That node in 
turn would report on failure-to-receive-heartbeat events from the other 
47 nodes on the switch.


To ensure detection of failures of the monitoring node, that node could 
send its heartbeat upstream to a process monitoring it.  In order to 
ensure continual service, it may be desirable to have two monitoring 
nodes per switch, with each one also monitoring the other.  This results 
in a 24-to-one reduction in traffic going off the switch, and a 
corresponding decrease in workload to monitor these 48 servers.
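
A minimal sketch of the minion side of this idea (Python for illustration 
only; the port, the deadtime and the reporting hook are placeholders rather 
than part of this proposal):

    import socket, time

    PORT = 694         # placeholder UDP heartbeat port
    DEADTIME = 10      # seconds of silence before a peer is declared dead
    last_heard = {}    # peer address -> time of last heartbeat

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    sock.settimeout(1.0)

    def report_upstream(peer):
        # Placeholder: a real minion would notify its overlord here.
        print("failure-to-receive-heartbeat:", peer)

    while True:
        try:
            _, (peer, _) = sock.recvfrom(64)
            last_heard[peer] = time.time()
        except socket.timeout:
            pass
        now = time.time()
        for node, seen in list(last_heard.items()):
            if now - seen > DEADTIME:
                report_upstream(node)
                del last_heard[node]   # report once; forget until it reappears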


If one were to implement this in the context of a cloud-like 
infrastructure or a monitoring environment, it is likely to be desirable 
to also run the LRM (or something like it) on these same machines so one 
can perform more detailed monitoring and/or effect changes to the 
managed servers.


*Some Tentative Terminology*

In order to facilitate discussion, I'll use the following terms (which 
are, of course, subject to change):


   * *Overlord* - the policy-aware management entity at the top of the
 food chain.  The Overlord layer might be a cloud management layer,
 it might be a System management package like IBM director or
 Nagios, or a high-performance scientific cluster management
 entity, or some other entity interested in managing large numbers
 of computers.
   * *Minion* - a monitoring node which is listening to heartbeats from
 other nodes and reporting failure-to-receive-heartbeat events to
 an Overlord.  The minions at the lowest level in the system are
 only part-time minions - with the rest of their resources being
 spent on performing other tasks.  Dedicated minion servers should
 be capable of monitoring thousands (or tens of thousands) of other
 minions.  It is anticipated that the minion function would be
 performed by some software which also acts as a network proxy for
 an LRM client.
   * *Slave* - a node which is sending heartbeats to a Minion.  Most of
 these servers will spend most of their resources performing tasks
 unrelated to liveness or 

[Linux-ha-dev] Linux-HA server(s) down for an hour on 23 March 2010

2010-03-22 Thread Alan Robertson
 From noon until 1 PM US Mountain Daylight time (1800-1900 UTC) on
23 March 2010, the servers supporting the Linux-HA web site and 
Mercurial source control will be down for server migration.

This is not expected to take more than an hour.

Thanks go the good folks at tummy.com for continuing to provide
and maintain servers for Linux-HA!

Sorry for the inconvenience,


-- 
 Alan Robertson al...@unix.sh

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Linux-HA Leadership Announcement

2008-06-24 Thread Alan Robertson

After more than 10 years as the Linux-HA project leader,  I've decided to
create a new leadership structure.

One of my original success criteria for the project was that it eventually
would not need me.   In the last few years, it has seemed more and more
likely that we'd reached this plateau of success - and the time has come to
put that supposition to the test.

Effective today, I am appointing a team of three people to lead and govern
the project going forward.  These three outstanding people have proved
themselves key contributors to the project, and are ready and willing to
take over the reins of leadership - and lead the project into the future.

These people are: Keisuke MORI [EMAIL PROTECTED]
   Dave Blaschke [EMAIL PROTECTED]
   Lars Marowsky-Bree [EMAIL PROTECTED]

As for me, my current assignment in IBM doesn't permit me to spend full
time on the project, but I will continue to promote and contribute to
the project as time permits.  Should future circumstances permit it, I
expect that I will increase my efforts on the project again.

Congratulations to Mori-san, Dave and Lars!   They're working out their new
roles, scheduling releases, and so on.   Expect to hear from them soon!

   -- Alan Robertson
  [EMAIL PROTECTED]
  Linux-HA founder, Linux-HA project leader emeritus
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Announcing! Release 2.1.3 of Linux-HA is now available!

2007-12-21 Thread Alan Robertson
 files to be verified
  + RA: apache - make status quieter
  + [RA] eDir88: include the stop option
  + OSDL bug 1666: in BSC, make sure temp rsc dir exists for RAs
  + Contrib: dopd - Fix usage of crm_log_init() by code that shouldn't 
be using it

  + Tools: ocf-tester - use the default value for OCF_ROOT if it exists
  + RA: IPaddr2 - Make the check for the modprobe/iptables utilities 
conditional on the IP being cloned
  + CRM: Update crm/cib feature sets and the set of tags/attributes 
used for feature set detection
  + crmd: Simplify the detection of active actions and resources at 
shutdown

  + PE: Use failcount to handle failed stops and starts
  + TE: Set failcount to INFINITY for resources that fail to start or stop
  + CRM: Remove debug code that should not have been committed
  + PE: Add regression test for previous commit
  + PE: Regression: Allow M/S resources to be promoted based solely on 
rsc_location constraints
  + PE: Fix up the tests now that compare_version() functions correctly 
(as of cs: 7d69ef94a258)
  + CRM: Fix compare_version() to actually work correctly on a regular 
basis

  + PE: Update testcases to include all_stopped (added in cs: 800c2fec24ee)
  + crmd: Bug 1655 - crmd can't exit when the PE or TE is killed from 
underneath it

  + Tools: Bug 1653 - Misc attrd/attrd_updater cleanups
  + Tools: Bug 1653 - Further changes to prevent use of NULL when no 
attribute is specified
  + CRM: Make logging setup consistent and do not log the command-line 
to stderr

  + RA: Delay (v1) - Remove extra characters from call to ra_execocf
  + Tools: Bug 1653 - attrd crashes when no attribute is specified
  + OCF: Provide the location of /sbin as used by some agents (HA_SBIN_DIR)
  + PE: Move the creation of stonith shutdown constraints to 
native_internal_constraints()
  + crmd: Only remap monitor operation status to LRM_OP_DONE under the 
correct conditions

  + PE: Handle two new actions in text2task
  + CTS: Give stonith devices plenty of time to start
  + PE: Include description for the remove-after-stop option
  + PE: Streamline STONITH ordering. Make sure 'all_stopped' depends on 
all STONITH ops.
  + PE: Aggregate the startup of fencing resources into a stonith_up 
pseudo-action

  + PE: STONITH Shutdown ordering
  + Bugzilla 1657: Speed up BasicSanityCheck and also make logging 
inheritance more uniform.
  + OSDL 1449 / Novell 291749: GUI should not overwrite more specific 
settings of contained resources.

  + Remove autoconf and friends on make distclean

--
PS:  Special thanks to Dejan for making up this change log - it's an 
annoying and thankless task.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce


___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Problem with heartbeat gui

2007-12-16 Thread Alan Robertson
Dejan Muhamedagic wrote:
 Hi,
 
 On Thu, Dec 13, 2007 at 10:27:53AM +0100, Fernando Iglesias wrote:
 Hi all,

 I've a little problem with the heartbeat GUI; I'll try to introduce it. I've got a
 two nodes cluster and I should be able to connect using GUI installed in a
 third machine ( admin cluster machine), I set the login parameters (I've
 checked they're right ) and try to connect but I've no response, I only have
 one "Updating data from server" and one "No data available" message. After
 a few hours I've no changes.

 Any guess about this problem?
 
 Not unless you post logs and version information.

Also, there is a 10-minute screencast video on using the GUI which gives
a variety of tips on authorization and so on.  You might look at it.

http://linux-ha.org/Education/Newbie/IPaddrScreencast


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: AW: [Linux-ha-dev] Call for testers: 2.1.3

2007-12-14 Thread Alan Robertson
Spindler Michael wrote:
 Hi,
 
 This problem has been solved. My packaging box didn't have all 
 necessary packages for building GUI rpm. When I added 
 them it was 
 able to build haclient (GUI) and that find-lang.sh tool 
 worked fine.
 I didn't find the problem with pegasus on my CentOS 5.0 
 but I have 
 32 bit version, and the problem was reported for 64 bit.

 OK.

 So, this step should only be included if --enable-mgmt, I guess?

 Right. It establishes language settings for the GUI, so it's 
 not needed 
 if GUI isn't needed.
 We are trying to build it on RedHat(Red Hat Enterprise Linux 
 ES release 4 (Nahant Update 4)), and a problem remains before us.
 Please check Mori-san's patch again.
 http://developerbugs.linux-foundation.org//attachment.cgi?id=1109

 -if test x${CIMOM} = x; then
 -if test x${CIMOM} = x; then 
 -AC_CHECK_PROG([CIMOM], [cimserver], [pegasus])
 +if test x${enable_cim_provider} = xyes; then   # 
 maybe, here #
 +if test x${CIMOM} = x; then
 +if test x${CIMOM} = x; then

 I attached the configure.log

 
 fyi: I was able to build the rpms on RedHat AS 4 without any problems.

There were two bugs in the configure stuff:
1) It got the package name for pegasus wrong for Red Hat
2) It didn't work if you had pegasus installed but didn't
enable the CIM provider.


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] build RPM without pegasus on RHEL4

2007-12-12 Thread Alan Robertson

Junko IKEDA wrote:

Hi,

I got the latest dev including the fix about RPM,
http://hg.linux-ha.org/dev/rev/7e3c4ea27853

but I'm still facing the following error.

error: Macro %CMPI_PROVIDER_DIR has empty body
error: Failed build dependencies:
pegasus is needed by heartbeat-2.1.3-1.x86_64
gmake: *** [rpm] Error 1

redhat-release;
Red Hat Enterprise Linux ES release 4 (Nahant Update 4)

Kernel-release;
2.6.9-42.ELsmp

I wonder why RedHat AS 4 is OK?
There is something about tog-pegasus, but nothing named just pegasus...

# rpm -qa | grep pegasus
tog-pegasus-devel-2.5.1-2.EL4
tog-pegasus-2.5.1-2.EL4


OK.

I now understand the failed dependency issue, but not yet the empty 
macro body.  You should have some messages in your build output like this:

  CIM server   = ${CIMOM}])
  CIM providers dir= ${CMPI_PROVIDER_DIR}]
  CMPI header files= ${CMPI_HEADER_PATH}]

Can you tell me what these messages from your build say?  Better yet, I 
just committed the package name fix to 'dev', why don't you update and 
send me your whole build output - to my email address?


If you want to do an egrep for 'CIM|CMPI' in your build output and post 
that to the list, that would also be a reasonable thought.  But, still 
please send the complete build output to my email address.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Call for testers: 2.1.3

2007-12-08 Thread Alan Robertson

Known Problem in 2.1.3:
The STONITHd test seems to fail if fencing is enabled.  I suspect this 
of being a testing quirk rather than a new problem.  I'm working on it.



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations

2007-12-08 Thread Alan Robertson

Yan Gao wrote:

I'm rewriting the order and colocation configurations of mgmt. Following
features will be implemented:

1. Get the crm.dtd file from server end.

2. Dynamically adding gtk widgets for attributes of a type of element
completely according to the dtd definition.

3. The Added widgets are different for CDATA or enum type of attributes
to ensure the inputted values will be legal for dtd.

4. Marking out the default values.

5. The widgets for required or optional attributes will be put into
different tables.

6. Dynamically generating appropriate description for current setting.


Hence all information in dtd can be exploited. In other words, it'll
provide full features.
I've been thinking of building up a general model for kinds of elements.
And I'll try to unify the style of adding objects and viewing objects.

The following are some screenshots. Any comments will be appreciated.


I didn't look at your screenshots, but the ideas sound wonderful.  Also, 
you might look at using crmlint to validate the CIB you generate.  There 
is also information in ciblint which should definitely be of value to you.


I'm kind of focused on the release at the moment, so if you would also 
CC me directly, if you want to discuss this, that would help me.  Thanks!


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations

2007-12-08 Thread Alan Robertson

Yan Gao wrote:

On Sat, 2007-12-08 at 07:54 -0700, Alan Robertson wrote:

Yan Gao wrote:

I'm rewriting the order and colocation configurations of mgmt. Following
features will be implemented:

1. Get the crm.dtd file from server end.

2. Dynamically adding gtk widgets for attributes of a type of element
completely according to the dtd definition.

3. The Added widgets are different for CDATA or enum type of attributes
to ensure the inputted values will be legal for dtd.

4. Marking out the default values.

5. The widgets for required or optional attributes will be put into
different tables.

6. Dynamically generating appropriate description for current setting.


Hence all information in dtd can be exploited. In other words, it'll
provide full features.
I've been thinking of building up a general model for kinds of elements.
And I'll try to unify the style of adding objects and viewing objects.

The following are some screenshots. Any comments will be appreciated.

I didn't look at your screenshots, but the ideas sound wonderful.

Sorry, I added the sreenshots in attachments.

  Also, 
you might look at using crmlint to validate the CIB you generate.  There 
is also information in ciblint which should definitely be of value to you.


I'm kind of focused on the release at the moment, so if you would also 
CC me directly, if you want to discuss this, that would help me.  Thanks!


Thanks! It's a good tool. For now, haclient doesn't generate an xml file.
In the GUI, original comboboxes will be added for the attributes of enum
type. It'll just allow users to select a valid value from a list and
prevent them from entering an invalid value. Any other validation checking
hasn't been implemented yet.


I have extended the metadata to explicitly include enumeration values. 
I think this would help a lot for the kinds of validations I think 
you're doing.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Call for testers: 2.1.3

2007-12-08 Thread Alan Robertson

Alan Robertson wrote:

Known Problem in 2.1.3:
The STONITHd test seems to fail if fencing is enabled.  I suspect this 
of being a testing quirk rather than a new problem.  I'm working on it.


Now fixed in 'dev' and 'test'.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [mgmt]Rewriting order and colocation configurations

2007-12-08 Thread Alan Robertson

Lars Marowsky-Bree wrote:

On 2007-12-09T02:50:52, Yan Gao [EMAIL PROTECTED] wrote:


Thanks! It's a good tool. For now, haclient doesn't generate an xml file.

Ideally, haclient should generate a valid xml, and then transfer to
mgmtd. Xinwei and I think that the current protocol is too complicated
and has many limitations. We want to simplify the protocol and improve
the applicability so that it's convenient to implement full features
according to the dtd.


Right; the client very likely should directly talk to the CIB daemon.
(Which already supports this; we may need a way to apply ACLs in the
future though.)

You're preaching to the choir ;-)

I have extended the metadata to explicitly include enumeration values. 
I think this would help a lot for the kinds of validations I think 
you're doing.

In my implementation,I've adopted the enumeration values specified in
the dtd to be used for the list of combobox options.


It is quite possible that a DTD is not powerful enough to adequately
describe the syntax and semantics of the CIB. The DTD is, simply put,
the oldest and least complex schema format for XML, and happened to be
what I knew when I conjured up the original one ;-)

XML Schema, Relax NG (or others I know even less about) may be more
appropriate standards to describe the CIB as we have it today, and as it
evolves further.

This may be preferable to needing to duplicate this in home-grown
fashion. A lint-like tool is still a good idea, but it should be built 
on top of this, IMHO.


/me redirects this whining into /dev/null

Feel free to investigate the best tools, define the DTD-replacement, and 
get Andrew to adopt it - incorporating it into crm_verify.  Show me the 
code is the Linux way after all...


If it has python support, likely all I'll have to do to use it is import 
a few more python classes at the top, and add a half-dozen lines to 
ciblint to read in the DTD-replacement, and call its validation function.
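
For example, with Relax NG and lxml it could look something like the snippet 
below (a sketch that assumes lxml is available and that a hypothetical cib.rng 
schema file exists - this is not ciblint's actual code):

    from lxml import etree

    schema = etree.RelaxNG(etree.parse("cib.rng"))   # hypothetical schema file
    cib = etree.parse("cib.xml")                     # the CIB to be checked

    if not schema.validate(cib):
        for err in schema.error_log:
            print("%s: line %s: %s" % (err.filename, err.line, err.message))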


When all is said and done, it won't eliminate more than a few hundred 
lines of code in ciblint (which is now more than 2K lines and still 
growing).


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Request for feedback: Checking CIBs for errors...

2007-12-07 Thread Alan Robertson

Hi,

One of the problems I've run into both personally and working with 
customers is finding errors in CIBs.  I see this kind of thing a good 
bit on the mailing list too.


So, to help with this and maybe lighten the burden on the mailing list 
and Andrew and Dejan and Lars, I wrote a command which checks for errors 
in CIB files - which will be included in 2.1.3 when that comes out.


It's called ciblint, and you can read some more about it, and see where 
to get a copy to try out here:

http://linux-ha.org/ciblint

It is _not_ a finished product yet, but even as a work-in-progress, it 
does a number of interesting things.  It is intentionally picky - and I 
_intend_ for it to be picky in the right ways - but of course there are 
no doubt errors in it.  If there is an old way of doing something, and 
a new-and-more-correct way, I intend for it to insist on the new 
way.  So, some things it complains about may be perfectly acceptable to 
the CRM, but are not preferred.  I'm pretty sure that some of the things 
the GUI does will fall into that category.


It can also provide you a good bit of information about the legal values 
to put in various places (-l and -A options).


Although I've learned more about the CIB while writing this script, I 
know I still have more to learn.


I'm looking for constructive feedback on it.  Here are a few specific 
kinds of feedback that would be especially helpful:


- Did it find anything useful for you?

- Do you think it's incorrect (not just pedantic) in some cases?

- Do you have any suggestions for errors you've made or seen
that you think it should catch?

- What corrections do you have for any of the explanatory text
  -- in particular from the -A option?

- Any other constructive suggestions would be welcome.

- Comments about how stupid I am for having something wrong
or what an incredibly stupid idea this is
will be cheerfully redirected to /dev/null

Sample CIBs for these various kinds of feedback would be especially 
appreciated.


It's a python script, and my current thinking for a todo list is in the 
text of the script.
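
To give a feel for the kind of check it makes, here is a tiny self-contained 
example of one generic check of that sort (illustrative only - this is not 
ciblint's actual code):

    import sys
    import xml.etree.ElementTree as ET

    def duplicate_ids(path):
        # Flag any XML 'id' attribute that is used more than once in the CIB.
        seen, problems = {}, 0
        for elem in ET.parse(path).iter():
            eid = elem.get("id")
            if eid is None:
                continue
            if eid in seen:
                print("duplicate id '%s' on <%s> (first seen on <%s>)"
                      % (eid, elem.tag, seen[eid]))
                problems += 1
            else:
                seen[eid] = elem.tag
        return problems

    if __name__ == "__main__":
        sys.exit(1 if duplicate_ids(sys.argv[1]) else 0)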


It currently does a sudo to run lrmadmin to grab some metadata from the 
LRM.  That will eventually change.  [lrmadmin shouldn't require you to 
be root for the things I'm asking it to do]


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] ANNOUNCE: Project Organization - CRM to become its own project

2007-12-07 Thread Alan Robertson
Andrew's contributions to the Linux-HA community will be missed.  I am 
sad that he has unilaterally decided to leave Linux-HA and fork his code 
into a separate project.


I have suspected that this was coming for a number of months, but as you 
probably have guessed, Andrew won't reply to emails I send him, or 
answer the phone when I call - except when I hide my caller id.  I wish 
I had known how to fix that without him feeling he needed to leave the 
project.


I'm not sure what this means yet.

It may mean that we're in for a time of difficult coordination that I 
find hard to imagine working - because the need for coordination with a 
separate project will be higher than if it were in the same project - 
and communication and coordination was the problem in the first place.


D-:

Or it may mean that we'll be looking for someone to pick up maintaining 
the CRM.


D-:

I really do not know.  But if anyone is interested in picking up the 
CRM, do let me know.


In any case, for the convenience of the project, I expect that we'll be 
mirroring his work from his new project on our Mercurial repository - at 
least until we get this figured out.  I'll let you know when that's set up.


Or maybe Andrew leaving and going his own way will work out for the best 
and things will be better than I could possibly imagine.  Anyone who has 
ideas on how this can be made to happen, please email me.



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] stonith plugin through HMC web interface

2007-12-07 Thread Alan Robertson

Xinwei Hu wrote:

Hi all,
  The ibmhmc stonith plugin doesn't work with the web interface of HMC.

  The attachment is a workable version of  stonith plugin through HMC
web interface. It depends on curl and /bin/sh.

  It'll be great if someone can help to review and include it upstream then.


Why don't you want to use the current HMC interface?


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] errors with 76c25be5c854

2007-12-06 Thread Alan Robertson

Tadashiro Yoshida wrote:

Hi,

We detected some errors in the ComponentFail test while running CTS with a dev 
version. It might be a CTS problem with message handling.
Please check it in case something is going wrong.

Dev version: 76c25be5c854
# python CTSlab.py -v2 -r -c
  --facility local7 -L /var/log/ha-log-local7 500 2>&1 | tee cts.log

---
Nov 26 19:22:09 x3650a CTS: Running test ComponentFail (x3650b) [16]
Nov 26 19:22:10 x3650b heartbeat: [27967]: WARN:
Managed /usr/lib64/heartbeat/stonithd process 27980
killed by signal 9 [SIGKILL - Kill, unblockable].
Nov 26 19:22:10 x3650b heartbeat: [27967]: ERROR:
Respawning client /usr/lib64/heartbeat/stonithd:
Nov 26 19:22:10 x3650b heartbeat: [27967]: info:
Starting child client /usr/lib64/heartbeat/stonithd(0,0)
Nov 26 19:22:10 x3650b stonithd: [30753]: notice:
/usr/lib64/heartbeat/stonithd start up successfully.
   :
Nov 26 19:32:41 x3650a CTS: Patterns not found:
['x3650c crmd:.*LOST:.* x3650b ',
 'Updating node state to member for x3650b']
Nov 26 19:32:41 x3650a CTS: Test ComponentFail failed
[reason:Didn't find all expected patterns]
Nov 26 19:32:41 x3650a CTS: Test ComponentFail (x3650b) [FAILED]
---


I think this was a pattern problem in the messages-to-ignore, which I 
believe is now fixed.

http://hg.linux-ha.org/dev/rev/e4a4c6fd5649



Besides, it seems there are some failures in the stonithd testing, although the final message says it succeeded. 


---
Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:26 x3650b stonithd: [26162]:
  CRIT: command ssh -q -x -n -l root x3650c echo 'sleep 2;
  /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1 failed
Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b stonithd: [23258]: 
  ERROR: Failed to STONITH the node x3650c: optype=RESET,

  op_result=TIMEOUT
Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b tengine: [26116]:
  ERROR: tengine_stonith_callback: Stonith of x3650c failed (2)...
  aborting transition.
---


I need to change my testing setup to look at this.  I'd heard a rumor 
that this was happening, but it wasn't happening to me, and no bugzilla 
was filed.  But, I'm pretty sure it's an indication of a fault in the 
stonith ssh module.


I changed the code to fail-fast, which is vastly safer when you don't 
have STONITH available, and not harmful when you have real STONITH. 
However, if the ssh STONITH module can't connect to the machine it will 
show a failure like this one.  So, I think the thing to do is to figure 
out how to report success in this case - in the testing STONITH module.



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Call for testers: 2.1.3

2007-12-06 Thread Alan Robertson

Serge Dubrouski wrote:

It also build on FC6 but not on CentOS.



Whiinnne...

/me straightens up.

Thanks for the info!



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Call for testers: 2.1.3

2007-12-06 Thread Alan Robertson

Serge Dubrouski wrote:

This problem has been solved. My packaging box didn't have all
necessary packages for building GUI rpm. When I added them it was able
to build haclient (GUI) and that find-lang.sh tool worked fine.

I didn't find the problem with pegasus on my CentOS 5.0 but I have 32
bit version, and the problem was reported for 64 bit.



OK.

So, this step should only be included if --enable-mgmt, I guess?



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Heartbeat - Dev: changeset 11642:54085bc025ce

2007-12-06 Thread Alan Robertson

Andrew Beekhof wrote:

http://hg.linux-ha.org/dev/rev/54085bc025ce


This was (at least) caused by a bug in the ssh plugin.

uhhh... no.
the plugin behaved correctly - it's _supposed_ to report failure when it 
can't complete the stonith operation.


risk: near-zero - changes were not made to any production code

again, no.

you know full well that people use the ssh agents and that the change is 
incredibly dangerous for those people.


if you want to make these cts-specific hacks, please create a new agent 
called external/cts or perhaps external/broken and do them there.

anything else is just irresponsible.


What I know full well is that I have never wavered in strongly advising 
against using the ssh plugin in production.


The SSH plugin was written specifically for CTS - nothing else.  It was 
written because my machines kept blowing out power supplies, etc from 
being stonithed with a real power switch in CTS thousands of time.  It 
has always been documented as a test tool ONLY.  At one time Lars and I 
discussed leaving it out of what's shipped in the plugin library but it 
made life too messy, so we left it in, and documented it as 
not-for-production.


This has been discussed dozens of times over the last 6 or 7 years, and 
the recommendation every time it's come up has been to never use it in 
production.


Also note that this is NOT the ssh plugin, but the external/ssh 
plugin.  The ssh plugin is unchanged.  The external/ssh plugin was 
written to exercise the new external stonith module, and comes with 
the same caveat:  Never use it in production.


 From your strong reaction to this change, I'm guessing that you 
might have advised some people to use it in production...


I stand by my recommendation that it never be used in production, but 
given what seems to be implied about your recommendations, I can 
make that last set of changes optional based on a parameter to the RA, 
which we can then supply in CTS.

livedangerously=yes

No point in having three stonith agents that do the same thing - we 
already have two.


These changes are in changeset 11643:35a4edc666b8, which has now been 
pushed into 'dev'.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Re: Heartbeat - Dev: changeset 11628:e4a4c6fd5649

2007-12-04 Thread Alan Robertson

Andrew Beekhof wrote:
this commit is wrong - only the children indicated in the process 
definition are allowed to die

please revert this change asap

http://hg.linux-ha.org/dev/rev/e4a4c6fd5649


Well... That's not what happens in reality, and as far as I can tell 
it's expected.


When one of your processes dies, the processes connected to it via IPC 
die in turn, creating a cascading chain of dying processes.  As a result, 
when something important like the CIB dies, virtually any or every one of 
your processes can die.  Which one(s) die before the node suicides depends 
on the timing.


The key causative factors of this are:
    - Your processes don't suicide directly.
    - It appears that file descriptor notification pretty often happens
      before death-of-child signals.

So, a process (let's say the CIB) dies, and then one or more of
its many local peers (CRM, pengine, attrd, tengine,
etc.) discovers that it has disconnected.  It in turn
dies, and depending on the relative timing of when
the log message gets sent out or the suicide occurs,
the log messages may be received by the remote logging
daemon - or not.

What have I missed here?
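
To illustrate the pattern I'm describing, a toy sketch - not heartbeat code - 
of the disconnect-then-exit behaviour each client exhibits:

    import os, select, sys

    def watch_peer(sock_fd, handle_message):
        # Block until the peer's end of the IPC socket is readable.  A zero-byte
        # read means the peer closed its end (it died): give up and exit, which
        # in turn makes our own peers see EOF and do the same.
        while True:
            select.select([sock_fd], [], [])
            data = os.read(sock_fd, 4096)
            if not data:
                sys.exit(1)
            handle_message(data)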

--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Re: Commit messages

2007-11-06 Thread Alan Robertson

Lars Marowsky-Bree wrote:

Hi,

how about commit messages which have some resemblance to what the change
actually is about - preferably from a user's point of view, but I'd even
take a developer PoV, but with

bug impact: major (if you use cl_respawn), risk: low-to-moderate LF bug
1706 (finishing up associated issues)

not even _I_ can figure out what it is about and supposed to fix w/o
going to bugzilla.

My suggestion would be to have a short, concise summary in the first
line (user's point of view) and explain any implementation details worth
mentioning in the body of the commit (developer's point of view).

The severity and risk assessment in the summary also seems to be
counter-productive and uses up most of the conciseness of the summary
already, and we can't seem to decide between impact/severity/risk/bug
impact etc.

This _is_ somewhat annoying to parse.

I know! Maybe we should have a policy and stick to it? ;-)


I know that others might disagree with this - but trying to squeeze 
everything that is already in a bugzilla entry into a commit message is 
 going to result in these kinds of complaints.  And, not to mention a 
lot of duplication of effort.


There is so _much_ we potentially want to know about a change - and a 
good bit of it is already in bugzillas - for those where we create 
bugzillas.  And, you could argue that the rest could or should go into 
the bugzilla database - especially since bugzilla is customizable (or so 
I understand it).


There is no query facility for commit messages - and it would be very 
difficult to write one.


If I want to ask the question about what all bugs we've fixed in a 
certain area, I can't figure that out from commit messages without 
reading them all.


Putting this kind of info into a database is an obvious "well, duh" kind 
of answer.  And, bugzilla is designed for just this kind of thing, and 
we do use it for quite a few changes.


If we added the risk field to bugzilla - it already has all the rest of 
the info in the database.


It would not be terribly complicated to write a tool to pick out the bug 
number from the commit message and bring up the corresponding bugzilla. 
 If we pasted the URL into the bugzilla, it would be even easier.
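
A rough sketch of such a tool (the "bug NNNN" reference syntax and the 
command-line usage are assumptions; the URL is our developer bugzilla):

    import re, sys, webbrowser

    BUGZILLA = "http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=%s"

    def open_bugs(commit_message):
        # Pull out every "bug NNNN" style reference and open it in a browser.
        for bug_id in re.findall(r"\bbug\s+(\d+)", commit_message, re.IGNORECASE):
            webbrowser.open(BUGZILLA % bug_id)

    if __name__ == "__main__":
        open_bugs(sys.stdin.read())    # e.g.:  hg log -r tip | python bugopen.py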


If you look in the real world outside open source development, it's 
the norm to tie source control and bug tracking together for lots of 
good reasons.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] cl_log dropping messages

2007-11-06 Thread Alan Robertson

Lars Marowsky-Bree wrote:

On 2007-10-25T16:25:30, Lars Marowsky-Bree [EMAIL PROTECTED] wrote:


http://hg.linux-ha.org/dev/rev/69f0395c2ead seems to fix some of this
for me.


BTW, I was able to conclude a 100 cycle run with that patch applied on 7
nodes, and absolutely not a single BadNews, which is a first.



Cutting out that debug should be OK - or raising it to happen if debug 
is  1 would probably also be OK.  If you're seeing this happen a lot, 
that's not a good thing.  Getting behind 200 messages seems like a lot 
to me - off hand.


Are you also seeing retransmissions?

Just because you have a lot of processors doesn't mean that Xen is 
scheduling you properly.  You have two different schedulers going on 
here - so the opportunities for problems go up rather rapidly.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Recovering from unexpected bad things - is STONITH the answer?

2007-11-06 Thread Alan Robertson
We now have the ComponentFail test in CTS.  Thanks Lars for getting it 
going!


And, in the process, it's showing up some kinds of problems that we 
hadn't been looking for before.  A couple examples of such problems can 
be found here:


http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1762
http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1732

The question that comes up is this:

For problems that "should never happen" - like the death of one of our 
core/key processes - is an immediate reboot of the machine the right 
recovery technique?


The advantages of such a choice include:
 * It is fast
 * It will invoke recovery paths that we exercise a lot in testing
 * It is MUCH simpler than trying to recover from all these cases,
   therefore almost certainly more reliable

The disadvantages of such a choice include:
 * It is crude, and very annoying
 * It probably shouldn't be invoked for single-node clusters (?)
 * It could be criticized as being lazy
 * It shouldn't be invoked if there is another simple and correct method
 * Continual rebooting becomes a possibility...

We do not have a policy of doing this throughout the project; what we 
have is a few places where we do it.


I propose that we should consider making a uniform policy decision for 
the project - and specifically decide to use ungraceful reboots as our 
recovery method for key processes dying (for example: CCM, heartbeat, 
CIB, CRM).  It should work for those cases where people don't configure 
in watchdogs or explicitly define any STONITH devices, and also 
independently of quorum policies - because, AFAIK, it seems like the right 
choice and there's no technical reason not to do so.


My inclination is to think that this is a good approach to take for 
problems that in our best-guess judgment shouldn't happen.
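For what it's worth, the mechanics of the policy are tiny compared to the 
policy decision itself.  A minimal sketch (illustrative only - not 
heartbeat's actual recovery code; the sysrq interface is Linux-specific) 
of "core child died, reboot now" looks like this:

    # Illustrative sketch of the "ungraceful reboot when a core process
    # dies" policy; not heartbeat's actual code.
    import os

    def supervise(critical_pids):
        while True:
            pid, status = os.wait()          # block until any child exits
            if pid in critical_pids:
                # A core process died: reboot immediately rather than limp
                # along and risk running a resource twice.  Writing 'b' to
                # sysrq-trigger reboots without a clean shutdown.
                with open("/proc/sysrq-trigger", "w") as trigger:
                    trigger.write("b")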



I'm bringing this to both lists, so that we can hear comments both from
developers and users.


Comments please...

--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Re: [Linux-HA] Recovering from unexpected bad things - is STONITH the answer?

2007-11-06 Thread Alan Robertson

Kevin Tomlinson wrote:

On Tue, 2007-11-06 at 10:25 -0700, Alan Robertson wrote:

We now have the ComponentFail test in CTS.  Thanks Lars for getting it 
going!


And, in the process, it's showing up some kinds of problems that we 
hadn't been looking for before.  A couple examples of such problems can 
be found here:


http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1762
http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1732

The question that comes up is this:

For problems that "should never happen" - like the death of one of our 
core/key processes - is an immediate reboot of the machine the right 
recovery technique?


The advantages of such a choice include:
 * It is fast
 * It will invoke recovery paths that we exercise a lot in testing
 * It is MUCH simpler than trying to recover from all these cases,
   therefore almost certainly more reliable

The disadvantages of such a choice include:
 * It is crude, and very annoying
 * It probably shouldn't be invoked for single-node clusters (?)
 * It could be criticized as being lazy
 * It shouldn't be invoked if there is another simple and correct method
 * Continual rebooting becomes a possibility...

We do not have a policy of doing this throughout the project, what we 
have is a few places where we do it.


I propose that we should consider making a uniform policy decision for 
the project - and specifically decide to use ungraceful reboots as our 
recovery method for key processes dying (for example: CCM, heartbeat, 
CIB, CRM).  It should work for those cases where people don't configure 
in watchdogs or explicitly define any STONITH devices, and also 
independently of quorum policies - because AFAIK it seems like the right 
choice, there's no technical reason not to do so.


My inclination is to think that this is a good approach to take for 
problems that in our best-guess judgment shouldn't happen.



I'm bringing this to both lists, so that we can hear comments both from
developers and users.


Comments please...




I would say the right thing would depend on your cluster
implementation and what is considered the right thing to do for the
applications that the cluster is monitoring.
I would propose that this action should be administrator configurable.

From a user point of view, with the cluster that we are implementing we
would expect any cluster failure (internal) to either get itself back
up and running or just send out an alert ("Help me, I'm not working..."), as we
would want our applications to continue running on the nodes. ** We don't
want a service outage just because the cluster is no longer monitoring
our applications. **
We would expect to get a 24x7 call-out (Sev1) and then log on to the
cluster and see what was happening (configured alerting).
Our applications only want a service outage if the node itself has
issues, not the cluster.


Here's the issue:

The solution as I see it is to do one of:

a) reboot the node and clear the problem with certainty

b) continue on and risk damaging your disks.

c) write some new code to recover from specific cases more
   gracefully and then test it thoroughly.

d) Try and figure out how to propagate the failure to the
top layer of the cluster, and hope you get the notice
there soon enough so that it can freeze the cluster
before the code reacts to the apparent failure
and begins to try and recover from it.

In the current code, sometimes you'll get behavior (a) and sometimes 
you'll get behavior (b) and sometimes you'll get behavior (c).


In the particular case described by bug 1762, failure to reboot the node 
did indeed start the same resource twice.  In a cluster where you have 
shared disk (like yours for example), that would probably trash the 
filesystem.  Not a good plan unless you're tired of your current job 
;-).  I'd like to take most/all of the cases where you might get 
behavior (b) and cause them to use behavior (a).


If writing correct code and testing it were free, then (c) would 
obviously be the right choice.


Quite honestly, I don't know how to do (d) in a reliable way at all. 
It's much more difficult than it sounds.  Among other reasons, it relies 
on the components you're telling to freeze things actually working correctly. 
Since resource freezes happen at the top level of the system, and the 
top layers need all the layers under them to work correctly, getting 
this right seems to be the kind of approach you could make into your 
life's work - and still never get it right.


Case (c) has to be handled on a case by case basis, where you write and 
test the code for a particular failure case.  IMHO the only feasible 
_general_ answer is (a).


There are an infinite number of things that can go wrong.  So, having a 
reliable and general strategy to deal with the WTF's of the world is a 
good thing.  Of course, for those cases where we have a (c) behavior 
would

Re: [Linux-ha-dev] bug in failcount handling?

2007-10-30 Thread Alan Robertson

Lars Marowsky-Bree wrote:

On 2007-10-29T19:44:48, Alan Robertson [EMAIL PROTECTED] wrote:

Off hand, this sounds a bit like a bug to me.  I've attached the relevant 
files - the output of cibadmin -Q, a spreadsheet with the output of the 
various ptest runs, and the logs from both machines in the clusters.


If it's a bug, please file a bugzilla, but most importantly, include the
/var/lib/heartbeat/pengine/ files for the relevant transitions.


Andrew already replied to this, but for some reason - he sent the reply 
to the other mailing list.



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pkg/ and port/ directory location

2007-10-23 Thread Alan Robertson
David Lee wrote:
 On Mon, 22 Oct 2007, Lars Marowsky-Bree wrote:
 
 On 2007-10-18T13:07:45, Andrew Beekhof [EMAIL PROTECTED] wrote:

 Quick question, does anyone know if the pkg and port directories need
 to live in their current location?

 If not, I'm considering moving them to contrib/build/(pkg|port) where
 they'd also be joined by the openBSD port-build-file-thingies.
 I think that ought to be fine; we might want to create a packaging/
 top-directory though, and move the debian, rpm stuff there as well.

 Also can someone refresh my memory as to which system each targets?
 pkg ::= solaris
 port ::= freebsd
 Yeah, that's what I recall too.
 
 Yes, pkg is used by Solaris (and anything else that might want to jump
 on that bandwagon).
 
 I can't see any objection to migrating the pkg directory down into a
 subdirectory related to OS-related building.  Indeed, such a tidy-up of
 the top-level, by migrating all such OS/building items (such as port,
 pkg, debian, rpm=heartbeat.spec), seems good.
 
 So +1 for a migration of these things into a subdirectory.
 
 
 Are we voting between potential names packaging/ and contrib/build/?
 I would vote for packaging/.  This is stuff that we (to a first
 approximation: developers with Hg commit access) try to support, as
 distinct from contrib/, which would be stuff (which might include a
 packaging mechanism) provided by others for which we ourselves would not
 offer support.
 
 So a gentle +1 in favour of packaging/ rather than contrib/.../ for
 the existing cases.

All of these are maintained by long-time contributors.  For example,
Horms has been on the project longer than anyone but me (yes, even
longer than Lars).  David and Matt are close behind Horms in terms of
project longevity - and all of them have done a very creditable job of
keeping up with their respective platforms.

I too would vote for packaging rather than contrib.  Unfortunately, I
believe that the RPM spec file needs to be in the top level directory of
the project :-( (correct me if I'm wrong).

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Starting heartbeat when interfaces are down

2007-10-23 Thread Alan Robertson
Graham, Simon wrote:
 On 2007-10-19T21:57:17, Dejan Muhamedagic [EMAIL PROTECTED] wrote:

 http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1732
 for some discussion on communication interfaces.
 "discussion" means the current deficits are by design ;-)

 
 So right now I'm thinking I need to modify the config and restart 
 the hb service when changes occur in the NICs...
 
 This seems somewhat counter to the idea of high availability, but I'd
 like to understand the design center for this behavior before I start
 trying to 'fix' it...
 What are your circumstances? In which situations should the
 interface be down?
 Hotplug interfaces. Transient issues. Weird bugs.

 One NIC dead on start-up (which is a valid SPoF scenario which is
 currently not handled).
 
 Exactly - when you are in a degraded state you still want the cluster to
 come up.

And indeed, the cluster does come up - without a node.  A more accurate
summation is that a single node in the cluster doesn't come up.  So,
the _cluster_ does recover from this error.  It just does it without
that node.  So, service is not interrupted.

 The specific case that started me looking at this is when there is no
 address set on a link (e.g. if link is down at startup), which causes hb
 to simply refuse to start.

It's not link down.  It's hardware missing.  Link down won't keep
heartbeat from starting, but missing hardware will certainly do so.

So, to correct both of these errors in the description:

When the hardware supporting heartbeat communications is missing
on startup, then the node on which it's missing will refuse to
start, resulting in a degraded but operational cluster.

Of course, if you do recover from the error, you have the same
situation - a degraded but operational cluster.  In this case, somewhat
less degraded than the case above.

Here's why it works that way:
 * It is very common for people to make mistakes in configuration.
 * It is impossible to distinguish between a mistake and a broken interface.
 * It is very hard to get people's attention to read logs.  Failing to
   start does a good job of doing that.

And, because of those considerations _and_ the complexity of doing
otherwise, it does not put any effort into trying to recover from it.
Because such code would be very rarely used in practice (like once every
5K-10K  cluster years - judging from past experience), the chances of it
having undiscovered bugs in it are very great.  The current behavior
exercises well-tested recovery paths (what to do when a node is down).

I don't claim that this is a perfect response, but in terms of initial
startup - you really don't want configuration errors to go unnoticed,
and you can't tell which case this is.  I would guess that in 99+% of
the cases it's a misconfiguration rather than a real failure.

The other case, of an interface going away, is a case the code 
_probably_ should recover from.

It is also worth noting that in practice (as opposed to in testing),
this has not come up to my knowledge.  The only bug in real life I've
heard of which exhibited this is one where the system was quite probably
misconfigured (using DHCP for cluster interfaces).

Keep in mind that a cluster will not stop providing service just because
a single node doesn't come up.  So, you haven't lost service when this
happens, but you get some really nasty messages and failure to start
usually gets people's attention.

I am fully aware that subsequent failures will indeed cause things to
fail - but this behavior does not constitute a single point of failure
for the cluster.

This is the rationale for this behavior.  It's not perfect behavior, but
it's not completely irrational either...

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Starting heartbeat when interfaces are down

2007-10-23 Thread Alan Robertson
Simon Talbot wrote:
 All,
 
 Does anyone know of any Quagga/Zebra OCF Scripts in development/mature,
 if not I will put some proper effort into making some decent ones?

We have some for one specific special case, but I'm not aware of any
more general ones.

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] RFC: pkg/ and port/ directory location

2007-10-23 Thread Alan Robertson
Lars Marowsky-Bree wrote:
 On 2007-10-22T12:03:32, Andrew Beekhof [EMAIL PROTECTED] wrote:
 
 actually debian does need to be in its current location - which is why
 i thought to ask :-)
 
 Why?
 
 And maybe a symlink would suffice, if debian insists? (Not that it
 matters, it just might be more tidy.)

Depending on the mechanism, this might or might not work...

 We could fold the Build Service specfile into the tarball as well, but
 maybe that is a stupid idea ;-)

Getting to a single RPM spec file is not a stupid idea.  I've taken some
of the code from your specfile, and some from the CentOS and Fedora
specfiles and combined them into one specfile.

No doubt it still needs more work, but it's much closer than you might
think.

I know we have had our disagreements on this, but I've spent a lot of
effort and moved a long way towards a compromise arrangement - which
should allow you to mechanically produce SUSE specfiles from the current
specfile - without taking away any of the flexibility of the current
arrangement, or the ability to tailor the result to the SUSE build system.

If you need to add those to the SUSE repository as a source file (which
I believe you do), of course you can do that - and it will be trivial to
do.  But, I'd like to give this idea a fair shot before giving up on
having the flexible build system feed into SUSE's system.

I've talked to Kevin Fenzi (the Fedora maintainer), and he seems willing
to work with me on a common specfile.  Unfortunately, he was ill when I
last contacted him, so we haven't gotten the job done yet - and I'm
currently out of the country.

I haven't talked to Johnny Hughes (CentOS maintainer) about this yet,
because I thought I'd deal with one distro at a time.  But, since you
inquired, I thought I'd at least let you know what's going on.

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] mysql ocf patch - update documentation and fix a typo

2007-10-18 Thread Alan Robertson

Raoul Bhatia [IPAX] wrote:

please find another ocf::heartbeat::mysql patch attached.


When you attach patches, it would be nice if you're able to make them 
text/plain MIME types.


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] mysql ocf patch - update documentation and fix a typo

2007-10-18 Thread Alan Robertson

Raoul Bhatia [IPAX] wrote:

Alan Robertson wrote:

Raoul Bhatia [IPAX] wrote:

please find another ocf::heartbeat::mysql patch attached.


When you attach patches, it would be nice if you're able to make them 
text/plain MIME types.


I didn't mean to say you should resend it, I just meant this for future 
patches.  If you name your patches foopatch.txt, then Thunderbird will 
make it a text/plain file.


Then I can read it inline, or handle it as an attachment - either way. 
What you did was fine, but IMHO, this is slightly (but not a lot) better 
;-).


--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Announcing HA/DR Educational Blog

2007-10-11 Thread Alan Robertson
Alan Robertson has started a blog focusing on HA, DR, and related 
topics - largely from an educational perspective.


You can find the blog here:
http://techthoughts.typepad.com/managing_computers/

A few of the recent topics include:

  * Split-brain, Quorum, and Fencing
  * How to use a watchdog timer
  * Virtualization as High Availability (Disaster Recovery) or
High-Availability as Virtualization?
  * Tools for monitoring services
  * Service Monitoring - basics of a key part of automated management
  * Automated Disaster Recovery - Data Replication



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Release 2.1.3 planned for 10 December, 2007

2007-10-10 Thread Alan Robertson
, since communication has been 
one of our failures in the past, we will communicate by both RSS and 
email – so that no one is missed.
* Any non-trivial fix will cause the final testing interval to 
restart. During this final stage, developers must state if a bug fix is 
trivial or not based on their knowledge. If the test team disagrees with 
respect to their place in the testing procedure and their knowledge of 
the fix, they will dialog with the developers involved to reach a 
consensus.


Activities:

* Developers and test team and volunteers test 'test' base using CTS
* Test team and volunteers execute test plan against 'test' version
* Test team and volunteers report bugs to community Bugzilla
* Final testing interval shall include complete manual tests from 
test plan, and at least 5000 iterations of the CTS suite spread across 
at least two architectures, preferably more.


Responsibilities

This section outlines the responsibilities of each of the set of 
participating stakeholders in the Linux-HA project during the official 
test intervals.


Developer Responsibilities

* Communicate status, progress, and problems with other 
stakeholders using the linux-ha-dev email mailing list, and when 
appropriate, private emails, conference calls, and other phone calls.
* provide timely bug fixes for bugs assigned to you to the 'dev' 
workspace
* avoid committing major restructuring and similar high-impact 
things to 'dev' during the entire testing interval.
* Build RPMs using make rpm and run BasicSanityCheck before pushing 
any fixes to 'dev' during test interval

* Run CTS tests during test intervals as specified
* Run manual tests – from test plan and/or ad hoc
* File bug reports for discovered bugs – making sure bugs get 
assigned appropriately
* When creating bug fixes during testing period, email the -dev 
list asking for your fix(es) to be incorporated.
* Verify that bugs you submitted are fixed after they've been 
incorporated into 'test' workspace.


Test team responsibilities

* Communicate status, progress, and problems with other 
stakeholders using the linux-ha-dev email mailing list, and when 
appropriate, private emails. Status communication regarding 
high-priority bugs, missing fixes, fixed bugs, etc. is especially 
important during the test interval.
* Publish overall schedule for the stages of the testing progress 
based on Alan's draft. Review, correct and update this schedule. After 
the first release, the test team will create future schedules.
* Develop and publish a manual test plan on the wiki site for 
testing, participating in discussions with, and incorporating feedback 
from the community. This test plan must include limited testing for 
R1-style clusters, and also for upgrades as well as initial (clean) 
installs.

* Manage the 'test' workspace
* Incorporate bug fixes to the 'test' workspace according to the 
rules for the different phases of the test interval
* Carry out test plan – both manual and automated testing. Make 
sure that someone is doing at least limited r1-style testing. If you 
don't do it, find someone who will do it for you. 500 iterations during 
end of initial test cycle, and 500 during final test cycle is sufficient.
* Check with other community members to ensure that we are covering 
these platforms at least minimally during testing:

  o x86 linux
  o x86_64 linux
  o ppc linux
  o s390x linux
  o x86 FreeBSD
  o x86 OpenBSD
  o x86 solaris
* Create bug reports for bugs discovered while testing. Make sure 
bugs are assigned to an appropriate person.

* When release is completed, tag released version
* After tagging release, push to the stable repository
* Tell Alan to publish release and digitally sign tar ball, 
packages, etc.


Community responsibilities

Communicate status, progress, and problems with other stakeholders using 
the linux-ha-dev email mailing list, and when appropriate, private 
emails, conference calls, and other phone calls.


* Evaluate and comment on test plan
* Perform ad-hoc testing during entire test cycle
* Perform BasicSanityCheck testing during test cycle
* Offer opinions as appropriate on the severity of bugs and the 
relative importance of bug fixes in your environment and experience

* Perform CTS testing (if possible) during the testing cycle
* Make sure your OS and/or distribution works well with the new release
* File good, detailed bug reports as appropriate. Dejan Muhamadagic 
has a tool which helps with this.


Miscellaneous responsibilities

These responsibilities belong to individuals and small groups of people.

Alan Robertson

* Communicate status, progress, and problems with other 
stakeholders using the linux-ha-dev email mailing list, and when 
appropriate, private emails, conference calls, and other phone calls

Re: [Linux-ha-dev] Release 2.1.3 planned for 10 December, 2007

2007-10-10 Thread Alan Robertson

Alan Robertson wrote:

Hi,

As noted in the subject, we're currently planning on putting out release 
2.1.3 on 10 December 2007.


Thanks to the good people at NTT and NEC we have more people to help 
with testing than in the past.  I have documented a proposed release 
testing procedure here:

http://linux-ha.org/ReleaseTesting


I have added a page to the wiki specifically outlining the dates for the 
the next release - 2.1.3.  You can find this page here:

http://linux-ha.org/ReleaseTesting/2.1.3

As before, I've pasted that document here to facilitate discussion:

Proposed Schedule for Release 2.1.3

Since this will be the first time we've tried to have a dedicated test 
team, and they haven't done this before, I've allowed extra time for 
this release. This way, no one should be rushed, and we should be able 
to produce a quality result this first time, including time for learning 
how to do everything.


* Initial testing interval:
  o Begin 18 October, 2007
  o End 17 November, 2007
* Final testing interval
  o Begin 18 November, 2007
  o End 09 December, 2007
* Release date
  o 10 December, 2007

Known time difficulties

* Alan is in the UK from 21-27 October
* Alan is in Dallas, TX from 11-16 November
* Andrew is on vacation from 29 November - 4 December
* 22-23 November is a holiday in the US, some people take off the 
whole week



--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Dynamic Modify the timeout values

2007-08-24 Thread Alan Robertson
 anything new.  That happens
in step 2.  Test this code in CTS, and test it manually.
Have it reviewed, and repeat until people are happy.
Then I'll commit it for you.

Step 2 - add the code to deal with changes in the configuration, and
figure out when to kill things, when to start new ones.

Step 3 - Create CTS tests which change the configuration, then change it
back, watching for the correct behavior in each case.  Run 1000
instances of this test alone in a CTS run.  After you have had the code
reviewed, and have run these tests, and everyone is happy, then we'll
commit this stage of the changes.

Suggested Enhancement - after doing this:
Since you now know how to restart anything in heartbeat, you should also
be able to restart a pair of read and write children if either should
die.   So, we should be able to then recover from them dying.  Add the
code to do this, and fix up the CTS test which is supposed to kill
random processes, to know how to kill any process in the system.  Turn
the test back on, and run 1000 instances of this test in CTS.  Similarly
for this stage, submit it for review, and when everyone is happy, we'll
commit it.

And, in the end this will be a great improvement, and the system will
also be more robust (better able to recover from errors) than it has
ever been.

How does that sound for an outline of a plan?


--
Alan Robertson [EMAIL PROTECTED]


Hi, Alan-san.

I am sorry for the delay. We asked our sponsor and he agreed to let us
research what you suggest. Though I researched the parameters for
ha.cf, there are over 50 of them and I think that most parameters do not
need to be modified dynamically, e.g. crm, use_logd, baud, etc. So,
your suggestion is ideal in theory, but realizing it would cost a lot and is
not practical.


I'm sorry that you view it this way.  It is _certainly_ not 50 times 
harder than what you've done.  It is probably 3-5 times harder than what 
you've done.


Most of the work would be very simple - keeping a copy of the 
configuration in memory, and restructuring the while loops.  And, for 
any parameter you could imagine, someone has at some time wanted to 
change it at run time without a complete restart.


In particular, people have asked for the ability to change the communications 
setup - especially adding ucast media - at run time.


People have also asked to recover from communication child processes 
which die - and a similar restructuring is necessary for this case.


These are the harder cases - and the most useful cases.  I know of very 
strong and common use cases for these situations.
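To make the shape of that restructuring concrete, here is a sketch (names 
and structure are assumptions, not heartbeat's actual internals) of the 
"keep the parsed config in memory, re-read it on one signal, and diff the 
media list" idea:

    # Illustrative "re-read the whole config on one signal" pattern;
    # hypothetical names, not heartbeat's actual code or signals.
    import signal

    current_config = {"media": set(), "deadtime": 30}   # parsed ha.cf kept in memory
    reload_requested = False

    def request_reload(signum, frame):
        global reload_requested
        reload_requested = True              # only set a flag inside the handler

    signal.signal(signal.SIGHUP, request_reload)

    def apply_reload(parse_config):
        """Re-parse ha.cf and report which media children to stop/start."""
        global current_config, reload_requested
        reload_requested = False
        new_config = parse_config()          # re-read ha.cf from disk
        stale = current_config["media"] - new_config["media"]
        fresh = new_config["media"] - current_config["media"]
        current_config = new_config          # everything else takes effect here
        return stale, fresh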


I still don't understand the use case for the change you want to make. 
We certainly can't add a new signal for each type of parameter we want 
to change.  Since that's the case, I don't want to add a new signal for 
a single case - because that's not a general approach.


What I mean by use case is this - Why does someone want (need) to 
change the heartbeat interval, dead time, warn time, etc. at run time?


What brings that about and makes it a common thing to want to do?

--
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me 
claim from you at all times your undisguised opinions. - William 
Wilberforce

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] OpenBSD port update

2007-08-16 Thread Alan Robertson
Sebastian Reitenbach wrote:
 Hi,
 
 I updated my OpenBSD port of linux ha. Maybe someone has an OpenBSD box
 around and might test it, it can be downloaded here:
 https://www.l00-bugdead-prods.de
 
 it is a port of linux-ha 2.1.2, with the following tweaks:
 - removed dependencies to bash (that already went into dev, thanks)
 - some hardcoded replacement of pam with bsd_auth to allow compilation
   of mgmtd and usage of hb_gui
 - some other minor changes

Any chance you'd send me the patch for the PAM/bsd_auth change?


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Local loopback HBcomm plugin?

2007-08-16 Thread Alan Robertson
Xinwei Hu wrote:
 Hi all,
 
   I'd like to run heartbeat on a single node.
 
   Because heartbeat refuses to start as it requires at least one
 usable media for communication, I think it might be a good idea to have
 a new HBcomm plugin for this purpose.
 
   The attached file has been tested on openSUSE 10.2 + hg version of hb2.
 
   Comments are welcome.

You can also do a ucast 127.0.0.1 without any new code...  That's what
we do for testing in BasicSanityCheck.
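
For reference, an illustrative ha.cf fragment for that kind of single-node
test setup would look roughly like this (the device and node names here are
assumptions, not the actual BasicSanityCheck configuration):

    # illustrative ha.cf fragment for a single-node test setup
    ucast lo 127.0.0.1
    node node1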

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Shared disk file Exclusiveness controlprogramforHB2

2007-08-15 Thread Alan Robertson
Junko IKEDA wrote:
 Hi,
 
 We are planning to make sf-ex into a quorum plugin,
 but it might take a while because we aren't really familiar with it.
 Your continued support will be greatly appreciated.
 
 As for the RA, "last come, last served" - let's see, the last node which can update
 the reserve status is going to win the right to run resources; won't that work at
 all?

At some point the algorithm will terminate on each side of the connection.

What is vital is that, after one side terminates thinking it has
ownership of the resource, the other side must not also terminate
thinking that it has ownership of the resource.

It is also important that the monitor action for this resource be
invoked frequently, and that it verify that it continues to own the
resource, and fail if the other side somehow takes ownership away.

If everything goes well, then one side will start the resource and
succeed, and the other side will try and start the resource and fail to
do so.  Normally heartbeat will try and relocate the resource on another
node after a start failure.  After it runs out of nodes which have had
start failures, then it will not start it anywhere until a human intervenes.

If the other side subsequently dies for any reason (including STONITH),
then this side will be unable to take over the resource without human
intervention.
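
To make that contract concrete, here is a minimal sketch (hypothetical
names, not the actual sf-ex RA; 'acquire' and 'still_owned' stand in for
the on-disk lock primitives) of how start and monitor have to behave:

    # Minimal sketch of the intended RA semantics; illustrative only.
    OCF_SUCCESS, OCF_ERR_GENERIC, OCF_NOT_RUNNING = 0, 1, 7

    def start(acquire):
        # Exactly one node may return success here; the loser reports a
        # start failure and heartbeat tries the resource elsewhere.
        return OCF_SUCCESS if acquire() else OCF_ERR_GENERIC

    def monitor(still_owned):
        # Re-verify ownership on every monitor call, so the CRM notices
        # promptly if the other side has somehow taken the lock away.
        return OCF_SUCCESS if still_owned() else OCF_NOT_RUNNING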

I have also just come into possession of a piece of code which does this
same kind of work, and is known to work correctly.  I'll post it after I
read it over in more detail.

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Shared disk file Exclusiveness controlprogramforHB2

2007-08-14 Thread Alan Robertson
Junko IKEDA wrote:
 OK. I think you are misunderstanding the problem.

 When the communication between Node A & B is fine, you don't need any
 kind of lock. Heartbeat itself can ensure the resource runs on one
 selected node, and on one node only.
 
 sfex_lock() is just checking the status that shows which node succeeded
 in taking the lock.
 It won't always be trying to lock over and over again.
 
 sfex_lock is valuable when the communication between A & B is broken.
 But when the communication IS broken, you can't assume sfex_lock will run
 in order any more.
 
 If the interconnect LAN is down, Split-Brain will come.
 The lock status is reserved for Node A at this moment,
 but Node B is also trying to update the status in order to lock because
 Split-Brain has arisen.
 While Node A checks the status, Node B might update it.
 Node A, whose status has been overwritten, is going to release the lock.
 sfex_lock() doesn't have such complex logic.

I believe that the point he was trying to make is that it _needs_ the
complexity of the logic to be always correct even in the split-brain
case - and I agree.

If this logic fails and both sides think they have exclusive access in a
split-brain case, then a filesystem on disk may be destroyed.  This is a
_very_ bad consequence - much worse than a crash.  It doesn't matter if
it is relatively unlikely, because the consequence is so terrible.  With
hundreds of thousands of clusters running Heartbeat, even unlikely
events eventually happen.
http://linux-ha.org/BadThingsWillHappen

You should be able to run hundreds of thousands or millions of tests
where both sides are trying to get the lock at the same time, and be
able to verify that only one side got the lock - in every single case.

Please don't be discouraged.  Horms started a similar effort a few years
ago, but he wasn't able to spend enough time with it to get it right.

What you're doing is a valuable thing to do, and we all understand very
well that it's difficult.

When I first entered this discussion, I mentioned lockless
synchronization algorithms as being good things to study.  In this case,
we are trying to create a lock, but I suspect the lockless methods would
be a good way to synchronize the creation of a lock (even though this
sounds odd).
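
As one example of the flavor of algorithm involved, here is a
write-then-wait-then-verify acquisition (a sketch only; the path, sector
size, and timing are assumptions, and its correctness depends on the wait
being longer than the other node's write-to-read window):

    # Sketch of a write / wait / re-read lock acquisition on a shared
    # partition; illustrative only, not the sf-ex code.
    import time

    LOCK_PATH = "/dev/my_shared_lock_partition"   # placeholder path

    def try_acquire(my_id, wait_seconds, path=LOCK_PATH, size=512):
        buf = my_id.ljust(size, "\0").encode()
        with open(path, "r+b", buffering=0) as dev:
            dev.write(buf)                    # step 1: write our node ID
            time.sleep(wait_seconds)          # step 2: let a competitor overwrite us
            dev.seek(0)
            return dev.read(size) == buf      # step 3: claim it only if still ours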

-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Shared disk file Exclusiveness control program for HB2

2007-08-14 Thread Alan Robertson
Alan Robertson wrote:
 Andrew Beekhof wrote:
 On 8/8/07, NAKAHIRA Kazutomo [EMAIL PROTECTED] wrote:
 Hello, all.

 We wrote a Shared Disk File EXclusiveness Control Program, called
 SF-EX for short, which could prevent destruction of data on a
 shared disk file system due to Split-Brain.

 This program consists of CUI commands written in C, and an RA;
 the former is used for managing and monitoring shared disk status
 and the latter is the same as other common RAs.

 Your suggestions on how to improve are really appreciated.
 Oddly enough I was just thinking about this and wondering why no-one
 had written one yet.

 I understand the logic for making it into a quorum plugin, but I would
 really love to see it useable in both ways.  To me, you just can't
 beat the simplicity and flexibility of making it an RA (whereas I am
 somewhat cautious about involving the CCM, especially in it's
 currently unmaintained state).
 
 We just put a set of bug fixes in the CCM, and have another set planned.
  How does that translate into "unmaintained"?  Few bugs filed against it
 would be better translated as "stable", not "unmaintained".
 
 If you have some CCM misbehaviors which you think need fixing, by all
 means create bugzillas for them, please.  [I saw some mentioned in
 another thread].


Let me start off by agreeing with Andrew:
I see no technical reason not to provide the capability both ways,
provided that you are willing to create and maintain two interfaces to it.

If you are going to provide and support it only one way, I would suggest
that the quorum module would be the more useful of the two - for reasons
given below.

If one implements quorum as a resource, then everything which needs
quorum needs to depend on the quorum resource.  Unfortunately, a quorum
resource in this case is not integrated with fencing.   So, integrating
it as a quorum plugin has definite advantages which can't be easily
duplicated using a chain of dependencies.

I would suspect the description of it as a simple solution is a matter
of subjective opinion which I wouldn't find myself agreeing with.  If I
were to look at which component to involve, I would look at the level of
integration into the total solution, introduction of new single points
of failure, probability of user misconfiguration, etc.  Creating the set
of dependencies to make every resource in the cluster depend on this
resource is not complex, but it is tedious, error prone, and the kind of
thing which would be likely to be botched in some future update of the
CIB by a new administrator (i.e., the configuration would be fragile).
 It also makes this resource a single point of failure for the entire
cluster.

On the other hand, if you wanted to create a dedicated disk partition
for every resource group, then one could create a resource driven
cluster (similar to SteelEye's LifeKeeper).  This would provide a
certain degree of flexibility, and would eliminate the single point of
failure aspect against all resources - while further increasing the
complexity of the solution.  [It would remain an SPOF for each resource
group involved, but not for the cluster-as-a-whole].

I suspect there are some special cases where the resource model is a
clear winner, but my guess is that they are relatively rare.

Of course, if that happens to be your situation, then you'll likely be a
little happier if that capability is provided.


-- 
Alan Robertson [EMAIL PROTECTED]

Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions. - William
Wilberforce
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

