Re: [Linux-ha-dev] Dynamic Modify the timeout values

Alan Robertson Fri, 24 Aug 2007 05:30:57 -0700

DAIKI MATSUDA wrote:

DAIKI MATSUDA wrote:

Hi, All


I add the new function for heartbeat-2.0.8 and attached its patch file.

The function is to apply the new timeout parameters ( keepalive,
deadtime, deadping, warntime ) without stopping the heartbeat services.
Currently heartbeat boot scripts supply the 'reload' or 'forcereload'
function, but it, they are same, does stop the services and the HA
services are moved to standby node, because its process kills the forked
heartbeat processes and clients ( crmd etc. ).
So, we think to without suspending the services make the changing
parameters to apply to driving nodes. Current feature is following.
1. changing ha.cf <http://ha.cf> file for 4 parameters
2. send working parent heartbeat process signal SIGRTMAX ( e.g. kill -s
SIGRTMAX `cat /var/run/heartbeat.pid` (Why do I choose SIGRTMAX? I do
not find the unused good signal.)

As we research the heatbeat, it may be safety. And I want to listen to
your issues for patch and functions.

Sorry to be coming in so late on this, but I was working on the release
for many weeks now.  I really like the idea of dynamically modifying the
heartbeat configuration - but if you're going to go to the trouble to do
it, I'd like to see it done more generally.

In other words, I'd like to be able to change nearly any  parameter in
ha.cf at run time without restarting heartbeat.

This would require reworking (and improving) the way heartbeat starts
up.  This would be probably about twice or three times as much work as
what you've done, but it would be much more useful, and much more general.

In the end, if done right, it could be groundwork to letting let us
eventually be able receive config updates from the CIB.  [I know there's
a bootstrapping issue, but we can deal with that when we get to deciding
to do that work].

I have thought about this and have some specific ideas on what kinds of
things need to be done to make this happen.

Hi, Alan.

I understood what you say and think it is very good idea to tread all
parameters in ha.cf. I thought my implementation is for testing and it
is better that you, ha-dev team, make its feature.

I don't know quite what you meant by "it is better that you, ha-dev
team, make it's feature".

I am sorry for poor English. It means that the feature you think to
make is better than what I made.
If possible, could you show the schedule

Not a problem.  This will all work out.

I don't have a particular schedule in mind.  I'm also not sure how long
it will take, and this kind of thing depends a lot on how well the
person doing the change knows the code.


Here is a suggested approach.  At each stage, please test the patch
some, submit the patch for review and then test it extensively, and
submit it for re-review if you found more bugs.  I would suggest in this
order - to keep you from spending too much time testing a patch we ask
you to do over.  In fact, on the first stage maybe review your data
structures first, because that will determine the code in the end.

Step 1 - Further categorize and modularize the configuration.
        There are at least 4 kinds of statements in the configuration
        and there may be more:
          1. media statements - like ucast, bcast, etc. Things
                which load plugins and start read/write processes
          2. global statements - which affect some or all of the
                media statements - things like port number, serial
                baud rate, etc.  Knowing which global statements
                affect which media statements, may eventually be
                important.
          3. Respawn statements - things which start child processes
                this includes the implied respawn statements in things
                like 'crm on'.
          4. Other statements.  For each of these, figure out which
                class of processes are affected by each change.

        Make it so that each media statement is processed by a single
        function call.  Right now, the processing for any given media
        statement is embedded in a loop.  This is just restructuring.

        If you store all the ha.cf statements in an array, then you can
        make a minor improvement even in this stage.  Make a pass
        through the array looking for global statements and execute them
        first.  This will fix some known annoying behaviors where these
        need to occur before they're used.

        For media and respawn statements, you need to add an association
        between the statements and the child processes they created.
        That way, when we finally get around to processing changes, we
        can kill them when they go away or change.  We already have
        a special way to track processes.  Use that code, but create
        new associations.

        Note that this doesn't implement the feature we are talking
        about, it just lays the groundwork for it.  At this point
        the code won't be able to do anything new.  That happens
        in step 2.  Test this code in CTS, and test it manually.
        Have it reviewed, and repeat until people are happy.
        Then I'll commit it for you.

Step 2 - add the code to deal with changes in the configuration, and
figure out when to kill things, when to start new ones.

Step 3 - Create CTS tests which change the configuration, then change it
back, watching for the correct behavior in each case.  Run 1000
instances of this test alone in a CTS run.  After you have had the code
reviewed, and have run these tests, and everyone is happy, then we'll
commit this stage of the changes.

Suggested Enhancement - after doing this:
Since you now know how to restart anything in heartbeat, you should also
be able to restart a pair of read and write children if either should
die.   So, we should be able to then recover from them dying.  Add the
code to do this, and fix up the CTS test which is supposed to kill
random processes, to know how to kill any process in the system.  Turn
the test back on, and run 1000 instances of this test in CTS.  Similarly
for this stage, submit it for review, and when everyone is happy, we'll
commit it.

And, in the end this will be a great improvement, and the system will
also be more robust (better able to recover from errors) than it has
ever been.

How does that sound for an outline of a plan?


--
    Alan Robertson <[EMAIL PROTECTED]>


Hi, Alan-san.

I am sorry for delay. And we asked our sponsor and he admit to
research what you suggest. Though I researched the parameters for
ha.cf, they are over 50 and I think that almost parameters are not
needed to be modified dynamically, e.g. crm, use_logd, baud, etc. So,
your issue is ideally, but to realize it takes many costs and it is
not pratical.

I'm sorry that you view it this way. It is _certainly_ not 50 timesharder than what you've done. It is probably 3-5 times harder than whatyou've done.

Most of the work would be very simple - keeping a copy of theconfiguration in memory, and restructuring the while loops. And, forany parameter you could imagine, someone has at some time wanted tochange it at run time without a complete restart.

Especially people have asked for the ability change communications setup- especially add ucast media - at run time.

People have also asked to recover from communication child processeswhich die - and a similar restructuring is necessary for this case.

These are the harder cases - and the most useful cases. I know of verystrong and common use cases for these situations.

I still don't understand the use case for the change you want to make.We certainly can't add a new signal for each type of parameter we wantto change. Since that's the case, I don't want to add a new signal fora single case - because that's not a general approach.

What I mean by "use case" is this - Why does someone want (need) tochange the heartbeat interval, dead time, warn time, etc. at run time?


What brings that about and makes it a common thing to want to do?

--
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship... Let meclaim from you at all times your undisguised opinions." - WilliamWilberforce

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Re: [Linux-ha-dev] Dynamic Modify the timeout values

Reply via email to