Hello,

Lars Marowsky-Bree wrote:

On 2006-01-20T10:03:53, Andrew Beekhof <[EMAIL PROTECTED]> wrote:

Whoa, what are you calling crm_attribute for all the time?
It's either an ipfail replacement or his way of getting resources to run on the node where they've failed the least... I forget which.

There are three usages:
1. There is an ifmonitord that monitors all network interfaces in the cluster
and writes the current status to the CIB so it is available to all nodes.
When a network interface fails (the link goes down), before I return
an error and cause a failover, I check whether the other node has
a link status of "up" for the specified interface.  This is obviously
necessary before it can take over.  If the status is not "up",
no error is returned.  This is for clusters that are not fully
redundant, to minimize the risk of a false alarm.
2. You can put a resource into maintenance mode, which prevents
the monitor action from returning an error.  This flag is also
stored in the CIB, so the RAs have to check it on every monitor
interval.
3. You can set the maximum number of restarts before a real
failover occurs; this is also stored in the CIB.
(A sketch of how an RA might query these attributes follows below.)

Regards,

   Peter
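[Editor's note: a minimal sketch, in shell, of how an RA's monitor action might consult
such CIB attributes before deciding whether to report a failure. The attribute names
(link_status_eth0, maintenance_<resource>), the peer node name, and the crm_attribute
options shown (-G/-q/-N/-n, as known from later Pacemaker-era tools) are assumptions;
the heartbeat 2.0.x command line and Peter's actual ifmonitord attribute layout may differ.]

#!/bin/bash
# Sketch of a monitor-time decision helper (not Peter's actual RA code).
# Attribute names and crm_attribute options are assumptions; verify them
# against the crm_attribute shipped with your heartbeat/Pacemaker version.

PEER_NODE="node2"                      # hypothetical peer that would take over
IFACE="eth0"                           # interface we just saw go down
RESOURCE="${OCF_RESOURCE_INSTANCE:-myresource}"

# Usage 2: per-resource maintenance flag kept in the CIB by the admin.
maint=$(crm_attribute -G -q -n "maintenance_${RESOURCE}" 2>/dev/null)
if [ "$maint" = "true" ]; then
    exit 0      # OCF_SUCCESS: maintenance mode, never escalate to a failover
fi

# Usage 1: only escalate a local link failure if the peer's link is "up"
# (attribute assumed to be written by ifmonitord for every node/interface).
peer_link=$(crm_attribute -G -q -N "$PEER_NODE" -n "link_status_${IFACE}" 2>/dev/null)
if [ "$peer_link" != "up" ]; then
    exit 0      # peer could not take over anyway; avoid a false alarm
fi

exit 7          # OCF_NOT_RUNNING: report the failure so the CRM can fail over

[Usage 3 (the restart limit) would follow the same pattern: read a hypothetical
max_restarts attribute with crm_attribute -G and compare it against the resource's
restart count before returning an error.]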

We definitely need to include both of these features in 2.0.4 ourselves.

Apart from bugfixing and some RAs, these would be about the only real new
features I'd like to see there... (And the good thing is that it's probably
much the same mechanism.)

If 2.0.3 is delayed more, feel free to start writing a design & coding
it up already ;-)

Yeah, we know, logging needs tuning. This one probably needs to be
tuned down.
Nod.  Not logging read-only CIB calls wouldn't affect me much.

Yeah, it's exactly this kind of feedback we need to really understand what
we need to log, so that's all good.

A regression test which just pounds the CIB with queries from several
clients in parallel, however, seems like a good idea. Andrew, if you're
bored, how about such a testcase? (We could add it to BSC, or at
least run it on demand there.)
Except it takes 24 hours of such pounding to trigger it... not really
feasible for CTS.

Right, which is why I suggested a stand-alone CIB pounder which we can
leave running somewhere for a couple of hours to days. I expect that if we
really pound it from, say, 2-8 clients at once continuously/randomly, the
bug might surface faster than in 24h ;-)
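
[Editor's note: a throwaway pounder along those lines could be as simple as the bash
sketch below; the choice of cibadmin -Q as the read-only query, the client count, the
run time, and the script name are assumptions, not a fixed design.]

#!/bin/bash
# Stand-alone CIB stress test: hammer the CIB with read-only queries from
# several clients in parallel for a configurable number of hours.

CLIENTS=${1:-8}                        # parallel query clients (2-8 suggested)
HOURS=${2:-24}                         # how long to keep pounding
END=$(( $(date +%s) + HOURS * 3600 ))

client() {
    while [ "$(date +%s)" -lt "$END" ]; do
        # Read-only query of the whole CIB; output is discarded, but a
        # failed query is reported immediately so the run can be inspected.
        if ! cibadmin -Q >/dev/null 2>&1; then
            echo "client $1: cibadmin -Q failed at $(date)" >&2
            return 1
        fi
        sleep $(( RANDOM % 3 ))        # random pause so clients drift out of lock-step
    done
}

for i in $(seq 1 "$CLIENTS"); do
    client "$i" &
done
wait

[Run on a test cluster node, e.g. ./cib-pound.sh 8 48 to keep eight clients querying
for two days; script name and arguments are hypothetical.]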


