07.06.2016 02:20, Ken Gaillot wrote:
On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote:
06.06.2016 22:43, Ken Gaillot wrote:
On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:
06.06.2016 19:39, Ken Gaillot wrote:
On 06/05/2016 07:27 PM, Andrew Beekhof wrote:
On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com>
wrote:
On 06/02/2016 08:01 PM, Andrew Beekhof wrote:
On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com>
wrote:
A recent thread discussed a proposed new feature, a new environment
variable that would be passed to resource agents, indicating
whether a
stop action was part of a recovery.

Since that thread was long and covered a lot of topics, I'm
starting a
new one to focus on the core issue remaining:

The original idea was to pass the number of restarts remaining before
the cluster stops trying to start the resource on the same node. This
involves calculating (fail-count - migration-threshold), and that
implies certain limitations: (1) it will only be set when the cluster
checks migration-threshold; (2) it will only be set for the failed
resource itself, not for other resources that may be recovered due to
dependencies on it.

Ulrich Windl proposed an alternative: setting a boolean value
instead. I
forgot to cc the list on my reply, so I'll summarize now: We would
set a
new variable like OCF_RESKEY_CRM_recovery=true

This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.

Agreed; I plan to rename it yet again, to
OCF_RESKEY_CRM_start_expected.
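
To make the intended (safe) usage concrete, here is a minimal sketch of
a stop action that consults the variable; the hint is only logged, and
the stop itself is unconditional (stop_the_service is a made-up
placeholder):

    # assumes the usual ocf-shellfuncs have been sourced, e.g.
    # . ${OCF_FUNCTIONS_DIR:-/usr/lib/ocf/lib/heartbeat}/ocf-shellfuncs
    my_stop() {
        # OCF_RESKEY_CRM_start_expected is advisory only
        if [ "${OCF_RESKEY_CRM_start_expected:-}" = "true" ]; then
            ocf_log info "stop is part of a recovery; a start should follow"
        fi
        # always perform a full, real stop regardless of the hint
        stop_the_service || return $OCF_ERR_GENERIC
        return $OCF_SUCCESS
    }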

The name alone encourages people to "optimise" the agent to not
actually stop the service "because it's just going to start again
shortly". I know that's not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to
restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

It's true there are any number of ways to write bad agents, but I
would argue that we shouldn't be nudging people in that direction :)
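
For the record, this is the kind of shortcut the name could invite, and
exactly what an agent must never do, since the expected start may never
happen (anti-pattern sketch only):

    my_stop() {
        # DANGEROUS anti-pattern -- do not do this. If the expected
        # start never happens, the cluster believes the service is
        # stopped and may start it elsewhere while it still runs here.
        if [ "${OCF_RESKEY_CRM_start_expected:-}" = "true" ]; then
            return $OCF_SUCCESS    # service left running!
        fi
        stop_the_service
    }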

I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual
mistakes.

My main question is how useful it would actually be in the proposed
use cases. Considering the possibility that the expected start might
never
happen (or fail), can an RA really do anything different if
start_expected=true?

I would have thought not.  Correctness should trump optimality.
But I'm prepared to be mistaken.

If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.

Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens,
etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?

Yep.

It may request a stop of other resources (both mechanisms are sketched
below):
* on that node, by removing node attributes that participate in
location constraints
* or cluster-wide, by revoking (or putting into standby) a cluster
ticket that other resources depend on
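
Both mechanisms map onto existing tools; a hypothetical helper a stop
action might call (the attribute and ticket names are made up):

    demote_dependents() {
        # node-local: remove an attribute that a location constraint
        # keys on, so dependent resources stop on / move away from
        # this node
        attrd_updater --name mgs_ready --delete

        # cluster-wide: revoke the ticket dependent resources require
        # (--force may be needed to revoke a currently granted ticket)
        crm_ticket --ticket lustre_mgs --revoke --force
    }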

The latter case is why I asked about the possibility of passing the
node name the resource is intended to be started on, instead of a
boolean value (in comments to PR #1026). I would use it to request a
stop of Lustre MDTs and OSTs, by revoking the ticket they depend on, if
the MGS (the primary Lustre component, which does all the "request
routing") fails to start anywhere in the cluster. That way, if the RA
does not receive any node name,

Why would ordering constraints be insufficient?

They are in place, but only advisory ones, to allow MGS
fail-over/switch-over.
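
For reference, an advisory ordering of that kind can be expressed like
this (resource names are illustrative):

    # start MDT after MGS when both start in the same transition, but
    # do not force MDT to restart when MGS is recovered
    pcs constraint order start MGS then start MDT kind=Optional
    # or, with crmsh:
    # crm configure order mdt-after-mgs Optional: MGS MDT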

What happens if the MDTs/OSTs continue running because a start of MGS
was expected, but something prevents the start from actually happening?

Nothing critical; Lustre clients won't be able to contact them without
the MGS running, and will hang.
But it is safer to shut them down if it is known that the MGS cannot be
started right now, especially if geo-cluster failover is expected in
that case (as the MGS can be local to a site, contrary to all other
Lustre parts, which need to be replicated). Actually, that is the only
remaining piece of the puzzle in that big project, and IMHO it is
enough to have either the node name of an intended start or nothing in
that attribute (nothing meaning: stop everything and initiate
geo-failover if needed). If, e.g., fencing happens on the node intended
to start the resource, then stop will be called again after the next
start failure, once failure-timeout lapses. That would be much better
than no information at all. A total stop or geo-failover would then
happen with some (configurable) delay, instead of the whole filesystem
being left in an unusable state requiring manual intervention.

My gut feeling is that this is getting RAs a little too involved in the
cluster's inner workings. If I understand your idea correctly, it would

;)

be sufficient for your needs to know whether a start is expected on any
node in the same transition. So maybe start_expected=no/local/peer would
cover this use case and the original one.
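
For example (sketch only; the no/local/peer values are just the
proposal above, and the ticket name is made up):

    case "${OCF_RESKEY_CRM_start_expected:-}" in
        local|peer)
            # a start is expected somewhere in this transition;
            # keep the ticket granted and dependents running
            ocf_log info "start expected elsewhere; keeping ticket"
            ;;
        no|"")
            # no start expected anywhere: stop dependents and let
            # geo-failover proceed if configured
            crm_ticket --ticket lustre_mgs --revoke --force
            ;;
    esac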

Yes, that is perfectly good for me.



then it can be "almost sure" pacemaker does not intend to restart the
resource (yet) and can request a stop of everything else (because the
filesystem is not usable anyway). Later, if another start attempt
(caused by failure-timeout expiration) succeeds, the RA may grant the
ticket back, and all the other resources start again.
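
The counterpart on the start side might look like this (sketch only;
real_start and the ticket name are placeholders):

    my_start() {
        real_start || return $OCF_ERR_GENERIC
        # MGS is confirmed running again: grant the ticket back so the
        # dependent resources (MDTs/OSTs) can start (--force may be
        # needed, e.g. if booth manages the ticket)
        crm_ticket --ticket lustre_mgs --grant
        return $OCF_SUCCESS
    }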

Best,
Vladislav
