Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-02 Thread Andrew Beekhof
On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:
> A recent thread discussed a proposed new feature, a new environment
> variable that would be passed to resource agents, indicating whether a
> stop action was part of a recovery.
>
> Since that thread was long and covered a lot of topics, I'm starting a
> new one to focus on the core issue remaining:
>
> The original idea was to pass the number of restarts remaining before
> the cluster stops trying to start the resource on the same node. This
> involves calculating (migration-threshold - fail-count), and that
> implies certain limitations: (1) it will only be set when the cluster
> checks migration-threshold; (2) it will only be set for the failed
> resource itself, not for other resources that may be recovered due to
> dependencies on it.
>
> Ulrich Windl proposed an alternative: setting a boolean value instead. I
> forgot to cc the list on my reply, so I'll summarize now: We would set a
> new variable like OCF_RESKEY_CRM_recovery=true

This concept worries me, especially given that what we've implemented
is called OCF_RESKEY_CRM_restarting.

The name alone encourages people to "optimise" the agent to not
actually stop the service "because it's just going to start again
shortly". I know that's not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

It's true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)
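
To make the failure mode concrete, here is a minimal sketch of a stop
action in a shell-based agent; stop_my_daemon is a hypothetical helper
standing in for the agent's real shutdown logic:

    #!/bin/sh
    # Pull in the standard OCF return codes (path as shipped by
    # resource-agents; adjust if your OCF_ROOT differs).
    . ${OCF_ROOT:-/usr/lib/ocf}/lib/heartbeat/ocf-shellfuncs

    my_stop() {
        # BAD: skipping the real stop because a restart is "expected".
        # If the cluster aborts the transition for any reason, the
        # service keeps running while the cluster believes it is
        # stopped, and may then be started on another node as well.
        #if [ "$OCF_RESKEY_CRM_restarting" = "true" ]; then
        #    return $OCF_SUCCESS    # service never actually stopped!
        #fi

        # GOOD: always perform a complete stop, whatever the hint says.
        stop_my_daemon || return $OCF_ERR_GENERIC
        return $OCF_SUCCESS
    }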

> whenever a start is
> scheduled after a stop on the same node in the same transition. This
> would avoid the corner cases of the previous approach; instead of being
> tied to migration-threshold, it would be set whenever a recovery was
> being attempted, for any reason. And with this approach, it should be
> easier to set the variable for all actions on the resource
> (demote/stop/start/promote), rather than just the stop.
>
> I think the boolean approach fits all the envisioned use cases that have
> been discussed. Any objections to going that route instead of the count?
> --
> Ken Gaillot 



Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Jan Pokorný
On 02/06/16 02:35 +0200, Dennis Jacobfeuerborn wrote:
> On 01.06.2016 20:25, Stephano-Shachter, Dylan wrote:
>> I have just finished setting up my HA nfs cluster and I am having a small
>> problem. I would like to have nfs4 working but whenever I try to mount I
>> get the following message:
>> 
>> mount: no type was given - I'll assume nfs because of the colon
> 
> I'm not sure if the type "nfs" is supposed to work with v4 as well,
> but on my systems the mounts use the explicit type "nfs4", so you can
> try mounting with "-t nfs4".

$ rpm -qf $(man -w mount.nfs)
> nfs-utils-1.3.3-7.rc4.fc22.x86_64

$ man mount.nfs | fmt -w70 | grep -A2 Under
>   Under Linux 2.6.32 and later kernel versions, mount.nfs can
>   mount all NFS file system versions.  Under earlier Linux
>   kernel versions, mount.nfs4 must be used  for mounting NFSv4
>   file systems while mount.nfs must be used for NFSv3 and v2.
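
So on any reasonably modern kernel the generic type should work for v4
too; for example (server:/export is a placeholder):

    # Let mount.nfs negotiate the highest version both ends support:
    mount -t nfs server:/export /mnt

    # Or request NFSv4 explicitly, via the dedicated type ...
    mount -t nfs4 server:/export /mnt
    # ... or via a version option on the generic type:
    mount -t nfs -o vers=4 server:/export /mnt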

-- 
Jan (Poki)




Re: [ClusterLabs] Antw: Re: Q: status section of CIB: "last_0" IDs and "queue-time"

2016-06-02 Thread Ken Gaillot
On 06/02/2016 01:07 AM, Ulrich Windl wrote:
> Ken Gaillot  wrote on 01.06.2016 at 16:14 in message
> <574eede2.1090...@redhat.com>:
>> On 06/01/2016 06:14 AM, Ulrich Windl wrote:
>>> Hello!
>>>
>>> I have a question:
>>> Inspecting the XML of our cluster, I noticed that there are several
>>> IDs ending with "last_0". So I wondered:
>>> It seems those IDs are generated for start and stop operations, and
>>> I discovered one case where an ID is duplicated (the status is for
>>> different nodes, and one is a start operation, while the other is a
>>> stop operation, however).
>>
>> The "*_last_*" IDs simply refer to the last (= most recently executed)
>> operation :)
>>
>> Those IDs are not directly used by the cluster; they're just used to
>> store the most recent operation in the CIB.
>>
>>> Background: I wrote some program that extracts the runtimes of
>>> operations from the CIB, like this:
>>> prm_r00_fs_last_0 13464 stop
>>> prm_r00_fs_last_0 61 start
>>> prm_r00_fs_monitor_30 34 monitor
>>> prm_r00_fs_monitor_30 43 monitor
>>>
>>> The first word is the "id" attribute, the second is the "exec-time"
>>> attribute, and the last one (added to help myself out of confusion)
>>> is the "operation" attribute. Values are converted to milliseconds.
>>>
>>> Is the name of the id intentional, or is it some mistake?
>>>
>>> And another question: For an operation with "start-delay" it seems
>>> the start delay is simply added to the queue time (as if the
>>> operation were waiting that long). Is that intentional?
>>
>> Yes. The operation is queued when it is received, and if it has a start
>> delay, a timer is set to execute it at a later time. So the delay
>> happens while the operation is queued.
> 
> Ken,
> 
> thanks for the answers. Is there a way to distinguish "intentional"
> from "non-intentional" queueing? One would want to look more deeply
> into the non-intentional kind.

No, from the cluster's point of view, it's always intentional, just
different lengths of time. You'd just have to subtract any start delay
if you're not interested in that.
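
For anyone who wants to reproduce such a listing: a rough sketch,
assuming cibadmin is available and that it prints each lrm_rsc_op
element of the status section on a single line (which it normally
does):

    # Print queue-time and exec-time (ms), id and operation for every
    # operation recorded in the status section of the CIB.
    cibadmin --query --scope status |
      grep -o '<lrm_rsc_op [^>]*' |
      awk '{
        id = op = q = e = "?"
        for (i = 2; i <= NF; i++) {
          split($i, kv, "=")
          gsub(/"/, "", kv[2])
          if (kv[1] == "id")         id = kv[2]
          if (kv[1] == "operation")  op = kv[2]
          if (kv[1] == "queue-time") q  = kv[2]
          if (kv[1] == "exec-time")  e  = kv[2]
        }
        print q, e, id, op
      }' | sort -n

Any configured start-delay then has to be subtracted from the
queue-time column by hand, as described above.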

> Regards,
> Ulrich
> 
>>
>>> Another program tried to extract queue and execution times for
>>> operations, and the sorted result then looks like this:
>>>
>>> 1 27 prm_nfs_home_exp_last_0 monitor
>>> 1 39 prm_q10_ip_2_monitor_6 monitor
>>> 1 42 prm_e10_ip_2_monitor_6 monitor
>>> 1 58 prm_s01_ip_last_0 stop
>>> 1 74 prm_nfs_cbw_trans_exp_last_0 start
>>> 30001 1180 prm_stonith_sbd_monitor_18 monitor
>>> 30001 178 prm_c11_ascs_ers_monitor_6 monitor
>>> 30002 165 prm_c11_ascs_ers_monitor_45000 monitor
>>>
>>> Regards,
>>> Ulrich



Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Dennis Jacobfeuerborn
On 02.06.2016 09:18, Ferenc Wágner wrote:
> "Stephano-Shachter, Dylan"  writes:
> 
>> I cannot figure out why version 4 is not supported.
> 
> Have you got fsid=root (or fsid=0) on your root export?
> See man exports.
> 

This is apparently no longer recommended:
http://wiki.linux-nfs.org/wiki/index.php/Nfsv4_configuration

"The linux implementation allows you to designate a real filesystem as
the pseudofilesystem, identifying that export with the fsid=0 option; we
no longer recommend this. Instead, on any recent linux distribution,
just list exports in /etc/exports exactly as you would for NFSv2 or NFSv3."
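
For comparison, the two styles side by side in /etc/exports
(illustrative only; paths and client names are placeholders, and
exports files allow # comments):

    # Older NFSv4-only style: a pseudo-root marked with fsid=0, with
    # the real filesystems bind-mounted underneath it:
    #/export       client.example.com(rw,sync,fsid=0,crossmnt)
    #/export/home  client.example.com(rw,sync)

    # Current recommendation: list exports exactly as for NFSv2/v3; the
    # server constructs the v4 pseudofilesystem automatically:
    /srv/home      client.example.com(rw,sync)

followed by "exportfs -ra" to apply the new table.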

Regards,
  Dennis




Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Ferenc Wágner
"Stephano-Shachter, Dylan"  writes:

> I cannot figure out why version 4 is not supported.

Have you got fsid=root (or fsid=0) on your root export?
See man exports.
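
Either way, it is worth checking what the server actually exports; for
example (server.example.com is a placeholder):

    # On the server: show the effective export table with all options,
    # including any fsid settings:
    exportfs -v

    # From a client: list the exports the server advertises (this uses
    # the MOUNT protocol, i.e. the v3 view of the export list):
    showmount -e server.example.com
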
-- 
Feri
