[Linux-HA] Antw: Q: groups of groups

2013-08-22 Thread Ulrich Windl
>>> I wrote on 22.08.2013 at 11:31 in message <5215DA7A.3B0:161:60728>:
[...]
> So what you really want is
> "group grp_A ((R1 VG1 F1) (R2 VG2 F2)) A"
[...]

Of course that's nonsense (apart from the fact that the syntax does not exist), 
or at least non-obvious. I meant
"group grp_A ([R1 VG1 F1] [R2 VG2 F2]) A", where [...] denotes an ordered 
"sub-group" or "sequence", and both of those sequences should be executed in 
parallel.
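
Spelled out as plain ordering constraints, that would amount to roughly the 
following (a sketch only; the constraint names are made up, and the matching 
colocation constraints are omitted for brevity):

order o_seq1 inf: R1 VG1 F1
order o_seq2 inf: R2 VG2 F2
order o_app1 inf: F1 A
order o_app2 inf: F2 A

i.e. the two sequences are internally ordered but independent of each other, 
and A waits for both.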

I hope it's less confusing now.

Regards,
Ulrich





[Linux-HA] Antw: Re: Backing out of HA

2013-08-22 Thread Ulrich Windl
>>> Ferenc Wagner wrote on 22.08.2013 at 14:00 in message
<87siy2ndf8@lant.ki.iif.hu>:
> Lars Marowsky-Bree  writes:
> 
>> "Poisoned resources" indeed should just fail to start and that should be
>> that. What instead can happen is that the resource agent notices it
>> can't start, reports back to the cluster, and the cluster manager goes
>> "Oh no, I couldn't start the resource successfully! It's now possibly in
>> a weird state and I better stop it!"
>>
>> ... And because of the misconfiguration, the *stop* also fails, and
>> you're hit with the full power of node-level recovery.
>>
>> I think this is an issue with some resource agents (if the parameters
>> are so bad that the resource couldn't possibly have started, why fail
>> the stop?) and possibly also something where one could contemplate a
>> better on-fail="" default for "stop in response to first-start failure".
> 
> Check out http://www.linux-ha.org/doc/dev-guides/_execution_block.html,
> especially the comment "anything other than meta-data and usage must
> pass validation".  So if the start action fails with some validation
> error, the stop action will as well.  Is this good practice after all?
> Or is OCF_ERR_GENERIC treated differently from the other errors in this
> regard and thus the validate action should never return OCF_ERR_GENERIC?

Hi!

Please note that "anything other than meta-data and usage must pass 
validation" isn't very helpful to the RA developer: the statement just says 
that such methods (like start and stop) must not be used if the parameters 
don't pass validation, but it does not say _who_ is to ensure that.

Requiring the RA to do parameter validation before any such method is called 
may be helpful only while debugging the RA. Normally the program that makes 
the configuration changes should validate the parameters (preferably on any 
node where the resource is to be activated).
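
For what it's worth, an RA author could reconcile the dev guide's rule with 
the concern above along these lines (a rough sketch only, reusing the guide's 
foobar_* placeholder names; not an official pattern):

# meta-data and usage must always work
case $__OCF_ACTION in
meta-data)  foobar_meta_data; exit $OCF_SUCCESS;;
usage|help) foobar_usage;     exit $OCF_SUCCESS;;
esac

# Everything else must pass validation, but a validation failure should
# not turn a stop into a fatal error: parameters that bad mean the
# resource can never have been started in the first place.
foobar_validate_all
rc=$?
if [ $rc -ne 0 ]; then
    case $__OCF_ACTION in
        stop)    exit $OCF_SUCCESS;;
        monitor) exit $OCF_NOT_RUNNING;;
        *)       exit $rc;;
    esac
fi

case $__OCF_ACTION in
start)   foobar_start;;
stop)    foobar_stop;;
monitor) foobar_monitor;;
*)       foobar_usage; exit $OCF_ERR_UNIMPLEMENTED;;
esac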

Regards,
Ulrich

> -- 
> Thanks,
> Feri.





Re: [Linux-HA] cibadmin --delete disregarding attributes

2013-08-22 Thread Andrew Beekhof

On 22/08/2013, at 10:22 PM, Ferenc Wagner  wrote:

> Hi,
> 
> man cibadmin says: "the tagname and all attributes must match in order
> for the element to be deleted",

"for the element to be deleted" <--- not the children of the element to be 
deleted

> but experience says otherwise: the
> primitive is deleted even if it was created with different attributes
> than those provided to the --delete call, cf. 'foo' vs 'bar' in the
> example below.  Do I misinterpret the documentation or is this a bug in
> Pacemaker 1.1.7?
> 
> # cibadmin --create --obj_type resources --xml-text "<primitive class='ocf' provider='heartbeat' type='Dummy' id='test_dummy'>
>   ...
>   ... value='false'/>
>   ..."
> 
> # cibadmin --delete --obj_type resources --xml-text "<primitive class='ocf' provider='heartbeat' type='Dummy' id='test_dummy'>

tagname, class, provider, type and id all match here. so deletion is allowed to 
proceed

>   ...
>   ... value='false'/>
>   ..."
> -- 
> Thanks,
> Feri.




Re: [Linux-HA] Storing arbitrary metadata in the CIB

2013-08-22 Thread Andrew Beekhof

On 22/08/2013, at 10:08 PM, Ferenc Wagner  wrote:

> Hi,
> 
> Our setup uses some cluster wide pieces of meta information.  Think
> access control lists for resource instances used by some utilities or
> some common configuration data used by the resource agents.  Currently
> this info is stored in local files on the nodes or replicated in each
> primitive as parameters.

Are you aware that resources can
- have multiple sets of parameters, and
- share sets of parameters

The combination might be useful here.
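
For example (illustrative XML only; the ids, the agent type, the second 
primitive and the "acl" parameter are all made up), one resource can carry a 
named attribute set and another can pull it in by reference, so the values 
live in one place:

<primitive id='vm-elm' class='ocf' provider='heartbeat' type='VirtualDomain'>
  <instance_attributes id='common-vm-params'>
    <nvpair id='common-vm-params-acl' name='acl' value='admins'/>
  </instance_attributes>
</primitive>
<primitive id='vm-oak' class='ocf' provider='heartbeat' type='VirtualDomain'>
  <!-- reuses the same name/value pairs via id-ref -->
  <instance_attributes id-ref='common-vm-params'/>
</primitive>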

>  I find this suboptimal, as keeping them in
> sync is a hassle.  It is possible to store such stuff in the "fake"
> parameter of unmanaged Dummy resources, but that clutters the status
> output.  Can somebody offer some advice in this direction?  Or is this
> idea a pure heresy?
> -- 
> Thanks,
> Feri.




Re: [Linux-HA] Q: groups of groups

2013-08-22 Thread Andrew Beekhof

On 22/08/2013, at 7:31 PM, Ulrich Windl  
wrote:

> Hi!
> 
> Suppose you have an application A that needs two filesystems F1 and F2. The 
> filesystems are on separate LVM VGs VG1 and VG2 with LVs L1 and L2, 
> respectively. The RAID R1 and R2 provide the LVM PVs.
> 
> (Actually we have one group that has 58 primitives in them with both 
> dimensions being wider than in this example)
> 
> So you can configure
> "group grp_A R1 R2 VG1 VG2 F1 F2 A" (assuming the elements are primitives 
> already configured)
> 
> Now for example if R2 has a problem, the cluster will restart the whole group 
> of resources, even that sequence that is unaffected (R1 VG1 F1). This causes 
> extra operations and time for recovery what you don't like.

So don't put them in a group?

> 
> What you can do now is having parallel execution like this
> "group grp_A (R1 R2) (VG1 VG2) (F1 F2) A"

You're saying this is currently possible?
If so, crmsh must be re-writing this into something other than a group.

> (Note that this is probably a bad idea as the RAIDs and VGs (and maybe mount 
> also) most likely use a common lock each that forces serialization)
> 
> For the same failure scenario R2 wouldn't be restarted, so the gain is small. 
> A better approach seems to be
> "group grp_A (R1 VG1 F1) (R2 VG2 F2) A"
> 
> Now for the same failure R1, VG1, and F1 will survive; unfortunately if R1 
> fails, then everything will be restarted, like in the beginning.
> 
> So what you really want is
> "group grp_A ((R1 VG1 F1) (R2 VG2 F2)) A"
> 
> Now if R2 fails, then R1, VG1, and F1 will survive, and if R1 fails, then R2, 
> VG2 and F2 will survive
> 
> Unfortunately the syntax of the last example is not supported.

I'm surprised the one before it is even supported. Groups of groups have never 
been supported.

> This one isn't either:
> 
> group grp_1 R1 VG1 F1
> group grp_2 R2 VG2 F2
> group grp_A (grp_1 grp_2) A
> 
> So a group of groups would be nice to have. I thought about that long time 
> ago, but only yesterday I learned about the syntax of "netgroups" which has 
> exactly that: a netgroup can contain another netgroup ;-)
> 
> Regards,
> Ulrich
> 
> 




Re: [Linux-HA] How to painlessly change depended upon resource groups?

2013-08-22 Thread Arnold Krille
Hi,

On Thu, 22 Aug 2013 18:22:50 +0200 Ferenc Wagner  wrote:
> I built a Pacemaker cluster to manage virtual machines (VMs).  Storage
> is provided by cLVM volume groups, network access is provided by
> software bridges.  I wanted to avoid maintaining precise VG and bridge
> dependencies, so I created two cloned resource groups:
> 
> group storage dlm clvmd vg-vm vg-data
> group network br150 br151
> 
> I cloned these groups and thus every VM resource uniformly got two
> these two dependencies only, which makes it easy to add new VM
> resources:
> 
> colocation cl-elm-network inf: vm-elm network-clone
> colocation cl-elm-storage inf: vm-elm storage-clone
> order o-elm-network inf: network-clone vm-elm
> order o-elm-storage inf: storage-clone vm-elm
> 
> Of course the network and storage groups do not even model their
> internal dependencies correctly, as the different VGs and bridges are
> independent and unordered, but this is not a serious limitation in my
> case.
> 
> The problem is, if I want to extend for example the network group by a
> new bridge, the cluster wants to restart all running VM resources
> while starting the new bridge.  I get this info by changing a shadow
> copy of the CIB and crm_simulate --run --live-check on it.  This is
> perfectly understandable due to the strict ordering and colocation
> constraints above, but undesirable in these cases.
> 
> The actual restarts are avoidable by putting the cluster in
> maintenance mode beforehand, starting the bridge on each node
> manually, changing the configuration and moving the cluster out of
> maintenance mode, but this is quite a chore, and I did not find a way
> to make sure everything would be fine, like seeing the planned
> cluster actions after the probes for the new bridge resource are run
> (when there should not be anything left to do).  Is there a way to
> regain my peace of mind during such operations?  Or is there at least
> a way to order the cluster to start the new bridge clones so that I
> don't have to invoke the resource agent by hand on each node, thus
> reducing possible human mistakes?
> 
> The bridge configuration was moved into the cluster to avoid having to
> maintain it in each node's OS separately.  The network and storage
> resource groups provide a great concise status output with only the VM
> resources expanded.  These are bonuses, but not requirements; if
> sensible maintenance is not achievable with this setup, everything is
> subject to change.  Actually, I'm starting to feel that simplifying
> the VM dependencies may not be viable in the end, but wanted to ask
> for outsider ideas before overhauling the whole configuration.

If I understand you correctly, the problem only arises when adding new
bridges while the cluster is running, and your VMs will (rightfully)
get restarted when you add a not-yet-running bridge resource to the
cloned dependency group.
You might be able to circumvent this problem: define the new bridge as a
single cloned resource of its own and start it. Once it runs on all nodes,
remove that clone and add the resource to your dependency group in one
single edit. On commit, the cluster should see that the new resource in the
group is already running and thus leave the VMs alone.
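
In crmsh terms that could look roughly like this (br152 is a made-up name; 
agent and parameters elided):

crm configure primitive br152 ...      # same agent/params as br150/br151
crm configure clone br152-tmp br152
# wait until br152 is running on every node, then, in one edit:
crm configure edit                     # append br152 to the network group
                                       # and delete the br152-tmp clone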


On a side note: I have learned (the sad way) that it's easier to configure
such things outside of Pacemaker/Corosync and to use the cluster only for
the genuinely HA-relevant parts. Bringing several systems into a consistent
state is more a job for configuration management such as Chef, Puppet, or at
least csync2 (to sync the configs).

Have fun,

Arnold



Re: [Linux-HA] Storing arbitrary metadata in the CIB

2013-08-22 Thread Ferenc Wagner
Vladislav Bogdanov  writes:

> 22.08.2013 15:08, Ferenc Wagner wrote:
> 
>> Our setup uses some cluster wide pieces of meta information.
>
> You may use meta attributes of any primitives for that. Although crmsh
> doe not like that very much, it can be switched to a "relaxed" mode.

OK, but the point is that this data is not specific to any resource, but
cluster-global, so that would be unnatural.
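
(If it really has to be cluster-global, the cluster property section also 
takes arbitrary name/value pairs, e.g. — the attribute name here is made up, 
and Pacemaker may warn about option names it does not recognize:

crm_attribute --type crm_config --name vm-acl-map --update "elm:admins oak:operators"

but I don't know how kosher that is.)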
-- 
Cheers,
Feri.


[Linux-HA] How to painlessly change depended upon resource groups?

2013-08-22 Thread Ferenc Wagner
Hi,

I built a Pacemaker cluster to manage virtual machines (VMs).  Storage
is provided by cLVM volume groups, network access is provided by
software bridges.  I wanted to avoid maintaining precise VG and bridge
dependencies, so I created two cloned resource groups:

group storage dlm clvmd vg-vm vg-data
group network br150 br151

I cloned these groups, and thus every VM resource uniformly got these two
dependencies only, which makes it easy to add new VM resources:

colocation cl-elm-network inf: vm-elm network-clone
colocation cl-elm-storage inf: vm-elm storage-clone
order o-elm-network inf: network-clone vm-elm
order o-elm-storage inf: storage-clone vm-elm

Of course the network and storage groups do not even model their
internal dependencies correctly, as the different VGs and bridges are
independent and unordered, but this is not a serious limitation in my
case.

The problem is that if I want to extend, for example, the network group with
a new bridge, the cluster wants to restart all running VM resources while
starting the new bridge.  I see this by changing a shadow copy of the CIB and
running crm_simulate --run --live-check on it.  This is perfectly
understandable due to the strict ordering and colocation constraints
above, but undesirable in these cases.

The actual restarts are avoidable by putting the cluster in maintenance
mode beforehand, starting the bridge on each node manually, changing the
configuration and moving the cluster out of maintenance mode, but this
is quite a chore, and I did not find a way to make sure everything would
be fine, like seeing the planned cluster actions after the probes for
the new bridge resource are run (when there should not be anything left
to do).  Is there a way to regain my peace of mind during such
operations?  Or is there at least a way to order the cluster to start
the new bridge clones so that I don't have to invoke the resource agent
by hand on each node, thus reducing possible human mistakes?
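
For reference, that manual procedure boils down to something like this
(br152 is a placeholder for the new bridge; how it is started by hand depends
on the setup, e.g. brctl/ip or invoking the resource agent directly):

crm configure property maintenance-mode=true
# start the new bridge by hand on every node
crm configure edit                      # add br152 to the network group
crm_simulate --run --live-check         # check that nothing is left to do
crm configure property maintenance-mode=false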

The bridge configuration was moved into the cluster to avoid having to
maintain it in each node's OS separately.  The network and storage
resource groups provide a great concise status output with only the VM
resources expanded.  These are bonuses, but not requirements; if
sensible maintenance is not achievable with this setup, everything is
subject to change.  Actually, I'm starting to feel that simplifying the
VM dependencies may not be viable in the end, but wanted to ask for
outsider ideas before overhauling the whole configuration.
-- 
Thanks in advance,
Feri.


Re: [Linux-HA] Storing arbitrary metadata in the CIB

2013-08-22 Thread Vladislav Bogdanov
22.08.2013 15:08, Ferenc Wagner wrote:
> Hi,
> 
> Our setup uses some cluster wide pieces of meta information.  Think
> access control lists for resource instances used by some utilities or
> some common configuration data used by the resource agents.  Currently
> this info is stored in local files on the nodes or replicated in each
> primitive as parameters.  I find this suboptimal, as keeping them in
> sync is a hassle.  It is possible to store such stuff in the "fake"
> parameter of unmanaged Dummy resources, but that clutters the status
> output.  Can somebody offer some advice in this direction?  Or is this
> idea a pure heresy?
> 

You may use meta attributes of any primitive for that. Although crmsh
does not like that very much, it can be switched to a "relaxed" mode.
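
For instance, setting an arbitrary meta attribute with crm_resource (the
resource and attribute names below are only examples):

crm_resource --resource vm-elm --meta \
    --set-parameter backup-policy --parameter-value daily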



[Linux-HA] cibadmin --delete disregarding attributes

2013-08-22 Thread Ferenc Wagner
Hi,

man cibadmin says: "the tagname and all attributes must match in order
for the element to be deleted", but experience says otherwise: the
primitive is deleted even if it was created with different attributes
than those provided to the --delete call, cf. 'foo' vs 'bar' in the
example below.  Do I misinterpret the documentation or is this a bug in
Pacemaker 1.1.7?

# cibadmin --create --obj_type resources --xml-text "<primitive class='ocf' provider='heartbeat' type='Dummy' id='test_dummy'>
  ...
  ... value='false'/>
  ..."

# cibadmin --delete --obj_type resources --xml-text "<primitive class='ocf' provider='heartbeat' type='Dummy' id='test_dummy'>
  ...
  ... value='false'/>
  ..."
-- 
Thanks,
Feri.


[Linux-HA] Storing arbitrary metadata in the CIB

2013-08-22 Thread Ferenc Wagner
Hi,

Our setup uses some cluster wide pieces of meta information.  Think
access control lists for resource instances used by some utilities or
some common configuration data used by the resource agents.  Currently
this info is stored in local files on the nodes or replicated in each
primitive as parameters.  I find this suboptimal, as keeping them in
sync is a hassle.  It is possible to store such stuff in the "fake"
parameter of unmanaged Dummy resources, but that clutters the status
output.  Can somebody offer some advice in this direction?  Or is this
idea a pure heresy?
-- 
Thanks,
Feri.


Re: [Linux-HA] Backing out of HA

2013-08-22 Thread Ferenc Wagner
Lars Marowsky-Bree  writes:

> "Poisoned resources" indeed should just fail to start and that should be
> that. What instead can happen is that the resource agent notices it
> can't start, reports back to the cluster, and the cluster manager goes
> "Oh no, I couldn't start the resource successfully! It's now possibly in
> a weird state and I better stop it!"
>
> ... And because of the misconfiguration, the *stop* also fails, and
> you're hit with the full power of node-level recovery.
>
> I think this is an issue with some resource agents (if the parameters
> are so bad that the resource couldn't possibly have started, why fail
> the stop?) and possibly also something where one could contemplate a
> better on-fail="" default for "stop in response to first-start failure".

Check out http://www.linux-ha.org/doc/dev-guides/_execution_block.html,
especially the comment "anything other than meta-data and usage must
pass validation".  So if the start action fails with some validation
error, the stop action will as well.  Is this good practice after all?
Or is OCF_ERR_GENERIC treated differently from the other errors in this
regard and thus the validate action should never return OCF_ERR_GENERIC?
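
For context, the execution block in question looks roughly like this in the
guide (quoting from memory, with its foobar_* placeholder names):

# Anything other than meta-data and usage must pass validation
foobar_validate_all || exit $?

# Translate each action into the appropriate function call
case $__OCF_ACTION in
start)          foobar_start;;
stop)           foobar_stop;;
status|monitor) foobar_monitor;;
validate-all)   ;;
*)              foobar_usage
                exit $OCF_ERR_UNIMPLEMENTED;;
esac
rc=$?
exit $rc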
-- 
Thanks,
Feri.


[Linux-HA] Q: groups of groups

2013-08-22 Thread Ulrich Windl
Hi!

Suppose you have an application A that needs two filesystems F1 and F2. The 
filesystems are on separate LVM VGs VG1 and VG2 with LVs L1 and L2, 
respectively. The RAIDs R1 and R2 provide the LVM PVs.

(Actually we have one group with 58 primitives in it, with both dimensions 
being wider than in this example.)

So you can configure
"group grp_A R1 R2 VG1 VG2 F1 F2 A" (assuming the elements are primitives 
already configured)

Now if, for example, R2 has a problem, the cluster will restart the whole 
group of resources, even the sequence that is unaffected (R1 VG1 F1). This 
causes extra operations and extra recovery time, which you don't want.

What you could do is allow parallel execution like this:
"group grp_A (R1 R2) (VG1 VG2) (F1 F2) A"
(Note that this is probably a bad idea, as the RAIDs and VGs (and maybe the 
mounts, too) most likely each use a common lock that forces serialization.)

For the same failure scenario, R1 wouldn't be restarted, so the gain is 
small. A better approach seems to be
"group grp_A (R1 VG1 F1) (R2 VG2 F2) A"

Now, for the same failure, R1, VG1, and F1 will survive; unfortunately, if R1 
fails, then everything will be restarted, just as in the beginning.

So what you really want is
"group grp_A ((R1 VG1 F1) (R2 VG2 F2)) A"

Now if R2 fails, then R1, VG1, and F1 will survive, and if R1 fails, then R2, 
VG2, and F2 will survive.

Unfortunately the syntax of the last example is not supported. This one isn't 
either:

group grp_1 R1 VG1 F1
group grp_2 R2 VG2 F2
group grp_A (grp_1 grp_2) A

So a group of groups would be nice to have. I thought about that a long time 
ago, but only yesterday I learned about the syntax of "netgroups", which has 
exactly that: a netgroup can contain another netgroup ;-)
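
Until something like that exists, the closest I can see is keeping grp_1 and 
grp_2 and tying A to both with explicit constraints (a sketch, untested):

group grp_1 R1 VG1 F1
group grp_2 R2 VG2 F2
colocation col_A_1 inf: A grp_1
colocation col_A_2 inf: A grp_2
order ord_A_1 inf: grp_1 A
order ord_A_2 inf: grp_2 A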

Regards,
Ulrich

