Re: [ClusterLabs] ubsubscribe

2024-02-12 Thread Vladislav Bogdanov

s/ub/un/

On February 12, 2024 20:17:45 Bob Marčan via Users  
wrote:



On Mon, 12 Feb 2024 16:48:19 +0100
"Antony Stone"  wrote:


On Monday 12 February 2024 at 16:42:06, Bob Marčan via Users wrote:

> It should be in the body, not in the subject.

According to the headers, it should be in the subject, but not sent to the
list address:



To be on the safe side, I wrote it in both places.

This is the output from the help command:
This is email command help for version 2.1.30 of the "Mailman"
list manager.  The following describes commands you can send to get
information about and control your subscription to Mailman lists at
this site.  A command can be in the subject line or in the body of the
message.

...

List specific commands (subscribe, who, etc) should be sent to the
*-request address for the particular list, e.g. for the 'mailman'
list, use 'mailman-request@...'.

Did you send it to the proper address?



Re: [ClusterLabs] how to disable pacemaker throttle mode

2024-02-05 Thread Vladislav Bogdanov
IIRC, there is one issue with that: IO load is treated as CPU load, so on busy 
storage servers you get throttling even though the CPU is almost free. I may be 
wrong, but I recall that the load is calculated from loadavg, which is a different 
story altogether, as it counts the number of processes ready to consume CPU time, 
including those waiting for IOs to complete.


I easily get a loadavg of 128 on iSCSI storage servers with an almost idle CPU, 
and no thermal reaction at all.


Best,
Vlad

On February 5, 2024 19:22:11 Ken Gaillot  wrote:


On Mon, 2024-02-05 at 18:08 +0800, hywang via Users wrote:

hello, everyone:
Is there any way to disable pacemaker throttle mode. If there is,
where to find it?
Thanks!



You can influence it via the load-threshold and node-action-limit
cluster options.

The cluster throttles when CPU usage approaches load-threshold
(defaulting to 80%), and limits the number of simultaneous actions on a
node to node-action-limit (defaulting to twice the number of cores).

The node action limit can be overridden per node by setting the
PCMK_node_action_limit environment variable (typically in
/etc/sysconfig/pacemaker, /etc/default/pacemaker, etc. depending on
distro).
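
For reference, a rough sketch of how those knobs can be set with pcs (the option and
variable names are Pacemaker's; the values below are purely illustrative):

   pcs property set load-threshold=95%
   pcs property set node-action-limit=16

   # per-node override, e.g. in /etc/sysconfig/pacemaker or /etc/default/pacemaker:
   # PCMK_node_action_limit=16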
--
Ken Gaillot 



Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Vladislav Bogdanov
What if a node (especially a VM) freezes for several minutes and then continues 
writing to a shared disk where other nodes have already put their data?
In my opinion, fencing, preferably two-level, is mandatory for Lustre. Trust me, 
I developed the whole HA stack for both Exascaler and PangeaFS, and we've seen so 
many points where data loss may occur...


On December 19, 2023 19:42:56 Artem  wrote:

Andrei and Klaus thanks for prompt reply and clarification!
As I understand it, the design and behavior of Pacemaker are tightly coupled with 
the stonith concept. But isn't that too rigid?


Is there a way to leverage self-monitoring or pingd rules to trigger an isolated 
node to unmount its FS, like vSphere High Availability's host isolation response?
Can resource-stickiness=off (auto-failback) decrease the risk of corruption by an 
unresponsive node coming back online?
Is there a quorum feature not for the cluster but for resource start/stop? Got the 
lock - welcome to mount; unable to refresh the lease - force unmount.
Can on-fail=ignore break manual failover logic (would stopped be considered failed 
and thus ignored)?


best regards,
Artem

On Tue, 19 Dec 2023 at 17:03, Klaus Wenninger  wrote:


On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov  wrote:
On Tue, Dec 19, 2023 at 10:41 AM Artem  wrote:
...
Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
(update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is 
unrunnable (node is offline)
Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
(recurring_op_for_active)info: Start 20s-interval monitor for OST4 on 
lustre3
Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
(log_list_item)  notice: Actions: Stop   OST4( lustre4 
)  blocked


This is the default for a failed stop operation. The only way
pacemaker can resolve a failure to stop a resource is to fence the node
where that resource was active. If that is not possible (and IIRC you
refuse to use stonith), pacemaker has no other choice than to block it.
If you insist, you can of course set on-fail=ignore, but this means an
unreachable node will continue to run resources. Whether that can lead
to corruption in your case I cannot guess.

I don't know if I'm reading that correctly, but I understand from what you had written
above that you try to trigger the failover by stopping the VM (lustre4) without an
ordered shutdown.
With fencing disabled, what we are seeing is exactly what we would expect:
the state of the resource is unknown, pacemaker tries to stop it, that doesn't work
as the node is offline, no fencing is configured, so all it can do is wait
until there is info on whether the resource is up or not.
I guess the strange output below is because fencing is disabled, quite an
unusual (and not recommended) configuration, so this might not have
shown up too often in that way.

Klaus

Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
(pcmk__create_graph) crit: Cannot fence lustre4 because of OST4: 
blocked (OST4_stop_0)


That is a rather strange phrase. The resource is blocked because
pacemaker could not fence the node, not the other way round.


Re: [ClusterLabs] Mutually exclusive resources ?

2023-09-27 Thread Vladislav Bogdanov

Hi,

Utilization attributes may probably help with that. Try adding, for example, an 
'ip' utilization attribute with value '1' to both nodes, and then add the same 
to the VIP resources.
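
A rough sketch of that idea with pcs (node and resource names are made up; note that
utilization is only honored when placement-strategy is set to something other than
'default'):

   pcs property set placement-strategy=balanced
   pcs node utilization node1 ip=1
   pcs node utilization node2 ip=1
   pcs resource utilization vip1 ip=1
   pcs resource utilization vip2 ip=1

With each node offering capacity ip=1 and each VIP consuming ip=1, the scheduler
cannot place both VIPs on the same node.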




Adam Cecile wrote on 27 September 2023 at 14:21:05:

Hello,

I'm struggling to understand whether it's possible to create some kind of 
constraint to avoid two different resources running on the same host.
Basically, I'd like floating IP "1" and floating IP "2" to always be assigned 
to DIFFERENT nodes.

Is that possible? Can you give me a hint?

Thanks in advance, Adam.



Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Vladislav Bogdanov
I think 1 is a common number for promotable resource agent writers to pass to 
crm_master when the agent, during a probe/monitor call, decides that the node is 
really ready to have the resource promoted. DRBD is one example.
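
For reference, a minimal sketch of what such an agent typically does from its
monitor/notify actions (the value 1 here is only illustrative; agents often pass
larger scores):

   crm_master -l reboot -v 1    # this node is a promotion candidate
   crm_master -l reboot -D      # withdraw the promotion score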


Best,
Vlad

lejeczek via Users  03.06.2023. 19:32:58 wrote:


On 03/07/2023 18:55, Andrei Borzenkov wrote:

On 03.07.2023 19:39, Ken Gaillot wrote:

On Mon, 2023-07-03 at 19:22 +0300, Andrei Borzenkov wrote:

On 03.07.2023 18:07, Ken Gaillot wrote:

On Mon, 2023-07-03 at 12:20 +0200, lejeczek via Users
wrote:

On 03/07/2023 11:16, Andrei Borzenkov wrote:

On 03.07.2023 12:05, lejeczek via Users wrote:

Hi guys.

I have pgsql which I constrain like so:

-> $ pcs constraint location PGSQL-clone rule role=Promoted score=-1000 gateway-link ne 1

and I have a few more location constraints with that ethmonitor and those work, 
but this one does not seem to.
When the constraint is created the cluster is silent, no errors nor warnings, 
but relocation does not take place.
I can move the promoted resource manually just fine, to the node where 'location' 
should move it.


The instance to promote is selected according to promotion scores, which are 
normally set by the resource agent. The documentation implies that standard 
location constraints are also taken into account, but there is no explanation of 
how promotion scores interoperate with location scores. It is possible that the 
promotion score takes precedence in this case.

It seems to have kicked in with score=-1, but that was me just guessing.
Indeed, it would be great to know how those are calculated, in a way which would 
be admin-friendly or just obvious.

thanks, L.


It's a longstanding goal to have some sort of tool for explaining how scores 
interact in a given situation. However it's a challenging problem and there's 
never enough time ...

Basically, all scores are added together for each node, and the node with the 
highest score runs the resource, subject to any placement strategy configured. 
These mainly include stickiness, location constraints, colocation constraints, 
and node health. Nodes may be


And you omitted the promotion scores, which was the main question.


Oh right -- first, the above is used to determine the nodes on which clone 
instances will be placed. After that, an appropriate number of nodes are selected 
for the promoted role, based on promotion scores and location and colocation 
constraints for the promoted role.


I am sorry, but it does not really explain anything. Let's try concrete examples:

a) A master clone instance has location score -1000 for a node and promotion 
score 1000. Is this node eligible for promoting the clone instance (assuming no 
other scores are present)?

b) The promotion score is equal on two nodes A and B, but node A has a better 
location score than node B. Is it guaranteed that the clone will be promoted on A?
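
For what it's worth, the raw numbers the scheduler computed can at least be
inspected with crm_simulate (a sketch, assuming a live cluster):

   crm_simulate -sL                     # show allocation/promotion scores from the live CIB
   crm_simulate -s -x /tmp/cib.xml      # the same against a saved CIB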

a real-life example:
...
Colocation Constraints:
  Started resource 'HA-10-1-1-253' with Promoted resource 'PGSQL-clone'
    (id: colocation-HA-10-1-1-253-PGSQL-clone-INFINITY) score=INFINITY
...
Order Constraints:
  promote resource 'PGSQL-clone' then start resource 'HA-10-1-1-253'
    (id: order-PGSQL-clone-HA-10-1-1-253-Mandatory) symmetrical=0 kind=Mandatory
  demote resource 'PGSQL-clone' then stop resource 'HA-10-1-1-253'
    (id: order-PGSQL-clone-HA-10-1-1-253-Mandatory-1) symmetrical=0 kind=Mandatory

I had to bump this one up to:
...
  resource 'PGSQL-clone' (id: location-PGSQL-clone)
    Rules:
      Rule: role=Promoted score=-1 (id: location-PGSQL-clone-rule)
        Expression: gateway-link ne 1 (id: location-PGSQL-clone-rule-expr)

'-1000' did not seem to be good enough, '-1' was just a "lucky" guess.

as earlier: I was able to 'move' the promoted resource, and I think 'prefers' 
also worked.
I don't know if 'pgsql' would work with any other constraints, or if it was safe 
to try.

many thanks, L.




[ClusterLabs] Offtopic - role migration

2023-04-18 Thread Vladislav Bogdanov
Btw, an interesting question: how much effort would it take to support migration 
of a Master role across nodes? A use-case is drbd, configured for multi-master 
mode internally but with master-max=1 in the resource definition. Assuming that 
the resource agent supports that flow:

1. Do nothing.
2. Promote on a dest node.
3. Demote on a source node.

Actually just wondering, because maybe it could be somehow achievable to migrate 
VMs which sit on top of a drbd that is not multi-master in pacemaker. A fully 
theoretical case; I haven't verified the flow even in my mind.


I believe that currently only the top-most resource is allowed to migrate, but 
maybe there is some room for improvement?


Sorry for the off-topic.

Best
Vlad

Ken Gaillot wrote on 18 April 2023 at 18:23:00:


On Tue, 2023-04-18 at 14:58 +0200, lejeczek via Users wrote:

Hi guys.

When it's done by the cluster itself, e.g. when a node goes 'standby', how
do clusters migrate VirtualDomain resources?


1. Call resource agent migrate_to action on original node
2. Call resource agent migrate_from action on new node
3. Call resource agent stop action on original node


Do users have any control over it and if so then how?


The allow-migrate resource meta-attribute (true/false)


I'd imagine there must be some docs - I failed to find


It's sort of scattered throughout Pacemaker Explained -- the main one
is:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/advanced-options.html#migrating-resources


Especially in large deployments one obvious question would be (I'm guessing, as
my setup is rather SOHO): can VMs migrate in sequence, or is it (always?) a kind
of 'swarm' migration?


The migration-limit cluster property specifies how many live migrations
may be initiated at once (the default of -1 means unlimited).
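
For reference, a rough sketch of both settings with pcs (the resource name is made up):

   pcs resource meta my-vm allow-migrate=true
   pcs property set migration-limit=2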
--
Ken Gaillot 



Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Vladislav Bogdanov
On Wed, 2023-04-12 at 14:04 +0300, Andrei Borzenkov wrote:
> On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov  ok.com> wrote:
> > 
> > Hi,
> > 
> > Just add a Master role for drbd resource in the colocation. Default
> > is Started (or Slave).
> > 
> 
> Could you elaborate why it is needed? The problem is not leaving the
> resource on the node with a demoted instance - when the node goes
> into
> standby, all resources must be evacuated from it anyway. How
> collocating VM with master changes it?

Just experience. Having constraints that are inconsistent with each other touches
many corner cases in the code, especially in such extreme circumstances as a node
going to standby, which usually involves several transitions.

For me that is just a rule of thumb:
colocate VM:Started with drbd:Master
order drbd:promote then VM:start
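
In the original configuration's terms that would look roughly like the following
(crm shell syntax; a sketch only, the constraint ids are made up):

   colocation colo_vm_with_drbd_master inf: pri-vm-alarmanlage:Started mas-drbd-alarmanlage:Master
   order ord_drbd_promote_then_vm Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start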



> 
> > 
> > Philip Schiller wrote on 12 April 2023 at 11:28:57:
> > > 
> > > 
> > > 
> > > Hi All,
> > > 
> > > I am using a simple two-nodes cluster with Zvol -> DRBD -> Virsh
> > > in
> > > primary/primary mode (necessary for live migration).  My
> > > configuration:
> > > 
> > > primitive pri-vm-alarmanlage VirtualDomain \
> > >     params config="/etc/libvirt/qemu/alarmanlage.xml"
> > > hypervisor="qemu:///system" migration_transport=ssh \
> > >     meta allow-migrate=true target-role=Started is-
> > > managed=true \
> > >     op monitor interval=0 timeout=120 \
> > >     op start interval=0 timeout=120 \
> > >     op stop interval=0 timeout=1800 \
> > >     op migrate_to interval=0 timeout=1800 \
> > >     op migrate_from interval=0 timeout=1800 \
> > >     utilization cpu=2 hv_memory=4096
> > > ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
> > >     meta clone-max=2 promoted-max=2 notify=true promoted-
> > > node-max=1 clone-node-max=1 interleave=true target-role=Started
> > > is-managed=true
> > > colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-
> > > storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
> > > location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage
> > > 200: s1
> > > order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory:
> > > mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start
> > > 
> > > So to summerize:
> > > - A  resource for Virsh
> > > - A Master/Slave DRBD ressources for the VM filesystem .
> > > - a "order" directive to start the VM after drbd has been
> > > promoted.
> > > 
> > > Node startup is ok, the VM is started after DRBD is promoted.
> > > Migration with virsh or over crm  > > alarmanlage s0> works fine.
> > > 
> > > Node standby is problematic. Assuming the Virsh VM runs on node
> > > s1 :
> > > 
> > > When puting node s1 in standby when node s0 is active, a live
> > > migration
> > > is started, BUT in the same second, pacemaker tries to demote
> > > DRBD
> > > volumes on s1 (while live migration is in progress).
> > > 
> > > All this results in "stopping the vm" on s1 and starting the "vm
> > > on s0".
> > > 
> > > I do not understand why pacemaker does demote/stop DRBD volumes
> > > before VM is migrated.
> > > Do i need additional constraints?
> > > 
> > > Setup is done with
> > > - Corosync Cluster Engine, version '3.1.6'
> > > - Pacemaker 2.1.2
> > > - Ubuntu 22.04.2 LTS
> > > 
> > > Thanks for your help,
> > > 
> > > with kind regards Philip
> > > 


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Vladislav Bogdanov

Hi,

Just add the Master role for the drbd resource in the colocation. The default is 
Started (or Slave).



Philip Schiller wrote on 12 April 2023 at 11:28:57:
Hi All,

I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in primary/primary 
mode (necessary for live migration). My configuration:

primitive pri-vm-alarmanlage VirtualDomain \
    params config="/etc/libvirt/qemu/alarmanlage.xml" hypervisor="qemu:///system" migration_transport=ssh \
    meta allow-migrate=true target-role=Started is-managed=true \
    op monitor interval=0 timeout=120 \
    op start interval=0 timeout=120 \
    op stop interval=0 timeout=1800 \
    op migrate_to interval=0 timeout=1800 \
    op migrate_from interval=0 timeout=1800 \
    utilization cpu=2 hv_memory=4096
ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
    meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 clone-node-max=1 interleave=true target-role=Started is-managed=true
colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start

So to summarize:
- A resource for Virsh
- A Master/Slave DRBD resource for the VM filesystem
- An "order" directive to start the VM after drbd has been promoted.

Node startup is OK, the VM is started after DRBD is promoted.
Migration with virsh or over crm  works fine.

Node standby is problematic. Assuming the Virsh VM runs on node s1:

When putting node s1 in standby when node s0 is active, a live migration is 
started, BUT in the same second, pacemaker tries to demote DRBD volumes on s1 
(while live migration is in progress).

All this results in "stopping the vm" on s1 and starting the "vm on s0".

I do not understand why pacemaker demotes/stops DRBD volumes before the VM is 
migrated.
Do I need additional constraints?

Setup is done with:
- Corosync Cluster Engine, version '3.1.6'
- Pacemaker 2.1.2
- Ubuntu 22.04.2 LTS

Thanks for your help,

with kind regards Philip



Re: [ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

2023-04-05 Thread Vladislav Bogdanov
Ah, and yes, it is for iptables, not for nft or firewalld. Could be easily 
fixed though.

And RA expects target chains to be pre-created.

Vladislav Bogdanov wrote on 5 April 2023 at 14:53:35:


Please find attached.
I use it the following way:

primitive vip-10-5-4-235 ocf:my-org:IPaddr2 \
   params ip="10.5.4.235" cidr_netmask="24" \
   op start interval="0" timeout="20" \
   op stop interval="0" timeout="20" \
   op monitor interval="30" timeout="20"
primitive vip-10-5-4-235-fw ocf:my-org:VIPfirewall \
   params vip="10.5.4.235" allow_action="pass" \
   input_chain="_ISCSI_INPUT" output_chain="_ISCSI_OUTPUT" \
   op start interval="0" timeout="30" \
   op stop interval="0" timeout="60" \
   op monitor interval="30" timeout="10" role="Master" \
   op monitor interval="15" timeout="10" role="Slave"
primitive vip-10-5-4-236 ocf:my-org:IPaddr2 \
   params ip="10.5.4.236" cidr_netmask="24" \
   op start interval="0" timeout="20" \
   op stop interval="0" timeout="20" \
   op monitor interval="30" timeout="20"
primitive vip-10-5-4-236-fw ocf:my-org:VIPfirewall \
   params vip="10.5.4.236" allow_action="pass" \
   input_chain="_ISCSI_INPUT" output_chain="_ISCSI_OUTPUT" \
   op start interval="0" timeout="30" \
   op stop interval="0" timeout="60" \
   op monitor interval="30" timeout="10" role="Master" \
   op monitor interval="15" timeout="10" role="Slave"
group c01-pool-0-iscsi-vips vip-10-5-4-235 vip-10-5-4-236
group c01-pool-0-iscsi-vips-fw vip-10-5-4-235-fw vip-10-5-4-236-fw
ms ms-c01-pool-0-iscsi-vips-fw c01-pool-0-iscsi-vips-fw \
   meta master-max="1" master-node-max="1" clone-max="2" \
   clone-node-max="1" notify="false" interleave="true" \
   target-role="Master"
colocation c01-pool-0-iscsi-vips-fw-with-vips inf: \
   ms-c01-pool-0-iscsi-vips-fw:Master \
   c01-pool-0-iscsi-vips:Started
order c01-pool-0-iscsi-vips-fw-after-target inf: iscsi-export:start \
   ms-c01-pool-0-iscsi-vips-fw:promote
order c01-pool-0-iscsi-vips-fw-after-vips inf: \
   c01-pool-0-iscsi-vips:start \
   ms-c01-pool-0-iscsi-vips-fw:promote

On Wed, 2023-04-05 at 07:17 +0300, Александр via Users wrote:

What is this agent? For iscsitarget, I only found portblock RA, in
the linstor manual. Can you share the agent and setup instructions?>
Wednesday, 5 April 2023, 6:09 +10:00 from Vladislav Bogdanov
:
>
> I know that uscsi initiators are very sensible to connection drops.
> That's why in all my setups with iscsi I use a special m/s resource
> agent which in a slave mode drops all packets to/from portals. That
> prevents initiators from receiving FIN packets from the target when
> it migrates, and they usually behave much better. I can share that
> RA and setup instructions if that is interesting to someone.


--
Best regards,
Александр Волков



Re: [ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

2023-04-05 Thread Vladislav Bogdanov
Please find attached.
I use it the following way:

primitive vip-10-5-4-235 ocf:my-org:IPaddr2 \
params ip="10.5.4.235" cidr_netmask="24" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="20" \
op monitor interval="30" timeout="20"
primitive vip-10-5-4-235-fw ocf:my-org:VIPfirewall \
params vip="10.5.4.235" allow_action="pass" \
input_chain="_ISCSI_INPUT" output_chain="_ISCSI_OUTPUT" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="60" \
op monitor interval="30" timeout="10" role="Master" \
op monitor interval="15" timeout="10" role="Slave"
primitive vip-10-5-4-236 ocf:my-org:IPaddr2 \
params ip="10.5.4.236" cidr_netmask="24" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="20" \
op monitor interval="30" timeout="20"
primitive vip-10-5-4-236-fw ocf:my-org:VIPfirewall \
params vip="10.5.4.236" allow_action="pass" \
input_chain="_ISCSI_INPUT" output_chain="_ISCSI_OUTPUT" \
op start interval="0" timeout="30" \
op stop interval="0" timeout="60" \
op monitor interval="30" timeout="10" role="Master" \
op monitor interval="15" timeout="10" role="Slave"
group c01-pool-0-iscsi-vips vip-10-5-4-235 vip-10-5-4-236
group c01-pool-0-iscsi-vips-fw vip-10-5-4-235-fw vip-10-5-4-236-fw
ms ms-c01-pool-0-iscsi-vips-fw c01-pool-0-iscsi-vips-fw \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="false" interleave="true" \
target-role="Master"
colocation c01-pool-0-iscsi-vips-fw-with-vips inf: \
ms-c01-pool-0-iscsi-vips-fw:Master \
c01-pool-0-iscsi-vips:Started
order c01-pool-0-iscsi-vips-fw-after-target inf: iscsi-export:start \
ms-c01-pool-0-iscsi-vips-fw:promote
order c01-pool-0-iscsi-vips-fw-after-vips inf: \
c01-pool-0-iscsi-vips:start \
ms-c01-pool-0-iscsi-vips-fw:promote

On Wed, 2023-04-05 at 07:17 +0300, Александр via Users wrote:
> What is this agent? For iscsitarget, I only found portblock RA, in
> the linstor manual. Can you share the agent and setup instructions?>
> > Wednesday, 5 April 2023, 6:09 +10:00 from Vladislav Bogdanov
> :
> >  
> > I know that uscsi initiators are very sensible to connection drops.
> > That's why in all my setups with iscsi I use a special m/s resource
> > agent which in a slave mode drops all packets to/from portals. That
> > prevents initiators from receiving FIN packets from the target when
> > it migrates, and they usually behave much better. I can share that
> > RA and setup instructions if that is interesting to someone.
>  
>  
> --
> Best regards,
> Александр Волков
>  



VIPfirewall
Description: application/shellscript


Re: [ClusterLabs] Lost access with the volume while ZFS and other resources migrate to other node (reset VM)

2023-04-04 Thread Vladislav Bogdanov
I know that iSCSI initiators are very sensitive to connection drops. That's why 
in all my setups with iSCSI I use a special m/s resource agent which, in slave 
mode, drops all packets to/from the portals. That prevents initiators from 
receiving FIN packets from the target when it migrates, and they usually behave 
much better. I can share that RA and setup instructions if anyone is interested.


Reid Wahl wrote on 4 April 2023 at 20:20:52:


On Tue, Apr 4, 2023 at 7:08 AM Ken Gaillot  wrote:


On Mon, 2023-04-03 at 02:47 +0300, Александр via Users wrote:
> Pacemaker + corosync cluster with 2 virtual machines (ubuntu 22.04,
> 16 Gb RAM, 8 CPU each) are assembled into a cluster, an HBA is
> forwarded to each of them to connect to a disk shelf according to the
> instructions https://netbergtw.com/top-support/articles/zfs-cib /. A

That looks like a well-thought-out guide. One minor correction: since
Corosync 3, no-quorum-policy=ignore is no longer needed. Instead, set
"two_node: 1" in corosync.conf (which may be automatic depending on
what tools you're using).

That's unlikely to be causing any issues, though.
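
For reference, the relevant corosync.conf fragment looks roughly like this (a sketch;
tools such as pcs usually generate it automatically for two-node clusters):

   quorum {
       provider: corosync_votequorum
       two_node: 1
   }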

> ZFS pool was assembled from 4 disks in draid1, resources were
> configured - virtual IP, iSCSITarget, iSCSILun. LUN connected in
> VMware. During an abnormal shutdown of the node, resources move, but

How are you testing abnormal shutdown? For something like a power interruption,
I'd expect the node would be fenced, but in your logs it looks like recovery is
taking place between clean nodes.


See also discussion starting at this comment:
https://github.com/ClusterLabs/resource-agents/issues/1852#issuecomment-1479119045

Happy to see this on the mailing list :)



> at the moment this happens, VMware loses contact with the LUN, which
> should not happen. The journalctl log at the time of the move is
> here: https://pastebin.com/eLj8DdtY. I also tried to build a common
> storage on drbd with cloned VIP and Target resources, but this also
> does not work, besides, every time I move, there are always some
> problems with the start of resources. Any ideas what can be done
> about this? Loss of communication with the LUN even for a couple of
> seconds is already critical.
>
> corosync-qdevice/jammy,now 3.0.1-1 amd64 [installed]
> corosync-qnetd/jammy,now 3.0.1-1 amd64 [installed]
> corosync/jammy,now 3.1.6-1ubuntu1 amd64 [installed]
> pacemaker-cli-utils/jammy,now 2.1.2-1ubuntu3 amd64
> [installed,automatic]
> pacemaker-common/jammy,now 2.1.2-1ubuntu3 all [installed,automatic]
> pacemaker-resource-agents/jammy,now 2.1.2-1ubuntu3 all
> [installed,automatic]
> pacemaker/jammy,now 2.1.2-1ubuntu3 amd64 [installed]
> pcs/jammy,now 0.10.11-2ubuntu3 all [installed]
--
Ken Gaillot 





--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker



Re: [ClusterLabs] Antw: [EXT] resource cloned group colocations

2023-03-02 Thread Vladislav Bogdanov
On Thu, 2023-03-02 at 14:30 +0100, Ulrich Windl wrote:
> > > > Gerald Vogt  schrieb am 02.03.2023 um 08:41
> > > > in Nachricht
> <624d0b70-5983-4d21-6777-55be91688...@spamcop.net>:
> > Hi,
> > 
> > I am setting up a mail relay cluster which main purpose is to
> > maintain 
> > the service ips via IPaddr2 and move them between cluster nodes
> > when 
> > necessary.
> > 
> > The service ips should only be active on nodes which are running
> > all 
> > necessary mail (systemd) services.
> > 
> > So I have set up a resource for each of those services, put them
> > into a 
> > group in order they should start, cloned the group as they are
> > normally 
> > supposed to run on the nodes at all times.
> > 
> > Then I added an order constraint
> >    start mail-services-clone then start mail1-ip
> >    start mail-services-clone then start mail2-ip
> > 
> > and colocations to prefer running the ips on different nodes but
> > only 
> > with the clone running:
> > 
> >    colocation add mail2-ip with mail1-ip -1000
> >    colocation ip1 with mail-services-clone
> >    colocation ip2 with mail-services-clone
> > 
> > as well as a location constraint to prefer running the first ip on
> > the 
> > first node and the second on the second
> > 
> >    location ip1 prefers ha1=2000
> >    location ip2 prefers ha2=2000
> > 
> > Now if I stop pacemaker on one of those nodes, e.g. on node ha2,
> > it's 
> > fine. ip2 will be moved immediately to ha3. Good.
> > 
> > However, if pacemaker on ha2 starts up again, it will immediately
> > remove 
> > ip2 from ha3 and keep it offline, while the services in the group
> > are 
> > starting on ha2. As the services unfortunately take some time to
> > come 
> > up, ip2 is offline for more than a minute.
> 
> That is because you wanted "ip2 prefers ha2=2000", so if the cluster
> _can_ run it there, then it will, even if it's running elsewhere.
> 

Pacemaker sometimes places actions in the transition in a suboptimal
order (from a human's point of view).
So instead of

start group on nodeB
stop vip on nodeA
start vip on nodeB

it runs

stop vip on nodeA
start group on nodeB
start vip on nodeB

So, if start of group takes a lot of time, then vip is not available on
any node during that start.

One more technique to minimize the time during which the vip is stopped
would be to add resource migration support to IPaddr2.
That could help, but I'm not sure.
At least I know for sure that pacemaker behaves differently with migratable
resources and MAY decide to use the first order I provided.

> Maybe explain what you really want.
> 
> > 
> > It seems the colocations with the clone are already good once the
> > clone 
> > group begins to start services and thus allows the ip to be removed
> > from 
> > the current node.
> > 
> > I was wondering how can I define the colocation to be accepted only
> > if 
> > all services in the clone have been started? And not once the first
> > service in the clone is starting?
> > 
> > Thanks,
> > 
> > Gerald
> > 
> > 


Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Vladislav Bogdanov
On Thu, 2023-03-02 at 08:41 +0100, Gerald Vogt wrote:
> Hi,
> 
> I am setting up a mail relay cluster which main purpose is to
> maintain 
> the service ips via IPaddr2 and move them between cluster nodes when 
> necessary.
> 
> The service ips should only be active on nodes which are running all 
> necessary mail (systemd) services.
> 
> So I have set up a resource for each of those services, put them into
> a 
> group in order they should start, cloned the group as they are
> normally 
> supposed to run on the nodes at all times.
> 
> Then I added an order constraint
>    start mail-services-clone then start mail1-ip
>    start mail-services-clone then start mail2-ip
> 
> and colocations to prefer running the ips on different nodes but only
> with the clone running:
> 
>    colocation add mail2-ip with mail1-ip -1000
>    colocation ip1 with mail-services-clone
>    colocation ip2 with mail-services-clone
> 
> as well as a location constraint to prefer running the first ip on
> the 
> first node and the second on the second
> 
>    location ip1 prefers ha1=2000
>    location ip2 prefers ha2=2000
> 
> Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
> fine. ip2 will be moved immediately to ha3. Good.
> 
> However, if pacemaker on ha2 starts up again, it will immediately
> remove 
> ip2 from ha3 and keep it offline, while the services in the group are
> starting on ha2. As the services unfortunately take some time to come
> up, ip2 is offline for more than a minute.
> 
> It seems the colocations with the clone are already good once the
> clone 
> group begins to start services and thus allows the ip to be removed
> from 
> the current node.
> 
> I was wondering how can I define the colocation to be accepted only
> if 
> all services in the clone have been started? And not once the first 
> service in the clone is starting?
> 
> Thanks,
> 
> Gerald
> 

I noticed such behavior many years ago - it is especially visible with
long-starting resources - and one of the techniques to deal with it is to use
transient node attributes instead of colocation/order between the group and the vip.
I'm not sure there is a suitable open-source resource agent which just manages a
specified node attribute, but it should not be hard to compose one which implements
a pseudo-resource handler together with attrd_updater calls.
Probably you can trim everything ethernet-related from ethmonitor to make such an
almost-dummy resource agent.

Once the RA is there, you can add it as the last resource in the group, and then
rely on the attribute it manages to start your VIP. That is done with location
constraints, just use score-attribute in their rules -
https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#rule-properties

So, the idea is: your custom RA sets the attribute 'mail-clone-started' to
something like 1, and you have a location constraint which prevents the cluster
from starting your VIP resource on a node if the value of 'mail-clone-started' on
that node is less than 1 or not defined.
Once a node has that attribute set (which happens at the very end of the group's
start sequence), then (and only then) does the cluster decide to move your VIP
to that node (because of the other location constraints with preferences you
already have).

Just make sure attributes are transient (not stored into CIB).
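
A rough sketch of the constraint side of that idea, in crm shell syntax (the
attribute name follows the example above; the constraint id and VIP name are made up):

   location l-mail1-ip-needs-clone mail1-ip \
       rule -inf: not_defined mail-clone-started or mail-clone-started lt 1

The pseudo-RA's start/stop actions can be little more than attrd_updater calls,
which manage transient node attributes:

   attrd_updater -n mail-clone-started -U 1    # on start
   attrd_updater -n mail-clone-started -D      # on stop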




Re: [ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (con't)

2023-01-11 Thread Vladislav Bogdanov
What would be the reason of running that command without redirecting its 
output somewhere?



Madison Kelly wrote on 12 January 2023 at 07:21:44:


On 2023-01-12 01:12, Reid Wahl wrote:

On Wed, Jan 11, 2023 at 8:11 PM Madison Kelly  wrote:


Hi all,

There was a lot of sub-threads, so I figured it's helpful to start a
new thread with a summary so far. For context; I have a super simple
perl script that pretends to be an RA for the sake of debugging.

https://pastebin.com/9z314TaB

I've had variations log environment variables and confirmed that all
the variables in the direct call that work are in the crm_resource
triggered call. There are no selinux issues logged in audit.log and
selinux is permissive. The script logs the real and effective UID and
GID and it's the same in both instances. Calling other shell programs
(tested with 'hostname') run fine, this is specifically crm_resource ->
test RA -> virsh call.

I ran strace on the virsh call from inside my test script (changing
'virsh.good' to 'virsh.bad' between running directly and via
crm_resource. The strace runs made six files each time. Below are
pastebin links with the outputs of the six runs in one paste, but each
file's output is in it's own block (search for file: to see the
different file outputs)

Good/direct run of the test RA:
- https://pastebin.com/xtqe9NSG

Bad/crm_resource triggered run of the test RA:
- https://pastebin.com/vBiLVejW

Still absolutely stumped.


The strace outputs show that your bad runs are all getting stopped
with SIGTTOU. If you've never heard of that, me either.


The hell?! This is new to me also.


https://www.gnu.org/software/libc/manual/html_node/Job-Control-Signals.html

Macro: int SIGTTOU

 This is similar to SIGTTIN, but is generated when a process in a
background job attempts to write to the terminal or set its modes.
Again, the default action is to stop the process. SIGTTOU is only
generated for an attempt to write to the terminal if the TOSTOP output
mode is set; see Output Modes.


Maybe this has something to do with the buffer settings in the perl
script(?). It might be worth trying a version that doesn't fiddle with
the outputs and buffer settings.


I tried removing the $|, and then I changed the script to be entirely a
bash script, still hanging. I tried 'virsh --connect  list
--all' where method was qemu:///system, qemu:///session, and
ssh+qemu:///root@localhost/system, all hang. In bash or perl.


I don't know which difference between your environment and mine is
relevant here, such that I can't reproduce the issue using your test
script. It works perfectly fine for me.

Can you run `stty -a | grep tostop`? If there's a minus sign
("-tostop"), it's disabled; if it's present without a minus sign
("tostop"), it's enabled, as best I can tell.


-tostop is there


[root@mk-a07n02 ~]# stty -a | grep tostop
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
[root@mk-a07n02 ~]#



I'm just spitballing here. It's disabled by default on my machine...
but even when I enable it, crm_resource --validate works fine. It may
be set differently when running under crm_resource.


How do you enable it?

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/



Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-11 Thread Vladislav Bogdanov
And one more thing can affect that: selinux. I doubt it, but that's worth 
checking.


Vladislav Bogdanov wrote on 11 January 2023 at 22:21:03:
Then I would suggest logging all env vars and comparing them; probably 
something is missing in validate for virsh to be happy.


Madison Kelly wrote on 11 January 2023 at 22:06:45:


On 2023-01-11 01:13, Vladislav Bogdanov wrote:

I suspect that the validate action is run as a non-root user.


I modified the script to log the real and effective UIDs and it's
running as root in both instances.


Madison Kelly wrote on 11 January 2023 at 07:06:55:


On 2023-01-11 00:21, Madison Kelly wrote:

On 2023-01-11 00:14, Madison Kelly wrote:

Hi all,

Edit: Last message was in HTML format, sorry about that.

I've got a hell of a weird problem, and I am absolutely stumped on
what's going on.

The short of it is; if my RA is called from the command line, it's
fine. If a resource exists, monitor, enable, disable, all that stuff
works just fine. If I try to create a resource, it hangs on the
validate stage. Specifically, it hangs when 'pcs' calls:

crm_resource --validate --output-as xml --class ocf --agent server
--provider alteeve --option name=

Specifically, it hangs when it tries to make a shell call (to
virsh, specifically, but that doesn't matter). So to debug, I started
stripping down my RA simpler and simpler until I was left with the
very most basic of programs;

https://pastebin.com/VtSpkwMr

That is literally the simplest program I could write that made the
shell call. The 'open()' call is where it hangs.

When I call directly;

time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server
srv04-test; echo rc:$?


real0m0.061s
user0m0.037s
sys0m0.014s
rc:0


It's just fine. I can see in the log the output from the 'virsh' call
as well. However, when I call from crm_resource;

time crm_resource --validate --output-as xml --class ocf --agent
server --provider alteeve --option name=srv04-test; echo rc:$?









crm_resource: Error performing operation: Error
occurred




real0m20.521s
user0m0.022s
sys0m0.010s
rc:1


In the log file, I see (from line 20 of the super-simple-test-script):


Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1;
/usr/bin/echo return_code:0 |]


Then nothing else.

The strace output is: https://pastebin.com/raw/UCEUdBeP

Environment;

* selinux is permissive
* Pacemaker 2.1.5-4.el8
* pcs 0.10.15
* 4.18.0-408.el8.x86_64
* CentOS Stream release 8

Any help is appreciated, I am stumped. :/


After sending this, I tried having my "RA" call 'hostname', and that
worked fine. I switched back to 'virsh list --all', and that hangs. So
it seems to somehow be related to call 'virsh' specifically.


OK, so more info... Knowing now that it's a problem with the virsh call
specifically (but only when validating, existing VMs monitor, enable,
disable fine, all which repeatedly call virsh), I noticed a few things.

First, I see in the logs:


Jan 11 00:30:43 mk-a07n02.digimer.ca libvirtd[2937]: Cannot recv data:
Connection reset by peer


So with this, I further simplified my test script to this:

https://pastebin.com/Ey8FdL1t

Then when I ran my test script directly, the strace output is:

Good: https://pastebin.com/Trbq67ub

When my script is called via crm_resource, the strace is this:

Bad: https://pastebin.com/jtbzHrUM

The first difference I can see happens around line 929 in the good
paste, the line "futex(0x7f48b0001ca0, FUTEX_WAKE_PRIVATE, 1) = 0"
exists, which doesn't in the bad paste. Shortly after, I start seeing:


line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]
line: [brk(NULL)   = 0x562b7877d000]
line: [brk(0x562b787aa000) = 0x562b787aa000]
line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]


Around line 959 in the bad paste. There are more brk() lines, and not
long after the output stops.

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/



--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/




Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-11 Thread Vladislav Bogdanov
Then I would suggest logging all env vars and comparing them; probably 
something is missing in validate for virsh to be happy.


Madison Kelly wrote on 11 January 2023 at 22:06:45:


On 2023-01-11 01:13, Vladislav Bogdanov wrote:

I suspect that the validate action is run as a non-root user.


I modified the script to log the real and effective UIDs and it's
running as root in both instances.


Madison Kelly wrote on 11 January 2023 at 07:06:55:


On 2023-01-11 00:21, Madison Kelly wrote:

On 2023-01-11 00:14, Madison Kelly wrote:

Hi all,

Edit: Last message was in HTML format, sorry about that.

   I've got a hell of a weird problem, and I am absolutely stumped on
what's going on.

   The short of it is; if my RA is called from the command line, it's
fine. If a resource exists, monitor, enable, disable, all that stuff
works just fine. If I try to create a resource, it hangs on the
validate stage. Specifically, it hangs when 'pcs' calls:

crm_resource --validate --output-as xml --class ocf --agent server
--provider alteeve --option name=

   Specifically, it hangs when it tries to make a shell call (to
virsh, specifically, but that doesn't matter). So to debug, I started
stripping down my RA simpler and simpler until I was left with the
very most basic of programs;

https://pastebin.com/VtSpkwMr

   That is literally the simplest program I could write that made the
shell call. The 'open()' call is where it hangs.

When I call directly;

time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server
srv04-test; echo rc:$?


real0m0.061s
user0m0.037s
sys0m0.014s
rc:0


It's just fine. I can see in the log the output from the 'virsh' call
as well. However, when I call from crm_resource;

time crm_resource --validate --output-as xml --class ocf --agent
server --provider alteeve --option name=srv04-test; echo rc:$?



   
 
 
   
   
 
   crm_resource: Error performing operation: Error
occurred
 
   


real0m20.521s
user0m0.022s
sys0m0.010s
rc:1


In the log file, I see (from line 20 of the super-simple-test-script):


Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1;
/usr/bin/echo return_code:0 |]


Then nothing else.

The strace output is: https://pastebin.com/raw/UCEUdBeP

Environment;

* selinux is permissive
* Pacemaker 2.1.5-4.el8
* pcs 0.10.15
* 4.18.0-408.el8.x86_64
* CentOS Stream release 8

Any help is appreciated, I am stumped. :/


After sending this, I tried having my "RA" call 'hostname', and that
worked fine. I switched back to 'virsh list --all', and that hangs. So
it seems to somehow be related to call 'virsh' specifically.



OK, so more info... Knowing now that it's a problem with the virsh call
specifically (but only when validating, existing VMs monitor, enable,
disable fine, all which repeatedly call virsh), I noticed a few things.

First, I see in the logs:


Jan 11 00:30:43 mk-a07n02.digimer.ca libvirtd[2937]: Cannot recv data:
Connection reset by peer


So with this, I further simplified my test script to this:

https://pastebin.com/Ey8FdL1t

Then when I ran my test script directly, the strace output is:

Good: https://pastebin.com/Trbq67ub

When my script is called via crm_resource, the strace is this:

Bad: https://pastebin.com/jtbzHrUM

The first difference I can see happens around line 929 in the good
paste, the line "futex(0x7f48b0001ca0, FUTEX_WAKE_PRIVATE, 1) = 0"
exists, which doesn't in the bad paste. Shortly after, I start seeing:


line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]
line: [brk(NULL)   = 0x562b7877d000]
line: [brk(0x562b787aa000) = 0x562b787aa000]
line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]


Around line 959 in the bad paste. There are more brk() lines, and not
long after the output stops.

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/





--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/




Re: [ClusterLabs] RA hangs when called by crm_resource (resending text format)

2023-01-10 Thread Vladislav Bogdanov

I suspect that the validate action is run as a non-root user.

Madison Kelly wrote on 11 January 2023 at 07:06:55:


On 2023-01-11 00:21, Madison Kelly wrote:

On 2023-01-11 00:14, Madison Kelly wrote:

Hi all,

Edit: Last message was in HTML format, sorry about that.

   I've got a hell of a weird problem, and I am absolutely stumped on
what's going on.

   The short of it is; if my RA is called from the command line, it's
fine. If a resource exists, monitor, enable, disable, all that stuff
works just fine. If I try to create a resource, it hangs on the
validate stage. Specifically, it hangs when 'pcs' calls:

crm_resource --validate --output-as xml --class ocf --agent server
--provider alteeve --option name=

   Specifically, it hangs when it tries to make a shell call (to
virsh, specifically, but that doesn't matter). So to debug, I started
stripping down my RA simpler and simpler until I was left with the
very most basic of programs;

https://pastebin.com/VtSpkwMr

   That is literally the simplest program I could write that made the
shell call. The 'open()' call is where it hangs.

When I call directly;

time /usr/lib/ocf/resource.d/alteeve/server --validate-all --server
srv04-test; echo rc:$?


real0m0.061s
user0m0.037s
sys0m0.014s
rc:0


It's just fine. I can see in the log the output from the 'virsh' call
as well. However, when I call from crm_resource;

time crm_resource --validate --output-as xml --class ocf --agent
server --provider alteeve --option name=srv04-test; echo rc:$?



   
 
 
   
   
 
   crm_resource: Error performing operation: Error
occurred
 
   


real0m20.521s
user0m0.022s
sys0m0.010s
rc:1


In the log file, I see (from line 20 of the super-simple-test-script):


Calling: [/usr/bin/virsh dumpxml --inactive srv04-test 2>&1;
/usr/bin/echo return_code:0 |]


Then nothing else.

The strace output is: https://pastebin.com/raw/UCEUdBeP

Environment;

* selinux is permissive
* Pacemaker 2.1.5-4.el8
* pcs 0.10.15
* 4.18.0-408.el8.x86_64
* CentOS Stream release 8

Any help is appreciated, I am stumped. :/


After sending this, I tried having my "RA" call 'hostname', and that
worked fine. I switched back to 'virsh list --all', and that hangs. So
it seems to somehow be related to call 'virsh' specifically.



OK, so more info... Knowing now that it's a problem with the virsh call
specifically (but only when validating, existing VMs monitor, enable,
disable fine, all which repeatedly call virsh), I noticed a few things.

First, I see in the logs:


Jan 11 00:30:43 mk-a07n02.digimer.ca libvirtd[2937]: Cannot recv data:
Connection reset by peer


So with this, I further simplified my test script to this:

https://pastebin.com/Ey8FdL1t

Then when I ran my test script directly, the strace output is:

Good: https://pastebin.com/Trbq67ub

When my script is called via crm_resource, the strace is this:

Bad: https://pastebin.com/jtbzHrUM

The first difference I can see happens around line 929 in the good
paste, the line "futex(0x7f48b0001ca0, FUTEX_WAKE_PRIVATE, 1) = 0"
exists, which doesn't in the bad paste. Shortly after, I start seeing:


line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]
line: [brk(NULL)   = 0x562b7877d000]
line: [brk(0x562b787aa000) = 0x562b787aa000]
line: [write(4, "\1\0\0\0\0\0\0\0", 8) = 8]


Around line 959 in the bad paste. There are more brk() lines, and not
long after the output stops.

--
Madison Kelly
Alteeve's Niche!
Chief Technical Officer
c: +1-647-471-0951
https://alteeve.com/



Re: [ClusterLabs] Antw: [EXT] DRBD Dual Primary Write Speed Extremely Slow

2022-11-14 Thread Vladislav Bogdanov
Hi

On Mon, 2022-11-14 at 15:00 +0100, Tyler Phillippe via Users wrote:
> Good idea! I setup a RAM disk on both of those systems, let them
> sync, added it to the cluster. 
> 
> One thing I left out (which didn't hit me until yesterday as a
> possibility) is that I have the iSCSI LUN attached to two Windows
> servers that are acting as a Scale-Out File Server. When I copied a
> file over to the new RAMdisk LUN via Scale-Out File Server, I am
> still getting 10-20MB/s; however, when I create a large file to the
> underlying, shared DRBD on those CentOS machines, I am getting about
> 700+MB/s, which I watched via iostat. So, I guess it's the Scale-Out
> File Server causing the issue. Not sure why Microsoft and the Scale-
> Out File Server is causing the issue - guess Microsoft really doesn't
> like non-Microsoft backing disks
> 


Not with Microsoft, but with overall iSCSI performance, yes. For the older
iSCSI target, IET, I used to use the following settings:
InitialR2T=No 
ImmediateData=Yes 
MaxRecvDataSegmentLength=65536 
MaxXmitDataSegmentLength=65536 
MaxBurstLength=262144 
FirstBurstLength=131072 
MaxOutstandingR2T=2 
Wthreads=128 
QueuedCommands=32

Without those, iSCSI LUNs were very slow regardless of the backing device speed.
LIO probably provides a way to set them as well.
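
For LIO, something along these lines might work via targetcli (a sketch only; the IQN
is made up and parameter availability can vary by kernel and targetcli version):

   targetcli /iscsi/iqn.2003-01.org.example:target1/tpg1 set parameter \
       InitialR2T=No ImmediateData=Yes \
       MaxRecvDataSegmentLength=65536 MaxXmitDataSegmentLength=65536 \
       MaxBurstLength=262144 FirstBurstLength=131072 MaxOutstandingR2T=2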

Best,
Vladislav

> Does anyone have any experience with that, perhaps? Thanks!!
> 
> Respectfully,
>  Tyler
> 
> 
> 
> Nov 14, 2022, 2:30 AM by ulrich.wi...@rz.uni-regensburg.de:
> > Hi!
> > 
> > If you have planty of RAM you could configure an iSCSI disk using a
> > ram disk and try how much I/O you get from there.
> > Maybe you issue is not-su-much DRBD related. However when my local
> > MD-RAID1 resyncs with about 120MB/s (spinning disks), the system
> > also is hardly usable.
> > 
> > Regards,
> > Ulrich
> > > > > Tyler Phillippe via Users  schrieb am
> > > > > 13.11.2022 um
> > 19:26 in Nachricht :
> > > Hello all,
> > > 
> > > I have setup a Linux cluster on 2x CentOS 8 Stream machines - it
> > > has 
> > > resources to manage a dual primary, GFS2 DRBD setup. DRBD and the
> > > cluster 
> > > have a diskless witness. Everything works fine - I have the dual
> > > primary DRBD 
> > > working and it is able to present an iSCSI LUN out to my LAN.
> > > However, the 
> > > DRBD write speed is terrible. The backing DRBD disks (HDD) are
> > > RAID10 using 
> > > mdadm and they (re)sync at around 150MB/s. DRBD verify has been
> > > limited to 
> > > 100MB/s, but left untethered, it will get to around 140MB/s. If I
> > > write data 
> > > to the iSCSI LUN, I only get about 10-15MB/s. Here's the DRBD 
> > > global_common.conf - these are exactly the same on both machines:
> > > 
> > > global {
> > > usage-count no;
> > > udev-always-use-vnr;
> > > }
> > > 
> > > common {
> > > handlers {
> > > }
> > > 
> > > startup {
> > > wfc-timeout 5;
> > > degr-wfc-timeout 5;
> > > }
> > > 
> > > options {
> > > auto-promote yes;
> > > quorum 1;
> > > on-no-data-accessible suspend-io;
> > > on-no-quorum suspend-io;
> > > }
> > > 
> > > disk {
> > > al-extents 4096;
> > > al-updates yes;
> > > no-disk-barrier;
> > > disk-flushes;
> > > on-io-error detach;
> > > c-plan-ahead 0;
> > > resync-rate 100M;
> > > }
> > > 
> > > net {
> > > protocol C;
> > > allow-two-primaries yes;
> > > cram-hmac-alg "sha256";
> > > csums-alg "sha256";
> > > verify-alg "sha256";
> > > shared-secret "secret123";
> > > max-buffers 36864;
> > > rcvbuf-size 5242880;
> > > sndbuf-size 5242880;
> > > }
> > > }
> > > 
> > > Respectfully,
> > > Tyler
> > 
> > 
> > 
> > 
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> > 
> > ClusterLabs home: https://www.clusterlabs.org/
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] FYI: clusterlabs.org server maintenance window this weekend

2022-11-01 Thread Vladislav Bogdanov

It is not under pacemaker control???

Ken Gaillot  1 ноября 2022 г. 19:03:45 написал:


Hi everybody,

Just FYI, the clusterlabs.org server (including the websites and
mailing lists) will be taken down for planned maintenance this weekend.
Most likely it will just be a few hours on Saturday, but if there are
complications it could be longer.
--
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] How to globally enable trace log level in pacemaker?

2021-10-31 Thread Vladislav Bogdanov

Hi
You may want to look at the blackbox functionality, controlled by signals, if
you can't find a way to get traces via env vars. It provides traces.
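For reference, a rough sketch of the signal-driven flow (the daemon name and
dump path are from memory, treat them as assumptions):

# enable the flight recorder in the daemon you care about
kill -USR1 $(pidof pacemaker-controld)
# reproduce the problem, then ask the daemon to dump the recorder
kill -TRAP $(pidof pacemaker-controld)
# inspect the dump
qb-blackbox /var/lib/pacemaker/blackbox/pacemaker-controld-*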


Best regards

On October 31, 2021 11:20:16 AM Andrei Borzenkov  wrote:


I think it worked in the past by passing a lot of -VVV when starting
pacemaker. It does not seem to work now. I can call /usr/sbin/pacemakerd
-..., but it does pass options further to children it
starts. So every other daemon is started without any option and with
default log level.

This pacemaker 2.1.0 from openSUSE Tumbleweed.

P.S. environment variable to directly set log level would certainly be
helpful.


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Cloned ressource is restarted on all nodes if one node fails

2021-08-09 Thread Vladislav Bogdanov

Hi.
I'd suggest setting your clone meta attribute 'interleave' to 'true'.
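With pcs that would be something along the lines of (clone name taken from
your config below):

pcs resource meta apache-clone interleave=true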

Best,
Vladislav

On August 9, 2021 1:43:16 PM Andreas Janning  wrote:

Hi all,

we recently experienced an outage in our pacemaker cluster and I would like 
to understand how we can configure the cluster to avoid this problem in the 
future.


First our basic setup:
- CentOS7
- Pacemaker 1.1.23
- Corosync 2.4.5
- Resource-Agents 4.1.1

Our cluster is composed of multiple active/passive nodes. Each software 
component runs on two nodes simultaneously and all traffic is routed to the 
active node via Virtual IP.
If the active node fails, the passive node grabs the Virtual IP and 
immediately takes over all work of the failed node. Since the software is 
already up and running on the passive node, there should be virtually no 
downtime.
We have tried achieved this in pacemaker by configuring clone-sets for each 
software component.


Now the problem:
When a software component fails on the active node, the Virtual-IP is 
correctly grabbed by the passive node. BUT the software component is also 
immediately restarted on the passive Node.
That unfortunately defeats the purpose of the whole setup, since we now 
have a downtime until the software component is restarted on the passive 
node and the restart might even fail and lead to a complete outage.
After some investigating I now understand that the cloned resource is 
restarted on all nodes after a monitoring failure because the default 
"on-fail" of "monitor" is restart. But that is not what I want.


I have created a minimal setup that reproduces the problem:




value="false"/>
value="1.1.23-1.el7_9.1-9acf116022"/>
name="cluster-infrastructure" value="corosync"/>
value="pacemaker-test"/>
value="false"/>
name="symmetric-cluster" value="false"/>










value="{{infrastructure.virtual_ip}}"/>



timeout="20s"/>









value="http://localhost/server-status"/>



timeout="20s"/>






value="2"/>
name="clone-node-max" value="1"/>





rsc="apache-clone" score="100" resource-discovery="exclusive"/>
rsc="apache-clone" score="0" resource-discovery="exclusive"/>
rsc="vip" score="100" resource-discovery="exclusive"/>
rsc="vip" score="0" resource-discovery="exclusive"/>
score="INFINITY" with-rsc="apache-clone"/>




name="resource-stickiness" value="50"/>






When this configuration is started, httpd will be running on active-node 
and passive-node. The VIP runs only on active-node.
When crashing the httpd on active-node (with killall httpd), passive-node 
immediately grabs the VIP and restarts its own httpd.


How can I change this configuration so that when the resource fails on 
active-node:

- passive-node immediately grabs the VIP (as it does now).
- active-node tries to restart the failed resource, giving up after x attempts.
- passive-node does NOT restart the resource.

Regards

Andreas Janning



--

Beste Arbeitgeber ITK 2021 - 1. Platz für QAware
ausgezeichnet von Great Place to Work
Andreas Janning
Expert Software Engineer
QAware GmbH
Aschauer Straße 32
81549 München, Germany
Mobil +49 160 1492426
andreas.jann...@qaware.de
www.qaware.de
Geschäftsführer: Christian Kamm, Johannes Weigend, Dr. Josef Adersberger
Registergericht: München
Handelsregisternummer: HRB 163761
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Sub-clusters / super-clusters?

2021-08-03 Thread Vladislav Bogdanov

Hi
You probably want to look at booth and tickets for a geo-clustering solution.
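A minimal sketch of what that could look like (addresses, names and the
loss-policy are made up for illustration):

# /etc/booth/booth.conf
transport = UDP
port = 9929
site = 10.0.1.100        # arbitration IP reachable in city A
site = 10.0.2.100        # arbitration IP reachable in city B
arbitrator = 10.0.3.1
ticket = "ticket-ast"

and in each cluster a ticket constraint for the floating resource, e.g. with crmsh:

crm configure rsc_ticket ast-with-ticket ticket-ast: asterisk loss-policy=stop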


On August 3, 2021 11:40:54 AM Antony Stone  
wrote:



On Tuesday 11 May 2021 at 12:56:01, Strahil Nikolov wrote:


Here is the example I had promised:

pcs node attribute server1 city=LA
pcs node attribute server2 city=NY

# Don't run on any node that is not in LA
pcs constraint location DummyRes1 rule score=-INFINITY city ne LA

#Don't run on any node that is not in NY
pcs constraint location DummyRes2 rule score=-INFINITY city ne NY

The idea is that if you add a node and you forget to specify the attribute
with the name 'city' , DummyRes1 & DummyRes2 won't be started on it.

For resources that do not have a constraint based on the city -> they will
run everywhere unless you specify a colocation constraint between the
resources.


Excellent - thanks.  I happen to use crmsh rather than pcs, but I've adapted
the above and got it working.

Unfortunately, there is a problem.

My current setup is:

One 3-machine cluster in city A running a bunch of resources between them, the
most important of which for this discussion is Asterisk telephony.

One 3-machine cluster in city B doing exactly the same thing.

The two clusters have no knowledge of each other.

I have high-availability routing between my clusters and my upstream telephony
provider, such that a call can be handled by Cluster A or Cluster B, and if
one is unavailable, the call gets routed to the other.

Thus, a total failure of Cluster A means I still get phone calls, via Cluster
B.


To implement the above "one resource which can run anywhere, but only a single
instance", I joined together clusters A and B, and placed the corresponding
location constraints on the resources I want only at A and the ones I want
only at B.  I then added the resource with no location constraint, and it runs
anywhere, just once.

So far, so good.


The problem is:

With the two independent clusters, if two machines in city A fail, then
Cluster A fails completely (no quorum), and Cluster B continues working.  That
means I still get phone calls.

With the new setup, if two machines in city A fail, then _both_ clusters stop
working and I have no functional resources anywhere.


So, my question now is:

How can I have a 3-machine Cluster A running local resources, and a 3-machine
Cluster B running local resources, plus one resource running on either Cluster
A or Cluster B, but without a failure of one cluster causing _everything_ to
stop?


Thanks,


Antony.

--
One tequila, two tequila, three tequila, floor.

  Please reply to the list;
please *don't* CC me.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] 32 nodes pacemaker cluster setup issue

2021-05-19 Thread Vladislav Bogdanov

Hi.

Have you considered using pacemaker-remote instead?
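With remote nodes the extra hosts run only pacemaker-remoted instead of being
full corosync members, which scales much better than a 32-node membership.
A rough pcs sketch (node name is a placeholder):

pcs resource create node33 ocf:pacemaker:remote server=node33.example.com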


On May 18, 2021 5:55:57 PM S Sathish S  wrote:

Hi Team,

We are setup 32 nodes pacemaker cluster setup each node has 10 resource so 
total [around 300+ components] are up and running. While performing 
installation/update with below task will happen.


From First node we start adding all 31 nodes one-by-one into the cluster 
and added resource for each nodes.
we execute pcs command stop/start resource parallelly in some use-case for 
all nodes.
If any network related change in node , we kept pcs in maintenance mode and 
post that network change disable pcs maintenance mode.
Some case we use to reboot the node one-by-one also for some 
kernel/application changes to be reflected.


Till 9 node cluster is working fine for us  we don’t see below reported 
issue , For 32 node cluster setup we are facing below error whenever we 
perform installation/upgrade with above task is executed.


Please find the coroysnc logs in problematic duration with below error 
message :


May 17 08:08:47 [1978] node1  corosync notice  [TOTEM ] A new membership 
(10.61.78.50:85864) was formed. Members left: 2 16 17 31 15 12 13 14 27 28 
29 30 20 32 18 7 22 19 24 25 10 5 6 26 23 21 11 3 4
May 17 08:08:47 [1978] node1  corosync notice  [TOTEM ] Failed to receive 
the leave message. failed: 2 16 17 31 15 12 13 14 27 28 29 30 20 32 18 7 22 
19 24 25 10 5 6 26 23 21 11 3 4
May 17 08:08:47 [1978] node1  corosync notice  [QUORUM] This node is within 
the non-primary component and will NOT provide any services.

May 17 08:08:47 [1978] node1  corosync notice  [QUORUM] Members[1]: 1
May 17 08:08:47 [1978] node1  corosync notice  [MAIN  ] Completed service 
synchronization, ready to provide service.
May 17 11:17:30 [1866] node1  corosync notice  [MAIN  ] Corosync Cluster 
Engine ('UNKNOWN'): started and ready to provide service.
May 17 11:17:30 [1866] node1   corosync info[MAIN  ] Corosync built-in 
features: pie relro bindnow
May 17 11:17:30 [1866] node1   corosync warning [MAIN  ] Could not set 
SCHED_RR at priority 99: Operation not permitted (1)
May 17 11:17:30 [1866] node1   corosync notice  [TOTEM ] Initializing 
transport (UDP/IP Unicast).
May 17 11:17:30 [1866] node1  corosync notice  [TOTEM ] Initializing 
transmit/receive security (NSS) crypto: none hash: none
May 17 11:17:30 [1866] node1   corosync notice  [TOTEM ] The network 
interface [10.61.78.50] is now up.
May 17 11:17:30 [1866] node1   corosync notice  [SERV  ] Service engine 
loaded: corosync configuration map access [0]

May 17 11:17:30 [1866] node1   corosync info[QB] server name: cmap
May 17 11:17:30 [1866] node1   corosync notice  [SERV  ] Service engine 
loaded: corosync configuration service [1]

May 17 11:17:30 [1866] node1   corosync info[QB] server name: cfg
May 17 11:17:30 [1866] node1   corosync notice  [SERV  ] Service engine 
loaded: corosync cluster closed process group service v1.01 [2]

May 17 11:17:30 [1866] node1   corosync info[QB] server name: cpg
May 17 11:17:30 [1866] node1   corosync notice  [SERV  ] Service engine 
loaded: corosync profile loading service [4]
May 17 11:17:30 [1866] node1   corosync notice  [QUORUM] Using quorum 
provider corosync_votequorum
May 17 11:17:30 [1866] node1   corosync notice  [SERV  ] Service engine 
loaded: corosync vote quorum service v1.0 [5]

May 17 11:17:30 [1866] node1  corosync info[QB] server name: votequorum
May 17 11:17:30 [1866] node1  corosync notice  [SERV  ] Service engine 
loaded: corosync cluster quorum service v0.1 [3]

May 17 11:17:30 [1866] node1  corosync info[QB] server name: quorum

Another node logs :
May 18 16:20:17 [1968] node2 corosync notice  [TOTEM ] A new membership 
(10.223.106.11:104056) was formed. Members left: 2 16 17 31 15 12 1 13 14 
27 28 29 30 20 7 22 8 9 19 24 25 10 5 6 26 23 11 3 4
May 18 16:20:17 [1968] node2 corosync notice  [TOTEM ] Failed to receive 
the leave message. failed: 2 16 17 31 15 12 1 13 14 27 28 29 30 20 7 22 8 9 
19 24 25 10 5 6 26 23 11 3 4
May 18 16:20:17 [1968] node2 corosync notice  [QUORUM] This node is within 
the non-primary component and will NOT provide any services.

May 18 16:20:17 [1968] node2 corosync notice  [QUORUM] Members[1]: 32
May 18 16:20:17 [1968] node2 corosync notice  [MAIN  ] Completed service 
synchronization, ready to provide service.
May 18 16:22:20 [1968] node2 corosync notice  [TOTEM ] A new membership 
(10.217.41.26:104104) was formed. Members joined: 27 29 18

May 18 16:22:20 [1968] node2 corosync notice  [QUORUM] Members[4]: 27 29 32 18
May 18 16:22:20 [1968] node2 corosync notice  [MAIN  ] Completed service 
synchronization, ready to provide service.
May 18 16:22:45 [1968] node2 corosync notice  [TOTEM ] A new membership 
(10.217.41.26:104112) was formed. Members

May 18 16:22:45 [1968] node2 corosync notice  [QUORUM] Members[4]: 27 29 32 18
May 18 16:22:45 [1968] node2 corosync notice  [MAIN  ] Completed service 

Re: [ClusterLabs] Is reverse order for "promote" supposed to be "demote"?

2021-05-11 Thread Vladislav Bogdanov

Hi.

Try
order o_fs_drbd0_after_ms_drbd0 Mandatory: ms_drbd0:promote fs_drbd0:start



On May 11, 2021 6:35:58 PM Andrei Borzenkov  wrote:


While testing drbd cluster I found errors (drbd device busy) when
stopping drbd master with mounted filesystem. I do have

order o_fs_drbd0_after_ms_drbd0 Mandatory: ms_drbd0:promote fs_drbd0

and I assumed pacemaker automatically does reverse as "first stop then
demote". It does not - umount and demote are initiated concurrently.

Adding explicit

order o_stop_fs_drbd0_before_demote_ms_drbd0 Mandatory: fs_drbd0:stop
ms_drbd0:demote

fixed it, but should not this be automatic?

The versions is pacemaker-2.0.5+20210310.83e765df6-1.1.x86_64.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] How to set up "active-active" cluster by balancing multiple exports across servers?

2021-01-13 Thread Vladislav Bogdanov
Hi.

I would run nfsserver and nfsnotify as a separate cloned group and make
both other groups colocated/ordered with it.
That way the NFS server becomes just a per-host service, and you then
attach exports (with LVs, filesystems, IP addresses) to it.
The NFS server in Linux is an in-kernel creature, not a userspace process,
and it is not designed to have several instances bound to different
addresses. But with the approach above you can overcome that.
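A rough pcs sketch of that layout (names, parameters and scores are
illustrative, not tested):

pcs resource create nfsd ocf:heartbeat:nfsserver nfs_shared_infodir=/var/lib/nfs op monitor interval=30s clone
pcs constraint colocation add ha1 with nfsd-clone INFINITY
pcs constraint colocation add ha2 with nfsd-clone INFINITY
pcs constraint order start nfsd-clone then ha1
pcs constraint order start nfsd-clone then ha2

The per-group nfsserver/nfsnotify members would then be dropped from ha1/ha2,
leaving the LVM, filesystem, IP and exportfs resources there.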

On Tue, 2021-01-12 at 11:04 -0700, Billy Wilson wrote:
> I'm having trouble setting up what seems like should be a 
> straightforward NFS-HA design. It is similar to what Christoforos 
> Christoforou attempted to do earlier in 2020 
> (https://www.mail-archive.com/users@clusterlabs.org/msg09671.html).
> 
> My goal is to balance multiple NFS exports across two nodes to 
> effectively have an "active-active" configuration. Each export should
> only be available from one node at a time, but they should be able to
> freely fail back and forth to balance between the two nodes.
> 
> I'm also hoping to isolate each exported filesystem to its own set of
> underlying disks, to prevent heavy IO on one exported filesystem from
> affecting another one. So each filesystem to be exported should be 
> backed by a unique volume group.
> 
> I've set up two nodes with fencing, an ethmonitor clone, and the 
> following two resource groups.
> 
> """
>    * Resource Group: ha1:
>  * alice_lvm    (ocf::heartbeat:LVM-activate):    Started host1
>  * alice_xfs    (ocf::heartbeat:Filesystem):    Started host1
>  * alice_nfs    (ocf::heartbeat:nfsserver):    Started host1
>  * alice_ip    (ocf::heartbeat:IPaddr2):    Started host1
>  * alice_nfsnotify    (ocf::heartbeat:nfsnotify):    Started
> host1
>  * alice_login01    (ocf::heartbeat:exportfs):    Started host1
>  * alice_login02    (ocf::heartbeat:exportfs):    Started host1
>    * Resource Group: ha2:
>  * bob_lvm    (ocf::heartbeat:LVM-activate):    Started host2
>  * bob_xfs    (ocf::heartbeat:Filesystem):    Started host2
>  * bob_nfs    (ocf::heartbeat:nfsserver):    Started host2
>  * bob_ip    (ocf::heartbeat:IPaddr2):    Started host2
>  * bob_nfsnotify    (ocf::heartbeat:nfsnotify):    Started host2
>  * bob_login01    (ocf::heartbeat:exportfs):    Started host2
>  * bob_login02    (ocf::heartbeat:exportfs):    Started host2
> """
> 
> We had an older storage appliance that used Red Hat HA on RHEL 6
> (back 
> when it still used RGManager and not Pacemaker), and it was capable
> of 
> load-balanced NFS-HA like this.
> 
> The problem with this approach using Pacemaker is that the
> "nfsserver" 
> resource agent only wants one instance per host. During a failover 
> event, both "nfsserver" RAs will try to bind mount the NFS shared
> info 
> directory to /var/lib/nfs/. Only one will claim the directory.
> 
> If I convert everything to a single resource group as Christoforos
> did, 
> then the cluster is active-passive, and all the resources fail as a 
> single unit. Having one node serve all the exports while the other is
> idle doesn't seem very ideal.
> 
> I'd like to eventually have something like this:
> 
> """
>    * Resource Group: ha1:
>  * alice_lvm    (ocf::heartbeat:LVM-activate):    Started host1
>  * alice_xfs    (ocf::heartbeat:Filesystem):    Started host1
>  * charlie_lvm    (ocf::heartbeat:LVM-activate):    Started host1
>  * charlie_xfs    (ocf::heartbeat:Filesystem):    Started host1
>  * ha1_nfs    (ocf::heartbeat:nfsserver):    Started host1
>  * alice_ip    (ocf::heartbeat:IPaddr2):    Started host1
>  * charlie_ip    (ocf::heartbeat:IPaddr2):    Started host1
>  * ha1_nfsnotify    (ocf::heartbeat:nfsnotify):    Started host1
>  * alice_login01    (ocf::heartbeat:exportfs):    Started host1
>  * alice_login02    (ocf::heartbeat:exportfs):    Started host1
>  * charlie_login01    (ocf::heartbeat:exportfs):    Started host1
>  * charlie_login02    (ocf::heartbeat:exportfs):    Started host1
>    * Resource Group: ha2:
>  * bob_lvm    (ocf::heartbeat:LVM-activate):    Started host2
>  * bob_xfs    (ocf::heartbeat:Filesystem):    Started host2
>  * david_lvm    (ocf::heartbeat:LVM-activate):    Started host2
>  * david_xfs    (ocf::heartbeat:Filesystem):    Started host2
>  * ha2_nfs    (ocf::heartbeat:nfsserver):    Started host2
>  * bob_ip    (ocf::heartbeat:IPaddr2):    Started host2
>  * david_ip    (ocf::heartbeat:IPaddr2):    Started host2
>  * ha2_nfsnotify    (ocf::heartbeat:nfsnotify):    Started host2
>  * bob_login01    (ocf::heartbeat:exportfs):    Started host2
>  * bob_login02    (ocf::heartbeat:exportfs):    Started host2
>  * david_login01    (ocf::heartbeat:exportfs):    Started host2
>  * david_login02    (ocf::heartbeat:exportfs):    Started host2
> """
> 
> Or even this:
> 
> """
>    * Resource Group: alice_research:
>  * alice_lvm    

Re: [ClusterLabs] VirtualDomain stop operation traced - but nothing appears in /var/lib/heartbeat/trace_ra/

2020-09-30 Thread Vladislav Bogdanov

Hi

Try enabling trace_ra for the start op as well.
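I.e. the same trick you already use for the stop op, just on start, something like:

op start interval=0 timeout=120 \
   op_params trace_ra=1 \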

On September 28, 2020 10:50:19 PM "Lentes, Bernd" 
 wrote:



Hi,

currently i have a VirtualDomains resource which sometimes fails to stop.
To investigate further i'm tracing the stop operation of this resource.
But although i stopped it already now several times, nothing appears in 
/var/lib/heartbeat/trace_ra/.


This is my config:
primitive vm_amok VirtualDomain \
   params config="/mnt/share/vm_amok.xml" \
   params hypervisor="qemu:///system" \
   params migration_transport=ssh \
   params migrate_options="--p2p --tunnelled" \
   op start interval=0 timeout=120 \
   op monitor interval=30 timeout=25 \
   op migrate_from interval=0 timeout=300 \
   op migrate_to interval=0 timeout=300 \
   op stop interval=0 timeout=180 \
   op_params trace_ra=1 \
   meta allow-migrate=true target-role=Started is-managed=true 
maintenance=false \


   
  id="vm_amok-instance_attributes-config"/>

   
   
  id="vm_amok-instance_attributes-0-hypervisor"/>

   
   
  id="vm_amok-instance_attributes-1-migration_transport"/>

   
   
  id="vm_amok-instance_attributes-2-migrate_options"/>

   
   
 
 
  id="vm_amok-migrate_from-0"/>

 
 
   
  id="vm_amok-stop-0-instance_attributes-trace_ra"/>

   
 
   

Any ideas ?
SLES 12 SP4, pacemaker-1.1.19+20181105.ccd6b5b10-3.13.1.x86_64

Bernd

--

Bernd Lentes
Systemadministration
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.len...@helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/mcd

stay healthy
Helmholtz Zentrum München

Helmholtz Zentrum München

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-23 Thread Vladislav Bogdanov
Good to know it is no longer needed. You are correct about the logic, I just
forgot the details. I recall that sbd added too much load during cluster
start and recoveries.


Thank you!

On August 23, 2020 1:23:37 PM Klaus Wenninger  wrote:

On 8/21/20 8:55 PM, Vladislav Bogdanov wrote:

Hi,

btw, is sbd is now able to handle cib diffs internally?
Last time I tried to use it with frequently changing CIB, it became a CPU 
hog - it requested full CIB copy on every change.

Actually sbd should have been able to handle cib-diffs since ever.
Are you sure it requested a full copy of the CIB with every change?
Atm it should request a full update roughly twice every watchdog-timeout
and in between just noop-pings to the cib-api - as long as imbedding the
diffs goes OK of course.
In general we need full cib-updates as otherwise loss of a cib-diff
would mean possibly missing node-state updates.
What it on top does is convert the cib to a cluster-state roughly
every second or with every 10th cib-diff. The latter might impose
some cpu-usage when cib is updating at a high rate of course and
might not be really needed.
With the new pacemakerd-API we don't need the cib-diffs anymore
for graceful-shutdown-detection. Thus easiest might be to disable
diff-handling completely when pacemakerd-API is used.



Fri, 21/08/2020 в 13:16 -0500, Ken Gaillot wrote:

Hi all,

Looking ahead to the Pacemaker 2.0.5 release expected toward the end of
this year, we will have improvements of interest to anyone running
clusters with sbd.

Previously at start-up, if sbd was blocked from contacting Pacemaker's
CIB in a way that looked like pacemaker wasn't running (SELinux being a
good example), pacemaker would run resources without protection from
sbd. Now, if sbd is running, pacemaker will wait until sbd contacts it
before it will start any resources, so the cluster is protected in this
situation.

Additionally, sbd will now periodically contact the main pacemaker
daemon for a status report. Currently, this is just an immediate
response, but it ensures that the main pacemaker daemon is responsive
to IPC requests. This is a bit more assurance that pacemaker is not
only running, but functioning properly. In future versions, we will
have even more in-depth health checks as part of this feature.

Previously at shutdown, sbd determined a clean pacemaker shutdown by
checking whether any resources were running at shutdown. This would
lead to sbd fencing if pacemaker shut down in maintenance mode with
resources active. Now, sbd will determine clean shutdowns as part of
the status report described above, avoiding that situation.

These behaviors will be controlled by a new option in
/etc/sysconfig/sbd or /etc/default/sbd, SBD_SYNC_RESOURCE_STARTUP. This
defaults to "no" for backward compatibility when a newer sbd is used
with an older pacemaker or vice versa. Distributions may change the
value to "yes" since they can ensure both sbd and pacemaker versions
support it; users who build their own installations can set it
themselves if both versions support it.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: 
https://www.clusterlabs.org/


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Coming in Pacemaker 2.0.5: better start-up/shutdown coordination with sbd

2020-08-21 Thread Vladislav Bogdanov
Hi,

btw, is sbd is now able to handle cib diffs internally?
Last time I tried to use it with frequently changing CIB, it became a
CPU hog - it requested full CIB copy on every change.


Fri, 21/08/2020 в 13:16 -0500, Ken Gaillot wrote:
> Hi all,
> 
> Looking ahead to the Pacemaker 2.0.5 release expected toward the end of
> this year, we will have improvements of interest to anyone running
> clusters with sbd.
> 
> Previously at start-up, if sbd was blocked from contacting Pacemaker's
> CIB in a way that looked like pacemaker wasn't running (SELinux being a
> good example), pacemaker would run resources without protection from
> sbd. Now, if sbd is running, pacemaker will wait until sbd contacts it
> before it will start any resources, so the cluster is protected in this
> situation.
> 
> Additionally, sbd will now periodically contact the main pacemaker
> daemon for a status report. Currently, this is just an immediate
> response, but it ensures that the main pacemaker daemon is responsive
> to IPC requests. This is a bit more assurance that pacemaker is not
> only running, but functioning properly. In future versions, we will
> have even more in-depth health checks as part of this feature.
> 
> Previously at shutdown, sbd determined a clean pacemaker shutdown by
> checking whether any resources were running at shutdown. This would
> lead to sbd fencing if pacemaker shut down in maintenance mode with
> resources active. Now, sbd will determine clean shutdowns as part of
> the status report described above, avoiding that situation.
> 
> These behaviors will be controlled by a new option in
> /etc/sysconfig/sbd or /etc/default/sbd, SBD_SYNC_RESOURCE_STARTUP. This
> defaults to "no" for backward compatibility when a newer sbd is used
> with an older pacemaker or vice versa. Distributions may change the
> value to "yes" since they can ensure both sbd and pacemaker versions
> support it; users who build their own installations can set it
> themselves if both versions support it.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Still Beginner STONITH Problem

2020-07-09 Thread Vladislav Bogdanov
Hi.

This thread is getting too long.

First, you need to ensure that your switch (or all switches in the
path) has IGMP snooping enabled on the host ports (and probably on the
interconnects along the path between your hosts).

Second, you need an IGMP querier enabled somewhere nearby (ideally on
the switch itself). Please verify that you see its queries on the hosts.

Next, you probably need to make your hosts use IGMPv2 (not v3), as
many switches still cannot understand v3. This is doable via sysctl;
there are many articles on the internet.
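For example (a sketch; you can also set it per interface instead of all/default):

# /etc/sysctl.d/90-igmpv2.conf
net.ipv4.conf.all.force_igmp_version = 2
net.ipv4.conf.default.force_igmp_version = 2
# then apply with: sysctl --system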

This advice is also applicable to running corosync itself in
multicast mode.

Best,
Vladislav

Thu, 02/07/2020 в 17:18 +0200, stefan.schm...@farmpartner-tec.com
wrote:
> Hello,
> 
> I hope someone can help with this problem. We are (still) trying to
> get 
> Stonith to achieve a running active/active HA Cluster, but sadly to
> no 
> avail.
> 
> There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM.
> The 
> Ubuntu VMs are the ones which should form the HA Cluster.
> 
> The current status is this:
> 
> # pcs status
> Cluster name: pacemaker_cluster
> WARNING: corosync and pacemaker node names do not match (IPs used in
> setup?)
> Stack: corosync
> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition
> with 
> quorum
> Last updated: Thu Jul  2 17:03:53 2020
> Last change: Thu Jul  2 14:33:14 2020 by root via cibadmin on
> server4ubuntu1
> 
> 2 nodes configured
> 13 resources configured
> 
> Online: [ server2ubuntu1 server4ubuntu1 ]
> 
> Full list of resources:
> 
>   stonith_id_1   (stonith:external/libvirt): Stopped
>   Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>   Masters: [ server4ubuntu1 ]
>   Slaves: [ server2ubuntu1 ]
>   Master/Slave Set: WebDataClone [WebData]
>   Masters: [ server2ubuntu1 server4ubuntu1 ]
>   Clone Set: dlm-clone [dlm]
>   Started: [ server2ubuntu1 server4ubuntu1 ]
>   Clone Set: ClusterIP-clone [ClusterIP] (unique)
>   ClusterIP:0(ocf::heartbeat:IPaddr2):   Started 
> server2ubuntu1
>   ClusterIP:1(ocf::heartbeat:IPaddr2):   Started 
> server4ubuntu1
>   Clone Set: WebFS-clone [WebFS]
>   Started: [ server4ubuntu1 ]
>   Stopped: [ server2ubuntu1 ]
>   Clone Set: WebSite-clone [WebSite]
>   Started: [ server4ubuntu1 ]
>   Stopped: [ server2ubuntu1 ]
> 
> Failed Actions:
> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1):
> call=201, 
> status=Error, exitreason='',
>  last-rc-change='Thu Jul  2 14:37:35 2020', queued=0ms,
> exec=3403ms
> * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8):
> call=203, 
> status=complete, exitreason='',
>  last-rc-change='Thu Jul  2 14:38:39 2020', queued=0ms, exec=0ms
> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1):
> call=202, 
> status=Error, exitreason='',
>  last-rc-change='Thu Jul  2 14:37:39 2020', queued=0ms,
> exec=3411ms
> 
> 
> The stonith resoursce is stopped and does not seem to work.
> On both hosts the command
> # fence_xvm -o list
> kvm102   bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 
> on
> 
> returns the local VM. Apparently it connects through the
> Virtualization 
> interface because it returns the VM name not the Hostname of the
> client 
> VM. I do not know if this is how it is supposed to work?
> 
> In the local network, every traffic is allowed. No firewall is
> locally 
> active, just the connections leaving the local network are
> firewalled.
> Hence there are no coneection problems between the hosts and clients.
> For example we can succesfully connect from the clients to the Hosts:
> 
> # nc -z -v -u 192.168.1.21 1229
> Ncat: Version 7.50 ( 
> https://nmap.org/ncat
>  )
> Ncat: Connected to 192.168.1.21:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> # nc -z -v -u 192.168.1.13 1229
> Ncat: Version 7.50 ( 
> https://nmap.org/ncat
>  )
> Ncat: Connected to 192.168.1.13:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
> 
> 
> On the Ubuntu VMs we created and configured the the stonith resource 
> according to the  howto provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
> 
> 
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt 
> hostlist="Host4,host2"
> hypervisor_uri="qemu+ssh://192.168.1.21/system"
> 
> 
> But as you can see in in the pcs status output, stonith is stopped
> and 
> exits with an unkown error.
> 
> Can somebody please advise on how to procced or what additionla 
> information is needed to solve this problem?
> Any help would be greatly appreciated! Thank you in advance.
> 
> Kind regards
> Stefan Schmitz
> 
> 
> 
> 
> 
> 
> 
> 



___
Manage your subscription:

Re: [ClusterLabs] Reusing resource set in multiple constraints

2019-07-27 Thread Vladislav Bogdanov

Hi.
For location constraints you can use regexps; that is supported in crmsh as well. For
order and colocation a similar feature would still need to be implemented.
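In the raw XML this is the rsc-pattern attribute of rsc_location; a minimal
sketch (id, pattern and node are illustrative):

<rsc_location id="loc-fs-group" rsc-pattern="^fs_" score="INFINITY" node="node1"/>

If I recall correctly, crmsh accepts the pattern in place of the resource
name, e.g. location loc-fs-group /^fs_/ inf: node1.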


Andrei Borzenkov  27 Jul 2019 11:04:43 AM wrote


Is it possible to have single definition of resource set that is later
references in order and location constraints? All syntax in
documentation or crmsh presumes inline set definition in location or
order statement.


In this particular case there will be set of filesystems that need to be
colocated and ordered against other resources; these filesystems will be
extended over time and I would like to add new definition in just one place.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users


ClusterLabs home: https://www.clusterlabs.org/




___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Any CLVM/DLM users around?

2018-10-01 Thread Vladislav Bogdanov
On October 1, 2018 8:01:36 PM UTC, Patrick Whitney  
wrote:

[...]

>so we were lucky enough our test environment is a KVM/libvirt
>environment,
>so I used fence_virsh.  Again, I had the same problem... when the "bad"
>node was fenced, dlm_controld would issue (what appears to be) a
>fence_all,
>and I would receive messages that that the dlm clone was down on all
>members and would have a log message that the clvm lockspace was
>abandoned.

What is your dlm version, btw?

[...]


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread Vladislav Bogdanov
On October 1, 2018 5:44:20 PM UTC, Patrick Whitney  
wrote:
>We tested with both, and experienced the same behavior using both
>fencing
>strategies:  an abandoned DLM lockspace.   More than once, within this
>forum, I've heard that DLM only supports power fencing, but without
>explanation.  Can you explain why DLM requires power fencing?

The main part of dlm runs inside the kernel, and it is very hard or even impossible
to return it to a vanilla state programmatically, especially if filesystems
like gfs2 run on top. IIUC, Sistina originally developed dlm mainly for their
gfs1, and only then for lvm. Things have changed, but I believe the original
design remains.

>
>Best,
>-Pat
>
>On Mon, Oct 1, 2018 at 1:38 PM Vladislav Bogdanov
>
>wrote:
>
>> On October 1, 2018 4:55:07 PM UTC, Patrick Whitney
>
>> wrote:
>> >>
>> >> Fencing in clustering is always required, but unlike pacemaker
>that
>> >lets
>> >> you turn it off and take your chances, DLM doesn't.
>> >
>> >
>> >As a matter of fact, DLM has a setting "enable_fencing=0|1" for what
>> >that's
>> >worth.
>> >
>> >
>> >> You must have
>> >> working fencing for DLM (and anything using it) to function
>> >correctly.
>> >>
>> >
>> >We do have fencing enabled in the cluster; we've tested both node
>level
>> >fencing and resource fencing; DLM behaved identically in both
>> >scenarios,
>> >until we set it to 'enable_fencing=0' in the dlm.conf file.
>>
>> Do you have power or fabric fencing? Dlm requires former.
>>
>>
>> ___
>> Users mailing list: Users@clusterlabs.org
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Any CLVM/DLM users around?

2018-10-01 Thread Vladislav Bogdanov
On October 1, 2018 4:55:07 PM UTC, Patrick Whitney  
wrote:
>>
>> Fencing in clustering is always required, but unlike pacemaker that
>lets
>> you turn it off and take your chances, DLM doesn't.
>
>
>As a matter of fact, DLM has a setting "enable_fencing=0|1" for what
>that's
>worth.
>
>
>> You must have
>> working fencing for DLM (and anything using it) to function
>correctly.
>>
>
>We do have fencing enabled in the cluster; we've tested both node level
>fencing and resource fencing; DLM behaved identically in both
>scenarios,
>until we set it to 'enable_fencing=0' in the dlm.conf file.

Do you have power or fabric fencing? Dlm requires former.


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] 2 node cluster dlm/clvm trouble

2018-09-11 Thread Vladislav Bogdanov

On 11.09.2018 16:31, Patrick Whitney wrote:
But, when I invoke the "human" stonith power device (i.e. I turn the 
node off), the other node collapses...


In the logs I supplied, I basically do this:

1. stonith fence (With fence scsi)


At this point DLM on a healthy node is notified that the node was fenced and
expects no connections from DLM on the fenced node. What happens if it
sees such a connection is hidden deep in the code.



2. verify UI shows fenced node as stopped


Then I wouldn't trust such UI.


3. power off fenced node

It's only when I shut down the fenced node that the running node falls 
over.


How would using a power fencing agent differ from me manually removing 
power?


There is a delay between the fence-success notification to DLM and the actual
power off. With power fencing, the notification goes out only after power is cut.




Thanks (I very much appreciate the discussion!)

Best,
-Pat



Would it be useful to show logs of what that looks like?

On Tue, Sep 11, 2018 at 9:22 AM Valentin Vidic > wrote:


On Tue, Sep 11, 2018 at 09:13:08AM -0400, Patrick Whitney wrote:
 > So when the cluster suggests that DLM is shutdown on coro-test-1:
 > Clone Set: dlm-clone [dlm]
 >      Started: [ coro-test-2 ]
 >      Stopped: [ coro-test-1 ]
 >
 > ... DLM isn't actually stopped on 1?

If you can connect to the node and see dlm services running than
it is not stopped:

20101 dlm_controld
20245 dlm_scand
20246 dlm_recv
20247 dlm_send
20248 dlm_recoverd

But if you kill the power on the node than it will be gone for sure :)

-- 
Valentin

___
Users mailing list: Users@clusterlabs.org 
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



--
Patrick Whitney
DevOps Engineer -- Tools


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] 2 node cluster dlm/clvm trouble

2018-09-11 Thread Vladislav Bogdanov

On 11.09.2018 16:10, Valentin Vidic wrote:

On Tue, Sep 11, 2018 at 09:02:06AM -0400, Patrick Whitney wrote:

What I'm having trouble understanding is why dlm flattens the remaining
"running" node when the already fenced node is shutdown...  I'm having
trouble understanding how power fencing would cause dlm to behave any
differently than just shutting down the fenced node.


fences_scsi just kills the storage on the node, but dlm continues to run
causing problems for the rest of the cluster nodes.  So it seems some
other fence agent should be used that would kill dlm too.



And that is sometimes not an easy task, because the main part of dlm runs in
the kernel.

In some circumstances the only option is to forcibly reset the node.
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Q: HA_RSCTMP in SLES11 SP4 at first start after reboot

2018-08-13 Thread Vladislav Bogdanov

10.08.2018 19:52, Ulrich Windl wrote:


Hi!

A simple question: One of my RAs uses $HA_RSCTMP in SLES11 SP4, and it reports 
the following problem:
  WARNING: Unwritable HA_RSCTMP directory /var/run/resource-agents - using /tmp


Just make sure you avoid using that code in the 'meta-data' action handler
(it is run by crmd, which runs as the hacluster user, to obtain and cache
agent meta-data; I bet that message is from such a run).
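A common pattern is to short-circuit meta-data (and usage) before anything
that touches HA_RSCTMP; a sketch, assuming the usual meta_data/usage helpers
and that your RA stores $1 in __OCF_ACTION:

case "$__OCF_ACTION" in
    meta-data)  meta_data; exit $OCF_SUCCESS ;;
    usage|help) usage; exit $OCF_SUCCESS ;;
esac
# only from here on touch $HA_RSCTMP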




However the directory has the following permissions:
drwxr-xr-t 2 root root 4096 Aug 10 18:05 /var/run/resource-agents

My code to check this is:
if [ ! -w "$HA_RSCTMP" -a "$(id -u)" -ne 0 ]; then
 ocf_log warn "Unwritable HA_RSCTMP directory $HA_RSCTMP - using /tmp"
 HA_RSCTMP=/tmp
fi

Did I overlook something obvious? Could it be that the directory is created 
after the error message? I suspect that the error is triggered during a 
parameter validation after the cluster node had been rebooted...

Regards,
Ulrich


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resources not monitored in SLES11 SP4 (1.1.12-f47ea56)

2018-06-26 Thread Vladislav Bogdanov

26.06.2018 09:14, Ulrich Windl wrote:

Hi!

We just observed some strange effect we cannot explain in SLES 11 SP4 
(pacemaker 1.1.12-f47ea56):
We run about a dozen of Xen PVMs on a three-node cluster (plus some 
infrastructure and monitoring stuff). It worked all well so far, and there was 
no significant change recently.
However when a colleague stopped on VM for maintenance via cluster command, the 
cluster did not notice when the PVM actually was running again (it had been 
started not using the cluster (a bad idea, I know)).


To be on the safe side in such cases you'd probably want to enable an
additional monitor for the "Stopped" role. The default one covers only the
"Started" role. It is the same as for multistate resources, where you
need several monitor ops, for the "Started"/"Slave" and "Master" roles.
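In crm shell syntax that is just an extra op line, e.g. (the intervals must
differ; the timeouts are illustrative):

op monitor interval=600s timeout=90s \
op monitor interval=610s timeout=90s role=Stopped \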

But this will increase load.
And I believe the cluster should reprobe a resource on all nodes once you
change target-role back to "Started".



Examining the logs, it seems that the recheck timer popped periodically, but no 
monitor action was run for the VM (the action is configured to run every 10 
minutes).

Actually the only monitor operations found were:
May 23 08:04:13
Jun 13 08:13:03
Jun 25 09:29:04
Then a manual "reprobe" was done, and several monitor operations were run.
Then again I see no more monitor actions in syslog.

What could be the reasons for this? Too many operations defined?

The other message I don't understand is like ": Rolling back scores from 
"

Could it be a new bug introduced in pacemaker, or could it be some 
configuration problem (The status is completely clean however)?

According to the packet changelog, there was no change since Nov 2016...

Regards,
Ulrich


___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [questionnaire] Do you manage your pacemaker configuration by hand and (if so) what reusability features do you use?

2018-06-07 Thread Vladislav Bogdanov

Hi,

On 31.05.2018 15:48, Jan Pokorný wrote:

Hello,

I am soliciting feedback on these CIB features related questions,
please reply (preferably on-list so we have the shared collective
knowledge) if at least one of the questions is answered positively
in your case (just tick the respective "[ ]" boxes as "[x]").

Any other commentary also welcome -- thank you in advance.



1.  [x] Do you edit CIB by hand (as opposed to relying on crm/pcs or
 their UI counterparts)?

Very rarely, but sometimes it is the only way.

2.  [x] Do you use "template" based syntactic simplification[1] in CIB?

This is not only a simplification; it also allows the CIB size to be reduced drastically.

3.  [ ] Do you use "id-ref" based syntactic simplification[2] in CIB?

3.1 [ ] When positive about 3., would you mind much if "id-refs" got
 unfold/exploded during the "cibadmin --upgrade --force"
 equivalent as a reliability/safety precaution?

4.  [ ] Do you use "tag" based syntactic grouping[3] in CIB?



(Some of these questions tangentially touch the topic of perhaps
excessively complex means of configuration that was raised during
the 2017's cluster summit.)

[1] 
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#_reusing_resource_definitions
[2] 
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#s-reusing-config-elements
[3] 
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/index.html#_tagging_configuration_elements



___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Feedback wanted: changing "master/slave" terminology

2018-01-26 Thread Vladislav Bogdanov

25.01.2018 21:28, Ken Gaillot wrote:

[...]


If I can throw another suggestion in (without offering preference for
it
myself), 'dual-state clones'? The reasoning is that, though three
words
instead of two, spell-check likes it, it sounds OK on day one (from a
language perspective) and it reflects that the clone has only one of
two
states.


Or "dual-role".


Btw, is the word 'tri-state' or 'tristate' usable in contemporary English?
Some online translators accept it, but Google doesn't.



Binary/dual/multi all have the issue that all resources have multiple
states (stopped, started, etc.). Not a deal-breaker, but a factor to
consider.

What we're trying to represent is: clone resources that have an
additional possible role that pacemaker manages via the promote/demote
actions.

I go back and forth between options. "Multistate" would be OK,
especially since it's already used in some places. "Promotable" is
probably most accurate.




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-15 Thread Vladislav Bogdanov

15.01.2018 11:23, Ulrich Windl wrote:




Vladislav Bogdanov <bub...@hoster-ok.com> schrieb am 12.01.2018 um 10:06 in

Nachricht <3c5d9060-4714-cc20-3039-aa53b4a95...@hoster-ok.com>:

11.01.2018 18:39, Ken Gaillot wrote:

[...]

I thought one option aired at the summit to address this was
/var/log/clusterlabs, but it's entirely possible my memory's
playing
tricks on me again.


I don't remember that, but it sounds like a good choice. However we'd
still have the same issue of needing a single package to own it.


In the RPM world several packages may own a directory if it is consistently
marked as '%dir' in the file list.


Sure? I mean a package using a directory should include it as %dir, but what if 
multiple packages use the same dir with different owners, maybe?


Then they will conflict.

I just rechecked, creating two dummy packages owning one directory. If
the owner/mode matches, the packages install correctly and the directory
is reported to be owned by both.

If there is a mismatch, rpm refuses to install.
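I.e. each package simply lists the directory in its %files section (the path
here is just the one proposed in this thread):

%dir /var/log/clusterlabs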


Every file or directory should have exacltly one owner IMHO.
A common solution seems to put common directories in a separate package that "client 
packages" require.

Still independent of that is having a clean structure of things.

Regards,
Ulrich




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Coming in Pacemaker 2.0.0: /var/log/pacemaker/pacemaker.log

2018-01-12 Thread Vladislav Bogdanov

11.01.2018 18:39, Ken Gaillot wrote:

[...]

I thought one option aired at the summit to address this was
/var/log/clusterlabs, but it's entirely possible my memory's
playing
tricks on me again.


I don't remember that, but it sounds like a good choice. However we'd
still have the same issue of needing a single package to own it.


In the RPM world several packages may own a directory if it is consistently
marked as '%dir' in the file list.



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pcmk_remote evaluation (continued)

2017-12-11 Thread Vladislav Bogdanov

11.12.2017 23:06, Ken Gaillot wrote:
[...]

=

* The first issue I found (and I expect that to be a reason for some
other issues) is that
pacemaker_remote does not drop an old crmds' connection after new
crmd connects.
As IPC proxy connections are in the hash table, there is a 50% chance
that remoted tries to
reach an old crmd to f.e. proxy checks of node attributes when
resources are reprobed.
That leads to timeouts of that resources' probes with consequent
reaction from a cluster.
A solution here could be to drop old IPC proxy connection as soon as
new one is established.


We can't drop connections from the pacemaker_remoted side because it
doesn't know anything about the cluster state (e.g. whether the cluster
connection resource is live-migrating).


Well, ok. But what happens when the fenced cluster node comes back and
receives a TCP packet from the old connection? Yes, it sends an RST, which
would terminate the connection on the peer side, and then pacemaker_remoted
should shut it down on a socket event.




However we can simply always use the most recently connected provider,
which I think solves the issue. See commit e9a7e3bb, one of a few
recent bugfixes in the master branch for pacemaker_remoted. It will
most likely not make it into 2.0 (which I'm trying to focus on
deprecated syntax removals), but the next release after that.


I will definitely try it; all stakeholders have already been notified that we
need another round on all available hardware :) We will test as soon as
it becomes free.


I will return to this as soon as I have some results.

Thank you,
Vladislav

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How much cluster-glue support is still needed in Pacemaker?

2017-11-17 Thread Vladislav Bogdanov

17.11.2017 02:26, Ken Gaillot wrote:

We're starting work on Pacemaker 2.0, which will remove support for the
heartbeat stack.

cluster-glue was traditionally associated with heartbeat. Do current
distributions still ship it?

Currently, Pacemaker uses cluster-glue's stonith/stonith.h to support
heartbeat-class stonith agents via the fence_legacy agent. If this is
still widely used, we can keep this support.


This would be nice; AFAIK the rcd_serial agent is still available only in
cluster-glue.




Pacemaker also checks for heartbeat/glue_config.h and uses certain
configuration values there in favor of Pacemaker's own defaults (e.g.
the value of HA_COREDIR instead of /var/lib/pacemaker/cores). Does
anyone still use the cluster-glue configuration for such things? If
not, I'd prefer to drop this.




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker resource start delay when there are another resource is starting

2017-11-01 Thread Vladislav Bogdanov

01.11.2017 17:20, Ken Gaillot wrote:

On Sat, 2017-10-28 at 01:11 +0800, lkxjtu wrote:


Thank you for your response! This means that there shoudn't be long
"sleep" in ocf script.
If my service takes 10 minite from service starting to healthcheck
normally, then what shoud I do?


That is a tough situation with no great answer.

You can leave it as it is, and live with the delay. Note that it only
happens if a resource fails after the slow resource has already begun
starting ... if they fail at the same time (as with a node failure),
the cluster will schedule recovery for both at the same time.

Another possibility would be to have the start return immediately, and
make the monitor artificially return success for the first 10 minutes
after starting. It's hacky, and it depends on your situation whether
the behavior is acceptable. My first thought on how to implement this
would be to have the start action set a private node attribute
(attrd_updater -p) with a timestamp. When the monitor runs, it could do
its usual check, and if it succeeds, remove that node attribute, but if
it fails, check the node attribute to see whether it's within the
desired delay.


Or write a master/slave resource agent, like DRBD has.
It sets a low master score on an outdated node while the sync is running and
raises it after the sync finishes, so the node is not promoted to master
until the sync is complete.


If you "map" service states to a pacemaker states like

Starting - Slave
Started - Master

that would help.
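A very rough sketch of the idea inside such an agent's monitor action (the
helper names are placeholders):

if service_is_fully_up; then
    crm_master -l reboot -v 100    # ready, allow promotion to Master
elif service_is_starting; then
    crm_master -l reboot -v 1      # keep it Slave while it warms up
else
    crm_master -D -l reboot        # not running, remove the score
fi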





Thank you very much!
  

Hi,
If I remember correctly, any pending actions from a previous transition
must be completed before a new transition can be calculated. Otherwise,
there's the possibility that the pending action could change the state
in a way that makes the second transition's decisions harmful.
Theoretically (and ideally), pacemaker could figure out whether some of
the actions in the second transition would be needed regardless of
whether the pending actions succeeded or failed, but in practice, that
would be difficult to implement (and possibly take more time to
calculate than is desirable in a recovery situation).
  

On Fri, 2017-10-27 at 23:54 +0800, lkxjtu wrote:



I have two clone resources in my corosync/pacemaker cluster: fm_mgt and
logserver. Both of their RAs are OCF. fm_mgt takes 1 minute to start the
service (the OCF start function runs for 1 minute). Configured as
below:
# crm configure show
node 168002177: 192.168.2.177
node 168002178: 192.168.2.178
node 168002179: 192.168.2.179
primitive fm_mgt fm_mgt \
  op monitor interval=20s timeout=120s \
  op stop interval=0 timeout=120s on-fail=restart \
  op start interval=0 timeout=120s on-fail=restart \
  meta target-role=Started
primitive logserver logserver \
  op monitor interval=20s timeout=120s \
  op stop interval=0 timeout=120s on-fail=restart \
  op start interval=0 timeout=120s on-fail=restart \
  meta target-role=Started
clone fm_mgt_replica fm_mgt
clone logserver_replica logserver
property cib-bootstrap-options: \
  have-watchdog=false \
  dc-version=1.1.13-10.el7-44eb2dd \
  cluster-infrastructure=corosync \
  stonith-enabled=false \
  start-failure-is-fatal=false
When I kill the fm_mgt service on one node, pacemaker immediately
recovers it after the monitor fails. This looks perfectly normal. But
during this 1 minute while fm_mgt is starting, if I kill the logserver
service on any node, the monitor catches the failure normally too, but
pacemaker does not restart it immediately; it waits for fm_mgt to finish
starting. Only then does pacemaker begin restarting logserver. It seems
there is some dependency between pacemaker resources.
# crm status
Last updated: Thu Oct 26 06:40:24 2017          Last change: Thu Oct 26 06:36:33 2017 by root via crm_resource on 192.168.2.177
Stack: corosync
Current DC: 192.168.2.179 (version 1.1.13-10.el7-44eb2dd) - partition with quorum
3 nodes and 6 resources configured
Online: [ 192.168.2.177 192.168.2.178 192.168.2.179 ]
Full list of resources:
   Clone Set: logserver_replica [logserver]
   logserver  (ocf::heartbeat:logserver): FAILED

192.168.2.177

   Started: [ 192.168.2.178 192.168.2.179 ]
   Clone Set: fm_mgt_replica [fm_mgt]
   Started: [ 192.168.2.178 192.168.2.179 ]
   Stopped: [ 192.168.2.177 ]
I am very confused. Is there something wrong in the configuration? Thank
you very much!
James
best regards
  






___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fwd: Stopped DRBD

2017-10-18 Thread Vladislav Bogdanov

Hi,

ensure you have two monitor operations configured for your drbd 
resource: for 'Master' and 'Slave' roles ('Slave' == 'Started' == '' for 
ms resources).


http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_monitoring_multi_state_resources.html
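
A hedged pcs sketch matching the configuration below (the intervals are
placeholders and simply have to differ per role; exact option syntax can vary
between pcs versions):

    pcs resource create DrbdData ocf:linbit:drbd drbd_resource=r0 \
        op monitor interval=29s role=Master \
           monitor interval=31s role=Slave
    pcs resource master DrbdDataClone DrbdData \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true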


18.10.2017 11:18, Антон Сацкий wrote:


Hi list need your help


[root@voipserver ~]# pcs status
Cluster name: ClusterKrusher
Stack: corosync
Current DC: voipserver.backup (version 1.1.16-12.el7_4.2-94ff4df) -
partition with quorum
Last updated: Tue Oct 17 19:46:05 2017
Last change: Tue Oct 17 19:28:22 2017 by root via cibadmin on
voipserver.primary

2 nodes configured
3 resources configured

Node voipserver.backup: standby
Online: [ voipserver.primary ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started voipserver.primary
 Master/Slave Set: DrbdDataClone [DrbdData]
 Masters: [ voipserver.primary ]
 Stopped: [ voipserver.backup ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled



BUT IN FACT
[root@voipserver ~]# drbd-overview
NOTE: drbd-overview will be deprecated soon.
Please consider using drbdtop.

 1:r0/0 Connected Primary/Secondary UpToDate/UpToDate


Is this normal behavior or a bug?




--
Best regards
Antony







___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] pcmk_remote evaluation (continued)

2017-09-20 Thread Vladislav Bogdanov
Hi,

as 1.1.17 received a lot of care in pcmk_remote, I decided to try it again
in rather big setup (less then previous, so I'm not hit by IPC disconnects 
here).

From the first runs there are still some severe issues when cluster nodes are fenced.

The following results are obtained by killing the DC node (md12k-3-srv) which 
was
hosting remote resources for nodes es7700-3-srv, es12kxe-3-srv and 
es12kxe-7-srv.
After the fence, the new DC (md12k-1-srv) moved those resources the following way:
=
Sep 20 08:53:28 md12k-1-srv pengine[2525]:   notice: Movees12kxe-3-srv  
(Started md12k-3-srv -> md12k-4-srv)
Sep 20 08:53:28 md12k-1-srv pengine[2525]:   notice: Movees12kxe-7-srv  
(Started md12k-3-srv -> md12k-1-srv)
Sep 20 08:53:28 md12k-1-srv pengine[2525]:   notice: Movees7700-3-srv   
(Started md12k-3-srv -> md12k-2-srv)
=

* The first issue I found (and I expect it to be a reason for some other 
issues) is that
pacemaker_remote does not drop the old crmd's connection after a new crmd connects.
As IPC proxy connections are kept in a hash table, there is a 50% chance that 
remoted tries to
reach the old crmd to, e.g., proxy checks of node attributes when resources are 
reprobed.
That leads to timeouts of those resources' probes with a consequent reaction from 
the cluster.
A solution here could be to drop the old IPC proxy connection as soon as a new one is 
established.

* I _suspect_ that the issue above could lead to following lines in a cluster 
node logs.
I didn't check, but I suspect that when remoted decides to disconnect an old 
connection
after fenced node goes up and TCP connections are reset - it disconnects a new 
one too.
At least this issue happens at the same time fenced node rejoins a cluster.
These logs are for the case no resources operate node attributes (I removed 
that resources
from the CIB and set a stickiness for all others).
=
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Timed out (1 ms) while 
waiting for remote data
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Unable to receive expected 
reply, disconnecting.
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Remote lrmd server 
disconnected while waiting for reply with id 9823.
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Unexpected disconnect on 
remote-node es12kxe-7-srv
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Result of monitor operation 
for es12kxe-7-srv on md12k-1-srv: Error
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Couldn't perform 
lrmd_rsc_info operation (timeout=0): -107: Success (0)
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: LRMD disconnected
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Could not add resource 
ost0033-es03a to LRM es12kxe-7-srv
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Invalid resource definition 
for ost0033-es03a
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input   

Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input 
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input   
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input 
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input 

Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input   
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input 
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: bad input   

Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Resource ost0033-es03a no 
longer exists in the lrmd
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 221 
(ost0033-es03a_monitor_0) on es12kxe-7-srv failed (target: 7 vs. rc: 6): Error
Sep 20 08:55:41 md12k-1-srv crmd[11375]:   notice: Transition aborted by 
lrm_rsc_op.ost0033-es03a_last_failure_0: Event failed
Sep 20 08:55:41 md12k-1-srv crmd[11375]:  warning: Action 221 
(ost0033-es03a_monitor_0) on es12kxe-7-srv failed (target: 7 vs. rc: 6): Error
Sep 20 08:55:41 md12k-1-srv crmd[11375]:error: Result of probe operation 
for ost0033-es03a on es12kxe-7-srv: Error
Sep 20 08:55:41 md12k-1-srv crmd[11375]:   notice: Transition aborted by 
operation es12kxe-3-srv_monitor_3 'create' on md12k-4-srv: Old event
...
ep 20 08:56:41 md12k-1-srv attrd[2524]:   notice: Node md12k-3-srv state is now 
member
Sep 20 08:56:41 md12k-1-srv cib[2511]:   notice: Node md12k-3-srv state is now 
member
Sep 20 08:56:41 md12k-1-srv pacemakerd[2398]:   notice: Node md12k-3-srv state 
is now member
Sep 20 08:56:41 md12k-1-srv crmd[11375]:   notice: Node md12k-3-srv state is 
now member
Sep 20 08:56:41 md12k-1-srv stonith-ng[2522]:   notice: Node md12k-3-srv state 
is now member
Sep 20 08:56:41 md12k-1-srv crmd[11375]:  warning: No reason to expect node 2 
to be down
Sep 20 08:56:41 md12k-1-srv crmd[11375]:   notice: Stonith/shutdown of 
md12k-3-srv not matched
=
I cannot check if that is a true until the first issue is fixed.

* There are repeated probe results with rc 6 (PCMK_OCF_NOT_CONFIGURED)
and 189 (PCMK_OCF_CONNECTION_DIED) if 

Re: [ClusterLabs] Resources are stopped and started when one node rejoins

2017-08-28 Thread Vladislav Bogdanov

28.08.2017 14:03, Octavian Ciobanu wrote:

Hey Vladislav,

Thank you for the info. I've tried you suggestions but the behavior is
still the same. When an offline/standby node rejoins the cluster all the
resources are first stopped and then started. I've added the changes
I've made, see below in reply message, next to your suggestions.


Logs on DC (node where you see logs from the pengine process) should 
contain references to pe-input-XX.bz2 files. Something like "notice: 
Calculated transition , saving inputs in 
/var/lib/pacemaker/pengine/pe-input-XX.bz2"

Locate one for which Stop actions occur.
You can replay them with 'crm_simulate -S -x 
/var/lib/pacemaker/pengine/pe-input-XX.bz2' to see if that is the 
correct one (look in the middle of output).


After that you may add some debugging:
PCMK_debug=yes PCMK_logfile=./pcmk.log crm_simulate -S -x 
/var/lib/pacemaker/pengine/pe-input-XX.bz2


That will produce a big file with all debugging messages enabled.

Try to locate a reason for restarts there.

Best,
Vladislav

Also please look inline (may be info there will be enough so you won't 
need to debug).




Once again thank you for info.

Best regards.
Octavian Ciobanu

On Sat, Aug 26, 2017 at 8:17 PM, Vladislav Bogdanov
<bub...@hoster-ok.com <mailto:bub...@hoster-ok.com>> wrote:

26.08.2017 19 <tel:26.08.2017%2019>:36, Octavian Ciobanu wrote:

Thank you for your reply.

There is no reason to set location for the resources, I think,
because
all the resources are set with clone options so they are started
on all
nodes at the same time.


You still need to colocate "upper" resources with their
dependencies. Otherwise pacemaker will try to start them even if
their dependencies fail. Order without colocation has very limited
use (usually when resources may run on different nodes). For clones
that is even more exotic.


I've added collocation

pcs constraint colocation add iSCSI1-clone with DLM-clone
pcs constraint colocation add iSCSI2-clone with DLM-clone
pcs constraint colocation add iSCSI3-clone with DLM-clone
pcs constraint colocation add Mount1-clone with iSCSI1-clone
pcs constraint colocation add Mount2-clone with iSCSI2-clone
pcs constraint colocation add Mount4-clone with iSCSI3-clone

The result is the same ... all clones are first stopped and then started
beginning with DLM resource and ending with the Mount ones.


Yep, that was not meant to fix your problem. Just to prevent future issues.




For your original question: ensure you have interleave=true set for
all your clones. You seem to miss it for iSCSI ones.
interleave=false (default) is for different uses (when upper
resources require all clone instances to be up).


Modified iSCSI resources and added interleave="true" and still no change
in behavior.


Weird... Probably you also do not need 'ordered="true"' for your DLM 
clone? Knowing what DLM is, it does not need ordering; its instances may 
be safely started in parallel.





Also, just a minor note, iSCSI resources do not actually depend on
dlm, mounts should depend on it.


I know but the mount resource must know when the iSCSI resource to whom
is connected is started so the only solution I've seen was to place DLM
before iSCSI and then Mount. If there is another solution, a proper way
to do it, please can you give a reference or a place from where to read
on how to do it ?


You would want to colocate (and order) the mount with both DLM and iSCSI. 
Multiple colocations/orders for the same resource are allowed.
For the mount you need DLM running and the iSCSI disk connected. But you 
actually do not need DLM to connect the iSCSI disk (so the DLM and iSCSI 
resources may start in parallel).
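
For example, with the resource names from this thread (a hedged sketch; repeat
for Mount2/Mount3 and the other iSCSI clones):

    pcs constraint colocation add Mount1-clone with DLM-clone
    pcs constraint colocation add Mount1-clone with iSCSI1-clone
    pcs constraint order start DLM-clone then start Mount1-clone
    pcs constraint order start iSCSI1-clone then start Mount1-clone

The existing "DLM-clone then iSCSI1-clone" order can then be dropped, letting
DLM and iSCSI start in parallel.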





And when it comes to stickiness I forgot to
mention that but it set to 200. and also I have stonith
configured  to
use vmware esxi.

Best regards
Octavian Ciobanu

On Sat, Aug 26, 2017 at 6:16 PM, John Keates <j...@keates.nl
<mailto:j...@keates.nl>
<mailto:j...@keates.nl <mailto:j...@keates.nl>>> wrote:

While I am by no means a CRM/Pacemaker expert, I only see the
resource primitives and the order constraints. Wouldn’t you need
location and/or colocation as well as stickiness settings to
prevent
this from happening? What I think it might be doing is
seeing the
new node, then trying to move the resources (but not finding
it a
suitable target) and then moving them back where they came
from, but
fast enough for you to only see it as a restart.

If you crm_resource -P, it should also restart all
resources, but
put them in the preferred spot. If they end up in the same
place,
you probab

Re: [ClusterLabs] Resources are stopped and started when one node rejoins

2017-08-26 Thread Vladislav Bogdanov

26.08.2017 19:36, Octavian Ciobanu wrote:

Thank you for your reply.

There is no reason to set location for the resources, I think, because
all the resources are set with clone options so they are started on all
nodes at the same time.


You still need to colocate "upper" resources with their dependencies. 
Otherwise pacemaker will try to start them even if their dependencies 
fail. Order without colocation has very limited use (usually when 
resources may run on different nodes). For clones that is even more exotic.


For your original question: ensure you have interleave=true set for all 
your clones. You seem to miss it for iSCSI ones. interleave=false 
(default) is for different uses (when upper resources require all clone 
instances to be up).


Also, just a minor note, iSCSI resources do not actually depend on dlm, 
mounts should depend on it.




And when it comes to stickiness I forgot to
mention that but it set to 200. and also I have stonith configured  to
use vmware esxi.

Best regards
Octavian Ciobanu

On Sat, Aug 26, 2017 at 6:16 PM, John Keates > wrote:

While I am by no means a CRM/Pacemaker expert, I only see the
resource primitives and the order constraints. Wouldn’t you need
location and/or colocation as well as stickiness settings to prevent
this from happening? What I think it might be doing is seeing the
new node, then trying to move the resources (but not finding it a
suitable target) and then moving them back where they came from, but
fast enough for you to only see it as a restart.

If you crm_resource -P, it should also restart all resources, but
put them in the preferred spot. If they end up in the same place,
you probably didn’t put and weighing in the config or have
stickiness set to INF.

Kind regards,

John Keates


On 26 Aug 2017, at 14:23, Octavian Ciobanu
> wrote:

Hello all,

While playing with cluster configuration I noticed a strange
behavior. If I stop/standby cluster services on one node and
reboot it, when it joins the cluster all the resources that were
started and working on active nodes get stopped and restarted.

My testing configuration is based on 4 nodes. One node is a
storage node that makes 3 iSCSI targets available for the other
nodes to use,it is not configured to join cluster, and three nodes
that are configured in a cluster using the following commands.

pcs resource create DLM ocf:pacemaker:controld op monitor
interval="60" on-fail="fence" clone meta clone-max="3"
clone-node-max="1" interleave="true" ordered="true"
pcs resource create iSCSI1 ocf:heartbeat:iscsi
portal="10.0.0.1:3260 "
target="iqn.2017-08.example.com
:tgt1" op start interval="0"
timeout="20" op stop interval="0" timeout="20" op monitor
interval="120" timeout="30" clone meta clone-max="3"
clone-node-max="1"
pcs resource create iSCSI2 ocf:heartbeat:iscsi
portal="10.0.0.1:3260 "
target="iqn.2017-08.example.com
:tgt2" op start interval="0"
timeout="20" op stop interval="0" timeout="20" op monitor
interval="120" timeout="30" clone meta clone-max="3"
clone-node-max="1"
pcs resource create iSCSI3 ocf:heartbeat:iscsi
portal="10.0.0.1:3260 "
target="iqn.2017-08.example.com
:tgt3" op start interval="0"
timeout="20" op stop interval="0" timeout="20" op monitor
interval="120" timeout="30" clone meta clone-max="3"
clone-node-max="1"
pcs resource create Mount1 ocf:heartbeat:Filesystem
device="/dev/disk/by-label/MyCluster:Data1" directory="/mnt/data1"
fstype="gfs2" options="noatime,nodiratime,rw" op monitor
interval="90" on-fail="fence" clone meta clone-max="3"
clone-node-max="1" interleave="true"
pcs resource create Mount2 ocf:heartbeat:Filesystem
device="/dev/disk/by-label/MyCluster:Data2" directory="/mnt/data2"
fstype="gfs2" options="noatime,nodiratime,rw" op monitor
interval="90" on-fail="fence" clone meta clone-max="3"
clone-node-max="1" interleave="true"
pcs resource create Mount3 ocf:heartbeat:Filesystem
device="/dev/disk/by-label/MyCluster:Data3" directory="/mnt/data3"
fstype="gfs2" options="noatime,nodiratime,rw" op monitor
interval="90" on-fail="fence" clone meta clone-max="3"
clone-node-max="1" interleave="true"
pcs constraint order DLM-clone then iSCSI1-clone
pcs constraint order DLM-clone then iSCSI2-clone
pcs constraint order DLM-clone then iSCSI3-clone
pcs constraint order iSCSI1-clone then Mount1-clone
pcs constraint order iSCSI2-clone then Mount2-clone
pcs constraint order iSCSI3-clone then Mount3-clone

If I issue the command "pcs cluster standby node1" 

Re: [ClusterLabs] IPaddr2 RA and bonding

2017-08-07 Thread Vladislav Bogdanov

07.08.2017 20:39, Tomer Azran wrote:

I don't want to use this approach since I don't want to depend on pinging 
another host or a couple of hosts.
Is there any other solution?
I'm thinking of writing a simple script that takes the bond down with the ifdown 
command when no slaves are available, and putting it in /sbin/ifdown-local


For a similar purpose I wrote and use this one - 
https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/ifspeed


It sets a node attribute on which other resources may depend via 
location constraint  - 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch08.html#ch-rules


It is not installed by default, and that should probably be fixed.

That RA supports bonds (and bridges), and even tries to guess the actual 
resulting bond speed based on the bond type. For load-balancing bonds like the 
LACP (mode 4) one, it uses a coefficient of 0.8 (IIRC) to reflect the actual 
possible load across multiple links.
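
As an illustration, a hedged crm snippet (the attribute name must match the
agent's 'name' parameter - check the metadata, the default may differ - and the
agent itself has to be cloned on the nodes in question):

    location l_vip_needs_link cluster_vip \
        rule -inf: not_defined ifspeed or ifspeed eq 0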





-Original Message-
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: Monday, August 7, 2017 7:14 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 

Subject: Re: [ClusterLabs] IPaddr2 RA and bonding

On Mon, 2017-08-07 at 10:02 +, Tomer Azran wrote:

Hello All,



We are using CentOS 7.3 with pacemaker in order to create a cluster.

Each cluster node ha a bonding interface consists of two nics.

The cluster has an IPAddr2 resource configured like that:



# pcs resource show cluster_vip

Resource: cluster_vip (class=ocf provider=heartbeat type=IPaddr2)

  Attributes: ip=192.168.1.3

  Operations: start interval=0s timeout=20s (cluster_vip
-start-interval-0s)

  stop interval=0s timeout=20s (cluster_vip
-stop-interval-0s)

  monitor interval=30s (cluster_vip -monitor-interval-30s)





We are running tests and want to simulate a state when the network
links are down.

We are pulling both network cables from the server.



The problem is that the resource is not marked as failed, and the
faulted node keep holding it and does not fail it over to the other
node.

I think that the problem is within the bond interface. The bond
interface is marked as UP on the OS. It even can ping itself:



# ip link show

2: eno3:  mtu 1500 qdisc mq
master bond1 state DOWN mode DEFAULT qlen 1000

link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff

3: eno4:  mtu 1500 qdisc mq
master bond1 state DOWN mode DEFAULT qlen 1000

link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff

9: bond1:  mtu 1500 qdisc
noqueue state DOWN mode DEFAULT qlen 1000

link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff



As far as I understand the IPaddr2 RA does not check the link state of
the interface – What can be done?


You are correct. The IP address itself *is* up, even if the link is down, and 
it can be used locally on that host.

If you want to monitor connectivity to other hosts, you have to do that 
separately. The most common approach is to use the ocf:pacemaker:ping resource. 
See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_moving_resources_due_to_connectivity_changes


BTW, I tried to find a solution on the bonding configuration which
disables the bond when no link is up, but I didn't find any.



Tomer.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


--
Ken Gaillot 









___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] how to set a dedicated fence delay for a stonith agent ?

2017-05-17 Thread Vladislav Bogdanov

08.05.2017 22:20, Lentes, Bernd wrote:

Hi,

I remember that digimer often campaigns for a fence delay in a 2-node cluster.
E.g. here: http://oss.clusterlabs.org/pipermail/pacemaker/2013-July/019228.html
In my eyes it makes sense, so I am trying to establish that. I have two HP servers, 
each with an ILO card.
I have to use the stonith:external/ipmi agent, the stonith:external/riloe 
refused to work.

But i don't have a delay parameter there.
crm ra info stonith:external/ipmi:


Hi,

There is another ipmi fence agent - fence_ipmilan (part of fence-agents 
package). It has a 'delay' parameter.
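
A hedged pcs sketch of the usual pattern (addresses, credentials and the 15s
value are placeholders; parameter names differ slightly between fence-agents
versions). Only the device that fences the node you prefer to survive gets the
delay:

    pcs stonith create fence_node1 fence_ipmilan \
        ipaddr=10.0.0.101 login=admin passwd=secret lanplus=1 \
        pcmk_host_list=node1 delay=15
    pcs stonith create fence_node2 fence_ipmilan \
        ipaddr=10.0.0.102 login=admin passwd=secret lanplus=1 \
        pcmk_host_list=node2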




...
pcmk_delay_max (time, [0s]): Enable random delay for stonith actions and 
specify the maximum of random delay
This prevents double fencing when using slow devices such as sbd.
Use this to enable random delay for stonith actions and specify the maximum 
of random delay.
...

This is the only delay parameter I can use. But a random delay does not seem to 
be a reliable solution.

The stonith:ipmilan agent also provides just a random delay. Same with the 
riloe agent.

How did anyone solve this problem ?

Or do i have to edit the RA (I will get practice in that :-))?


Bernd





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2017-05-09 Thread Vladislav Bogdanov

09.05.2017 00:56, Ken Gaillot wrote:

[...]


Those messages indicate there is a real issue with the CPU load. When
the cluster notices high load, it reduces the number of actions it will
execute at the same time. This is generally a good idea, to avoid making
the load worse.



[...]


message, and 2.0 to get the "High CPU load" message. These are measured
against the 1-minute system load average (the same number you would get
with top, uptime, etc.).


Well, Linux loadavg actually has nothing to do with *CPU* load.

https://en.wikipedia.org/wiki/Load_(computing)

The most common example to prove that is a storage system (I see that 
with an in-kernel iSCSI target) with dedicated data disks/arrays, where 
loadavg can be very high (100-200 is not uncommon) but actual CPU usage 
(user+system) is no more than 20%. For such systems the load threshold 
plays a bad role, unnecessarily slowing down cluster reactions.


Best,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-21 Thread Vladislav Bogdanov

20.04.2017 23:16, Jan Wrona wrote:

On 20.4.2017 19:33, Ken Gaillot wrote:

On 04/20/2017 10:52 AM, Jan Wrona wrote:

Hello,

my problem is closely related to the thread [1], but I didn't find a
solution there. I have a resource that is set up as a clone C restricted
to two copies (using the clone-max=2 meta attribute||), because the
resource takes long time to get ready (it starts immediately though),

A resource agent must not return from "start" until a "monitor"
operation would return success.

Beyond that, the cluster doesn't care what "ready" means, so it's OK if
it's not fully operational by some measure. However, that raises the
question of what you're accomplishing with your monitor.

I know all that and my RA respects that. I didn't want to go into
details about the service I'm running, but maybe it will help you
understand. Its a data collector which receives and processes data from
a UDP stream. To understand these data, it needs templates which
periodically occur in the stream (every five minutes or so). After
"start" the service is up and running, "monitor" operations are
successful, but until the templates arrive the service is not "ready". I
basically need to somehow simulate this "ready" state.


If you are able to detect in your RA's monitor that your application is ready 
(it has already received its templates), you may want to use 
transient node attributes to indicate that to the cluster, and tie your 
VIP to such an attribute (with a location constraint with rules).


http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_determine_resource_location.html#_location_rules_based_on_other_node_properties

Look at pacemaker/ping RA for attr management example.
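
A hedged sketch of that approach (attribute, resource and constraint names are
illustrative). In the collector's monitor, publish a transient attribute once
the templates have been seen, and drop it if they go missing again:

    attrd_updater -n collector_ready -v 1
    attrd_updater -n collector_ready -D

then tie the dependent resource to it, e.g. in crm shell:

    location l_vip_on_ready vip \
        rule -inf: not_defined collector_ready or collector_ready ne 1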

[...]


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Corosync ring marked as FAULTY

2017-02-22 Thread Vladislav Bogdanov

22.02.2017 11:40, Denis Gribkov wrote:

Hi,


On 22/02/17 10:35, bliu wrote:

Did you specify interface with "-i " when
you are using tcpdump. If you did, corosync is not talking with the
multicast address, you need to check if your private network support
multicast.


Yes, I have used command:

# tcpdump -i em2 udp port 5505 -vv -X

Thanks for your advice, I'll ask network engineers about the issue.


That could be an IGMP querier issue. Corosync does not follow the "common" 
model of mcast usage - one sender/router in a segment and many 
receivers. Instead, all corosync nodes are mcast senders and receivers. 
For that to work reliably, IGMP snooping must be enabled and *working* 
in the segment, and an IGMP querier must exist there (in the absence of a mcast 
router). Also, if the switches differ between the two segments, the issue could 
be IGMPv2 vs IGMPv3 snooping support in the ring0 segment. Not 
all switches support IGMPv3 (the Linux default) snooping, and to be on the 
safe side it may be necessary to downgrade the Linux IGMP version used to 
v2 (/proc/sys/net/ipv4/conf/<iface>/force_igmp_version).
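
For example (em2 is the interface name from this thread; a hedged sketch):

    # check the currently forced version (0 means no override)
    cat /proc/sys/net/ipv4/conf/em2/force_igmp_version
    # force IGMPv2 on the ring interface
    echo 2 > /proc/sys/net/ipv4/conf/em2/force_igmp_version
    # or persistently, e.g. in /etc/sysctl.d/99-igmp.conf:
    #   net.ipv4.conf.em2.force_igmp_version = 2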





--
Regards Denis Gribkov







___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Issue with attrd_updater hang

2017-01-09 Thread Vladislav Bogdanov
Hi!

our customers were hit by a quite strange issue with resources populating 
attributes
in attrd. The most obscure fact is that they see that issue only on a selected
subset of nodes (two nodes in a 8-node cluster). Symptoms are sporadic timeouts 
of
resources whose RAs call attrd_updater to manage node attributes. In order to 
debug
an issue we modified resource agents to run attrd_updater under strace and also 
to
collect attrd blackbox on a stop (after the timed-out monitor).

That way we managed to get a little bit further.

Below are the analysis for a failure registered Dec 17 08:08:19.
Resource has monitor timeout of 30 sec and interval 10 sec, so the monitor 
operation
was started around Dec 17 08:07:49:

Dec 17 08:08:19 [38018] pfs2n5   lrmd:  warning: child_timeout_callback:
ifspeed-o2ib0_monitor_1 process (PID 40819) timed out
Dec 17 08:08:24 [38018] pfs2n5   lrmd: crit: child_timeout_callback:
ifspeed-o2ib0_monitor_1 process (PID 40819) will not die!
^^ This is because of strace
Dec 17 08:08:48 [38018] pfs2n5   lrmd:  warning: operation_finished:
ifspeed-o2ib0_monitor_1:40819 - timed out after 3ms

Strace log for attrd_updater shows the following:
===
[...]
connect(3, {sa_family=AF_LOCAL, sun_path=@"attrd"}, 110) = 0
setsockopt(3, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
sendto(3, "\377\377\377\377\0\0\0\0\30\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0", 24, 
MSG_NOSIGNAL, NULL, 0) = 24
setsockopt(3, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0
recvfrom(3, 0x7ffdb55ade40, 12328, 16640, 0, 0) = -1 EAGAIN (Resource 
temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, 
"\377\377\377\377\0\0\0\0(0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\20\0"..., 
12328, MSG_WAITALL|MSG_NOSIGNAL, NULL, NULL) = 12328
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f6981ee6000
open("/dev/shm/qb-attrd-request-38019-40694-10-header", O_RDWR) = 4
ftruncate(4, 8252)  = 0
mmap(NULL, 8252, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7f698210
open("/dev/shm/qb-attrd-request-38019-40694-10-data", O_RDWR) = 5
ftruncate(5, 1052672)   = 0
mmap(NULL, 2105344, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f6981ce4000
mmap(0x7f6981ce4000, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 5, 0) 
= 0x7f6981ce4000
mmap(0x7f6981de5000, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 5, 0) 
= 0x7f6981de5000
close(5)= 0
close(4)= 0
open("/dev/shm/qb-attrd-response-38019-40694-10-header", O_RDWR) = 4
ftruncate(4, 8248)  = 0
mmap(NULL, 8248, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7f69820fd000
open("/dev/shm/qb-attrd-response-38019-40694-10-data", O_RDWR) = 5
ftruncate(5, 1052672)   = 0
mmap(NULL, 2105344, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f6981ae2000
mmap(0x7f6981ae2000, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 5, 0) 
= 0x7f6981ae2000
mmap(0x7f6981be3000, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 5, 0) 
= 0x7f6981be3000
close(5)= 0
close(4)= 0
open("/dev/shm/qb-attrd-event-38019-40694-10-header", O_RDWR) = 4
ftruncate(4, 8248)  = 0
mmap(NULL, 8248, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7f69820fa000
open("/dev/shm/qb-attrd-event-38019-40694-10-data", O_RDWR) = 5
ftruncate(5, 1052672)   = 0
mmap(NULL, 2105344, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7f69818e
mmap(0x7f69818e, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 5, 0) 
= 0x7f69818e
mmap(0x7f69819e1000, 1052672, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 5, 0) 
= 0x7f69819e1000
close(5)= 0
close(4)= 0
poll([{fd=3, events=POLLIN}], 1, 0) = 0 (Timeout)
sendto(3, "\355", 1, MSG_NOSIGNAL, NULL, 0) = 1
futex(0x7f69820ff010, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, {1481958460, 
283442042}, ) = 0
poll([{fd=3, events=POLLIN}], 1, 0) = 0 (Timeout)
exit_group(0)   = ?
+++ exited with 0 +++
==

At the same time attrd blackbox traces do not show a connect/update attempt at 
all:

[...]
debug   Dec 17 08:07:19 attrd_client_update(320):0: Broadcasting 
ifspeed-o2ib0[pfs2n5] = 56000
debug   Dec 17 08:07:29 attrd_client_update(320):0: Broadcasting 
ifspeed-o2ib0[pfs2n5] = 56000
debug   Dec 17 08:07:39 attrd_client_update(320):0: Broadcasting 
ifspeed-o2ib0[pfs2n5] = 56000

<<< Here should be lines about ifspeed-o2ib0[pfs2n5] at 08:07:49, but no lines 
with that timestamp there >>>

debug   Dec 17 08:08:48 attrd_client_update(320):0: Broadcasting 
fail-count-ifspeed-o2ib0[pfs2n5] = 2
debug   Dec 17 08:08:48 attrd_client_update(320):0: Broadcasting 
last-failure-ifspeed-o2ib0[pfs2n5] = 1481958528 
r_name="last-failure-ifspeed-o2ib0" task="update" 

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Pacemaker 1.1.16 - Release Candidate 1

2016-11-09 Thread Vladislav Bogdanov

09.11.2016 10:59, Ulrich Windl wrote:

Ken Gaillot  schrieb am 08.11.2016 um 18:16 in Nachricht

<92c4a0de-33ce-cdc2-a778-17fddfe63...@redhat.com>:

On 11/08/2016 03:02 AM, Ulrich Windl wrote:


[...]

The user is responsible for choosing meaningful values. For example, if
node-health-base is +10 but yellow is -15, then any yellow attribute
will still push resources away. Of course, that could still be
meaningful when combined with other scores -- someone might do that if
they want a location preference of +5 to counteract a single yellow
attribute. Or maybe instead of node-health-base, someone sets a positive
stickiness, so existing resources can stay on a yellow node, but new
resources won't be placed there. It can be as simple or complicated as
you want to get :)


I think it's too complicated, already: In my simple world nodes with
status "green" are OK to run any resource, nodes with status "yellow"
should not start new resources, and nodes with status "red" should move
away running resources. Ok, I see the some people have a desire for
"orange", "indian yellow" and a lots of different colors on the
spectrum, but thinking of the seemingly endless cases to test, I prefer
a simple world ;-)


That was implemented mainly to allow drbd (among other replicated 
datastores) to be started as a secondary on a node with yellow 
attributes. Those attributes come from outside of pacemaker and may actually 
have any meaning one can imagine. In our case we used the new alerts 
feature to manage specific yellow attributes when resources repeatedly 
fail and recover, something similar to flapping detection in Nagios. 
So all services go away from the node with the worst health score, making 
the other node the "active" one. But we still wanted to have the secondary data 
replica online and in sync.
The first attempt was to set a positive "base" health attribute from a 
resource agent, but then we realized that the resource won't start (rather 
expected, like every other resource) after a node with some yellow 
attributes is brought back from standby mode.

So the decision was made to submit a 5-line patch for pacemaker ;)









___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-24 Thread Vladislav Bogdanov

24.10.2016 14:22, Nikhil Utane wrote:

I had set resource utilization to 1. Even then it scheduled 2 resources.
Doesn't it honor utilization resources if it doesn't find a free node?


To make utilization work you need to set both:
* node overall capacity (per-node utilization attribute)
* capacity usage by a resource (per-resource utilization attribute)
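
For example with pcs (the attribute name cu_slots is illustrative; node and
resource names are from this thread):

    pcs property set placement-strategy=utilization
    pcs node utilization Redund_CU1_WB30 cu_slots=1    # repeat for each node
    pcs resource utilization cu_2 cu_slots=1           # repeat for each resource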



-Nikhil

On Mon, Oct 24, 2016 at 4:43 PM, Vladislav Bogdanov
<bub...@hoster-ok.com <mailto:bub...@hoster-ok.com>> wrote:

24.10.2016 14:04, Nikhil Utane wrote:

That is what happened here :(.
When 2 nodes went down, two resources got scheduled on single node.
Isn't there any way to stop this from happening. Colocation
constraint
is not helping.


If it is ok to have some instances not running in such outage cases,
you can limit them to 1-per-node with utilization attributes (as was
suggested earlier). Then, when nodes return, resource instances will
return with (and on!) them.



-Regards
Nikhil

On Sat, Oct 22, 2016 at 12:57 AM, Vladislav Bogdanov
<bub...@hoster-ok.com <mailto:bub...@hoster-ok.com>
<mailto:bub...@hoster-ok.com <mailto:bub...@hoster-ok.com>>> wrote:

21.10.2016 19:34, Andrei Borzenkov wrote:

    14.10.2016 10:39, Vladislav Bogdanov wrote:


                use of utilization (balanced strategy) has one caveat: resources are
                not moved just because one node's utilization is lower, when nodes
                have the same allocation score for the resource. So, after the
                simultaneous outage of two nodes in a 5-node cluster, it may appear
                that one node runs two resources and two recovered nodes run
                nothing.


I call this a feature. Every resource move potentially
means service
outage, so it should not happen without explicit action.


In a case I describe that moves could be easily prevented by
using
stickiness (it increases allocation score on a current node).
The issue is that it is impossible to "re-balance" resources in
time-frames when stickiness is zero (over-night maintenance
window).



Original 'utilization' strategy only limits resource
placement, it is
not considered when choosing a node for a resource.




Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-21 Thread Vladislav Bogdanov

21.10.2016 19:34, Andrei Borzenkov wrote:

14.10.2016 10:39, Vladislav Bogdanov wrote:


use of utilization (balanced strategy) has one caveat: resources are
not moved just because one node's utilization is lower, when nodes
have the same allocation score for the resource. So, after the
simultaneous outage of two nodes in a 5-node cluster, it may appear
that one node runs two resources and two recovered nodes run
nothing.



I call this a feature. Every resource move potentially means service
outage, so it should not happen without explicit action.



In the case I describe those moves could easily be prevented by using 
stickiness (it increases the allocation score on the current node).
The issue is that it is impossible to "re-balance" resources in 
time-frames when stickiness is zero (an over-night maintenance window).




Original 'utilization' strategy only limits resource placement, it is
not considered when choosing a node for a resource.








___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Unexpected Resource movement after failover

2016-10-14 Thread Vladislav Bogdanov
On October 14, 2016 10:13:17 AM GMT+03:00, Ulrich Windl 
 wrote:
 Nikhil Utane  schrieb am 13.10.2016 um
>16:43 in
>Nachricht
>:
>> Ulrich,
>> 
>> I have 4 resources only (not 5, nodes are 5). So then I only need 6
>> constraints, right?
>> 
>>  [,1]   [,2]   [,3]   [,4]   [,5]  [,6]
>> [1,] "A"  "A"  "A""B"   "B""C"
>> [2,] "B"  "C"  "D"   "C"  "D""D"
>
>Sorry for my confusion. As Andrei Borzenkovsaid in
>
>you probably have to add (A, B) _and_ (B, A)! Thinking about it, I
>wonder whether an easier solution would be using "utilization": If
>every node has one token to give, and every resource needs on token, no
>two resources will run on one node. Sounds like an easier solution to
>me.
>
>Regards,
>Ulrich
>
>
>> 
>> I understand that if I configure constraint of R1 with R2 score as
>> -infinity, then the same applies for R2 with R1 score as -infinity
>(don't
>> have to configure it explicitly).
>> I am not having a problem of multiple resources getting schedule on
>the
>> same node. Rather, one working resource is unnecessarily getting
>relocated.
>> 
>> -Thanks
>> Nikhil
>> 
>> 
>> On Thu, Oct 13, 2016 at 7:45 PM, Ulrich Windl <
>> ulrich.wi...@rz.uni-regensburg.de> wrote:
>> 
>>> Hi!
>>>
>>> Don't you need 10 constraints, excluding every possible pair of your
>5
>>> resources (named A-E here), like in this table (produced with R):
>>>
>>>  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>> [1,] "A"  "A"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "D"
>>> [2,] "B"  "C"  "D"  "E"  "C"  "D"  "E"  "D"  "E"  "E"
>>>
>>> Ulrich
>>>
>>> >>> Nikhil Utane  schrieb am 13.10.2016
>um
>>> 15:59 in
>>> Nachricht
>>>
>:
>>> > Hi,
>>> >
>>> > I have 5 nodes and 4 resources configured.
>>> > I have configured constraint such that no two resources can be
>>> co-located.
>>> > I brought down a node (which happened to be DC). I was expecting
>the
>>> > resource on the failed node would be migrated to the 5th waiting
>node
>>> (that
>>> > is not running any resource).
>>> > However what happened was the failed node resource was started on
>another
>>> > active node (after stopping it's existing resource) and that
>node's
>>> > resource was moved to the waiting node.
>>> >
>>> > What could I be doing wrong?
>>> >
>>> > >> > name="have-watchdog"/>
>>> > value="1.1.14-5a6cdd1"
>>> > name="dc-version"/>
>>> > >> value="corosync"
>>> > name="cluster-infrastructure"/>
>>> > >> > name="stonith-enabled"/>
>>> > >> > name="no-quorum-policy"/>
>>> > value="240"
>>> > name="default-action-timeout"/>
>>> > >> > name="symmetric-cluster"/>
>>> >
>>> > # pcs constraint
>>> > Location Constraints:
>>> >   Resource: cu_2
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> >   Resource: cu_3
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> >   Resource: cu_4
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> >   Resource: cu_5
>>> > Enabled on: Redun_CU4_Wb30 (score:0)
>>> > Enabled on: Redund_CU2_WB30 (score:0)
>>> > Enabled on: Redund_CU3_WB30 (score:0)
>>> > Enabled on: Redund_CU5_WB30 (score:0)
>>> > Enabled on: Redund_CU1_WB30 (score:0)
>>> > Ordering Constraints:
>>> > Colocation Constraints:
>>> >   cu_3 with cu_2 (score:-INFINITY)
>>> >   cu_4 with cu_2 (score:-INFINITY)
>>> >   cu_4 with cu_3 (score:-INFINITY)
>>> >   cu_5 with cu_2 (score:-INFINITY)
>>> >   cu_5 with cu_3 (score:-INFINITY)
>>> >   cu_5 with cu_4 (score:-INFINITY)
>>> >
>>> > -Thanks
>>> > Nikhil
>>>
>>>
>>>
>>>
>>>

Hi,

use of utilization (balanced 

Re: [ClusterLabs] Antw: Re: Establishing Timeouts

2016-10-11 Thread Vladislav Bogdanov

11.10.2016 09:31, Ulrich Windl wrote:

Klaus Wenninger  schrieb am 10.10.2016 um
20:04 in

Nachricht <936e4d4b-df5c-246d-4552-5678653b3...@redhat.com>:

On 10/10/2016 06:58 PM, Eric Robinson wrote:

Thanks for the clarification. So what's the easiest way to ensure that the
cluster waits a desired timeout before deciding that a
re-convergence is necessary?

By raising the token (lost) timeout I would say.


Somewhat off-topic: I had always wished there were a kind of
spreadsheet where you could play with those parameters, and together
with required constraints you would be informed what consequences
changing one parameter has.


Nice wish/idea.
+1
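
For reference, the token (lost) timeout mentioned above lives in the totem
section of corosync.conf; a hedged example with purely illustrative values:

    totem {
        version: 2
        token: 10000                             # ms without the token before it is declared lost
        token_retransmits_before_loss_const: 10
    }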


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] ocf scripts shell and local variables

2016-08-29 Thread Vladislav Bogdanov
On August 29, 2016 11:07:39 PM GMT+03:00, Lars Ellenberg 
 wrote:
>On Mon, Aug 29, 2016 at 04:37:00PM +0200, Dejan Muhamedagic wrote:
>> Hi,
>> 
>> On Mon, Aug 29, 2016 at 02:58:11PM +0200, Gabriele Bulfon wrote:
>> > I think the main issue is the usage of the "local" operator in ocf*
>> > I'm not an expert on this operator (never used!), don't know how
>hard it is to replace it with a standard version.
>> 
>> Unfortunately, there's no command defined in POSIX which serves
>> the purpose of local, i.e. setting variables' scope. "local" is,
>> however, supported in almost all shells (including most versions
>> of ksh, but apparently not the one you run) and hence we
>> tolerated that in /bin/sh resource agents.
>
>local variables in shell:
>
>  dash (which we probably need to support) knows about "local",
>  and as far as I know, nothing else.
>
>  Some versions of dash treat "local a=A b=B"
>  different from "local a=A; local b=B;"
>
>  bash knows about typeset (which it considers obsolete),
>  declare (which is the replacement for typeset)
> and local (which is mostly, but not completely, identical to declare).
>
>  ksh can do function local variables with "typeset",
>  but only in functions defined with the function keyword,
>  NOT in functions that are defined with the "name()" syntax.
>
>function definitions in shell:
>
>  ksh treats "function x {}" and "x() {}" differently (see above)
>  bash knows both "function name {}" syntax, and "name() { }" syntax,
>  and treats them identically,
>  but dash only knows "name() {}" syntax. (at least in my version...)
>
>that's all broken.  always was.
>
>The result is that it is not possible to write shell scripts
>using functions with local variables that run in
>dash, bash and ksh.
>
>And no, I strongly do not think that we should "fall back" to the
>"art" of shell syntax and idioms that was force on you by the original"
>choose-your-brand-and-year-and-version shell, just because some
>"production systems" still have /bin/sh point to whatever it was
>their oldest ancestor system shipped with in the 19sixties...
>
>Maybe we should simply put some sanity check into
>ony of the first typically sourced helper "include" scripts,
>and bail out early with a sane message if it looks like it won't work?
>
>And also package all shell scripts with a shebang of
>/opt/bin/bash (or whatever) for non-linux systems?
>
>Lars
>
>

Maybe #!/bin/ocfsh symlink provided by resource-agents package?
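
A possible shape for the sanity check Lars mentions, as a hedged sketch (message
and exit code are illustrative):

    # in a helper sourced early (e.g. ocf-shellfuncs): refuse to run under
    # a shell that lacks function-local variables
    if ! ( __probe() { local __x=1; }; __probe ) >/dev/null 2>&1; then
        echo "This shell does not support 'local'; use bash or dash." >&2
        exit 3   # OCF_ERR_UNIMPLEMENTED
    fi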


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Doing reload right

2016-07-04 Thread Vladislav Bogdanov

01.07.2016 18:26, Ken Gaillot wrote:

[...]


You're right, "parameters" or "params" would be more consistent with
existing usage. "Instance attributes" is probably the most technically
correct term. I'll vote for "reload-params"


May be "reconfigure" fits better? This would at least introduce an 
action name which does not intersect with LSB/systemd/etc.


"reload" is for service itself as admin would expect, "reconfigure" is 
for its controlling resource.


[...]


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Node is silently unfenced if transition is very long

2016-06-17 Thread Vladislav Bogdanov

17.06.2016 15:05, Vladislav Bogdanov wrote:

03.05.2016 01:14, Ken Gaillot wrote:

On 04/19/2016 10:47 AM, Vladislav Bogdanov wrote:

Hi,

Just found an issue where a node is silently unfenced.

That is quite large setup (2 cluster nodes and 8 remote ones) with
a plenty of slowly starting resources (lustre filesystem).

Fencing was initiated due to resource stop failure.
lustre often starts very slowly due to internal recovery, and some such
resources were starting in that transition where another resource
failed to stop.
And, as transition did not finish in time specified by the
"failure-timeout" (set to 9 min), and was not aborted, that stop
failure was successfully cleaned.
There were transition aborts due to attribute changes, after that
stop failure happened, but fencing
was not initiated for some reason.


Unfortunately, that makes sense with the current code. Failure timeout
changes the node attribute, which aborts the transition, which causes a
recalculation based on the new state, and the fencing is no longer


Ken, could this one be considered to be fixed before 1.1.15 is released?


I created https://github.com/ClusterLabs/pacemaker/pull/1072 for this.
That is an RFC, tested only to compile.
I hope it is correct; please tell me if I am doing something damn 
wrong, or if there could be a better way.


Best,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Node is silently unfenced if transition is very long

2016-06-17 Thread Vladislav Bogdanov

03.05.2016 01:14, Ken Gaillot wrote:

On 04/19/2016 10:47 AM, Vladislav Bogdanov wrote:

Hi,

Just found an issue where a node is silently unfenced.

That is quite large setup (2 cluster nodes and 8 remote ones) with
a plenty of slowly starting resources (lustre filesystem).

Fencing was initiated due to resource stop failure.
lustre often starts very slowly due to internal recovery, and some such
resources were starting in that transition where another resource failed to 
stop.
And, as the transition did not finish within the time specified by the
"failure-timeout" (set to 9 min) and was not aborted, that stop failure was 
successfully cleaned.
There were transition aborts due to attribute changes, after that stop failure 
happened, but fencing
was not initiated for some reason.


Unfortunately, that makes sense with the current code. Failure timeout
changes the node attribute, which aborts the transition, which causes a
recalculation based on the new state, and the fencing is no longer


Ken, could this one be considered to be fixed before 1.1.15 is released?
I was just hit by the same in a completely different setup.
Two-node cluster: one node fails to stop a resource and is fenced. 
Right after that the second node fails to activate a clvm volume (a different 
story, need to investigate) and then fails to stop it. The node is scheduled 
to be fenced, but it cannot be because the first node hasn't come up yet.
Any cleanup (automatic or manual) of a resource that failed to stop clears the 
node state, removing the "unclean" state from the node. That is probably not 
what I would expect (resource cleanup acting as a node unfence)...

Honestly, this potentially leads to a data corruption...

Also (probably not related) there was one more resource stop failure (in 
that case - timeout) prior to failed stop mentioned above. And that stop 
timeout did not lead to fencing by itself.


I have logs (but not pe-inputs/traces/blackboxes) from both nodes, so 
any additional information from them can be easily provided.


Best regards,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Vladislav Bogdanov

16.06.2016 16:04, Christine Caulfield wrote:

On 16/06/16 13:54, Vladislav Bogdanov wrote:

16.06.2016 15:28, Christine Caulfield wrote:

On 16/06/16 13:22, Vladislav Bogdanov wrote:

Hi,

16.06.2016 14:09, Jan Friesse wrote:

I am pleased to announce the latest maintenance release of Corosync
2.3.6 available immediately from our website at
http://build.clusterlabs.org/corosync/releases/.

[...]

Christine Caulfield (9):

[...]

 Add some more RO keys


Is there a strong reason to make quorum.wait_for_all read-only?



It's almost a no-op for documentation purposes. corosync has never
looked at that value after startup anyway. This just makes sure that an
error will be returned if an attempt is made to change it.


But it looks at it on a config reload, allowing to change
wait_for_all_status from 0 to 1, but not vice versa. And reload does not
look at "ro" - I though it does. That's fine.
IIUC, even after this change I still have everything working as expected
(I actually did not look at that part of code before):

Setting wait_for_all to 0 and two_node to 1 in config (both were not set
at all prior to that) and then reload leaves wait_for_all_status=0 and
NODE_FLAGS_WFASTATUS bit unset in flags. But setting wait_for_all to 1
after that (followed by another reload) sets wait_for_all_status=1 and
NODE_FLAGS_WFASTATUS bit.


Interesting. I'm not sure that's intended but it sounds safe :) I'll
look into it though - if only for my own curiosity.


Please do not fix^H^H^Hbreak that!
But some inline documentation about the current behavior is worth adding ;)






Great, thank you!



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Vladislav Bogdanov

16.06.2016 15:28, Christine Caulfield wrote:

On 16/06/16 13:22, Vladislav Bogdanov wrote:

Hi,

16.06.2016 14:09, Jan Friesse wrote:

I am pleased to announce the latest maintenance release of Corosync
2.3.6 available immediately from our website at
http://build.clusterlabs.org/corosync/releases/.

[...]

Christine Caulfield (9):

[...]

Add some more RO keys


Is there a strong reason to make quorum.wait_for_all read-only?



It's almost a no-op for documentation purposes. corosync has never
looked at that value after startup anyway. This just makes sure that an
error will be returned if an attempt is made to change it.


But it looks at it on a config reload, allowing to change 
wait_for_all_status from 0 to 1, but not vice versa. And reload does not 
look at "ro" - I though it does. That's fine.
IIUC, even after this change I still have everything working as expected 
(I actually did not look at that part of code before):


Setting wait_for_all to 0 and two_node to 1 in config (both were not set 
at all prior to that) and then reload leaves wait_for_all_status=0 and 
NODE_FLAGS_WFASTATUS bit unset in flags. But setting wait_for_all to 1 
after that (followed by another reload) sets wait_for_all_status=1 and 
NODE_FLAGS_WFASTATUS bit.
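
For reference, a quick way to verify this after a reload (output details may differ between corosync builds):

corosync-cfgtool -R        # ask the running corosync to re-read corosync.conf
corosync-quorumtool -s     # the "Flags:" line should show 2Node / WaitForAll / Quorate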


Great, thank you!

Vladislav



Chrissie


In one of products I use the following (fully-automated) actions to
migrate from one-node to two-node setup:

== mark second node "being joined"
* set quorum.wait_for_all to 0 to make the cluster function if the node is
rebooted or power is lost
* set quorum.two_node to 1
* Add second node to corosync.conf
* reload corosync on a first node
* configure fencing in pacemaker (for both nodes)
* copy corosync.{key,conf} to a second node
* enable/start corosync on the second node
* set quorum.wait_for_all to 1
* copy corosync.conf again to a second node
* reload corosync on both nodes
== Only at this point mark second node "joined"
* enable/start pacemaker on a second node

I realize that this is all a little bit paranoid, but it is actually handy
when you want to anticipate problems you are not yet aware of.

Best regards,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync 2.3.6 is available at corosync.org!

2016-06-16 Thread Vladislav Bogdanov

Hi,

16.06.2016 14:09, Jan Friesse wrote:

I am pleased to announce the latest maintenance release of Corosync
2.3.6 available immediately from our website at
http://build.clusterlabs.org/corosync/releases/.

[...]

Christine Caulfield (9):

[...]

   Add some more RO keys


Is there a strong reason to make quorum.wait_for_all read-only?

In one of products I use the following (fully-automated) actions to 
migrate from one-node to two-node setup:


== mark second node "being joined"
* set quorum.wait_for_all to 0 to make the cluster function if the node is
rebooted or power is lost

* set quorum.two_node to 1
* Add second node to corosync.conf
* reload corosync on a first node
* configure fencing in pacemaker (for both nodes)
* copy corosync.{key,conf} to a second node
* enable/start corosync on the second node
* set quorum.wait_for_all to 1
* copy corosync.conf again to a second node
* reload corosync on both nodes
== Only at this point mark second node "joined"
* enable/start pacemaker on a second node

I realize that this is all a little bit paranoid, but it is actually handy
when you want to anticipate problems you are not yet aware of.
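
For illustration, the corosync.conf pieces touched by the above end up looking roughly like this (node names, addresses and IDs here are invented, so treat this as a sketch rather than a reference):

quorum {
    provider: corosync_votequorum
    two_node: 1
    # set to 0 while the second node is "being joined",
    # flipped to 1 just before the final reload
    wait_for_all: 1
}

nodelist {
    node {
        ring0_addr: 192.168.124.1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.124.2
        nodeid: 2
    }
}

and the reloads are just

corosync-cfgtool -R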


Best regards,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov

07.06.2016 02:20, Ken Gaillot wrote:

On 06/06/2016 03:30 PM, Vladislav Bogdanov wrote:

06.06.2016 22:43, Ken Gaillot wrote:

On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:

06.06.2016 19:39, Ken Gaillot wrote:

On 06/05/2016 07:27 PM, Andrew Beekhof wrote:

On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com>
wrote:

On 06/02/2016 08:01 PM, Andrew Beekhof wrote:

On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com>
wrote:

A recent thread discussed a proposed new feature, a new environment
variable that would be passed to resource agents, indicating
whether a
stop action was part of a recovery.

Since that thread was long and covered a lot of topics, I'm
starting a
new one to focus on the core issue remaining:

The original idea was to pass the number of restarts remaining
before
the resource will no longer tried to be started on the same node.
This
involves calculating (fail-count - migration-threshold), and that
implies certain limitations: (1) it will only be set when the
cluster
checks migration-threshold; (2) it will only be set for the failed
resource itself, not for other resources that may be recovered
due to
dependencies on it.

Ulrich Windl proposed an alternative: setting a boolean value
instead. I
forgot to cc the list on my reply, so I'll summarize now: We would
set a
new variable like OCF_RESKEY_CRM_recovery=true


This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.


Agreed; I plan to rename it yet again, to
OCF_RESKEY_CRM_start_expected.


The name alone encourages people to "optimise" the agent to not
actually stop the service "because its just going to start again
shortly".  I know thats not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to
restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

Its true there are any number of ways to write bad agents, but I
would
argue that we shouldn't be nudging people in that direction :)


I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual
mistakes.

My main question is how useful would it actually be in the
proposed use
cases. Considering the possibility that the expected start might
never
happen (or fail), can an RA really do anything different if
start_expected=true?


I would have thought not.  Correctness should trump optimal.
But I'm prepared to be mistaken.


If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.


Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens,
etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?


Yep.

It may request stop of other resources
* on that node by removing some node attributes which participate in
location constraints
* or cluster-wide by revoking/putting to standby cluster ticket other
resources depend on

Latter case is that's why I asked about the possibility of passing the
node name resource is intended to be started on instead of a boolean
value (in comments to PR #1026) - I would use it to request stop of
lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary
lustre component which does all "request routing") fails to start
anywhere in cluster. That way, if RA does not receive any node name,


Why would ordering constraints be insufficient?


They are in place, but advisory ones to allow MGS fail/switch-over.


What happens if the MDTs/OSTs continue running because a start of MGS
was expected, but something prevents the start from actually happening?


Nothing critical, lustre clients won't be able to contact them without
MGS running and will hang.
But it is safer to shut them down if it is known that MGS cannot be
started right now. Especially if geo-cluster failover is expected in
that case (as MGS can be local to a site, countrary to all other lustre
parts which need to be replicated). Actually that is the only part of a
puzzle remaining to "solve" that big project, and IMHO it is enough to
have a node name of a intended start or nothing in that attribute
(nothing means stop everything and initiate geo-failover if needed). If
f.e. fencing happens for a node intended to start resource, then stop
will be called again after the next start failure after failure-timeout
lapses. That would be much better than no information at all. Total stop
or geo-failover will happen just with some (configurable) delay instead
of rendering the whole filesystem to an unusable state requiring manual intervention.

Re: [ClusterLabs] Different pacemaker versions split cluster

2016-06-06 Thread Vladislav Bogdanov

06.06.2016 23:28, Ken Gaillot wrote:

On 05/30/2016 01:14 PM, DacioMF wrote:

Hi,

I had 4 nodes with Ubuntu 14.04 LTS in my cluster and all of them worked well. I
need to upgrade all my cluster nodes to Ubuntu 16.04 LTS without stopping my resources.
Two nodes have been updated to 16.04 and the other two remain on 14.04. The
problem is that my cluster was split and the nodes with Ubuntu 14.04 only
work with each other. The same is true for the nodes with
Ubuntu 16.04. The pacemaker feature set in Ubuntu 14.04 is v3.0.7 and in
16.04 it is v3.0.10.

The following commands shows what's happening:

root@xenserver50:/var/log/corosync# crm status
Last updated: Thu May 19 17:19:06 2016
Last change: Thu May 19 09:00:48 2016 via cibadmin on xenserver50
Stack: corosync
Current DC: xenserver51 (51) - partition with quorum
Version: 1.1.10-42f2063
4 Nodes configured
4 Resources configured

Online: [ xenserver50 xenserver51 ]
OFFLINE: [ xenserver52 xenserver54 ]

-

root@xenserver52:/var/log/corosync# crm status
Last updated: Thu May 19 17:20:04 2016          Last change: Thu May 19 08:54:57
2016 by hacluster via crmd on xenserver54
Stack: corosync
Current DC: xenserver52 (version 1.1.14-70404b0) - partition with quorum
4 nodes and 4 resources configured

Online: [ xenserver52 xenserver54 ]
OFFLINE: [ xenserver50 xenserver51 ]

xenserver52 and xenserver54 are Ubuntu 16.04; the others are Ubuntu 14.04.

Does anyone know what the problem is?

Sorry for my poor English.

Best regards,
 DacioMF Analista de Redes e Infraestrutura


Hi,

We aim for backward compatibility, so this likely is a bug. Can you
attach the output of crm_report from around this time?

  crm_report --from "-M-D H:M:S" --to "-M-D H:M:S"

FYI, you cannot do a rolling upgrade from corosync 1 to corosync 2, but
I believe both 14.04 and 16.04 use corosync 2.


iirc there were incompatible wire changes, probably between 2.1 and 2.2 
(or 2.2 and 2.3) at least if crypto/secauth is enabled.




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov

06.06.2016 22:43, Ken Gaillot wrote:

On 06/06/2016 12:25 PM, Vladislav Bogdanov wrote:

06.06.2016 19:39, Ken Gaillot wrote:

On 06/05/2016 07:27 PM, Andrew Beekhof wrote:

On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot <kgail...@redhat.com>
wrote:

On 06/02/2016 08:01 PM, Andrew Beekhof wrote:

On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot <kgail...@redhat.com>
wrote:

A recent thread discussed a proposed new feature, a new environment
variable that would be passed to resource agents, indicating
whether a
stop action was part of a recovery.

Since that thread was long and covered a lot of topics, I'm
starting a
new one to focus on the core issue remaining:

The original idea was to pass the number of restarts remaining before
the resource will no longer tried to be started on the same node.
This
involves calculating (fail-count - migration-threshold), and that
implies certain limitations: (1) it will only be set when the cluster
checks migration-threshold; (2) it will only be set for the failed
resource itself, not for other resources that may be recovered due to
dependencies on it.

Ulrich Windl proposed an alternative: setting a boolean value
instead. I
forgot to cc the list on my reply, so I'll summarize now: We would
set a
new variable like OCF_RESKEY_CRM_recovery=true


This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.


Agreed; I plan to rename it yet again, to
OCF_RESKEY_CRM_start_expected.


The name alone encourages people to "optimise" the agent to not
actually stop the service "because its just going to start again
shortly".  I know thats not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

Its true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)


I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual
mistakes.

My main question is how useful would it actually be in the proposed use
cases. Considering the possibility that the expected start might never
happen (or fail), can an RA really do anything different if
start_expected=true?


I would have thought not.  Correctness should trump optimal.
But I'm prepared to be mistaken.


If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.


Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens,
etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?


Yep.

It may request stop of other resources
* on that node by removing some node attributes which participate in
location constraints
* or cluster-wide by revoking/putting to standby cluster ticket other
resources depend on

Latter case is that's why I asked about the possibility of passing the
node name resource is intended to be started on instead of a boolean
value (in comments to PR #1026) - I would use it to request stop of
lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary
lustre component which does all "request routing") fails to start
anywhere in cluster. That way, if RA does not receive any node name,


Why would ordering constraints be insufficient?


They are in place, but advisory ones to allow MGS fail/switch-over.


What happens if the MDTs/OSTs continue running because a start of MGS
was expected, but something prevents the start from actually happening?


Nothing critical, lustre clients won't be able to contact them without 
MGS running and will hang.
But it is safer to shut them down if it is known that MGS cannot be
started right now. Especially if geo-cluster failover is expected in 
that case (as MGS can be local to a site, countrary to all other lustre 
parts which need to be replicated). Actually that is the only part of a 
puzzle remaining to "solve" that big project, and IMHO it is enough to 
have a node name of a intended start or nothing in that attribute 
(nothing means stop everything and initiate geo-failover if needed). If 
f.e. fencing happens for a node intended to start resource, then stop 
will be called again after the next start failure after failure-timeout 
lapses. That would be much better than no information at all. Total stop 
or geo-failover will happen just with some (configurable) delay instead 
of rendering the whole filesystem to an unusable state requiring manual 
intervention.





then it can b

Re: [ClusterLabs] Informing RAs about recovery: failed resource recovery, or any start-stop cycle?

2016-06-06 Thread Vladislav Bogdanov

06.06.2016 19:39, Ken Gaillot wrote:

On 06/05/2016 07:27 PM, Andrew Beekhof wrote:

On Sat, Jun 4, 2016 at 12:16 AM, Ken Gaillot  wrote:

On 06/02/2016 08:01 PM, Andrew Beekhof wrote:

On Fri, May 20, 2016 at 1:53 AM, Ken Gaillot  wrote:

A recent thread discussed a proposed new feature, a new environment
variable that would be passed to resource agents, indicating whether a
stop action was part of a recovery.

Since that thread was long and covered a lot of topics, I'm starting a
new one to focus on the core issue remaining:

The original idea was to pass the number of restarts remaining before
the resource will no longer tried to be started on the same node. This
involves calculating (fail-count - migration-threshold), and that
implies certain limitations: (1) it will only be set when the cluster
checks migration-threshold; (2) it will only be set for the failed
resource itself, not for other resources that may be recovered due to
dependencies on it.

Ulrich Windl proposed an alternative: setting a boolean value instead. I
forgot to cc the list on my reply, so I'll summarize now: We would set a
new variable like OCF_RESKEY_CRM_recovery=true


This concept worries me, especially when what we've implemented is
called OCF_RESKEY_CRM_restarting.


Agreed; I plan to rename it yet again, to OCF_RESKEY_CRM_start_expected.


The name alone encourages people to "optimise" the agent to not
actually stop the service "because its just going to start again
shortly".  I know thats not what Adam would do, but not everyone
understands how clusters work.

There are any number of reasons why a cluster that intends to restart
a service may not do so.  In such a scenario, a badly written agent
would cause the cluster to mistakenly believe that the service is
stopped - allowing it to start elsewhere.

Its true there are any number of ways to write bad agents, but I would
argue that we shouldn't be nudging people in that direction :)


I do have mixed feelings about that. I think if we name it
start_expected, and document it carefully, we can avoid any casual mistakes.

My main question is how useful would it actually be in the proposed use
cases. Considering the possibility that the expected start might never
happen (or fail), can an RA really do anything different if
start_expected=true?


I would have thought not.  Correctness should trump optimal.
But I'm prepared to be mistaken.


If the use case is there, I have no problem with
adding it, but I want to make sure it's worthwhile.


Anyone have comments on this?

A simple example: pacemaker calls an RA stop with start_expected=true,
then before the start happens, someone disables the resource, so the
start is never called. Or the node is fenced before the start happens, etc.

Is there anything significant an RA can do differently based on
start_expected=true/false without causing problems if an expected start
never happens?


Yep.

It may request stop of other resources
* on that node by removing some node attributes which participate in 
location constraints
* or cluster-wide by revoking/putting to standby cluster ticket other 
resources depend on


Latter case is that's why I asked about the possibility of passing the 
node name resource is intended to be started on instead of a boolean 
value (in comments to PR #1026) - I would use it to request stop of 
lustre MDTs and OSTs by revoking ticket they depend on if MGS (primary 
lustre component which does all "request routing") fails to start 
anywhere in cluster. That way, if RA does not receive any node name, 
then it can be "almost sure" pacemaker does not intend to restart 
resource (yet) and can request it to stop everything else (because 
filesystem is not usable anyways). Later, if another start attempt 
(caused by failure-timeout expiration) succeeds, RA may grant the ticket 
back, and all other resources start again.
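
To illustrate the idea (names below are invented and the exact syntax should be checked against your crmsh/pacemaker versions), the dependency would be expressed with a ticket constraint, and the MGS RA would grant or revoke that ticket:

# crmsh: stop MDTs/OSTs whenever the ticket is revoked
rsc_ticket lustre-backend-deps lustre-mgs-up: mdt0 ost0 ost1 loss-policy=stop

# inside the MGS RA (sketch only, untested):
# after a successful start
crm_ticket --ticket lustre-mgs-up --grant --force
# when we know the MGS is not going to be started anywhere
crm_ticket --ticket lustre-mgs-up --revoke --force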


Best,
Vladislav



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] mail server (postfix)

2016-06-06 Thread Vladislav Bogdanov

05.06.2016 22:22, Dimitri Maziuk wrote:

On 06/04/2016 01:02 PM, Vladislav Bogdanov wrote:


I'd modify RA to support master/slave concept.


I'm assuming you use a shared mail store on your imapd cluster? I want


No, I use cyrus internal replication.


to host the storage on the same cluster with mail daemons. I want to a)
stop accepting mail, b) fail-over drbd maildirs, then c) restart postfix
in send-only "slave" configuration. On the other node I could simply
restart the "master" postfix after b), but on the node going passive the
b) has to be between a) and c).


Do you have reasons for b) to be strictly between a) and c) ?

I'd propose something as the following:
0a) service is a master on one node (nodeA) - it listens on the socket and
stores mail to DRBD-backed maildirs.

0b) service is a slave on second node (nodeB) - send-only config
1) stop VIP on nodeA
2) demote service on nodeA (replace config and restart/reload it 
internally in the RA) - that would combine your a) and c)

3) demote DRBD on nodeA (first part of your b) )
4) promote DRBD on nodeB (second part of b) )
5) promote service on nodeB - replace config and internally reload/restart
6) start VIP on nodeB

1-2 and 5-6 pairs may need to be reversed if you bind service to a 
specific VIP (instead of listening on INADDR_ANY).


For 2 and 5 to work correctly you need to colocate service in *master* 
role with DRBD in *master* role. That way "slave" service instance does 
not require DRBD at all.
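
In crm shell syntax the skeleton could look something like the following (the primitives p_drbd_mail, p_fs_mail, p_postfix and p_vip are assumed to exist already; names, scores and the exact syntax are only illustrative):

ms ms_drbd_mail p_drbd_mail meta master-max=1 clone-max=2 notify=true
ms ms_postfix p_postfix meta master-max=1 clone-max=2
colocation c_fs_with_drbd_master inf: p_fs_mail ms_drbd_mail:Master
colocation c_postfix_master_with_fs inf: ms_postfix:Master p_fs_mail
colocation c_vip_with_postfix_master inf: p_vip ms_postfix:Master
order o_drbd_then_fs inf: ms_drbd_mail:promote p_fs_mail:start
order o_fs_then_postfix inf: p_fs_mail:start ms_postfix:promote
order o_postfix_then_vip inf: ms_postfix:promote p_vip:start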


Hope this helps,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] mail server (postfix)

2016-06-04 Thread Vladislav Bogdanov
3 .6.2016 г. 20:33:01 GMT+03:00, Dimitri Maziuk  wrote

Sorry for top-post.

I'd modify RA to support master/slave concept. I use the same approach to 
manage cyrus-imapd replicas, passing them different pre-installed config files, 
depending on operation, start, promote, or demote.
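
Roughly like this in the RA (a sketch only, untested - the config directories and the symlink name are placeholders for whatever your agent really manages):

postfix_promote() {
    # switch to the full "mail gateway" config and apply it
    ln -sfn /etc/postfix-master /etc/postfix-active || return $OCF_ERR_GENERIC
    postfix -c /etc/postfix-active reload || postfix -c /etc/postfix-active start || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

postfix_demote() {
    # switch to the send-only config and apply it
    ln -sfn /etc/postfix-sendonly /etc/postfix-active || return $OCF_ERR_GENERIC
    postfix -c /etc/postfix-active reload || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}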

Best,
Vladislav

>Hi all,
>
>quick question: is anyone running an MTA on an active-passive cluster?
>Specifically, I need to do a stop - wait for drbd fs to move over -
>update symlinks - then start again -- on both nodes. So that on the
>active node the MTA runs with "mail gateway" postfix config in
>/drbd/etc/postfix and on the passive: with "send-only" config in
>/etc/postfix.
>
>Off the top of my head it looks like defining two postfix resources
>that
>both start/stop the same postfix only at different times/on different
>nodes should do the trick. Any gotchas I'm not seeing? Better ways to
>accomplish it?
>
>(I know running an MTA that way is not the Approved Way(tm), I have my
>reasons for wanting to it like this.)
>
>TIA



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Node is silently unfenced if transition is very long

2016-04-19 Thread Vladislav Bogdanov
Hi,

Just found an issue with node is silently unfenced.

That is quite large setup (2 cluster nodes and 8 remote ones) with
a plenty of slowly starting resources (lustre filesystem).

Fencing was initiated due to resource stop failure.
lustre often starts very slowly due to internal recovery, and some such
resources were starting in that transition where another resource failed to 
stop.
And, as transition did not finish in time specified by the
"failure-timeout" (set to 9 min), and was not aborted, that stop failure was 
successfully cleaned.
There were transition aborts due to attribute changes, after that stop failure 
happened, but fencing
was not initiated for some reason.
Node where stop failed was a DC.
pacemaker is 1.1.14-5a6cdd1 (from fedora, built on EL7)

Here is log excerpt illustrating the above:
Apr 19 14:57:56 mds1 pengine[3452]:   notice: Movemdt0-es03a-vg
(Started mds1 -> mds0)
Apr 19 14:58:06 mds1 pengine[3452]:   notice: Movemdt0-es03a-vg
(Started mds1 -> mds0)
Apr 19 14:58:10 mds1 crmd[3453]:   notice: Initiating action 81: monitor 
mdt0-es03a-vg_monitor_0 on mds0
Apr 19 14:58:11 mds1 crmd[3453]:   notice: Initiating action 2993: stop 
mdt0-es03a-vg_stop_0 on mds1 (local)
Apr 19 14:58:11 mds1 LVM(mdt0-es03a-vg)[6228]: INFO: Deactivating volume group 
vg_mdt0_es03a
Apr 19 14:58:12 mds1 LVM(mdt0-es03a-vg)[6541]: ERROR: Logical volume 
vg_mdt0_es03a/mdt0 contains a filesystem in use. Can't deactivate volume group 
"vg_mdt0_es03a" with 1 open logical volume(s)
[...]
Apr 19 14:58:30 mds1 LVM(mdt0-es03a-vg)[9939]: ERROR: LVM: vg_mdt0_es03a did 
not stop correctly
Apr 19 14:58:30 mds1 LVM(mdt0-es03a-vg)[9943]: WARNING: vg_mdt0_es03a still 
Active
Apr 19 14:58:30 mds1 LVM(mdt0-es03a-vg)[9947]: INFO: Retry deactivating volume 
group vg_mdt0_es03a
Apr 19 14:58:31 mds1 lrmd[3450]:   notice: mdt0-es03a-vg_stop_0:5865:stderr [ 
ocf-exit-reason:LVM: vg_mdt0_es03a did not stop correctly ]
[...]
Apr 19 14:58:31 mds1 lrmd[3450]:   notice: mdt0-es03a-vg_stop_0:5865:stderr [ 
ocf-exit-reason:LVM: vg_mdt0_es03a did not stop correctly ]
Apr 19 14:58:31 mds1 crmd[3453]:   notice: Operation mdt0-es03a-vg_stop_0: 
unknown error (node=mds1, call=324, rc=1, cib-update=1695, confirmed=true)
Apr 19 14:58:31 mds1 crmd[3453]:   notice: mds1-mdt0-es03a-vg_stop_0:324 [ 
ocf-exit-reason:LVM: vg_mdt0_es03a did not stop correctly\nocf-exit-reason:LVM: 
vg_mdt0_es03a did not stop correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did 
not stop correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop 
correctly\nocf-exit-reason:LVM: vg_mdt0_es03a did not stop correctl
Apr 19 14:58:31 mds1 crmd[3453]:  warning: Action 2993 (mdt0-es03a-vg_stop_0) 
on mds1 failed (target: 0 vs. rc: 1): Error
Apr 19 14:58:31 mds1 crmd[3453]:  warning: Action 2993 (mdt0-es03a-vg_stop_0) 
on mds1 failed (target: 0 vs. rc: 1): Error
Apr 19 15:02:03 mds1 pengine[3452]:  warning: Processing failed op stop for 
mdt0-es03a-vg on mds1: unknown error (1)
Apr 19 15:02:03 mds1 pengine[3452]:  warning: Processing failed op stop for 
mdt0-es03a-vg on mds1: unknown error (1)
Apr 19 15:02:03 mds1 pengine[3452]:  warning: Node mds1 will be fenced because 
of resource failure(s)
Apr 19 15:02:03 mds1 pengine[3452]:  warning: Forcing mdt0-es03a-vg away from 
mds1 after 100 failures (max=100)
Apr 19 15:02:03 mds1 pengine[3452]:  warning: Scheduling Node mds1 for STONITH
Apr 19 15:02:03 mds1 pengine[3452]:   notice: Stop of failed resource 
mdt0-es03a-vg is implicit after mds1 is fenced
Apr 19 15:02:03 mds1 pengine[3452]:   notice: Recover mdt0-es03a-vg
(Started mds1 -> mds0)
[... many of these ]
Apr 19 15:07:22 mds1 pengine[3452]:  warning: Processing failed op stop for 
mdt0-es03a-vg on mds1: unknown error (1)
Apr 19 15:07:22 mds1 pengine[3452]:  warning: Processing failed op stop for 
mdt0-es03a-vg on mds1: unknown error (1)
Apr 19 15:07:22 mds1 pengine[3452]:  warning: Node mds1 will be fenced because 
of resource failure(s)
Apr 19 15:07:22 mds1 pengine[3452]:  warning: Forcing mdt0-es03a-vg away from 
mds1 after 100 failures (max=100)
Apr 19 15:07:23 mds1 pengine[3452]:  warning: Scheduling Node mds1 for STONITH
Apr 19 15:07:23 mds1 pengine[3452]:   notice: Stop of failed resource 
mdt0-es03a-vg is implicit after mds1 is fenced
Apr 19 15:07:23 mds1 pengine[3452]:   notice: Recover mdt0-es03a-vg
(Started mds1 -> mds0)
Apr 19 15:07:24 mds1 pengine[3452]:  warning: Processing failed op stop for 
mdt0-es03a-vg on mds1: unknown error (1)
Apr 19 15:07:24 mds1 pengine[3452]:  warning: Processing failed op stop for 
mdt0-es03a-vg on mds1: unknown error (1)
Apr 19 15:07:24 mds1 pengine[3452]:  warning: Node mds1 will be fenced because 
of resource failure(s)
Apr 19 15:07:24 mds1 pengine[3452]:  warning: Forcing mdt0-es03a-vg away from 

Re: [ClusterLabs] Approach to validate on stop op (Was Re: crmsh configure delete for constraints)

2016-03-29 Thread Vladislav Bogdanov

29.03.2016 15:28, Vladislav Bogdanov wrote:
[...]

 *) # monitor | notify | reload | etc
 validate
 ret=$?
 if [ ${ret} -ne $OCF_SUCCESS ] ; then
 if ocf_is_probe ; then
 exit $OCF_NOT_RUNNING
 fi
 exit $?


Of course it is exit ${ret}


 fi
 ;;





___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Approach to validate on stop op (Was Re: crmsh configure delete for constraints)

2016-03-29 Thread Vladislav Bogdanov

10.02.2016 12:31, Vladislav Bogdanov wrote:

10.02.2016 11:38, Ulrich Windl wrote:

Vladislav Bogdanov <bub...@hoster-ok.com> schrieb am 10.02.2016 um
05:39 in

Nachricht <6e479808-6362-4932-b2c6-348c7efc4...@hoster-ok.com>:

[...]

Well, I'd reword. Generally, RA should not exit with error if validation
fails on stop.
Is that better?

[...]

As we have different error codes, what type of error?


Any which makes pacemaker to think resource stop op failed.
OCF_ERR_* particularly.

If pacemaker has got an error on start, it will run stop with the same
set of parameters anyways. And will get error again if that one was from
validation and RA does not differentiate validation for start and stop.
And then circular fencing over the whole cluster is triggered for no
reason.

Of course, for safety, RA could save its state if start was successful
and skip validation on stop only if that state is not found. Otherwise
removed binary or config file would result in resource running on
several nodes.

Well, this all seems to be very complicated to make some general
algorithm ;)


Well, after some thinking, I've got an approach which sounds both
elegant and safe enough to me and my colleagues. Please look at the
following excerpt (the part of a hypothetical RA before the main 'case'):


-
VALIDATION_FAILURE_FLAG="${HA_RSCTMP}/${OCF_RESOURCE_INSTANCE}.invalid"

case "${__OCF_ACTION}" in
    meta-data)
        meta_data
        exit $OCF_SUCCESS
        ;;
    usage|help)
        usage
        exit $OCF_SUCCESS
        ;;
    start)
        validate
        ret=$?
        if [ ${ret} -ne $OCF_SUCCESS ] ; then
            touch "${VALIDATION_FAILURE_FLAG}"
            exit ${ret}
        fi
        ;;
    stop)
        validate
        ret=$?
        if [ ${ret} -ne $OCF_SUCCESS ] ; then
            if [ -f "${VALIDATION_FAILURE_FLAG}" ] ; then
                rm -f "${VALIDATION_FAILURE_FLAG}"
                exit $OCF_SUCCESS
            else
                exit ${ret}
            fi
        fi
        ;;
    *) # monitor | notify | reload | etc
        validate
        ret=$?
        if [ ${ret} -ne $OCF_SUCCESS ] ; then
            if ocf_is_probe ; then
                exit $OCF_NOT_RUNNING
            fi
            exit $?
        fi
        ;;
esac
-

The above assumes that the validation function does not call exit (and thus uses
have_binary instead of check_binary, etc.) but returns an error code.


The main difference from the current ocf_rarun implementation is that
changes to the machine environment (deleted binaries, configs, etc.) still
result in a stop failure (and thus fencing) if those changes were made
after a successful validation on resource start.


I plan to extensively test such approach in my RAs shortly.

Comments are welcome.

Best,
Vladislav







Regards,
Ulrich



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] attrd does not clean per-node cache after node removal

2016-03-23 Thread Vladislav Bogdanov

23.03.2016 19:39, Ken Gaillot wrote:

On 03/23/2016 07:35 AM, Vladislav Bogdanov wrote:

Hi!

It seems like atomic attrd in post-1.1.14 (eb89393) does not
fully clean node cache after node is removed.


Is this a regression? Or have you only tried it with this version?


Only with this one.




After our QA guys remove node wa-test-server-ha-03 from a two-node cluster:
* stop pacemaker and corosync on wa-test-server-ha-03
* remove node wa-test-server-ha-03 from corosync nodelist on 
wa-test-server-ha-04
* tune votequorum settings
* reload corosync on wa-test-server-ha-04
* remove node from pacemaker on wa-test-server-ha-04
* delete everything from /var/lib/pacemaker/cib on wa-test-server-ha-03
, and then join it with a different corosync ID (but with the same node name),
we see the following in the logs:

Leave node 1 (wa-test-server-ha-03):
Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: 
crm_update_peer_proc: Node wa-test-server-ha-03[1] - state is now lost (was 
member)
Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Removing all 
wa-test-server-ha-03 (1) attributes for attrd_peer_change_cb
Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Lost attribute 
writer wa-test-server-ha-03
Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Removing 
wa-test-server-ha-03/1 from the membership list
Mar 23 04:19:53 wa-test-server-ha-04 attrd[25962]:   notice: Purged 1 peers 
with id=1 and/or uname=wa-test-server-ha-03 from the membership cache
Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Processing 
peer-remove from wa-test-server-ha-04: wa-test-server-ha-03 0
Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Removing all 
wa-test-server-ha-03 (0) attributes for wa-test-server-ha-04
Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Removing 
wa-test-server-ha-03/1 from the membership list
Mar 23 04:19:56 wa-test-server-ha-04 attrd[25962]:   notice: Purged 1 peers 
with id=0 and/or uname=wa-test-server-ha-03 from the membership cache

Join node 3 (the same one, wa-test-server-ha-03, but ID differs):
Mar 23 04:21:23 wa-test-server-ha-04 attrd[25962]: notice: 
crm_update_peer_proc: Node wa-test-server-ha-03[3] - state is now member (was 
(null))
Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
Node 3/wa-test-server-ha-03 = 0x201bf30 - a4cbcdeb-c36a-4a0e-8ed6-c45b3db89296
Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
Node 2/wa-test-server-ha-04 = 0x1f90e20 - 6c18faa1-f8c2-4b0c-907c-20db450e2e79
Mar 23 04:21:26 wa-test-server-ha-04 attrd[25962]: crit: Node 1 and 3 share 
the same name 'wa-test-server-ha-03'


It took me a while to understand the above combination of messages. This
is not node 3 joining. This is node 1 joining after node 3 has already
been seen.


Hmmm...
corosync.conf and corosync-cmapctl both say it is 3
Also, cib lists it as 3 and lrmd puts its status records under 3.

Actually, the issue is that DRBD resources are not promoted because their
master attributes go to the section with node-id 1. And that is the only
reason why we found this. Everything not related to volatile attributes
works well.




The warnings are a complete dump of the peer cache. So you can see that
wa-test-server-ha-03 is listed only once, with id 3.

The critical message ("Node 1 and 3") lists the new id first and the
found ID second. So id 1 is what it's trying to add to the cache.


But there is also 'Node 'wa-test-server-ha-03' has changed its ID from 1 
to 3' -  it goes first. Does that matter?




Did you update the node ID in corosync.conf on *both* nodes?


Sure.
It is automatically copied to a node being joined.




Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]:   notice: Node 
'wa-test-server-ha-03' has changed its ID from 1 to 3
Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
Node 3/wa-test-server-ha-03 = 0x201bf30 - a4cbcdeb-c36a-4a0e-8ed6-c45b3db89296
Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
Node 2/wa-test-server-ha-04 = 0x1f90e20 - 6c18faa1-f8c2-4b0c-907c-20db450e2e79
Mar 23 04:21:29 wa-test-server-ha-04 attrd[25962]: crit: Node 1 and 3 share 
the same name 'wa-test-server-ha-03'
Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:   notice: Node 
'wa-test-server-ha-03' has changed its ID from 1 to 3
Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
Node 3/wa-test-server-ha-03 = 0x201bf30 - a4cbcdeb-c36a-4a0e-8ed6-c45b3db89296
Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:  warning: crm_find_peer: 
Node 2/wa-test-server-ha-04 = 0x1f90e20 - 6c18faa1-f8c2-4b0c-907c-20db450e2e79
Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]: crit: Node 1 and 3 share 
the same name 'wa-test-server-ha-03'
Mar 23 04:21:31 wa-test-server-ha-04 attrd[25962]:   notice: Node 
'wa-test-server-ha-03' has changed its ID from 3 to 1
...

On the node being joined:
Mar 23 04:21:23 wa-test

Re: [ClusterLabs] Antw: Re: DLM fencing

2016-02-11 Thread Vladislav Bogdanov

10.02.2016 19:32, Digimer wrote:
[snip]


To be clear; DLM does NOT have it's own fencing. It relies on the
cluster's fencing.



Actually, dlm4 can use fence-agents directly (device keyword in 
dlm.conf). Default is to use dlm_stonith though.
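
Something along these lines in dlm.conf (a sketch only - the agent and its arguments are invented, and the exact keywords should be checked against dlm.conf(5) of your dlm version):

# default behaviour: fence through pacemaker via the stonith helper
fence_all dlm_stonith

# or call a fence agent directly
device  ipmi /usr/sbin/fence_ipmilan
connect ipmi node=1 ipaddr=192.168.124.1
connect ipmi node=2 ipaddr=192.168.124.2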




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Vladislav Bogdanov

10.02.2016 11:38, Ulrich Windl wrote:

Vladislav Bogdanov <bub...@hoster-ok.com> schrieb am 10.02.2016 um 05:39 in

Nachricht <6e479808-6362-4932-b2c6-348c7efc4...@hoster-ok.com>:

[...]

Well, I'd reword. Generally, RA should not exit with error if validation
fails on stop.
Is that better?

[...]

As we have different error codes, what type of error?


Any which makes pacemaker to think resource stop op failed.
OCF_ERR_* particularly.

If pacemaker has got an error on start, it will run stop with the same 
set of parameters anyways. And will get error again if that one was from 
validation and RA does not differentiate validation for start and stop. 
And then circular fencing over the whole cluster is triggered for no reason.


Of course, for safety, RA could save its state if start was successful 
and skip validation on stop only if that state is not found. Otherwise 
removed binary or config file would result in resource running on 
several nodes.


Well, this all seems to be very complicated to make some general 
algorithm ;)





Regards,
Ulrich



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: crmsh configure delete for constraints

2016-02-10 Thread Vladislav Bogdanov

10.02.2016 13:56, Ferenc Wágner wrote:

Vladislav Bogdanov <bub...@hoster-ok.com> writes:


If pacemaker has got an error on start, it will run stop with the same
set of parameters anyways. And will get error again if that one was
from validation and RA does not differentiate validation for start and
stop. And then circular fencing over the whole cluster is triggered
for no reason.

Of course, for safety, RA could save its state if start was successful
and skip validation on stop only if that state is not found. Otherwise
removed binary or config file would result in resource running on
several nodes.


What would happen if we made the start operation return OCF_NOT_RUNNING


Well, then the cluster will try to start it again, and that could be
undesirable - what are OCF_ERR_INSTALLED and OCF_ERR_CONFIGURED for then?



if validation fails?  Or more broadly: if the start operation knows that
the resource is not running, thus a stop opration would do no good.
 From Pacemaker Explained B.4: "The cluster will not attempt to stop a
resource that returns this for any action."  The probes could still
return OCF_ERR_CONFIGURED, putting real info into the logs, the stop
failure could still lead to fencing, protecting data integrity, but
circular fencing would not happen.  I hope.

By the way, what are the reasons to run stop after a failed start?  To
clean up halfway-started resources?  Besides OCF_ERR_GENERIC, the other
error codes pretty much guarrantee that the resource can not be active.


That heavily depends on how a given RA is implemented...


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] crmsh configure delete for constraints

2016-02-09 Thread Vladislav Bogdanov
Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>Hi,
>
>On Tue, Feb 09, 2016 at 05:15:15PM +0300, Vladislav Bogdanov wrote:
>> 09.02.2016 16:31, Kristoffer Grönlund wrote:
>> >Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>> >
>> >>Hi,
>> >>
>> >>when performing a delete operation, crmsh (2.2.0) having -F tries
>> >>to stop passed op arguments and then waits for DC to become idle.
>> >>
>> >
>> >Hi again,
>> >
>> >I have pushed a fix that only waits for DC if any resources were
>> >actually stopped:
>https://github.com/ClusterLabs/crmsh/commit/164aa48
>> 
>> Great!
>> 
>> >
>> >>
>> >>More, it may be worth checking stop-orphan-resources property and
>pass stop
>> >>work to pacemaker if it is set to true.
>> >
>> >I am a bit concerned that this might not be 100% reliable. I found
>an
>> >older discussion regarding this and the recommendation from David
>Vossel
>> >then was to always make sure resources were stopped before removing
>> >them, and not relying on stop-orphan-resources to clean things up
>> >correctly. His example of when this might not work well is when
>removing
>> >a group, as the group members might get stopped out-of-order.
>> 
>> OK, I agree. That was just an idea.
>> 
>> >
>> >At the same time, I have thought before that the current
>functionality
>> >is not great. Having to stop resources before removing them is if
>> >nothing else annoying! I have a tentative change proposal to this
>where
>> >crmsh would stop the resources even if --force is not set, and there
>> >would be a flag to pass to stop to get it to ignore whether
>resources
>> >are running, since that may be useful if the resource is
>misconfigured
>> >and the stop action doesn't work.
>> 
>> That should result in fencing, no? I think that is RA issue if that
>> happens.
>
>Right. Unfortunately, this case often gets too little attention;
>people typically test with good and working configurations only.
>The first time we hear about it is from some annoyed user who's
>node got fenced for no good reason. Even worse, with some bad
>configurations, it can happen that the nodes get fenced in a
>round-robin fashion, which certainly won't make your time very
>productive.
>
>> Particularly, imho RAs should not run validate_all on stop
>> action.
>
>I'd disagree here. If the environment is no good (bad
>installation, missing configuration and similar), then the stop
>operation probably won't do much good. Ultimately, it may depend
>on how the resource is managed. In ocf-rarun, validate_all is
>run, but then the operation is not carried out if the environment
>is invalid. In particular, the resource is considered to be
>stopped, and the stop operation exits with success. One of the
>most common cases is when the software resides on shared
>non-parallel storage.

Well, I'd reword. Generally, RA should not exit with error if validation fails 
on stop.
Is that better?

>
>BTW, handling the stop and monitor/probe operations was the
>primary motivation to develop ocf-rarun. It's often quite
>difficult to get these things right.
>
>Cheers,
>
>Dejan
>
>
>> Best,
>> Vladislav
>> 
>> 
>> ___
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started:
>http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>___
>Users mailing list: Users@clusterlabs.org
>http://clusterlabs.org/mailman/listinfo/users
>
>Project Home: http://www.clusterlabs.org
>Getting started:
>http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] crmsh configure delete for constraints

2016-02-08 Thread Vladislav Bogdanov
Hi,

when performing a delete operation with -F, crmsh (2.2.0) tries
to stop the resources passed as arguments and then waits for the DC to become idle.

That is not needed if only constraints are passed to delete.
Could that be changed? Or, could it wait only if there is something to stop?

Something like this:
diff --git a/modules/ui_configure.py b/modules/ui_configure.py
index cf98702..96ab77e 100644
--- a/modules/ui_configure.py
+++ b/modules/ui_configure.py
@@ -552,6 +552,9 @@ class CibConfig(command.UI):
 if not ok or not cib_factory.commit():
 raise ValueError("Failed to stop one or more running 
resources: %s" %
  (', '.join(to_stop)))
+return True
+else:
+return False
 
 @command.skill_level('administrator')
 @command.completers_repeating(_id_list)
@@ -562,8 +565,8 @@ class CibConfig(command.UI):
 arg_force = any((x in ('-f', '--force')) for x in argl)
 argl = [x for x in argl if (x not in ('-f', '--force'))]
 if arg_force or config.core.force:
-self._stop_if_running(argl)
-utils.wait4dc(what="Stopping %s" % (", ".join(argl)))
+if (self._stop_if_running(argl)):
+utils.wait4dc(what="Stopping %s" % (", ".join(argl)))
 return cib_factory.delete(*argl)
 
 @command.name('default-timeouts')


More, it may be worth checking stop-orphan-resources property and pass stop
work to pacemaker if it is set to true.


Thank you,
Vladislav

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

2016-01-22 Thread Vladislav Bogdanov
Hi David, list,

recently I tried to upgrade dlm from 4.0.2 to 4.0.4 and found that it
no longer handles fencing of a remote node initiated by other cluster 
components.
First I noticed that during valid fencing due to resource stop failure,
but it is easily reproduced with 'crm node fence XXX'.

I took logs from both 4.0.2 and 4.0.4 and "normalized" (replaced timestamps)
their part after fencing is originated by pacemaker.

That resulted in the following diff:
--- dlm_controld.log.4.0.2 2016-01-22 15:37:42.860999831 +
+++ dlm_controld.log.4.0.4 2016-01-22 14:53:23.962999872 +
@@ -24,26 +24,11 @@
 clvmd wait for fencing
 fence wait 2 pid 11266 running
 clvmd wait for fencing
-fence result 2 pid 11266 result 0 exit status
-fence wait 2 pid 11266 result 0
-clvmd wait for fencing
-fence status 2 receive 0 from 1 walltime 1453473364 local 1001
-clvmd check_fencing 2 done start 618 fail 1000 fence 1001
-clvmd check_fencing done
-clvmd send_start 1:3 counts 2 1 0 1 1
-clvmd receive_start 1:3 len 76
-clvmd match_change 1:3 matches cg 3
-clvmd wait_messages cg 3 got all 1
-clvmd start_kernel cg 3 member_count 1
+shutdown
+cpg_leave dlm:controld ...
+clear_configfs_nodes rmdir "/sys/kernel/config/dlm/cluster/comms/1"
 dir_member 2
 dir_member 1
-set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/2"
-write "1" to "/sys/kernel/dlm/clvmd/control"
-clvmd prepare_plocks
-dlm:controld ring 1:412 2 memb 1 2
-fence work wait for cluster ringid
-dlm:ls:clvmd ring 1:412 2 memb 1 2
-fence work wait for cluster ringid
-cluster quorum 1 seq 412 nodes 2
-cluster node 2 added seq 412
-set_configfs_node 2 192.168.124.2 local 0
+clear_configfs_space_nodes rmdir 
"/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/2"
+clear_configfs_space_nodes rmdir 
"/sys/kernel/config/dlm/cluster/spaces/clvmd/nodes/1"
+clear_configfs_spaces rmdir "/sys/kernel/config/dlm/cluster/spaces/clvmd"

Both are built against pacemaker 1.1.14 (I rebuild 4.0.2 to ensure that bug is
not in the stonith API headers).

Systems (2 nodes) run corosync 2.3.5 (libqb 0.17.2) on the top of CentOS 6.7. 
They
run in virtual machines with fencing configured and working.

I hope this could be easily fixed,
feel free to request any further information (f.e. original logs),


Best regards,
Vladislav

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] dlm_controld 4.0.4 exits when crmd is fencing another node

2016-01-22 Thread Vladislav Bogdanov

22.01.2016 19:28, David Teigland wrote:

On Fri, Jan 22, 2016 at 06:59:25PM +0300, Vladislav Bogdanov wrote:

Hi David, list,

recently I tried to upgrade dlm from 4.0.2 to 4.0.4 and found that it
no longer handles fencing of a remote node initiated by other cluster 
components.
First I noticed that during valid fencing due to resource stop failure,
but it is easily reproduced with 'crm node fence XXX'.

I took logs from both 4.0.2 and 4.0.4 and "normalized" (replaced timestamps)
their part after fencing is originated by pacemaker.


There are very few commits there, and only two I could imagine being
related.  Could you try reverting them and see if that helps?

79e87eb5913f Make systemd stop dlm on corosync restart


There is no systemd on EL6, so this one is not a suspect.


fb61984c9388 dlm_stonith: use kick_helper result


I tried reverting this one and a51b2bb ("If an error occurs unlink the
lock file and exit with status 1") one by one and both together, with the
same result.


So the problem seems to lie somewhere deeper.

Best,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] GFS2 with Pacemaker, Corosync on Ubuntu 14.04

2016-01-19 Thread Vladislav Bogdanov

19.01.2016 18:14, Momcilo Medic wrote:

Dear all,

I am trying to set up GFS2 on two Ubuntu 14.04 servers.
Every guide I can find online is for 12.04 and uses the cman package, which
was abandoned in 13.10.

So, I tried using Pacemaker with Corosync as instructed in your guide [1].
In this guide pcs is used, which is not available in Ubuntu, so I am
translating the commands to crmsh.

I installed everything and noticed bad packaging of DLM, which has
wrong init scripts. I reported [2] this bug to Ubuntu.
This might also cause DLM not to find the /dev/misc/dlm-control device
(which is actually located at /dev/dlm-control).

This is the output that I am getting:
# dlm_controld -D
769887 dlm_controld 4.0.1 started
769887 our_nodeid 739311650
769897 cannot find device /dev/misc/dlm-control with minor 52


Check that dlm's udev rules are installed in the location (lib/udev vs 
/usr/lib/udev) appropriate for your system. That changed recently in the 
upstream dlm.
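
A quick way to check (the two paths below are just the usual candidates):

ls /lib/udev/rules.d/ /usr/lib/udev/rules.d/ 2>/dev/null | grep -i dlm

and after moving the rule to where your udev actually looks:

udevadm trigger
udevadm settle
ls -l /dev/misc/dlm-control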



769897 shutdown
769897 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
769897 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2

# lsmod | grep dlm
dlm   156389  1 gfs2
sctp  247248  3 dlm
configfs   35358  2 dlm

# ls -hal /dev/dlm*
crw--- 1 root root 10, 52 Jan 14 17:55 /dev/dlm-control
crw--- 1 root root 10, 51 Jan 14 17:55 /dev/dlm-monitor
crw--- 1 root root 10, 50 Jan 14 17:55 /dev/dlm_plock

I checked man dlm, man dlm.conf and man dlm_controld but didn't find
an option to specify the device location anywhere.

Any help would be highly appreciated.

[1] 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Clusters_from_Scratch/index.html#_install_the_cluster_software
[2] https://bugs.launchpad.net/ubuntu/+source/dlm/+bug/1535242

Kind regards,
Momcilo 'Momo' Medic.
(fedorauser)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact

2016-01-01 Thread Vladislav Bogdanov
31.12.2015 15:33:45 CET, Bogdan Dobrelya <bdobre...@mirantis.com> wrote:
>On 31.12.2015 14:48, Vladislav Bogdanov wrote:
>> blackbox tracing inside pacemaker, USR1, USR2 and TRAP signals iirc,
>quick google search should point you to Andrew's blog with all
>information about that feature.
>> Next, if you use ocf-shellfuncs in your RA, you could enable tracing
>for resource itself, just add 'trace_ra=1' to every operation config
>(start and monitor).
>
>Thank you, I will try to play with these things once I have the issue
>reproduced again. Cannot provide CIB as I don't have the env now.
>
>But still let me ask again, do anyone know or heard of anything like
>known/fixed bugs about corosync with pacemaker stop running monitor
>actions for a resource at some point, while notifications are still
>logged?
>
>Here is example:
>node-16 crmd:
>2015-12-29T13:16:49.113679+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_monitor_27000: unknown error
>(node=node-16.test.domain.local, call=254, rc=1, cib-updat
>e=1454, confirmed=false)
>node-17:
>2015-12-29T13:16:57.603834+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_monitor_103000: unknown error
>(node=node-17.test.domain.local, call=181, rc=1, cib-upda
>te=297, confirmed=false)
>node-18:
>2015-12-29T13:20:16.870619+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_monitor_103000: not running
>(node=node-18.test.domain.local, call=187, rc=7, cib-update
>=306, confirmed=false)
>node-20:
>2015-12-29T13:20:51.486219+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_monitor_3: not running
>(node=node-20.test.domain.local, call=180, rc=7, cib-update=
>308, confirmed=false)
>
>after that point only notifications got logged for affected nodes, like
>Operation p_rabbitmq-server_notify_0: ok
>(node=node-20.test.domain.local, call=287, rc=0, cib-update=0,
>confirmed=t
>rue)
>
>While the node-19 was not affected, and actions
>monitor/stop/start/notify logged OK all the time, like:
>2015-12-29T14:30:00.973561+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_monitor_3: not running
>(node=node-19.test.domain.local, call=423, rc=7, cib-update=438,
>confirmed=false)
>2015-12-29T14:30:01.631609+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_notify_0: ok
>(node=node-19.test.domain.local, call=424, rc=0, cib-update=0,
>confirmed=true)
>2015-12-29T14:31:19.084165+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_stop_0: ok (node=node-19.test.domain.local,
>call=427, rc=0, cib-update=439, confirmed=true)
>2015-12-29T14:32:53.120157+00:00 notice:notice: process_lrm_event:
>Operation p_rabbitmq-server_start_0: unknown error
>(node=node-19.test.domain.local, call=428, rc=1, cib-update=441,
>confirmed=true)

Well, not running and not logged is not the same thing. I do not have access to 
code right now, but I'm pretty sure that successful recurring monitors are not 
logged after the first run. trace_ra for monitor op should prove that. If not, 
then it should be a bug. I recall something was fixed in that area recently.
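
For anyone wanting to check that, a hedged sketch of enabling RA tracing for 
the resource from the logs above (the exact operation definitions and paths 
are illustrative, not taken from the affected cluster):

# add trace_ra=1 to the relevant operations, e.g. via
#   crm configure edit p_rabbitmq-server
# appending trace_ra=1 to the op monitor/start lines, or let newer crmsh do it:
crm resource trace p_rabbitmq-server monitor
# trace files usually land under /var/lib/heartbeat/trace_ra/<agent>/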

Best,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [OCF] Pacemaker reports a multi-state clone resource instance as running while it is not in fact

2015-12-31 Thread Vladislav Bogdanov
31.12.2015 12:57:45 CET, Bogdan Dobrelya  wrote:
>Hello.
>I've been hopelessly fighting a bug [0] in the custom OCF agent of Fuel
>for OpenStack project. It is related to the destructive test case when
>one node of 3 or 5 total goes down and then back. The bug itself is
>tricky (is rarely reproduced), tl;dr, and has many duplicates. So I
>only
>put here the latest comment.
>
>As it says,
>at some point, after the rabbit OCF monitor reported an error followed
>by several "not running" reports (see crmd log snippet [1]), pacemaker
>starts "thinking" everything is fine with the resource and shows it as
>"running". While in fact it is completely dead and manually triggered
>OCF action monitor may confirm that (not running). But *why* pacemaker
>shows the resource is running and never calls monitor actions again?
>I have no idea how to proceed with the root cause of such pacemaker
>behaviour.
>
>So, I'm asking for guidance on the any recommendations on how-to debug
>and troubleshoot this strange situation and for which useful log
>patterns to seek (and where).
>Thank you in advance!
>
>PS. this is Pacemaker 1.1.12, Corosync 2.3.4,  libqb0 0.17.0 from
>Ubuntu
>vivid. But the Corosync & Pacemaker cluster looks healthy and I can
>find
>no log records saying otherwise.
>
>[0] https://bugs.launchpad.net/fuel/+bug/1472230/comments/32
>[1] http://pastebin.com/0UuBvzzz

Hi.
First, could you paste your CIB, preferably not in xml, but in crmsh format? 
Just to check that everything is fine with resource and fencing configuration.
Then, you may enable blackbox tracing inside pacemaker (USR1, USR2 and TRAP 
signals, iirc); a quick google search should point you to Andrew's blog with 
all the information about that feature.
Next, if you use ocf-shellfuncs in your RA, you could enable tracing for the 
resource itself: just add 'trace_ra=1' to every operation config (start and 
monitor).

All that may give you some additional hints on what's going on.
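
For the blackbox part, a rough sketch (signal handling and dump paths may 
differ between pacemaker versions, so verify against Andrew's blog before 
relying on it):

killall -USR1 crmd                                # start blackbox recording
killall -TRAP crmd                                # dump what was recorded
ls /var/lib/pacemaker/blackbox/                   # dumps typically land here
qb-blackbox /var/lib/pacemaker/blackbox/crmd-*    # decode with libqb's tool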

Also, you may think about upgrading pacemaker to 1.1.14-rcX, together with 
libqb to 0.17.2 (and rebuild corosync against that libqb).

Best,
Vladislav



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Failover to spare node

2015-10-22 Thread Vladislav Bogdanov

22.10.2015 19:49, Andrei Borzenkov wrote:

Let's say I have a pool of nodes and multiple services, somehow
distributed across them. I would like to keep one node as "spare",
without services by default, and if any of "worker" nodes fail, services
that were running there should be relocated to spare together.


placement-strategy=minimal ?
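
As a minimal sketch (the property and value are real; whether 'minimal' alone 
gives a truly dedicated spare still depends on scores and utilization):

crm configure property placement-strategy=minimal
# optionally declare capacities so the packing has something to work with
# (names and numbers below are purely illustrative):
#   crm configure node worker-1 utilization cpu=8 memory=16384
#   crm configure primitive svc1 ... utilization cpu=1 memory=1024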



This ensures each service keeps the same resources available and does
not compete with services on other nodes.

It obviously can be achieved with explicit location constraints on each
service for "primary" and "secondary" node; but is there some generic
trick that avoids configuring each and every service?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] attrd: Fix sigsegv on exit if initialization failed

2015-10-12 Thread Vladislav Bogdanov

Hi,

This was caught with 0.17.1 libqb, which didn't play well with long pids.

commit 180a943846b6d94c27b9b984b039ac0465df64da
Author: Vladislav Bogdanov <bub...@hoster-ok.com>
Date:   Mon Oct 12 11:05:29 2015 +

attrd: Fix sigsegv on exit if initialization failed

diff --git a/attrd/main.c b/attrd/main.c
index 069e9fa..94e9212 100644
--- a/attrd/main.c
+++ b/attrd/main.c
@@ -368,8 +368,12 @@ main(int argc, char **argv)
 crm_notice("Cleaning up before exit");

 election_fini(writer);
-crm_client_disconnect_all(ipcs);
-qb_ipcs_destroy(ipcs);
+
+if (ipcs) {
+crm_client_disconnect_all(ipcs);
+qb_ipcs_destroy(ipcs);
+}
+
 g_hash_table_destroy(attributes);

 if (the_cib) {

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Need bash instead of /bin/sh

2015-09-23 Thread Vladislav Bogdanov

23.09.2015 15:42, dan wrote:

ons 2015-09-23 klockan 14:08 +0200 skrev Ulrich Windl:

dan  schrieb am 23.09.2015 um 13:39 in Nachricht

<1443008370.2386.8.ca...@intraphone.com>:

Hi

As I had problem with corosync 2.3.3 and pacemaker 1.1.10 which was
default in my version of ubuntu, I have now compiled and installed
corosync 2.3.4 and pacemaker 1.1.12.

And now it works.

Though the file /usr/lib/ocf/resource.d/pacemaker/controld
does not work as /bin/sh is linked to dash on ubuntu (and I think
several other Linux variants).

It is line 182:
 local addr_list=$(cat
/sys/kernel/config/dlm/cluster/comms/*/addr_list 2>/dev/null)


That looks like plain POSIX shell to me. What part is causing the problem?


Did a small test:
---test.sh
controld_start() {
 local addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2)

yep, that is a bashism.

dash word-splits the unquoted command substitution in a 'local var=$(...)' 
declaration, so the extra words are treated as further (invalid) variable 
names; bash does not split there.

local addr_list; addr_list=$(echo AF_INET 10.1.1.1 AF_INET 10.1.1.2)

should work


 echo $addr_list
}

controld_start
--

dash test.sh
test.sh: 2: local: 10.1.1.1: bad variable name

bash test.sh
AF_INET 10.1.1.1 AF_INET 10.1.1.2
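
For completeness, line 182 of the controld agent rewritten in a form that 
behaves the same under dash and bash (just a sketch of the portable fix):

controld_start() {
    # declare first, assign separately, and quote the expansion so dash
    # does not word-split it
    local addr_list
    addr_list="$(cat /sys/kernel/config/dlm/cluster/comms/*/addr_list 2>/dev/null)"
    echo "$addr_list"
}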


 Dan


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Clustered LVM with iptables issue

2015-09-11 Thread Vladislav Bogdanov

Hi Digimer,

Be aware that SCTP support in both the kernel and DLM _may_ have issues (as 
far as I remember it was not recommended for use, at least in cman's 
version of DLM, because of the lack of testing).


I believe you can force use of TCP via dlm_controld parameters (or 
config options). Of course that could require some kind of bonding to be 
involved. Btw that is the main reason I prefer bonding over multi-ring 
configurations.
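
As far as I recall the switch looks roughly like this (treat it as an 
assumption to verify against the man pages):

# standalone dlm_controld reads dlm.conf(5):
echo "protocol=tcp" >> /etc/dlm/dlm.conf
# cman-based clusters keep the equivalent in cluster.conf:
#   <dlm protocol="tcp"/>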


Best,
Vladislav

11.09.2015 02:43, Digimer wrote:

For the record;

   Noel helped me on IRC. The problem was that sctp was not allowed in
the firewall. The clue was:


[root@node1 ~]# /etc/init.d/clvmd start
Starting clvmd:
Activating VG(s):  [  OK  ]


] syslog
Sep 10 23:30:47 node1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Sep 10 23:30:47 node1 kernel: nf_conntrack version 0.5.0 (16384 buckets,
65536 max)
*** Sep 10 23:31:02 node1 kernel: dlm: Using SCTP for communications
Sep 10 23:31:03 node1 clvmd: Cluster LVM daemon started - connected to CMAN



[root@node2 ~]# /etc/init.d/clvmd start
Starting clvmd: clvmd startup timed out


] syslog
Sep 10 23:31:03 node2 kernel: dlm: Using SCTP for communications
Sep 10 23:31:05 node2 corosync[3001]:   [TOTEM ] Incrementing problem
counter for seqid 5644 iface 10.20.10.2 to [1 of 3]
Sep 10 23:31:07 node2 corosync[3001]:   [TOTEM ] ring 0 active with no
faults


Adding;

iptables -I INPUT -p sctp -j ACCEPT

Got it working. Obviously, that needs to be tightened up.
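
One possible tightened variant, limited to the interconnect network seen in 
the logs above (a sketch, not a complete ruleset):

iptables -I INPUT -p sctp -s 10.20.10.0/24 -d 10.20.10.0/24 -j ACCEPT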

digimer

On 10/09/15 07:01 PM, Digimer wrote:

On 10/09/15 06:54 PM, Noel Kuntze wrote:


Hello Digimer,

I initially assumed you were familiar with ss or netstat and simply
forgot about them.
Seems I was wrong.

Check the output of this: `ss -tpn` and `ss -upn`.
Those commands give you the current open TCP and UDP connections,
as well as the program that opened the connection.
Check listening sockets with `ss -tpnl` and `ss -upnl`


I'm not so strong on the network side of things, so I am not very
familiar with ss or netstat.

I have clvmd running:


[root@node1 ~]# /etc/init.d/clvmd status
clvmd (pid  3495) is running...
Clustered Volume Groups: (none)
Active clustered Logical Volumes: (none)


Thought I don't seem to see anything:


[root@node1 ~]# ss -tpnl
State  Recv-Q Send-Q   Local Address:Port
   Peer Address:Port
LISTEN 0  5   :::1
 :::*  users:(("ricci",2482,3))
LISTEN 0  128  127.0.0.1:199
  *:*  users:(("snmpd",2020,8))
LISTEN 0  128 :::111
 :::*  users:(("rpcbind",1763,11))
LISTEN 0  128  *:111
  *:*  users:(("rpcbind",1763,8))
LISTEN 0  128  *:48976
  *:*  users:(("rpc.statd",1785,8))
LISTEN 0  5   :::16851
 :::*  users:(("modclusterd",2371,5))
LISTEN 0  128 :::55476
 :::*  users:(("rpc.statd",1785,10))
LISTEN 0  128 :::22
 :::*  users:(("sshd",2037,4))
LISTEN 0  128  *:22
  *:*  users:(("sshd",2037,3))
LISTEN 0  100::1:25
 :::*  users:(("master",2142,13))
LISTEN 0  100  127.0.0.1:25
  *:*  users:(("master",2142,12))



[root@node1 ~]# ss -tpn
State  Recv-Q Send-Q   Local Address:Port
   Peer Address:Port
ESTAB  0  0   192.168.122.10:22
  192.168.122.1:53935  users:(("sshd",2636,3))
ESTAB  0  0   192.168.122.10:22
  192.168.122.1:53934  users:(("sshd",2613,3))
ESTAB  0  0   10.10.10.1:48985
 10.10.10.2:7788
ESTAB  0  0   10.10.10.1:7788
 10.10.10.2:51681
ESTAB  0  0:::10.20.10.1:16851
  :::10.20.10.2:43553  users:(("modclusterd",2371,6))



[root@node1 ~]# ss -upn
State  Recv-Q Send-Q   Local Address:Port
   Peer Address:Port


I ran all three again and routed output to a file, stopped clvmd and
re-ran the three calls to a different file. I diff'ed the resulting
files and saw nothing of interest:


[root@node1 ~]# /etc/init.d/clvmd status
clvmd (pid  

[ClusterLabs] crm_report consumes all available RAM

2015-09-08 Thread Vladislav Bogdanov
Hi,

just discovered very interesting issue.
If there is a system user with very big UID (8002 in my case),
then crm_report (actually 'grep' it runs) consumes too much RAM.

Relevant part of the process tree at that moment looks like (word-wrap off):
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
...
root 25526  0.0  0.0 106364   636 ?S12:37   0:00  \_ 
/bin/sh /usr/sbin/crm_report --dest=/var/log/crm_report -f -01-01 00:00:00
root 25585  0.0  0.0 106364   636 ?S12:37   0:00  
\_ bash /var/log/crm_report/collector
root 25613  0.0  0.0 106364   152 ?S12:37   0:00
  \_ bash /var/log/crm_report/collector
root 25614  0.0  0.0 106364   692 ?S12:37   0:00
  \_ bash /var/log/crm_report/collector
root 27965  4.9  0.0 100936   452 ?S12:38   0:01
  |   \_ cat /var/log/lastlog
root 27966 23.0 82.9 3248996 1594688 ? D12:38   0:08
  |   \_ grep -l -e Starting Pacemaker
root 25615  0.0  0.0 155432   600 ?S12:37   0:00
  \_ sort -u

ls -ls /var/log/lastlog shows:
40 -rw-r--r--. 1 root root 2336876 Sep  8 04:36 /var/log/lastlog

That is a sparse binary file, which consumes only 40k of disk space.
At the same time its size is 23GB, and grep takes all the RAM trying to
grep a string out of 23GB of mostly zeroes without newlines.
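
Just to illustrate the failure mode and one possible guard (a sketch only, not 
the actual crm_report fix):

ls -ls /var/log/lastlog    # tiny on disk, huge apparent size when sparse
# filtering out non-text files before the "Starting Pacemaker" sweep keeps
# grep away from sparse binaries like lastlog:
for f in /var/log/*; do
    file -b "$f" | grep -qi text && grep -l "Starting Pacemaker" "$f"
done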

I believe this is worth fixing,

Thank you,
Vladislav

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: crm_report consumes all available RAM

2015-09-08 Thread Vladislav Bogdanov

08.09.2015 15:18, Ulrich Windl wrote:

Vladislav Bogdanov <bub...@hoster-ok.com> schrieb am 08.09.2015 um 14:05 in

Nachricht <55eecefb.8050...@hoster-ok.com>:

Hi,

just discovered very interesting issue.
If there is a system user with very big UID (8002 in my case),
then crm_report (actually 'grep' it runs) consumes too much RAM.

Relevant part of the process tree at that moment looks like (word-wrap off):
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
...
root 25526  0.0  0.0 106364   636 ?S12:37   0:00  \_
/bin/sh /usr/sbin/crm_report --dest=/var/log/crm_report -f -01-01 00:00:00
root 25585  0.0  0.0 106364   636 ?S12:37   0:00
  \_ bash /var/log/crm_report/collector
root 25613  0.0  0.0 106364   152 ?S12:37   0:00
  \_ bash /var/log/crm_report/collector
root 25614  0.0  0.0 106364   692 ?S12:37   0:00
  \_ bash /var/log/crm_report/collector
root 27965  4.9  0.0 100936   452 ?S12:38   0:01
  |   \_ cat /var/log/lastlog
root 27966 23.0 82.9 3248996 1594688 ? D12:38   0:08
  |   \_ grep -l -e Starting Pacemaker
root 25615  0.0  0.0 155432   600 ?S12:37   0:00
  \_ sort -u

ls -ls /var/log/lastlog shows:
40 -rw-r--r--. 1 root root 2336876 Sep  8 04:36 /var/log/lastlog

That is sparse binary file, which consumes only 40k of disk space.
At the same time its size is 23GB, and grep takes all the RAM trying to
grep a string from a 23GB of mostly zeroes without new-lines.

I believe this is worth fixing,


I guess the UID value is used as an offset in the lastlog file (which


exactly
I should just add that the user must have logged in at least once.


is OK). When reading such a sparse file, the filesystem should simply
deliver zero blocks to grep. As grep is designed to read from streams,
there is not much you can do against reading all these zeros, I guess.


yep, I think that another indicator should be used


Also an mmap based solution might exceed the virtual address space,
especially for 32-bit systems.
BTW: Did you try "last Pacemaker"? I could only test with "last
reboot" here...


That is post-1.1.13

Thanks,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: Antw: pacemaker doesn't correctly handle a resource after time/date change

2015-08-28 Thread Vladislav Bogdanov

28.08.2015 12:25, Kostiantyn Ponomarenko wrote:

In my case the final solution will be shipped to different countries,
which means different time zones.


Why not keep all HW clocks in UTC?
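
For reference, keeping the RTC in UTC is a one-liner on most systems (which 
command applies depends on the distribution):

hwclock --systohc --utc          # classic util-linux
timedatectl set-local-rtc 0      # systemd-based systems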


And replacement of one of the nodes in a working solution could happen.
So the possibility of the issue occurring is still there.

Thank you,
Kostya

On Fri, Aug 28, 2015 at 10:02 AM, Andrew Beekhof and...@beekhof.net wrote:


 On 21 Aug 2015, at 11:06 pm, Kostiantyn Ponomarenko konstantin.ponomare...@gmail.com wrote:

 As I wrote in the previous email, it could happen when NTP servers are 
unreachable before Pacemaker's start, and then, after some time, NTP becomes 
reachable again.
 So it is possible that time will be synchronized in any direction: 15 min 
or 23 min or 1 hour or 12 hours backward/forward.

If your clock is drifting by hours during a reboot cycle, then I
would suggest you have a hardware issue that needs attending to.

  And in that case the bug will appear itself.
 
  Thank you,
  Kostya
 
  On Mon, Aug 17, 2015 at 3:01 AM, Andrew Beekhof and...@beekhof.net wrote:
 
   On 8 Aug 2015, at 12:43 am, Kostiantyn Ponomarenko konstantin.ponomare...@gmail.com wrote:
  
   Hi Andrew,
  
   So the issue is:
  
   Having one node up and running, set time on the node backward
to, say, 15 min (generally more than 10 min), then do stop for a
resource.
   That leads to the next - the cluster fails the resource once,
then shows it as started, but the resource actually remains stopped.
  
   Do you need more input from me on the issue?
 
  I think “why” :)
 
  I’m struggling to imagine why this would need to happen.
 
  
   Thank you,
   Kostya
  
   On Wed, Aug 5, 2015 at 3:01 AM, Andrew Beekhof and...@beekhof.net wrote:
  
    On 4 Aug 2015, at 7:31 pm, Kostiantyn Ponomarenko konstantin.ponomare...@gmail.com wrote:
   
   
On Tue, Aug 4, 2015 at 3:57 AM, Andrew Beekhof and...@beekhof.net wrote:
Github might be another.
   
I am not able to open an issue/bug here
https://github.com/ClusterLabs/pacemaker
  
   Oh, for pacemaker bugs see http://clusterlabs.org/help.html
   Can someone clearly state what the issue is?  The thread was
quite fractured and hard to follow.
  
   
Thank you,
Kostya
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Re: [ClusterLabs] Single quotes in values for 'crm resource rsc param set'

2015-08-17 Thread Vladislav Bogdanov
17.08.2015 10:39, Kristoffer Grönlund wrote:
 Vladislav Bogdanov bub...@hoster-ok.com writes:
 
 Hi Kristoffer, all.

 Could you please look why I get error when trying to update valid
 resource value (which already has single quotes inside) with the
 slightly different one by running the command in the subject?

 It looks like is_value_sane() doesn't accept single quotes just because
 crmsh quotes all arguments to crm_resource with them. I need to pass a
 command-line with semicolons in one of parameters which is run with eval
 in the resource agent. Backslashed double-quoting does not work in this
 case, but single-quotes work fine.

 Could that be some-how fixed?
 
 Well, first of all passing the command line through bash complicates
 things, so if that's what is causing you trouble you could try writing
 your command line to a file and passing it to crmsh using crm -f file.
 Another option is using crm -f - and piping the command line into
 crmsh.
 

Do you mean one with double-quotes?
Otherwise is_value_sane() will fail anyways.

Using ... \"string;string\" notation in the file strips quotes from the 
actual command run.
Well, maybe the function I use is not smart enough, but it works with a 
single-quoted value.

What I think could be done for single-quotes support is to assume that a value 
which contains them was actually passed in double-quotes, so double-quotes 
should be used when running crm_resource. We may also keep in mind that the 
CIB uses double-quotes for values internally.

 
 If that doesn't help, it would help /me/ in figuring out just what the
 problem is if you could give me an example of what the current value is
 and what it is you are trying to set it to. :)

Well, this is the (obfuscated a bit due to customer's policies) working 
resource definition
(word wrap off):

primitive staging-0-fs ocf:vendor:Filesystem \
        params device="/dev/vg_staging_shared/staging_0" directory="/cluster/storage/staging-0" fstype="gfs2" options="" manage_directory="true" subagent="/sbin/fs-io-throttle %a staging-0 /cluster/storage/staging-0 zone 0 '5000M:300;2500M:100;1500M:50;1000M:35;500M:10;300M:mm'" subagent_timeout="10" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="45" depth="0" \
        op monitor interval="240" timeout="240" depth="10" \
        op monitor interval="360" timeout="240" depth="20"

Here is the command which fails:

# crm resource param staging-0-fs set subagent /sbin/fs-io-throttle %a 
staging-0 /cluster/storage/staging-0 zone 0 
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm'
DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.12 (1b9beb7)]
DEBUG: found pacemaker version: 1.1.12
ERROR: /sbin/fs-io-throttle %a staging-0 /cluster/storage/staging-0 zone 0 
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm': bad name
ERROR: Bad usage: Expected valid name, got '/sbin/fs-io-throttle %a staging-0 
/cluster/storage/staging-0 zone 0 
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm'', command: 'param 
staging-0-fs set subagent /sbin/fs-io-throttle %a staging-0 
/cluster/storage/staging-0 zone 0 
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm''

Replacing single-quotes with back-slashed double ones 
(\"5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm\")
makes that string unquoted in the CIB, so semicolons are recognized as command 
separators by the shell run from the RA.
Using double-escaping 
(\\"5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm\\") when passing 
the value in double quotes breaks the shell which runs the command.

Using single quotes with one or two back-slashes before a double-quote inside 
a value produces an unparseable CIB with "dqout;" in it.


Here is the function which runs that subagent command (I believe it should 
support several
semicolon-separated commands as well, but did not test that yet):

run_subagent() {
    local subagent_timeout=$1
    local subagent_command=$2
    local WRAPPER

    subagent_command=${subagent_command//%a/${__OCF_ACTION}}
    subagent_command=${subagent_command//%r/${OCF_RESOURCE_INSTANCE%:*}}
    subagent_command=${subagent_command//%n/$( crm_node -n )}

    case ${subagent_timeout} in
    0|""|*[!0-9]*)
        WRAPPER="bash -c \"${subagent_command}\""
        ;;
    *)
        WRAPPER="timeout -s KILL ${subagent_timeout} bash -c \"${subagent_command}\""
        ;;
    esac

    ocf_run eval ${WRAPPER}
}

It is called with:

run_subagent "${OCF_RESKEY_subagent_timeout}" "${OCF_RESKEY_subagent}"


Best regards,
Vladislav


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Single quotes in values for 'crm resource rsc param set'

2015-08-17 Thread Vladislav Bogdanov

14.08.2015 19:51, Jan Pokorný wrote:

On 14/08/15 18:22 +0300, Vladislav Bogdanov wrote:

I need to pass a command-line with semicolons in one of parameters
which is run with eval in the resource agent. Backslashed
double-quoting does not work in this case, but single-quotes work
fine.


Hmm, another data point to the recent shell can be troublesome:
http://clusterlabs.org/pipermail/users/2015-August/000996.html


Yes, see my last message for more shell madness ;)


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Single quotes in values for 'crm resource rsc param set'

2015-08-17 Thread Vladislav Bogdanov

17.08.2015 12:44, Ulrich Windl wrote:

Hi!

Somewhat stupid question: Why don't you put monsters like
subagent=/sbin/fs-io-throttle %a staging-0 /cluster/storage/staging-0 zone 0
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;300M:mm'
into a shell command file and execute that?


Hmm, probably a good point. I will think about it, thanks.
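
For illustration, such a wrapper could look like this (the path and script 
name are made up; the throttle command line is the one from this thread):

cat > /usr/local/sbin/staging-0-throttle <<'EOF'
#!/bin/bash
# $1 is the action substituted for %a by the agent
exec /sbin/fs-io-throttle "$1" staging-0 /cluster/storage/staging-0 zone 0 \
    '5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm'
EOF
chmod +x /usr/local/sbin/staging-0-throttle
# the resource parameter then shrinks to: subagent="/usr/local/sbin/staging-0-throttle %a"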



Regards,
Ulrich


Vladislav Bogdanov bub...@hoster-ok.com schrieb am 17.08.2015 um 11:22

in
Nachricht 55d1a7d9.20...@hoster-ok.com:

17.08.2015 10:39, Kristoffer Grönlund wrote:

Vladislav Bogdanov bub...@hoster-ok.com writes:


Hi Kristoffer, all.

Could you please look why I get error when trying to update valid
resource value (which already has single quotes inside) with the
slightly different one by running the command in the subject?

It looks like is_value_sane() doesn't accept single quotes just because
crmsh quotes all arguments to crm_resource with them. I need to pass a
command-line with semicolons in one of parameters which is run with eval
in the resource agent. Backslashed double-quoting does not work in this
case, but single-quotes work fine.

Could that be some-how fixed?


Well, first of all passing the command line through bash complicates
things, so if that's what is causing you trouble you could try writing
your command line to a file and passing it to crmsh using crm -f file.
Another option is using crm -f - and piping the command line into
crmsh.



Do you mean one with double-quotes?
Otherwise is_value_sane() will fail anyways.

Using ... \string;string\ notation in the file strips quotes from the
actual command run.
Well, may be function I use is not smart enough, but that works with
single-qouted value.

What I think could be done for single-quotes support is to assume that value



which contains
them was actually passed in the double-quotes, so double-quotes should be
used when
running crm_resource. We may also have in mind that CIB uses double-quotes
for values internally.



If that doesn't help, it would help /me/ in figuring out just what the
problem is if you could give me an example of what the current value is
and what it is you are trying to set it to. :)


Well, this is the (obfuscated a bit due to customer's policies) working
resource definition
(word wrap off):

primitive staging-0-fs ocf:vendor:Filesystem \
  params device="/dev/vg_staging_shared/staging_0" directory="/cluster/storage/staging-0" fstype="gfs2" options="" manage_directory="true" subagent="/sbin/fs-io-throttle %a staging-0 /cluster/storage/staging-0 zone 0 '5000M:300;2500M:100;1500M:50;1000M:35;500M:10;300M:mm'" subagent_timeout="10" \
  op start interval="0" timeout="90" \
  op stop interval="0" timeout="100" \
  op monitor interval="10" timeout="45" depth="0" \
  op monitor interval="240" timeout="240" depth="10" \
  op monitor interval="360" timeout="240" depth="20"

Here is the command which fails:

# crm resource param staging-0-fs set subagent /sbin/fs-io-throttle %a
staging-0 /cluster/storage/staging-0 zone 0
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm'
DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.12 (1b9beb7)]
DEBUG: found pacemaker version: 1.1.12
ERROR: /sbin/fs-io-throttle %a staging-0 /cluster/storage/staging-0 zone 0
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm': bad name
ERROR: Bad usage: Expected valid name, got '/sbin/fs-io-throttle %a
staging-0 /cluster/storage/staging-0 zone 0
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm'', command: 'param
staging-0-fs set subagent /sbin/fs-io-throttle %a staging-0
/cluster/storage/staging-0 zone 0
'5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm''

Replacing single-quotes with back-slashed double ones
(\5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm\)
makes that string unquoted in the CIB, so semicolons are recognized as
command separators by the shell
run from the RA.
Using double-escaping
(\\5000M:300;2500M:100;1500M:50;1000M:35;500M:10;400M:mm\\) when passing
value in the
double quotes breaks the shell which runs command.

Using single quotes with one or two back-slashes before double-quote inside



for a value produces
unparseable CIB with dqout; in it.


Here is the function which runs that subagent command (I believe it should
support several
semicolon-separated commands as well, but did not test that yet):

run_subagent() {
    local subagent_timeout=$1
    local subagent_command=$2
    local WRAPPER

    subagent_command=${subagent_command//%a/${__OCF_ACTION}}
    subagent_command=${subagent_command//%r/${OCF_RESOURCE_INSTANCE%:*}}
    subagent_command=${subagent_command//%n/$( crm_node -n )}

    case ${subagent_timeout} in
    0|""|*[!0-9]*)
        WRAPPER="bash -c \"${subagent_command}\""
        ;;
    *)
        WRAPPER="timeout -s KILL ${subagent_timeout} bash -c \"${subagent_command}\""
        ;;
    esac

    ocf_run eval ${WRAPPER}
}

It is called with:

run_subagent "${OCF_RESKEY_subagent_timeout}" "${OCF_RESKEY_subagent}"

[ClusterLabs] Single quotes in values for 'crm resource rsc param set'

2015-08-14 Thread Vladislav Bogdanov

Hi Kristoffer, all.

Could you please look why I get error when trying to update valid 
resource value (which already has single quotes inside) with the 
slightly different one by running the command in the subject?


It looks like is_value_sane() doesn't accept single quotes just because 
crmsh quotes all arguments to crm_resource with them. I need to pass a 
command-line with semicolons in one of parameters which is run with eval 
in the resource agent. Backslashed double-quoting does not work in this 
case, but single-quotes work fine.


Could that be some-how fixed?

Best,
Vladislav

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] node attributes go to different instance_attributes sections in crmsh

2015-08-06 Thread Vladislav Bogdanov
Hi,

The following illustrates what happens to 'crm configure show' output after
playing with 'crm node standby|online' when some node attributes were already
set from the loaded config.

xml <node id="1" uname="dell71"> \
      <instance_attributes id="dell71-instance_attributes"> \
        <nvpair name="staging-0-0-placement" value="true" id="dell71-instance_attributes-staging-0-0-placement"/> \
        <nvpair name="meta-0-0-placement" value="true" id="dell71-instance_attributes-meta-0-0-placement"/> \
      </instance_attributes> \
      <instance_attributes id="nodes-1"> \
        <nvpair id="nodes-1-standby" name="standby" value="off"/> \
      </instance_attributes> \
    </node>

I hope it is possible to fix that;
verified on 180ba56 (with a CIB erase and subsequent re-upload from crmsh format).
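
A quick way to spot such duplicate sets on a live cluster (node name taken 
from the snippet above; the --xpath query form is an assumption to verify 
against your cibadmin version):

cibadmin -Q --xpath '//node[@uname="dell71"]/instance_attributes'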

Best,
Vladislav

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org