Re: [ClusterLabs] File System does not do a recovery on fail over

2019-06-12 Thread George Melikov
It may be off topic, but for such large filesystems the journal check IS a problem. You could look at ZFS, for example: it doesn't need any journal recovery or fsck on mount (though it may be slower in some performance use cases, so please test everything before use).

12.06.2019, 08:30, "Indivar Nair":
> Thanks, Gang
>
> It is a very large file system - around 600TB.
> Could this be why it takes around 5-10 mins to do journal recovery?
>
> What we do as a workaround is:
> - Disable the filesystem resource on startup
> - Manually mount it (wait for as long as it takes)
> - Then umount it
> - Enable the filesystem resource
>
> But this doesn't seem like the right approach.
> We have tried repairing the filesystem when a failover happens, but it
> has never shown any major corruption.
>
> Regards,
> Indivar Nair
>
> On Tue, Jun 11, 2019 at 10:18 AM Gang He wrote:
>> Hi Indivar,
>>
>> See my comments inline.
>>
>> >>> On 6/11/2019 at 12:10 pm, in message , Indivar Nair wrote:
>> > Hello ...,
>> >
>> > I have an Active-Passive cluster with two nodes hosting an XFS
>> > Filesystem over a CLVM Volume.
>> >
>> > If a failover happens, the volume is mounted on the other node without
>> > a recovery that usually happens to a volume that has not been cleanly
>> > unmounted.
>> > The FS journal is on the same volume.
>> >
>> > Now, when we fail it back (with a complete cluster shutdown and
>> > restart) on to its original node, it undergoes the automatic recovery.
>> >
>> > 1.
>> > Shouldn't it do an FS recovery during the failover to the other node?
>> > Note: The FS journal is on the same volume.
>>
>> Usually, the file system must do log recovery when it is mounted.
>>
>> >
>> > 2.
>> > Also, the failback usually fails because the FS check takes a
>> > considerable amount of time. How do I configure the mount not to fail
>> > when an automatic FS check is going on?
>>
>> A file system uses a journal precisely to avoid a long recovery time.
>> If recovery takes too long, there may be a problem with the file system,
>> e.g. it is damaged.
>> Secondly, you can set a longer timeout value.
>>
>> Thanks
>> Gang
>>
>> >
>> > Any help/pointers would be highly appreciated.
>> >
>> > Thanks.
>> >
>> > Regards,
>> >
>> > Indivar Nair

Sincerely,
George Melikov
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
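The workaround Indivar describes, and Gang's suggestion of a longer timeout, can be sketched in shell. This is a minimal sketch, not something posted in the thread: the resource name `fs_data`, the device `/dev/vg_data/lv_data` and the mount point `/data` are hypothetical, the 900s start timeout is only a placeholder for a measured recovery time, and the manual path assumes the CLVM volume is already active on the node. With `DRY_RUN=1` the functions only print the commands they would run.

```shell
#!/bin/sh
# HYPOTHETICAL names: adjust to your own resource, device and mount point.
FS_RESOURCE=${FS_RESOURCE:-fs_data}
FS_DEVICE=${FS_DEVICE:-/dev/vg_data/lv_data}
FS_MOUNTPOINT=${FS_MOUNTPOINT:-/data}

# Print commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option A: give the start action enough time for XFS log recovery.
# 900s is a placeholder; measure how long recovery really takes.
raise_start_timeout() {
    run pcs resource update "$FS_RESOURCE" op start timeout="${1:-900s}"
}

# Option B: the manual workaround from the thread. Replay the journal
# outside the cluster, then hand the filesystem back to Pacemaker.
replay_journal_manually() {
    run pcs resource disable "$FS_RESOURCE" || return 1
    run mount -t xfs "$FS_DEVICE" "$FS_MOUNTPOINT" || return 1  # may take minutes
    run umount "$FS_MOUNTPOINT" || return 1
    run pcs resource enable "$FS_RESOURCE"
}
```

Source the file and call `raise_start_timeout 1200s` or `replay_journal_manually`. Raising the timeout is the less invasive option, because the recovery then stays under the cluster's control instead of depending on an operator.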

Re: [ClusterLabs] drbd could not start by pacemaker. strange limited root privileges?

2019-05-24 Thread George Melikov
Looks like SELinux restrictions.

23.05.2019, 14:22, "László Neduki":
> Hi,
>
> (I sent a similar question from another account 3 days ago, but:
> - I do not see it on the list. Maybe I should not see my own email? So I created a new account.
> - I have additional info (but no solution), so I am rewriting the question.)
>
> Pacemaker cannot start drbd9 resources. As far as I can see, root has very limited privileges in the drbd resource agent when it is run by Pacemaker.
> I downloaded the latest Pacemaker this week, and I compiled drbd9 RPMs as well.
> I hope you can help me; I cannot find the cause of this behaviour. Please see the test cases below:
>
> 1. When I create the Pacemaker DRBD resource, I get errors:
>
> # pcs resource create DrbdDB ocf:linbit:drbd drbd_resource=drbd_db op monitor interval=60s meta notify=true
> # pcs resource master DrbdDBClone DrbdDB master-max=1 master-node-max=1 clone-node-max=1 notify=true
> # pcs constraint location DrbdDBClone prefers node1=INFINITY
> # pcs cluster stop --all; pcs cluster start --all; pcs status
> Failed Actions:
> * DrbdDB_monitor_0 on node1 'not installed' (5): call=6, status=complete, exitreason='DRBD kernel (module) not available?',
>     last-rc-change='Thu May 23 09:54:09 2019', queued=0ms, exec=58ms
> * DrbdDB_monitor_0 on node2 'not installed' (5): call=6, status=complete, exitreason='DRBD kernel (module) not available?',
>     last-rc-change='Thu May 23 10:00:22 2019', queued=0ms, exec=71ms
>
> 2. When I start drbd_db with drbdadm directly, it works well:
>
> # modprobe drbd             #on each node
> # drbdadm up drbd_db        #on each node
> # drbdadm primary drbd_db
> # drbdadm status
>
> It shows drbd_db is UpToDate on each node. I can also promote it and mount the filesystem without problems.
>
> 3. When I use debug-start, it works fine (so the resource syntax should be correct):
>
> # drbdadm status
> No currently configured DRBD found.
> # pcs resource debug-start DrbdDBMaster
> Error: unable to debug-start a master, try the master's resource: DrbdDB
> # pcs resource debug-start DrbdDB   #on each node
> Operation start for DrbdDB:0 (ocf:linbit:drbd) returned: 'ok' (0)
> # drbdadm status
>
> It shows drbd_db is UpToDate on each node.
>
> 4. Pacemaker handles other resources well. If I set auto_promote=yes and start (but do not promote) drbd_db with drbdadm, then Pacemaker can create the filesystem on it, as well as the appserver and database resources.
>
> 5. The strangest behaviour for me: root has very limited privileges within the drbd resource agent. If I add this line to the drbd_start() method of /usr/lib/ocf/resource.d/linbit/drbd:
>
> ocf_log err "lados " $(whoami) $( ls -l /home/opc/tmp/modprobe2.trace ) $( do_cmd touch /home/opc/tmp/modprobe2.trace )
>
> I get these messages in the log when I start the cluster:
>
> # tail -f /var/log/cluster/corosync.log | grep -A 8 -B 3 -i lados
> ...
> May 21 15:35:12  drbd(DrbdDB)[31649]:    ERROR: lados  root
> May 21 15:35:12 [31309] node1   lrmd:   notice: operation_finished:    DrbdDB_start_0:31649:stderr [ ls: cannot access /home/opc/tmp/modprobe2.trace: Permission denied ]
> May 21 15:35:12 [31309] node1   lrmd:   notice: operation_finished:    DrbdFra_start_0:31649:stderr [ touch: cannot touch '/home/opc/tmp/modprobe2.trace': Permission denied ]
> ...
>
> Also, when I try to strace "modprobe -s drbd `$DRBDADM sh-mod-parms`" in the drbd resource agent, I see only 1 line in /root/modprobe2.trace. To me this means:
> - root cannot trace the calls in drbdadm (even though root can strace drbdadm fine outside of Pacemaker)
> - root can write files in its own directory (/root/modprobe2.trace)
>
> 6. In contrast to the previous test, root has these privileges outside of Pacemaker:
>
> # sudo su -
> # touch /home/opc/tmp/modprobe2.trace
> # ls -l /home/opc/tmp/modprobe2.trace
> -rw-r--r--. 1 root root 0 May 21 15:44 /home/opc/tmp/modprobe2.trace
>
> Thanks: lados.

Sincerely,
George Melikov
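George's SELinux guess can be checked directly. A minimal sketch, assuming auditd is running and records denials in its usual `... avc:  denied  { perm } for ... comm="..." ... scontext=... tcontext=...` form: the hypothetical helper `summarize_avc` reduces each denial to the command, the denied permission(s) and the source and target contexts.

```shell
#!/bin/sh
# HYPOTHETICAL helper: summarize SELinux AVC denials read from stdin.
# For each denial it prints: <command> <permission(s)> <source> -> <target>
# Lines that are not AVC denials are ignored.
summarize_avc() {
    grep -E 'avc: +denied' |
        sed -nE 's/.*denied +\{ ([^}]*) \}.* comm="([^"]*)".* scontext=([^ ]+) tcontext=([^ ]+).*/\2 \1 \3 -> \4/p'
}
```

`getenforce` shows the current mode, and `ausearch -m AVC -ts recent | summarize_avc` lists recent denials. Temporarily running `setenforce 0` and retrying the cluster start is a quick way to confirm the diagnosis; switch back with `setenforce 1` and fix the policy rather than leaving SELinux permissive.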

[ClusterLabs] resource agent - wait for app data sync

2018-11-26 Thread George Melikov
Some apps' data may need to be synced before it is safe to
promote/demote/standby.

For example, DRBD replicates data across servers, but if you shut down the
master server during a resync, you'll have a split brain.

Is there a way for an OCF agent to tell Pacemaker that it's not safe to do
any migration right now?

Unfortunately, I didn't find anything about this case in 
https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
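One possible approach, not confirmed anywhere in this thread, is to let the agent's demote action block until the resync has finished, since Pacemaker waits for an action to return up to that action's timeout. A minimal sketch follows; `sync_in_progress` and `app_demote` are hypothetical names, and the real sync check depends entirely on the application. Pacemaker passes the action timeout to the agent in milliseconds as `OCF_RESKEY_CRM_meta_timeout`, which the sketch uses to bound the wait.

```shell
#!/bin/sh
# HYPOTHETICAL check: replace with a real query of your application's
# replication state. Must return 0 while a resync is still running.
# This placeholder always reports "in sync".
sync_in_progress() {
    return 1
}

# Poll until the data is in sync; give up after $1 polls.
wait_for_sync() {
    max_polls=$1
    polls=0
    while sync_in_progress; do
        if [ "$polls" -ge "$max_polls" ]; then
            return 1
        fi
        sleep "${SYNC_POLL_INTERVAL:-1}"
        polls=$((polls + 1))
    done
    return 0
}

# HYPOTHETICAL demote action that refuses to finish while a resync runs.
# The budget assumes the default 1-second poll interval and keeps ~5s of
# margin so the agent reports failure itself instead of being killed.
app_demote() {
    timeout_ms=${OCF_RESKEY_CRM_meta_timeout:-20000}
    if ! wait_for_sync $(( timeout_ms / 1000 - 5 )); then
        return 1    # OCF_ERR_GENERIC
    fi
    # ... the application's real demote steps would go here ...
    return 0        # OCF_SUCCESS
}
```

Blocking in demote rather than stop is deliberate: a failed stop normally escalates to fencing the node (or blocks the resource if fencing is disabled). Another route, used by master/slave agents such as DRBD's, is to lower or withdraw a node's promotion score (for example with `crm_master`) while its data is not up to date, so Pacemaker does not pick it for promotion.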


[ClusterLabs] pcs API / python module

2018-09-20 Thread George Melikov
Hello all,

Is there any official API for Pacemaker/pcs? I'm especially interested in Python integration. There is the pcs module, but it looks like it doesn't have a stable API?
https://lists.clusterlabs.org/pipermail/users/2015-August/001258.html

Sincerely,
George Melikov


Re: [ClusterLabs] pacemaker as data store

2018-05-16 Thread George Melikov
Thank you, it works great on a healthy cluster.


Sincerely,
George Melikov,
Tel. 7-915-278-39-36
Skype: georgemelikov

Best regards,
George Melikov,
m...@gmelikov.ru
Mobile: +7 9152783936
Skype: georgemelikov


15.05.2018, 18:01, "Ken Gaillot" <kgail...@redhat.com>:
> On Tue, 2018-05-15 at 13:25 +0300, George Melikov wrote:
>>  Hello,
>>
>>  Sorry for a (likely) dumb question,
>>  but is there a way to store and sync data via pacemaker/corosync?
>>
>>  Is there any way to store key/value properties or files?
>>
>>  I've found `pcs property set --force`, but it didn't survive cluster
>>  restart.
>
> That's surprising, cluster properties (even unrecognized ones) should
> persist. After setting it, try double-checking that it was written to
> disk with pcs cluster cib | less. I would use some prefix (like the
> name of your organization) for all property names, to make conflicts
> with real properties less likely.
>
> Permanent node attributes are another possibility, though they record a
> separate value for each node. The values of any node, however, can be
> queried from any other node. That means you could just pick one node
> and set all your name/value pairs using its name.
>
> However, there's a reason not to use pacemaker for this purpose:
> changes to cluster properties or node attributes will trigger a new
> calculation of where resources should be. It won't cause any harm, but
> it will add CPU and I/O load unnecessarily. Similarly, if your data set
> is large, it will take longer to do such calculations, slowing down
> recovery unnecessarily.
>
> You could run etcd or some NoSQL database as a cluster resource, then
> keep your data there.
>
> --
> Ken Gaillot <kgail...@redhat.com>
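Ken's node-attribute suggestion, plus the prefixed cluster property he mentions, can be sketched as small shell helpers around `crm_attribute` and `pcs`. Assumptions: the owner node `node1` and the prefix `myorg` are hypothetical placeholders, and `DRY_RUN=1` only prints the commands. As Ken notes, every write triggers a new calculation of where resources should run, so this only suits small, rarely changing values.

```shell
#!/bin/sh
# HYPOTHETICAL placeholders: the node that "owns" all values, and a prefix
# that keeps our names from clashing with real Pacemaker options.
KV_NODE=${KV_NODE:-node1}
KV_PREFIX=${KV_PREFIX:-myorg}

# Print commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Store a value as a permanent node attribute of $KV_NODE.
kv_set() {
    run crm_attribute --node "$KV_NODE" --name "${KV_PREFIX}-$1" \
        --update "$2" --lifetime forever
}

# Read it back from any node; --quiet prints only the value.
kv_get() {
    run crm_attribute --node "$KV_NODE" --name "${KV_PREFIX}-$1" \
        --query --quiet --lifetime forever
}

# Alternative: a prefixed cluster property (--force as in George's message).
# Check that it reached the CIB with: pcs cluster cib | grep "$KV_PREFIX-"
prop_set() {
    run pcs property set --force "${KV_PREFIX}-$1=$2"
}
```

For example, `kv_set backup-window 02:00` on one node followed by `kv_get backup-window` on another should print the stored value. For larger or frequently changing data, Ken's suggestion of running etcd or a NoSQL database as a cluster resource avoids the extra placement calculations.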


[ClusterLabs] pacemaker as data store

2018-05-15 Thread George Melikov
Hello, 

Sorry for a (likely) dumb question,
but is there a way to store and sync data via pacemaker/corosync?

Is there any way to store key/value properties or files?

I've found `pcs property set --force`, but it didn't survive cluster restart.




Re: [ClusterLabs] symmetric-cluster=false doesn't work

2018-03-28 Thread George Melikov
Thank you for the clarification!
I think you're right: our last config doesn't have any problem with asymmetric
operation.

26.03.2018, 22:37, "Ken Gaillot" <kgail...@redhat.com>:
> On Tue, 2018-03-20 at 22:03 +0300, George Melikov wrote:
>>  Hello,
>>
>>  I tried to create an asymmetric cluster via the property
>>  symmetric-cluster=false, but my resources try to start on any node,
>>  even though I have set locations for them.
>>
>>  What did I miss?
>>
>>  cib: https://pastebin.com/AhYqgUdw
>>
>>  Thank you for any help!
>>  
>>  Sincerely,
>>  George Melikov
>
> That output looks fine -- the resources are started only on nodes where
> they are allowed. What are you expecting to be different?
>
> Note that resources will be *probed* on every node (a one-time monitor
> action to check whether they are already running there), but they
> should only be *started* on allowed nodes.
> --
> Ken Gaillot <kgail...@redhat.com>
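The opt-in setup discussed above can be sketched with pcs. This is a minimal sketch: the resource `WebSite`, the nodes `node1`/`node2` and the default score 100 are hypothetical, and `DRY_RUN=1` prints the commands instead of running them. As Ken explains, Pacemaker still probes every node (`monitor_0`) even in an opt-in cluster, so seeing those operations on disallowed nodes is expected; only `start` is restricted.

```shell
#!/bin/sh
# Print commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Opt-in cluster: resources run only where a location constraint allows them.
make_opt_in() {
    run pcs property set symmetric-cluster=false
}

# Allow resource $1 on node $2 with score $3 (HYPOTHETICAL default: 100).
# Probes (monitor_0) still run on every node; this only limits starts.
allow_on() {
    run pcs constraint location "$1" prefers "$2=${3:-100}"
}
```

Usage: `make_opt_in; allow_on WebSite node1 200; allow_on WebSite node2`. If the probes themselves are unwanted, location constraints also accept a `resource-discovery` option (`always`, `never`, `exclusive`); skipping probes means Pacemaker cannot notice a copy started outside its control, so check the documentation for your Pacemaker version before relying on it.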