Re: [ClusterLabs] how to sync data using cmap between cluster

2017-05-25 Thread Jan Friesse

OK, but then why can the node status (left, joined) be synced to other
nodes in the cluster by cmap? Thanks!


It is not synced by cmap. Node status is an essential property of the totem 
protocol, and it is just stored in the local cmap, mostly for 
diagnostics/monitoring. Also, the nodes' views are not necessarily in sync: 
if you create a two-node cluster and block traffic on one of the nodes, you 
will get node A which sees node B as having left, and node B which sees 
node A as having left.
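For illustration, each node's locally stored membership view can be inspected like this (a sketch; the key names follow corosync 2.x conventions and the node id shown is an example):

```shell
# Dump the local cmap and filter for membership keys; each node keeps
# its own view, with keys such as:
#   runtime.totem.pg.mrp.srp.members.<nodeid>.status (str) = joined
corosync-cmapctl | grep members
```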


Honza



On Thu, May 25, 2017 at 11:26 PM, Christine Caulfield wrote:

On 25/05/17 15:48, Rui Feng wrote:

Hi,

   I have a test based on corosync 2.3.4, and I find that the data stored
by cmap (corosync-cmapctl -s test i8 1) can't be synced to other nodes.
   Could somebody comment or suggest a solution? Thanks!




cmap isn't replicated across the cluster. If you need data replication
then you'll have to use some other method.

Chrissie

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org








Re: [ClusterLabs] Antw: Re: (no subject)

2017-05-25 Thread Digimer
On 26/05/17 02:05 AM, Ulrich Windl wrote:
> PLEASE learn how to use the subject in e-mail messages!

Christopher explained that the email was sent early by accident.

digimer

>>> Christopher Pax wrote on 24.05.2017 at 22:36 in message:
>> TO ADMIN: I am going to resubmit this question. please delete this thread.
> 
> 
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould



[ClusterLabs] Antw: Re: (no subject)

2017-05-25 Thread Ulrich Windl
PLEASE learn how to use the subject in e-mail messages!

>>> Christopher Pax wrote on 24.05.2017 at 22:36 in message:
> TO ADMIN: I am going to resubmit this question. please delete this thread.





Re: [ClusterLabs] Resources still retains in Primary Node even though its interface went down

2017-05-25 Thread pillai bs
Thanks Ken Gaillot,
I will try configuring the resources to monitor and update. Thank you.

Regards,
Madhan.B


On Fri, May 19, 2017 at 10:48 PM, Ken Gaillot  wrote:

> On 05/18/2017 07:54 AM, pillai bs wrote:
> > Hi Ken Gaillot,
> >
> > Sorry for the late reply.
> > No, I didn't configure anything to monitor the resources.
> >
> > Regards,
> > Madhan.B
>
> Pacemaker has the built-in capability to monitor resources, but it
> doesn't do that unless you tell it to. Without a monitor, Pacemaker will
> never know when a resource has failed.
>
> How you configure a monitor depends on whether you use crm shell, pcs,
> or whatever, but basically you just specify that you want to create an
> operation for the resource of type "monitor" with some interval (such as
> 30 seconds).
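As a concrete sketch of the above, adding a 30-second monitor with pcs might look like this (the resource name VirtualIP is a placeholder, not taken from this thread):

```shell
# Add/refresh a recurring monitor operation on an existing resource
pcs resource update VirtualIP op monitor interval=30s timeout=20s
```

In crm shell, roughly the same thing can be done with `crm configure monitor VirtualIP 30s:20s`.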
>
>
> > On Sat, May 13, 2017 at 12:54 AM, Ken Gaillot wrote:
> >
> > On 05/12/2017 09:43 AM, pillai bs wrote:
> > > Thank you for the prompt reply.
> > > I have one more question. Sorry, it might be silly, but I am
> > > wondering after noticing this:
> > > I made that interface down, but the public IP address & VIP (IP
> > > resource) are still on the primary node.
> > > If I made the interface down, the public IP address also has to
> > > go down, right?
> > >
> > > Regards,
> > > pillai.bs
> >
> > Did you configure a monitor operation on the IP address resources?
> >
> > > On Wed, May 3, 2017 at 7:34 PM, Ken Gaillot wrote:
> > >
> > > On 05/03/2017 02:43 AM, pillai bs wrote:
> > > > Hi Experts!!!
> > > >
> > > > I am having a two-node setup for HA (primary/secondary) with
> > > > separate resources for home/data/logs/virtual IP. As known, the
> > > > expected behavior should be: if the primary node goes down, the
> > > > secondary has to take charge (meaning initially the VIP will
> > > > point to the primary node, so users can access home/data/logs
> > > > from the primary node; once the primary node goes down, the
> > > > VIP/floating IP will point to the secondary node so that users
> > > > experience uninterrupted service).
> > > > I'm using dual-ring support to avoid split brain. I have two
> > > > interfaces (public & private). The intention of having the
> > > > private interface is for data sync alone.
> > > >
> > > > I have tested my setup in two different ways:
> > > > 1. Made the primary interface down (ifdown eth0); as expected,
> > > > the VIP and other resources moved from the primary to the
> > > > secondary node. (The VIP was not reachable from the primary
> > > > node.)
> > > > 2. Made the primary interface down (physically unplugged the
> > > > Ethernet cable). The primary node still retained the resources;
> > > > the VIP/floating IP was reachable from the primary node.
> > > >
> > > > Is my testing correct? How come the VIP is reachable even though
> > > > eth0 was down? Please advise!!!
> > > >
> > > > Regards,
> > > > Madhan.B
> > >
> > > Sorry, didn't see this message before replying to the other one :)
> > >
> > > The IP resource is successful if the IP is up *on that host*. It
> > > doesn't check that the IP is reachable from any other site.
> > > Similarly, filesystem resources just make sure that the filesystem
> > > can be mounted on the host. So, unplugging the Ethernet won't
> > > necessarily make those resources fail.
> > >
> > > Take a look at the ocf:pacemaker:ping resource for a way to ensure
> > > that the primary host has connectivity to the outside world. Also,
> > > be sure you have fencing configured, so that the surviving node can
> > > kill a node that is completely cut off or unresponsive.
>
>
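The ocf:pacemaker:ping advice above could be sketched with pcs as follows (a hedged example; the resource name, the VIP name VirtualIP, and the gateway address 192.0.2.1 are placeholders):

```shell
# Clone a ping resource so every node tracks connectivity to a gateway
pcs resource create ping-gw ocf:pacemaker:ping \
    host_list=192.0.2.1 dampen=5s multiplier=1000 \
    op monitor interval=15s --clone

# Keep the VIP off any node that cannot reach the gateway
pcs constraint location VirtualIP rule score=-INFINITY \
    pingd lt 1 or not_defined pingd
```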


Re: [ClusterLabs] how to sync data using cmap between cluster

2017-05-25 Thread Rui Feng
OK, but then why can the node status (left, joined) be synced to other
nodes in the cluster by cmap? Thanks!

On Thu, May 25, 2017 at 11:26 PM, Christine Caulfield wrote:
> On 25/05/17 15:48, Rui Feng wrote:
>> Hi,
>>
>>   I have a test based on corosync 2.3.4, and I find that the data stored
>> by cmap (corosync-cmapctl -s test i8 1) can't be synced to other nodes.
>>   Could somebody comment or suggest a solution? Thanks!
>>
>>
>
> cmap isn't replicated across the cluster. If you need data replication
> then you'll have to use some other method.
>
> Chrissie
>



Re: [ClusterLabs] resource monitor logging

2017-05-25 Thread Ken Gaillot
On 05/24/2017 03:44 PM, Christopher Pax wrote:
> 
> I am running postgresql as a resource in corosync, and there is a
> monitor process that kicks off every few seconds to see if postgresql is
> alive (it runs a select now()). My immediate concern is that it is
> generating a lot of logs in auth.log, and I am wondering if this is
> normal behavior? Is there a way to silence this?

That's part of the operating system's security setup. I wouldn't disable
it, for security reasons. If the issue is that the logs are growing too
fast, I'd just rotate them more frequently and keep fewer old logs.

Also, consider whether running a monitor that frequently is really
necessary.

Another possibility, which probably would require modifying the resource
agent, would be to configure two monitors of different "levels". The
regular monitor, scheduled frequently, could just check that the
postgresql pid is still alive. The second level monitor, scheduled less
frequently, could try the select.
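If the agent supports monitor depths, the two levels are usually selected via the OCF_CHECK_LEVEL operation attribute; a rough crm shell sketch of the idea (whether the pgsql agent actually honors these depths, and which params it needs, would have to be verified):

```shell
# Frequent shallow monitor plus a rarer, deeper one; the two monitor
# operations must use different intervals to coexist
crm configure primitive res_pgsql_demo pgsql \
    op monitor interval=10s timeout=30s \
    op monitor interval=120s timeout=60s OCF_CHECK_LEVEL=10
```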

> 
> ##
> ## /var/log/auth.log
> ##
> May 24 15:23:19 ssinode02-g2 runuser: pam_unix(runuser:session): session
> opened for user postgres by (uid=0)
> May 24 15:23:19 ssinode02-g2 runuser: pam_unix(runuser:session): session
> closed for user postgres
> May 24 15:23:19 ssinode02-g2 runuser: pam_unix(runuser:session): session
> opened for user postgres by (uid=0)
> May 24 15:23:19 ssinode02-g2 runuser: pam_unix(runuser:session): session
> closed for user postgres
> May 24 15:23:19 ssinode02-g2 runuser: pam_unix(runuser:session): session
> opened for user postgres by (uid=0)
> May 24 15:23:19 ssinode02-g2 runuser: pam_unix(runuser:session): session
> closed for user postgres
> 
> ##
> ## /var/log/postgresql/data.log
> ##
> DEBUG:  forked new backend, pid=27900 socket=11
> LOG:  connection received: host=[local]
> LOG:  connection authorized: user=postgres database=template1
> LOG:  statement: select now();
> LOG:  disconnection: session time: 0:00:00.003 user=postgres
> database=template1 host=[local]
> DEBUG:  server process (PID 27900) exited with exit code 0
> DEBUG:  forked new backend, pid=28030 socket=11
> LOG:  connection received: host=[local]
> LOG:  connection authorized: user=postgres database=template1
> LOG:  statement: select now();
> LOG:  disconnection: session time: 0:00:00.002 user=postgres
> database=template1 host=[local]
> 
> 
> ## snippit of pgsql corosync primitive
> primitive res_pgsql_2 pgsql \
> params pgdata="/mnt/drbd/postgres"
> config="/mnt/drbd/postgres/postgresql.conf" start_opt="-d 2"
> pglibs="/usr/lib/postgresql/9.5/lib"
> logfile="/var/log/postgresql/data.log" \
> operations $id=res_pgsql_1-operations \
> op start interval=0 timeout=60 \
> op stop interval=0 timeout=60 \
> op monitor interval=3 timeout=60 start-delay=0



Re: [ClusterLabs] ClusterIP won't return to recovered node

2017-05-25 Thread Ken Gaillot
On 05/24/2017 12:27 PM, Dan Ragle wrote:
> I suspect this has been asked before and apologize if so, a google
> search didn't seem to find anything that was helpful to me ...
> 
> I'm setting up an active/active two-node cluster and am having an issue
> where one of my two defined clusterIPs will not return to the other node
> after it (the other node) has been recovered.
> 
> I'm running on CentOS 7.3. My resource setups look like this:
> 
> # cibadmin -Q|grep dc-version
> <nvpair ... name="dc-version" value="1.1.15-11.el7_3.4-e174ec8"/>
> 
> # pcs resource show PublicIP-clone
>  Clone: PublicIP-clone
>   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true
> interleave=true
>   Resource: PublicIP (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=75.144.71.38 cidr_netmask=24 nic=bond0
>Meta Attrs: resource-stickiness=0
>Operations: start interval=0s timeout=20s (PublicIP-start-interval-0s)
>stop interval=0s timeout=20s (PublicIP-stop-interval-0s)
>monitor interval=30s (PublicIP-monitor-interval-30s)
> 
> # pcs resource show PrivateIP-clone
>  Clone: PrivateIP-clone
>   Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true
> interleave=true
>   Resource: PrivateIP (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=192.168.1.3 nic=bond1 cidr_netmask=24
>Meta Attrs: resource-stickiness=0
>Operations: start interval=0s timeout=20s (PrivateIP-start-interval-0s)
>stop interval=0s timeout=20s (PrivateIP-stop-interval-0s)
>monitor interval=10s timeout=20s
> (PrivateIP-monitor-interval-10s)
> 
> # pcs constraint --full | grep -i publicip
>   start WEB-clone then start PublicIP-clone (kind:Mandatory)
> (id:order-WEB-clone-PublicIP-clone-mandatory)
> # pcs constraint --full | grep -i privateip
>   start WEB-clone then start PrivateIP-clone (kind:Mandatory)
> (id:order-WEB-clone-PrivateIP-clone-mandatory)

FYI These constraints cover ordering only. If you also want to be sure
that the IPs only start on a node where the web service is functional,
then you also need colocation constraints.
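Using the clone names from the post, such colocation constraints might be added like this (a sketch with pcs):

```shell
# Only place the IPs on nodes where the web service is also running
pcs constraint colocation add PublicIP-clone with WEB-clone INFINITY
pcs constraint colocation add PrivateIP-clone with WEB-clone INFINITY
```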

> 
> When I first create the resources, they split across the two nodes as
> expected/desired:
> 
>  Clone Set: PublicIP-clone [PublicIP] (unique)
>  PublicIP:0(ocf::heartbeat:IPaddr2):   Started node1-pcs
>  PublicIP:1(ocf::heartbeat:IPaddr2):   Started node2-pcs
>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>  PrivateIP:0(ocf::heartbeat:IPaddr2):   Started node1-pcs
>  PrivateIP:1(ocf::heartbeat:IPaddr2):   Started node2-pcs
>  Clone Set: WEB-clone [WEB]
>  Started: [ node1-pcs node2-pcs ]
> 
> I then put the second node in standby:
> 
> # pcs node standby node2-pcs
> 
> And the IPs both jump to node1 as expected:
> 
>  Clone Set: PublicIP-clone [PublicIP] (unique)
>  PublicIP:0(ocf::heartbeat:IPaddr2):   Started node1-pcs
>  PublicIP:1(ocf::heartbeat:IPaddr2):   Started node1-pcs
>  Clone Set: WEB-clone [WEB]
>  Started: [ node1-pcs ]
>  Stopped: [ node2-pcs ]
>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>  PrivateIP:0(ocf::heartbeat:IPaddr2):   Started node1-pcs
>  PrivateIP:1(ocf::heartbeat:IPaddr2):   Started node1-pcs
> 
> Then unstandby the second node:
> 
> # pcs node unstandby node2-pcs
> 
> The publicIP goes back, but the private does not:
> 
>  Clone Set: PublicIP-clone [PublicIP] (unique)
>  PublicIP:0(ocf::heartbeat:IPaddr2):   Started node1-pcs
>  PublicIP:1(ocf::heartbeat:IPaddr2):   Started node2-pcs
>  Clone Set: WEB-clone [WEB]
>  Started: [ node1-pcs node2-pcs ]
>  Clone Set: PrivateIP-clone [PrivateIP] (unique)
>  PrivateIP:0(ocf::heartbeat:IPaddr2):   Started node1-pcs
>  PrivateIP:1(ocf::heartbeat:IPaddr2):   Started node1-pcs
> 
> Anybody see what I'm doing wrong? I'm not seeing anything in the logs to
> indicate that it tries node2 and then fails; but I'm fairly new to the
> software so it's possible I'm not looking in the right place.

The pcs status would show any failed actions, and anything important in
the logs would start with "error:" or "warning:".

At any given time, one of the nodes is the DC, meaning it schedules
actions for the whole cluster. That node will have more "pengine:"
messages in its logs at the time. You can check those logs to see what
decisions were made, as well as a "saving inputs" message to get the
cluster state that was used to make those decisions. There is a
crm_simulate tool that you can run on that file to get more information.
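For reference, a saved scheduler input can be replayed like this (the file name is a placeholder; the real one comes from the "saving inputs" log line):

```shell
# Re-run the scheduler on a saved input and show allocation scores
crm_simulate --xml-file=/var/lib/pacemaker/pengine/pe-input-123.bz2 \
    --simulate --show-scores
```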

By default, pacemaker will try to balance the number of resources
running on each node, so I'm not sure why in this case node1 has four
resources and node2 has two. crm_simulate might help explain it.

However, there's nothing here telling pacemaker that the instances of
PrivateIP should run on different nodes when possible. With your

Re: [ClusterLabs] Node attribute disappears when pacemaker is started

2017-05-25 Thread Ken Gaillot
On 05/24/2017 05:13 AM, 井上 和徳 wrote:
> Hi,
> 
> After loading the node attribute, when I start pacemaker on that node, the 
> attribute disappears.
> 
> 1. Start pacemaker on node1.
> 2. Load configure containing node attribute of node2.
>(I use multicast addresses in corosync, so did not set "nodelist {nodeid: 
> }" in corosync.conf.)
> 3. Start pacemaker on node2; the node attribute that should have been loaded 
> disappears.
>    Is this expected behavior?

Hi,

No, this should not happen for a permanent node attribute.

Transient node attributes (status-attr in crm shell) are erased when the
node starts, so it would be expected in that case.
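The distinction can be made explicit with crm_attribute's --lifetime option (a sketch; node and attribute names are borrowed from the report in this thread):

```shell
# Permanent node attribute (kept across node restarts)
crm_attribute --node rhel73-2 --name attrname --update attr2 --lifetime forever

# Transient node attribute (erased when the node's cluster services start)
crm_attribute --node rhel73-2 --name attrname --update attr2 --lifetime reboot
```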

I haven't been able to reproduce this with a permanent node attribute.
Can you attach logs from both nodes around the time node2 is started?

> 
> 1.
> [root@rhel73-1 ~]# systemctl start corosync;systemctl start pacemaker
> [root@rhel73-1 ~]# crm configure show
> node 3232261507: rhel73-1
> property cib-bootstrap-options: \
>   have-watchdog=false \
>   dc-version=1.1.17-0.1.rc2.el7-524251c \
>   cluster-infrastructure=corosync
> 
> 2.
> [root@rhel73-1 ~]# cat rhel73-2.crm
> node rhel73-2 \
>   utilization capacity="2" \
>   attributes attrname="attr2"
> 
> [root@rhel73-1 ~]# crm configure load update rhel73-2.crm
> [root@rhel73-1 ~]# crm configure show
> node 3232261507: rhel73-1
> node rhel73-2 \
>   utilization capacity=2 \
>   attributes attrname=attr2
> property cib-bootstrap-options: \
>   have-watchdog=false \
>   dc-version=1.1.17-0.1.rc2.el7-524251c \
>   cluster-infrastructure=corosync
> 
> 3.
> [root@rhel73-1 ~]# ssh rhel73-2 'systemctl start corosync;systemctl start 
> pacemaker'
> [root@rhel73-1 ~]# crm configure show
> node 3232261507: rhel73-1
> node 3232261508: rhel73-2
> property cib-bootstrap-options: \
>   have-watchdog=false \
>   dc-version=1.1.17-0.1.rc2.el7-524251c \
>   cluster-infrastructure=corosync
> 
> Regards,
> Kazunori INOUE



Re: [ClusterLabs] how to sync data using cmap between cluster

2017-05-25 Thread Christine Caulfield
On 25/05/17 15:48, Rui Feng wrote:
> Hi,
> 
>   I have a test based on corosync 2.3.4, and I find that the data stored
> by cmap (corosync-cmapctl -s test i8 1) can't be synced to other nodes.
>   Could somebody comment or suggest a solution? Thanks!
> 
>

cmap isn't replicated across the cluster. If you need data replication
then you'll have to use some other method.

Chrissie



[ClusterLabs] how to sync data using cmap between cluster

2017-05-25 Thread Rui Feng
Hi,

  I have a test based on corosync 2.3.4, and I find that the data stored by
cmap (corosync-cmapctl -s test i8 1) can't be synced to other nodes.
  Could somebody comment or suggest a solution? Thanks!


Br,
Feng Rui
