Re: [Pacemaker] Quorum disk?

2010-08-30 Thread Ciro Iriarte
2010/8/25 Michael Schwartzkopff :
> Am Mittwoch, den 25.08.2010, 17:01 -0400 schrieb Ciro Iriarte:
>> Hi, I'm planning to use OpanAIS+Pacemaker on SLES11-SP1 and would like
>> to know if it's possible to use a quorum disk in a two-node cluster.
>> The idea is to avoid adding a third node just for quorum...
>>
>> Regards,
>
> Hi,
>
> you could have a look at the sfex resource agent.
>
> Greetings,
>
> Michael Schwartzkopff
>

Thanks, sounds interesting, but it doesn't modify the quorum count

Regards,

-- 
Ciro Iriarte
http://cyruspy.wordpress.com
--

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] cib fails to start until host is rebooted

2010-08-30 Thread Michael Smith

Hi,

I have a pacemaker/corosync setup on a bunch of fully patched SLES11 SP1 
systems. On one of the systems, if I /etc/init.d/openais stop, then 
/etc/init.d/openais start, pacemaker fails to come up:


Aug 30 15:48:09 xen-test1 cib: [5858]: info: crm_cluster_connect: 
Connecting to OpenAIS
Aug 30 15:48:09 xen-test1 cib: [5858]: info: init_ais_connection: 
Creating connection to our AIS plugin


Aug 30 15:48:10 xen-test1 corosync[5851]:  [IPC   ] Invalid IPC credentials.
Aug 30 15:48:10 xen-test1 cib: [5858]: info: init_ais_connection: 
Connection to our AIS plugin (9) failed: unknown (100)


Aug 30 15:48:10 xen-test1 cib: [5858]: CRIT: cib_init: Cannot sign in to 
the cluster... terminating


I've tried rm /var/run/crm/*, but it doesn't help; the only fix is to 
reboot.


I have an strace -f of /etc/init.d/openais start, if that would help.

cluster-glue-1.0.5-0.5.1
corosync-1.2.1-0.5.1
libpacemaker3-1.1.2-0.2.1
libcorosync4-1.2.1-0.5.1
libopenais3-1.1.2-0.5.19
pacemaker-1.1.2-0.2.1
openais-1.1.2-0.5.19

Thanks,
Mike

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Adding a STONITH module to the distribution

2010-08-30 Thread William Seligman
William Seligman  writes:

> William Seligman  writes:
> 
> > I've written a STONITH device script for systems that monitor their UPSes 
> > w/NUT. I think it might be of sufficient interest to include in the standard
> > Pacemaker distribution. What is the procedure for submitting such scripts?
> > 
> > I don't particularly want credit or anything like that. It's just a simple
> > script that I think could be a time-saver for sysadmins like me.
> 
> Here's a link to the script. (I would have posted the link to the script
> directly, but it has lines longer than 80 characters, and the web interface to
> GMANE is giving me some flak.)

What I meant to say was "I would have posted the script directly..."

> http://bit.ly/3yPjS

Oh, frak. bit.ly created a link to my home page, which is of interest to no one.
Let me try that again:

http://bit.ly/annpi1

> If the comment near the top of the file is not sufficient to put the script
> under the GPL license, please let me know.



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Adding a STONITH module to the distribution

2010-08-30 Thread William Seligman
William Seligman  writes:

> I've written a STONITH device script for systems that monitor their UPSes 
> using
> NUT. I think it might be of sufficient interest to include in the standard
> Pacemaker distribution. What is the procedure for submitting such scripts?
> 
> I don't particularly want credit or anything like that. It's just a simple
> script that I think could be a time-saver for sysadmins like me.

Here's a link to the script. (I would have posted the link to the script
directly, but it has lines longer than 80 characters, and the web interface to
GMANE is giving me some flak.)

http://bit.ly/3yPjS

If the comment near the top of the file is not sufficient to put the script
under the GPL license, please let me know.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] cluster-dlm: set_fs_notified: set_fs_notified no nodeid 1812048064#012

2010-08-30 Thread Roberto Giordani
Ok I'll do.
Thanks!

On 08/30/2010 11:16 AM, Dan Frincu wrote:
> Try using RSTP on the switches, if possible, it has a lower
> convergence time.
>
> Roberto Giordani wrote:
>> Thanks,
>> who should I contact? Which mailing list?
>> I've discovered that this problem occours when the port of my switch
>> where the cluster ring is connected became "blocked" due spanning tree.
>> I've resolved the bug using for the ring a separate switch without
>> spanning tre enabled and different subnet.
>> Is there a configuration to avoid that before the spanning tree
>> recalculate the route due a failure, the cluster nodes doesn't hang?
>> The hang occurses on SLES11sp1 too where the servers are up running, the
>> cluster status is ok, but when try to connect to the server with ssh,
>> after the login hang the session.
>>
>> Usually the recalculate takes 50 seconds.
>>
>> Regards,
>> Roberto.
>>
>> On 08/26/2010 10:24 AM, Dejan Muhamedagic wrote:
>>   
>>> Hi,
>>>
>>> On Thu, Aug 26, 2010 at 09:36:10AM +0200, Andrew Beekhof wrote:
>>>   
>>> 
 On Wed, Aug 18, 2010 at 6:24 PM, Roberto Giordani  
 wrote:
 
   
> Hello,
> I'll explain what’s happened after a network black-out
> I've a cluster with pacemaker on Opensuse 11.2 64bit
> 
> Last updated: Wed Aug 18 18:13:33 2010
> Current DC: nodo1 (nodo1)
> Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
> 3 Nodes configured.
> 11 Resources configured.
> 
>
> Node: nodo1 (nodo1): online
> Node: nodo3 (nodo3): online
> Node: nodo4 (nodo4): online
>
> Clone Set: dlm-clone
> dlm:0   (ocf::pacemaker:controld):  Started nodo3
> dlm:1   (ocf::pacemaker:controld):  Started nodo1
> dlm:2   (ocf::pacemaker:controld):  Started nodo4
> Clone Set: o2cb-clone
> o2cb:0  (ocf::ocfs2:o2cb):  Started nodo3
> o2cb:1  (ocf::ocfs2:o2cb):  Started nodo1
> o2cb:2  (ocf::ocfs2:o2cb):  Started nodo4
> Clone Set: XencfgFS-Clone
> XencfgFS:0  (ocf::heartbeat:Filesystem):Started nodo3
> XencfgFS:1  (ocf::heartbeat:Filesystem):Started nodo1
> XencfgFS:2  (ocf::heartbeat:Filesystem):Started nodo4
> Clone Set: XenimageFS-Clone
> XenimageFS:0(ocf::heartbeat:Filesystem):Started nodo3
> XenimageFS:1(ocf::heartbeat:Filesystem):Started nodo1
> XenimageFS:2(ocf::heartbeat:Filesystem):Started nodo4
> rsa1-fencing(stonith:external/ibmrsa-telnet):   Started nodo4
> rsa2-fencing(stonith:external/ibmrsa-telnet):   Started nodo3
> rsa3-fencing(stonith:external/ibmrsa-telnet):   Started nodo4
> rsa4-fencing(stonith:external/ibmrsa-telnet):   Started nodo3
> mailsrv-rm  (ocf::heartbeat:Xen):   Started nodo3
> dbsrv-rm(ocf::heartbeat:Xen):   Started nodo4
> websrv-rm   (ocf::heartbeat:Xen):   Started nodo4
>
> After a  switch failure all the nodes and the rsa stonith devices was
> unreachable.
>
> On the cluster happen the following error on one node
>
> Aug 18 13:11:38 nodo1 cluster-dlm: receive_plocks_stored:
> receive_plocks_stored 1778493632:2 need_plocks 0#012
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.272025] [ cut here
> ]
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.272036] kernel BUG at
> /usr/src/packages/BUILD/kernel-xen-2.6.31.12/linux-2.6.31/fs/inode.c:1323!
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.272042] invalid opcode:  [#1] SMP
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.272046] last sysfs file:
> /sys/kernel/dlm/0BB443F896254AD3BA8FB960C425B666/control
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.272050] CPU 1
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.272053] Modules linked in:
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
> iptable_filter ip_tables x_tables ocfs2 ocfs2_nodemanager quota_tree
> ocfs2_stack_user ocfs2_stackglue dlm configfs netbk coretemp blkbk
> blkback_pagemap blktap xenbus_be ipmi_si edd dm_round_robin scsi_dh_rdac
> dm_multipath scsi_dh bridge stp llc bonding ipv6 fuse ext4 jbd2 crc16 loop
> dm_mod sr_mod ide_pci_generic ide_core iTCO_wdt ata_generic ibmpex i5k_amb
> ibmaem iTCO_vendor_support ipmi_msghandler bnx2 i5000_edac 8250_pnp shpchp
> ata_piix pcspkr ics932s401 joydev edac_core i2c_i801 ses pci_hotplug 8250
> i2c_core serio_raw enclosure serial_core button sg reiserfs usbhid hid
> uhci_hcd ehci_hcd xenblk cdrom xennet fan processor pata_acpi lpfc thermal
> thermal_sys hwmon aacraid [last unloaded: ocfs2_stackglue]
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.272111] Pid: 8889, comm: dlm_send Not
> tainted 2.6.31.12-0.2-xen #1 IBM System x3650 -[7979AC1]-
>
> Aug 18 13:11:38 nodo1 kernel: [ 4154.2

Re: [Pacemaker] Howto upgrade Pacemaker cluster from Version: 1.0.2 to the last released on clusterlabs

2010-08-30 Thread Roberto Giordani
Thanks!

On 08/30/2010 11:15 AM, Andrew Beekhof wrote:
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-upgrade-config.html
>
> On Sat, Aug 28, 2010 at 9:34 AM, Roberto Giordani  
> wrote:
>   
>> Hello,
>> but How to migrate the entire cluster configuration (resource, nodes,
>> stonith)?
>> Regards,
>> Roberto.
>>
>> On 08/26/2010 09:40 AM, Andrew Beekhof wrote:
>> 
>>> On Wed, Aug 18, 2010 at 11:15 PM, Roberto Giordani  
>>> wrote:
>>>
>>>   
 Hello,
 I'd like to know how is it possible to upgrade a running cluster
 pacemaker on Opensuse 11.2 version 1.02 to the last available on 
 clusterlabs
 using dlm + ocfs2 too

 
>>> The problem is that the versions of pacemaker on clusterlabs are
>>> probably incompatible with your existing dlm and ocfs2 packages.
>>> You'd need to rebuild them against the new pacemaker packages.
>>>
>>>
>>>   
 Could someone explain in some steps how to proceed without loose all the
 cluster configuration up and running?

 
>>> Assuming you have a compatible set of new packages (see above), just
>>> do a rolling upgrade.
>>>
>>> ___
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: 
>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>
>>>
>>>   
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: 
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>   


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Two cloned VM, only one of the both shows online when starting corosync/pacemaker

2010-08-30 Thread Guillaume Chanaud

 Le 27/08/2010 16:29, Andrew Beekhof a écrit :

On Tue, Aug 3, 2010 at 4:40 PM, Guillaume Chanaud
  wrote:

Hello,
sorry for the delay it took, july is not the best month to get things
working fast.

Neither is august :-)


lol sure :)

Here is the core dump file (55MB) :
http://www.connecting-nature.com/corosync/core
corosync version is 1.2.3

Sorry, but I can't do anything with that file.
Core files are only usable on the machine they came from.

you'll have to open it with gdb and type "bt" to get a backtrace.
Sorry , saw that after sending last mail. In fact i tried to debug/bt 
it, but

1. I'm not a c developer (i understand a little about it...)
2. I never used gdb before uh, so hard to step into the corosync debug

I'm not sure the trace will be usefull but here it is :
Core was generated by `corosync'.
Program terminated with signal 6, Aborted.
#0  0x003506a329a5 in raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64

64  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x003506a329a5 in raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64

#1  0x003506a34185 in abort () at abort.c:92
#2  0x003506a2b935 in __assert_fail (assertion=0x7fce14f0b2ae 
"token_memb_entries >= 1", file=, line=1194,

function=) at assert.c:81
#3  0x7fce14efb716 in memb_consensus_agreed 
(instance=0x7fce12338010) at totemsrp.c:1194
#4  0x7fce14f01723 in memb_join_process (instance=0x7fce12338010, 
memb_join=0x822bf8) at totemsrp.c:3922
#5  0x7fce14f01a3a in message_handler_memb_join 
(instance=0x7fce12338010, msg=, msg_len=optimized out>,

endian_conversion_needed=) at totemsrp.c:4165
#6  0x7fce14ef7644 in rrp_deliver_fn (context=, 
msg=0x822bf8, msg_len=420) at totemrrp.c:1404
#7  0x7fce14ef6569 in net_deliver_fn (handle=, 
fd=, revents=, data=0x822550)

at totemudp.c:1244
#8  0x7fce14ef259a in poll_run (handle=2240235047305084928) at 
coropoll.c:435
#9  0x00405594 in main (argc=, argv=optimized out>) at main.c:1558


I tried to compile it from source (1.2.7 tag and svn trunk) but i'm 
unable to backtrace it as gdb tell me he doesn't find debuginfos (i did 
a ./configure --enable-debug but gdb seems to need a 
/usr/lib/debug/.build-id/... related to current executable, and i don't 
know how to generate this)
On the 1.2.7 version, init script tell it started correctly but after 
one or two seconds only lrmd and pengine processes are still alive


On the trunk version, the init script fail to start (and so processes 
are correctly killed)


In the 1.2.7 when i'm stepping, i'm unable to go further than
service.c:201res = service->exec_init_fn (corosync_api);
as it should create a new process for pacemaker services i think
(i don't know how to step inside this new process and debug it)

If you need/want i'll let you access this vm via ssh to test/debug it.

It should be related to other posts about "Could not connect to the CIB 
service: connection failed" (i saw some message related to things more 
or less like my problem)


I put back end of the messages log here :
Aug 30 16:30:50 www01 crmd: [19821]: notice: ais_dispatch: Membership 
208656: quorum acquired
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_update_peer: Node 
www01.connecting-nature.com: id=1006676160 state=member (new) addr=r(0) 
ip(192.168.0.60)  (
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_new_peer: Node  now 
has id: 83929280
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_update_peer: Node (null): 
id=83929280 state=member (new) addr=r(0) ip(192.168.0.5)  votes=0 born=0 
seen=20865
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_new_peer: Node 
filer2.connecting-nature.com now has id: 100706496
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_new_peer: Node 100706496 
is now known as filer2.connecting-nature.com
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_update_peer: Node 
filer2.connecting-nature.com: id=100706496 state=member (new) addr=r(0) 
ip(192.168.0.6)  vo
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_new_peer: Node  now 
has id: 1174448320
Aug 30 16:30:50 www01 crmd: [19821]: info: crm_update_peer: Node (null): 
id=1174448320 state=member (new) addr=r(0) ip(192.168.0.70)  votes=0 
born=0 seen=20
Aug 30 16:30:50 www01 crmd: [19821]: info: do_started: The local CRM is 
operational
Aug 30 16:30:50 www01 crmd: [19821]: info: do_state_transition: State 
transition S_STARTING -> S_PENDING [ input=I_PENDING 
cause=C_FSA_INTERNAL origin=do_st

Aug 30 16:30:50 www01 corosync[19809]:   [TOTEM ] FAILED TO RECEIVE
Aug 30 16:30:51 www01 crmd: [19821]: info: ais_dispatch: Membership 
208656: quorum retained
Aug 30 16:30:51 www01 crmd: [19821]: info: te_connect_stonith: 
Attempting connection to fencing daemon...

Aug 30 16:30:52 www01 crmd: [19821]: info: te_connect_stonith: Connected
Aug 30 16:30:52 www01 cib: [19817]: ERROR: ais_dispatch: Receiving 
message body failed: (2) Library error: Resource temporarily unavailable 
(11)
Aug 30 16:30:52 www01 

Re: [Pacemaker] clmvd hangs on node1 if node2 is fenced

2010-08-30 Thread Rainer
Michael Smith  writes:

> I've got a pair of fully patched SLES11 SP1 nodes and they're showing 
> what I guess is the same behaviour: if I hard-poweroff node2, operations 
> like "vgdisplay -v" hang on node1 for quite some time. Sometimes a 
> minute, sometimes two, sometimes forever. They get stuck here:

Hi Michael,

the Bug is fixed with the DEVEL Package after SP1 - and yes you need STONITH to
work it stable ;)

Kind regards,

Rainer


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] DRBD Outdated by Heartbeat/Pacemaker - node alive don't get Primary

2010-08-30 Thread Raphaël LOUIS
Hi pacemaker group,


I am using Debian 5.0.5 Lenny, DRBD 8.3.7, Heartbeat 3.0.3 (backports),
pacemaker 1.0.9 (backports)

I have a problem with putting nodes in standby mode, or shutting down one
node :

When one node is offline or in standby (crm node standby), the other one
goes slave and DRBD gets secondary / outdated :

#crm_mon

Last updated: Mon Aug 30 11:50:45 2010
Stack: Heartbeat
Current DC: swmaster1 (2cd4bf30-7a63-4da7-9102-b4f49d91b9d0) - partition
with quorum
Version: 1.0.9-unknown
2 Nodes configured, unknown expected votes
2 Resources configured.


Online: [ swmaster1 ]
OFFLINE: [ swslave1 ]

 Master/Slave Set: ms_drbd_mysql
 Slaves: [ swmaster1 ]
 Stopped: [ drbd_mysql:0 ]

_

SWMaster1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
built-in
1: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r
ns:1104 nr:744 dw:1944 dr:67439479 al:44 bm:67 lo:0 pe:0 ua:0 ap:0 ep:1
wo:b oos:64


___

When both nodes are online, everything is ok, and I can switch resources
using 'crm resource migrate grp_mysql'  :



Last updated: Mon Aug 30 11:57:09 2010
Stack: Heartbeat
Current DC: swmaster1 (2cd4bf30-7a63-4da7-9102-b4f49d91b9d0) - partition
with quorum
Version: 1.0.9-unknown
2 Nodes configured, unknown expected votes
2 Resources configured.


Online: [ swslave1 swmaster1 ]

 Master/Slave Set: ms_drbd_mysql
 Masters: [ swmaster1 ]
 Slaves: [ swslave1 ]
 Resource Group: grp_mysql
 fs_mysql   (ocf::heartbeat:Filesystem):Started swmaster1
 mysqld (lsb:mysql):Started swmaster1

_

Reconnecting...SWMaster1:~# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
built-in
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r
ns:1520 nr:1136 dw:2704 dr:67449610 al:50 bm:79 lo:0 pe:0 ua:0 ap:0 ep:1
wo:b oos:0


___


Instead of having an HA infrastructure, I have a LA :).

When I use DRBD manually and shutting down hearbeat (/etc/init.d/heartbeat
stop), I can stop DRBD on one side and the other node stay in update state,
so I can put it primary (drbdadm primary all).


How can I do to make understand Heartbeat/Pacemaker not to put DRBD in
Outdated state and make it putting services/resources on the other node ?


Here are my configurations :

SWMaster1:~# crm configure show
node $id="2cd4bf30-7a63-4da7-9102-b4f49d91b9d0" swmaster1 \
attributes standby="off"
node $id="e022eabd-ef7b-4049-b941-fc26d00c5cd1" swslave1 \
attributes standby="off"
primitive drbd_mysql ocf:linbit:drbd \
params drbd_resource="mysql" \
op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql"
fstype="ext3"
primitive mysqld lsb:mysql
group grp_mysql fs_mysql mysqld
ms ms_drbd_mysql drbd_mysql \
meta master-max="1" master-node-max="1" clone-max="2"
clone-node-max="1" notify="true"
location cli-prefer-mysqld mysqld \
rule $id="cli-prefer-rule-mysqld" inf: #uname eq swmaster1
location cli-standby-grp_mysql grp_mysql \
rule $id="cli-standby-rule-grp_mysql" -inf: #uname eq swslave1
colocation mysql_on_drbd inf: grp_mysql ms_drbd_mysql:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote grp_mysql:start
property $id="cib-bootstrap-options" \
dc-version="1.0.9-unknown" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
no-quorum-policy="ignore"


___


SWMaster1:~# cat /etc/ha.d/ha.cf

use_logd on

autojoin none

node SWMaster1
node SWSlave1

crm yes

compression bz2

warntime 10
deadtime 40
initdead 60

msgfmt netstring

ucast eth0 ip.serv.mas.ter
ucast eth0 ip.serv.sla.ve


___

cat /etc/drbd.conf

global {
usage-count yes;
}

common {
protocol C;
syncer  {
#algorithme a utiliser et activation de la
possibilite verification de synchronisation on-line - drbdadm verify
[ressource|all]
verify-alg sha1;

#comparaison de blocs par checksum pour
verifier necessite ecriture
csums-alg sha1;

#vitesse de synchronisation - drbdsetup
/dev/drbdnum syncer -r 10M
rate 7M;
}

disk{
on-io-error detach;
}

net {
#
http://www.drbd.org/users-guide-emb/s-integrity-check.html

Re: [Pacemaker] cluster-dlm: set_fs_notified: set_fs_notified no nodeid 1812048064#012

2010-08-30 Thread Dan Frincu
Try using RSTP on the switches, if possible, it has a lower convergence 
time.


Roberto Giordani wrote:

Thanks,
who should I contact? Which mailing list?
I've discovered that this problem occours when the port of my switch
where the cluster ring is connected became "blocked" due spanning tree.
I've resolved the bug using for the ring a separate switch without
spanning tre enabled and different subnet.
Is there a configuration to avoid that before the spanning tree
recalculate the route due a failure, the cluster nodes doesn't hang?
The hang occurses on SLES11sp1 too where the servers are up running, the
cluster status is ok, but when try to connect to the server with ssh,
after the login hang the session.

Usually the recalculate takes 50 seconds.

Regards,
Roberto.

On 08/26/2010 10:24 AM, Dejan Muhamedagic wrote:
  

Hi,

On Thu, Aug 26, 2010 at 09:36:10AM +0200, Andrew Beekhof wrote:
  


On Wed, Aug 18, 2010 at 6:24 PM, Roberto Giordani  wrote:

  

Hello,
I'll explain what’s happened after a network black-out
I've a cluster with pacemaker on Opensuse 11.2 64bit

Last updated: Wed Aug 18 18:13:33 2010
Current DC: nodo1 (nodo1)
Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
3 Nodes configured.
11 Resources configured.


Node: nodo1 (nodo1): online
Node: nodo3 (nodo3): online
Node: nodo4 (nodo4): online

Clone Set: dlm-clone
dlm:0   (ocf::pacemaker:controld):  Started nodo3
dlm:1   (ocf::pacemaker:controld):  Started nodo1
dlm:2   (ocf::pacemaker:controld):  Started nodo4
Clone Set: o2cb-clone
o2cb:0  (ocf::ocfs2:o2cb):  Started nodo3
o2cb:1  (ocf::ocfs2:o2cb):  Started nodo1
o2cb:2  (ocf::ocfs2:o2cb):  Started nodo4
Clone Set: XencfgFS-Clone
XencfgFS:0  (ocf::heartbeat:Filesystem):Started nodo3
XencfgFS:1  (ocf::heartbeat:Filesystem):Started nodo1
XencfgFS:2  (ocf::heartbeat:Filesystem):Started nodo4
Clone Set: XenimageFS-Clone
XenimageFS:0(ocf::heartbeat:Filesystem):Started nodo3
XenimageFS:1(ocf::heartbeat:Filesystem):Started nodo1
XenimageFS:2(ocf::heartbeat:Filesystem):Started nodo4
rsa1-fencing(stonith:external/ibmrsa-telnet):   Started nodo4
rsa2-fencing(stonith:external/ibmrsa-telnet):   Started nodo3
rsa3-fencing(stonith:external/ibmrsa-telnet):   Started nodo4
rsa4-fencing(stonith:external/ibmrsa-telnet):   Started nodo3
mailsrv-rm  (ocf::heartbeat:Xen):   Started nodo3
dbsrv-rm(ocf::heartbeat:Xen):   Started nodo4
websrv-rm   (ocf::heartbeat:Xen):   Started nodo4

After a  switch failure all the nodes and the rsa stonith devices was
unreachable.

On the cluster happen the following error on one node

Aug 18 13:11:38 nodo1 cluster-dlm: receive_plocks_stored:
receive_plocks_stored 1778493632:2 need_plocks 0#012

Aug 18 13:11:38 nodo1 kernel: [ 4154.272025] [ cut here
]

Aug 18 13:11:38 nodo1 kernel: [ 4154.272036] kernel BUG at
/usr/src/packages/BUILD/kernel-xen-2.6.31.12/linux-2.6.31/fs/inode.c:1323!

Aug 18 13:11:38 nodo1 kernel: [ 4154.272042] invalid opcode:  [#1] SMP

Aug 18 13:11:38 nodo1 kernel: [ 4154.272046] last sysfs file:
/sys/kernel/dlm/0BB443F896254AD3BA8FB960C425B666/control

Aug 18 13:11:38 nodo1 kernel: [ 4154.272050] CPU 1

Aug 18 13:11:38 nodo1 kernel: [ 4154.272053] Modules linked in:
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
iptable_filter ip_tables x_tables ocfs2 ocfs2_nodemanager quota_tree
ocfs2_stack_user ocfs2_stackglue dlm configfs netbk coretemp blkbk
blkback_pagemap blktap xenbus_be ipmi_si edd dm_round_robin scsi_dh_rdac
dm_multipath scsi_dh bridge stp llc bonding ipv6 fuse ext4 jbd2 crc16 loop
dm_mod sr_mod ide_pci_generic ide_core iTCO_wdt ata_generic ibmpex i5k_amb
ibmaem iTCO_vendor_support ipmi_msghandler bnx2 i5000_edac 8250_pnp shpchp
ata_piix pcspkr ics932s401 joydev edac_core i2c_i801 ses pci_hotplug 8250
i2c_core serio_raw enclosure serial_core button sg reiserfs usbhid hid
uhci_hcd ehci_hcd xenblk cdrom xennet fan processor pata_acpi lpfc thermal
thermal_sys hwmon aacraid [last unloaded: ocfs2_stackglue]

Aug 18 13:11:38 nodo1 kernel: [ 4154.272111] Pid: 8889, comm: dlm_send Not
tainted 2.6.31.12-0.2-xen #1 IBM System x3650 -[7979AC1]-

Aug 18 13:11:38 nodo1 kernel: [ 4154.272113] RIP: e030:[]
[] iput+0x82/0x90

Aug 18 13:11:38 nodo1 kernel: [ 4154.272121] RSP: e02b:88014ec03c30
EFLAGS: 00010246

Aug 18 13:11:38 nodo1 kernel: [ 4154.272122] RAX:  RBX:
880148a703c8 RCX: 

Aug 18 13:11:38 nodo1 kernel: [ 4154.272123] RDX: c901 RSI:
880148a70380 RDI: 880148a703c8

Aug 18 13:11:38 nodo1 kernel: [ 4154.272125] RBP: 88014ec03c50 R08:
b038 R09: fe99594c51a57607

Aug 18 13:11:38 nodo1 kernel: [ 4154.272126] R10: 880040410270 R11:
 R12: 8801713e6e08

Aug 18 13:11:38 nodo1 kernel: 

Re: [Pacemaker] Howto upgrade Pacemaker cluster from Version: 1.0.2 to the last released on clusterlabs

2010-08-30 Thread Andrew Beekhof
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-upgrade-config.html

On Sat, Aug 28, 2010 at 9:34 AM, Roberto Giordani  wrote:
> Hello,
> but How to migrate the entire cluster configuration (resource, nodes,
> stonith)?
> Regards,
> Roberto.
>
> On 08/26/2010 09:40 AM, Andrew Beekhof wrote:
>> On Wed, Aug 18, 2010 at 11:15 PM, Roberto Giordani  
>> wrote:
>>
>>> Hello,
>>> I'd like to know how is it possible to upgrade a running cluster
>>> pacemaker on Opensuse 11.2 version 1.02 to the last available on clusterlabs
>>> using dlm + ocfs2 too
>>>
>> The problem is that the versions of pacemaker on clusterlabs are
>> probably incompatible with your existing dlm and ocfs2 packages.
>> You'd need to rebuild them against the new pacemaker packages.
>>
>>
>>> Could someone explain in some steps how to proceed without loose all the
>>> cluster configuration up and running?
>>>
>> Assuming you have a compatible set of new packages (see above), just
>> do a rolling upgrade.
>>
>> ___
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: 
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>>
>
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] ocf:pacemaker:o2cb Unable to connect to CKPT

2010-08-30 Thread Andrew Beekhof
On Wed, Aug 25, 2010 at 11:05 AM, Michael Schwartzkopff
 wrote:
> Am Mittwoch, den 25.08.2010, 09:43 +0200 schrieb Andrew Beekhof:
>> On Fri, Aug 6, 2010 at 3:33 PM, Michael Fung  wrote:
>> > Hi All,
>> >
>> >
>> > I am still testing with the Debian Squeeze machine.
>> >
>> > Unable to start the RA ocf:pacemaker:o2cb
> (...)
>>
>> No. It just tells corosync to load the extra services like ckpt (part
>> of openais) needed by ocfs2
>
>
> Hi,
>
> how can I tell corosync to load ckpt service?

Add a service block like you do for pacemaker or use the same option as michael

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Howto upgrade Pacemaker cluster from Version: 1.0.2 to the last released on clusterlabs

2010-08-30 Thread Roberto Giordani
Hello,
but How to migrate the entire cluster configuration (resource, nodes,
stonith)?
Regards,
Roberto.

On 08/26/2010 09:40 AM, Andrew Beekhof wrote:
> On Wed, Aug 18, 2010 at 11:15 PM, Roberto Giordani  
> wrote:
>   
>> Hello,
>> I'd like to know how is it possible to upgrade a running cluster
>> pacemaker on Opensuse 11.2 version 1.02 to the last available on clusterlabs
>> using dlm + ocfs2 too
>> 
> The problem is that the versions of pacemaker on clusterlabs are
> probably incompatible with your existing dlm and ocfs2 packages.
> You'd need to rebuild them against the new pacemaker packages.
>
>   
>> Could someone explain in some steps how to proceed without loose all the
>> cluster configuration up and running?
>> 
> Assuming you have a compatible set of new packages (see above), just
> do a rolling upgrade.
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>   


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] how to keep ftp connection when swap from primary to secondary

2010-08-30 Thread Michael Schwartzkopff
Am Donnerstag, den 26.08.2010, 17:17 +0200 schrieb Raoul Bhatia [IPAX]:
> On 08/26/2010 04:42 PM, liang...@asc-csa.gc.ca wrote:
> > I have followed the guide in “Clusters from Scratch” written by Andrew
> > Beekhof and successfully setup an Active/Passive pair of cluster
> > servers. The cluster runs in Fedora 13 and includes services like
> > apache, vsftpd and nfs. Drbd is used to allow data consistence during a
> > failover. Everything works fine except ftp lose its connection when the
> > service swaps from primary to the secondary or vice versa. I know to
> > keep the ftp connection, one may need to keep the connection states for
> > the session across the nodes. But I couldn’t find clue how to do it.
> > Does anyone there have any idea how to keep the ftp connection when
> > swapping nodes, if it is possible?
> 
> hi,
> 
> as of now, we're not syncing our connections between the load
> balancers, but i would suggest
> http://www.linuxvirtualserver.org/docs/sync.html and the like.
> 
> 
> cheers,
> raoul

Even a Load Balancer wouldn't sync the data that the FTP server on the
real servers hold in RAM. You would need a cluster-aware FTP for such
purpose.

On the other hand: How often does a failover happen? Is it really
nescessary to take care for such rare events?

Michael.


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] cluster-dlm: set_fs_notified: set_fs_notified no nodeid 1812048064#012

2010-08-30 Thread Roberto Giordani
Thanks,
who should I contact? Which mailing list?
I've discovered that this problem occours when the port of my switch
where the cluster ring is connected became "blocked" due spanning tree.
I've resolved the bug using for the ring a separate switch without
spanning tre enabled and different subnet.
Is there a configuration to avoid that before the spanning tree
recalculate the route due a failure, the cluster nodes doesn't hang?
The hang occurses on SLES11sp1 too where the servers are up running, the
cluster status is ok, but when try to connect to the server with ssh,
after the login hang the session.

Usually the recalculate takes 50 seconds.

Regards,
Roberto.

On 08/26/2010 10:24 AM, Dejan Muhamedagic wrote:
> Hi,
>
> On Thu, Aug 26, 2010 at 09:36:10AM +0200, Andrew Beekhof wrote:
>   
>> On Wed, Aug 18, 2010 at 6:24 PM, Roberto Giordani  
>> wrote:
>> 
>>> Hello,
>>> I'll explain what’s happened after a network black-out
>>> I've a cluster with pacemaker on Opensuse 11.2 64bit
>>> 
>>> Last updated: Wed Aug 18 18:13:33 2010
>>> Current DC: nodo1 (nodo1)
>>> Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
>>> 3 Nodes configured.
>>> 11 Resources configured.
>>> 
>>>
>>> Node: nodo1 (nodo1): online
>>> Node: nodo3 (nodo3): online
>>> Node: nodo4 (nodo4): online
>>>
>>> Clone Set: dlm-clone
>>> dlm:0   (ocf::pacemaker:controld):  Started nodo3
>>> dlm:1   (ocf::pacemaker:controld):  Started nodo1
>>> dlm:2   (ocf::pacemaker:controld):  Started nodo4
>>> Clone Set: o2cb-clone
>>> o2cb:0  (ocf::ocfs2:o2cb):  Started nodo3
>>> o2cb:1  (ocf::ocfs2:o2cb):  Started nodo1
>>> o2cb:2  (ocf::ocfs2:o2cb):  Started nodo4
>>> Clone Set: XencfgFS-Clone
>>> XencfgFS:0  (ocf::heartbeat:Filesystem):Started nodo3
>>> XencfgFS:1  (ocf::heartbeat:Filesystem):Started nodo1
>>> XencfgFS:2  (ocf::heartbeat:Filesystem):Started nodo4
>>> Clone Set: XenimageFS-Clone
>>> XenimageFS:0(ocf::heartbeat:Filesystem):Started nodo3
>>> XenimageFS:1(ocf::heartbeat:Filesystem):Started nodo1
>>> XenimageFS:2(ocf::heartbeat:Filesystem):Started nodo4
>>> rsa1-fencing(stonith:external/ibmrsa-telnet):   Started nodo4
>>> rsa2-fencing(stonith:external/ibmrsa-telnet):   Started nodo3
>>> rsa3-fencing(stonith:external/ibmrsa-telnet):   Started nodo4
>>> rsa4-fencing(stonith:external/ibmrsa-telnet):   Started nodo3
>>> mailsrv-rm  (ocf::heartbeat:Xen):   Started nodo3
>>> dbsrv-rm(ocf::heartbeat:Xen):   Started nodo4
>>> websrv-rm   (ocf::heartbeat:Xen):   Started nodo4
>>>
>>> After a  switch failure all the nodes and the rsa stonith devices was
>>> unreachable.
>>>
>>> On the cluster happen the following error on one node
>>>
>>> Aug 18 13:11:38 nodo1 cluster-dlm: receive_plocks_stored:
>>> receive_plocks_stored 1778493632:2 need_plocks 0#012
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272025] [ cut here
>>> ]
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272036] kernel BUG at
>>> /usr/src/packages/BUILD/kernel-xen-2.6.31.12/linux-2.6.31/fs/inode.c:1323!
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272042] invalid opcode:  [#1] SMP
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272046] last sysfs file:
>>> /sys/kernel/dlm/0BB443F896254AD3BA8FB960C425B666/control
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272050] CPU 1
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272053] Modules linked in:
>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev
>>> iptable_filter ip_tables x_tables ocfs2 ocfs2_nodemanager quota_tree
>>> ocfs2_stack_user ocfs2_stackglue dlm configfs netbk coretemp blkbk
>>> blkback_pagemap blktap xenbus_be ipmi_si edd dm_round_robin scsi_dh_rdac
>>> dm_multipath scsi_dh bridge stp llc bonding ipv6 fuse ext4 jbd2 crc16 loop
>>> dm_mod sr_mod ide_pci_generic ide_core iTCO_wdt ata_generic ibmpex i5k_amb
>>> ibmaem iTCO_vendor_support ipmi_msghandler bnx2 i5000_edac 8250_pnp shpchp
>>> ata_piix pcspkr ics932s401 joydev edac_core i2c_i801 ses pci_hotplug 8250
>>> i2c_core serio_raw enclosure serial_core button sg reiserfs usbhid hid
>>> uhci_hcd ehci_hcd xenblk cdrom xennet fan processor pata_acpi lpfc thermal
>>> thermal_sys hwmon aacraid [last unloaded: ocfs2_stackglue]
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272111] Pid: 8889, comm: dlm_send Not
>>> tainted 2.6.31.12-0.2-xen #1 IBM System x3650 -[7979AC1]-
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272113] RIP: e030:[]
>>> [] iput+0x82/0x90
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272121] RSP: e02b:88014ec03c30
>>> EFLAGS: 00010246
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272122] RAX:  RBX:
>>> 880148a703c8 RCX: 
>>>
>>> Aug 18 13:11:38 nodo1 kernel: [ 4154.272123] RDX: c901 RSI:
>>> 880148a70380 RDI: 880148a703c8
>>>
>>> Aug 18 13:11:38 nodo1 

Re: [Pacemaker] ocf:pacemaker:o2cb Unable to connect to CKPT

2010-08-30 Thread Michael Schwartzkopff
Am Mittwoch, den 25.08.2010, 09:43 +0200 schrieb Andrew Beekhof:
> On Fri, Aug 6, 2010 at 3:33 PM, Michael Fung  wrote:
> > Hi All,
> >
> >
> > I am still testing with the Debian Squeeze machine.
> >
> > Unable to start the RA ocf:pacemaker:o2cb
(...)
> 
> No. It just tells corosync to load the extra services like ckpt (part
> of openais) needed by ocfs2


Hi,

how can I tell corosync to load ckpt service?

Thanks.

> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Quorum disk?

2010-08-30 Thread Michael Schwartzkopff
Am Mittwoch, den 25.08.2010, 17:01 -0400 schrieb Ciro Iriarte:
> Hi, I'm planning to use OpanAIS+Pacemaker on SLES11-SP1 and would like
> to know if it's possible to use a quorum disk in a two-node cluster.
> The idea is to avoid adding a third node just for quorum...
> 
> Regards,

Hi,

you could have a look at the sfex resource agent.

Greetings,

Michael Schwartzkopff


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] drbd diskless -> failover to other node

2010-08-30 Thread jimbob palmer
>> Are you saying that if a server loses its disk, it will transparently
>> write to the secondary server without any need to failover at all?
>
> Yes. As long as it still has a network connection to the peer, of course.
>
>> WOW. I never knew DRBD did this. This is a _fantastic_ feature :)
>
> Well, that's what diskless mode is really all about.
> http://www.drbd.org/users-guide/s-handling-disk-errors.html

A final question: does DRBD switch to Protocol C in diskless mode, or
does it stay with the configured Protocol? If it doesn't switch, can
it be configured to?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker