[DRBD-user] DRBD Diskless

2010-02-09 Thread Alan Cowes


Every time I restart both cluster nodes, one comes back up as Secondary and the
other comes back as Diskless (there is only one resource, /home).


To fix it, I have to manually stop DRBD and Heartbeat, start both services
again, and promote the Secondary to Primary.


Why is this happening?


If you need any extra information, please let me know.




drbd-user mailing list

Re: [DRBD-user] drbd fencing policy problem, upgrading from 8.2.7 -> 8.3.6

2010-02-09 Thread Petrakis, Peter

That bitmap adjustment did the trick; resource-only fencing is working
now. I'll look into the new metadata tools; the less of this stuff we
have to maintain, the better. Thanks!


> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Saturday, February 06, 2010 1:12 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] drbd fencing policy problem, upgrading from
> 8.2.7 -> 8.3.6
> On Fri, Feb 05, 2010 at 12:46:24PM -0500, Petrakis, Peter wrote:
> > Hi All,
> >
> > We currently use resource-only fencing, which is causing us a problem
> > when we bring up a cluster for the first time. This all worked
> > well with 8.2.7. We only have one node active at this time, and the
> > fencing handler is bailing out with '5', which is correct. The problem
> > is that this failure return is stopping us from becoming Primary.
> >
> >
> > block drbd5: helper command: /usr/lib/spine/bin/avance_drbd_helper fence-peer minor-5 exit code 5 (0x500)
> > block drbd5: State change failed: Refusing to be Primary without at least one UpToDate disk
> > block drbd5:   state = { cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown r--- }
> > block drbd5:  wanted = { cs:StandAlone ro:Primary/Unknown ds:Consistent/DUnknown r--- }
> I'm quoting our internal bugzilla here:
> scenario:
>   A (Primary) --- b (Secondary)
>   A (Primary)     b (down, crashed; does not know that it is out of date)
>   a (down)
>      clean shutdown or crashed, does not matter, knows b is Outdated...
>   a (still down)  b (reboot)
>   after init dead, heartbeat tries to promote
> current behaviour:
> b (Secondary, Consistent, pdsk DUnknown)
> calls dopd, times out, considers other node as outdated,
> and goes Primary.
> proposal:
> only try to outdate peer if local data is UpToDate.
>   - works for primary crash (secondary data is UpToDate)
>   - works for secondary crash (naturally)
>   - works for primary reboot while secondary offline,
>     because pdsk is marked Outdated in local meta data,
>     so we can become UpToDate on restart.
>   - additionally works for previously described scenario
>     because crashed secondary has pdsk Unknown,
>     thus comes up as Consistent (not UpToDate)
> avoids one more way to create diverging data sets.
> if you want to force Consistent to be UpToDate,
> you'd then need to either fiddle with drbdmeta,
> or maybe just the good old "--overwrite-data-of-peer" does the trick?
> does not work for a cluster crash,
> when only the former secondary comes up.
> --- Comment #1 From Florian Haas 2009-07-03 09:07:57
> Bumping severity to "normal" and setting target milestone to 8.3.3.
> This is not an enhancement feature, it's a real (however subtle) bug.
> This tends to break dual-Primary setups with fencing, such as with GFS:
> - Dual Primary configuration, GFS and CMAN running.
> - Replication link goes away.
> - Both nodes are now "degraded clusters".
> - GFS/CMAN initiates fencing, one node gets rebooted.
> - Node reboots, link is still down. Since we were "degraded clusters"
>   to begin with, degr-wfc-timeout now applies, which is finite by default.
> - After the timeout expires, the recovered node is now Consistent and
>   attempts to fence the peer, when it should not.
> - Since the network link is still down, fencing fails, but we now
>   assume the peer is dead, and the node becomes Primary anyway.
> - We have split brain, diverging datasets, all our fencing precautions
>   are moot.
> --- Comment #3 From Philipp Reisner 2009-08-25 14:42:58
> Proposed solution:
> Just to recap, the expected exit codes of the fence-peer handler:
> 3  Peer's disk state was already Inconsistent.
> 4  Peer's disk state was successfully set to Outdated (or was Outdated
>    to begin with).
> 5  Connection to the peer node failed, peer could not be reached.
> 6  Peer refused to be outdated because the affected resource was in
>    primary role.
> 7  Peer node was successfully fenced off the cluster. This should
>    never occur unless fencing is set to resource-and-stonith for the
>    affected resource.
> Now, if we get a 5 (peer not reachable) and we are not UpToDate (that
> means we are only Consistent), then refuse to become primary and do not
> consider the peer as outdated.
> The change in DRBD is minimal, but we then also need to implement exit
> code 5 in crm-fence-peer.sh.
> --- Comment #4 From Philipp Reisner 2009-08-26 19:15:33
> commit 3a26bafa2e27892c9e157720525a094b55748f09
> Author: Philipp Reisner 
> Date:   Tue Aug 25 14:49:19 2009 +0200
>     Do not consider peer dead when fence-peer handler can not reach it
>     and we are < UpToDate
> --- Comment #5 From Philipp Reisner 2009-10-21 13:26:58
> Released with
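
The rule from Comment #3 can be sketched as a small Python function. This is a simplified, hypothetical model, not DRBD's kernel code: the names FencePeerExit and may_become_primary are made up for illustration, the meaning of exit codes 3 to 7 is taken from the list above, and disk states are the plain strings seen in the kernel log ("UpToDate", "Consistent", ...). It ignores the rest of DRBD's state machine.

```python
from enum import IntEnum


class FencePeerExit(IntEnum):
    """Hypothetical names for the fence-peer handler exit codes 3-7."""
    PEER_INCONSISTENT = 3  # peer's disk state was already Inconsistent
    PEER_OUTDATED = 4      # peer set to Outdated (or was Outdated already)
    PEER_UNREACHABLE = 5   # connection to the peer node failed
    PEER_IS_PRIMARY = 6    # peer refused: resource is Primary there
    PEER_FENCED = 7        # peer node fenced off the cluster (STONITH)


def may_become_primary(exit_code: int, local_disk: str) -> bool:
    """Simplified model of the post-fix rule; not DRBD's actual code.

    Raises ValueError for exit codes outside 3-7.
    """
    code = FencePeerExit(exit_code)
    if code is FencePeerExit.PEER_UNREACHABLE:
        # The fix: only treat an unreachable peer as dead (and thus as
        # outdated) when our own data is UpToDate. Before the fix, a
        # merely Consistent node went Primary here (the GFS scenario).
        return local_disk == "UpToDate"
    if code is FencePeerExit.PEER_IS_PRIMARY:
        # The peer holds the Primary role, so we stay Secondary.
        return False
    # Codes 3, 4 and 7: the peer's data is known to be Inconsistent or
    # Outdated, or the peer was fenced, so promotion may proceed here.
    return True


if __name__ == "__main__":
    # Situation from Peter's log: handler exit code 5, local disk Consistent.
    print(may_become_primary(5, "Consistent"))  # False
```

With Peter's log above (exit code 5, local disk Consistent), the sketch refuses promotion, which matches the "Refusing to be Primary without at least one UpToDate disk" message.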

[DRBD-user] (no subject)

2010-02-09 Thread Petrakis, Peter


[DRBD-user] drbd with 3 nodes - please help a newbie

2010-02-09 Thread Muhammad Sharfuddin

I am running a two-node (node1, node2) active/passive (standby) Oracle
cluster with Linux-HA. Oracle is installed in /oracle, which is an ext3
filesystem on a SAN LUN.

At any given time, all of the resources (IP, Filesystem, and Oracle) run
either on node1 or on node2.

Now, to set up disaster recovery (DR), I want to implement DRBD in such a
way that both cluster nodes (node1 and node2) keep mounting the same
disk/device (the SAN disk), while a third machine (node3), which should
not be part of the Linux-HA cluster, acts as the standby Oracle machine
with its own separate disk.

So my DRBD configuration should look like this:

1 - /oracle is mounted on /dev/drbd0 (backed by /dev/sdb1, a SAN disk/LUN),
mounted by either node1 or node2.

2 - /dev/drbd0 (backed by /dev/sdc1) on node3 is the other DRBD device.

Is this possible?
Any help, documentation, or URLs would be highly appreciated.



Re: [DRBD-user] LVM crash maybe due to a drbd issue (Maxence DUNNEWIND)

2010-02-09 Thread Heribert Tockner
Thanks for your answer, Lars; now I can give you some updates.

> it will be used:
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by r...@vserver3-backup, 2010-02-08 10:59:23

> > The problem appears over time, not only under load and requests. I ran
> > tests with dbench and recursive parallel processes against the Apache
> > installed on different sites on the Vserver in DRBD. This produces a
> > load of about 70 and more.
> > There were no problems. After 5 days the system hangs with DRBD. I'll
> > let you know about my tests.
> Are you sure it would not hang without DRBD?

Yes. Without DRBD the systems have no problems.
> Funny memleaks somewhere?
I have run some memory tests (memtest). Also, there are other production
standalone VMware guests running without any problems.

> TCP stack mistuned to break?
> Does it also hang with DRBD unconnected?

> What happens just before the hang?
> If you can reproduce with,
> please try to also reproduce with something closer to kernel.org,
> just so we can rule out any strange side effects there.
> Good luck.
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed

Yesterday I installed Linux-VServer without grsecurity, and DRBD 8.3.7
from source.

Last night the secondary node became disconnected, with the following log
on the primary server:

Feb  8 21:18:04 vserver3-backup kernel: [40842.249486] block drbd1: PingAck did not arrive in time.
Feb  8 21:18:04 vserver3-backup kernel: [40842.249678] block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb  8 21:18:04 vserver3-backup kernel: [40842.249700] block drbd1: asender
Feb  8 21:18:04 vserver3-backup kernel: [40842.249702] block drbd1: Terminating asender thread
Feb  8 21:18:04 vserver3-backup kernel: [40842.249780] block drbd1: short read expecting header on sock: r=-512
Feb  8 21:18:04 vserver3-backup kernel: [40842.249988] block drbd1: Creating new current UUID
Feb  8 21:18:04 vserver3-backup kernel: [40842.250522] block drbd1: Connection closed
Feb  8 21:18:04 vserver3-backup kernel: [40842.250531] block drbd1: conn( NetworkFailure -> Unconnected )
Feb  8 21:18:04 vserver3-backup kernel: [40842.250535] block drbd1: receiver
Feb  8 21:18:04 vserver3-backup kernel: [40842.250537] block drbd1: Restarting receiver thread
Feb  8 21:18:04 vserver3-backup kernel: [40842.250541] block drbd1: receiver
Feb  8 21:18:04 vserver3-backup kernel: [40842.250546] block drbd1: conn( Unconnected -> WFConnection )
Feb  8 21:18:11 vserver3-backup kernel: [40849.221564] block drbd0: PingAck did not arrive in time.
Feb  8 21:18:11 vserver3-backup kernel: [40849.221658] block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb  8 21:18:11 vserver3-backup kernel: [40849.221675] block drbd0: asender
Feb  8 21:18:11 vserver3-backup kernel: [40849.221701] block drbd0: Terminating asender thread
Feb  8 21:18:11 vserver3-backup kernel: [40849.221746] block drbd0: short read expecting header on sock: r=-512
Feb  8 21:18:11 vserver3-backup kernel: [40849.221903] block drbd0: Creating new current UUID
Feb  8 21:18:11 vserver3-backup kernel: [40849.222606] block drbd0: Connection closed
Feb  8 21:18:11 vserver3-backup kernel: [40849.222618] block drbd0: conn( NetworkFailure -> Unconnected )
Feb  8 21:18:11 vserver3-backup kernel: [40849.222623] block drbd0: receiver
Feb  8 21:18:11 vserver3-backup kernel: [40849.222628] block drbd0: Restarting receiver thread
Feb  8 21:18:11 vserver3-backup kernel: [40849.222633] block drbd0: receiver
Feb  8 21:18:11 vserver3-backup kernel: [40849.222640] block drbd0: conn( Unconnected -> WFConnection )

After this, the secondary server was not reachable over the network.
Log from the secondary node before the network crash:

Feb  8 21:08:52 vserver3-produktiv ntpd[2008]: synchronized to, stratum 3
Feb  8 21:17:01 vserver3-produktiv /USR/SBIN/CRON[6004]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

The server was not reachable for 30 minutes; after that, pings reached the
machine again (according to Nagios).

Today I checked the network connection between the hosts. Pings from the
primary reached the secondary. The secondary system behaved very strangely:
one ping returned, and afterwards it could only be interrupted with Control-C.

I tried to disable the network interface used by DRBD with ifconfig eth1 down.
It didn't seem to work, and I interrupted it with Control-C.
Only a reboot of the secondary system makes