[DRBD-user] DRBD Diskless

2010-02-09 Thread Alan Cowes
 

 

Every time I restart both cluster nodes, one comes back up as Secondary and
the other one comes back as Diskless (there is only one resource, which holds
the /home directory).

 

All I have to do (manually) is stop DRBD and Heartbeat, start both
services again, and promote the Secondary to Primary.
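
Roughly, that manual sequence looks like this (a sketch only; the resource
name "home", the init-script paths, and the explicit drbdadm attach/primary
steps are assumptions, adjust to your setup):

  # on both nodes
  /etc/init.d/heartbeat stop
  /etc/init.d/drbd stop
  /etc/init.d/drbd start
  # on the node that still comes up Diskless, re-attach the backing disk
  drbdadm attach home
  # on the node that should run the services, once both disks are UpToDate
  drbdadm primary home
  /etc/init.d/heartbeat start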

 

Why is this thing happening?

 

If you need any extra information, please let me know.

 

Thanks!


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd fencing policy problem, upgrading from 8.2.7 -> 8.3.6

2010-02-09 Thread Petrakis, Peter
Lars,

That bitmap adjustment did the trick; the resource-only fencing
is working now. I'll investigate the new metadata tools; the less
of this stuff we have to maintain, the better. Thanks!

Peter
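
For reference, the resource-only fencing setup discussed here boils down to
a drbd.conf fragment along these lines (a sketch; the handler path is the
one from the log further down, everything else is assumed and should be
checked against drbd.conf(5) for your version):

  resource r0 {
    disk {
      fencing resource-only;          # fencing policy (disk section in 8.3)
    }
    handlers {
      fence-peer "/usr/lib/spine/bin/avance_drbd_helper";
    }
    # device/disk/address sections omitted
  }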

> -Original Message-
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Saturday, February 06, 2010 1:12 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] drbd fencing policy problem, upgrading from
> 8.2.7 -> 8.3.6
> 
> On Fri, Feb 05, 2010 at 12:46:24PM -0500, Petrakis, Peter wrote:
> > Hi All,
> >
> > We use resource-only fencing currently which is causing us a problem
> > when we're bringing up a cluster for the first time. This all worked
> > well with 8.2.7. We only have one node active at this time and the
> > fencing handler is bailing with '5' which is correct. The problem is,
> > this return failure is stopping us from becoming Primary.
> >
> >
> > block drbd5: helper command: /usr/lib/spine/bin/avance_drbd_helper
> > fence-peer minor-5 exit code 5 (0x500)
> > block drbd5: State change failed: Refusing to be Primary without at
> > least one UpToDate disk
> > block drbd5:   state = { cs:StandAlone ro:Secondary/Unknown
> > ds:Consistent/DUnknown r--- }
> > block drbd5:  wanted = { cs:StandAlone ro:Primary/Unknown
> > ds:Consistent/DUnknown r--- }
> 
> 
> I'm quoting our internal bugzilla here:
> 
> scenario:
>   A (Primary) --- b (Secondary)
>   A (Primary)     b (down, crashed; does not know that it is out of date)
>   a (down)
>      clean shutdown or crashed, does not matter, knows b is Outdated...
> 
>   a (still down)  b (reboot)
>   after init dead, heartbeat tries to promote
> 
> current behaviour:
> b (Secondary, Consistent, pdsk DUnknown)
> calls dopd, times out, considers other node as outdated,
> and goes Primary.
> 
> proposal:
> only try to outdate peer if local data is UpToDate.
>   - works for primary crash (secondary data is UpToDate)
>   - works for secondary crash (naturally)
>   - works for primary reboot while secondary offline,
> because pdsk is marked Outdated in local meta data,
> so we can become UpToDate on restart.
>   - additionally works for previously described scenario
> because crashed secondary has pdsk Unknown,
> thus comes up as Consistent (not UpToDate)
> 
> avoids one more way to create diverging data sets.
> 
> if you want to force Consistent to be UpToDate,
> you'd then need to either fiddle with drbdmeta,
> or maybe just the good old "--overwrite-data-of-peer" does the trick?
> 
> does not work for cluster crash,
> when only former secondary comes up.
> 
> 
> --- Comment #1 From Florian Haas 2009-07-03 09:07:57 ---
> 
> Bumping severity to "normal" and setting target milestone to 8.3.3.
> 
> This is not an enhancement feature, it's a real (however subtle) bug. This
> tends to break dual-Primary setups with fencing, such as with GFS:
> 
> - Dual Primary configuration, GFS and CMAN running.
> - Replication link goes away.
> - Both nodes are now "degraded clusters".
> - GFS/CMAN initiates fencing, one node gets rebooted.
> - Node reboots, link is still down. Since we were "degraded clusters" to begin
>   with, degr-wfc-timeout now applies, which is finite by default.
> - After the timeout expires, the recovered node is now Consistent and attempts
>   to fence the peer, when it should not.
> - Since the network link is still down, fencing fails, but we now assume the
>   peer is dead, and the node becomes Primary anyway.
> - We have split brain, diverging datasets, all our fencing precautions
> are moot.
> 
> --- Comment #3 From Philipp Reisner 2009-08-25 14:42:58 ---
> 
> Proposed solution:
> 
> Just to recap, the expected exit codes of the fence-peer handler:
> 
> 3  Peer's disk state was already Inconsistent.
> 4  Peer's disk state was successfully set to Outdated (or was Outdated
>    to begin with).
> 5  Connection to the peer node failed, peer could not be reached.
> 6  Peer refused to be outdated because the affected resource was in the
>    primary role.
> 7  Peer node was successfully fenced off the cluster. This should never
>    occur unless fencing is set to resource-and-stonith for the affected
>    resource.
> 
> Now, if we get a 5 (peer not reachable) and we are not UpToDate (that means
> we are only Consistent), then refuse to become primary and do not consider
> the peer as outdated.
> 
> The change in DRBD is minimal, but we then need to implement exit code 5
> also in crm-fence-peer.sh.
> 
> --- Comment #4 From Philipp Reisner 2009-08-26 19:15:33 ---
> 
> commit 3a26bafa2e27892c9e157720525a094b55748f09
> Author: Philipp Reisner 
> Date:   Tue Aug 25 14:49:19 2009 +0200
> 
> Do not consider peer dead when fence-peer handler can not reach it
> and we are < UpToDate
> 
> --- Comment #5 From Philipp Reisner 2009-10-21 13:26:58 ---
> 
> Released with
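
To make the exit-code contract quoted above concrete, a fence-peer handler
has roughly this shape (a sketch only: the ping check, the ssh transport and
the placeholder peer name are assumptions; DRBD_RESOURCE is set by DRBD when
it invokes the handler):

  #!/bin/sh
  RES="$DRBD_RESOURCE"          # resource name, provided by DRBD
  PEER="peer-host"              # placeholder; derive from your cluster setup

  # peer not reachable at all -> exit 5
  ping -c 1 -w 2 "$PEER" >/dev/null 2>&1 || exit 5

  # ask the peer to mark its copy Outdated -> exit 4 on success
  if ssh "$PEER" drbdadm outdate "$RES"; then
      exit 4
  fi

  # reachable but refused (for instance, it is Primary there) -> exit 6
  exit 6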

[DRBD-user] (no subject)

2010-02-09 Thread Petrakis, Peter

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] drbd with 3 nodes - please help a newbie

2010-02-09 Thread Muhammad Sharfuddin
Hello,

I am running a two-node (node1, node2) active/passive (standby) Oracle
cluster via Linux-HA. Oracle is installed on /oracle, and /oracle is
an ext3 filesystem on a SAN LUN.

At any given time, all of the resources (IP, filesystem, and
Oracle) are either on node1 or on node2.

Now, for disaster recovery, I want to implement DRBD, but in a way that
both cluster nodes (node1 and node2) keep mounting the same
disk/device (the SAN disk), while another machine (node3), which
should not be part of the Linux-HA cluster, will be the standby
Oracle machine with its own separate disk.

So my DRBD configuration should look like this:

1 - /oracle is mounted on /dev/drbd0 (backed by /dev/sdb1, a SAN disk/LUN),
mounted by either node1 or node2,

and

2 - /dev/drbd0 (backed by /dev/sdc1) on node3 will be the DRBD device.


Is this possible?
Any help/document/URL would be highly appreciated.

Regards,
--ms
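
In drbd.conf terms, the layout described above would look roughly like the
sketch below. Because either node1 or node2 may hold the SAN LUN, the
cluster side is expressed here with DRBD's "floating" peer syntax (peers
matched by IP address instead of hostname, see drbd.conf(5)); the addresses
are placeholders and this is not a tested configuration:

  resource oracle {
    device    /dev/drbd0;
    meta-disk internal;

    # cluster side: whichever of node1/node2 owns the SAN LUN and service IP
    floating 192.168.10.10:7788 {
      disk /dev/sdb1;
    }

    # DR side: node3 with its own local disk
    floating 192.168.10.30:7788 {
      disk /dev/sdc1;
    }
  }

The DRBD User's Guide also describes stacked resources as the usual way to
add a third, off-site node; that may be worth comparing against the floating
approach above.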


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] LVM crash maybe due to a drbd issue (Maxence DUNNEWIND)

2010-02-09 Thread Heribert Tockner
Thanks, Lars, for your answer. Now I can give you some updates.




> it will be used:
>
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
r...@vserver3-backup, 2010-02-08 10:59:23


> > The problem comes with time, not only with load and requests. I made
> > tests with dbench and recursive parallel processes on the installed Apache
> > on different sites in the VServer on DRBD. This produces load of about 70
> > and more.
> > There were no problems. After 5 days the system hangs with DRBD. I will let
> > you know about my tests.
>
> Are you sure it would not hang without DRBD?
>

Yes. Without DRBD the systems have no problems.
> Funny memleaks somewhere?
I have done some memtests. Also, there are other productive standalone
VMware guests without any problems.


> TCP stack mistuned to break?
> Does it also hang with DRBD unconnected?
>


> What happens just before the hang?
>
> If you can reproduce with 2.6.22.19-grsec2.1.11-vs2.2.0.7,
> please try to also reproduce with something closer to kernel.org,
> just so we can rule out any strange side effects there.
>
> Good luck.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed
>


Yesterday I installed Linux-VServer 2.6.22.19-vs2.2.0.7 without
grsecurity, and DRBD version 8.3.7 built from source.

Last night the secondary node lost its connection, with the following log on
the primary server:

Feb  8 21:18:04 vserver3-backup kernel: [40842.249486] block drbd1: PingAck
did not arrive in time.
Feb  8 21:18:04 vserver3-backup kernel: [40842.249678] block drbd1: peer(
Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
DUnknown )
Feb  8 21:18:04 vserver3-backup kernel: [40842.249700] block drbd1: asender
terminated
Feb  8 21:18:04 vserver3-backup kernel: [40842.249702] block drbd1:
Terminating asender thread
Feb  8 21:18:04 vserver3-backup kernel: [40842.249780] block drbd1: short
read expecting header on sock: r=-512
Feb  8 21:18:04 vserver3-backup kernel: [40842.249988] block drbd1: Creating
new current UUID
Feb  8 21:18:04 vserver3-backup kernel: [40842.250522] block drbd1:
Connection closed
Feb  8 21:18:04 vserver3-backup kernel: [40842.250531] block drbd1: conn(
NetworkFailure -> Unconnected )
Feb  8 21:18:04 vserver3-backup kernel: [40842.250535] block drbd1: receiver
terminated
Feb  8 21:18:04 vserver3-backup kernel: [40842.250537] block drbd1:
Restarting receiver thread
Feb  8 21:18:04 vserver3-backup kernel: [40842.250541] block drbd1: receiver
(re)started
Feb  8 21:18:04 vserver3-backup kernel: [40842.250546] block drbd1: conn(
Unconnected -> WFConnection )
Feb  8 21:18:11 vserver3-backup kernel: [40849.221564] block drbd0: PingAck
did not arrive in time.
Feb  8 21:18:11 vserver3-backup kernel: [40849.221658] block drbd0: peer(
Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
DUnknown )
Feb  8 21:18:11 vserver3-backup kernel: [40849.221675] block drbd0: asender
terminated
Feb  8 21:18:11 vserver3-backup kernel: [40849.221701] block drbd0:
Terminating asender thread
Feb  8 21:18:11 vserver3-backup kernel: [40849.221746] block drbd0: short
read expecting header on sock: r=-512
Feb  8 21:18:11 vserver3-backup kernel: [40849.221903] block drbd0: Creating
new current UUID
Feb  8 21:18:11 vserver3-backup kernel: [40849.222606] block drbd0:
Connection closed
Feb  8 21:18:11 vserver3-backup kernel: [40849.222618] block drbd0: conn(
NetworkFailure -> Unconnected )
Feb  8 21:18:11 vserver3-backup kernel: [40849.222623] block drbd0: receiver
terminated
Feb  8 21:18:11 vserver3-backup kernel: [40849.222628] block drbd0:
Restarting receiver thread
Feb  8 21:18:11 vserver3-backup kernel: [40849.222633] block drbd0: receiver
(re)started
Feb  8 21:18:11 vserver3-backup kernel: [40849.222640] block drbd0: conn(
Unconnected -> WFConnection )
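
For reference, the "PingAck did not arrive in time" message relates to the
keep-alive settings in the net section of drbd.conf; the values below are the
documented 8.3 defaults, quoted only as a reminder (check drbd.conf(5) for
your build):

  net {
    ping-int     10;   # seconds between DRBD keep-alive pings
    ping-timeout  5;   # wait for the PingAck, in tenths of a second
  }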

After these messages the secondary server was no longer reachable via the
network. Here is the log from the secondary node before the network crash:

Feb  8 21:08:52 vserver3-produktiv ntpd[2008]: synchronized to 85.233.96.33,
stratum 3
Feb  8 21:17:01 vserver3-produktiv /USR/SBIN/CRON[6004]: (root) CMD (   cd /
&& run-parts --report /etc/cron.hourly)

The server was not reachable for 30 minutes; after that, pings reached
the machine again (according to Nagios).

Today I checked the network connection between the hosts. From primary to
secondary, pings arrived, but the secondary system behaved very strangely:
one ping returned, and afterwards the command could only be interrupted with
Ctrl-C.

I tried to disable the network interface used by DRBD with 'ifconfig eth1
down'. It did not seem to work and I interrupted it with Ctrl-C.
Only the reboot of the secondary system makes
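
As an aside, for the "does it also hang with DRBD unconnected?" test it is
usually cleaner to drop the replication link with DRBD's own commands than
to take the interface down (the resource name below is a placeholder):

  drbdadm disconnect r1    # drops the connection; the resource goes StandAlone
  # ... run the workload without replication ...
  drbdadm connect r1       # re-establish the connection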