[DRBD-user] DRBD Diskless
Every time I restart both cluster nodes, one comes back up as Secondary and the other comes back as Diskless (there is only one resource, which holds the /home directory). All I have to do (manually) is stop DRBD and Heartbeat, start both services again, and promote the Secondary to Primary. Why is this happening? If any extra information would help, please let me know. Thanks!
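For anyone trying to reproduce or diagnose this, here is a minimal sketch of the state checks and of the manual recovery sequence described above. It assumes the DRBD 8.x userland (drbdadm), SysV init scripts for drbd and heartbeat at the usual /etc/init.d paths, and a hypothetical resource name "r0"; adjust for your distribution:

  # Inspect connection state, roles and disk state first
  cat /proc/drbd
  drbdadm cstate r0    # e.g. Connected / WFConnection / StandAlone
  drbdadm role r0      # e.g. Secondary/Secondary
  drbdadm dstate r0    # Diskless/UpToDate would match the symptom above

  # Manual recovery as described in the post (on the affected node)
  /etc/init.d/heartbeat stop
  /etc/init.d/drbd stop
  /etc/init.d/drbd start
  /etc/init.d/heartbeat start

  # If the local disk stays Diskless, try re-attaching the backing device
  drbdadm attach r0

  # Promote whichever node should serve /home
  drbdadm primary r0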
Re: [DRBD-user] drbd fencing policy problem, upgrading from 8.2.7 -> 8.3.6
Lars,

That bitmap adjustment did the trick, the resource-only fencing is working
now. I'll investigate the new metadata tools, the less of this stuff we have
to maintain the better. Thanks!

Peter

> -----Original Message-----
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Saturday, February 06, 2010 1:12 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] drbd fencing policy problem, upgrading from
> 8.2.7 -> 8.3.6
>
> On Fri, Feb 05, 2010 at 12:46:24PM -0500, Petrakis, Peter wrote:
> > Hi All,
> >
> > We use resource-only fencing currently which is causing us a problem
> > when we're bringing up a cluster for the first time. This all worked
> > well with 8.2.7. We only have one node active at this time and the
> > fencing handler is bailing with '5' which is correct. The problem is,
> > this return failure is stopping us from becoming Primary.
> >
> > block drbd5: helper command: /usr/lib/spine/bin/avance_drbd_helper
> > fence-peer minor-5 exit code 5 (0x500)
> > block drbd5: State change failed: Refusing to be Primary without at
> > least one UpToDate disk
> > block drbd5: state = { cs:StandAlone ro:Secondary/Unknown
> > ds:Consistent/DUnknown r--- }
> > block drbd5: wanted = { cs:StandAlone ro:Primary/Unknown
> > ds:Consistent/DUnknown r--- }
>
> I'm quoting our internal bugzilla here:
>
> scenario:
> A (Primary) --- b (Secondary)
> A (Primary)     b (down, crashed; does not know that it is out of date)
> a (down)
>   clean shutdown or crashed, does not matter, knows b is Outdated...
>
> a (still down)  b (reboot)
> after init dead, heartbeat tries to promote
>
> current behaviour:
> b (Secondary, Consistent, pdsk DUnknown)
> calls dopd, times out, considers other node as outdated,
> and goes Primary.
>
> proposal:
> only try to outdate peer if local data is UpToDate.
> - works for primary crash (secondary data is UpToDate)
> - works for secondary crash (naturally)
> - works for primary reboot while secondary offline,
>   because pdsk is marked Outdated in local meta data,
>   so we can become UpToDate on restart.
> - additionally works for previously described scenario
>   because crashed secondary has pdsk Unknown,
>   thus comes up as Consistent (not UpToDate)
>
> avoids one more way to create diverging data sets.
>
> if you want to force Consistent to be UpToDate,
> you'd then need either fiddle with drbdmeta,
> or maybe just the good old "--overwrite-data-of-peer" does the trick?
>
> does not work for cluster crash,
> when only former secondary comes up.
>
> --- Comment #1 From Florian Haas 2009-07-03 09:07:57 [reply] ---
>
> Bumping severity to "normal" and setting target milestone to 8.3.3.
>
> This is not an enhancement feature, it's a real (however subtle) bug.
> This tends to break dual-Primary setups with fencing, such as with GFS:
>
> - Dual Primary configuration, GFS and CMAN running.
> - Replication link goes away.
> - Both nodes are now "degraded clusters".
> - GFS/CMAN initiates fencing, one node gets rebooted.
> - Node reboots, link is still down. Since we were "degraded clusters"
>   to begin with, degr-wfc-timeout now applies, which is finite by default.
> - After the timeout expires, the recovered node is now Consistent and
>   attempts to fence the peer, when it should not.
> - Since the network link is still down, fencing fails, but we now
>   assume the peer is dead, and the node becomes Primary anyway.
> - We have split brain, diverging datasets, all our fencing precautions
>   are moot.
> --- Comment #3 From Philipp Reisner 2009-08-25 14:42:58 [reply] ---
>
> Proposed solution:
>
> Just to recap, the expected exit codes of the fence-peer handler:
>
> 3  Peer's disk state was already Inconsistent.
> 4  Peer's disk state was successfully set to Outdated (or was Outdated
>    to begin with).
> 5  Connection to the peer node failed, peer could not be reached.
> 6  Peer refused to be outdated because the affected resource was in the
>    primary role.
> 7  Peer node was successfully fenced off the cluster. This should never
>    occur unless fencing is set to resource-and-stonith for the affected
>    resource.
>
> Now, if we get a 5 (peer not reachable) and we are not UpToDate (that
> means we are only Consistent), then refuse to become primary and do not
> consider the peer as outdated.
>
> The change in DRBD is minimal, but we then need to implement exit code 5
> also in crm-fence-peer.sh
>
> --- Comment #4 From Philipp Reisner 2009-08-26 19:15:33 [reply] ---
>
> commit 3a26bafa2e27892c9e157720525a094b55748f09
> Author: Philipp Reisner
> Date:   Tue Aug 25 14:49:19 2009 +0200
>
>     Do not consider peer dead when fence-peer handler can not reach it
>     and we are < UpToDate
>
> --- Comment #5 From Philipp Reisner 2009-10-21 13:26:58 [reply] ---
>
> Released with
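To make the exit-code contract recapped above concrete, here is a minimal sketch of a fence-peer handler and the drbd.conf stanza that would call it. The script path, the ping/ssh reachability check and the hard-coded peer name are assumptions for illustration only; this is not the logic of avance_drbd_helper or crm-fence-peer.sh, and 8.3-style config syntax is assumed:

  # drbd.conf excerpt (assumed 8.3-style syntax)
  resource r0 {
    disk {
      fencing resource-only;
    }
    handlers {
      fence-peer "/usr/local/sbin/my-fence-peer.sh";   # hypothetical path
    }
  }

  #!/bin/sh
  # /usr/local/sbin/my-fence-peer.sh -- hypothetical sketch only.
  # DRBD invokes the handler with DRBD_RESOURCE set in the environment.
  RES="$DRBD_RESOURCE"
  PEER="peer-node"        # placeholder; a real handler discovers the peer itself

  # Peer unreachable -> exit 5.  With the 8.3.3+ change discussed above, a
  # node that is merely Consistent will then refuse to become Primary instead
  # of assuming the unreachable peer is outdated.
  ping -c 2 -w 5 "$PEER" >/dev/null 2>&1 || exit 5

  # Ask the peer to mark its copy of the resource Outdated.
  if ssh "$PEER" drbdadm outdate "$RES"; then
      exit 4              # peer is now (or already was) Outdated
  fi

  # Peer refused, typically because the resource is Primary there.
  exit 6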
[DRBD-user] drbd with 3 nodes - please help a newbie
Hello,

I am running a two-node (node1, node2) active/passive (standby) Oracle
cluster via Linux-HA. Oracle is installed on /oracle, and /oracle is an ext3
filesystem on a SAN LUN. At any given time, all of the resources (IP,
filesystem, and Oracle) are on either node1 or node2.

Now, for disaster recovery, I want to implement DRBD, but in such a way that
both cluster nodes (node1 and node2) keep mounting the same SAN disk/device,
while another machine (node3), which should not be part of the Linux-HA
cluster, acts as the standby Oracle machine with its own, separate disk.

So my DRBD configuration should be like this:
1 - /oracle is mounted on /dev/drbd0 (/dev/sdb1, a SAN disk/LUN) by either
node1 or node2, and
2 - /dev/drbd0 (/dev/sdc1) on node3 is the other side of the DRBD device.

Is this possible? Any help/document/URL will be highly appreciated.

Regards,
--ms
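For what it's worth, the usual way a third, off-cluster replica is set up with DRBD 8.3 is a stacked resource: a lower resource inside the two-node cluster and an upper resource stacked on top of it that replicates to the third node over the cluster's service IP. The sketch below only illustrates that syntax, with made-up hostnames, devices and addresses; it replicates between node1 and node2 at the lower level rather than using a shared SAN LUN as described above, so treat it as a starting point and check the drbd.conf man page and User's Guide for your version:

  resource r0 {                       # lower resource inside the HA cluster
    protocol C;
    on node1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on node2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }

  resource r0-U {                     # upper resource, stacked on r0
    protocol A;                       # asynchronous replication to the DR node
    stacked-on-top-of r0 {
      device    /dev/drbd10;
      address   192.168.42.10:7789;   # cluster service/floating IP
    }
    on node3 {
      device    /dev/drbd10;
      disk      /dev/sdc1;
      address   192.168.42.3:7789;
      meta-disk internal;
    }
  }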
Re: [DRBD-user] LVM crash maybe due to a drbd issue (Maxence DUNNEWIND)
Thanks Lars for your answer; now I can give you some updates.

> it will be used:
> version: 8.3.7 (api:88/proto:86-91)
> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
> r...@vserver3-backup, 2010-02-08 10:59:23
>
> > The problem comes with time, not only with load and requests. I made
> > tests with dbench and recursive parallel processes against the Apache
> > installed on different sites on the VServer on DRBD. This produces a
> > load of about 70 and more. There was no problem. After 5 days the
> > system hangs with DRBD. I will let you know about my tests.
>
> Are you sure it would not hang without DRBD?

Yes. Without DRBD the systems have no problems.

> Funny memleaks somewhere?

I have done some memtests. Also, there are other productive standalone
VMware guests without any problems.

> TCP stack mistuned to break?
> Does it also hang with DRBD unconnected?
>
> What happens just before the hang?
>
> If you can reproduce with 2.6.22.19-grsec2.1.11-vs2.2.0.7,
> please try to also reproduce with something closer to kernel.org,
> just so we can rule out any strange side effects there.
>
> Good luck.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list -- I'm subscribed

Yesterday I installed Linux-VServer 2.6.22.19-vs2.2.0.7 without grsecurity,
and DRBD 8.3.7 built from source. During the night the secondary node became
unconnected, with the following log on the primary server:

Feb 8 21:18:04 vserver3-backup kernel: [40842.249486] block drbd1: PingAck did not arrive in time.
Feb 8 21:18:04 vserver3-backup kernel: [40842.249678] block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 8 21:18:04 vserver3-backup kernel: [40842.249700] block drbd1: asender terminated
Feb 8 21:18:04 vserver3-backup kernel: [40842.249702] block drbd1: Terminating asender thread
Feb 8 21:18:04 vserver3-backup kernel: [40842.249780] block drbd1: short read expecting header on sock: r=-512
Feb 8 21:18:04 vserver3-backup kernel: [40842.249988] block drbd1: Creating new current UUID
Feb 8 21:18:04 vserver3-backup kernel: [40842.250522] block drbd1: Connection closed
Feb 8 21:18:04 vserver3-backup kernel: [40842.250531] block drbd1: conn( NetworkFailure -> Unconnected )
Feb 8 21:18:04 vserver3-backup kernel: [40842.250535] block drbd1: receiver terminated
Feb 8 21:18:04 vserver3-backup kernel: [40842.250537] block drbd1: Restarting receiver thread
Feb 8 21:18:04 vserver3-backup kernel: [40842.250541] block drbd1: receiver (re)started
Feb 8 21:18:04 vserver3-backup kernel: [40842.250546] block drbd1: conn( Unconnected -> WFConnection )
Feb 8 21:18:11 vserver3-backup kernel: [40849.221564] block drbd0: PingAck did not arrive in time.
Feb 8 21:18:11 vserver3-backup kernel: [40849.221658] block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 8 21:18:11 vserver3-backup kernel: [40849.221675] block drbd0: asender terminated
Feb 8 21:18:11 vserver3-backup kernel: [40849.221701] block drbd0: Terminating asender thread
Feb 8 21:18:11 vserver3-backup kernel: [40849.221746] block drbd0: short read expecting header on sock: r=-512
Feb 8 21:18:11 vserver3-backup kernel: [40849.221903] block drbd0: Creating new current UUID
Feb 8 21:18:11 vserver3-backup kernel: [40849.222606] block drbd0: Connection closed
Feb 8 21:18:11 vserver3-backup kernel: [40849.222618] block drbd0: conn( NetworkFailure -> Unconnected )
Feb 8 21:18:11 vserver3-backup kernel: [40849.222623] block drbd0: receiver terminated
Feb 8 21:18:11 vserver3-backup kernel: [40849.222628] block drbd0: Restarting receiver thread
Feb 8 21:18:11 vserver3-backup kernel: [40849.222633] block drbd0: receiver (re)started
Feb 8 21:18:11 vserver3-backup kernel: [40849.222640] block drbd0: conn( Unconnected -> WFConnection )

After this, the secondary server was no longer reachable over the network.

Log from the secondary node before the network crash:

Feb 8 21:08:52 vserver3-produktiv ntpd[2008]: synchronized to 85.233.96.33, stratum 3
Feb 8 21:17:01 vserver3-produktiv /USR/SBIN/CRON[6004]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)

The server was not reachable for 30 minutes; after that, pings reached the
machine again (nagios). Today I checked the network connection between the
hosts. Pings from primary to secondary arrived. The secondary system behaved
very strangely: one ping returned, and afterwards you could only interrupt
with Ctrl-C. I tried to disable the network interface used for DRBD with
"ifconfig eth1 down"; it did not seem to work and I interrupted it with
Ctrl-C. Only a reboot of the secondary system makes
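As a side note on the "PingAck did not arrive in time" messages above: that timeout is governed by the net-section timers in drbd.conf. Below is a hedged sketch of the relevant knobs with illustrative values only; the exact defaults, units and whether raising them is appropriate at all should be verified against the drbd.conf man page for 8.3.7 before changing anything:

  resource r0 {                # hypothetical resource name
    net {
      ping-int     10;         # seconds between keep-alive pings on an idle link
      ping-timeout  5;         # tenths of a second to wait for the PingAck;
                               # exceeding it logs "PingAck did not arrive in time"
      timeout      60;         # tenths of a second to wait for ACK packets
      ko-count      4;         # after this many timeouts, give up on the peer
    }
  }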