[DRBD-user] LVM Snapshot caveats

2009-08-11 Thread Petrakis, Peter
Hi All,

http://thread.gmane.org/gmane.comp.linux.drbd/6175

Does this summary still hold true for drbd 8.2.7 and above?
I'm essentially building a storage plugin for XenServer 5.5 and want to
make sure I can still do "on the fly" snapshots.
It seems to me that I'll have to go with LVM on top of DRBD to achieve
this. Thanks.
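
For reference, the LVM-on-top-of-DRBD layering I have in mind would look
roughly like this (device and volume names are made up):

    # make the DRBD device an LVM physical volume (on the Primary)
    pvcreate /dev/drbd0
    vgcreate vg_guest /dev/drbd0
    lvcreate -L 10G -n vm_disk vg_guest
    # "on the fly" snapshot taken above DRBD, so replication keeps running
    lvcreate -s -L 1G -n vm_disk_snap /dev/vg_guest/vm_disk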

Peter
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


[DRBD-user] drbd fencing policy problem, upgrading from 8.2.7 -> 8.3.6

2010-02-05 Thread Petrakis, Peter
Hi All,

We currently use resource-only fencing, which is causing us a problem
when we bring up a cluster for the first time. This all worked
well with 8.2.7. Only one node is active at this time, and the
fencing handler is bailing out with exit code 5, which is correct. The
problem is that this failure return is preventing us from becoming Primary.

 
block drbd5: helper command: /usr/lib/spine/bin/avance_drbd_helper fence-peer minor-5 exit code 5 (0x500)
block drbd5: State change failed: Refusing to be Primary without at least one UpToDate disk
block drbd5:   state = { cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown r--- }
block drbd5:  wanted = { cs:StandAlone ro:Primary/Unknown ds:Consistent/DUnknown r--- }


We set up our metadata in advance, before using drbdsetup to attach the
disk. Here's the metadata:

version "v08";
uuid {
  0x0006; 0x; 0x;
0x;
  flags 0x0011;
}
la-size-sect 2097016;
bm-byte-per-bit 4096;
device-uuid 0x;
bm {
  0x; 0x; 0x;
0x;
  4092 times 0x;
}
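
For reference, this is roughly how such a dump can be produced and
re-applied with drbdmeta (a sketch, assuming internal metadata on
/dev/disk-drbd5; the minor must not be attached when restoring):

    # dump the current on-disk metadata as text
    drbdmeta /dev/drbd5 v08 /dev/disk-drbd5 internal dump-md > md5.txt
    # ...edit uuids/flags in md5.txt as needed...
    # write the edited metadata back
    drbdmeta /dev/drbd5 v08 /dev/disk-drbd5 internal restore-md md5.txt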


The flags we set ought to tell DRBD that our data is UpToDate enough to
become Primary, but they don't seem to matter while fencing is enabled.
The result is the same regardless of whether I specify '-o' when calling
drbdsetup primary.
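
Concretely, the two promotion attempts look like this (as I understand
the 8.3 options; the resource name in the drbdadm form is hypothetical):

    # ordinary promotion, refused here while fencing is enabled
    drbdsetup /dev/drbd5 primary
    # forced promotion; -o is short for --overwrite-data-of-peer
    drbdsetup /dev/drbd5 primary -o
    # the drbdadm equivalent
    drbdadm -- --overwrite-data-of-peer primary r5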

This is how the disk is being set up:

/sbin/drbdsetup /dev/drbd5 disk /dev/disk-drbd5 /dev/disk-drbd5 internal \
    --set-defaults --create-device --on-io-error=detach -f resource-only

Thanks in advance.

Peter


Re: [DRBD-user] drbd fencing policy problem, upgrading from 8.2.7 -> 8.3.6

2010-02-09 Thread Petrakis, Peter
Lars,

That bitmap adjustment did the trick; resource-only fencing
is working now. I'll investigate the new metadata tools; the less
of this stuff we have to maintain, the better. Thanks!
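
For anyone following along, a minimal fence-peer handler skeleton built
around the exit codes Lars quotes below might look like this (peer name
and transport are hypothetical; DRBD exports DRBD_RESOURCE to handlers):

    #!/bin/sh
    # fence-peer handler sketch; exit codes as listed below
    PEER=node1   # hypothetical peer hostname

    # is the peer reachable at all?
    if ! ping -c 2 -w 2 "$PEER" >/dev/null 2>&1; then
        exit 5   # connection to the peer node failed
    fi

    # ask the peer to outdate its copy (ssh stands in for dopd here)
    if ssh "$PEER" "drbdadm outdate $DRBD_RESOURCE"; then
        exit 4   # peer's disk state successfully set to Outdated
    fi

    exit 6       # peer refused, e.g. the resource is Primary there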

Peter

> -----Original Message-----
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Saturday, February 06, 2010 1:12 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] drbd fencing policy problem, upgrading from
> 8.2.7 -> 8.3.6
> 
> On Fri, Feb 05, 2010 at 12:46:24PM -0500, Petrakis, Peter wrote:
> > Hi All,
> >
> > We use resource-only fencing currently which is causing us a problem
> > when we're bringing up a cluster for the first time. This all worked
> > well with 8.2.7. We only have one node active at this time and the
> > fencing handler is bailing with '5' which is correct. The problem is,
> > this return failure is stopping us from becoming Primary.
> >
> >
> > block drbd5: helper command: /usr/lib/spine/bin/avance_drbd_helper fence-peer minor-5 exit code 5 (0x500)
> > block drbd5: State change failed: Refusing to be Primary without at least one UpToDate disk
> > block drbd5:   state = { cs:StandAlone ro:Secondary/Unknown ds:Consistent/DUnknown r--- }
> > block drbd5:  wanted = { cs:StandAlone ro:Primary/Unknown ds:Consistent/DUnknown r--- }
> 
> 
> I'm quoting our internal bugzilla here:
> 
> scenario:
>   A (Primary) --- b (Secondary)
>   A (Primary) b (down, crashed; does not know that it is out of date)
>   a (down)
>clean shutdown or crashed, does not matter, knows b is Outdated...
> 
>   a (still down)  b (reboot)
>   after init dead, heartbeat tries to promote
> 
> current behaviour:
> b (Secondary, Consistent, pdsk DUnknown)
> calls dopd, times out, considers other node as outdated,
> and goes Primary.
> 
> proposal:
> only try to outdate peer if local data is UpToDate.
>   - works for primary crash (secondary data is UpToDate)
>   - works for secondary crash (naturally)
>   - works for primary reboot while secondary offline,
> because pdsk is marked Outdated in local meta data,
> so we can become UpToDate on restart.
>   - additionally works for previously described scenario
> because crashed secondary has pdsk Unknown,
> thus comes up as Consistent (not UpToDate)
> 
> avoids one more way to create diverging data sets.
> 
> if you want to force Consistent to be UpToDate,
> you'd then need to either fiddle with drbdmeta,
> or maybe just the good old "--overwrite-data-of-peer" does the trick?
> 
> does not work for cluster crash,
> when only former secondary comes up.
> 
> 
> --- Comment #1 From Florian Haas 2009-07-03 09:07:57 ---
> 
> Bumping severity to "normal" and setting target milestone to 8.3.3.
> 
> This is not an enhancement feature, it's a real (however subtle) bug.
> This tends to break dual-Primary setups with fencing, such as with GFS:
> 
> - Dual Primary configuration, GFS and CMAN running.
> - Replication link goes away.
> - Both nodes are now "degraded clusters".
> - GFS/CMAN initiates fencing, one node gets rebooted.
> - Node reboots, link is still down. Since we were "degraded clusters"
> to begin
>   with, degr-wfc-timeout now applies, which is finite by default.
> - After the timeout expires, the recovered node is now Consistent and
> attempts
>   to fence the peer, when it should not.
> - Since the network link is still down, fencing fails, but we now
> assume the
>   peer is dead, and the node becomes Primary anyway.
> - We have split brain, diverging datasets, all our fencing precautions
> are moot.
> 
> --- Comment #3 From Philipp Reisner 2009-08-25 14:42:58 ---
> 
> Proposed solution:
> 
> Just to recap, the expected exit codes of the fence-peer handler:
> 
> 3  Peer's disk state was already Inconsistent.
> 4  Peer's disk state was successfully set to Outdated (or was Outdated
>    to begin with).
> 5  Connection to the peer node failed, peer could not be reached.
> 6  Peer refused to be outdated because the affected resource was in the
>    primary role.
> 7  Peer node was successfully fenced off the cluster. This should never
>    occur unless fencing is set to resource-and-stonith for the affected
>    resource.
> 
> Now, if we get a 5 (peer not reachable) and we are not UpToDate (that
> means we are only Consistent), then refuse to become primary and do not
> consider the peer as outdated.
> 
> The change in DRBD

[DRBD-user] drbd-8.3.6 pdsk: UpToDate->Inconsistent but is really UpToDate after resync

2010-02-10 Thread Petrakis, Peter
Hi All,

We're encountering a resync problem with 8.3.6 where, after we resync,
the target node transitions to UpToDate, which the peer sees, but then
another state transition happens that claims the pdsk state went UpToDate
-> Inconsistent. The circumstances surrounding the fault: we lost
connectivity to our peer, which was then rebooted, after which point the
resync began.
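
To isolate those transitions, something like this against the kernel log
does the job (a sketch; the log path varies by distro):

    # pull the pdsk/disk state transitions for minor 16 out of the kernel log
    grep 'block drbd16' /var/log/messages | grep -E 'pdsk\(|disk\('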

Here's the config (same for all resources):
/sbin/drbdsetup /dev/drbd16 show

disk {
    size                0s _is_default; # bytes
    on-io-error         detach;
    fencing             dont-care _is_default;
    max-bio-bvecs       0 _is_default;
}
net {
    timeout             60 _is_default; # 1/10 seconds
    max-epoch-size      2048 _is_default;
    max-buffers         2048 _is_default;
    unplug-watermark    128 _is_default;
    connect-int         10 _is_default; # seconds
    ping-int            10 _is_default; # seconds
    sndbuf-size         0 _is_default; # bytes
    rcvbuf-size         0 _is_default; # bytes
    ko-count            2;
    allow-two-primaries;
    after-sb-0pri       discard-zero-changes;
    after-sb-1pri       violently-as0p;
    after-sb-2pri       violently-as0p;
    rr-conflict         violently;
    ping-timeout        20; # 1/10 seconds
}
syncer {
    rate                30720k; # bytes/second
    after               15;
    al-extents          709;
}
protocol C;
_this_host {
    device              minor 16;
    disk                "/dev/disk-drbd16";
    meta-disk           internal;
    address             ipv4 169.254.84.220:8916;
}
_remote_host {
    address             ipv4 169.254.214.196:8916;
}
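
For readability, the non-default settings above map to roughly this
drbd.conf stanza (resource names hypothetical; drbdadm translates the
syncer "after" resource into the minor number drbdsetup shows):

    resource r16 {            # hypothetical name for minor 16
        protocol C;
        disk {
            on-io-error detach;
        }
        net {
            ko-count        2;
            allow-two-primaries;
            after-sb-0pri   discard-zero-changes;
            after-sb-1pri   violently-as0p;
            after-sb-2pri   violently-as0p;
            rr-conflict     violently;
            ping-timeout    20;   # 1/10 seconds
        }
        syncer {
            rate        30720K;
            after       r15;      # hypothetical: the resource behind minor 15
            al-extents  709;
        }
    }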

And here are the log snippets from both sides; I have full logs if
needed. I tried sending them to the list, but even zipped I can't get
them across.

(Source)


Feb  6 01:57:13 node0 kernel: block drbd16: Starting asender thread (from drbd16_receiver [4790])
Feb  6 01:57:13 node0 kernel: block drbd16: data-integrity-alg:
Feb  6 01:57:13 node0 kernel: block drbd16: drbd_sync_handshake:
Feb  6 01:57:13 node0 kernel: block drbd16: self 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0006 bits:0 flags:0
Feb  6 01:57:13 node0 kernel: block drbd16: peer 3F4D478748F24FE6::6E1F4F316DBF9290:0006 bits:0 flags:0
Feb  6 01:57:13 node0 kernel: block drbd16: uuid_compare()=1 by rule 70
Feb  6 01:57:13 node0 kernel: block drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
Feb  6 01:57:13 node0 kernel: block drbd16: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent )
Feb  6 01:57:13 node0 kernel: block drbd16: Began resync as PausedSyncS (will sync 0 KB [0 bits set]).
Feb  6 01:57:14 node0 kernel: block drbd16: aftr_isp( 1 -> 0 )
Feb  6 01:57:15 node0 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec)
Feb  6 01:57:15 node0 kernel: block drbd16: conn( PausedSyncS -> Connected ) pdsk( Inconsistent -> UpToDate )
Feb  6 01:57:15 node0 kernel: block drbd16: pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 )


(Target)

Feb  6 01:57:13 node1 kernel: block drbd16: Starting asender thread (from drbd16_receiver [18186])
Feb  6 01:57:13 node1 kernel: block drbd16: data-integrity-alg:
Feb  6 01:57:13 node1 kernel: block drbd16: drbd_sync_handshake:
Feb  6 01:57:13 node1 kernel: block drbd16: self 3F4D478748F24FE6::6E1F4F316DBF9290:0006 bits:0 flags:0
Feb  6 01:57:13 node1 kernel: block drbd16: peer 93E3C6B459596F95:3F4D478748F24FE7:6E1F4F316DBF9291:0006 bits:0 flags:0
Feb  6 01:57:13 node1 kernel: block drbd16: uuid_compare()=-1 by rule 50
Feb  6 01:57:13 node1 kernel: block drbd16: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 )
Feb  6 01:57:13 node1 kernel: block drbd16: conn( WFBitMapT -> WFSyncUUID )
Feb  6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16
Feb  6 01:57:13 node1 kernel: block drbd16: helper command: /usr/lib/spine/bin/avance_drbd_helper before-resync-target minor-16 exit code 0 (0x0)
Feb  6 01:57:13 node1 kernel: block drbd16: conn( WFSyncUUID -> PausedSyncT ) disk( UpToDate -> Inconsistent )
Feb  6 01:57:13 node1 kernel: block drbd16: Began resync as PausedSyncT (will sync 0 KB [0 bits set]).
Feb  6 01:57:14 node1 kernel: block drbd16: aftr_isp( 1 -> 0 )
Feb  6 01:57:15 node1 kernel: block drbd16: Resync done (total 2 sec; paused 0 sec; 0 K/sec)
Feb  6 01:57:15 node1 kernel: block drbd16: conn( PausedSyncT -> Connected ) disk( Inconsistent -> UpToDate )
Feb  6 01:57:15 node1 kernel: block drbd16: helper command: /usr/li

Re: [DRBD-user] drbd-8.3.6 pdsk: UpToDate->Inconsistent but is really UpToDate after resync

2010-02-10 Thread Petrakis, Peter
That used to make a difference, but ever since we rebased to 8.3.x
we've dropped pretty much all of our patches except those that manage
the init scripts, because the default behavior causes us some
trouble. It's really more of an integration effort. I think
we have one kludge left that makes sure barriers are on by default,
and some extra anchors for blktrace to work, but that's it.

I can make the tree available to you if necessary.

Peter

> -----Original Message-----
> From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-
> boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
> Sent: Wednesday, February 10, 2010 1:48 PM
> To: drbd-user@lists.linbit.com
> Subject: Re: [DRBD-user] drbd-8.3.6 pdsk: UpToDate->Inconsistent but
> is really UpToDate after resync
> 
> On Wed, Feb 10, 2010 at 07:31:07PM +0100, Lars Ellenberg wrote:
> > On Wed, Feb 10, 2010 at 09:46:45AM -0500, Petrakis, Peter wrote:
> > > Hi All,
> > >
> > > We're encountering a resync problem with 8.3.6
> 
> Nope.  You have a locally patched version possibly based on 8.3.6.
> Just for the record...
> 
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> 
> DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
> __
> please don't Cc me, but send to list   --   I'm subscribed